This is a known issue caused by a race condition where a node starts executing the EventFlow before the shared query tables in transactional memory are fully connected to the other nodes. This is often because the other nodes are also still starting up.
Starting all the nodes at the same time using the command "epadmin servicename={cluster-name-only} start node" may trigger this problem.
Avoid this problem by starting each node to completion, waiting for the 'epadmin start node' command to report "Node started" before attempting to start another node, as follows:
$ epadmin servicename=A.X start node
[A.X] Starting node
[A.X] Engine application::default-engine-for-com.example.MyApp started
[A.X] Loading node configuration
[A.X] Auditing node security
[A.X] Host name mysystem
[A.X] Administration port is 55139
[A.X] Discovery Service running on port 54321
[A.X] Service name is A.X
[A.X] Node started
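For clusters with several nodes, the one-at-a-time start can be scripted. The following is a minimal sketch, not a supported tool. It assumes that epadmin returns only after the start attempt has finished, that a failed start yields a non-zero exit status, and that a successful start prints "Node started" as in the output above. The function name start_nodes_sequentially, the EPADMIN override variable, and the node names in the example are hypothetical.

```shell
#!/bin/sh
# Sketch only: start cluster nodes one at a time.
# EPADMIN is a hypothetical override hook; it defaults to the real epadmin command.
EPADMIN="${EPADMIN:-epadmin}"

# Usage: start_nodes_sequentially NODE [NODE...]
# Starts each node in turn and stops at the first node that does not
# report "Node started".
start_nodes_sequentially() {
    for node in "$@"; do
        # The command substitution returns only after the command exits,
        # so the next node is not started while this one is still starting.
        output=$("$EPADMIN" "servicename=$node" start node 2>&1)
        status=$?
        printf '%s\n' "$output"

        # Assumption: a failed start yields a non-zero exit status.
        if [ "$status" -ne 0 ]; then
            echo "Start of $node failed with status $status" >&2
            return 1
        fi

        # Second guard: require the "Node started" line in the output.
        case "$output" in
            *"Node started"*) ;;
            *)
                echo "$node did not report 'Node started'" >&2
                return 1
                ;;
        esac
    done
}

# Example invocation (node names are placeholders):
# start_nodes_sequentially A.X B.X C.X
```

Stopping at the first failure is deliberate: continuing to start further nodes while an earlier one is in an unknown state could recreate the overlap this procedure is meant to avoid.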
This issue is most often triggered when the application in the node starts writing to the shared Query Table immediately after startup, for example when loading historical data using a CSV File Reader adapter set to start reading immediately, or table-write actions triggered by a Once Operator. These activities should be delayed until the node has joined the existing cluster.
You may also see the warning:
[A.X] default-engine-for-com.example.MyApp:2019-05-07 14:50:11.000000-0400 [10624] WARN
com.tibco.ep.dtm.highavailability.distribution: (csmarshal.cpp:2175) Request from remote node B.X failed.
Version mismatch detected for partition default-cluster-wide-availability-zone_VP_26, active version
(v13382222546198132), object version (v13382209213472372), for operation 'create' on object
'default.QueryTable:459 (1544710558:3463062280:3266951585314:459)', concurrent migration detected.
This indicates that there were concurrent changes to be reconciled as the new node's table object was created. This is resolved by the platform without error and is reported in the log to assist with troubleshooting other errors (if any).