Products | Versions |
---|---|
TIBCO BPM Enterprise (formerly TIBCO ActiveMatrix BPM) | - |
Not Applicable | - |
Description:
Occasionally, network or hardware issues between the TIBCO ActiveMatrix BPM systems and the Microsoft SQL system where MS-DTC is running leave "in doubt" transactions in MS-DTC. These "in doubt" transactions can hold locks on rows in the database tables, blocking other transactions and leading to process instances not starting or not being processed.
In addition to network issues, a crash of the MS-DTC process or a failure of the disk where its logs are stored can also cause this problem.
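To make the locking effect concrete, the following is a minimal sketch of how locks held by such transactions might be located. It is not part of the original article. It assumes the SQL Server dynamic management view `sys.dm_tran_locks`, in which a `request_session_id` of -2 marks a request that belongs to an orphaned distributed transaction. The helper name `group_orphaned_locks` and the row layout are hypothetical, and running the query against the database is left to whatever driver you already use.

```python
from collections import defaultdict

# Query text only; executing it is left to the caller's database driver.
# In sys.dm_tran_locks, request_session_id = -2 identifies requests that
# belong to an orphaned distributed transaction.
ORPHANED_DTC_LOCKS_SQL = """
SELECT request_owner_guid, resource_type, request_mode
FROM sys.dm_tran_locks
WHERE request_session_id = -2
"""


def group_orphaned_locks(rows):
    """Group lock rows by the unit-of-work GUID that owns them.

    `rows` is an iterable of (request_owner_guid, resource_type,
    request_mode) tuples, in the column order of the query above.
    This helper is hypothetical and only illustrates the idea.
    """
    locks_by_uow = defaultdict(list)
    for owner_guid, resource_type, request_mode in rows:
        locks_by_uow[str(owner_guid)].append((resource_type, request_mode))
    return dict(locks_by_uow)
```

Each key in the result is the unit-of-work identifier of one distributed transaction, so a non-empty result suggests that transactions of this kind are currently holding locks.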
Network issues that can provoke this problem include the following:
• A network outage between the systems.
• System time on the TIBCO ActiveMatrix BPM system jumping forward, for
example, because the system clock has drifted backwards and is now
being corrected. Possible causes include:
o Virtual machine hosted systems that are low on resources (another
system is using CPU, disk, or memory to the detriment of the TIBCO
ActiveMatrix BPM system).
o A backup being taken of the virtual machine, resulting in a snapshot
delete. As part of the delete, the virtual machine is "stunned" briefly,
possibly a number of times, and at the end of the snapshot delete
process the virtual machine brings its time back into synchronization.
o The virtual machine has been "stunned" due to its resources
(CPU, disk, or memory) being moved by the Distributed Resource
Scheduler (DRS/SDRS) management.
When time jumps forward, the socket layer can throw a "read timed out"
exception because it appears not to have received a response in time, even
though almost no real time has passed. This can be seen in network traces
taken on either end of the socket, where microseconds pass on one side,
whereas seconds appear to pass on the other.
These time changes are logged in the Windows event logs under the Kernel-General
source and can be correlated with any warnings or errors that appear in
the BPM logs.
• Microsoft SQL is unable to initiate a connection with MS-DTC because MS-DTC
appears to refuse connections. This can happen when a snapshot
delete of the virtual machine that is running Microsoft SQL is being performed.
As partitions are processed during the snapshot delete, various components are
"frozen" as their particular partition is affected. You can see in the logs that
SQL gets frozen for a short time (perhaps 5 seconds). Although this has no
effect on ActiveMatrix BPM, if the partition that hosts MS-DTC is frozen, MS-DTC
appears to refuse connections for a very short while. This results in a Windows
event log entry from the MSDTC Client about E_CM_SERVER_NOT_READY,
and the ActiveMatrix BPM threads receiving a "connection closed" exception.
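The forward time jumps described in the list above can be illustrated with a small sketch that compares the wall clock against a monotonic clock. It is not part of the original article or of ActiveMatrix BPM. The function names and the one-second threshold are hypothetical, and it assumes that the monotonic clock is not stepped when the system time is corrected.

```python
import time


def take_sample():
    """Return a (wall_clock, monotonic) pair of readings in seconds."""
    return (time.time(), time.monotonic())


def find_jumps(samples, threshold=1.0):
    """Report where the wall clock moved differently from the monotonic clock.

    `samples` is a list of (wall_clock, monotonic) pairs in the order
    they were taken. For each consecutive pair, the wall-clock delta is
    compared with the monotonic delta. A difference larger than
    `threshold` seconds is returned as (sample_index, difference), where
    a positive difference means the wall clock jumped forward.
    """
    jumps = []
    for i in range(1, len(samples)):
        wall_prev, mono_prev = samples[i - 1]
        wall_now, mono_now = samples[i]
        difference = (wall_now - wall_prev) - (mono_now - mono_prev)
        if abs(difference) > threshold:
            jumps.append((i, difference))
    return jumps
```

For example, samples taken once per second by `take_sample()` would normally produce no entries. A reported entry could then be compared against the Kernel-General events and the BPM log timestamps mentioned above.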
Symptoms:
Cause: