TIBCO ActiveMatrix BPM: Process Instance Problems Due to Network Or Hardware Issues

Article ID: KB0088049

Products: TIBCO BPM Enterprise (formerly TIBCO ActiveMatrix BPM)
Versions: Not Applicable

Description

Occasionally, network or hardware issues between the TIBCO ActiveMatrix BPM systems and the Microsoft SQL Server system where MS-DTC is running leave "in doubt" transactions in MS-DTC. These "in doubt" transactions can hold locks on rows in the database tables, blocking other transactions and preventing process instances from starting or being processed.

In addition to network issues, a crash of the MS-DTC process or a failure of the disk where its logs are stored could also cause this problem.
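
One way to check whether "in doubt" transactions are currently holding locks is to query the SQL Server `sys.dm_tran_locks` view for session ID -2, which SQL Server uses for orphaned distributed transactions. The following is a minimal sketch, not a TIBCO-supplied tool: the helper name `find_orphaned_dtc_locks` is hypothetical, and the cursor is assumed to be any Python DB-API cursor (for example one obtained from pyodbc) connected to the affected instance with permission to view server state.

```python
# Hypothetical helper; assumes a DB-API 2.0 cursor connected to SQL Server.
ORPHANED_DTC_LOCKS_SQL = """
SELECT request_owner_guid AS unit_of_work,
       COUNT(*)           AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = -2
GROUP BY request_owner_guid
"""


def find_orphaned_dtc_locks(cursor):
    """Return {unit_of_work_guid: lock_count} for locks owned by session -2."""
    cursor.execute(ORPHANED_DTC_LOCKS_SQL)
    # Each row is expected to unpack as (unit_of_work, lock_count).
    return {str(uow): int(count) for uow, count in cursor.fetchall()}
```

An empty result suggests that no orphaned distributed transactions hold locks at that moment. How to resolve any that are found is a decision for the database administrator and is outside the scope of this sketch.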

Network issues that can provoke this problem include the following:

     •    A network outage between the systems.
     •    System time on the TIBCO ActiveMatrix BPM system jumping forward, for
           example because the system clock has fallen behind and is now being
           corrected. This can happen because:

               o    The system is hosted on a virtual machine that is low on resources
                      (another system is using CPU, disk, or memory to the detriment of
                      the TIBCO ActiveMatrix BPM system).
               o    A backup of the virtual machine is taken, resulting in a snapshot
                      delete. As part of the delete, the virtual machine is "stunned" briefly,
                      possibly a number of times, and at the end of the snapshot delete
                      process the virtual machine brings its time back into synchronization.
               o    The virtual machine has been "stunned" because its resources
                      (CPU, disk, or memory) are being moved by the Distributed Resource
                      Scheduler (DRS/SDRS).

           When time jumps forward, the socket layer can throw a "read timed out"
           exception because it did not get a response in a timely manner, even though
           very little time has actually passed. This can be seen in network traces on
           either end of the socket, where microseconds pass on one side, whereas
           seconds appear to pass on the other.

           These time changes are logged in the Windows event logs under the
           Kernel-General source and can be correlated with any warnings or errors
           that appear in the BPM logs.

     •    Microsoft SQL Server is unable to initiate a connection with MS-DTC because
           MS-DTC appears to refuse connections. This can happen while a snapshot
           delete of the virtual machine that is running Microsoft SQL Server is being
           performed. As partitions are processed during the snapshot delete, various
           components are "frozen" while their particular partition is affected. You can
           see in the logs that SQL Server is frozen for a short time (perhaps 5 seconds).
           Although this freeze on its own has no effect on ActiveMatrix BPM, if the
           partition that hosts MS-DTC is frozen, MS-DTC appears to refuse connections
           for a very short while. This results in a Windows event log entry from the
           MSDTC Client about E_CM_SERVER_NOT_READY, and the ActiveMatrix BPM
           threads receive a "connection closed" exception.
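
The correlation between Kernel-General time-change events and warnings or errors in the BPM logs, described in the list above, can be sketched as follows. This is an illustrative sketch under stated assumptions: the timestamps are taken to be already extracted from the Windows event log and the BPM logs as `datetime` values in the same time zone, the function name `correlate` and the 30-second window are hypothetical choices rather than TIBCO defaults, and the sample values are made up for illustration.

```python
from datetime import datetime, timedelta


def correlate(time_change_events, bpm_errors, window=timedelta(seconds=30)):
    """Pair each BPM error with time-change events within `window` of it.

    time_change_events: list of datetime (e.g. Kernel-General entries).
    bpm_errors: list of (datetime, message) tuples from the BPM logs.
    Returns (error_time, message, [nearby_event_times]) for every error
    that has at least one time-change event within the window.
    """
    matches = []
    for error_time, message in bpm_errors:
        nearby = [t for t in time_change_events
                  if abs(t - error_time) <= window]
        if nearby:
            matches.append((error_time, message, nearby))
    return matches


# Made-up sample data, for illustration only.
events = [datetime(2024, 1, 1, 2, 0, 5)]
errors = [
    (datetime(2024, 1, 1, 2, 0, 10), "read timed out"),
    (datetime(2024, 1, 1, 9, 30, 0), "connection closed"),
]
for error_time, message, nearby in correlate(events, errors):
    print(error_time.isoformat(), message, len(nearby))
```

With the sample data, only the "read timed out" entry falls within the window of a time-change event. A wider or narrower window may suit a given environment, depending on how long the virtual machine is stunned and how long the BPM socket timeouts are.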
