What is the cause of the message "WARNING: Fault-tolerance error: Dual-Active server detected at xxx" in the EMS server log?

Article ID: KB0088865

Products Versions
TIBCO Enterprise Message Service -
Not Applicable -

Description:
============
When EMS fault tolerance (FT) is configured, the primary server sends a heartbeat to the standby server at the interval set by ft_heartbeat (in tibemsd.conf) and advertises whether it is the active server. If the standby server does not receive a heartbeat within ft_activation (in tibemsd.conf), it tries to obtain a lock on the datastore and become active. When it becomes active, it announces that it is the active server. If the primary server is still running at that point and receives this announcement, it logs the error "WARNING: Fault-tolerance error: Dual-Active server detected at <the other FT server>". Both servers can log this error.

Under normal conditions, the server running as the active server holds the lock on the datastore, so the other server cannot obtain it. In the dual-active situation, both servers can access the datastore and run as the active server. This is not correct behavior.
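
The sequence described above can be modeled with a minimal C sketch. It is only an illustration of the decision logic, not EMS source code: the function names and parameters are hypothetical, and the exact comparison EMS applies to ft_activation is an assumption.

```c
/* Simplified model of the FT decision logic described in this article.
 * All names are hypothetical; this is not EMS source code. */
#include <assert.h>
#include <stdbool.h>

/* Standby side: attempt activation only when no heartbeat has arrived
 * within ft_activation seconds. Activation succeeds only if the lock on
 * the datastore was obtained. */
bool standby_becomes_active(double secs_since_heartbeat,
                            double ft_activation,
                            bool lock_obtained)
{
    assert(ft_activation > 0.0);
    bool heartbeat_missed = secs_since_heartbeat > ft_activation;
    return heartbeat_missed && lock_obtained;
}

/* Either side: a server that is itself active and hears its peer announce
 * that it is active has detected the dual-active condition. */
bool dual_active_detected(bool self_active, bool peer_announces_active)
{
    return self_active && peer_announces_active;
}
```

With working file locking, lock_obtained is false on the standby for as long as the active server is alive, so dual_active_detected never becomes true. The warning appears when the storage grants the lock to the standby even though the other server still holds it.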

Causes:
=======
This message indicates that your shared storage may not meet the exclusive locking criteria that an EMS fault-tolerant configuration requires.

If you implement shared state using shared storage devices, EMS requires that your storage solution guarantee the following four criteria.  

============================================================================
1). Write Order: The storage solution must write data blocks to shared storage in the same order as they occur in the data buffer. Solutions that write data blocks in any other order (for example, to enhance disk efficiency) do not satisfy this requirement.

2). Synchronous Write Persistence: Upon return from a synchronous write call, the storage solution guarantees that all data has been written to durable, persistent storage.

3). Distributed File Locking: The EMS Servers must be able to request and obtain an exclusive lock on the shared storage. The storage solution must not assign the locks to two servers simultaneously. EMS servers use this lock to determine the primary server.

4). Unique Write Ownership: The EMS server process that has the file lock must be the only server process that can write to the file. Once the system transfers the lock to another server, pending writes queued by the previous owner must fail.
===================================================================================

Refer to the EMS User's Guide -> Chapter 18 Fault Tolerance -> Shared State.

Resolution:
============
Always consult the shared storage vendor and operating system vendor to confirm that the storage solution satisfies all four criteria.

We have a few tests that can help you ascertain the distributed file-locking capabilities of your storage solution. Refer to FAQ1-81WS6O. We also provide a small test program, "file_lock.c", to help identify file locking issues. Refer to FAQ1-8L9GCR.
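
As a rough illustration of what a file lock test checks, the following minimal C sketch tries to take an exclusive lock on a file. It is not the "file_lock.c" program from FAQ1-8L9GCR. It assumes POSIX fcntl() advisory record locking, and the function names try_exclusive_lock and release_lock are hypothetical.

```c
/* Minimal sketch: probe whether a file on shared storage grants an
 * exclusive advisory lock. Assumes POSIX fcntl() record locking.
 * This is NOT TIBCO's file_lock.c; function names are hypothetical. */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to take a non-blocking exclusive (write) lock on the whole file.
 * Returns an open descriptor that holds the lock, or -1 if the file could
 * not be opened or a conflicting lock is held by another process. */
int try_exclusive_lock(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct flock fl = {0};
    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;  /* offsets are relative to start of file */
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        perror("fcntl(F_SETLK)");
        close(fd);
        return -1;
    }
    return fd;
}

/* Closing the descriptor releases the lock held by this process. */
void release_lock(int fd)
{
    if (fd >= 0)
        close(fd);
}
```

To use a probe like this, call try_exclusive_lock on a file in the shared datastore directory from one host and keep the descriptor open, then call it on the same file from a second host. Because fcntl() locks are held per process, the two calls must come from separate processes. If the second call also returns a valid descriptor, the storage has granted the same exclusive lock twice, which violates the Distributed File Locking criterion.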

Environments:
============
All
