Article ID: KB0086482
Description
Resolution:
In an EMS FT deployment across two machines, the two EMS servers run in a hot-warm setup and use heartbeats to determine when to fail over from the primary server to the secondary. However, because the primary and secondary EMS servers are located on different physical machines, network glitches or a busy server can prevent the secondary server from receiving heartbeats, and file corruption may occur if both servers try to access the same storage. Therefore, besides the out-of-the-box EMS heartbeat and EMS file locking, EMS FT requires additional components: extra hardware (e.g., a SAN connecting the two machines) as well as software (shared access to file AND a file-lock manager) to achieve the hot-warm fault-tolerant model.
Shared Access to File:
Shared file access can be provided either by the SAN vendor or by 3rd-party software on top of the SAN. The typical configuration is a SAN accessed over SCSI or Fibre Channel. Customers have deployed with EMC and Hitachi SANs, among others. However, not every SAN product meets the shared-file-access requirement; the HP MSA500, for example, does not. As for 3rd-party software on top of a SAN, PolyServe, for example, provides a clustered file system that enables data sharing over a SAN. Please check with the corresponding SAN vendor for details.
File-Lock Manager:
In EMS FT, the primary server holds the lock on the store file until its process terminates. Before assuming the primary role, the secondary server first tries to acquire that lock. A file-lock manager is needed to ensure that the lock is released only on an absolute failure of the primary server. Without a file-lock manager, the primary and secondary servers may both write to the file at the same time, and file corruption may result.
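As an illustration of this takeover sequence, the following is a minimal POSIX sketch (not EMS source code; the store path is hypothetical) in which a standby process blocks on fcntl with F_SETLKW until the previous holder's lock is released:

/* Illustrative sketch of a standby taking over the shared store file.
 * Not EMS source code; the store path is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/shared/san/datastore/sync-msgs.db", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock fl = {0};
    fl.l_type   = F_WRLCK;    /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;          /* 0 = lock the whole file */

    /* F_SETLKW blocks until the lock is granted, i.e. until the
     * primary's process has terminated and the lock manager has
     * released its lock. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl");
        return 1;
    }

    printf("Lock acquired -- safe to assume the primary role.\n");
    /* ... recover state from the store and begin serving ... */
    close(fd);
    return 0;
}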
Notes on File Locking:
To prevent two servers from using the same store file, each server restricts access to its store file for the duration of the server process. On Windows platforms, EMS servers use the standard Windows CreateFile function, supplying FILE_SHARE_READ as the dwShareMode argument (the third parameter) to restrict access by other processes. On UNIX platforms, servers use the standard fcntl system call to implement cooperative (advisory) file locking.
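For the Windows side (the UNIX fcntl pattern appears in the sketch above), here is a minimal illustration of the restrictive share mode; the path is hypothetical and this is not EMS source code:

/* Sketch of restricting concurrent access on Windows through the
 * CreateFile share mode. Illustrative only -- not EMS source code. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* FILE_SHARE_READ as dwShareMode (third parameter): while this
     * handle stays open, other processes may open the file for
     * reading, but attempts to open it for writing fail with
     * ERROR_SHARING_VIOLATION. */
    HANDLE h = CreateFileA("C:\\shared\\store\\sync-msgs.db",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ,
                           NULL,                  /* default security */
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL,
                           NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "open failed: error %lu\n", GetLastError());
        return 1;
    }
    /* ... the server holds this restricted handle for its lifetime ... */
    CloseHandle(h);
    return 0;
}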
Also, please do not confuse file locking with the file-lock manager: file locking is something EMS provides out of the box, while the file-lock manager is an additional requirement EMS needs to perform FT.
EMS High Availability (HA) – Besides configuring EMS FT, clustering software can also be used to achieve HA. As opposed to EMS FT, which is a hot-warm setup, HA is a hot-cold setup that typically takes longer to fail over, although the failover time depends mainly on the clustering software. When using clustering software such as Veritas, the secondary server is started by the clustering software after it has remounted the shared store on the secondary host. The newly started EMS server then initializes its state from the backend store. To achieve EMS HA, hardware for connected storage (e.g., a SAN) and clustering software (e.g., Veritas) are required.
A clustering environment still requires connected hardware storage (e.g., a SAN); however, the requirements for shared file access and a file-lock manager no longer apply. The only software requirement for HA is the clustering software itself.
==================
The important thing is that when a customer wants to use the EMS FT feature out of the box, i.e., a hot-warm setup with an active server and a standby server, certain requirements must be met for it to succeed.
In this case, both EMS servers need to share the same datastore, and the lock on the store file must be maintained until the active EMS server process terminates. This requires 3rd-party software (a distributed file-lock manager); the setup of the shared storage and the file-lock manager is outside our scope.
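For reference, the EMS-side configuration pairs the two servers and points both at the shared store. The following is a sketch of the relevant tibemsd.conf entries (host names and the store path are hypothetical; consult the EMS documentation for your release):

# tibemsd.conf on server A (hypothetical hosts and paths)
listen    = tcp://7222
ft_active = tcp://emsB:7222         # URL of the peer server
store     = /shared/san/datastore   # must reside on the shared storage

# tibemsd.conf on server B
listen    = tcp://7222
ft_active = tcp://emsA:7222
store     = /shared/san/datastore   # same directory on both servers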
Customers always have the option of not using the out-of-the-box FT feature and instead relying on 3rd-party software or scripts to detect an EMS server failure, unmount/mount the storage on the backup machine, and start the EMS server on that backup machine. This scenario is what we call a hot-cold setup, meaning that only one server is running at any time.
Another requirement, which is not actually specific to FT but to the storage, is that we do not support NFS because of the reliability of that protocol: EMS needs a guarantee of write ordering.
In summary, to implement FT, your hardware/software combination and your storage solution must satisfy all four of the following criteria.
1. Write Order: The storage solution must write data blocks to shared storage in the same order as they occur in the data buffer.
(Solutions that write data blocks in any other order, for example to enhance disk efficiency, do not satisfy this requirement.)
2. Synchronous Write Persistence: Upon return from a synchronous write call, the storage solution guarantees that all the data have been written to durable, persistent storage. (A POSIX-level sketch of this guarantee follows the list.)
3. Distributed File Locking: The EMS servers must be able to request and obtain an exclusive lock on the shared storage. The storage solution must not grant the lock to two servers simultaneously. (See Software Options.) EMS servers use this lock to determine the primary server.
4. Unique Write Ownership: The EMS server process that has the file lock must be the only server process that can write to the file. Once the system transfers the lock to another server, pending writes queued by the previous owner must fail.
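As an illustration of what criterion 2 means at the POSIX level, here is a minimal sketch, assuming the storage stack honors O_SYNC and fsync end to end (the path is hypothetical and this is not EMS source code):

/* Sketch of "synchronous write persistence" at the POSIX level.
 * Illustrative only -- not EMS source code. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* O_SYNC: write() returns only after the data (and required
     * metadata) have reached durable storage -- criterion 2,
     * provided the storage stack honors it end to end. */
    int fd = open("/shared/san/datastore/probe.db",
                  O_RDWR | O_CREAT | O_SYNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char rec[] = "record\n";
    if (write(fd, rec, sizeof rec - 1) != (ssize_t)(sizeof rec - 1)) {
        perror("write");            /* a failed write must not pass silently */
        close(fd);
        return 1;
    }

    /* fsync() as an explicit flush for file systems where O_SYNC
     * alone is not sufficient. */
    if (fsync(fd) < 0) { perror("fsync"); close(fd); return 1; }

    printf("write is durable on return\n");
    close(fd);
    return 0;
}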
Always consult the shared storage vendor and the operating system vendor to ascertain that the storage solution you select satisfies all four criteria. If any of these criteria is not satisfied, the solution is not supported by TIBCO.
Issue/Introduction
What is required to configure EMS servers in FT mode using a shared storage system?