EMS standby server throws " ERROR: Failed to lock store **** [ERRSTR=com.tibco.tibems.tibemsd.internal.db.EMSDataStoreLockException: Lock already held by ****]" error message when it tries to become active after the EMS primary server is down.
Article ID: KB0090876
Products: TIBCO Enterprise Message Service
Versions: Not Applicable
Description
The EMS standby server throws the error message " ERROR: Failed to lock store **** [ERRSTR=com.tibco.tibems.tibemsd.internal.db.EMSDataStoreLockException: Lock already held by ****]" when it tries to become active after the EMS primary server is down.
Resolution
When EMS fault-tolerant (FT) servers are configured to use a database store as the shared store, they use the EMS_HBLOCK_TABLE table together with the FT heartbeat to ensure that the shared lock is maintained properly.
EMS_HBLOCK_TABLE is a single-row table with four columns. One column, SERVER_ID, identifies the EMS server that currently holds the lock on the database. Another column, "LAST_UPDATE", records when that EMS server last refreshed the lock.
At the time of an EMS FT failover, the EMS primary server can have gone down in one of two ways:
1). If the EMS primary server is shut down properly, either by a shutdown command issued through the EMS Admin tool or by a SIGINT/SIGTERM signal, it releases the lock in EMS_HBLOCK_TABLE; that is, this table is deleted. Once the EMS standby server detects a missing heartbeat from the EMS primary server, it activates, obtains the database lock, and becomes the active server.
2). If the EMS primary server is shut down abnormally, the lock in EMS_HBLOCK_TABLE is not released; that is, this table is not deleted. After the EMS standby server detects a missing heartbeat from the EMS primary server, it activates, tries to obtain the database lock, and checks the "last update" value in EMS_HBLOCK_TABLE to determine whether it should become active. If the "last update" value is still within the database store activation timeout period (default value: 10 sec), the EMS standby server keeps retrying the lock until the "last update" value is older than the database store activation timeout period. (See SOL1-AUGIYA for refresh interval information.) Depending on exactly when the primary server stopped and how short the refresh interval is, the standby server might fail to obtain the lock and report the error message " ERROR: Failed to lock store **** [ERRSTR=com.tibco.tibems.tibemsd.internal.db.EMSDataStoreLockException: Lock already held by ****]" for several attempts. Eventually the EMS standby server should start up.
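The two cases above can be illustrated with a small simulation. This is only a sketch of the behavior described in this article, not the EMS server's actual implementation: the function names, the retry interval, and the use of `None` to represent a released lock are all assumptions made for illustration. Only the 10-second default timeout comes from the article.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default database store activation timeout stated in this article.
ACTIVATION_TIMEOUT = timedelta(seconds=10)


def lock_is_free(last_update: Optional[datetime],
                 now: datetime,
                 timeout: timedelta = ACTIVATION_TIMEOUT) -> bool:
    """Hypothetical check: may the standby take the lock right now?

    last_update is None when the primary released the lock
    (graceful shutdown, case 1). Otherwise the lock is treated as
    free only once LAST_UPDATE is older than the timeout (case 2).
    """
    if last_update is None:
        return True
    return now - last_update > timeout


def attempts_until_active(last_update: Optional[datetime],
                          first_attempt: datetime,
                          retry_interval: timedelta,
                          timeout: timedelta = ACTIVATION_TIMEOUT,
                          max_attempts: int = 100) -> int:
    """Count lock attempts until the standby succeeds.

    Every attempt before the returned one corresponds to one
    "Failed to lock store" error in the standby's log.
    retry_interval is an assumed value, not an EMS setting.
    """
    now = first_attempt
    for attempt in range(1, max_attempts + 1):
        if lock_is_free(last_update, now, timeout):
            return attempt
        now += retry_interval
    raise RuntimeError("lock never became free")
```

For example, if the primary last refreshed the lock at time t0 and then crashed, and the standby first tries at t0 + 2 s and retries every 3 s (an assumed interval), the attempts at 2 s, 5 s, and 8 s fail and the attempt at 11 s succeeds. The standby would log the error three times before becoming active. After a graceful shutdown the first attempt succeeds.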
This behavior is expected. It is recommended to always shut down the EMS server gracefully.
Note: The system clocks of the machines running the EMS primary and standby servers must be synchronized so that the locking mechanism and the Fault Tolerance feature work properly.
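The effect of unsynchronized clocks can be sketched as follows. This example assumes that "LAST_UPDATE" is written using the primary machine's clock and compared against the standby machine's clock; the article does not state this explicitly, and the function names and skew values are hypothetical.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(seconds=10)  # default activation timeout from this article


def looks_stale(last_update: datetime,
                standby_now: datetime,
                timeout: timedelta = TIMEOUT) -> bool:
    """True when the standby considers the lock older than the timeout."""
    return standby_now - last_update > timeout


def standby_view(true_now: datetime,
                 refresh_age: timedelta,
                 standby_skew: timedelta,
                 timeout: timedelta = TIMEOUT) -> bool:
    """What the standby concludes when its clock is off by standby_skew.

    refresh_age is how long ago (in real time) the primary refreshed
    the lock; the primary's clock is assumed to be accurate.
    """
    last_update = true_now - refresh_age      # written by the primary
    standby_now = true_now + standby_skew     # read from the standby's clock
    return looks_stale(last_update, standby_now, timeout)
```

Under these assumptions, a standby whose clock runs 15 s ahead would see a lock refreshed 2 s ago as 17 s old and treat a live primary's lock as stale. A standby whose clock runs 15 s behind would see a lock that is really 12 s old as not yet expired, which delays the takeover and prolongs the error messages.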
Issue/Introduction
EMS standby server throws " ERROR: Failed to lock store **** [ERRSTR=com.tibco.tibems.tibemsd.internal.db.EMSDataStoreLockException: Lock already held by ****]" error message when it tries to become active after the EMS primary server is down.