Abstract:
========
TIBCO Business Events engines fail to join the Active Spaces cache cluster.
Description:
=====================
TIBCO Business Events engines try indefinitely to acquire a lock on the cluster. When members cannot join or leave the cluster as expected, collect details by following the steps in the Resolution section. Messages such as the following appear in the engine logs:
Info [AgentMgr.Thread.1] - [runtime.service] Attempting to lock the cluster, retrying
Environment:
===========
TIBCO Business Events 5.1.1
TIBCO Business Events 5.1.2
Cause:
========
The Master space is locked by a running engine, and the lock is not released.
Resolution:
=========
To confirm which engine is locking the master space, try the following:
For TIBCO BusinessEvents 5.1.1
============================
Connect to your cluster using as-admin and run "show spaces". The output includes a space whose name contains the string "Master"; you can grep for that name.
as-admin > browse locks space <Space_NAME_Master>
Check if any agents have locked this space.
For TIBCO BusinessEvents 5.1.2
============================
If you are running BE 5.1.2 or above, the Active Spaces version shipped with it allows only members of the space to browse locks.
When running the commands from as-admin, you must therefore first join the space.
Here is the sequence:
as-admin> join space "repl-unlimited-fdcache--Master"
Join Space:
--------------------------------------------------------------------
Joined space 'repl-unlimited-fdcache--Master'
--------------------------------------------------------------------
as-admin> show space "repl-unlimited-fdcache--Master" locks
Show Space Locks:
___________________________________________________________________
Space Name : repl-unlimited-fdcache--Master
No space entries are locked
____________________________________________________________________
The following steps provide more details on the overall cluster members:
1). Create a file called “as-config.txt” in the working directory of the BE engine. The working directory is the directory from which the BE engine command is launched. If the engine is launched from a script or batch file, check whether the script runs “cd <dir>” before the BE startup command; if it does, as-config.txt must be in “<dir>”. In a test, the working directory for the BE engine was c:\temp, so the file was copied there.
2). In the file, add a line that looks like the following:
pid=* debug.port=<port num>-*
This ensures that all engines (pid=*; you can specify a single pid instead of *) open a debug port in the given port range. You can also give explicit starting and ending port numbers instead of <port num>-*.
Note: If the AS version is 2.1, replace the “=” sign in “pid=” and “debug.port=” with a space.
A sample file (Filename:fdcache.cdd) is attached to this KB.
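The line format described in step 2 can be sketched as a small helper. This is only an illustration: the function name `as_config_line` and its parameters are hypothetical, and the AS 2.1 form shown is one reading of the note above (each “=” replaced by a single space).

```python
def as_config_line(port_start, pid="*", port_end="*", as_version=None):
    """Build one as-config.txt line that enables the debug port.

    Hypothetical helper, not part of any TIBCO tool.
    pid="*" applies the setting to every engine; port_end="*" leaves
    the upper end of the port range open.
    """
    # Assumption taken from the note above: AS 2.1 expects a space
    # where later versions expect "=".
    sep = " " if as_version == "2.1" else "="
    return f"pid{sep}{pid} debug.port{sep}{port_start}-{port_end}"


print(as_config_line(9696))                    # pid=* debug.port=9696-*
print(as_config_line(9696, as_version="2.1"))  # pid * debug.port 9696-*
```

Writing the returned string as the only line of as-config.txt in the engine's working directory corresponds to steps 1 and 2.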
3). If you are deploying TIBCO BusinessEvents applications in TIBCO Administrator, you can add the property "as.debug.port" to the application TRA file under <TIBCO_HOME>/tra/domain/application/<application_name>.
The value to set is the port number as indicated in the sample entry below:
as.debug.port=55559-*
4). Once you have created this file in the format above or, for Administrator deployments, set the property in the application TRA file, open a different command window and run “telnet <IP address on which the TIBCO BusinessEvents engine connected to the metaspace> <port number from as-config.txt file>”.
as-config.txt example:
pid=* debug.port=9696-*
Assuming the TIBCO BusinessEvents engine picked up IP 192.168.1.120 for the AS connection, “telnet 192.168.1.120 9696” provides a CLI-type interface where you can type the commands listed below. Port numbers start from the lower end of the range, and each engine picks the next one for its debug port.
In a sample test scenario where the engine was started after setting the debug port in the as-config.txt file, two extra lines were printed in the Active Spaces log file. The first engine you start on the machine picks the port that you configured, and the next engine automatically picks up the next port in the sequence. In the sample case, the cache agent picked up port 55550 and the inference agent picked up port 55551.
[2013-06-21 14:56:20:876][19972][22152][INFO][transport] ip_address=10.98.196.207 port=55550
[2013-06-21 14:56:20:877][19972][22152][INFO][common] debug port opened, port=55550
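As a rough sketch, the debug port can be pulled out of log lines like the two above with a short script. The function name `find_debug_ports` is hypothetical, and the pattern assumes the message text matches the sample exactly ("debug port opened, port=<n>").

```python
import re

# Matches the message text seen in the sample Active Spaces log line.
DEBUG_PORT_PATTERN = re.compile(r"debug port opened, port=(\d+)")


def find_debug_ports(log_lines):
    """Return the debug port numbers found in the given log lines."""
    ports = []
    for line in log_lines:
        match = DEBUG_PORT_PATTERN.search(line)
        if match:
            ports.append(int(match.group(1)))
    return ports


sample = [
    "[2013-06-21 14:56:20:876][19972][22152][INFO][transport] ip_address=10.98.196.207 port=55550",
    "[2013-06-21 14:56:20:877][19972][22152][INFO][common] debug port opened, port=55550",
]
print(find_debug_ports(sample))  # [55550]
```

To scan a real log, pass an open file object instead of the sample list; the path of the Active Spaces log file depends on your installation.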
The best way to open the Telnet window is to first identify the debug port assigned to the engine from the Active Spaces log files. Search for the keyword “debug port opened” to get the port number. Once you have the IP and port, Telnet to that engine with “telnet <ip> <port>” and run the commands in step 5.
5). The following commands help identify the cluster state. When you run each command, the result is written to the Active Spaces log file for that process ID. Capture the Active Spaces log file immediately, before it rolls over.
a). “ql verify gmp”
b). “log gmp <metaspace name>”
c). “log space <space name>” - <space name> is the space that is stuck; the master space should do here.
Note: The space name and metaspace name do not require quotes around them.
Collect the output of the commands for the Manager node, one Seeder, and one Leech. You can connect to the cluster from as-admin, run "show members", and pick the desired BE engines.
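When several engines have to be checked, the three commands can be sent from a script instead of an interactive Telnet session. The sketch below is untested against a real engine and rests on one assumption: that the debug port accepts plain text lines terminated by CR LF, as a Telnet client would send them. The function names and the metaspace name in the usage comment are hypothetical.

```python
import socket


def diagnostic_commands(metaspace_name, space_name):
    """Return the three debug-port commands; names are passed without quotes."""
    return [
        "ql verify gmp",
        f"log gmp {metaspace_name}",
        f"log space {space_name}",
    ]


def send_commands(host, port, commands, timeout=5.0):
    """Send each command as one line to an engine's debug port.

    Assumption: the port takes CR LF terminated text lines. The results
    are written to the Active Spaces log of that engine, so nothing is
    read back from the socket here.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        for command in commands:
            conn.sendall(command.encode("ascii") + b"\r\n")


# Example with a hypothetical metaspace name:
# send_commands("192.168.1.120", 9696,
#               diagnostic_commands("my_metaspace", "repl-unlimited-fdcache--Master"))
```

Run it once per engine (Manager, one Seeder, one Leech), then collect the corresponding Active Spaces log files.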