Configure BW client reconnections with EMS Server


Article ID: KB0076482


Products Versions
TIBCO ActiveMatrix BusinessWorks 5.x

Description

This article discusses how to configure BW client reconnections with the EMS Server, and the commonly known issues in this area.
BW uses the underlying EMS client library to reconnect after a disconnection between the EMS Server and the BW client. To configure the reconnect feature of the EMS client library, set and tune the parameters below according to your requirements. Note that setting these parameters does not guarantee that the client can always reconnect to the EMS Server after a disconnection. To achieve the maximum reconnect success rate, the parameters must be kept in sync with one another, as discussed in detail later in this article.

Ensure that the following settings are in place, according to your requirements:

•    The BW JMS Connection should use an FT EMS URL (for FT EMS Servers) or, for a single EMS Server, a comma-separated URL. Note that unless the connection is configured with an FT-style URL, the client cannot use the reconnection functionality.
•    If JNDI lookup is not enabled in the shared JMS connection used by the BW process, the reconnection properties can be configured in the application's tra file as follows (the property values in this article are for illustration only; set them according to your requirements or design):
 
                    java.property.com.tibco.tibjms.reconnect.attempts 12,10000
 
•    If the BW process uses a JMS connection where JNDI lookup is enabled, set the following properties for the corresponding factories in the factories.conf file of the JNDI EMS server (the property values are examples only):
 
connect_attempt_count = 5
connect_attempt_delay = 10000
reconnect_attempt_count = 12
reconnect_attempt_delay = 10000
 
When the BW application starts and connects to the JNDI EMS Server, it retrieves the properties from the respective factory and configures itself accordingly. BW then uses the reconnect parameters set in the factories.conf file when it attempts to reconnect.

•    ft_reconnect_timeout should be set in the EMS server's main configuration file (tibemsd.conf).
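To illustrate the URL forms and the tra property described in the list above, here is a minimal Python sketch. The helper names `ft_url` and `tra_reconnect_property` and the host names `emshost1` and `emshost2` are hypothetical; they are not part of BW or EMS and only show the shape of the strings.

```python
def ft_url(*server_urls: str) -> str:
    """Join EMS server URLs into the comma-separated form used for FT connections."""
    return ",".join(server_urls)


def tra_reconnect_property(attempt_count: int, attempt_delay_ms: int) -> str:
    """Format the tra line in the same form shown in this article."""
    return (
        "java.property.com.tibco.tibjms.reconnect.attempts "
        f"{attempt_count},{attempt_delay_ms}"
    )


# Two fault-tolerant servers (hypothetical host names).
print(ft_url("tcp://emshost1:7222", "tcp://emshost2:7222"))

# Single server: the same URL is listed twice so the client has a URL list to iterate.
print(ft_url("tcp://emshost1:7222", "tcp://emshost1:7222"))

# Illustrative values only: 12 attempts, 10000 ms between attempts.
print(tra_reconnect_property(12, 10000))
```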

The following configuration parameters are involved in this topic:

- ft_reconnect_timeout: The amount of time (in seconds) that a backup server waits for clients to reconnect. If a client does not reconnect within this time period, the server removes its state from the shared state files. The timer starts once the server has fully recovered the shared state. The default is 60 seconds.

- reconnect_attempt_count: When attempting to reconnect, the client iterates through its URL list until it re-establishes a connection with an EMS server. This property determines the maximum number of iterations. When absent, the default is 4.

- reconnect_attempt_delay: When attempting to reconnect, the client sleeps for this interval (in milliseconds) between iterations through its URL list. When absent, the default is 500 milliseconds.

Set the values of the above parameters according to your requirements. With the example settings of reconnect_attempt_count (12) and reconnect_attempt_delay (10000), after a disconnection the BW client tries to reconnect to the EMS server 12 times with a 10-second delay between attempts, so it covers an outage of roughly reconnect_attempt_count * reconnect_attempt_delay, which is 120 seconds. If an EMS issue (for example, a network outage) lasts longer than 120 seconds, the client will not reconnect to the EMS server. This is expected. You also need to set ft_reconnect_timeout accordingly, so that the EMS Server does not purge the connections while the BW client is still trying to reconnect (that is, during the period of reconnect_attempt_count * reconnect_attempt_delay).
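The arithmetic above can be sketched in a few lines of Python. This is an illustration only: the function name `reconnect_window_seconds` is hypothetical, and the estimate ignores the time each connection attempt itself takes.

```python
def reconnect_window_seconds(attempt_count: int, attempt_delay_ms: int) -> float:
    """Approximate outage duration, in seconds, covered by the client's retries."""
    return attempt_count * attempt_delay_ms / 1000.0


# Example values used in this article: 12 attempts, 10000 ms apart.
print(reconnect_window_seconds(12, 10000))  # 120.0

# Documented defaults: 4 attempts, 500 ms apart.
print(reconnect_window_seconds(4, 500))  # 2.0
```

With the defaults, the retries cover only about two seconds, which is why the values usually need tuning for outages of any real length.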
How to coordinate the ft_reconnect_timeout value with ReconnAttemptCount and ReconnAttemptDelay (from KB 000013345)
If the ft_reconnect_timeout value is set low and reconnect_attempt_count * reconnect_attempt_delay is high, the client keeps trying to connect to the server(s) after the server has purged the connection. You may then see "reconnect failed: connection unknown", which indicates that the server does not have, or no longer has, the client's connection information.
If the ft_reconnect_timeout value is set too high, many connections and connection-related objects are kept in memory for a long time, which can cause memory issues. If the connection uses a clientID, you may also run into a “clientID already exists” issue.

The following are EMS Team recommendations:

- reconnect_attempt_delay should be less than ft_reconnect_timeout, so that the client has at least one chance to connect to the EMS server before the EMS server purges the connection. The EMS server can accept connections as soon as it starts up, even though operations may be blocked while the server is recovering the datastore.

- If only one EMS server is running, but the client uses multiple URLs pointing to the same server so that it reconnects automatically when the server becomes available after a failure, reconnect_attempt_count * reconnect_attempt_delay should cover the EMS server downtime, that is, the period during which the EMS server is not running.

- If you have a lot of connections (more than 500), configure reconnect_attempt_delay to be more than 1 second to avoid too many reconnection attempts in a short period of time. In our code, we have implemented a random delay for each reconnect attempt.

- If you feel that the default ft_reconnect_timeout of 60 seconds is not enough for any reason, such as a busy EMS server or a large number of concurrent EMS clients, increase the ft_reconnect_timeout value.
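The recommendations above can be turned into a rough sanity check. The following Python sketch is an assumption-laden illustration, not a TIBCO tool: the function `check_reconnect_settings` and its warning texts are hypothetical. It also treats the comparison between the retry window and ft_reconnect_timeout as approximate, because the server-side timer only starts once the shared state has been recovered.

```python
def check_reconnect_settings(attempt_count, attempt_delay_ms,
                             ft_reconnect_timeout_s, connection_count=0):
    """Return a list of warnings based on the recommendations in this article."""
    warnings = []
    delay_s = attempt_delay_ms / 1000.0
    window_s = attempt_count * delay_s

    # The client needs at least one attempt before the server purges its state.
    if delay_s >= ft_reconnect_timeout_s:
        warnings.append("reconnect_attempt_delay is not less than ft_reconnect_timeout")

    # Retrying past the purge can end in "reconnect failed: connection unknown".
    if window_s > ft_reconnect_timeout_s:
        warnings.append("retry window exceeds ft_reconnect_timeout")

    # Many clients retrying quickly can flood the server with reconnect attempts.
    if connection_count > 500 and attempt_delay_ms <= 1000:
        warnings.append("more than 500 connections: use a delay above 1 second")

    return warnings


# Article example (12 x 10000 ms) against the default ft_reconnect_timeout of 60.
print(check_reconnect_settings(12, 10000, 60))

# Same client settings with ft_reconnect_timeout raised to 120.
print(check_reconnect_settings(12, 10000, 120))
```

The first call reports that the 120-second retry window exceeds the default 60-second timeout; the second call returns an empty list.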

Another important point: make sure that both client and server can detect a network disconnection in a timely manner, so that the client can reconnect to the server. In many cases we have observed a delay on the client or server side in detecting network disconnections, mainly when EMS and BW are hosted in different subnets.
 
- client_heartbeat_server and server_timeout_client_connection - These instruct the server when to drop the connection.
 
- server_heartbeat_client and client_timeout_server_connection - These instruct the client when to drop the connection.

In other words, one set of parameters instructs the server to drop the connection when it does not receive heartbeats from the client, and the other set instructs the client to drop the connection when it does not receive heartbeats from the server.
 
If the parameters are not set, the server and client drop connections only upon OS notification. That is, the EMS Server detects the failure only when the OS reports the connection as broken. The timeout varies depending on the OS. For example, if there is a network outage between the client and server, the server disconnects the client only upon OS notification. It is also possible that the client machine detects that the connection is broken before the server machine does.
 
We suggest that you set the heartbeat parameters on the EMS server side so that the client and server send heartbeats to each other, and that you also set the timeout parameters so that both server and client can detect the absence of heartbeats and close the connection.
 
Set the following parameters in the EMS main configuration file (tibemsd.conf), then restart the EMS server for them to take effect.
 
client_heartbeat_server=10
client_timeout_server_connection=35
server_heartbeat_client=10
server_timeout_client_connection=35
 
With the above settings, client applications should detect a disconnection from the EMS server within about 35 seconds and then start to reconnect.
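As a quick consistency check on these four settings, the following Python sketch verifies that each timeout is larger than its matching heartbeat interval, since a timeout no longer than the interval could drop healthy connections. The function `check_heartbeats` is hypothetical, and the parsing assumes simple `name=value` lines with `#` comments; it is not a full parser for the EMS configuration file.

```python
# Each pair is (heartbeat interval, timeout that detects missing heartbeats).
PAIRS = [
    ("client_heartbeat_server", "server_timeout_client_connection"),
    ("server_heartbeat_client", "client_timeout_server_connection"),
]


def check_heartbeats(conf_text):
    """Return a list of problems found in the heartbeat/timeout settings."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()

    problems = []
    for heartbeat, timeout in PAIRS:
        if heartbeat not in settings or timeout not in settings:
            problems.append(f"{heartbeat}/{timeout}: not both set")
        elif int(settings[timeout]) <= int(settings[heartbeat]):
            problems.append(f"{timeout} should exceed {heartbeat}")
    return problems


example = """
client_heartbeat_server=10
client_timeout_server_connection=35
server_heartbeat_client=10
server_timeout_client_connection=35
"""
print(check_heartbeats(example))  # []
```

A pair that is missing is reported as "not both set", which corresponds to the case described earlier where detection falls back to OS notification.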

Issue/Introduction

Configure BW client reconnections with EMS Server

Environment

Product: TIBCO ActiveMatrix BusinessWorks
Version: 5.x
OS: All Supported Operating Systems

Additional Information

https://support.tibco.com/s/article/Tibco-KnowledgeArticle-Article-29948
https://support.tibco.com/s/article/Tibco-KnowledgeArticle-Article-25656
https://support.tibco.com/s/article/Tibco-KnowledgeArticle-Article-25180