Description: How to rejoin DR backup machines to a cluster.
Symptoms: Messages reporting communication errors, and unexpected DR states of AltSync, CommLost, Commwait, or DR_Down (generally shown in the "info" command output).
Cause: In some instances, backup machines can be dropped from a cluster. This can be triggered by configuration changes or by running commands such as dr-config or ems-config.
Issue/Introduction
How to rejoin DR backup machines to a cluster.
Resolution
Steps to rejoin DR machines to a cluster are listed below:
The main caveat for using the “rejoin” sequence below is that the Production-A machine should be the DR_Leader, and DR should have “Active” status (not “Restore” or “Recover”). This can be viewed in the "info" command output:
********* HOST ROLE: Production-A, DR_Leader: Active (TIBCO-EMS-HI) *********
_____ INSTANCE 0 , ROLE = PRIMARY _____
EMS State - ACTIVE-REPLICATING : Accepting connections on tcp://10.160.5.21:7222
DR Sync Service State - DR_Down
ERROR: Sync Service Error, Sync Snapshots Disabled, Manual recovery required.
_____ INSTANCE 1 , ROLE = SECONDARY _____
EMS State - STANDBY-REPLICATING : Standing by for failover
DR Sync Service State - Waiting
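The precondition above can be checked mechanically if the "info" output is captured as text. The following is a minimal sketch in Python, not a vendor tool: the function names are hypothetical, and the regular expressions assume the layout seen in the single sample above, which may differ on other versions or when the host is not the DR_Leader.

```python
import re

# Sample text copied from the "info" output shown in this article.
SAMPLE_INFO = """\
********* HOST ROLE: Production-A, DR_Leader: Active (TIBCO-EMS-HI) *********
_____ INSTANCE 0 , ROLE = PRIMARY _____
EMS State - ACTIVE-REPLICATING : Accepting connections on tcp://10.160.5.21:7222
DR Sync Service State - DR_Down
ERROR: Sync Service Error, Sync Snapshots Disabled, Manual recovery required.
_____ INSTANCE 1 , ROLE = SECONDARY _____
EMS State - STANDBY-REPLICATING : Standing by for failover
DR Sync Service State - Waiting
"""

# DR states this article lists as symptoms of a dropped backup machine.
PROBLEM_STATES = {"AltSync", "CommLost", "Commwait", "DR_Down"}


def rejoin_precondition_met(info_text):
    """Return True if the host role is Production-A and DR_Leader shows Active.

    Assumption: the header line always has the form seen in the sample.
    """
    m = re.search(r"HOST ROLE:\s*([\w-]+),\s*DR_Leader:\s*(\w+)", info_text)
    return bool(m) and m.group(1) == "Production-A" and m.group(2) == "Active"


def dr_sync_states(info_text):
    """Return the DR Sync Service State values in order of appearance."""
    return re.findall(r"DR Sync Service State - (\w+)", info_text)


if __name__ == "__main__":
    print(rejoin_precondition_met(SAMPLE_INFO))         # True
    states = dr_sync_states(SAMPLE_INFO)
    print(states)                                       # ['DR_Down', 'Waiting']
    print([s for s in states if s in PROBLEM_STATES])   # ['DR_Down']
```

For the sample, the precondition holds and instance 0 reports one of the symptom states, which is the situation the rejoin sequence is meant to address.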
STEPS:
1) Log on to Production-A EMSA.
   Production-A CLI> key-accept
2) Log on to Production-B.
   Production-B CLI> key-join
3) Log on to DR-A.
   Backup-A CLI> key-join
4) Log on to DR-B.
   Backup-B CLI> key-join
5) Log back in to Production-A.
   Production-A CLI> key-accept off
After the sequence completes, run the dr-test command on each machine to confirm that all machines have joined the cluster.
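The ordering of the rejoin steps matters, so it can help to keep it as a checklist. The sketch below, in Python, only models the sequence as data and prints it; it does not connect to any appliance or run any command. The helper names are hypothetical, and the rule that every key-join must fall between key-accept and key-accept off on Production-A is an assumption read from the order of the steps, not a documented requirement.

```python
# Ordered (machine, command) pairs taken from the steps in this article.
REJOIN_STEPS = [
    ("Production-A", "key-accept"),
    ("Production-B", "key-join"),
    ("DR-A", "key-join"),
    ("DR-B", "key-join"),
    ("Production-A", "key-accept off"),
]


def validate_sequence(steps):
    """Return True if all key-join commands fall inside the key-accept window.

    Assumption: the window is opened and closed only on Production-A.
    """
    accepting = False
    for machine, command in steps:
        if command == "key-accept":
            if machine != "Production-A":
                return False
            accepting = True
        elif command == "key-accept off":
            if machine != "Production-A" or not accepting:
                return False
            accepting = False
        elif command == "key-join":
            if not accepting:
                return False
        else:
            return False  # unknown command for this procedure
    return not accepting  # the window must be closed at the end


def checklist(steps):
    """Render the steps as numbered lines for an operator to follow."""
    return "\n".join(
        f"{n}. On {machine}: {command}"
        for n, (machine, command) in enumerate(steps, 1)
    )


if __name__ == "__main__":
    if validate_sequence(REJOIN_STEPS):
        print(checklist(REJOIN_STEPS))
```

Keeping the sequence as data makes it easy to spot a skipped key-accept or a forgotten key-accept off before working through the machines by hand.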
NOTES:
There is no harm in running key-join again on an appliance that is already joined.
The dr-test command is available for support use starting in version 2.2.1 to check cluster “joined” status. It will likely become a general “user command” and be documented in an upcoming release.