How to start a TIBCO Streaming node after reboot without re-installing the node

Article ID: KB0072132

Products Versions
TIBCO Streaming 11

Description

If a running node is stopped using 'epadmin stop node' and the system is then restarted, the node cannot be started after the reboot and fails with errors like:
  • "Failed to resolve service name "node.cluster" : service not found."
  • "The node was not shutdown cleanly and must be removed"
How can the node be stopped cleanly and restarted so that a new 'epadmin install node' is not required?

Issue/Introduction

Node administration guidance and maintenance script examples

Resolution

To stop a node and its node services so that the node can be restarted after an operating system reboot, use two commands in order:
  1. epadmin --servicename=node.cluster stop node
  2. epadmin --servicename=node.cluster terminate node
Once the 'terminate node' command completes, the node services are stopped cleanly and the system may be shut down, suspended, or rebooted without damaging the node's transactional memory state.

This sequence of commands may be scripted for easier administration.

For Linux:

1. Install and start node
Filename: epstart
# use: epstart A.X application.zip
epadmin install node --nodename="$1" --application="$2" && epadmin --servicename="$1" start node
2. Stop & terminate node
Filename: epstop
# use: epstop A.X
epadmin --servicename="$1" stop node && epadmin --servicename="$1" terminate node
3. Restart terminated node
Filename: eprestart
# use: eprestart A.X
epadmin start node --installpath=$1
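
The three Linux scripts above could also be combined into a single helper that checks its arguments before calling epadmin. The following is a minimal sketch rather than a supported tool: the function name epnode, its action names, and the EPADMIN override variable are assumptions made for illustration. Only the epadmin commands themselves come from this article, written in the same --name=value form used by epstop.

```shell
#!/bin/sh
# epnode: hypothetical wrapper combining epstart, epstop and eprestart.
# EPADMIN may be set to another executable (an assumption, handy for dry runs).
EPADMIN="${EPADMIN:-epadmin}"

epnode() {
    action="$1"
    node="$2"
    app="$3"
    if [ -z "$action" ] || [ -z "$node" ]; then
        echo "usage: epnode {install|stop|restart} NODE [APPLICATION.zip]" >&2
        return 64
    fi
    case "$action" in
        install)
            if [ -z "$app" ]; then
                echo "epnode: install needs an application archive" >&2
                return 64
            fi
            # Install the node, then start it only if the install succeeded.
            "$EPADMIN" install node --nodename="$node" --application="$app" &&
                "$EPADMIN" --servicename="$node" start node
            ;;
        stop)
            # Stop the node, then terminate node services so a reboot is safe.
            "$EPADMIN" --servicename="$node" stop node &&
                "$EPADMIN" --servicename="$node" terminate node
            ;;
        restart)
            # Restart a previously terminated node from its install directory.
            "$EPADMIN" start node --installpath="$node"
            ;;
        *)
            echo "epnode: unknown action '$action'" >&2
            return 64
            ;;
    esac
}
```

After sourcing the file, 'epnode stop A.X' runs the stop and terminate commands in order, and 'EPADMIN=echo epnode stop A.X' prints the arguments that would be passed without contacting a node.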

For Windows:

1. Install and start node
Filename: epstart.bat
REM use: epstart A.X application.zip
epadmin install node --nodename %1 --application %2 && epadmin --servicename %1 start node
2. Stop & terminate node
Filename: epstop.bat
REM use: epstop A.X
epadmin --servicename %1 stop node && epadmin --servicename %1 terminate node
3. Restart terminated node
Filename: eprestart.bat
REM use: eprestart A.X
epadmin start node --installpath %1

For example, here is a simple application which logs a timestamp every five seconds. The console output from this test was:
 
C:\tmp\test>epadmin install node nodename=A.chp application=ChirpApp-0.1.zip
[A.chp]         Installing node
[A.chp]                 PRODUCTION executables
[A.chp]                 File shared memory
[A.chp]                 7 concurrent allocation segments
[A.chp]                 Host name gbuhtz-p50
[A.chp]                 Container tibco/sb
[A.chp]                 Starting container services
[A.chp]                 Loading node configuration
[A.chp]                 Auditing node security
[A.chp]                 Deploying application
[A.chp]                         Engine default-engine-for-com.sbsupport.Chirp
[A.chp]                 Application deployed
[A.chp]                 Administration port is 63701
[A.chp]                 Discovery Service running on port 54321
[A.chp]                 Service name is A.chp
[A.chp]         Node installed

C:\tmp\test>epadmin servicename=A.chp start node
[A.chp]         Starting node
[A.chp]                 Engine application::default-engine-for-com.sbsupport.Chirp started
[A.chp]                 Loading node configuration
[A.chp]                 Auditing node security
[A.chp]                 Host name gbuhtz-p50
[A.chp]                 Administration port is 63701
[A.chp]                 Discovery Service running on port 54321
[A.chp]                 Service name is A.chp
[A.chp]         Node started

C:\tmp\test>epadmin servicename=A.chp tail logging
[A.chp] default-engine-for-com.sbsupport.Chirp:2019-10-07 13:16:55.014000-0400 [AsyncLogger] INFO  Logger: 2019-10-07 13:16:55.000-0400
[A.chp] default-engine-for-com.sbsupport.Chirp:2019-10-07 13:17:00.000000-0400 [AsyncLogger] INFO  Logger: 2019-10-07 13:17:00.000-0400
^C

C:\tmp\test>epadmin servicename=A.chp stop node
[A.chp]         Stopping node
[A.chp]                 Engine application::default-engine-for-com.sbsupport.Chirp stopped
[A.chp]         Node stopped

C:\tmp\test>epadmin servicename=A.chp terminate node
[A.chp]                 Shutting down container services

C:\tmp\test>epadmin start node installpath=./A.chp
[A.chp]         Starting node
[A.chp]                 Coordinator started
[A.chp]                 Loading node configuration
[A.chp]                 Auditing node security
[A.chp]                 Engine application::default-engine-for-com.sbsupport.Chirp started
[A.chp]                 Loading node configuration
[A.chp]                 Auditing node security
[A.chp]                 Host name localhost
[A.chp]                 Administration port is 63701
[A.chp]                 Discovery Service running on port 54321
[A.chp]                 Service name is A.chp
[A.chp]         Node started

C:\tmp\test>epadmin servicename=A.chp tail logging
[A.chp] default-engine-for-com.sbsupport.Chirp:2019-10-07 13:18:00.002000-0400 [AsyncLogger] INFO  Logger: 2019-10-07 13:18:00.000-0400
[A.chp] default-engine-for-com.sbsupport.Chirp:2019-10-07 13:18:05.001000-0400 [AsyncLogger] INFO  Logger: 2019-10-07 13:18:05.000-0400
^C

C:\tmp\test>

This test shows that the node and its services were able to start without error after being stopped and terminated cleanly. After terminate and before restart, the system may be rebooted without affecting the node.

If a system shuts down abruptly, 'terminate node' was not used to preserve the node state, so after reboot the node must be re-installed and obtain its state from the other nodes in the cluster.
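
A restart script can recognize this situation and report it, rather than leaving the administrator to read the raw error. The sketch below assumes that the "not shutdown cleanly" message quoted in the Description appears in the output of 'epadmin start node'. The function name start_or_report, the EPADMIN override variable, and the return code 2 are hypothetical choices for illustration.

```shell
#!/bin/sh
# EPADMIN may be set to another executable (an assumption, handy for testing).
EPADMIN="${EPADMIN:-epadmin}"

# start_or_report INSTALLPATH
# Try to start a terminated node; return 2 if the output indicates that the
# node was not shut down cleanly and has to be removed and re-installed.
start_or_report() {
    installpath="$1"
    # Capture stdout and stderr together so the message can be inspected.
    output="$("$EPADMIN" start node --installpath="$installpath" 2>&1)"
    status=$?
    printf '%s\n' "$output"
    case "$output" in
        *"was not shutdown cleanly"*)
            echo "Node at $installpath was not terminated cleanly; re-install it." >&2
            return 2
            ;;
    esac
    # Otherwise pass through whatever exit status epadmin returned.
    return "$status"
}
```

A calling script can test for return code 2 and then follow the site's own re-install procedure, for example the epstart script shown earlier.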