Streaming node startup is slow when NodeDeploy defines multiple nodes

Products	Versions
TIBCO Streaming	10.x and later

Description

Our TIBCO Streaming NodeDeploy HOCON configuration declares multiple nodes.
We find that starting the first node takes longer for every additional node we add to the NodeDeploy "nodes" list.

Why is that happening and what can we do to make node startup faster?

Resolution

When starting up, a node will check for other nodes it knows about from the NodeDeploy configuration to see if they are active in the cluster. The more nodes it knows to check for, the longer the wait.

The waits are performed serially, one node after another. The timeout for determining whether a specific node is running is set by the 'nodeActiveTimeoutSeconds' property in the ApplicationDefinition HOCON configuration, for example:

ApplicationDefinition = {
  execution {
    dataTransport = {
      nodeActiveTimeoutSeconds = 60
    }
  }
}

The default nodeActiveTimeoutSeconds value is 60 seconds for each defined node.

If this is increased, for example to:
nodeActiveTimeoutSeconds = 300 (5 minutes)
then the worst-case startup delay increases by the difference (240 seconds in this example) multiplied by the number of defined nodes that the starting node must wait for.

The delay is most severe for the first node: because no other node in the cluster is running yet, it waits the full timeout period for each of the other defined nodes. The second node starts more quickly because the first node responds promptly when checked, so that timeout is skipped. The last node to start receives prompt responses from all the other nodes and does not have to wait at all.
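
The arithmetic above can be sketched as a small shell function. This is a rough estimate only, not a description of the product's internals. It assumes that a node does not wait on itself, that every declared peer which is not yet running costs one full nodeActiveTimeoutSeconds, and that running peers answer immediately. The function name worst_case_wait and its arguments are hypothetical.

```shell
#!/bin/sh
# Rough worst-case startup wait, in seconds, for one node.
# Assumptions (not confirmed product behavior): the node does not wait on
# itself, each declared peer that is not running costs one full timeout,
# and peers that are running respond immediately.
worst_case_wait() {
  timeout_seconds=$1   # nodeActiveTimeoutSeconds
  declared_nodes=$2    # entries in the NodeDeploy "nodes" list
  running_peers=$3     # peers already active when this node starts
  echo $(( timeout_seconds * (declared_nodes - 1 - running_peers) ))
}

worst_case_wait 60 4 0     # first of four nodes, default timeout: prints 180
worst_case_wait 60 4 3     # last of four nodes: prints 0
worst_case_wait 300 4 0    # first of four nodes, 5-minute timeout: prints 900
```

Under these assumptions, raising the timeout from 60 to 300 seconds adds 240 seconds for each of the three peers that the first node waits for, or 720 seconds in total.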

We advise against reducing nodeActiveTimeoutSeconds to the point where an active node cannot respond quickly enough and the newly started node misses its response. The optimal setting for nodeActiveTimeoutSeconds must be determined by experiment in each new operational environment.

A NodeDeploy which has multiple nodes defined looks like this:

name = "NodeDeployment"
version = "1.0"
type = "com.tibco.ep.dtm.configuration.node"
configuration = {
  NodeDeploy = {
    nodes = {
      "A.X" = { }
      "B.X" = { }
      "C.X" = { }
      "D.X" = { }
    }
  }
}

We recommend keeping the NodeDeploy configuration out of the application archive, maintaining it separately, and specifying it on the installation command line like this:
epadmin install node --nodename=A.X --application=myapplication.zip --nodedeploy=mydeploy.conf
When deploying this way, you may give each node its own NodeDeploy configuration that declares only that one node. This defers checking for other nodes until later and avoids the startup timeouts. All nodes in the cluster will still find each other. This would look like:
epadmin install node --nodename=A.X --application=myapplication.zip --nodedeploy=mydeploy_for_A.conf
epadmin install node --nodename=B.X --application=myapplication.zip --nodedeploy=mydeploy_for_B.conf
and each NodeDeploy file would only have that single node's declaration.
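
As a minimal sketch of this per-node approach, the script below writes one NodeDeploy file per node, each declaring only that node, and prints the matching install command. The helper name write_nodedeploy is hypothetical. Reusing the configuration name, version, and type from the multi-node example above for every file is an assumption. The commands are printed rather than run so that the sketch works on a machine without epadmin.

```shell
#!/bin/sh
# Write a NodeDeploy file that declares a single node.
# write_nodedeploy is a hypothetical helper, not part of the product.
write_nodedeploy() {
  node_name=$1   # for example A.X
  out_file=$2    # for example mydeploy_for_A.conf
  cat > "$out_file" <<EOF
name = "NodeDeployment"
version = "1.0"
type = "com.tibco.ep.dtm.configuration.node"
configuration = {
  NodeDeploy = {
    nodes = {
      "$node_name" = { }
    }
  }
}
EOF
}

for short_name in A B; do
  write_nodedeploy "${short_name}.X" "mydeploy_for_${short_name}.conf"
  # Printed, not executed, so the sketch runs without epadmin installed.
  echo "epadmin install node --nodename=${short_name}.X --application=myapplication.zip --nodedeploy=mydeploy_for_${short_name}.conf"
done
```

Review the generated files before use, and add any per-node settings your deployment needs inside the empty braces.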

Note that when using Proxy Discovery in 10.5 and later, the NodeDeploy configuration must declare all the nodes, since each node must be told all IP addresses in advance of launch so that Discovery works. In that case, startup time will be a function of "nodeActiveTimeoutSeconds" multiplied by the number of declared nodes that are not yet installed.

Issue/Introduction

Configuration guidance
