In a cluster, the Spotfire Server may go offline due to long garbage collection (GC) pauses. In the catalina.log you may see a warning like:
WARNING [jvm-pause-detector-worker] org.apache.ignite.logger.java.JavaLogger.warning Possible too long JVM pause: 816 milliseconds.
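To check how often and how long such pauses occur, the warning text can be extracted from catalina.log with a short script. The following is a minimal sketch, not a supported tool: the function name `find_jvm_pauses` and the `min_ms` threshold are hypothetical, and the pattern assumes the message wording is exactly as in the warning quoted above.

```python
import re

# Matches the Ignite pause-detector message as quoted in this article.
PAUSE_RE = re.compile(r"Possible too long JVM pause: (\d+) milliseconds")

def find_jvm_pauses(lines, min_ms=0):
    """Return the pause durations (in ms) reported in the given log lines.

    min_ms is a hypothetical filter: only pauses at least this long are kept.
    """
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            duration = int(match.group(1))
            if duration >= min_ms:
                pauses.append(duration)
    return pauses

# Sample input: the warning from this article plus an unrelated line.
sample = [
    "WARNING [jvm-pause-detector-worker] "
    "org.apache.ignite.logger.java.JavaLogger.warning "
    "Possible too long JVM pause: 816 milliseconds.",
    "INFO unrelated log line",
]
pauses = find_jvm_pauses(sample)
print(pauses)
```

To run it against a real file, pass the open file object instead of `sample`, for example `find_jvm_pauses(open("catalina.log"), min_ms=500)`; the file name and threshold depend on your installation.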
In server.log you may see warnings and errors like the following:
WARN 2019-03-22T18:08:56,678-0400 [] discovery.tcp.TcpDiscoverySpi: Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx, rmtPort=xxxx]
WARN 2019-03-22T18:08:56,691-0400 [] discovery.tcp.TcpDiscoverySpi: Failed to send message to next node [msg=TcpDiscoveryConnectionCheckMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=160cc2f6961-299208f8-08f8-44d8-a99d-a4a3a5df7537, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, addrs=[xx.xx.xx.xx], sockAddrs=[TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx], discPort=5702, order=6, intOrder=4, lastExchangeTime=1552305360552, loc=false, ver=2.5.0#20180523-sha1:86e110c7, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryConnectionCheckMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=160cc2f6961-299208f8-08f8-44d8-a99d-a4a3a5df7537, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, order=6, addr=[10.209.129.158], daemon=false]]]
WARN 2019-03-22T18:08:56,693-0400 [] discovery.tcp.TcpDiscoverySpi: Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
...
WARN 2019-03-22T18:08:56,977-0400 [] discovery.tcp.TcpDiscoverySpi: Node is out of topology (probably, due to short-time network problems).
INFO 2019-03-22T18:08:56,982-0400 [] discovery.tcp.TcpDiscoverySpi: Finished serving remote node connection [rmtAddr=/xx.xx.xx.xx:xxxx, rmtPort=xxxx
WARN 2019-03-22T18:08:56,994-0400 [] managers.discovery.GridDiscoveryManager: Local node SEGMENTED: TcpDiscoveryNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, addrs=[10.209.129.158], sockAddrs=[TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx], discPort=xxxx, order=6, intOrder=4, lastExchangeTime=1553292536994, loc=true, ver=2.5.0#20180523-sha1:86e110c7, isClient=false]
ERROR 2019-03-22T18:08:57,014-0400 [] : Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.]]
java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686) ~[ignite-core.jar:2.5.0]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) ~[ignite-core.jar:2.5.0]
ERROR 2019-03-22T18:08:57,014-0400 [] : JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.]]
Apache Ignite is sensitive to long GC pauses (on the order of a few seconds). Such pauses, as well as high CPU utilization, high memory utilization, or network communication issues, can cause cluster segmentation and cluster disconnects, which in turn cause the Spotfire Server to shut down.
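The sequence described above (delivery timeout, node out of topology, segmentation, JVM halt) can be traced in server.log by searching for the corresponding message fragments. The sketch below is illustrative only: the labels and the function name `segmentation_events` are hypothetical, the fragments are copied from the log excerpt in this article, and the timestamp is assumed to be the second whitespace-separated field, as in that excerpt.

```python
# (label, fragment) pairs; fragments are taken from the log excerpt above,
# labels are hypothetical names chosen for this sketch.
MARKERS = [
    ("ack_timeout", "Timed out waiting for message delivery receipt"),
    ("out_of_topology", "Node is out of topology"),
    ("segmented", "Local node SEGMENTED"),
    ("jvm_halt", "JVM will be halted immediately"),
]

def segmentation_events(lines):
    """Return (label, timestamp) for each line containing a known fragment."""
    events = []
    for line in lines:
        for label, fragment in MARKERS:
            if fragment in line:
                fields = line.split()
                # Assumes the layout "LEVEL TIMESTAMP [] logger: message".
                timestamp = fields[1] if len(fields) > 1 else ""
                events.append((label, timestamp))
                break
    return events

# Abbreviated versions of lines from the excerpt, used as sample input.
sample_log = [
    "WARN 2019-03-22T18:08:56,678-0400 [] discovery.tcp.TcpDiscoverySpi: "
    "Timed out waiting for message delivery receipt",
    "WARN 2019-03-22T18:08:56,994-0400 [] managers.discovery."
    "GridDiscoveryManager: Local node SEGMENTED: TcpDiscoveryNode",
    "ERROR 2019-03-22T18:08:57,014-0400 [] : JVM will be halted immediately "
    "due to the failure",
]
events = segmentation_events(sample_log)
for label, timestamp in events:
    print(timestamp, label)
```

Finding these events close together in time, preceded by JVM pause warnings in catalina.log, is consistent with the GC-related shutdown described here, though it does not by itself rule out CPU, memory, or network causes.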