Products | Versions |
---|---|
Spotfire Server | 7.5 and higher |
In a cluster, the Spotfire Server may go offline due to long garbage collection (GC) pauses. In the catalina.log you may see a warning like:
WARNING [jvm-pause-detector-worker] org.apache.ignite.logger.java.JavaLogger.warning Possible too long JVM pause: 816 milliseconds.
In the server.log you would see the following errors:
WARN 2019-03-22T18:08:56,678-0400 [] discovery.tcp.TcpDiscoverySpi: Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx, rmtPort=xxxx]
WARN 2019-03-22T18:08:56,691-0400 [] discovery.tcp.TcpDiscoverySpi: Failed to send message to next node [msg=TcpDiscoveryConnectionCheckMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=160cc2f6961-299208f8-08f8-44d8-a99d-a4a3a5df7537, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, addrs=[xx.xx.xx.xx], sockAddrs=[TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx], discPort=5702, order=6, intOrder=4, lastExchangeTime=1552305360552, loc=false, ver=2.5.0#20180523-sha1:86e110c7, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryConnectionCheckMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=160cc2f6961-299208f8-08f8-44d8-a99d-a4a3a5df7537, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, order=6, addr=[10.209.129.158], daemon=false]]]
WARN 2019-03-22T18:08:56,693-0400 [] discovery.tcp.TcpDiscoverySpi: Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
...
WARN 2019-03-22T18:08:56,977-0400 [] discovery.tcp.TcpDiscoverySpi: Node is out of topology (probably, due to short-time network problems).
INFO 2019-03-22T18:08:56,982-0400 [] discovery.tcp.TcpDiscoverySpi: Finished serving remote node connection [rmtAddr=/xx.xx.xx.xx:xxxx, rmtPort=xxxx
WARN 2019-03-22T18:08:56,994-0400 [] managers.discovery.GridDiscoveryManager: Local node SEGMENTED: TcpDiscoveryNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, addrs=[10.209.129.158], sockAddrs=[TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx], discPort=xxxx, order=6, intOrder=4, lastExchangeTime=1553292536994, loc=true, ver=2.5.0#20180523-sha1:86e110c7, isClient=false]
ERROR 2019-03-22T18:08:57,014-0400 [] : Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.]]
java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686) ~[ignite-core.jar:2.5.0]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) ~[ignite-core.jar:2.5.0]
ERROR 2019-03-22T18:08:57,014-0400 [] : JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.]]
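To gauge how often and how severely the JVM is pausing, the pause-detector warnings in catalina.log can be collected with a short script. The following is a minimal sketch in Python, not part of Spotfire Server: the function name `find_long_pauses`, the 500 ms default threshold, and the second sample line are assumptions made for illustration. Only the warning text itself is taken from the log shown in this article.

```python
import re

# Matches the Ignite pause-detector warning quoted in this article.
PAUSE_RE = re.compile(r"Possible too long JVM pause: (\d+) milliseconds")


def find_long_pauses(lines, threshold_ms=500):
    """Return the pause durations (in ms) that meet or exceed threshold_ms.

    threshold_ms is an arbitrary illustrative default, not a Spotfire setting.
    """
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            duration_ms = int(match.group(1))
            if duration_ms >= threshold_ms:
                pauses.append(duration_ms)
    return pauses


# The first line is the warning from this article; the second is a made-up
# unrelated line included only to show that non-matching lines are skipped.
sample = [
    "WARNING [jvm-pause-detector-worker] org.apache.ignite.logger.java.JavaLogger.warning "
    "Possible too long JVM pause: 816 milliseconds.",
    "INFO [main] some unrelated log line",
]

print(find_long_pauses(sample))  # [816]
```

To scan a real file, pass an open file handle instead of the sample list, for example `find_long_pauses(open(path, encoding="utf-8", errors="replace"))`, where `path` points at your own catalina.log.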
service.bat remove
if "%JvmMs%" == "" set JvmMs=4096
if "%JvmMx%" == "" set JvmMx=4096
service.bat install
service.bat remove
JvmMs=512 JvmMx=4096
service.bat install
set JAVA_HOME=C:\tibco\tss\7.11.0\jdk
set JRE_HOME=C:\tibco\tss\7.11.0\jdk\jre
set JAVA_OPTS=-server -XX:+DisableExplicitGC -Xms4096M -Xmx4096M
set CATALINA_OPTS=-Dcom.sun.management.jmxremote -Dorg.apache.catalina.session.StandardSession.ACTIVITY_CHECK=true -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Xms4096M -Xmx4096M

set JAVA_HOME=C:\tibco\tss\10.5.0\jdk
set JRE_HOME=C:\tibco\tss\10.5.0\jdk\jre
rem Uncomment the line below to enable GC logging
set GC_LOG=-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=25M -Xloggc:%CATALINA_HOME%\logs\gc-%%t.log
set JAVA_OPTS=-server -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC %GC_LOG%
set CATALINA_OPTS=-Xms4096M -Xmx4096M -Dcom.sun.management.jmxremote -Dorg.apache.catalina.session.StandardSession.ACTIVITY_CHECK=true -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Djava.library.path="%PATH%;C:\tibco\tss\10.5.0\tomcat\spotfire-lib;C:\tibco\tss\10.5.0\tomcat\custom-ext"

set JAVA_HOME=C:\tibco\tss\10.7.0\jdk
set JRE_HOME=C:\tibco\tss\10.7.0\jdk\jre
set JvmMs=4096
set JvmMx=4096
rem Uncomment the line below to enable GC logging
rem set GC_LOG=-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=25M -Xloggc:%CATALINA_HOME%\logs\gc-%%t.log
set JAVA_OPTS=-server -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC %GC_LOG%
set CATALINA_OPTS=-Xms%JvmMs%M -Xmx%JvmMx%M -Dcom.sun.management.jmxremote -Dorg.apache.catalina.session.StandardSession.ACTIVITY_CHECK=true -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Djava.library.path="%PATH%;C:\tibco\tss\10.7.0\tomcat\spotfire-lib;C:\tibco\tss\10.7.0\tomcat\custom-ext"
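After editing the heap values, it can be worth confirming what a given options line actually sets. The sketch below is an illustrative Python helper, not a Spotfire tool: the name `heap_settings_mb` is hypothetical, and it only understands literal values such as `-Xms4096M`. It deliberately ignores the variable form `-Xms%JvmMs%M` used in the 10.7.0 sample, because that value is resolved by the batch interpreter at run time.

```python
import re

# Literal heap flags such as -Xms4096M or -Xmx4G (megabyte or gigabyte suffix).
HEAP_RE = re.compile(r"-(Xms|Xmx)(\d+)([MmGg])")


def heap_settings_mb(text):
    """Return a dict like {'Xms': 4096, 'Xmx': 4096} with values in megabytes.

    If a flag appears more than once, the last occurrence in the text wins.
    """
    settings = {}
    for flag, value, unit in HEAP_RE.findall(text):
        multiplier = 1024 if unit.lower() == "g" else 1
        settings[flag] = int(value) * multiplier
    return settings


# JAVA_OPTS line from the 7.11.0 sample in this article.
opts = "set JAVA_OPTS=-server -XX:+DisableExplicitGC -Xms4096M -Xmx4096M"

settings = heap_settings_mb(opts)
print(settings)  # {'Xms': 4096, 'Xmx': 4096}
```

The samples in this article set the initial and maximum heap to the same value (4096), so comparing `settings["Xms"]` with `settings["Xmx"]` is a quick way to spot a line that was only half edited.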
<Spotfire Server installation Directory>\tomcat\spotfire-bin> config export-config
<Spotfire Server installation Directory>\tomcat\spotfire-bin> config set-config-prop --name="clustering.apacheignite.timeouts.failure-detection-timeout" --value=60000
<Spotfire Server installation Directory>\tomcat\spotfire-bin> config import-config -c "increased cluster failure detection timeout"
<Spotfire Server installation Directory>\tomcat\bin> config export-config
<Spotfire Server installation Directory>\tomcat\bin> config set-config-prop --name="clustering.apacheignite.timeouts.failure-detection-timeout" --value=60000
<Spotfire Server installation Directory>\tomcat\bin> config import-config -c "increased cluster failure detection timeout"