StreamBase shutdown errors and exceptions


Article ID: KB0073735


Products Versions
TIBCO Streaming 7

Description

When a Streaming server is shutting down, errors may be reported in the server console log and by clients (StreamBase Client API, LiveView Client API, websockets, and other persistent connections). 

Some typical examples:

Errors reported by the server

WARN  c.streambase.sb.runtime.RuntimeEnv - Error at operator: Publisher1.Worker1.DataFeed, port: -1, tuple: , exception: Adapter Exception: Operator thread OperatorThread(Publisher1.Worker1.DataFeed:1) shut down while trying to send output

ERROR com.streambase.sb.operator.Operator - Change to state 'SHUTDOWN' timed out for Operator 'Publisher1.Worker1.DataFeed', Class 'com.streambase.sb.adapter.csvreader.CSVSocketReader', shutting down the operator

[INFO] [TaskRunner] SBD2SBDInput - Shutdown called
[WARN] [OperatorThread(Processor.SBD2SBDInput)] SBD2SBDInput - Connection down to sb://10.0.0.2:61300/default.ConfigOUT

WARN  com.tibco.ep.dtm.javaruntime: (ktvmmgr.cpp:3669) JVM shutdown stalled.  Attempting to abort outstanding transactions.

WARN  com.streambase.sb.runtime.MainModule$SharedObjectContainer: In attempt to shutdown shared object vehicle.ActiveSpaces DataGrid Manager: expected anything but state SHUTDOWN, got state INITIAL

Errors reported by a Java client (these may also be reported by the server):

Exception in thread "DequeueThread(sb://10.0.0.2:10011/)" java.lang.NullPointerException

WARN  c.s.s.a.liveview.LiveViewPublish - Adapter Feed: Not connected to https://10.0.0.2:10443, tuple dropped: ...

ERROR com.streambase.sb.operator.Operator - Change to state 'SHUTDOWN' timed out for Operator 'default.Worker1.CalculateExposure.LV_UpSert', Class 'com.streambase.sb.adapter.liveview.LiveViewPublish', shutting down the operator

Issue/Introduction

Causes of exceptions and errors when the server is shutting down.

Resolution

Errors that occur AFTER shutdown has started can be safely ignored.

Typical log messages showing shutdown has started:

INFO  com.tibco.ep.dtm.runtime: Stopping engine with exit code 0, sending shutdown signal
INFO  com.tibco.ep.dtm.runtime: Sending pid (2979038) signal (SW_SHUTDOWN_SIGNAL)(30)
INFO  com.tibco.ep.dtm.runtime: Pid (2979038) received signal (30)
INFO  com.tibco.ep.dtm.runtime: SWSignal::wait returning with termination signal (Terminate)(30)
INFO  com.tibco.ep.dtm.engine: Performing engine exit
INFO  com.tibco.ep.dtm.highavailability.distribution: Network thread finished executing.
INFO  com.tibco.ep.dtm.javaruntime: Component: component pre-shutdown handler default-engine-for-*
INFO  com.tibco.ep.dtm.javaruntime: Component: Unloading component
INFO  com.tibco.ep.dtm.javaruntime: Component: Component "name" unloaded
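To apply the rule above to a long console log, a small helper can split ERROR and WARN lines into those logged before the first "Stopping engine with exit code" line and those logged after it. This is a minimal sketch, not a TIBCO tool: the class name ShutdownLogTriage is hypothetical, and it assumes the marker text and the level keywords appear exactly as in the sample lines in this article.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical triage helper: separates ERROR/WARN lines logged before the
 * shutdown started (worth reviewing) from those logged after it (side effects).
 */
public class ShutdownLogTriage {

    // Assumed marker, taken from the first line of the typical shutdown sequence.
    public static final String SHUTDOWN_MARKER = "Stopping engine with exit code";

    /** The two groups of ERROR/WARN lines. */
    public static final class Result {
        public final List<String> beforeShutdown = new ArrayList<>();
        public final List<String> afterShutdown = new ArrayList<>();
    }

    // Simple substring test; assumes level names appear as in the samples above.
    static boolean isErrorOrWarn(String line) {
        return line.contains("ERROR") || line.contains("WARN");
    }

    public static Result triage(List<String> lines) {
        Result result = new Result();
        boolean shutdownStarted = false;
        for (String line : lines) {
            // Everything from the first marker line onward counts as "after shutdown".
            if (!shutdownStarted && line.contains(SHUTDOWN_MARKER)) {
                shutdownStarted = true;
            }
            if (isErrorOrWarn(line)) {
                List<String> target = shutdownStarted ? result.afterShutdown : result.beforeShutdown;
                target.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "ERROR com.tibco.ep.dtm.stderr: Exception in thread \"Thread-1\" java.lang.OutOfMemoryError: unable to create native thread",
            "INFO  com.tibco.ep.dtm.runtime: Stopping engine with exit code -1, sending shutdown signal",
            "WARN  com.tibco.ep.dtm.javaruntime: (ktvmmgr.cpp:3669) JVM shutdown stalled.");
        Result r = triage(log);
        System.out.println("Review: " + r.beforeShutdown);
        System.out.println("Side effects: " + r.afterShutdown);
    }
}
```

Only the "Review" group needs attention when looking for a root cause; the "Side effects" group is the noise this article says can be ignored.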

Explanation

A StreamBaseClient or adapter can be in one of several states when the server disconnects. In some of these states, such as while sending a tuple or while blocked on an external resource, a race condition causes an error to be reported instead of a silent or INFO-level disconnection. Such races are unavoidable in a multi-threaded, high-throughput streaming system where, during shutdown, subsystems stop in no particular order and some stay alive longer than others.
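The race can be illustrated with a hypothetical client-side helper that reports a failed send at INFO once it knows a shutdown has begun, and at ERROR otherwise. ShutdownAwareReporter and its beginShutdown() hook are assumptions for illustration, not part of the StreamBase Client API. The check cannot remove the race: a send that fails before the shutdown notice arrives is still reported as ERROR.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical client-side helper: decides how loudly to report a failed
 * send, based on whether this process has already been told a shutdown began.
 */
public class ShutdownAwareReporter {
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    /** Call from whatever shutdown notification your client receives (hypothetical hook). */
    public void beginShutdown() {
        shuttingDown.set(true);
    }

    /**
     * Expected failures during shutdown are reported at INFO; anything else at ERROR.
     * Racy by nature: if the send fails before beginShutdown() runs, the failure
     * is still reported as ERROR.
     */
    public String levelForSendFailure() {
        return shuttingDown.get() ? "INFO" : "ERROR";
    }

    public static void main(String[] args) {
        ShutdownAwareReporter reporter = new ShutdownAwareReporter();
        // Send fails before the shutdown notice arrives: the race is lost, ERROR is logged.
        System.out.println(reporter.levelForSendFailure()); // ERROR
        reporter.beginShutdown();
        // Same failure after the notice: reported quietly.
        System.out.println(reporter.levelForSendFailure()); // INFO
    }
}
```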

In all cases the server is shutting down, so interrupted activity is expected. No error logged after a server shutdown begins should be considered important or serious. If the shutdown was unexpected, the error condition or activity that initiated it is important, but all errors that follow are side effects and can be ignored when identifying the root cause.

During a shutdown initiated by 'epadmin stop node', adapter disconnection errors are minimized by using the expected third-party APIs to disconnect cleanly from external services. However, if a connection cannot be stopped cleanly, it is disconnected abruptly after a short timeout so that the server is not blocked for an arbitrarily long period.
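The clean-then-abrupt pattern can be sketched in plain Java with an ExecutorService and Future.get(timeout). This illustrates the general technique under stated assumptions, not how StreamBase adapters are actually implemented: BoundedDisconnect is a hypothetical class, and closeCleanly and closeAbruptly are hypothetical hooks standing in for a third-party client's graceful and forced close calls.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch: try a clean close, fall back to an abrupt one after a timeout. */
public class BoundedDisconnect {

    /** Returns true if the clean close finished in time, false if the abrupt fallback ran. */
    public static boolean disconnect(Runnable closeCleanly, Runnable closeAbruptly, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "clean-disconnect");
            t.setDaemon(true); // a hung clean close must not keep the JVM alive
            return t;
        });
        try {
            Future<?> pending = executor.submit(closeCleanly);
            try {
                pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                pending.cancel(true);  // interrupt the stuck clean close
                closeAbruptly.run();   // then force the connection down
                return false;
            } catch (ExecutionException e) {
                closeAbruptly.run();   // clean close threw; force the connection down
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                closeAbruptly.run();
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean clean = disconnect(() -> { }, () -> System.out.println("abrupt close"), 500);
        boolean hung = disconnect(() -> {
            try { Thread.sleep(10_000); } catch (InterruptedException ignored) { }
        }, () -> System.out.println("abrupt close after timeout"), 100);
        System.out.println(clean + " " + hung);
    }
}
```

The timeout bounds how long shutdown can be held up by one unresponsive external service, which is the trade-off described above: a noisy abrupt disconnect is accepted in exchange for a shutdown that always completes.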

To reduce shutdown errors during a planned shutdown, first stop all data input to the server so that it can settle into an idle state. This is typically done by sending a "disconnect" command tuple to the control port of every input adapter. After a short wait, shut down the server. You may also want the server to send a message to all clients announcing that a shutdown is imminent, so that clients can disconnect on their own.
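The order of a planned shutdown can be written down as a small driver. This is a sketch only: PlannedShutdown is a hypothetical class and each hook is a placeholder, for example one that enqueues a "disconnect" command tuple on an input adapter's control port, or one that runs 'epadmin stop node'. The idle check and the wait limits are assumptions to tune for your deployment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Sketch of the planned-shutdown order: stop inputs, warn clients, wait briefly
 * for the server to go idle, then shut down. All hooks are hypothetical.
 */
public class PlannedShutdown {

    /** Returns true if the idle check passed before shutdown was triggered. */
    public static boolean run(List<Runnable> stopInputs,
                              Runnable notifyClients,
                              BooleanSupplier isIdle,
                              long maxWaitMillis,
                              long pollMillis,
                              Runnable shutdown) {
        // 1. Stop all data input so no new tuples enter the server.
        for (Runnable stop : stopInputs) {
            stop.run();
        }
        // 2. Optional: tell clients a shutdown is imminent so they can disconnect themselves.
        notifyClients.run();
        // 3. Wait a short, bounded time for in-flight work to settle.
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        boolean idle = isIdle.getAsBoolean();
        while (!idle && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
            idle = isIdle.getAsBoolean();
        }
        // 4. Shut down regardless; the wait only reduces noise, it must not block forever.
        shutdown.run();
        return idle;
    }

    public static void main(String[] args) {
        List<String> steps = new ArrayList<>();
        List<Runnable> stops = List.of(
            () -> steps.add("stop input adapter A"),
            () -> steps.add("stop input adapter B"));
        boolean idle = run(stops, () -> steps.add("notify clients"), () -> true,
                           1_000, 50, () -> steps.add("shutdown"));
        System.out.println(idle + " " + steps);
    }
}
```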

Unplanned Shutdown or Crash

Shutdowns initiated by a server error should be investigated. These shutdowns are often reported with a non-zero exit code, for example:

INFO  com.tibco.ep.dtm.runtime: Stopping engine with exit code -1, sending shutdown signal
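A quick first check when reviewing a log is to read the exit code from this line. The sketch below uses a hypothetical helper, ExitCodeCheck, and assumes the line format shown above. A non-zero code flags the shutdown for investigation; a zero code on its own does not prove that the shutdown was planned.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the engine exit code from a runtime log line. */
public class ExitCodeCheck {

    // Assumed format, matching e.g. "Stopping engine with exit code -1, sending shutdown signal".
    private static final Pattern EXIT_CODE =
        Pattern.compile("Stopping engine with exit code (-?\\d+)");

    /** Returns the exit code if the line is a "Stopping engine" line, otherwise empty. */
    public static OptionalInt exitCode(String line) {
        Matcher m = EXIT_CODE.matcher(line);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    /** Non-zero exit code: treat as unplanned and review the log before this line. */
    public static boolean needsInvestigation(String line) {
        OptionalInt code = exitCode(line);
        return code.isPresent() && code.getAsInt() != 0;
    }

    public static void main(String[] args) {
        String planned = "INFO  com.tibco.ep.dtm.runtime: Stopping engine with exit code 0, sending shutdown signal";
        String crash = "INFO  com.tibco.ep.dtm.runtime: Stopping engine with exit code -1, sending shutdown signal";
        System.out.println(needsInvestigation(planned)); // false
        System.out.println(needsInvestigation(crash));   // true
    }
}
```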

Again, errors that occur AFTER shutdown starts are typically side effects of the initiating error condition rather than clues to its root cause. ERROR and WARN messages logged BEFORE the unplanned shutdown, possibly as much as an hour earlier, should be reviewed for insight into what caused the failure. For example, an error like the following could lead to an unplanned shutdown:

ERROR com.tibco.ep.dtm.stderr: Exception in thread "Thread-1" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached