TIBCO Streaming Application Queue Self-Monitoring

Article ID: KB0073849

Products: TIBCO Streaming
Versions: 10

Description

Our TIBCO Streaming application sometimes starts to queue when the data input rate exceeds our machine's throughput capacity. We would like to detect when this happens so that we know when to add capacity.

Can we monitor application queue size?

Resolution

Obtain the earliest indication of potential queuing by monitoring latency: the time from when a data item is received to when all output derived from that data item has been emitted. If latency grows, the application is waiting on a limited resource (CPU, Memory, Disk, Network, or external Service) and may start to queue. You can calculate latency by capturing a timestamp at input (using now(), or a long value using nanotime()) and subtracting it from a second call made just before emitting the output tuple. This, however, does not tell you where in the application the added latency is being generated.
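For example, a minimal sketch of this measurement in an EventFlow module, assuming one Map operator at each end of the flow (the field names inNanos and latencyNanos are illustrative):
    In a Map operator immediately after the input stream, add the field:
        inNanos = nanotime()
    In a Map operator just before the output stream, add the field:
        latencyNanos = nanotime() - inNanos
The latencyNanos value can then be written to a log or a monitoring sink and watched for sustained growth.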

To understand which queue is involved, inspect the EventFlow engine monitoring statistics for Operators, Queues, Threads, Client connections, and Memory use in the running server. This is what the 'sbprofile' command does to generate its logs, which are useful for pre-production testing or post-mortem root-cause analysis. For second-by-second queue information, use the "StreamBase Monitor Input Adapter" and select only "Enable Module Info". This is described in the documentation here:
  TIBCO Streaming > Adapters Guide > StreamBase Embedded Adapters > StreamBase Monitor Input Adapter

Note that monitoring an application with tens of thousands of operators can produce more data than even an in-process, single-threaded client can consume. To avoid receiving unwanted data, the "StreamBase Monitor" can be further configured with the Java system property "streambase.monitor.predicate", which suppresses unwanted types of statistics data and can greatly reduce the processing burden. (Note: this does not change what is reported by 'sbprofile' for troubleshooting.)

For example, this predicate only delivers stream enqueue/dequeue, thread, and queue statistics. Operator data is not delivered:
    systemProperties = {
      "streambase.monitor.predicate" = "isnull(what) || (what!='bsta' && what!='ienq' && what!='oenq' && what!='size' && what!='bprh' && what!='btph')"
    }
This is the predicate LiveView uses for self-monitoring.

To receive only current queue-size status, use this predicate:
    "streambase.monitor.predicate" = "isnull(what) || what='micq' || what='begn' || what='mimq' || what='mltq' || what='mbps' || what='mtsq'"

The various filter items for the "what" field are documented at:
  TIBCO Streaming > API Guide > API Guide Contents > "Java Client API", "com.streambase.sb", "Constants", "Constant Field Values"
  com.streambase.sb.monitor.StatKey

Always include "isnull(what) || what='begn'" in the predicate so that the "StreamBase Monitor" receives the header record to tell it when to emit a tuple.

From the "StreamBase Monitor" adapter, Module Info output port, the current queue-length value is in the list queues.currentQueueLength and this may be compared to a threshold or compared to previous values to determine growth. There will always be some minor queuing even in a well running server, but the queue should not persist at elevated levels for more than a small number of seconds.

For example, as long as data continues to flow:
A. If the currentQueueLength plateaus at a non-zero value, then the application is just keeping up with the rate of data input.
B. If the currentQueueLength is increasing, then the application is not keeping up with the rate of data input. This may last only a short period due to a temporary burst of data, or it may be a persistent condition requiring administrative action: reduce the rate of input data, add resources, or add a node to the cluster to split the load.
C. If the currentQueueLength is decreasing from a higher value, then the cause of the higher value has stopped and the application is draining the queue.
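As a sketch of an automated check, assuming the queues list from the Module Info port is first expanded to one tuple per queue (for example with an Iterate operator), a downstream Filter operator could flag any queue above a site-specific threshold; the value 1000 here is purely illustrative:
    Filter predicate:
        currentQueueLength > 1000
To distinguish case B (growth) from cases A and C, also compare each reading with the previous one, for example by holding the last value per queue in a Query Table and alerting only when the new reading is larger for several consecutive intervals.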

Issue/Introduction

Design guidance