Detecting if a Streaming system has too few CPUs

Products	Versions
TIBCO Streaming	-

Description

The Streaming application runs more slowly than expected.
How can we tell whether the application needs more CPU cores?

Environment

Microsoft Windows, Linux

Resolution

A Central Processing Unit (CPU) "core" is the hardware element that runs a single active thread. In modern systems, a CPU may have multiple cores, and each core may be hyper-threaded, allowing two threads to interleave operations. A program thread may be active (running on a core), on the run queue (waiting for a core), or idle (waiting on input).

When threads are waiting on the run queue, the system has too few cores to do all the work immediately. As a rule of thumb, a run queue length greater than two indicates that some threads had to wait. A system in this state is said to be "oversubscribed". Because work is waiting, the application runs slower than it could. The condition may be brief, or the workload may be heavy enough that the system stays oversubscribed for as long as the applications are running.
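The distinction between a brief spike and a sustained oversubscribed state can be sketched in a few lines of Python. This is a minimal illustration, not part of any TIBCO tooling: the function name, the returned field names, and the sample values are hypothetical, and the default threshold of two is simply the rule of thumb stated above.

```python
from typing import Dict, Iterable


def oversubscription_summary(run_queue_samples: Iterable[float],
                             threshold: float = 2.0) -> Dict[str, float]:
    """Summarize how often, and for how long, the run queue exceeded threshold."""
    samples = list(run_queue_samples)
    if not samples:
        raise ValueError("need at least one run queue sample")

    # True for every sample in which threads had to wait (per the rule of thumb).
    over = [value > threshold for value in samples]

    # Length of the longest run of consecutive over-threshold samples.
    longest = current = 0
    for flag in over:
        current = current + 1 if flag else 0
        longest = max(longest, current)

    return {
        "samples": len(samples),
        "over_threshold": sum(over),
        "fraction_over": sum(over) / len(samples),
        "longest_streak": longest,
    }


# Hypothetical samples: mostly quiet, with one burst of three busy intervals.
summary = oversubscription_summary([1, 1, 0, 0, 5, 6, 7, 1])
print(summary["fraction_over"])   # 0.375
print(summary["longest_streak"])  # 3
```

A high fraction together with a long streak suggests the system stays oversubscribed; an occasional isolated sample above the threshold suggests only a brief spike.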

A TIBCO Streaming application typically has a few adapter threads that are mostly idle until data becomes available, container top-level threads which are actively processing a tuple through their flows, concurrent regions processing their tuples in parallel, timer threads (also mostly idle), and additional threads for managing node and cluster state.

To detect whether a system is frequently or continuously oversubscribed, monitor the system run queue. On Linux, run 'vmstat'. On Windows, run 'perfmon.msc' or, from the command line, run 'typeperf'.

Linux: vmstat

Run:

$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 22281952 252436 1819752    0    0    21    66  195  233  1  1 98  0  0
 1  0      0 22282676 252436 1819768    0    0     0    47 1517 6563  1  1 98  0  0
 0  0      0 22281896 252436 1819780    0    0     0    39 1493 6457  1  1 98  0  0
 0  0      0 22281140 252436 1819796    0    0     0    47 1651 7032  1  1 98  0  0

Columns:

r - run queue length (processes running or waiting for run time)
b - processes in uninterruptible sleep (typically blocked on I/O)
swpd - virtual memory used (swapped to disk)
free - idle memory
buff - memory used as buffers (I/O)
cache - memory used as cache (essentially free)
si - amount of memory swapped in from disk per second
so - amount of memory swapped out to disk per second
bi - disk blocks read per second
bo - disk blocks written per second
in - interrupts per second
cs - context switches per second
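us - percentage of CPU time spent running non-kernel code (user time)
sy - percentage of CPU time spent running kernel code (system time)
id - percentage of CPU time spent idle
wa - percentage of CPU time spent waiting for I/O
st - percentage of CPU time stolen from this virtual machine by the hypervisor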

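In the above example, which reports every ten seconds, the "r" column shows no problem: the run queue is frequently empty, so threads do not need to wait. Note that the first row of 'vmstat' output reports averages since the last boot; each later row covers one ten-second interval.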
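When 'vmstat' output is captured to a file over a longer period, reading the "r" column by eye becomes tedious. The following Python sketch extracts it from captured text. It assumes the standard layout shown above, where "r" is the first column and data rows contain only integers; the function name and the 'skip_first' parameter are hypothetical. The first data row is dropped by default because 'vmstat' reports averages since boot on that row.

```python
from typing import List

# Sample copied from the vmstat output shown in this article.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 22281952 252436 1819752    0    0    21    66  195  233  1  1 98  0  0
 1  0      0 22282676 252436 1819768    0    0     0    47 1517 6563  1  1 98  0  0
 0  0      0 22281896 252436 1819780    0    0     0    39 1493 6457  1  1 98  0  0
 0  0      0 22281140 252436 1819796    0    0     0    47 1651 7032  1  1 98  0  0
"""


def run_queue_lengths(vmstat_text: str, skip_first: bool = True) -> List[int]:
    """Return the 'r' column (assumed to be the first field) of each data row."""
    rows = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Data rows consist only of integers; the two header rows do not.
        if fields and all(field.isdigit() for field in fields):
            rows.append(int(fields[0]))
    # The first data row holds averages since boot, not an interval sample.
    return rows[1:] if skip_first else rows


lengths = run_queue_lengths(SAMPLE)
print(lengths)       # [1, 0, 0]
print(max(lengths))  # 1
```

To analyze a real capture, read the file contents into a string and pass it to 'run_queue_lengths' in place of SAMPLE.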

For more information on the 'vmstat' command, run:
$ man vmstat

Windows: typeperf

Run:

>typeperf "\System\Processor Queue Length" -si 10
"(PDH-CSV 4.0)","\\GBUHTZ-P50\System\Processor Queue Length"
"05/25/2021 10:32:41.215","3.000000"
"05/25/2021 10:32:51.222","0.000000"
...

In the above example, which reports every 10 seconds, the Processor Queue Length counter shows no problem: the queue is frequently empty.
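The CSV output of 'typeperf' can be examined the same way. The Python sketch below assumes the two-column layout shown above (timestamp, then counter value) and skips any row whose second field is not a number, such as the header. The function name is hypothetical, and the sample is the output shown in this article.

```python
import csv
import io
from typing import List

# Sample copied from the typeperf output shown in this article.
SAMPLE = r'''"(PDH-CSV 4.0)","\\GBUHTZ-P50\System\Processor Queue Length"
"05/25/2021 10:32:41.215","3.000000"
"05/25/2021 10:32:51.222","0.000000"
'''


def processor_queue_lengths(csv_text: str) -> List[float]:
    """Return the counter value from each sample row of typeperf CSV output."""
    values = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # blank lines or status messages
        try:
            values.append(float(row[1]))
        except ValueError:
            continue  # header row, or a sample with no numeric value
    return values


queue = processor_queue_lengths(SAMPLE)
print(queue)  # [3.0, 0.0]
```

A single sample of 3 followed by 0, as here, points to a brief spike rather than a sustained queue.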

For more information on the 'typeperf' command, see:
https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/typeperf

For more information on the Windows Performance Counters, see:
https://docs.microsoft.com/en-us/windows/win32/perfctrs/performance-counters-portal

Issue/Introduction

Troubleshooting guidance
