Queue grows before Aggregate operator

Article ID: KB0073602

Updated On:

Products Versions
TIBCO Streaming -

Description

The queue ahead of the Aggregate operator builds up rapidly, increasing latency and memory use. What is the best way to improve the performance of the Aggregate operator?

Issue/Introduction

Design guidance for performance

Resolution

The Aggregate operator can be overloaded when the incoming tuples-per-second (tps) rate exceeds what the Aggregate operator thread can process. To handle a higher tps rate, split the work across multiple Aggregate operator instances, each running in its own thread.
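As a rough illustration of the arithmetic behind this, the sketch below estimates how fast a backlog grows and how many instances would cover a given input rate. The rates are hypothetical, and the calculation assumes that tuples spread evenly across instances and that throughput scales linearly with instance count. Neither is guaranteed in practice, so treat the result as a starting point for testing.

```python
import math

def instances_needed(input_tps: float, per_instance_tps: float) -> int:
    """Smallest instance count whose combined capacity covers the input rate
    (assumes even distribution and linear scaling)."""
    return math.ceil(input_tps / per_instance_tps)

def queue_growth_per_second(input_tps: float, per_instance_tps: float,
                            instances: int) -> float:
    """Tuples added to the queue each second; 0 when capacity covers the input."""
    return max(0.0, input_tps - per_instance_tps * instances)

# Hypothetical figures for illustration only; measure your own application.
input_tps = 20_000
per_instance_tps = 8_000

print(queue_growth_per_second(input_tps, per_instance_tps, 1))  # 12000.0
print(instances_needed(input_tps, per_instance_tps))            # 3
```

With these example figures, a single instance falls behind by 12,000 tuples every second, while three instances would nominally keep up.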

Splitting the input to the Aggregate operator is only possible when using Group Options, because each group can then be processed in a separate operator instance. If there is only one windowed group, then all input must be processed by the same Aggregate operator instance for the calculation result to be valid. In the case shown here, the field "n1" is the grouping field and has 20 or more unique values, so the hash result distinguishes each group. Every tuple of a group is directed to the same Aggregate instance, and all Aggregate instances are used nearly uniformly. This is the ideal use case.
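The routing idea can be sketched outside the product as follows. This is only an illustration: CRC32 stands in for the platform's hash function, whose algorithm is not described here, and the group names, stream size, and `dispatch` helper are hypothetical.

```python
import random
import zlib
from collections import Counter, defaultdict

def dispatch(group_key: str, instances: int) -> int:
    """Map a group key to an instance index using a stable hash.
    CRC32 is a stand-in, not the product's actual hash()."""
    return zlib.crc32(group_key.encode("utf-8")) % instances

instances = 20
keys = [f"group-{i}" for i in range(200)]  # hypothetical group values

# Simulate a stream in which each group appears many times.
rng = random.Random(42)
stream = [rng.choice(keys) for _ in range(10_000)]

# Record which instance(s) each group was sent to.
owners = defaultdict(set)
for key in stream:
    owners[key].add(dispatch(key, instances))

# Each group is always handled by exactly one instance,
# so its window calculations stay valid.
assert all(len(seen) == 1 for seen in owners.values())

# Tuples per instance, to inspect how evenly the load spreads.
load = Counter(dispatch(key, instances) for key in stream)
print(sorted(load.items()))
```

The printed counts show how evenly the work is shared. With fewer distinct group values than instances, some instances would receive nothing.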

Concurrency tab settings for parallelism, 20 instance multiplicity, and Numeric dispatch style for performance.

The key parts of this configuration are:
1. The Aggregate is using Group Options to split windows by a data value.
2. The Aggregate is set to many instances, but no more than the number of unique hashes produced by the 'hash(string-field)' function.
3. The Aggregate is made Concurrent by enabling "Run each instance of this component in a parallel region". 

If the grouping field is a number, then the field value can be used directly without using the 'hash()' function. 

StreamBase 7.x: To determine the number of Aggregate operators needed, drive data in at various rates and monitor the application with Studio SB Manager, sbmonitor, or sbprofile to identify the rate at which persistent queuing begins. Note that brief queuing lasting several seconds may occur at rates greater than 10,000 tps, but with enough instances the queue frequently drains back down to single digits.

TIBCO Streaming 10.x: To determine the number of Aggregate operators needed, drive data in at various rates and monitor the application with Studio SB Manager, SB Cluster Monitor, or sbprofile to identify the rate at which persistent queuing begins. Note that brief queuing lasting several seconds may occur at rates greater than 10,000 tps, but with enough instances the queue frequently drains back down to single digits.
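For either version, one way to separate brief queuing from persistent queuing is to sample the queue depth once per second during each test run and look for a sustained run above single digits. The sketch below assumes the samples have already been collected into a list. It does not parse the output of any monitoring tool, and the function name, threshold, and sample values are hypothetical.

```python
def first_persistent_queuing(samples, threshold=10, max_brief_seconds=5):
    """Return the index at which queue depth first stays at or above
    `threshold` for more than `max_brief_seconds` consecutive samples,
    or None if every spike drains in time."""
    run_start = None
    for i, depth in enumerate(samples):
        if depth >= threshold:
            if run_start is None:
                run_start = i  # a new run of elevated depth begins here
            if i - run_start + 1 > max_brief_seconds:
                return run_start
        else:
            run_start = None  # queue drained back to single digits
    return None

# Hypothetical one-per-second queue depth samples.
brief = [0, 3, 250, 400, 120, 4, 1, 0, 2, 0]
persistent = [0, 5, 80, 300, 900, 1500, 2400, 3600, 5000, 7000]

print(first_persistent_queuing(brief))       # None
print(first_persistent_queuing(persistent))  # 2
```

Repeating this check at increasing input rates gives the rate at which the current instance count stops keeping up.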

Will I need as many individual CPU units available as Aggregate operator instances?

No. In practice when an Aggregate operator instance is fully utilized (moving tuples as fast as possible), it does not use the entire CPU time-slice due to internal blocking on shared resources. This allows another thread into the core to do its work. You will need fewer individual physical CPUs than concurrent Aggregate operator instances and still see improved throughput.