Troubleshooting TIBCO Streaming Multi-Threaded Query Performance

Products	Versions
TIBCO Streaming	10

Description

What are the main reasons for poor Query Table reading and writing performance?

Resolution

The main reason is that multiple reader and writer threads attempt to use the Query Table (QT) at the same time, which results in blocking. A secondary contributing factor is a QT that uses a slower storage implementation, such as on-Disk or in Transactional Memory (TM).

Troubleshooting Summary

If you are seeing bias and starvation of certain Query Read and Query Write operators on the same table, it must be due to one or more of:

a) a difference in the query criteria and use of indexes,
b) a difference in output expressions between the operators,
c) a difference in how the data is causing more frequent updates versus inserts,
d) a correlation with the timing of activity that leads to blocking.

Architectural Considerations:

Different implementations work at different speeds. Query Tables may be implemented in Heap memory (fastest), in Transactional Memory (less fast), and on-Disk (slowest). If implemented on-Disk, only indexes are preserved in memory and no row-data is preserved or cached. If in TM, this may also be on-Disk, but there is a Physical Memory cache to assist read access.
Multiple readers and writers in different threads cause blocking. Query Tables may be accessed from multiple threads simultaneously. Reading threads only block Writing threads. Writing threads block Reading and Writing threads. A table lock blocks Writing threads from access to the entire table for the duration of the lock. A table lock is maintained for only as long as the execution thread is in the Query Operator.
Queries should be indexed. For tables with indexes, each query is scored for the most efficient use of the available indexes. If there is no index to support the predicate of the query, the score is zero (0) and the query forces a table-scan to produce its result (this will lock the table the longest to obtain the result set).
Queries with output expressions take longer. Queries may return multiple rows, which are then processed: a) individually, b) by an aggregate expression. Multiple rows to be processed individually are placed in a separate cache and the table lock is released. If multiple rows are to be processed by Aggregation expressions within the Query operator, no cache is created and the entire table remains locked for as long as it takes to produce the aggregate result (this will lock the table the longest to process the result set).
There is no implicit bias between similar readers. Readers do not block one another, regardless of whether one reader is reading more frequently than another or whether a reader returns greater or lesser amounts of data. For example, a reader reading six times as much data from the same table as another reader takes six times as long on average, as expected.
Unblocked writing speed is impacted by the type of write and the number of indexes to maintain. More indexes will marginally slow write actions. Data that results in an insert makes for a faster write, and an update a slower write.
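
The blocking rules stated above (readers block only writers, writers block both readers and writers) can be written as a small compatibility check. This is a minimal sketch of those rules only, not TIBCO Streaming code; the names `Access` and `is_blocked` are hypothetical and chosen for illustration.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


def is_blocked(requested, active):
    """Return True if a thread requesting `requested` access must wait.

    `active` lists the accesses currently in progress on the table.
    Assumption: this models only the rules stated in the article.
    """
    if requested is Access.READ:
        # Readers are blocked only by an active writer.
        return Access.WRITE in active
    # Writers are blocked by any active reader or writer.
    return len(active) > 0


# Two readers may share the table; a writer must wait for both.
print(is_blocked(Access.READ, [Access.READ]))                 # False
print(is_blocked(Access.WRITE, [Access.READ, Access.READ]))   # True
```

Under this model, read-heavy workloads stay unblocked until a writer arrives, at which point everything behind the writer waits.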

Fastest Use Case

A Read operation will perform the fastest if the query predicate (the selection criteria) uses an index, the first field of that index excludes more rows than the remaining index fields, only a single index needs to be referenced, and the result is emitted one row at a time (for example, if aggregation is needed, it is done in a downstream Aggregate operator and not in the Query Output expressions).
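
To illustrate why an indexed predicate matters, the following sketch counts how many rows each approach has to examine for the same query. It is a plain Python analogy under stated assumptions, not the Query Table implementation: the table is a list of dictionaries, the index is a dictionary keyed on one field, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: 10,000 rows spread over 100 distinct symbols.
rows = [{"symbol": f"S{i % 100}", "qty": i} for i in range(10_000)]


def build_index(rows, field):
    """Map each field value to the positions of the rows that hold it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[field]].append(pos)
    return index


def query_scan(rows, field, value):
    """No usable index: every row is examined (a table-scan)."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if row[field] == value:
            matches.append(row)
    return matches, examined


def query_indexed(rows, index, value):
    """Usable index: only the matching rows are touched."""
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)


symbol_index = build_index(rows, "symbol")
scan_result, scan_examined = query_scan(rows, "symbol", "S7")
idx_result, idx_examined = query_indexed(rows, symbol_index, "S7")
print(scan_examined, idx_examined)
```

Both calls return the same rows, but the scan touches every row in the table while the indexed lookup touches only the matches, which in the real product translates into a shorter time holding the table lock.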

A Write operation will perform the fastest if no other thread is actively using the table (otherwise it must wait), the write is an insert of a new row, there is only one index to update, and there is either no output or the output consists only of simple field-copy expressions (no complex output expressions).

Blocking

Blocking occurs when a writer has a table lock, or when a writer is waiting for readers to complete. Table locks are maintained only as long as the tuple remains in the Query operator to perform:
1. result-set collection using the available indexes
2. Output-tab expression processing using that result-set
The table lock is released as soon as the first row of the result-set is emitted from the Query operator.
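
The difference between emitting rows individually and aggregating inside the Query operator can be sketched as follows. This is an analogy using a plain `threading.Lock` as a stand-in for the table lock, and it assumes the behavior described in this article (rows are cached and the lock released before per-row processing, while aggregation runs with the lock held). The function and variable names are hypothetical.

```python
import threading

table_lock = threading.Lock()  # stand-in for the Query Table lock
table = [
    {"symbol": "A", "qty": 5},
    {"symbol": "B", "qty": 7},
    {"symbol": "A", "qty": 3},
]


def read_rows_individually(predicate, process_row):
    """Collect matches under the lock, then process them after releasing it."""
    with table_lock:
        cache = [dict(row) for row in table if predicate(row)]
    # The lock is released here; other threads may now use the table.
    return [process_row(row) for row in cache]


def read_with_aggregate(predicate, aggregate):
    """Collect matches and aggregate them while still holding the lock."""
    with table_lock:
        return aggregate([row for row in table if predicate(row)])


def is_a(row):
    return row["symbol"] == "A"


print(read_rows_individually(is_a, lambda row: row["qty"]))
print(read_with_aggregate(is_a, lambda rs: sum(r["qty"] for r in rs)))
```

In the first function, slow per-row work does not delay other threads. In the second, any time spent in `aggregate` is time that writers spend waiting, which is why the article recommends aggregating downstream.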

Continuous read and write access to a table will result in blocking waits. A continuous and rapid stream of read operations may starve write operations indefinitely. The only way to avoid arbitrarily long starvation of writers is to make sure the readers are not working continuously. There must be frequent breaks for the writers to consistently work.
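
The starvation effect can be approximated with a simplified timing model. The sketch below assumes that new readers may start while a writer is waiting, that every read holds the table for the same duration, and that times are in arbitrary integer units; `writer_wait` is a hypothetical helper, not part of the product.

```python
def writer_wait(read_starts, read_duration, writer_arrival):
    """Return how long a writer waits before no read is in progress.

    Assumptions: every read lasts `read_duration` units, and readers
    are admitted even while the writer is waiting.
    """
    t = writer_arrival
    for start in sorted(read_starts):
        end = start + read_duration
        if start <= t < end:
            # A read is in progress at time t; the writer waits until it ends.
            t = end
    return t - writer_arrival


# Overlapping reads: a new read starts every unit and lasts two units.
continuous = writer_wait(range(0, 100), 2, writer_arrival=1)

# Reads with gaps: a new read starts every three units and lasts two.
with_gaps = writer_wait(range(0, 100, 3), 2, writer_arrival=1)

print(continuous, with_gaps)
```

With overlapping reads the writer waits until the whole read stream ends, while leaving a gap between reads bounds the wait to the remainder of a single read.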

Note that using Metronome operators to drive reading and writing from a table will result in blocking behavior because each Metronome sends its tuple output on the "zero" milliseconds (such as 10:07:20.0000). For example, if one Metronome sends output every 5 seconds to cause a Read, and another causes a Write every 2 seconds, then every 10 seconds both will access the table in the same millisecond and blocking will occur. This is resolved by putting a slight delay after the Metronome operator so that the operations happen a few milliseconds offset from each other. The EventFlow setup to add a tiny delay is to use a Map operator with row:
Add, delay, sleep(milliseconds(random(10)))
and set the Map's Concurrency to "Run this component in a parallel region".
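
The 10-second coincidence in the example follows from the least common multiple of the two periods. The sketch below checks that arithmetic and shows the effect of a small random delay. It is a Python model, not EventFlow; it assumes both Metronomes start aligned at time zero and that the delay is a whole number of milliseconds below 10. `fire_times` and `collisions` are hypothetical helpers.

```python
import math
import random


def fire_times(period_ms, horizon_ms, jitter_ms=0, rng=None):
    """Return firing times in ms, each optionally delayed by 0..jitter_ms-1."""
    rng = rng or random.Random()
    times = []
    for t in range(0, horizon_ms, period_ms):
        delay = rng.randrange(jitter_ms) if jitter_ms else 0
        times.append(t + delay)
    return times


def collisions(a, b):
    """Return the milliseconds at which both sequences fire together."""
    return sorted(set(a) & set(b))


read_period, write_period, horizon = 5000, 2000, 60_000
print(math.lcm(read_period, write_period))  # 10000

baseline = collisions(fire_times(read_period, horizon),
                      fire_times(write_period, horizon))

rng = random.Random()
jittered = collisions(fire_times(read_period, horizon, 10, rng),
                      fire_times(write_period, horizon, 10, rng))
print(len(baseline), len(jittered))
```

Without a delay, the two streams coincide every 10 seconds. With the random delay, a coincidence needs both operations to draw the same delay at one of those instants, so collisions become occasional instead of guaranteed.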

Query Tables on Disk

Using disk storage for a local table allows the table to grow arbitrarily large without using additional Heap memory (except for the indexes) and allows the data to be preserved across node shutdown and restart. The additional I/O involved adds delay, which makes any multiple-access blocking issues more likely.

Query Tables in Transactional Memory

Transactional Memory adds the overhead of blocking transactions when a write is performed and when a cluster-wide read is performed to maintain consistency across the nodes. This tends to make any multiple access blocking issues more likely and more severe because transactions are slower to resolve. Two threads in different nodes accessing the table result in a brief Transaction Deadlock which involves a timeout and back-off. Additional delay is incurred if the TM is backed by disk instead of RAM.

For example, to maintain a consistent counter across all nodes requires updating an integer field in a QT-in-TM. When two nodes want to update the count at the same time (conflicting write operations) both will lock the table locally, one will be chosen to delay, the write transaction will complete, and then the other will be released to perform its write. In this case, better performance is obtained by adding a tiny random delay to all writes to the table as described in the "Blocking" section above.

Issue/Introduction

Application Design Guidance
