Sharing a Query Table among nodes involves changing the location of the Query Table to cluster Transactional Memory.
In the Query Table properties, set:
Table Settings:
Type = In transactional memory

Data Distribution:
Check "Distribute Data"
Policy Name: default-dynamic-data-distribution-policy
Mapper Type: Round-Robin
You may instead select a Mapper Type of "Hash" and choose key fields from the table schema to compute the hash value, so that rows with the same key values are distributed together.
In the product documentation, see:
TIBCO Streaming > Authoring Guide > Using Query Tables > Persistence of Query Tables, "Transactional Memory Option".

In the associated Query Read operators, set the scope to "Cluster-wide" to see rows from any node; this is slower than the "Local node" scope, which only returns rows that are physically on the current node. The "Local node" scope is best combined with a Mapper Type of Hash, so that rows for frequently used key values are already local to the node from which the query is executed. For example, if you have nodes in Glasgow, Bangor, and Cardiff, you might hash on "Town" with the expectation that most queries executed in each location will be interested in table data related to that location.
Frequently Asked Questions
Q: Are there any changes needed to the HOCON configuration?
A: No HOCON changes are needed for the simplest setup. If you want a custom distribution policy for special needs, then that is a HOCON change.
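As a rough illustration of what such a change looks like, the sketch below defines a custom dynamic data distribution policy in an application definition configuration file. The policy and configuration names are hypothetical, and the property names are assumptions based on the com.tibco.ep.dtm.configuration.application configuration type; verify them against the TIBCO Streaming Configuration Guide for your release before use.

    // Illustrative sketch only; property names are assumptions to be
    // checked against the Configuration Guide for your release.
    name = "my-application-definition"
    version = "1.0.0"
    type = "com.tibco.ep.dtm.configuration.application"
    configuration = {
      ApplicationDefinition = {
        dataDistributionPolicies = {
          // Select this name as the Policy Name in the Query Table's
          // Data Distribution properties instead of
          // default-dynamic-data-distribution-policy.
          "my-custom-data-distribution-policy" = {
            type = "DYNAMIC"
          }
        }
      }
    }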
Q: Is there any other way to share data among nodes in a Highly Available (HA) mode other than using Query Tables?
A: No. Query Tables in Transactional Memory are the only way to share "state" between nodes. The Distributed Router operator sends individual tuples to the node that should process them, but this is mainly for load balancing and does not persist the tuples to support fail-over.
Q: What can be done to improve application speed when using Query Tables in Transactional Memory?
A:
1) Keep the recommended number of partitions (64) in any distribution policy you define. The "default-dynamic-data-distribution-policy" is always available, but you may select your own if you have created a different policy. Any distribution policy should have 10 or more partitions; the default policy has 64 in order to provide granularity when copying data between nodes, so it can transmit small sets of data in each transaction. Using fewer partitions may cause transactions to temporarily block each other, slowing table updates. (A configuration sketch follows this list.)
2) Put shared Query Tables at the beginning (for reference to the original input values) or the end (for computed results) of the logical flow for a tuple. Not every intermediate Query Table used to transform the tuple needs to be in Transactional Memory. Rapidly changing tables are best kept in the Heap; use Transactional Memory for slower-paced results and updates.
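To connect item 1) to configuration, the sketch below shows where a custom policy and its partition count would typically appear in a node deploy configuration: nodes join an availability zone, and the zone references the data distribution policy. The node and zone names are hypothetical, and property names such as availabilityZones, availabilityZoneMemberships, dataDistributionPolicy, and numberOfPartitions are assumptions based on the com.tibco.ep.dtm.configuration.node configuration type; verify them against the Configuration Guide for your TIBCO Streaming release.

    // Illustrative sketch only; node/zone names are hypothetical and
    // property names are assumptions to be checked against the
    // Configuration Guide.
    name = "my-node-deploy"
    version = "1.0.0"
    type = "com.tibco.ep.dtm.configuration.node"
    configuration = {
      NodeDeploy = {
        nodes = {
          "A.cluster" = {
            availabilityZoneMemberships = {
              "my-zone" = { }
            }
          }
          "B.cluster" = {
            availabilityZoneMemberships = {
              "my-zone" = { }
            }
          }
        }
        availabilityZones = {
          "my-zone" = {
            // Reference the custom policy defined in the application
            // definition configuration.
            dataDistributionPolicy = "my-custom-data-distribution-policy"
            // Keep the recommended 64 partitions (property name assumed).
            numberOfPartitions = 64
          }
        }
      }
    }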
Q: With this approach, even if one node goes down, the data is still available in Transactional Memory as long as at least one node is running. Am I correct?
A: Yes. The default way Transactional Memory works is that every Query Table row has a primary node on which it is "local" and up to two replica nodes on which the row is also local. To all other nodes that row is remote and only available via a Cluster-wide query.
Custom dynamic data-distribution-policy configurations should be designed to preserve this behavior. Static policies have a Primary and Replica relationship that is used to duplicate the data in Transactional Memory between specific (not dynamically determined) nodes.
Q: So the node data volumes should have the capacity to handle the distributed data, right?
A: Transactional Memory is not stored in the JVM (engine) Heap, so ideally the system will have RAM available, apart from what is allocated to the Heap, that can be used for this. On Linux operating systems that use System V shared memory, this is RAM plus swap (disk used as needed). On other operating systems it is emulated entirely in a file in the disk file system (and is therefore slower).
Q: Transactional Memory runs as separate distributed memory in the cluster, right?
A: Yes. Transactional Memory is maintained by cluster services, which a StreamBase, LiveView, or Java engine may take advantage of. Once you have installed one node you have cluster services available, even without an engine running (for example, 'install node' starts cluster services, while 'start node' runs the engines that make use of cluster services).