book
Article ID: KB0083703
calendar_today
Updated On:
Description
It is not uncommon that analysis developers want to work on data housed on different data source locations, and join them as a single table to drive their analyses. In the TIBCO Spotfire software, using the Information Designer tool to build your models, there are many of ways to go about getting the same results that are not always that obvious to the end user. The different ways can impact the performance of the query or the clients and servers differently as well. Here we go through one common scenario that can help decrease the work on all involved components.
Issue/Introduction
This article goes through a scenario when joining tables from different sources and what to think about when creating such Information Links.
Resolution
Many times users will load all available data from big sources when in reality they are only after a small subset of the data. This can often be because the users are not familiar with the data and the source and they therefore want to see what is available and what can be used to drive their analyses. In such situations, we recommend trying to help these users write filtered queries to only retrieve the data they are interested in (if possible).
Other times large amounts of data are needed, or initially needed as input to a join that will then result in a smaller subset of data as outcome. If they have not set up these Information Models to make use of the sources they read the data from, it can impact the performance of the query as well as resources needed by all involved parts of the environment.
For example, the user wants to join tables from the following sources.
- Oracle, Table A, 10 million rows
- MySQL, Table B, 10 000 rows
By default, data sources created in Information Designer will not have the option “Allow create temporary tables” selected. The default behavior when temporary tables can not be used, such as when joining the two tables from table A (10m) and table B (10k), will be to read all rows and send them to the default join data source. The default join data source, unless something else is configured, is the database server hosting the TIBCO Spotfire database schema.
The following steps are performed.
- Read data from table A and send over the network to the database server hosting the Spotfire schema.
- Read data from table B and send over the network to the database server hosting the Spotfire schema.
- Create temporary tables for table A and B.
- Join on the values of the temporary tables.
- Send the result to the client.
In our example, we would probably get much better performance if the join could take place in the Oracle database (table A, where the large data volume resides). This requires that the Oracle data source user in Information Designer has permission to create temporary tables.
The recommended steps, since we know the Oracle table A has the most data of the tables that we use, are to:
- Edit the Oracle data source element in Information Designer and check the “Allow create temporary tables” option.
- Save the data source element after the change.
- Edit the Information Link containing columns from table A and B as well as the join element.
- Set the default join data source to the Oracle data source element that you enabled temporary table creation for.
- Save the Information Link.
You have now specified that the join between tables A and B should be performed on the Oracle database server. The steps it would go through are:
- Read data from table B and send over the network to the database server hosting table A .
- Create a temporary table for table B.
- Join on the values of the temporary table and table A .
- Send the result to the client.
Thanks to the changes made above, you put less stress on the database server hosting the TIBCO Spotfire database schema as well as the TIBCO Spotfire Server as you do not have to transfer the whole set of rows from Table A over the network before being able to execute the join and then send the selected rows back. This should not only affect the load on the system, but also the time to execute the Information Link.