Incremental data caching approach

Incremental data caching approach

book

Article ID: KB0079858

calendar_today

Updated On:

Products Versions
Spotfire Analyst 7.5 and higher
Spotfire Server 7.5 and higher

Description

In some situations with very large data sets, it may be impractical to completely refresh the cache data each time. It may be desirable to instead only update the most recent data (like for the current month or current day). This caching mode is called incremental data caching.

There is no built-in feature for incremental data caching in TIBCO Spotfire currently, like there is in TIBCO Data Virtualization, but this can still be accomplished with a solution using TIBCO Spotfire Server, TIBCO Spotfire Analyst, and TIBCO Spotfire Automation Services.  In this example solution there are two ways of achieving the same. One solution is the current month’s data is updated regularly (hourly, daily, etc) while the historical months records remain cached and are updated monthly.  This an be accomplished by embedding the historical months data and refreshing only the current month's data through an  Information Link, while having the historical months data refreshed at the start of each month from database
In second approach, current month's data will be updated regularly, while historical data will remain cached in SBDF and at end of every month historical data in SBDF will be updated with current month's data as well avoiding to refresh the historical data from database

Issue/Introduction

This article describes how to achieve incremental data caching in TIBCO Spotfire

Resolution

To implement incremental caching natively in TIBCO Spotfire, the following steps can be used:
Scenario 1:
  1. Create an Information Link for fetching historical months data (all data before the current month) using Tools > Information Designer > Information link and then open the data in the TIBCO Spotfire Analyst client.  Add "HistoricalDateFilter" to the information link which is made up of expression %1<datefromparts(DATEPART(year, getdate()),DATEPART(month, getdate()),01) where it will only pull data whose date is prior to current month's first date. This expression is based on MS SQL server.
    User-added image
  2. Create an Information Link for fetching the current month's data using Tools > Information Designer > Information link Create current month information link.
    User-added image
    Add an filter "CurrentDateFilter' which is made up of expression datefromparts(DATEPART(year, %1),DATEPART(month, %1),01)=datefromparts(DATEPART(year, getdate()),DATEPART(month, getdate()),01) where it will only pull data whose date is equivalent to current month's first date. This expression is based on MS SQL server.
  3. Create a dashboard which will have one data table linked to Current Month data IL and other data table linked to Historical Data IL.
  4. Now insert rows from Historical data table to current month data table and create a dummy integer document property ChangeValue with Value as 0. Beneath this document property create a action control python script which will change the Historical Data Table to embedded
     #HistoricalTable is the script parameter set to Historical Data table from Spotfire.Dxp.Data import DataTableSaveSettings settings = DataTableSaveSettings (HistoricalTable,False, True);  #The 2nd argument decides the type ,False = embedded and True = Linked to source Document.Data.SaveSettings.DataTableSettings.Add(settings);
  5. To refresh Historical data IL every first day of month, you can setup Automation job with following steps which will embed Historical Data table by changing the document property value through configuration block which in turn will trigger iron python script and then save as a new dxp
    1. Open Analysis
      Configuration block : ChangeValue =1;
    2. Save Analysis​ with new name
  6. The above created new dxp can be cached using Scheduled updates daily for Web Player Users which will contain complete dataset but information link will only refresh current month data from source database as data table linked to Historical data IL will be embedded
Scenario 2:
  1. Create an Information Link for fetching historical months data (all data before the current month) using Tools > Information Designer > Information link and then open the data in the TIBCO Spotfire Analyst client:
    User-added image
    Add a filter "HistoricalDateFilter " which is made up of expression %1<datefromparts(DATEPART(year, getdate()),DATEPART(month, getdate()),01) where it will only pull data whose date is prior to current month's first date. This expression is based on MS SQL server.
  2.  After the data is loaded, export the data to the library with File > Export > Data to Library > Save Historical data to SBDF
    User-added image
  3. Create an Information Link for fetching the current month's data using Tools > Information Designer > Information link Create current month information link.
    User-added image
    Add a filter "CurrentDateFilter" which is made up of expression datefromparts(DATEPART(year, %1),DATEPART(month, %1),01)=datefromparts(DATEPART(year, getdate()),DATEPART(month, getdate()),01) where it will only pull data whose date is equivalent to current month's first date. This expression is based on MS SQL server.
  4. Create a dashboard which will have one data table linked to Historical data SBDF and insert rows from current month information link
    User-added image
  5. The above created dashboard can be cached using Scheduled updates daily for Web Player Users which will contain complete dataset but information link will only refresh current month data from source database
  6. Setup TIBCO Spotfire Automation Services job which will open the analysis created in step 3 and Export that data table to Historical SBDF data.
  7. Set this job to run at the end of every month so before new month starts, so current month complete data can be added to cached Historical months data. Automation job need to be setup to run after business hours or at the end of day to avoid duplication of data i.e. if automation job runs on the last day of month, historical sbdf is updated with previous months and current month data. Then if scheduled updates is run to cache the analysis then duplication of current month data would occur as historical sbdf already will have current month data and current month information link will also insert current month data. Hence, automation job should always be scheduled after scheduled updates has cache the analysis for that day.
    User-added image
Note: If you would like to refresh previous months data from database, Historical information link created in step 1 needs to be refreshed and exported to SBDF file in step 2
Conclusion:
When new month starts, your current month information link will fetch new month data and Historical data SBDF will now have previous months data cached as well. So the dataset in analysis will have merged data i.e. current as well as Historical data.

Considerations:

  • If you are using Microsoft SQL as the TIBCO Spotfire Server database, then there will be a 2GB limit on the size of any content saved to the library which is imposed by the database itself. This includes dxp analysis files saved with embedded data and SBDF files saved or exported to the library. See KB 000028303 Size limit of items stored in the TIBCO Spotfire Library for more details.

Additional Information

Doc: Creating Automation Jobs KB: 000021541 How to schedule an Automation Services job using the Microsoft Windows Task Scheduler KB 000028303 Size limit of items stored in the TIBCO Spotfire Library Doc: Change Data Table storing type Doc: Creating Information Link