Troubleshooting Performance issues on TIBCO Spotfire Web Player

Article ID: KB0076345

Products Versions
Spotfire Analyst All Versions

Description

Performance issues are generally identified through comparison with TIBCO Spotfire Analyst, with previous TIBCO Spotfire versions, or with past behavior on the same TIBCO Spotfire Web Player. When identifying performance issues, it is important to have realistic expectations and to ensure any comparisons are accurate. For example, if you are comparing performance between two different environments: Do the environments have the same resources? Is the exact same data accessed? Is the number of users on the system the same? Any differences like these can affect performance.

This article breaks out the investigation of performance issues into two sections:
  • Issues that are seen across the environment (with almost all analyses)
  • Issues that are seen with only a specific analysis.
It is important to be specific when identifying what exactly is slow, as that determines which components are involved in the action. For example, the time to initially load a dxp file, the time to change tabs, and the time to finish a report update after triggering an on-demand data refresh all involve different components and require slightly different investigations.

Notes:
  • In this article, the TIBCO Spotfire Server (TSS) will be denoted as "TSS" and TIBCO Spotfire Node Manager (TSNM) will be denoted as "TSNM".
  • For useful general background on properly scaling the TIBCO Spotfire Server and Web Player for your environment and usage see Dr. Spotfire Session: Scaling Spotfire Server and Web Player.

Issue/Introduction

This article shows basic troubleshooting steps to identify performance issues seen with analyses on TIBCO Spotfire Web Player.

Resolution

Here are a number of steps to check when investigating performance issues:

1) Performance issues seen across the TIBCO Spotfire Web Player environment:

In this situation, the slowness may be seen with any analysis file and affects the entire system.

Update software:

Verify network connectivity:

  • Ensure the network communication between the TSS and TSNM machines is adequate. Check this with the telnet command while the issue is occurring.
    • From impacted TSNM machine to TSS machine:
      • Telnet "TSS hostname" TSS back-end communication port (Usually 9443)
      • Telnet "TSS hostname" TSS back-end registration port (Usually 9080)
    • From TSS machine to impacted TSNM machine:
      • Telnet "impacted TSNM hostname" TSNM communication port (Usually 9443)
      • Telnet "impacted TSNM hostname" TSNM registration port (Usually 9080)
      • Telnet "impacted TSNM hostname" Web Player Service communication port (Usually 9501)
See Ports and Firewall for more details. If there is a network issue, errors such as those described in the article "How to determine when the error 'The analysis could not be opened because no web player services are available. Contact your Spotfire administrator' is due to network connection issue" can be seen in the logs.
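
Where telnet is not installed, the same reachability checks can be scripted. The following is a minimal sketch, not an official TIBCO tool: it only tests whether a TCP connection can be opened. The hostnames are hypothetical placeholders, and the ports are the usual defaults listed above, so adjust both to your environment.

```python
import socket


def check_port(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_targets(targets, timeout=5.0):
    """Check a list of (label, host, port) tuples and return {label: reachable}."""
    return {label: check_port(host, port, timeout) for label, host, port in targets}


# Hypothetical hostnames; the ports are the usual defaults named in this article.
# Run the first two from the impacted TSNM machine and the last three from the TSS machine.
EXAMPLE_TARGETS = [
    ("TSS back-end communication", "tss.example.com", 9443),
    ("TSS back-end registration", "tss.example.com", 9080),
    ("TSNM communication", "tsnm.example.com", 9443),
    ("TSNM registration", "tsnm.example.com", 9080),
    ("Web Player Service communication", "tsnm.example.com", 9501),
]


def report(targets):
    """Print one line per target showing whether the port could be reached."""
    for label, reachable in check_targets(targets).items():
        print(f"{label}: {'reachable' if reachable else 'NOT reachable'}")
```

Calling report(EXAMPLE_TARGETS) while the slowness is occurring gives the same pass/fail information as the telnet checks. A successful connection only shows that the port is open; it says nothing about latency or throughput.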

Resource Usage:

Check for resource usage on the Node Manager where the problematic Web Player is installed. An analysis file in a given use case consumes basically two types of resources: memory and CPU. There are other metrics that affect performance, such as input/output (IO) and network speed, but memory and CPU are the first major items to consider when defining a hardware setup.

CPU usage is very high:

If the level of CPU use is very high, the response times will typically become longer for the end user but the system will still work. Note: It is not unexpected for CPU usage to spike when intensive operations are being performed (e.g., calculating a large cross table); CPU usage should return to normal afterwards. If CPU usage stays high for sustained periods and the CPU is chronically exhausted, consider scaling up the number of CPUs, the speed of the CPUs, or adding more Web Player computers. For additional information, see Performance troubleshooting.
 
Check the following metrics at the time of CPU usage spikes to see if the increased resource utilization is just explained by increase in usage:

Memory usage is very high:

Memory utilization is more important and also more complicated. If the system runs out of memory and the system is not able to page out enough data, the system may stop functioning. There are two types of memory involved:
  • .NET memory
  • data engine memory
Spotfire will page out data engine memory, if needed, to free up memory. In many cases this works very well because analysis files typically contain a lot of data that is not constantly needed. However, if the same data is rapidly and repeatedly paged out and in, the response times will be poor, since performance then depends on disk I/O speeds, which are much slower than RAM speeds. Check the "Data engine paged in memory" counter (see Web Player service performance counters for more detail) to see if data is chronically being paged in, as that can negatively affect performance.

One part of the data engine memory is the query cache, which speeds up the system by reusing calculations. The query cache will typically be paged out early when the system is low on RAM.

The .NET memory is used to build up the document, so complex analyses with many pages, plots, columns, etc. will use more .NET memory. This memory is handled by .NET, which means that data that has been released may still be left in the heap, and there may be free memory left in the process if the system thinks that it may be used later. The internal memory handling is done by garbage collection (GC), which frees up chunks of memory that are no longer required. Some GCs are quick, and others are more costly and may block the system for several seconds. For more information, see Web Player Paging Behavior.

If the RAM is chronically exhausted and paging to disk continues to occur, this can negatively affect performance, and you should look at scaling up your system (adding RAM or adding another Node Manager) to accommodate the load.

Additional checks to perform to help diagnose the cause of slowness:

  • If there is a load balancer in front of the TIBCO Spotfire Server, open the dxp on the Web Player using the direct TSS URL instead of load balanced URL to see if there is any issue in load balanced requests.
  • Check the behavior with the default deployment packages if any custom package/extensions are deployed in the deployment area used by the Web Player
  • Try to load balance the environment through routing specific analysis occupying more space to specific nodes only. For more information, see How to configure a specific set of analyses to route only to a dedicated Node Manager (without allowing any other analyses to be opened on that Node Manager)
  • Check the driver version used for the TSS database. If there is a driver version mismatch and a previous driver version provided better performance, try updating the driver.
If performance issues are seen across the environment, compare the same TIBCO Spotfire Web Player version across machines (for example, UAT and production environments) using multiple analyses.

2) Performance issue seen on TIBCO Spotfire Web Player with specific dxps only:

In this situation, the slowness is only seen with a particular analysis file (or set of similar analysis files) but other analysis files do not have any slowness.

Initial checks to perform:

  • Ensure the comparison is done using the same analysis file in the TIBCO Spotfire Analyst or previous Spotfire versions i.e. ensure the same data is fetched from the same database, same dxp design with calculated columns, data tables, scripts, etc.
  • If any custom extension or visualization is present in the dxp, test the performance with just the core Spotfire modules (latest hotfix) to isolate any issues related to custom code.
  • If an analysis file opened in TIBCO Spotfire Analyst on a specific machine shows better performance than on a Node Manager machine (where the Web Player is installed), compare the CPU clock speeds between the machines. A client machine with a higher CPU speed will be able to perform single-threaded actions faster than a server machine with a slower CPU.
  • If there is any RAM exhaustion (paging occurring), CPU exhaustion, or high user load observed during performance issues on the Web Player, then try to increase RAM or CPU respectively. When testing, ensure there is no other activity on the Node Manager to help remove system resources as an issue.

Enable Performance Counter logging and Export resource usage analysis:

  1. Go to Monitoring & Diagnostics on Spotfire Server
  2. On Web Player instances, click on (...) Enable monitoring logging to capture logs in debug mode
  3. Once the behavior is replicated and the logs are captured, export the monitoring logs and analysis by clicking on (...) in the Web Player Instances section
  4. The exported folder contains web server resource monitoring.dxp, which helps identify RAM and CPU spikes along with the opened documents and number of users at that time
See the following references for more information:

Determine exactly which action is slow:

If none of the above tests show any obvious issue, identify exactly where the issue lies:
  • Initial loading of analysis i.e. Opening Analysis on Web Player
 OR
  • Operations after loading of analysis as below:
    • Switching tabs
    • Rendering Visualizations especially Map Chart
    • On demand table loading
    • Script/Data function execution
    • Changing document properties
    • Data table refresh i.e. transformation, insert rows/columns etc.
    • Applying marking/filtering

Initial loading of analysis:

The issue falls in this category when the analysis takes more time than usual while caching in scheduled updates, when a cached analysis takes time to open, or when an analysis that is not cached takes time to load on the Web Player.
 

1. Opening/Loading analysis which is not cached/cached using scheduled updates:

  • Check if the analysis contains any personalized data which refreshes on initial load of the dxp (check if any data table has "Reload data for each user" checked in Data Table Properties > Scheduled Updates tab on the TIBCO Spotfire Analyst client). If any data table is getting reloaded for each user, un-check the option for data tables that do not need to be reloaded for each user.
  • Check if there is any on-demand data table which is set to refresh on initial load of dxp based on any insert row/column using personalized data
  • Check if the analysis contains any data function/Iron Python/JavaScript set to refresh on initial load of the dxp
  • When the dxp is not cached, check in the logs how much time data loading takes from the data source (for example, information link, Excel, or data connection)

For an Information link:

From Spotfire Server's sql.log:
 2019-04-02T12:47:24,031-0700 [*pool-13-thread-1, abc, #48, #79834*]: Executing Information Link 'Customer_IL' (b2bdc9aa-d002-4243-8bc9-6faa60bf8c0e)
 2019-04-02T12:47:25,359-0700 [*pool-13-thread-2, abc, #48, #79834*]: Information Link 'Customer_IL' (b2bdc9aa-d002-4243-8bc9-6faa60bf8c0e) executed successfully, time: 1.328 seconds
 2019-04-02T12:47:25,546-0700 [*Thread-41, abc, #48, #79834*]: Information Link 'Customer_IL' (b2bdc9aa-d002-4243-8bc9-6faa60bf8c0e) job closed, time: 1.515 seconds
In this example, the query was sent to the database at 2019-04-02T12:47:24, query execution took 1.328 seconds on the database, and the Spotfire Server then provided the result to the Web Player. The complete job took 1.515 seconds.
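
When sql.log is large, the "executed successfully" and "job closed" times can be pulled out with a short script. This is a rough sketch that assumes the line layout matches the sample entries above; the regular expression and the function name parse_sql_log are illustrative, not part of any Spotfire tooling.

```python
import re
from datetime import datetime

# Pattern inferred from the sample sql.log entries in this article (an assumption).
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r"\[(?P<context>[^\]]*)\]: "
    r"Information Link '(?P<name>[^']*)' \((?P<id>[0-9a-f-]+)\) "
    r"(?P<event>executed successfully|job closed), time: (?P<secs>\d+(?:\.\d+)?) seconds"
)

SAMPLE_LINES = [
    "2019-04-02T12:47:24,031-0700 [*pool-13-thread-1, abc, #48, #79834*]: Executing Information Link 'Customer_IL' (b2bdc9aa-d002-4243-8bc9-6faa60bf8c0e)",
    "2019-04-02T12:47:25,359-0700 [*pool-13-thread-2, abc, #48, #79834*]: Information Link 'Customer_IL' (b2bdc9aa-d002-4243-8bc9-6faa60bf8c0e) executed successfully, time: 1.328 seconds",
    "2019-04-02T12:47:25,546-0700 [*Thread-41, abc, #48, #79834*]: Information Link 'Customer_IL' (b2bdc9aa-d002-4243-8bc9-6faa60bf8c0e) job closed, time: 1.515 seconds",
]


def parse_sql_log(lines):
    """Return (timestamp, link name, event, seconds) for each timed Information Link entry."""
    results = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S,%f%z")
            results.append(
                (timestamp, match.group("name"), match.group("event"), float(match.group("secs")))
            )
    return results


if __name__ == "__main__":
    for timestamp, name, event, seconds in parse_sql_log(SAMPLE_LINES):
        print(f"{timestamp.isoformat()} {name}: {event} in {seconds} s")
```

To scan a real file, pass an open file object instead of SAMPLE_LINES. Sorting the results by the seconds value brings the slowest Information Links to the top.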

From the Spotfire Server's server.log:
 DEBUG 2019-04-02T12:47:24,031-0700 [abc, #48, #79834] ws.dat.Job: Scheduling new com.spotfire.ws.dat.Job$OpenTask for job 858d6a27-46d2-4d1a-a11a-251c0cfb635d
 DEBUG 2019-04-02T12:47:24,046-0700 [*pool-13-thread-1, abc, #48, #79834*] ds.sql.JDBCDataSourceManager: Retrieving a connection from a connection pool Postgres
 . . .
 DEBUG 2019-04-02T12:47:25,421-0700 [*pool-13-thread-2, abc, #48, #79834*] ws.dat.Job$GetDataTask: Job finished, no more data blocks to retrieve
 DEBUG 2019-04-02T12:47:25,421-0700 [abc, #48, #79843] attachment.web.AttachmentServletRequestHandler: Writing attachment 68ec72d6-98dc-43b0-9215-a2af0c4a369a
 DEBUG 2019-04-02T12:47:25,437-0700 [abc, #48, #79843] attachment.web.AttachmentServletRequestHandler: Wrote attachment 68ec72d6-98dc-43b0-9215-a2af0c4a369a (26217 bytes) in 16 ms
 DEBUG 2019-04-02T12:47:25,530-0700 [abc, #48, #79844] api.query.InformationLinkCache: Failed to close cache job '858d6a27-46d2-4d1a-a11a-251c0cfb635d' (it likely isn't a cache job)
 DEBUG 2019-04-02T12:47:25,765-0700 [*Thread-41, abc, #48, #79834*] ws.dat.Job$CloseTask: Waking up any waiting GetDataTask
 DEBUG 2019-04-02T12:47:26,218-0700 [monitoring@SPOTFIRESYSTEM, #5, #79848] web.controller.NodeManagerController: Received message from WorkerHost [serviceId=4c72cfcf-a4df-4b42-8d88-91116abc6023, uri=https://10.10.10.10:9501/4c72cfcf-a4df-4b42-8d88-91116abc6023/, WebPlayerInstanceId=2216154e96uKMs].

From the Web Player's Spotfire.Dxp.Worker.Host.Debug log:
 DEBUG;2019-04-02T12:47:24,241-07:00;2019-04-02 20:53:32,241;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;WorkThread 99;abc WAT 10;Spotfire.Dxp.Data.DataTable;"Setting refreshing on table Customer_IL to True."
 . . .
 DEBUG;2019-04-02T12:47:25,522-07:00;2019-04-02 20:53:33,522;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;WorkThread 99;abc WAT 10;Spotfire.Dxp.Data.ColumnFactory;"Imported 149 rows, 8 columns."
 DEBUG;2019-04-02T12:47:25,522-07:00;2019-04-02 20:53:33,522;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;108;abc;Spotfire.Dxp.Framework.Utilities.ServerLoggerManager;"Server log manager did not log Audit since category is disabled: 2019-04-02T13:53:33,522-07:00, 10.98.10.37, datasource_wp, execute, True, , /Customer_IL, Customer_IL, 1. Select File > Add Data Tables... Type: Information Link Path: /Customer_IL Id: b2bdc9aa-d002-4243-8bc9-6faa60bf8c0e , 1013, 149, 4c72cfcf-a4df-4b42-8d88-91116abc6023."
 DEBUG;2019-04-02T12:47:25,522-07:00;2019-04-02 20:53:33,522;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;WorkThread 99;abc WAT 10;Spotfire.Dxp.Data.Persistence.DataPool;"Finished loading data for producer with key edcd5987-869a-4482-9b84-39d58b41119f."
 . . .
 DEBUG;2019-04-02T12:47:25,241-07:00;2019-04-02 20:53:32,241;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;WorkThread 99;abc WAT 10;Spotfire.Dxp.Data.DataTable;"Setting refreshing on table Customer_IL to False."
 

For Data connection:

From the Web Player's Spotfire.Dxp.Worker.Host.Debug log, the time taken to execute on database:
 DEBUG;2019-04-02T12:47:04,929-07:00;2019-04-02 19:47:04,929;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;46;abc;Spotfire.Dxp.Data.Adapters.SqlServer.SqlServerAdapterConnection;"SqlServer query completed in 9642 ms. Retrieved 5 columns and 5037 rows."
The time taken to complete data table execution job:
 DEBUG;2019-04-02T12:47:04,929-07:00;2019-04-02 19:47:04,929;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;46;abc;Spotfire.Dxp.Data.Access.QueryExecution.ExternalQueryExecutionUtilities;"25FD8ED6-5C19-4515-BC7B-77E9CA9A7C18 Execution Time 10172 ms"
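
The two durations above can be extracted from a large Spotfire.Dxp.Worker.Host.Debug log with a small script. This sketch assumes the semicolon-separated layout seen in the sample lines, with the quoted message as the ninth field; the field positions and the function name parse_worker_line are inferred for illustration only.

```python
import re

# Message patterns inferred from the two sample lines in this article (an assumption).
QUERY_RE = re.compile(
    r"query completed in (?P<ms>\d+) ms\. Retrieved (?P<cols>\d+) columns and (?P<rows>\d+) rows"
)
EXEC_RE = re.compile(r"Execution Time (?P<ms>\d+) ms")

SAMPLE_LINES = [
    'DEBUG;2019-04-02T12:47:04,929-07:00;2019-04-02 19:47:04,929;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;46;abc;Spotfire.Dxp.Data.Adapters.SqlServer.SqlServerAdapterConnection;"SqlServer query completed in 9642 ms. Retrieved 5 columns and 5037 rows."',
    'DEBUG;2019-04-02T12:47:04,929-07:00;2019-04-02 19:47:04,929;4c72cfcf-a4df-4b42-8d88-91116abc6023;2216154e96uKMs;46;abc;Spotfire.Dxp.Data.Access.QueryExecution.ExternalQueryExecutionUtilities;"25FD8ED6-5C19-4515-BC7B-77E9CA9A7C18 Execution Time 10172 ms"',
]


def parse_worker_line(line):
    """Return a dict describing a query or execution timing entry, or None for other lines."""
    # Split into at most 9 fields so semicolons inside the message are preserved.
    fields = line.strip().split(";", 8)
    if len(fields) < 9:
        return None
    message = fields[8].strip('"')
    base = {"timestamp": fields[1], "logger": fields[7]}
    query = QUERY_RE.search(message)
    if query:
        return {**base, "kind": "query", "ms": int(query.group("ms")), "rows": int(query.group("rows"))}
    execution = EXEC_RE.search(message)
    if execution:
        return {**base, "kind": "execution", "ms": int(execution.group("ms"))}
    return None


if __name__ == "__main__":
    for entry in filter(None, map(parse_worker_line, SAMPLE_LINES)):
        print(entry["timestamp"], entry["kind"], entry["ms"], "ms")
```

In the sample, the database query took 9642 ms and the whole data table execution job took 10172 ms, so most of the time was spent on the database rather than in the Web Player.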
 

2. Analysis taking more time while caching in scheduled updates:

  • Check when the request was sent to the Node Manager from the Spotfire server and when the Node Manager actually started caching the dxp.
    • From the Spotfire Server's server.log:
       DEBUG 2019-04-01T06:10:04,424+0800 [SPOTFIRESYSTEM\monitoring, #24, #5017205] scheduler.service.RoutingTableUtil: Analysis for libItemId a490b751-7f71-402c-9fc7-d48efd36a309 is loaded on these destinations: 8d64ba0a-c413-4dd6-861f-0a8d5e5847ad
       DEBUG 2019-04-01T06:10:04,424+0800 [SPOTFIRESYSTEM\monitoring, #24, #5017205] scheduler.service.RoutingTableUtil: There are no other available destinations to load the analysis for libItemId a490b751-7f71-402c-9fc7-d48efd36a30
    • From the Spotfire Server's sql.log:
       019-04-01T06:48:58,600+0800 [*Thread-233821, SPOTFIRESYSTEM\scheduledupdates, #30, #5050885*]: Information Link 'abc - Information link' (5907e276-abc3-41c7-b27d-0848d953ef17) job closed, time: 67 seconds
       2019-04-01T06:47:41,161+0800 [*Thread-233795, SPOTFIRESYSTEM\scheduledupdates, #30, #5050110*]: Information Link 'abc - Information link' (5907e276-abc3-41c7-b27d-0848d953ef17) job closed, time: 56 seconds
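    • In this example there are no entries in sql.log around 06:10 for this dxp; the first SQL execution is seen at about 06:47, roughly 37 minutes after the request was routed. This indicates the Node Manager did not start caching the dxp until well after the request was sent.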
  • Check how many concurrent updates are set (See How to change the Scheduled Updates "concurrentUpdates" parameter in Spotfire Web Player 7.5 and higher for more details)
  • For environments with multiple Web Players and Node Managers, try and schedule the report to load on a different Web Player or Node Manager to compare loading times.
  • Check the TIBCO Spotfire Server sql.log files for query execution and job closing times. If there seems to be an issue with information link performance, see Troubleshooting Guide: Performance issues with Information Links for more details.
  • If the data loading is through data connection, check for the data table refresh times in Web Player's Spotfire.Dxp.Worker.Host.Debug logs in debug mode as mentioned above
  • Check for network issues during that caching window: Was the connection dropping/appearing again? Are there any timeouts seen in TSS or Spotfire.Dxp.Worker.Host.Debug logs causing delays in caching?
  • If disk caching is enabled, make sure the communication between the Node Manager machine and the shared drive where the cache is stored is quick otherwise it will take time to cache the analysis file since it needs to be cached on disk as well. Try disabling the disk cache and see if there is any improvement in overall cache time; if there is a huge improvement in loading time, then either disk caching or communication between machines is the bottleneck.
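
The first check in the list above, comparing when the request was routed with when caching actually started, comes down to timestamp arithmetic. The sketch below assumes the timestamp format shown in the sample log entries, and assumes that the "job closed, time: N seconds" value is the job duration, so the job start can be estimated as the close time minus N. The function names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamp layout used by server.log and sql.log in the samples above.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f%z"


def parse_ts(text):
    """Parse a timestamp such as 2019-04-01T06:10:04,424+0800."""
    return datetime.strptime(text, TS_FORMAT)


def estimate_caching_delay(request_ts, job_closed_ts, job_seconds):
    """Estimate the time between the routing entry and the start of the first SQL job.

    Assumption: the job started `job_seconds` before its "job closed" entry.
    """
    job_start = parse_ts(job_closed_ts) - timedelta(seconds=job_seconds)
    return job_start - parse_ts(request_ts)


if __name__ == "__main__":
    # Values taken from the sample server.log and sql.log entries above.
    delay = estimate_caching_delay(
        "2019-04-01T06:10:04,424+0800",
        "2019-04-01T06:47:41,161+0800",
        56,
    )
    print(f"Estimated delay before caching started: {delay}")
```

A delay of this size with no SQL activity in between suggests the update was waiting rather than loading data, which is why the concurrent updates setting and the load on the Node Manager are worth checking.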

Operations after loading of analysis:

The following are items to check when investigating these potential delays that may be observed after the analysis is loaded on Web Player:

1. Switching tabs

  • Check for any JavaScript in a Text Area that is refreshed on tab load, or any JavaScript that triggers an Iron Python script on tab load
  • If the tab navigation is done through any script, remove the script and try to do direct navigation to that tab. If direct navigation is fast, then the script needs to be optimized to get similar performance
  • Check if script trust is taking time (when tab switch occurs, scripts trust approval check is made on destination tab containing scripts in action control items)
  • If there is any custom visualization present, try to remove it and check the behavior
  • If a lot of complex Bootstrap, HTML, etc. is used in a Text Area, first remove that Text Area and then test switching to that tab. If the switch is then fast, the complex code needs to be optimized
  • If there are images in a Text Area which are used from a network drive, try to embed them in the Text Area as it may take a few seconds to render that image

2. Rendering Visualizations, especially Map Chart

  • Check resource utilization (RAM, CPU) for any bottlenecks on the Node Manager machine, and the number of concurrent users at the time the issue is observed
  • Check the performance on the TIBCO Spotfire Analyst client installed on the same Node Manager machine to see if the Analyst client and the Web Player on the Node Manager show the same performance trend. If it shows the same slow performance, compare the resources, CPU processor, and CPU clock speed with those of the fast-performing machine and upgrade the Node Manager machine accordingly
  • Check for complex custom expressions. Try to simplify those by using calculated columns wherever the calculation is static, i.e. where it does not need to be recalculated on filter/marking/document property change.
  • Check for data relations/filters that take time, for example filtering on a huge amount of data that triggers re-rendering. If so, try to remove unwanted columns from your data table; you can also create the data table relation on a single column of type Integer to improve performance. See Working with large data volumes for more details.
  • If the Map Chart takes time to render, check whether there is any WMS layer and verify connectivity to that WMS URL from the Node Manager machine. If marker by column is used, check how many unique values are plotted, because marker by column further splits the X and Y plotted data points based on the column used. For example, if X and Y are the latitude/longitude for a state and marker by uses well_id, which may have millions of values, try marker by well_location or city instead, which have far fewer unique values. A column with 10,000 unique values will perform better than a column with millions of unique values.
  • In case of Scatter Plot rendering issue, when there are millions of records, try to use aggregated data for the plot instead of plotting unaggregated data as the Web Player does not use hardware acceleration like the TIBCO Spotfire Analyst client.
  • If marking shows a delay in loading data in a visualization: Check if the marking triggers any data function which is executed on an external TIBCO Spotfire Statistics Services (TSSS) server. Check whether the relations applied between the two tables use a single column or multiple columns
  • Try to aggregate the data to higher dimensional grain with required columns in visualization, filter using pivot transformation, then use the transformed data table in visualization
  • If you have millions of rows and have used UniqueCount() on a String column, see if UniqueCount() can be used on an Integer column instead. Performance is often better when Integer columns are used in calculations.
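
Several of the checks above depend on how many unique values a column holds. One rough way to find out is to count distinct values per column in a CSV export of the data table. The sketch below is an offline helper, not a Spotfire API; the column names in the usage example (state, city, well_id) mirror the Map Chart example above, and the sample rows are made up for illustration.

```python
import csv
from collections import defaultdict


def column_cardinality(lines, columns=None):
    """Count distinct values per column.

    `lines` is any iterable of CSV text lines with a header row, such as an open file.
    `columns` optionally restricts the count to the named columns.
    """
    distinct = defaultdict(set)
    for row in csv.DictReader(lines):
        for name, value in row.items():
            if columns is None or name in columns:
                distinct[name].add(value)
    return {name: len(values) for name, values in distinct.items()}


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        "state,city,well_id",
        "TX,Houston,1",
        "TX,Houston,2",
        "TX,Austin,3",
    ]
    for name, count in column_cardinality(sample).items():
        print(f"{name}: {count} unique values")
```

Columns with the lowest counts are the better candidates for marker by and for filters. Note that this helper keeps every distinct value in memory, so for very large exports it may be more practical to run it on a sample of the rows.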

3. On-demand table loading

  • Check the TIBCO Spotfire Server's sql.log files to see how much time the database takes to retrieve the records and whether Information Link performance is the bottleneck, assuming the same set of records is pulled into the on-demand table on Spotfire Analyst. For more information, see Troubleshooting Guide: Performance issues with Information Links.
  • Check for any data transformation, insert rows/columns getting executed in any other data table when refreshing on demand data table
  • Check how many calculated columns are getting refreshed on refreshing on demand data table which may account for the delay in populating results to visualization
  • Check for any scripts/data functions getting refreshed based on input values from on demand data table

4. Script/Data function execution

  • Check if there is any network/communication issue between TIBCO Spotfire Server and the TIBCO Spotfire Statistics Services (TSSS) server
  • Check the performance of each statement of the script to see if any particular statement accounts for the issue
  • Check if the data function updates a document property which in turn triggers any Iron Python script or updates calculated columns

5. Changing document properties

  • Check for any data function/Iron Python/JavaScript execution on change of document property value
  • Check for any document properties used in calculated columns such that when a property value changes, calculated columns will be recalculated and if there is any underlying data transformations based on those calculated columns, then those data tables/inserted rows/inserted columns/data transformations will also be refreshed
  • Check for document properties in visualization expressions such that a change of property value will cause those visualizations to refresh/render

6. Data table refresh i.e. transformation, insert rows/columns etc.

7. Marking/Filters:

    • If marking any record on a tab takes a lot of time, check if that marking is used across all the elements in the dxp, since each calculated expression is then recalculated.
    • Applying a filter on a column with millions of unique values is slower than on a column with thousands of unique values. See if the data volume can be reduced or a different column can be used. Also check whether the column data type can be changed from String to Integer for better performance, for example with zip code values (which can be stored as either a String or an Integer column)
    • If Listbox filter is showing slow performance, try to change the filter type to Text filter
    • For better performance with filters and when loading the analysis, keep only the filters that are actively required in the filtering scheme used in the Filter panel
    • When using Action Control buttons to reset filters, in the situation where there are 1000+ columns, it is better to reset visible filters only to avoid slow performance experience

    Additional Information

    Dr. Spotfire Session: Scaling Spotfire Server and Web Player

    Doc: Working with large data volumes

    KB: Tips to improve Dashboard performance 

    KB: Filtering in related table could not be applied

    KB:  How to configure the maximum age of cached calculations using the QueryCacheEntryMaxAge setting

    Doc: Ports and FireWall configuration

    Doc: Performance Troubleshooting

    Doc: Web Player service performance counters

    Doc: Action log reference

    KB: TIBCO Spotfire Web Player Paging Behavior

    Doc: Accessing performance data

    KB: Route specific set of analysis to a dedicated Node Manager

    KB: Troubleshooting Guide: Performance issues with Information Links

    KB: Best Practices: Information Link joins between different data sources