Performance difference between Python data functions returning Pandas DataFrame vs Python list in Spotfire

Article ID: KB0138073

Product: Spotfire
Versions: 12.0 and higher

Description

In Spotfire, Python data functions can return results in several formats. Performance can differ noticeably depending on whether the function returns a Pandas DataFrame or a plain Python list.

The difference comes from how Spotfire serializes results and transfers them between the Python environment and Spotfire. A Pandas DataFrame has a predefined structure: named columns, each with a single data type, laid out for columnar operations. Spotfire can therefore serialize and transfer it efficiently.

In contrast, when a function returns a plain Python list, Spotfire must infer the structure and data type of the result and process each element individually. This per-element work adds significant overhead, and the cost grows with the size of the dataset.
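The sketch below illustrates the structural difference described above. It does not reproduce Spotfire's internal serializer; it only shows that a plain list carries no declared type (so a consumer has to inspect each element), while a DataFrame column carries one declared dtype for all of its values. The row count `n` is a hypothetical value chosen for illustration.

```python
import random
import pandas as pd

n = 1000  # hypothetical row count; in Spotfire this would typically be a data function input

# Plain Python list: no type is declared anywhere, so a consumer has to
# look at every element to work out what the column contains.
values = [random.normalvariate(0, 1) for _ in range(n)]
inferred_types = {type(v).__name__ for v in values}  # per-element inspection

# Pandas DataFrame: the column has a single declared dtype, so its type is
# known without scanning the individual values.
df = pd.DataFrame({'rand': values})
declared_dtype = str(df['rand'].dtype)

print(inferred_types)  # {'float'}
print(declared_dtype)  # float64
```

Both objects hold the same numbers; only the DataFrame states their type up front.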

Environment

All

Resolution

To improve performance for Python data functions in Spotfire:

  1. Return results as a Pandas DataFrame whenever possible.

    import random
    import pandas as pd
    # n (the number of rows) is assumed to be an input parameter of the data function
    rand = pd.DataFrame([random.normalvariate(mu=0, sigma=1) for _ in range(n)], columns=['rand'])

  2. Avoid returning plain Python lists for large datasets.

  3. Optionally use NumPy for faster, vectorized data generation.
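A minimal sketch of step 3, assuming NumPy is available in the Spotfire Python environment: the samples are drawn in one vectorized call to `numpy.random.default_rng().normal` instead of a Python-level loop, and the result is still returned as a DataFrame. Here `n` is hard-coded for illustration; in a data function it would presumably be an input parameter.

```python
import numpy as np
import pandas as pd

n = 1000  # hypothetical row count; in Spotfire this would be a data function input

# Draw all n standard-normal samples in a single vectorized call.
rng = np.random.default_rng()
samples = rng.normal(loc=0.0, scale=1.0, size=n)

# Return the result as a DataFrame with one float64 column named 'rand'.
rand = pd.DataFrame({'rand': samples})
```

This combines both recommendations: vectorized generation avoids the per-element Python loop, and the DataFrame output avoids returning a plain list.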

Using these approaches helps minimize serialization overhead and makes data transfer between Python and Spotfire more efficient.

Issue/Introduction

When using Python data functions in Spotfire, returning results as a Pandas DataFrame significantly improves performance compared to returning a plain Python list.

Additional Information

Doc: Spotfire Service for Python Installation and Administration

Doc: Python Data Functions in Spotfire