Statistica Python node function "ActiveDataSet" gives _ctypes.COMError when reading large input data to Pandas data frame

Article ID: KB0077055

Product: Spotfire Statistica
Version: 13.5

Description

In the Statistica Python scripting node, the "ActiveDataSet" function reads an upstream input spreadsheet into a Pandas data frame, selected either by name or by index, for example "trainDS = ActiveDataSet['trainData']" or "trainDS = ActiveDataSet[0]".

When the ActiveDataSet function reads a large input spreadsheet (e.g. 30000v * 2000c) into a Pandas data frame, execution of the Python node fails with the following error message:

_ctypes.COMError: (-2147024882,'Not enough storage is available to complete this operation.')

This error occurs with Python versions 3.5.2, 3.6.3, 3.7.3, and 3.7.4.
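
The number in the error tuple is a signed 32-bit Windows HRESULT. As a minimal sketch (the helper name decode_hresult is hypothetical and is not part of Statistica or comtypes), it can be split into its standard fields to see what kind of failure is being reported:

```python
def decode_hresult(signed_value):
    """Split a signed 32-bit HRESULT into its standard fields."""
    # Reinterpret the signed value as an unsigned 32-bit integer.
    unsigned = signed_value & 0xFFFFFFFF
    return {
        "hex": "0x{:08X}".format(unsigned),
        "failure": bool(unsigned >> 31),       # bit 31: severity (1 = failure)
        "facility": (unsigned >> 16) & 0x7FF,  # bits 16-26: facility
        "code": unsigned & 0xFFFF,             # bits 0-15: error code
    }


# Value taken from the error message quoted in this article.
info = decode_hresult(-2147024882)
print(info)
```

The result is 0x8007000E, which Windows documents as E_OUTOFMEMORY: facility 7 (Win32) wrapping Win32 error code 14. This suggests that memory runs out while the spreadsheet is being handed to Python through COM, which would be consistent with the CSV workaround avoiding the error, although the article does not state the root cause.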

Issue/Introduction

The Statistica Python node function "ActiveDataSet" raises _ctypes.COMError 'Not enough storage is available to complete this operation' when reading a large dataset into a Pandas data frame.

Environment

Windows

Resolution

For a large input dataset, the following workaround avoids the issue:
1. First, export the input spreadsheet data to a CSV file, with a comma as the separator, using the Statistica "Export Data" node.

2. Then, in the Python node, read the CSV file from the local directory into a Pandas data frame with the "pandas.read_csv" function. For example:

import pandas as pd
# General form: data = pd.read_csv("local CSV file path")
data = pd.read_csv("C:/Users/lizliu.APACANALYTICS/Desktop/AllColumn4Test.csv")
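
The read step can also be wrapped in a small helper that fails with a clear message when the export file is missing. This is only a sketch: the function name load_exported_csv and the demo file are hypothetical, and the demo writes a tiny stand-in CSV so that the snippet runs outside Statistica.

```python
import os
import tempfile

import pandas as pd


def load_exported_csv(csv_path):
    """Read a comma-separated export into a pandas data frame."""
    # Fail early with a clear message if the export step did not produce the file.
    if not os.path.isfile(csv_path):
        raise FileNotFoundError("Exported CSV not found: " + csv_path)
    return pd.read_csv(csv_path, sep=",")


# Stand-in for the file written by the "Export Data" node (hypothetical content).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_export.csv")
with open(demo_path, "w") as handle:
    handle.write("x1,x2,y\n1.0,2.0,0\n3.0,4.0,1\n")

demo = load_exported_csv(demo_path)
print(demo.shape)  # rows and columns actually loaded
```

In the Python node, the path written by the "Export Data" node would be passed in place of demo_path. Comparing the printed shape with the dimensions of the source spreadsheet is a quick way to confirm that the whole file was read.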
