Statistica Python node function "ActiveDataSet" gives _ctypes.COMError when reading large input data to Pandas data frame

Article ID: KB0077055

Products	Versions
Spotfire Statistica	13.5

Description

In the Statistica Python scripting node, the "ActiveDataSet" function can be used to read an upstream input spreadsheet into a Pandas data frame, for example "trainDS = ActiveDataSet['trainData']" or "trainDS = ActiveDataSet[0]".

When a large input spreadsheet (e.g. 30000v * 2000c) is read into a Pandas data frame with the ActiveDataSet function, the Python node execution fails with the following error message:

_ctypes.COMError: (-2147024882,'Not enough storage is available to complete this operation.')

This error occurs with Python versions 3.5.2, 3.6.3, 3.7.3 and 3.7.4.
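
The number in the error message is a signed 32-bit HRESULT. As a minimal sketch (the helper names below are illustrative and not part of Statistica or ctypes), it can be decoded to show that it corresponds to 0x8007000E, whose low 16 bits hold Win32 error code 14 (ERROR_OUTOFMEMORY). This suggests the failure is a memory allocation problem during the data transfer rather than a problem in the script itself.

```python
def hresult_to_unsigned(code: int) -> int:
    """Convert a signed 32-bit HRESULT (as shown in COMError) to its unsigned form."""
    return code & 0xFFFFFFFF


def win32_error_code(code: int) -> int:
    """Return the low 16 bits of the HRESULT, which carry the Win32 error code."""
    return code & 0xFFFF


hresult = -2147024882  # value taken from the error message above
print(hex(hresult_to_unsigned(hresult)))  # 0x8007000e
print(win32_error_code(hresult))          # 14
```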

Environment

Windows

Resolution

For a large input dataset, the following workaround avoids the issue:
1. Export the input spreadsheet data to a CSV file, with comma as the separator, using the Statistica "Export Data" node.

2. Read the CSV file from the local directory into a Pandas data frame with the "pandas.read_csv" function in the Python node. For example:

import pandas as pd
# General form: data = pd.read_csv("local CSV file path"). For example:
data = pd.read_csv("C:/Users/lizliu.APACANALYTICS/Desktop/AllColumn4Test.csv")
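
If the exported file is itself very large, one possible variation is to read it in row chunks and combine them afterwards. The sketch below assumes a comma-separated export as described in step 1. The function name "read_exported_csv" and the default chunk size are illustrative choices, not part of Statistica or the original workaround, and whether chunking actually lowers peak memory depends on the data.

```python
import pandas as pd


def read_exported_csv(csv_path, chunksize=500):
    """Read a comma-separated export in row chunks and return a single DataFrame.

    chunksize is the number of rows parsed per chunk (an assumed default).
    """
    chunks = pd.read_csv(csv_path, sep=",", chunksize=chunksize)
    # Materialise the chunk iterator, then stitch the pieces together.
    return pd.concat(list(chunks), ignore_index=True)
```

In the Python node, this would be called with the path of the exported file, for example data = read_exported_csv("local CSV file path").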

Issue/Introduction

The Statistica Python node function "ActiveDataSet" gives _ctypes.COMError 'Not enough storage is available to complete this operation' when reading a large dataset into a Pandas data frame.
