How to fetch public data from an Amazon S3 bucket into Statistica?

Article ID: KB0077833


Products: Spotfire Statistica
Versions: -

Description

How to fetch a public dataset from an Amazon S3 bucket into Statistica?

Issue/Introduction

How to fetch public data from an Amazon S3 bucket into Statistica?

Resolution

Statistica ships with an example Python node that connects to data in public Amazon S3 buckets. To reach this node, click New >> Workspace, choose the "Get Data" workspace template, and then use the Amazon S3 Data node.


Alternatively, open a new workspace with a blank template and click Cancel when prompted to select a data source. Open the node browser, select a folder to import the node into, choose "Import New Node", and browse to [Statistica Installation Folder]\DataMiner\S3DataImport.DMI. This imports the node into the node browser permanently: the node remains available in the chosen folder, so it only has to be imported once per user, per machine. Double-click the node in the node browser to add it to the workspace.


Click the gear icon in the top-left corner of the node to edit its parameters, and add the public URL of the Amazon S3 dataset. In this example, we use the sample dataset https://s3.amazonaws.com/aml-sample-data/banking.csv.
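Outside of Statistica, the node's retrieval step can be approximated with a short standard-library Python sketch: an HTTP GET against the public object URL, followed by CSV parsing. The function names and the split between fetching and parsing are illustrative, not the node's actual code.

```python
import csv
import io
import urllib.request


def parse_csv(text):
    """Parse CSV text into a list of dict rows, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_public_s3_csv(url):
    """Download a CSV object from a public S3 URL and parse it."""
    with urllib.request.urlopen(url) as resp:
        return parse_csv(resp.read().decode("utf-8"))


# Example (requires network access):
# rows = fetch_public_s3_csv("https://s3.amazonaws.com/aml-sample-data/banking.csv")
```

Public S3 objects need no credentials, which is why a plain HTTP request is enough here.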


The node turns white after it retrieves the data. The results of the run can be viewed in the reporting document, and the spreadsheet with the data can be opened by clicking the spreadsheet icon in the bottom-right corner of the node. The data can then be used downstream in the workspace with other analytic nodes in Statistica to perform the analysis of your choice.
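In Statistica the downstream analysis is done with analytic nodes, but as a rough stand-in, a simple column statistic over parsed CSV rows might look like the sketch below. The column names ("age", "balance") are hypothetical sample data, not taken from the actual banking dataset.

```python
import csv
import io
from statistics import mean


def column_mean(rows, column):
    """Mean of a numeric column in a list of dict rows (as from csv.DictReader)."""
    return mean(float(r[column]) for r in rows)


# Hypothetical sample shaped like a small banking dataset:
sample = list(csv.DictReader(io.StringIO("age,balance\n30,100\n40,300\n")))
print(column_mean(sample, "balance"))  # 200.0
```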

NOTE: The node uses IronPython code. It can be viewed by clicking the node's Parameters (gear icon in the top-left corner of the node) and then clicking Code, or by right-clicking the node and selecting Edit code. If the node needs to access private Amazon S3 data, the code and the parameters can be modified accordingly. Alternatively, private S3 data may be pushed into Amazon Redshift and queried there. Reference articles are linked below.
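For private buckets, one common approach outside of the node is the boto3 SDK's `get_object` call. The sketch below is an assumption-laden illustration, not the node's code: it presumes boto3 is installed and AWS credentials are configured, and boto3 does not run under IronPython, so the node's own modification would need a different mechanism. The client is passed in as a parameter so the parsing logic stays independent of the SDK.

```python
import csv
import io


def read_s3_csv(s3_client, bucket, key):
    """Fetch a private CSV object via an S3 client and parse it into dict rows.

    s3_client is expected to expose boto3's get_object(Bucket=..., Key=...)
    interface; in real use, pass boto3.client("s3") with credentials configured.
    """
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    text = obj["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Real use (hypothetical bucket/key names; requires boto3 and AWS credentials):
#   import boto3
#   rows = read_s3_csv(boto3.client("s3"), "my-private-bucket", "banking.csv")
```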


Additional Information

Amazon S3 to Redshift
How to query data from Amazon Redshift?
Python-AWS code examples