How to connect to HDFS Server and fetch data in Statistica

Article ID: KB0082703

Updated On:

Products: Spotfire Statistica
Versions: 13.0 and later

Description

This article details how to connect to an HDFS server, import data from HDFS into Statistica, and export data from Statistica to HDFS.

PREREQUISITES:
The URL of the HDFS server and the credentials to access it
Appropriate permissions in Enterprise Manager

Issue/Introduction

This article details how to connect to an HDFS server, import data from HDFS into TIBCO Statistica, and export data from Statistica to HDFS.

Environment

Windows

Resolution

1. Launch Enterprise Manager as a user who has the effective permission Database Admin (EXTDB_ADM).
2. Right-click "Hadoop Distributed File Systems" and select New HDFS Server.
3. Give the connection a descriptive name and enter the username for the HDFS server.
4. Click Test connection to verify that the server is an HDFS server.
5. Define access permissions for the connection, and click Commit to save the HDFS server.
6. Launch Statistica and open a new workspace.
7. Use the HDFS Import Text node to read data from HDFS, or the HDFS Export Text node to write data to HDFS.

Here is a simple example:

1. In a new workspace, insert the HDFS Import Text node, either by typing its name in Feature Finder or by selecting Data | Manage (section) | External Data | HDFS Import Text.
2. Click the cogwheel in the top-left corner of the node to open the node's parameters.
3. Define the import options, such as the delimiter used in the file, whether variable names are on the first line, and whether blank lines should be skipped.
4. Click File to open the HDFS Browser, select the HDFS server from the Servers list, and browse to the file you want to import.
5. Preview the file, and click OK.
6. Run the node by clicking the green button in the bottom-left corner of the node.
7. Optionally, add other analytic nodes as needed.
8. Insert the HDFS Export Text node, and open its parameters by clicking the cogwheel in the top-left corner of the node.
9. Select the server, click File, and choose the directory and file name for the output file.
10. Run the workspace by clicking Run All in the top corner of the workspace.
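The import options in step 3 (delimiter, variable names on the first line, skipping blank lines) and the file written by the HDFS Export Text node can be illustrated with Python's standard csv module. The sketch below works on text that has already been transferred from or to HDFS; it is not Statistica's parser, the function names are hypothetical, and the Var1, Var2, ... names used when the first line holds no variable names are an assumption for illustration.

```python
import csv
import io


def read_delimited(text, delimiter=",", names_on_first_line=True, skip_blank_lines=True):
    """Parse delimited text into (variable_names, rows) using the given options."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if skip_blank_lines:
        # csv.reader yields [] for an empty line; also drop whitespace-only lines.
        rows = [r for r in rows if any(field.strip() for field in r)]
    if names_on_first_line and rows:
        names, rows = rows[0], rows[1:]
    else:
        # Hypothetical placeholder names when the file has no header line.
        width = max((len(r) for r in rows), default=0)
        names = [f"Var{i + 1}" for i in range(width)]
    return names, rows


def write_delimited(names, rows, delimiter=","):
    """Serialize variable names plus rows back to delimited text for export."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `read_delimited("a;b\n\n1;2\n", delimiter=";")` returns the names `["a", "b"]` and the single row `["1", "2"]`, because the blank second line is skipped. Parsing through csv.reader rather than splitting on newlines keeps quoted fields that contain line breaks intact.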