How to retrieve data from Spark and use it for further Spark analyses?

Article ID: KB0080657

Products Versions
Spotfire Statistica 13.3.1 and later versions

Description

PREREQUISITES:
URL of the Spark Livy server installed on the network
Path of the file on the Spark Livy server to be retrieved
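
Before entering the URL in Statistica, it can help to confirm that the Livy server answers at that address. The following is a minimal sketch, not part of Statistica: it assumes the server exposes the standard Livy REST API (by default on port 8998), where GET /sessions returns a JSON object containing a "sessions" list. The host name in the usage note and the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def sessions_endpoint(base_url):
    """Return the Livy /sessions endpoint for a base URL (trailing slash tolerated)."""
    return base_url.rstrip("/") + "/sessions"


def livy_is_reachable(base_url, timeout=5):
    """Return True if GET <base_url>/sessions answers with a Livy-style JSON body."""
    try:
        with urllib.request.urlopen(sessions_endpoint(base_url), timeout=timeout) as response:
            body = json.load(response)
    except (urllib.error.URLError, ValueError, OSError):
        # Unreachable host, refused connection, malformed URL, or non-JSON reply.
        return False
    return isinstance(body, dict) and "sessions" in body
```

For example, livy_is_reachable("http://livy.example.com:8998") should return True when a Livy server is listening at that (hypothetical) address and False otherwise.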

Issue/Introduction

How to connect to and retrieve data from Spark Livy and use it for further Spark analyses?

Resolution

1. Launch Statistica and click File >> Options or, with an active spreadsheet open, Tools >> Options.

2. Click Server/Web, select Use custom Spark Livy Server, and update the URL to point to the Spark Livy server on the network. In Statistica 13.4 and later versions, you may also enter custom session configurations.

3. Open a workspace and insert the Spark Data node by typing Spark data in the Feature Finder or by selecting it from Node Browser >> Big Data Analytics >> Hadoop >> Spark.

4. Click the gear icon in the top left corner of the node to open the Spark Data Node Parameters, then enter the path to the file on the Livy server and the file type.
5. Click Options and uncheck "Requires input" if the data source is the Spark Livy server. Click OK.
 
6. Click the Run button at the bottom left of the Spark Data node to bring back the data. The Spark data is brought back into the workspace via a Spark session.
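
For readers who want a feel for what a Livy-backed Spark session involves, the sketch below builds the kind of requests the standard Livy REST API accepts: a POST to /sessions that creates a session (optionally with custom Spark settings under "conf"), and a statement containing PySpark code that loads a file. This illustrates the Livy API only; it is an assumption that Statistica's Spark Data node behaves similarly, and the helper names, file path, and configuration values are hypothetical.

```python
import json
import urllib.request


def session_payload(conf=None):
    """Body for POST /sessions: a PySpark session with optional Spark settings."""
    payload = {"kind": "pyspark"}
    if conf:
        payload["conf"] = dict(conf)
    return payload


def read_statement(path, file_type):
    """PySpark code that loads a file of the given type and shows its first rows.

    Assumes the Livy session predefines a SparkSession named `spark`.
    """
    return (
        f"df = spark.read.format({file_type!r}).load({path!r})\n"
        "df.show(5)"
    )


def post_json(url, payload, timeout=10):
    """POST a JSON payload and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Under these assumptions, a session would be created with post_json(base_url + "/sessions", session_payload({"spark.executor.memory": "2g"})). Once that session reports the state "idle", the file would be read by posting {"code": read_statement(path, file_type)} to /sessions/<id>/statements.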
Other Spark nodes can be connected downstream for further analyses. Workspace examples for Spark nodes can be found in the Statistica Examples directory: click Open > Open Examples and select the workspaces directory.

Refer to the following example workspaces that use Spark nodes:
  • Example_SparkFeatureSelection.sdm
  • Example_SparkModelComparison.sdm
  • Example_SparkRegression.sdm
  • Example_SparkTrees.sdm
 
 

Additional Information

http://support.tibco.com/s/article/Livy-Versions-supported-with-Spark-Node-in-Statistica-13-3