Article ID: KB0080657
Description
PREREQUISITES:
- URL of the Livy Spark server installed in the network
- Path of the file in Livy Spark to be retrieved
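Before entering the URL in Statistica, it can help to confirm that the Livy server answers at that address. The following is a minimal sketch, not part of Statistica: it assumes the server exposes the standard Livy REST API (where `GET /sessions` lists sessions), and the host name `livy.example.com` and port 8998 (Livy's default) are placeholders for your own server.

```python
import urllib.request
from urllib.parse import urlparse


def sessions_endpoint(livy_url: str) -> str:
    """Return the Livy REST endpoint that lists sessions for a base URL."""
    parsed = urlparse(livy_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a usable Livy URL: {livy_url!r}")
    return livy_url.rstrip("/") + "/sessions"


def livy_is_reachable(livy_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /sessions answers with HTTP 200."""
    try:
        with urllib.request.urlopen(sessions_endpoint(livy_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    # Hypothetical address; replace with the URL of your Livy server.
    print(livy_is_reachable("http://livy.example.com:8998"))
```

If the check returns False, the URL is likely wrong or the server is not reachable from this machine, and the same URL would presumably fail in Statistica as well.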
Resolution
1. Launch Statistica and click File >> Options or, with an active spreadsheet open, Tools >> Options.
2. Click Server/Web, select "Use custom Spark Livy Server", and update the URL to point to the Spark Livy server on the network. In Statistica 13.4 and later versions, you may also enter custom session configurations.
3. Open a workspace and insert the Spark Data node, either by typing "Spark data" in the Feature Finder or from Node Browser >> Big Data Analytics >> Hadoop >> Spark.
4. Click the gear icon in the top left corner of the node to open the Spark Data Node Parameters, then type the path to the file on the Livy server and the file type.
5. Click Options and uncheck "Requires input" if the data source is the Spark Livy server. Click OK.
6. Click the Run button in the bottom left of the Spark Data node. The data is brought back into the workspace via a Spark session.
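To illustrate what retrieving a file through a Livy session involves, the sketch below builds the two request bodies that the standard Livy REST API expects: one for `POST /sessions` (optionally carrying custom session configurations) and one for `POST /sessions/{id}/statements`. This is an assumption-based illustration, not a description of the requests Statistica itself sends. The helper names, the file path `/data/sample.csv`, the `header` option, and the example setting `spark.executor.memory` are all chosen for illustration.

```python
import json
import urllib.request


def session_payload(conf=None):
    """Body for POST /sessions: an interactive PySpark session."""
    body = {"kind": "pyspark"}
    if conf:
        # Custom Spark settings for the session, e.g. executor memory.
        body["conf"] = dict(conf)
    return body


def read_statement(path, file_type="csv"):
    """Body for POST /sessions/{id}/statements: load a file, show 5 rows."""
    code = (
        f"spark.read.format({json.dumps(file_type)})"
        '.option("header", "true")'
        f".load({json.dumps(path)}).show(5)"
    )
    return {"code": code}


def post_json(url, body, timeout=10.0):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the request bodies only; no request is sent here.
    print(session_payload({"spark.executor.memory": "2g"}))
    print(read_statement("/data/sample.csv", "csv"))
```

When sending these with `post_json`, note that a new Livy session starts in the `starting` state and accepts statements only once it reports `idle`, so a real client would poll the session before posting the statement.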
Other Spark nodes can be connected downstream for further analyses. Workspace examples for Spark nodes can be found in the Statistica Examples directory: click Open > Open Examples and select the workspaces directory.
Refer to the following example workspaces that use Spark nodes:
- Example_SparkFeatureSelection.sdm
- Example_SparkModelComparison.sdm
- Example_SparkRegression.sdm
- Example_SparkTrees.sdm
Issue/Introduction
How to connect to and retrieve data from Spark Livy and use it for further Spark analyses?