1. Launch Statistica and click File >> Options or, with a spreadsheet open, Tools >> Options.
2. Click Server/Web, select Use custom Spark Livy Server, and update the URL to point to the Spark Livy server on your network. In Statistica 13.4 and later, you can also enter custom session configurations.
3. Open a workspace and insert the Spark Data node, either by typing Spark data in the Feature Finder or from Node Browser >> Big Data Analytics >> Hadoop >> Spark.
4. Click the gear icon in the top-left corner of the node to open Spark Data Node Parameters, then enter the path to the file on the Livy server and the file type.
5. Click Options and uncheck "Requires input" if the data source is on the Spark Livy server. Click OK.
6. Click the Run button in the bottom-left corner of the Spark Data node. The data is brought back into the workspace via a Spark session.
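To check the Livy URL, session configuration, and file path outside Statistica, the same kind of request can be sent directly to Livy's REST API. The following is a minimal sketch and not a description of what Statistica sends internally. The server URL, the file path, and the helper names (`session_payload`, `read_statement`, `post_json`) are hypothetical, and the sketch assumes a Livy server that accepts `pyspark` sessions and exposes a `spark` session object inside them.

```python
import json
import urllib.request

# Hypothetical address; replace with the URL entered under Server/Web in step 2.
LIVY_URL = "http://livy.example.com:8998"


def session_payload(conf=None):
    """Build the JSON body for POST /sessions (creates a PySpark session)."""
    body = {"kind": "pyspark"}
    if conf:
        # Optional Spark settings, comparable to custom session configurations.
        body["conf"] = dict(conf)
    return body


def read_statement(path, file_type):
    """Build the JSON body for POST /sessions/{id}/statements.

    The statement loads the file from the given path and prints its schema.
    Assumes the Livy session provides a SparkSession named `spark`.
    """
    code = (
        f"df = spark.read.format({file_type!r}).load({path!r})\n"
        "df.printSchema()"
    )
    return {"code": code}


def post_json(url, body):
    """POST a JSON body to the given URL and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example calls (not executed here because they need a reachable Livy server):
#   session = post_json(f"{LIVY_URL}/sessions",
#                       session_payload({"spark.executor.memory": "2g"}))
#   post_json(f"{LIVY_URL}/sessions/{session['id']}/statements",
#             read_statement("/data/example.csv", "csv"))
```

A new Livy session has to reach the `idle` state before it accepts statements, so a real check would poll `GET /sessions/{id}` between the two calls. If session creation fails here, the Spark Data node will most likely fail with the same URL and configuration.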
Other Spark nodes can be connected downstream for further analysis. Example workspaces for Spark nodes are in the Statistica Examples directory: click Open > Open Examples and select the Workspaces directory.
Refer to the following example workspaces that use Spark nodes:
- Example_SparkFeatureSelection.sdm
- Example_SparkModelComparison.sdm
- Example_SparkRegression.sdm
- Example_SparkTrees.sdm