Configure and Enable Spark Job History

Article ID: KB0082629

Products Versions
Spotfire Data Science 6.2+

Description

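This article explains how to configure and enable Spark job history in Spotfire Data Science so that Spark jobs are captured by the Spark History Server.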

Resolution

Note: This applies to Spotfire Data Science versions 6.2 and later.

For CDH, you must have the Spark History Server Service installed on your cluster.  

You can find the values for these parameters in your spark-defaults.conf file. By default, it is located at /etc/spark/conf/spark-defaults.conf

Add these parameters to Data Source -> additional parameters:

spark.yarn.historyServer.address=10.10.3.217:18088
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://cdhsn.alpinedata.com:8020/user/spark/applicationHistory
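As a convenience, the three values can be pulled out of spark-defaults.conf with a short script rather than copied by hand. The following is a minimal Python sketch, not part of the product: the function names are hypothetical, the sample text reuses the example values from this article, and it assumes the file uses the usual "key value" or "key=value" line format.

```python
import re

# Keys this article asks you to add to the data source's additional parameters.
WANTED_KEYS = (
    "spark.yarn.historyServer.address",
    "spark.eventLog.enabled",
    "spark.eventLog.dir",
)

# A key, then whitespace and/or a single '=' or ':' separator, then the value.
_LINE = re.compile(r"^([^\s=:]+)\s*[=:]?\s*(.*)$")


def parse_spark_defaults(text):
    """Return a dict of settings from spark-defaults.conf style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def history_parameters(text):
    """Return the history-related settings as key=value lines."""
    settings = parse_spark_defaults(text)
    return [f"{key}={settings[key]}" for key in WANTED_KEYS if key in settings]


# Sample content only; it reuses the example values from this article.
SAMPLE = """\
# spark-defaults.conf (sample)
spark.eventLog.enabled true
spark.eventLog.dir hdfs://cdhsn.alpinedata.com:8020/user/spark/applicationHistory
spark.yarn.historyServer.address 10.10.3.217:18088
"""

for line in history_parameters(SAMPLE):
    print(line)
```

To run it against a real cluster file, pass the file's contents instead of SAMPLE, for example history_parameters(open("/etc/spark/conf/spark-defaults.conf").read()).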

To access the Spark History Server UI, use port 18088 (the default port), for example:
cdhsn.alpinedata.com:18088

Note: Replace the IP address and hostname shown above with the values for your own cluster.
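Besides the web UI, the Spark History Server exposes a REST API under /api/v1 on the same port, which can be used to confirm that the server is reachable and that jobs are being recorded. The following is a minimal Python sketch; the function names are hypothetical, and it assumes the server answers over plain HTTP without authentication, which may not hold on a secured (for example, Kerberized or TLS-enabled) cluster.

```python
import json
from urllib.request import urlopen


def applications_url(host, port=18088):
    """Build the History Server REST URL that lists recorded applications."""
    return f"http://{host}:{port}/api/v1/applications"


def list_applications(host, port=18088, timeout=10):
    """Fetch the recorded applications as a list of dicts (needs network access)."""
    with urlopen(applications_url(host, port), timeout=timeout) as response:
        return json.load(response)
```

For example, list_applications("cdhsn.alpinedata.com") should return one entry per recorded Spark application, each with fields such as "id" and "name". An empty list suggests that no job has written event logs yet.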

To allow the Spark History Server to capture a Spark job, disable Spark autotuning at the operator level and add the following parameter:

spark.eventLog.enabled=true