DC/OS Apache Spark includes the Spark History Server. Because the history server requires HDFS, you must explicitly enable it.
## Installing HDFS
1. Install HDFS:

   ```bash
   dcos package install hdfs
   ```

2. Create a history directory in HDFS (the default is `/history`). SSH into your cluster and run:

   ```bash
   docker run -it mesosphere/hdfs-client:1.0.0-2.6.0 bash
   ./bin/hdfs dfs -mkdir /history
   ```

3. Create `spark-history-options.json`:

   ```json
   {
     "service": {
       "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
     }
   }
   ```
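If you script the setup, the options file can be generated and sanity-checked from the shell before it is passed to the installer. A minimal sketch, using the same filename and endpoint URL as above:

```shell
# Write the history server options file described in the steps above.
cat > spark-history-options.json <<'EOF'
{
  "service": {
    "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
  }
}
EOF

# Sanity-check that the file parses as JSON before installing.
python3 -m json.tool spark-history-options.json > /dev/null && echo "options file OK"
```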
## Installing the Spark history server
1. Install the Spark history server:

   ```bash
   dcos package install spark-history --options=spark-history-options.json
   ```

2. Create `spark-dispatcher-options.json`:

   ```json
   {
     "service": {
       "spark-history-server-url": "http://<dcos_url>/service/spark-history"
     },
     "hdfs": {
       "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
     }
   }
   ```

3. Install the Spark dispatcher:

   ```bash
   dcos package install spark --options=spark-dispatcher-options.json
   ```

4. Run jobs with the event log enabled:

   ```bash
   dcos spark run --submit-args="--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hdfs/history ... --class MySampleClass http://external.website/mysparkapp.jar"
   ```
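To avoid retyping the event-log flags on every submission, they can be wrapped in a small helper that assembles the `--submit-args` string. This is only a sketch: the function name is hypothetical, and the log directory matches the `/history` directory created during HDFS setup.

```shell
# Hypothetical helper: build a --submit-args string with event logging enabled.
# The main class and application jar URL are supplied by the caller.
build_submit_args() {
  local main_class="$1" app_jar="$2"
  local event_log_dir="hdfs://hdfs/history"   # directory created during HDFS setup
  echo "--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=${event_log_dir} --class ${main_class} ${app_jar}"
}

# Usage: pass the result to the dispatcher, e.g.
# dcos spark run --submit-args="$(build_submit_args MySampleClass http://external.website/mysparkapp.jar)"
```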
## Confirm history server installation
View your job in the dispatcher at `http://<dcos_url>/service/spark/`. The information displayed includes a link to the history server entry for that job.