This page explains how to install the DC/OS Apache Spark service.
Prerequisites
- DC/OS and the DC/OS CLI installed, with a minimum of three agent nodes, each with 8 GB of memory and 10 GB of disk space available.
- Depending on your security mode, Spark requires service authentication for access to DC/OS. See Provisioning a service account for more information.

| Security mode | Service account |
|---------------|-----------------|
| Disabled      | Not available   |
| Permissive    | Optional        |
| Strict        | Required        |
- Install the Spark package. This may take a few minutes. This step installs the Spark DC/OS service, Spark CLI, dispatcher, and, optionally, the history server. See the History Server section for information about how to install the history server.
dcos package install spark
Expected output:
Installing Marathon app for package [spark] version [2.12.0-3.0.1]
Installing CLI subcommand for package [spark] version [2.12.0-3.0.1]
New command available: dcos spark
DC/OS Spark is being installed!

Documentation: /mesosphere/dcos/services/spark/
Issues: https://docs.mesosphere.com/support/
- Run the sample SparkPi jar for DC/OS. You can view the example source here.
- Use the following command to run a Spark job that calculates the value of Pi.
dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.12-3.0.1.jar 30"
Expected output:
Using image 'mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2' for the driver and the executors (from dispatcher: container.docker.image).
To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
Run job succeeded. Submission id: driver-20200729203326-0001
- View the standard output from your job:
dcos spark log --completed driver-20200729203326-0001
Expected output should contain:
Pi is roughly 3.1424917141639046
- Run a Python SparkPi job. You can view the example source here.
- Use the following command to run a Python Spark job that calculates the value of Pi.
dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"
Expected output:
Parsing application as Python job
Using image 'mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2' for the driver and the executors (from dispatcher: container.docker.image).
To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
Run job succeeded. Submission id: driver-20200729203704-0002
- View the standard output from your job:
dcos spark log --completed driver-20200729203704-0002
Expected output should contain:
Pi is roughly 3.139508
- Run an R job. You can view the example source here.
- Use the following command to run an R job.
dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/dataframe.R"
Expected output:
Parsing application as R job
Using image 'mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2' for the driver and the executors (from dispatcher: container.docker.image).
To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
Run job succeeded. Submission id: driver-20200729204249-0003
- Use the following command to view the standard output from your job.
dcos spark log --completed --lines_count=10 driver-20200729204249-0003
Expected output:
In Filter(nzchar, unlist(strsplit(input, ",|\\s"))) :
  bytecode version mismatch; using eval
root
 |-- name: string (nullable = true)
 |-- age: double (nullable = true)
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
    name
1 Justin
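Both SparkPi examples estimate Pi by Monte Carlo sampling. Assuming the hosted jar and pi.py match the upstream Apache Spark examples (this page does not confirm that), the trailing argument 30 is the number of partitions, each partition contributes 100,000 random points drawn from the square [-1, 1] x [-1, 1], and the fraction of points that land inside the unit circle approximates Pi/4. The estimator and its typical sampling error can be sketched as:

```latex
\begin{aligned}
\hat{\pi} &= \frac{4}{N} \sum_{i=1}^{N} \mathbf{1}\left[x_i^2 + y_i^2 \le 1\right],
  \qquad (x_i, y_i) \sim \mathrm{Uniform}\left([-1, 1]^2\right) \\
N &\approx 100\,000 \times 30 = 3 \times 10^{6} \\
\mathrm{SE}(\hat{\pi}) &= 4\sqrt{\frac{(\pi/4)(1 - \pi/4)}{N}} \approx 9.5 \times 10^{-4}
\end{aligned}
```

With an argument of 30, results within a few thousandths of Pi, such as the 3.1424917141639046 and 3.139508 shown above, are expected, and the value changes on every run. Raising the argument increases N and shrinks the typical error roughly in proportion to 1/sqrt(N). The Python result has only six decimal places because the upstream pi.py formats it with %f.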
Next steps
- To view the status of your job, run the dcos spark webui command, then visit the Spark cluster dispatcher UI at http://<dcos-url>/service/spark/.
- To view the logs, see the documentation for Mesosphere DC/OS monitoring.
- To view details about your Spark job, run the dcos task log --completed <submissionId> or dcos spark log --completed <submissionId> command.
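The log commands need the submission ID that dcos spark run prints, so in a script you may prefer to capture it rather than copy it by hand. The following is a minimal Bash sketch, not part of the DC/OS Spark CLI: the function names submit_spark_job and extract_submission_id are hypothetical, the parsing assumes the "Submission id: <id>" format shown in the expected output on this page, and it assumes the CLI prints that message on standard output. It uses only the dcos spark run and dcos spark log commands documented here.

```shell
#!/usr/bin/env bash
# Minimal sketch: capture the submission ID printed by `dcos spark run` so it
# can be passed to `dcos spark log --completed`. The function names are
# hypothetical helpers, not part of the DC/OS Spark CLI.

# Read `dcos spark run` output on stdin and print the first submission ID,
# for example driver-20200729203326-0001. Prints nothing if no ID is found.
extract_submission_id() {
  sed -n 's/.*Submission id: \([^[:space:]]*\).*/\1/p' | head -n 1
}

# Submit a job with the given --submit-args value. The CLI output is echoed to
# stderr; only the submission ID is written to stdout.
# Assumption: `dcos spark run` writes its "Submission id:" line to stdout.
submit_spark_job() {
  local output id
  output="$(dcos spark run --submit-args="$1")" || return 1
  printf '%s\n' "$output" >&2
  id="$(printf '%s\n' "$output" | extract_submission_id)"
  if [ -z "$id" ]; then
    echo "submit_spark_job: no submission ID found in CLI output" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Example usage (requires a DC/OS CLI attached to a cluster running Spark):
#   id="$(submit_spark_job 'https://downloads.mesosphere.com/spark/examples/pi.py 30')"
#   # After the driver has finished:
#   dcos spark log --completed "$id"
```

Writing only the ID to stdout lets the helper be used inside command substitution while the CLI's own messages still appear on the terminal. The log step is left separate because --completed only returns output once the driver has finished.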