This page explains how to install the DC/OS Apache Spark service.
Prerequisites
- DC/OS and DC/OS CLI installed with a minimum of three agent nodes, with 8 GB of memory and 10 GB of disk space
- Depending on your security mode, Spark requires service authentication for access to DC/OS. See Provisioning a service account for more information.
Security mode | Service account |
---|---|
Disabled | Not available |
Permissive | Optional |
Strict | Required |
- Install the Spark package. This may take a few minutes. This step installs the Spark DC/OS service, the Spark CLI, the dispatcher, and, optionally, the history server. See the History Server section for information about installing the history server.
dcos package install spark
Expected output:
Installing Marathon app for package [spark] version [2.6.0-2.3.2]
Installing CLI subcommand for package [spark] version [2.6.0-2.3.2]
New command available: dcos spark
DC/OS Spark is being installed!
Documentation: /mesosphere/dcos/services/spark/
Issues: https://docs.mesosphere.com/support/
<p class="message--note"><strong>NOTES: </strong> Type `dcos spark` to view the Spark CLI options. To install only the Spark CLI, run `dcos package install spark --cli`.</p>
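If you script the installation (for example, in a CI pipeline), the `--yes` flag of `dcos package install` skips the interactive confirmation prompt. The following is a minimal sketch, assuming the DC/OS CLI is already installed and attached to your cluster; the function names are hypothetical helpers, not part of the Spark CLI.

```shell
#!/usr/bin/env bash
# Minimal sketch of a non-interactive Spark installation.
# Assumes the DC/OS CLI is installed and attached to the target cluster.
# The function names below are hypothetical helpers, not part of the Spark CLI.

# Install the Spark service (dispatcher) and the Spark CLI subcommand.
# --yes answers the package confirmation prompt automatically.
install_spark_service() {
  dcos package install spark --yes
}

# Install only the Spark CLI subcommand, for example on a second workstation
# that submits jobs to a cluster where the Spark service is already running.
install_spark_cli_only() {
  dcos package install spark --cli --yes
}
```

Run `install_spark_service` once per cluster; `install_spark_cli_only` is enough on any additional machine that only needs to submit jobs.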
- Run the sample SparkPi jar for DC/OS. You can view the example source here.
  - Use the following command to run a Spark job that calculates the value of Pi:
dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 30"
Expected output:
2017/08/24 15:42:07 Using docker image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for drivers
2017/08/24 15:42:07 Pulling image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for executors, by default. To bypass set spark.mesos.executor.docker.forcePullImage=false
2017/08/24 15:42:07 Setting DCOS_SPACE to /spark
Run job succeeded. Submission id: driver-20170824224209-0001
  - View the standard output from your job:
dcos spark log driver-20170824224209-0001
Expected output:
Pi is roughly 3.141853333333333
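The submission ID printed by `dcos spark run` is the value you pass to `dcos spark log`. When you script these two steps, you can extract the ID from the `Submission id:` line of the run output. The following is a minimal sketch; it assumes the output format shown in the expected output above, and the function names are hypothetical helpers, not part of the Spark CLI.

```shell
#!/usr/bin/env bash
# Minimal sketch: submit a Spark job, then print its driver log.
# Assumes `dcos spark run` prints a line such as
#   Run job succeeded. Submission id: driver-20170824224209-0001
# The function names are hypothetical helpers, not part of the Spark CLI.

# Read `dcos spark run` output on stdin and print the submission ID, if any.
extract_submission_id() {
  sed -n 's/.*Submission id: \(driver-[0-9][0-9-]*\).*/\1/p'
}

# Submit a job with the given --submit-args value and print its driver log.
run_and_show_log() {
  local output id
  output="$(dcos spark run --submit-args="$1")" || return 1
  printf '%s\n' "$output"
  id="$(printf '%s\n' "$output" | extract_submission_id)"
  if [ -z "$id" ]; then
    echo "could not find a submission ID in the run output" >&2
    return 1
  fi
  dcos spark log "$id"
}
```

For example, `run_and_show_log "--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 30"` submits the SparkPi job and then prints its output.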
- Run the Python version of SparkPi. You can view the example source here.
  - Use the following command to run a Python Spark job that calculates the value of Pi:
dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"
Expected output:
2017/08/24 15:44:20 Parsing application as Python job
2017/08/24 15:44:23 Using docker image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for drivers
2017/08/24 15:44:23 Pulling image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for executors, by default. To bypass set spark.mesos.executor.docker.forcePullImage=false
2017/08/24 15:44:23 Setting DCOS_SPACE to /spark
Run job succeeded. Submission id: driver-20170824224423-0002
  - View the standard output from your job:
dcos task log --completed driver-20170824224423-0002
Expected output:
Pi is roughly 3.142715
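The run output above suggests setting `spark.mesos.executor.docker.forcePullImage=false` to skip pulling the executor image. Because `--submit-args` carries `spark-submit` arguments, one way to do that is to add a `--conf` option before the application URL. The following is a minimal sketch; the function name, and making the application argument (`30` in the command above) a parameter, are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Minimal sketch: run the Python SparkPi example without forcing a pull of
# the executor Docker image, as suggested by the run output.
# The function name is a hypothetical helper, not part of the Spark CLI.

PI_PY_URL="https://downloads.mesosphere.com/spark/examples/pi.py"

# Usage: run_python_pi [APP_ARG]   (APP_ARG defaults to 30, as in the example)
run_python_pi() {
  local app_arg="${1:-30}"
  # spark-submit options such as --conf must come before the application URL;
  # anything after the URL is passed to pi.py itself.
  dcos spark run --submit-args="--conf spark.mesos.executor.docker.forcePullImage=false ${PI_PY_URL} ${app_arg}"
}
```

Skipping the pull only helps when the image is already present on the agents; otherwise Docker still has to fetch it.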
- Run an R job. You can view the example source here.
  - Use the following command to run an R job:
dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/dataframe.R"
Expected output:
2017/08/24 15:45:21 Parsing application as R job
2017/08/24 15:45:23 Using docker image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for drivers
2017/08/24 15:45:23 Pulling image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for executors, by default. To bypass set spark.mesos.executor.docker.forcePullImage=false
2017/08/24 15:45:23 Setting DCOS_SPACE to /spark
Run job succeeded. Submission id: driver-20170824224524-0003
  - Use the following command to view the standard output from your job:
dcos spark log --lines_count=10 driver-20170824224524-0003
Expected output:
In Filter(nzchar, unlist(strsplit(input, ",|\\s"))) :
  bytecode version mismatch; using eval
root
 |-- name: string (nullable = true)
 |-- age: double (nullable = true)
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
    name
1 Justin
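The `--lines_count` option controls how many lines of the driver's standard output `dcos spark log` prints. The following is a minimal sketch of a wrapper that defaults to 10 lines, as in the command above, and rejects arguments that do not look like the driver submission IDs shown on this page; the function name and both input checks are assumptions, not documented CLI behavior.

```shell
#!/usr/bin/env bash
# Minimal sketch: print a limited number of lines of a driver's stdout.
# The function name and the "driver-" prefix check are assumptions based on
# the submission IDs shown on this page, not documented CLI behavior.

# Usage: show_driver_log SUBMISSION_ID [LINES]   (LINES defaults to 10)
show_driver_log() {
  local id="$1" lines="${2:-10}"
  case "$id" in
    driver-*) ;;
    *)
      echo "expected a submission ID such as driver-20170824224524-0003, got: '$id'" >&2
      return 2
      ;;
  esac
  case "$lines" in
    ''|*[!0-9]*)
      echo "LINES must be a whole number, got: '$lines'" >&2
      return 2
      ;;
  esac
  dcos spark log --lines_count="$lines" "$id"
}
```

For example, `show_driver_log driver-20170824224524-0003 25` would request 25 lines instead of 10.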
Next steps
- To view the status of your job, run the `dcos spark webui` command, then visit the Spark cluster dispatcher UI at `http://<dcos-url>/service/spark/`.
- To view the logs, see the documentation for Mesosphere DC/OS monitoring.
- To view details about your Spark job, run the `dcos task log --completed <submissionId>` command.
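The dispatcher UI address is your cluster URL followed by `/service/spark/`. To build that address in a script, `dcos config show core.dcos_url` prints the cluster URL the CLI is attached to. The following is a minimal sketch, assuming the default service name `spark`; the optional argument for a different service name assumes the UI is then served under `/service/<name>/`, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the Spark dispatcher UI address for the attached cluster.
# Assumes the service is reachable at <dcos-url>/service/<service-name>/;
# the service name defaults to "spark". The function name is hypothetical.

dispatcher_ui_url() {
  local service="${1:-spark}" base
  base="$(dcos config show core.dcos_url)" || return 1
  # Drop a trailing slash so the result never contains "//service".
  printf '%s/service/%s/\n' "${base%/}" "$service"
}
```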