This page explains how to install the DC/OS Apache Spark service.
## Prerequisites

- DC/OS and the DC/OS CLI installed, with a minimum of three agent nodes, each with 8 GB of memory and 10 GB of disk space.
- Depending on your security mode, Spark requires service authentication for access to DC/OS. See Provisioning a service account for more information.

  | Security mode | Service account |
  |---------------|-----------------|
  | Disabled      | Not available   |
  | Permissive    | Optional        |
  | Strict        | Required        |
1. Install the Spark package. This may take a few minutes. This step installs the Spark DC/OS service, the Spark CLI, the dispatcher, and, optionally, the history server. See the History Server section for information about how to install the history server.

   ```shell
   dcos package install spark
   ```

   Expected output:

   ```
   Installing Marathon app for package [spark] version [2.11.0-2.4.6]
   Installing CLI subcommand for package [spark] version [2.11.0-2.4.6]
   New command available: dcos spark
   DC/OS Spark is being installed!
     Documentation: /mesosphere/dcos/services/spark/
     Issues: https://docs.mesosphere.com/support/
   ```
2. Run the sample SparkPi jar for DC/OS. You can view the example source here.

   1. Use the following command to run a Spark job that calculates the value of Pi:

      ```shell
      dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.4.0.jar 30"
      ```

      Expected output:

      ```
      Using image 'mesosphere/spark:2.11.0-2.4.6-scala-2.11-hadoop-2.9' for the driver and the executors (from dispatcher: container.docker.image).
      To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
      Run job succeeded. Submission id: driver-20200729203326-0001
      ```

   2. View the standard output from your job:

      ```shell
      dcos spark log --completed driver-20200729203326-0001
      ```

      The output should contain:

      ```
      Pi is roughly 3.1424917141639046
      ```
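SparkPi estimates Pi with a Monte Carlo method: it samples random points in the unit square and multiplies the fraction that lands inside the quarter circle by 4. A minimal local sketch of that computation in plain Python (no Spark required; the function name and sample count here are illustrative, not part of the example jar):

```python
import random

def estimate_pi(num_samples, seed=0):
    # Count samples that fall inside the quarter circle x^2 + y^2 <= 1.
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(num_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    # The quarter circle covers Pi/4 of the unit square, so scale by 4.
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))
```

The example jar distributes this sampling across Spark executors, which is why its estimate improves as you raise the sample count.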
3. Run a Python SparkPi job. You can view the example source here.

   1. Use the following command to run a Python Spark job that calculates the value of Pi:

      ```shell
      dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"
      ```

      Expected output:

      ```
      Parsing application as Python job
      Using image 'mesosphere/spark:2.11.0-2.4.6-scala-2.11-hadoop-2.9' for the driver and the executors (from dispatcher: container.docker.image).
      To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
      Run job succeeded. Submission id: driver-20200729203704-0002
      ```

   2. View the standard output from your job:

      ```shell
      dcos spark log --completed driver-20200729203704-0002
      ```

      The output should contain:

      ```
      Pi is roughly 3.139508
      ```
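The trailing `30` in both submit commands is the number of partitions (slices) the example spreads its sampling over: each partition draws its own samples, and the driver sums the partial counts. That split-and-sum pattern can be sketched in plain Python (names are illustrative; Spark would run each partition's work on an executor):

```python
import random

def count_inside(seed, samples):
    # One partition's work: count samples inside the quarter circle.
    rng = random.Random(seed)
    return sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )

def estimate_pi(partitions=30, samples_per_partition=10_000):
    # Spark would map count_inside over executors and reduce with a sum;
    # here we fold the partial counts locally.
    total = sum(count_inside(seed, samples_per_partition) for seed in range(partitions))
    return 4.0 * total / (partitions * samples_per_partition)

print(estimate_pi())
```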
4. Run an R job. You can view the example source here.

   1. Use the following command to run an R job:

      ```shell
      dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/dataframe.R"
      ```

      Expected output:

      ```
      Parsing application as R job
      Using image 'mesosphere/spark:2.11.0-2.4.6-scala-2.11-hadoop-2.9' for the driver and the executors (from dispatcher: container.docker.image).
      To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
      Run job succeeded. Submission id: driver-20200729204249-0003
      ```

   2. Use the following command to view the standard output from your job:

      ```shell
      dcos spark log --completed --lines_count=10 driver-20200729204249-0003
      ```

      Expected output:

      ```
      In Filter(nzchar, unlist(strsplit(input, ",|\\s"))) :
        bytecode version mismatch; using eval
      root
       |-- name: string (nullable = true)
       |-- age: double (nullable = true)
      root
       |-- age: long (nullable = true)
       |-- name: string (nullable = true)
          name
      1 Justin
      ```
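The R example builds Spark DataFrames and prints their schemas; note in the output above that `age` is inferred as `double` from the local R data frame but as `long` from the JSON sample. A rough local analogue of that field-type inference, in plain Python (the records below are illustrative stand-ins for Spark's sample data, not its actual contents):

```python
def infer_schema(rows):
    # Map each field to the type name of its last non-null value,
    # loosely analogous to Spark's DataFrame schema inference.
    schema = {}
    for row in rows:
        for field, value in row.items():
            if value is not None:
                schema[field] = type(value).__name__
    return schema

people = [
    {"name": "Michael", "age": None},  # hypothetical records
    {"name": "Andy", "age": 30},
    {"name": "Justin", "age": 19},
]
print(infer_schema(people))  # {'name': 'str', 'age': 'int'}
```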
## Next steps

- To view the status of your job, run the `dcos spark webui` command, then visit the Spark cluster dispatcher UI at `http://<dcos-url>/service/spark/`.
- To view the logs, see the documentation for Mesosphere DC/OS monitoring.
- To view details about your Spark job, run the `dcos task log --completed <submissionId>` or `dcos spark log --completed <submissionId>` command.