-
Before submitting your job, upload the artifact (such as a
jar
file) to a location that is visible to the cluster (such as HTTP, S3, or HDFS). Learn more. -
Run the job.
- Include all configuration flags for the job before the
jar
url. - Provide the arguments for the Spark job after the
jar
url.
Follow the template
dcos spark run --submit-args="<flags> URL [args]
, where:<flags>
are options like--conf spark.cores.max=16
and--class my.aprk.App
URL
is the location of the application[args]
are any arguments for the application
For example:
dcos spark run --submit-args="--class MySampleClass http://external.website/mysparkapp.jar" dcos spark run --submit-args="--py-files mydependency.py http://external.website/mysparkapp.py" dcos spark run --submit-args="http://external.website/mysparkapp.R"
If your job runs successfully, you will get a message with the job’s submission ID:
Run job succeeded. Submission id: driver-20160126183319-0001
- Include all configuration flags for the job before the
-
View the Spark scheduler progress by navigating to the Spark dispatcher at http://
/service/spark/`. -
View the job’s logs through the Mesos UI at
http://<dcos-url>/mesos/
or by runningdcos task log --follow <submission_id>
.
Setting Spark properties
Spark job settings are controlled by configuring Spark properties.
Submission
All properties are submitted through the --submit-args
option to dcos spark run
. There are a few options that are unique to DC/OS that are not in Spark Submit (for example --keytab-secret-path
). View dcos spark run --help
for a list of all these options. All --conf
properties supported by Spark can be passed through the command-line with within the --submit-args
string.
dcos spark run --submit-args="--conf spark.executor.memory=4g --supervise --class MySampleClass http://external.website/mysparkapp.jar 30"
Setting automatic configuration defaults
To set Spark properties with a configuration file, create a spark-defaults.conf
file and set the environment variable SPARK_CONF_DIR
to the containing directory. Learn more.
Using a properties file
To reuse Spark properties without cluttering the command line, the CLI supports passing a path to a local file containing Spark properties. Such a file consists of properties and values separated by whitespace. For example:
spark.mesos.containerizer mesos
spark.executors.cores 4
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///history
This sample property file sets the containerizer to mesos
, the executor cores to 4
and enables the history server. This file is parsed locally, so it is not be available to your driver applications.
Secrets
Enterprise DC/OS provides a secrets store to enable access to sensitive data such as database passwords, private keys, and API tokens. DC/OS manages secure transportation of secret data, access control and authorization, and secure storage of secret content. A secret can be exposed to drivers and executors as a file or as an environment variable.
To configure a job to access a secret, see the sections on
DC/OS overlay network
To submit a Spark job inside the DC/OS Overlay Network, run a command similar to the following:
dcos spark run --submit-args="--conf spark.mesos.containerizer=mesos --conf spark.mesos.network.name=dcos --class MySampleClass http://external.website/mysparkapp.jar"
Note that DC/OS overlay support requires the UCR rather than the default Docker Containerizer, so you must set --conf spark.mesos.containerizer=mesos
.
Driver failover timeout
The --conf spark.mesos.driver.failoverTimeout
option specifies the number of seconds that the master will wait for the driver to reconnect, after being temporarily disconnected, before it tears down the driver framework by killing all its executors. The default value is zero, meaning no timeout. If the driver disconnects and the value of this option is zero, the master immediately tears down the framework.
To submit a job with a nonzero failover timeout, run a command similar to the following:
dcos spark run --submit-args="--conf spark.mesos.driver.failoverTimeout=60 --class MySampleClass http://external.website/mysparkapp.jar"
Versioning
The DC/OS Apache Spark Docker image contains OpenJDK 8 and Python 2.7.6.
DC/OS Apache Spark distributions 1.X are compiled with Scala 2.10. DC/OS Apache Spark distributions 2.X are compiled with Scala 2.11. DC/OS Apache Spark distributions 3.X are compiled with Scala 2.12. Scala is not binary compatible across minor verions, so your Spark job must be compiled with the same Scala version as your version of DC/OS Apache Spark.
The default DC/OS Apache Spark distribution is compiled against Hadoop 3.2 libraries. However, you can choose a different version by following the instructions in Customize Spark distribution.