Job Scheduling

Scheduling jobs with DC/OS Apache Spark

This section is a brief overview of material described in greater detail in the Apache Spark documentation, particularly the Job Scheduling and Running Spark on Mesos pages.

Modes

Spark on Mesos supports two modes of operation: coarse-grained mode and fine-grained mode. Coarse-grained mode provides lower latency, whereas fine-grained mode provides higher utilization. You can find more information in the Spark on Mesos documentation.

Coarse-grained mode

In coarse-grained mode, each Spark executor is represented by a single Mesos task. As a result, executors have a constant size throughout their lifetime.

  • Executor memory: spark.executor.memory
  • Executor CPUs: spark.executor.cores, or all the cores in the offer.
  • Number of Executors: spark.cores.max / spark.executor.cores. Executors are brought up until spark.cores.max is reached, and they survive for the duration of the job (see the example after this list).
  • Executors per agent: Multiple
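
For example, here is a sketch of how these settings interact at submit time (the values are illustrative): with spark.cores.max=8 and spark.executor.cores=2, Spark brings up 8 / 2 = 4 executors, each with 2 CPUs and 2g of memory.

# Launches 8 / 2 = 4 executors, each with 2 CPUs and 2g of memory.
dcos spark run --name=spark --submit-args="\
--conf spark.executor.memory=2g \
--conf spark.executor.cores=2 \
--conf spark.cores.max=8 \
--class org.apache.spark.examples.SparkPi \
http://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 100"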

IMPORTANT: We highly recommend you set spark.cores.max. If you do not, your Spark job may consume all available resources in your cluster, resulting in unhappy peers.

Quota for drivers and executors

Setting a Mesos quota for the drivers prevents the Dispatcher from consuming too many resources and makes queueing behavior more predictable.

To control the number of drivers the Spark service runs concurrently, you should set a quota for the drivers. The quota guarantees that the Spark Dispatcher has resources available to launch drivers and limits the total impact on the cluster due to drivers.

Optionally, you can also set a quota for the resources the drivers consume, both to ensure that drivers are not starved of resources by other frameworks and to keep them from consuming too much of the cluster. For more information, see Coarse-grained mode above.

Best practices for the drivers

The quota for the drivers allows the operator of the cluster to ensure that only a given number of drivers are concurrently running. As additional drivers are submitted, they are queued by the Spark Dispatcher.

Use the following guidelines to achieve best results:

  • Set the quota conservatively, but be aware that the setting affects the number of jobs that can run concurrently.
  • Decide how much of your cluster’s resources to allocate to running drivers. Allocated resources are used only by the Spark drivers, meaning that you can decide roughly how many concurrent jobs you would like to have running at a time. As additional jobs are submitted, they are queued and run with first-in, first-out semantics.
  • For the most predictable behavior, enforce uniform driver resource requirements and a particular quota size for the Dispatcher. For example, if each driver consumes 1.0 CPU and it is desirable to run up to 5 Spark jobs concurrently, you should create a quota that specifies 5 CPUs.

Setting quota for the drivers

  1. SSH to the Mesos master and set the quota for a role (dispatcher in this example):

    cat dispatcher-quota.json
    {
      "role": "dispatcher",
      "guarantee": [
        {
          "name": "cpus",
          "type": "SCALAR",
          "scalar": { "value": 5.0 }
        },
        {
          "name": "mem",
          "type": "SCALAR",
          "scalar": { "value": 5120.0 }
        }
      ]
    }
    curl -d @dispatcher-quota.json -X POST http://<master>:5050/quota
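
    You can verify that the quota was applied by querying the master’s quota endpoint (a quick sanity check):

    curl http://<master>:5050/quota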
    
  2. Install the Spark service with the following options (at a minimum):

    cat options.json
    {
        "service": {
            "role": "dispatcher"
        }
    }
    dcos package install spark --options=options.json
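
    Once the installation completes, you can confirm it from the DC/OS CLI (a quick check; output varies by DC/OS version):

    dcos package list spark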
    

Best practices for the executors

We recommend allocating a quota for the Spark job executors. Doing so provides:

  • A guarantee that Spark jobs receive the requested amount of resources.
  • Additional assurance that even misconfigured Spark jobs (for example, a driver with spark.cores.max unset) do not consume resources in a way that impacts other tenants on the cluster.

The drawback to allocating quota to the executors is that quota resources cannot be used by other frameworks in the cluster.

Setting quota for the executors

Quota can be allocated for Spark executors in the same way it is allocated for the Spark Dispatcher. If you want to run 100 executors concurrently, each with 1.0 CPU and 4096 MB of memory, you would do the following:

cat executor-quota.json
{
  "role": "executor",
  "guarantee": [
    {
      "name": "cpus",
      "type": "SCALAR",
      "scalar": { "value": 100.0 }
    },
    {
      "name": "mem",
      "type": "SCALAR",
      "scalar": { "value": 409600.0 }
    }
  ]
}
curl -d @executor-quota.json -X POST http://<master>:5050/quota

When Spark jobs are submitted, they must indicate the role for which the quota has been set in order to consume resources from that quota. For example:

dcos spark run --verbose --name=spark --submit-args="\
--driver-cores=1 \
--driver-memory=1024M \
--conf spark.cores.max=8 \
--conf spark.mesos.role=executor \
--class org.apache.spark.examples.SparkPi \
http://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 3000"

IMPORTANT: To prevent a single long-running or streaming Spark job from consuming the entire quota, you should set the maximum CPUs for the Spark job to roughly one “job’s worth” of the quota’s resources. This setting ensures that the Spark job gets sufficient resources to make progress, prevents it from starving other Spark jobs of resources, and provides predictable offer suppression semantics.
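
For example, with the 100-CPU executor quota above, capping each job at roughly one tenth of the quota leaves room for about ten jobs to make progress concurrently (the cap value here is illustrative):

dcos spark run --verbose --name=spark --submit-args="\
--driver-cores=1 \
--driver-memory=1024M \
--conf spark.cores.max=10 \
--conf spark.mesos.role=executor \
--class org.apache.spark.examples.SparkPi \
http://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 3000"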

Permissions when using quota with strict mode

Strict mode clusters (see security modes) require extra permissions to be set before you can use quota. Follow the installation instructions, then add the additional permissions for the roles you intend to use, as detailed below.
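
For example, here is a minimal sketch of the grants for the roles used below, assuming the DC/OS Enterprise CLI is installed and a service account named spark-principal (as in the installation step below); see the security documentation for the authoritative list of permissions:

# Allow the service account to register frameworks under the dispatcher and executor roles.
dcos security org users grant spark-principal "dcos:mesos:master:framework:role:dispatcher" create
dcos security org users grant spark-principal "dcos:mesos:master:framework:role:executor" create
# Allow tasks to run as the root user (matches "user": "root" in the install options below).
dcos security org users grant spark-principal "dcos:mesos:master:task:user:root" create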

With permissions in place and using the example above, you would proceed as follows:

  1. First set the quota for the Dispatcher’s role (dispatcher):

    cat dispatcher-quota.json
    {
      "role": "dispatcher",
      "guarantee": [
        {
          "name": "cpus",
          "type": "SCALAR",
          "scalar": { "value": 5.0 }
        },
        {
          "name": "mem",
          "type": "SCALAR",
          "scalar": { "value": 5120.0 }
        }
      ]
    }
    

    If you have downloaded the CA certificate, dcos-ca.crt, to your local machine from the https://<dcos_url>/ca/dcos-ca.crt endpoint, set the quota from your local machine:

    curl -X POST --cacert dcos-ca.crt -H "Authorization: token=$(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/mesos/quota -d @dispatcher-quota.json -H 'Content-Type: application/json'
    
  2. Optionally, set the quota for the executors using the same settings as above:

    cat executor-quota.json
    {
      "role": "executor",
      "guarantee": [
        {
          "name": "cpus",
          "type": "SCALAR",
          "scalar": { "value": 100.0 }
        },
        {
          "name": "mem",
          "type": "SCALAR",
          "scalar": { "value": 409600.0 }
        }
      ]
    }
    
  3. If you have not already done so, set the quota from your local machine. For example, assuming you have dcos-ca.crt locally:

    curl -X POST --cacert dcos-ca.crt -H "Authorization: token=$(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/mesos/quota -d @executor-quota.json -H 'Content-Type: application/json'
    
  4. Install Spark with these minimal configuration settings:

    {
        "service": {
            "service_account": "spark-principal",
            "role": "dispatcher",
            "user": "root",
            "service_account_secret": "spark/spark-secret"
        }
    }
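
    This configuration assumes the service account and secret already exist. If they do not, here is a minimal sketch of creating them with the DC/OS Enterprise CLI (the key pair file names are illustrative):

    # Generate a key pair, create the service account, and store the private key as a secret.
    dcos security org service-accounts keypair spark-private-key.pem spark-public-key.pem
    dcos security org service-accounts create -p spark-public-key.pem -d "Spark service account" spark-principal
    dcos security secrets create-sa-secret spark-private-key.pem spark-principal spark/spark-secret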
    
  5. Now you are ready to run a Spark job using the principal and role you configured:

    dcos spark run --verbose --submit-args=" \
    --conf spark.mesos.principal=spark-principal \
    --conf spark.mesos.role=executor \
    --conf spark.mesos.containerizer=mesos \
    --class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 100"
    

Setting spark.cores.max

To improve Spark job execution reliability, set the maximum number of cores consumed by any given job. This prevents any single Spark job from consuming too many resources in the cluster. We highly recommend that each Spark job be submitted with a limit on the maximum number of cores (CPUs) it can consume; this is especially important for long-running and streaming Spark jobs.

# Setting spark.cores.max is very important!
dcos spark run --verbose --name=spark --submit-args="\
--driver-cores=1 \
--driver-memory=1024M \
--conf spark.cores.max=8 \
--class org.apache.spark.examples.SparkPi \
http://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 3000"

When running multiple concurrent Spark jobs, consider setting spark.cores.max between <total_executor_quota>/<max_concurrent_jobs> and <total_executor_quota>, depending on your workload characteristics and goals. For example, with the 100-CPU executor quota above and up to 5 concurrent jobs, a value between 20 and 100 is reasonable.

Fine-grained mode (deprecated)

NOTE: Fine-grained mode has been deprecated and does not have all of the features of coarse-grained mode.

In “fine-grained” mode, each Spark task is represented by a single Mesos task. When a Spark task finishes, the resources represented by its Mesos task are relinquished. Fine-grained mode enables finer-grained resource allocation at the cost of task startup latency.

  • Executor memory: spark.executor.memory
  • Executor CPUs: Increase and decrease as tasks start and terminate.
  • Number of Executors: Increases and decreases as tasks start and terminate.
  • Executors per agent: At most 1
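
To opt in to fine-grained mode, set spark.mesos.coarse to false at submit time. For example:

dcos spark run --name=spark --submit-args="\
--conf spark.mesos.coarse=false \
--class org.apache.spark.examples.SparkPi \
http://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 100"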

Properties

The following is a description of the most common Spark scheduling properties on Mesos. For a full list, see the Spark configuration page and the Spark on Mesos configuration page.

Property                 Default                             Description
spark.mesos.coarse       true                                Described above.
spark.executor.memory    1g                                  Executor memory allocation.
spark.executor.cores     All available cores in the offer    Coarse-grained mode only. DC/OS Apache Spark >= 1.6.1. Executor CPU allocation.
spark.cores.max          unlimited                           Maximum total number of cores to allocate.