Release Notes

Release notes for DC/OS Apache Spark 2.12.0-3.0.1

Spark and Spark history 2.12.0-3.0.1 were released on 12 November 2020

New features

  • Upgrades the base technology, Apache Spark, to 3.0.1
  • Adds support for the _SPARK_AUTH_SECRET_FILE environment variable, which accepts the path to a file containing the authentication secret. It complements _SPARK_AUTH_SECRET, which accepts the secret directly.
  • Adds the --executor-auth-secret-path flag to the Spark CLI to enable file-based RPC authentication.
  • Adds configuration options for log rotation of both stdout and stderr.
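
The difference between the two secret variables can be sketched as follows. This is only an illustration, not the package's implementation: the helper name resolve_auth_secret and the file-before-direct precedence are assumptions, and only the two environment variable names come from the notes above.

```python
import os
from typing import Mapping, Optional


def resolve_auth_secret(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the RPC authentication secret, or None if none is configured.

    Hypothetical helper: checking the file variable first is an assumption.
    """
    secret_file = env.get("_SPARK_AUTH_SECRET_FILE")
    if secret_file:
        # File-based variant: the variable holds a path, not the secret itself.
        with open(secret_file, "r", encoding="utf-8") as handle:
            return handle.read().strip()
    # Direct variant: the variable holds the secret value.
    return env.get("_SPARK_AUTH_SECRET")
```

Keeping the secret in a file means the value itself never has to appear in the process environment or on a command line.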

Updates

  • Supports Scala version 2.12 only. No longer supports Scala version 2.11.
  • Compiles Spark with Hadoop versions 3.2 and 2.7.

Breaking changes

  • The Spark CLI flag --executor-auth-secret no longer accepts a filename or secret path.
  • Spark Docker images compiled with Hadoop 2.9 are no longer published.

Spark and Spark history 2.11.0-2.4.6 were released on 30 July 2020

New features

  • Upgrades the base technology, Apache Spark, to 2.4.6

Updates

  • Removes the hardcoded --completed flag from the dcos spark log CLI command and allows all supported options

Spark and Spark history 2.10.0-2.4.5 were released on 17 June 2020

New features

  • Suppress and Revive support
  • Placement Constraints support by Dispatcher

Updates

  • Updated libmesos-bundle from 1.14-alpha to 1.14-beta

Version Spark and Spark history 2.9.0-2.4.3

New features

  • Marathon group role enforcement support for DC/OS 1.14
  • Role propagation from Dispatcher to all submitted Spark applications
  • Dispatcher role enforcement for submitted Spark applications. See details here
  • Node draining support
  • Executor memory overhead config property support
  • Mesos secrets support by Dispatcher
  • Standardized metric names and metric tags for monitoring

Updates

  • Switched from Oracle JDK to OpenJDK
  • Switched to UCR containerizer as the default
  • Scala 2.12 support
  • Updated libmesos to 1.14

Bug fixes

  • Fixed a bug causing crashes when labels are malformed

Breaking changes

  • Metric standardization renames existing metrics by moving the variable parts of metric names into tags
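
To illustrate what moving variable parts of a metric name into tags means, here is a small sketch. The legacy name, the tag key, and the helper standardize_metric are all invented for illustration; the actual metric names used by the package are not listed in these notes.

```python
from typing import Dict, List, Tuple


def standardize_metric(
    legacy_name: str, variable_parts: Dict[int, str]
) -> Tuple[str, Dict[str, str]]:
    """Split a dotted legacy metric name into a fixed name plus tags.

    variable_parts maps the position of each variable segment to a tag key.
    """
    fixed: List[str] = []
    tags: Dict[str, str] = {}
    for index, segment in enumerate(legacy_name.split(".")):
        if index in variable_parts:
            tags[variable_parts[index]] = segment  # variable part becomes a tag
        else:
            fixed.append(segment)  # stable part stays in the metric name
    return ".".join(fixed), tags


# Hypothetical legacy name with an application ID embedded at position 1.
name, tags = standardize_metric("driver.app-20200617.jobs.active", {1: "app_id"})
print(name, tags)
```

Dashboards and alerts that match on the old, fully qualified names would need to be updated to match the fixed name and filter on tags instead.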

Version Spark and Spark history 2.8.0-2.4.0

New features

  • Mesos checkpointing support for Spark Drivers submitted to Dispatcher. For more information, see the fault tolerance documentation.

Updates

  • Upgraded Hadoop dependency to 2.9.2

Bug fixes

  • Fixed a bug that caused the Spark Executor ID to differ from the Mesos Task ID
  • Fixed a bug in the Spark CLI that caused incorrect parsing of the jars argument when the word “jars” is present in the path

Breaking changes

  • The default Hadoop dependency is now 2.9 and not 2.7

Version Spark and Spark history 2.7.0-2.4.0

New features

  • Upgraded Spark and Spark History Server to 2.4.0

Updates

  • Spark Mesos Dispatcher uses the same user for running Spark jobs as itself and defaults to nobody
  • Switched to dcos-commons bootstrap script for IP address detection

Breaking changes

  • Removed the configuration option use_bootstrap_for_IP_detect. Bootstrap is now used by default for Spark container IP detection, which works across all containerizers (DOCKER and MESOS (UCR)) and all networking modes (HOST and CNI virtual networks).

Version Spark and Spark history 2.6.0-2.3.2

New features

  • Upgraded Spark and Spark History Server to 2.3.2
  • Added DC/OS Spark CLI support for --jars
  • Added CNI Support for Dispatcher, Driver, and Executors for Docker and Mesos containerizers
  • Added CNI labels support for Mesos containerizer
  • Added package configuration for CNI:
    • virtual_network_enabled
    • virtual_network_name
    • virtual_network_plugin_labels
  • Spark Dispatcher by default launches Spark Drivers and Executors in the same virtual network it was launched in itself
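
A package options file using the three CNI settings might look like the sketch below, generated here from Python. Only the three option names come from the notes above; the enclosing "service" section, the network name, and the key/value shape of the plugin labels are assumptions and should be checked against the package's configuration schema.

```python
import json

# Hypothetical values; only the three option names are taken from the notes.
options = {
    "service": {  # assumed section name
        "virtual_network_enabled": True,
        "virtual_network_name": "dcos",  # placeholder network name
        "virtual_network_plugin_labels": [  # assumed label format
            {"key": "example_key", "value": "example_value"},
        ],
    }
}

# Print the JSON document that would be saved as the package options file.
print(json.dumps(options, indent=2))
```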

NOTE: Limitations of current Spark CNI support:

  • Configuration of network plugin labels from DC/OS UI supported only in JSON editing mode
  • Network plugin labels are not supported by Docker containerizer
  • Currently, DC/OS AdminRouter doesn't support virtual networks, so DC/OS Spark endpoints are not accessible from the CLI, and jobs must be submitted from a routable network

Updates

  • SPARK_HOME environment variable defaults to /opt/spark in Dockerfile and executable scripts
  • Switched to Spark’s own StatsD Sink instead of 3rd-party dependency
  • Updated dcos-commons bootstrap version to 0.55.2

Bug fixes

  • Fixed bug for Dispatcher restarting duplicate Spark drivers after agent restart in --supervised mode
  • Fixed bug for CLI incorrect --jars parsing resulting in submit failure

Version Spark and Spark history 2.5.0-2.2.1

New features

  • Added unique Mesos Task IDs for Spark executors.
  • Added trusted Ubuntu 18.04 base Docker image.
  • Added nobody user support on RHEL/CentOS (through configuration).

Updates

  • Changed the default user for the Docker container from root to nobody.
  • Upgraded JRE to 1.8.192.
  • Upgraded to Ubuntu 18.04.
  • Updated Hadoop dependencies from 2.7.3 to 2.7.7 (fixes CVE-2016-6811, CVE-2017-3162, CVE-2017-3166, CVE-2018-8009).
  • Updated Jetty dependencies from 9.3.11.v20160721 to 9.3.24.v20180605 (fixes CVE-2017-7658).
  • Updated Jackson dependencies from 2.6.5 to 2.9.6 (fixes CVE-2017-15095, CVE-2017-17485, CVE-2017-7525, CVE-2018-7489, CVE-2016-3720).
  • Updated ZooKeeper dependencies from 3.4.6 to 3.4.13.

Bug fixes

  • dcos task log now works because Spark executors have unique Mesos Task IDs.
  • Fixed unstable health checks for Spark dispatcher and history server.
  • Spark dispatcher task output now redirects to stdout and is available in logs.

Breaking changes

  • Added a new configuration option docker_user to override the user when running Spark using Docker containerizer.
  • The default Hadoop dependency is now 2.7 and not 2.6.

Version Spark and Spark history 2.4.0-2.2.1-3

New features

  • Added service name to dispatcher’s VIP endpoints.
  • Added shell-escape fix to spark-cli (SPARK-21014).
  • Added spark.mesos.executor.gpus (SPARK-21033).
  • Added dispatcher and driver metrics.
  • Added statsd sink for spark metrics.

    NOTE: Metrics is a beta feature and requires enabling UCR. Production use is not advised.

Updates

  • Updated tests, build tools, CLI, and vendor packages.
  • Updated bootstrap version to 0.50.0.
  • Updated JRE version to 8u172.

Bug fixes

  • Fixed duplicate Docker image URLs; resource.json is now used as the default.

Breaking changes

  • VIP endpoints for the dispatcher are no longer spark-dispatcher:<port> and are now dispatcher.:<port>.

Version Spark and Spark history 2.3.1-2.2.1-2

Updates

  • Updated libmesos version with a critical bug fix.

Documentation

  • Added a page documenting results from scale testing of Spark on DC/OS.

Version 2.3.0-2.2.1-2

New features

  • Added secrets support for drivers, so that a secret can be disseminated to the executors (SPARK-22131).
  • Added Kerberos ticket renewal (SPARK-21842).
  • Added Mesos sandbox URI to the Dispatcher UI (SPARK-13041).
  • Added support for Driver <-> Executor TLS with file-based secrets.
  • Added support for Driver <-> Executor SASL (RPC endpoint authentication and encryption) using file-based secrets.
  • Added --executor-auth-secret as a shortcut for Driver <-> Executor Spark SASL (RPC endpoint authentication and encryption) configuration.
  • Added CLI command to generate a random secret.
  • Enabled native BLAS for MLLib.
  • Added configuration to deploy Dispatcher on UCR (default is Docker).
  • Instead of setting the krb5.conf as a base64-encoded blob, the user can now specify service.security.kerberos.kdc.[port|hostname] and service.security.kerberos.realm directly in the options.json file. The behavior with the base64-encoded blob remains the same, and will overwrite the new configuration.

History server

  • Added Kerberos support for integration with a Kerberized HDFS. See documentation for configuration instructions.
  • Made the user configurable; the default is root.

Updates

  • Updated JRE version to 8u152 JCE.
  • Changed the default user to root (Breaking change).

Bug fixes

  • Fixed the first delegation token renewal, which was not scheduled at 75% of the renewal time (SPARK-22583).
  • Fixed supervise mode with checkpointing (SPARK-22145).
  • Added support for the older SPARK_MESOS_KRB5_CONF_BASE64 environment variable.
  • The Spark CLI's “shortcut” command-line arguments, which are translated into spark.config=setting configurations downstream (such as spark.executor.memory), no longer overwrite a configuration that the user sets directly with the default value of the shortcut argument.
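
The intended behavior of the shortcut arguments can be sketched as a merge rule. This is not the CLI's code: the function effective_conf, the default value "1g", and the rule that an explicitly passed shortcut wins over a directly set property are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical default for an executor-memory shortcut argument.
SHORTCUT_DEFAULTS = {"spark.executor.memory": "1g"}


def effective_conf(
    user_conf: Dict[str, str], shortcut_values: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Merge shortcut arguments into the configuration the user set directly.

    shortcut_values maps a Spark property to the value passed through its
    shortcut argument, or to None when the shortcut was not passed.
    """
    merged = dict(user_conf)
    for key, value in shortcut_values.items():
        if value is not None:
            merged[key] = value  # shortcut was passed explicitly (assumed to win)
        elif key not in merged:
            merged[key] = SHORTCUT_DEFAULTS[key]  # default only fills a gap
    return merged


# The user's directly set value survives when the shortcut is omitted.
print(effective_conf({"spark.executor.memory": "4g"}, {"spark.executor.memory": None}))
```

The bug described above corresponds to applying the default unconditionally, which would replace the user's "4g" with the shortcut's default.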

Breaking changes

  • Changed the default user to root, in both the Dispatcher and history server.
  • To configure Kerberos in the options.json file, a new property service.security.kerberos.enabled must be set to true. This option applies to both the Dispatcher and history server.
  • Removed the security.ssl properties from the options.json file. These properties are no longer needed for the Go-based CLI.
  • Removed --dcos-space option from the CLI. Access to secrets is determined by the Spark Dispatcher service name. See security for more information about where to place secrets.
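
The dotted Kerberos property names mentioned in this release correspond to nested objects in the options.json file. The sketch below builds such a file from the dotted names. The helper set_dotted is hypothetical, and the hostname, port, and realm values are placeholders; whether the port is expected as a number or a string is also an assumption.

```python
import json
from typing import Any, Dict


def set_dotted(options: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Store value under a dotted property name as nested JSON objects."""
    *parents, leaf = dotted_key.split(".")
    node = options
    for part in parents:
        node = node.setdefault(part, {})  # create intermediate objects on demand
    node[leaf] = value


options: Dict[str, Any] = {}
set_dotted(options, "service.security.kerberos.enabled", True)
set_dotted(options, "service.security.kerberos.kdc.hostname", "kdc.example.com")  # placeholder
set_dotted(options, "service.security.kerberos.kdc.port", 88)  # placeholder, type assumed
set_dotted(options, "service.security.kerberos.realm", "EXAMPLE.COM")  # placeholder

# Print the JSON document that would be saved as options.json.
print(json.dumps(options, indent=2))
```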

Version 2.2.0-2.2.0-2-beta

Improvements

  • Added secrets support in driver (SPARK-22131).
  • Added Kerberos ticket renewal (SPARK-21842).
  • Added Mesos sandbox URI to dispatcher UI (SPARK-13041).
  • Updated JRE version to 8u152 JCE.
  • Added support for Driver <-> Executor TLS with file-based secrets.
  • Added support for Driver <-> Executor SASL (RPC endpoint authentication and encryption) with file-based secrets.
  • Added CLI command to generate a random secret.
  • Enabled native BLAS for MLLib.
  • Added configuration to deploy Dispatcher on UCR (default is Docker).

Bug fixes

  • Fixed the first delegation token renewal, which was not scheduled at 75% of the renewal time (SPARK-22583).
  • Fixed supervise mode with checkpointing (SPARK-22145).
  • Added support for the older SPARK_MESOS_KRB5_CONF_BASE64 environment variable.
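
The 75% rule in the delegation token fix can be shown with a short calculation. This sketch assumes the ratio is applied to the time remaining until the token expires; the function name and the millisecond timestamps are illustrative and not taken from Spark's source.

```python
def next_renewal_time(now_ms: int, expiry_ms: int, ratio: float = 0.75) -> int:
    """Return the timestamp at which the token should next be renewed.

    Assumes renewal is scheduled after `ratio` of the remaining lifetime.
    """
    remaining = expiry_ms - now_ms
    return now_ms + int(ratio * remaining)


# A token obtained at t=1000 ms that expires at t=5000 ms has 4000 ms left,
# so the renewal is scheduled 3000 ms later.
print(next_renewal_time(1_000, 5_000))
```

Renewing before the full interval has elapsed leaves a margin for retries before the token actually expires.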

Tests

  • Added integration test that reads / writes to a Kerberized HDFS.
  • Added integration test that reads / writes to a Kerberized Kafka.
  • Added integration test of checkpointing and supervise.

Documentation

  • Updated naming of DC/OS.
  • Updated docs links in package post-install notes.
  • Updated Kerberos docs.
  • Documented running Spark Streaming jobs with Kerberized Kafka.
  • Documented nobody limitation on certain OSes.

Version 2.1.0-2.2.0-1

Improvements

  • Changed the image to run as user nobody instead of root by default. (https://github.com/mesosphere/spark-build/pull/189)

Bug fixes

  • Configuration to allow a custom dispatcher Docker image. (https://github.com/mesosphere/spark-build/pull/179)
  • CLI breaks with multiple spaces in submit arguments. (https://github.com/mesosphere/spark-build/pull/193)

Documentation

  • Updated HDFS endpoint information in hdfs.
  • Added checkpointing instructions. (https://github.com/mesosphere/spark-build/pull/181)
  • Updated custom docker image support policy. (https://github.com/mesosphere/spark-build/pull/200)

Version 2.0.1-2.2.0-1

Improvements

  • Exposed isR and isPython spark run arguments.

Bug fixes

  • Allowed application arguments to be passed without an equals sign.
  • Fixed docs link in Universe package description.

Version 2.0.0-2.2.0-1

Improvements

  • Kerberos support now uses common code from spark-core instead of a custom implementation.
  • Added file and environment variable-based secret support.
  • Kerberos keytab/TGT login from the DC/OS Spark CLI in cluster mode (uses file-based secrets).
  • Added CNI network label support.
  • The CLI doesn’t require spark-submit to be present on the client machine.

Bug fixes

  • Drivers are successfully re-launched when --supervise flag is set.
  • CLI works on 1.9 and 1.10 DC/OS clusters.

Breaking changes

  • Setting spark.app_id has been removed (for example, dcos config set spark.app_id <dispatcher_app_id>). To submit jobs with a given dispatcher, use dcos spark --name <dispatcher_app_id>.
  • principal is now service_account and secret is now service_account_secret.

Version 1.1.1-2.2.0

Improvements

  • Upgrade to Spark 2.2.0.
  • The Spark driver now supports a configurable failover_timeout. The default value is 0 when the configuration is not set (SPARK-21456).

Breaking change

  • Spark CLI no longer supports -Dspark arguments.

Version 1.0.9-2.1.0-1

  • The history server has been removed from the “spark” package, and put into a dedicated “spark-history” package.