Welcome to the documentation for DC/OS Apache Spark. For information about new and changed features, see the release notes.
Apache Spark is a fast, general-purpose cluster computing system for big data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including:
- Spark SQL for SQL and DataFrames
- MLlib for machine learning
- GraphX for graph processing
- Spark Streaming for stream processing
For more information, see the Apache Spark documentation.
DC/OS Apache Spark consists of Apache Spark with a few custom commits, plus DC/OS-specific packaging.
DC/OS Apache Spark includes:
- Mesos Cluster Dispatcher
- Spark History Server
- DC/OS Apache Spark CLI
- Interactive Spark shell
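The DC/OS Apache Spark CLI submits jobs to the Mesos Cluster Dispatcher with `dcos spark run --submit-args="..."`, where the quoted string carries ordinary `spark-submit` arguments. The following Python snippet is a minimal sketch that only assembles that command line. The helper name `build_spark_run_command`, the main class, and the jar URL are hypothetical placeholders, not part of the DC/OS CLI.

```python
import shlex
from typing import Dict, List, Optional


def build_spark_run_command(
    main_class: str,
    app_jar_url: str,
    app_args: List[str],
    confs: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return an argv list for `dcos spark run` (hypothetical helper).

    All spark-submit arguments are packed into the single --submit-args value,
    which the DC/OS Spark CLI forwards to the Mesos Cluster Dispatcher.
    """
    submit_args = ["--class", main_class]
    # Sort the properties so the generated command is deterministic.
    for key, value in sorted((confs or {}).items()):
        submit_args += ["--conf", f"{key}={value}"]
    submit_args.append(app_jar_url)
    submit_args += app_args
    # shlex.join (Python 3.8+) quotes any argument containing spaces or shell metacharacters.
    return ["dcos", "spark", "run", "--submit-args=" + shlex.join(submit_args)]


if __name__ == "__main__":
    cmd = build_spark_run_command(
        "org.apache.spark.examples.SparkPi",       # placeholder main class
        "https://example.com/spark-examples.jar",  # placeholder jar URL
        ["30"],
        {"spark.executor.memory": "1g"},
    )
    # Print a copy-pasteable shell command; passing cmd to subprocess.run
    # would submit it, assuming a configured dcos CLI.
    print(shlex.join(cmd))
```

Returning a list rather than a string lets the caller pass it straight to `subprocess.run` without a second round of shell quoting; `shlex.join` is used only to pack the `spark-submit` arguments into the one `--submit-args` value.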
Benefits
- Utilization: DC/OS Apache Spark leverages Mesos to run Spark on the same cluster as other DC/OS services
- Improved efficiency
- Simple management
- Multi-team support
- Interactive analytics through notebooks
- UI integration
- Security, including file-based and environment-based secrets
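Secrets reach a Spark job through Spark's Mesos secret properties, assuming the bundled Spark version supports Mesos secrets: `spark.mesos.driver.secret.names` lists DC/OS secret paths, and either `spark.mesos.driver.secret.envkeys` (environment-based) or `spark.mesos.driver.secret.filenames` (file-based) lists, in the same order, how each secret should appear to the driver. Executors have analogous `spark.mesos.executor.secret.*` properties. The helper `driver_secret_confs` below is a hypothetical sketch that builds the driver-side `--conf` flags; the secret path `/mysecret` and variable name `MY_SECRET_ENV` are placeholders.

```python
from typing import Dict, List

# Spark property suffix for each way of exposing a secret to the driver.
_SUFFIXES = {"env": "envkeys", "file": "filenames"}


def driver_secret_confs(secrets: Dict[str, str], mode: str) -> List[str]:
    """Return spark-submit --conf flags exposing DC/OS secrets to the driver (hypothetical helper).

    secrets maps a DC/OS secret path to the environment variable name (mode="env")
    or file name (mode="file") under which the driver should see it.
    """
    if mode not in _SUFFIXES:
        raise ValueError(f"mode must be 'env' or 'file', got {mode!r}")
    if not secrets:
        raise ValueError("at least one secret is required")
    # Dicts preserve insertion order, so both comma-separated lists stay aligned.
    paths = list(secrets.keys())
    targets = [secrets[path] for path in paths]
    return [
        "--conf", "spark.mesos.driver.secret.names=" + ",".join(paths),
        "--conf", f"spark.mesos.driver.secret.{_SUFFIXES[mode]}=" + ",".join(targets),
    ]


if __name__ == "__main__":
    # Placeholder secret path and environment variable name.
    print(" ".join(driver_secret_confs({"/mysecret": "MY_SECRET_ENV"}, "env")))
```

The sketch handles one mode per call so that the names list and its matching envkeys or filenames list stay aligned index by index. The resulting flags can be combined with other `spark-submit` arguments, for example inside the `--submit-args` string of `dcos spark run`.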
Features
- Multiversion support
- Run multiple Spark dispatchers
- Run against multiple HDFS clusters
- Backports of scheduling improvements
- Simple installation of all Spark components, including the dispatcher and the history server
- Integration of the dispatcher and history server
- Zeppelin integration
- Kerberos and SSL support
Related services
Release Notes
Release notes for DC/OS Apache Spark 2.11.0-2.4.6
Quick Start
Introduction to the DC/OS Apache Spark service
Install and Customize
Installing and customizing your DC/OS Apache Spark service
Usage Examples
Using DC/OS Apache Spark
Integration with HDFS
Integrating HDFS with the DC/OS Apache Spark service
History Server
Enabling the Spark History Server
Security
Configuring DC/OS service accounts for Spark
Upgrade
Upgrading DC/OS Apache Spark
Uninstalling Spark
Uninstalling DC/OS Apache Spark
Runtime Configuration Changes
Customizing DC/OS Apache Spark while it is up and running
Run a Spark Job
Running a Spark job
Interactive Spark Shell
Running commands interactively in the Apache Spark shell
Custom Docker Images
Customizing the Docker image in which Spark runs
Fault Tolerance
Understanding DC/OS Apache Spark fault tolerance
Job Scheduling
Scheduling jobs with DC/OS Apache Spark
Kerberos
Setting up Kerberos to run with DC/OS Apache Spark
Troubleshooting
Troubleshooting DC/OS Apache Spark
Version Policy
Understanding DC/OS Apache Spark version policy
Limitations
Limitations of DC/OS Apache Spark
Tested Limits
Mesosphere has scale-tested Spark on DC/OS…