Overview
The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources to specify, run, and surface the status of Spark applications. For a complete reference of the custom resource definitions, refer to the API Definition. For details on the operator's design, refer to the design documentation. The operator requires Spark 2.3 or above, the versions that support Kubernetes as a native scheduler backend.
Install
You can find generic installation instructions for workspace catalog applications in the Application Deployment topic.
For details on custom configuration for the operator, refer to the Spark Operator Helm Chart documentation.
After you finish the installation, see the Spark Operator custom resource documentation for more information about how to submit your Spark jobs.
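As a quick orientation before you consult the full reference: a Spark job is submitted by applying a SparkApplication custom resource. The following is a minimal sketch that runs the SparkPi example bundled with Spark; the image, jar path, and service account name are assumptions you should adapt to your environment:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: <your workspace namespace>
spec:
  type: Scala
  mode: cluster
  # Assumption: any Spark image built with Kubernetes support works here.
  image: gcr.io/spark-operator/spark:v3.1.1
  mainClass: org.apache.spark.examples.SparkPi
  # Assumption: path of the examples jar inside the image above.
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 512m
    # Assumption: a service account allowed to create executor pods.
    serviceAccount: spark-operator-service-account
  executor:
    cores: 1
    instances: 1
    memory: 512m
```

Apply it with kubectl apply -f spark-pi.yaml, then follow its progress with kubectl -n <your workspace namespace> get sparkapplication spark-pi.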
Sample override config

- Using the UI:

  ```yaml
  podLabels:
    owner: john
    team: operations
  ```
- Using the CLI (see Application Deployment for details):

  ```bash
  cat <<EOF | kubectl apply -f -
  apiVersion: v1
  kind: ConfigMap
  metadata:
    namespace: ${WORKSPACE_NAMESPACE}
    name: spark-operator-overrides
  data:
    values.yaml: |
      configInline:
        podLabels:
          owner: john
          team: operations
  EOF
  ```
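After the override is applied, the configured labels should appear on the operator's pods. A quick way to confirm, assuming the operator runs in your workspace namespace:

```bash
# Look for owner=john,team=operations in the LABELS column.
kubectl -n <your workspace namespace> get pods --show-labels
```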
Uninstall via the CLI
- Uninstall the Spark Operator AppDeployment:

  ```bash
  kubectl -n <your workspace namespace> delete AppDeployment <name of AppDeployment>
  ```
- Remove the Spark Operator Service Account:

  ```bash
  # <name of service account> is spark-operator-service-account if you didn't override the RBAC resources.
  kubectl -n <your workspace namespace> delete serviceaccounts <name of service account>
  ```
- Remove the Spark Operator CRDs:

  ```bash
  kubectl delete crds scheduledsparkapplications.sparkoperator.k8s.io sparkapplications.sparkoperator.k8s.io
  ```
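Note that deleting the CRDs also deletes every SparkApplication and ScheduledSparkApplication resource in the cluster. To see which applications would be removed along with the CRDs, you can list them first:

```bash
# Lists all Spark applications across namespaces before the CRDs are removed.
kubectl get sparkapplications,scheduledsparkapplications --all-namespaces
```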
Resources
Here are some resources to learn more about the Spark Operator: