Overview
The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources to specify, run, and surface the status of Spark applications. For a complete reference of the custom resource definitions, refer to the API Definition. For details on the operator's design, refer to the design documentation. The operator requires Spark 2.3 or above, the versions that support Kubernetes as a native scheduler backend.
Install
You can find generic installation instructions for workspace catalog applications in the Application Deployment topic.
For details on custom configuration for the operator, refer to the Spark Operator Helm Chart documentation.
After you finish the installation, see the Spark Operator custom resource documentation for more information about how to submit your Spark jobs.
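As a quick orientation before you consult the full reference: a Spark job is submitted by applying a SparkApplication custom resource. The following is a minimal sketch that runs the SparkPi example bundled with Spark; the image, jar path, and service account name are assumptions you should adapt to your environment:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: <your workspace namespace>
spec:
  type: Scala
  mode: cluster
  # Assumption: any Spark image built with Kubernetes support works here.
  image: gcr.io/spark-operator/spark:v3.1.1
  mainClass: org.apache.spark.examples.SparkPi
  # Assumption: path of the examples jar inside the image above.
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 512m
    # Assumption: a service account allowed to create executor pods.
    serviceAccount: spark-operator-service-account
  executor:
    cores: 1
    instances: 1
    memory: 512m
```

Apply it with kubectl apply -f spark-pi.yaml, then follow its progress with kubectl -n <your workspace namespace> get sparkapplication spark-pi.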
Sample override config

- Using the UI:

  ```yaml
  podLabels:
    owner: john
    team: operations
  ```
- Using the CLI (see Application Deployment for details):

  ```bash
  cat <<EOF | kubectl apply -f -
  apiVersion: v1
  kind: ConfigMap
  metadata:
    namespace: ${WORKSPACE_NAMESPACE}
    name: spark-operator-overrides
  data:
    values.yaml: |
      configInline:
        podLabels:
          owner: john
          team: operations
  EOF
  ```
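After the override is applied, the configured labels should appear on the operator's pods. A quick way to confirm, assuming the operator runs in your workspace namespace:

```bash
# Look for owner=john,team=operations in the LABELS column.
kubectl -n <your workspace namespace> get pods --show-labels
```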
Uninstall via the CLI
- Uninstall the Spark Operator AppDeployment:

  ```bash
  kubectl -n <your workspace namespace> delete AppDeployment <name of AppDeployment>
  ```
- Remove the Spark Operator Service Account:

  ```bash
  # <name of service account> is spark-operator-service-account if you didn't override the RBAC resources.
  kubectl -n <your workspace namespace> delete serviceaccounts <name of service account>
  ```
- Remove the Spark Operator CRDs:

  ```bash
  kubectl delete crds scheduledsparkapplications.sparkoperator.k8s.io sparkapplications.sparkoperator.k8s.io
  ```
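Note that deleting the CRDs also deletes every SparkApplication and ScheduledSparkApplication resource in the cluster. To see which applications would be removed along with the CRDs, you can list them first:

```bash
# Lists all Spark applications across namespaces before the CRDs are removed.
kubectl get sparkapplications,scheduledsparkapplications --all-namespaces
```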
Resources
Here are some resources to learn more about the Spark Operator: