To run your Spark workloads with Spark Operator, apply the Spark Operator-specific custom resources. The Spark Operator works with the following kinds of custom resources:

- SparkApplication
- ScheduledSparkApplication

See the Spark Operator API documentation for more details.
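For example, if you want a workload to run on a recurring schedule, a ScheduledSparkApplication wraps an ordinary SparkApplication spec in a cron schedule. The following is a minimal sketch rather than a production manifest: the name, schedule, and concurrencyPolicy values are illustrative, and the template section takes the same fields as the spec of a SparkApplication (see the full example further below):

kubectl apply -f - <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: pyspark-pi-scheduled   # hypothetical name, for illustration only
  namespace: <project namespace>
spec:
  schedule: "0 * * * *"        # standard cron syntax: run once every hour
  concurrencyPolicy: Forbid    # skip a run while the previous run is still in progress
  template:
    type: Python
    pythonVersion: "3"
    mode: cluster
    image: "gcr.io/spark-operator/spark-py:v3.1.1"
    mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
    sparkVersion: "3.1.1"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "512m"
      serviceAccount: spark-service-account
    executor:
      cores: 1
      instances: 1
      memory: "512m"
EOF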
Prerequisites
Follow these steps:
- Deploy the Spark Operator. See the Spark Operator documentation for more information.

- Ensure that the RBAC resources referenced in your custom resources exist; otherwise, the custom resources can fail. See the Spark Operator documentation for details.
- This is an example of the commands to create the RBAC resources needed in your project namespace:

export PROJECT_NAMESPACE=<project namespace>

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-service-account
  namespace: ${PROJECT_NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ${PROJECT_NAMESPACE}
  name: spark-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: ${PROJECT_NAMESPACE}
subjects:
- kind: ServiceAccount
  name: spark-service-account
  namespace: ${PROJECT_NAMESPACE}
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
EOF
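To confirm that the Role and RoleBinding grant what the driver needs, you can impersonate the new service account with kubectl auth can-i. This is an optional sanity check, not part of the original setup; it prints yes when the binding is in place:

# Verify the service account can create pods in the project namespace.
kubectl auth can-i create pods \
  --as=system:serviceaccount:${PROJECT_NAMESPACE}:spark-service-account \
  -n ${PROJECT_NAMESPACE}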
Deploy a simple SparkApplication
Follow these steps:
- Create your Project if you don’t already have one.
- Set the PROJECT_NAMESPACE environment variable to the name of your project’s namespace:

export PROJECT_NAMESPACE=<project namespace>
- Set the SPARK_SERVICE_ACCOUNT environment variable to one of the following:

  - ${PROJECT_NAMESPACE}, if you skipped the step in Prerequisites to create RBAC resources:

    # This service account is automatically created when you create a project
    # and has access to everything in the project namespace.
    export SPARK_SERVICE_ACCOUNT=${PROJECT_NAMESPACE}

  - spark-service-account, if you created the RBAC resources in Prerequisites:

    export SPARK_SERVICE_ACCOUNT=spark-service-account
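Whichever value you choose, you can verify that the service account exists in the namespace before submitting anything. This is an optional check, not part of the original steps:

kubectl -n ${PROJECT_NAMESPACE} get serviceaccount ${SPARK_SERVICE_ACCOUNT}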
- Apply the SparkApplication custom resource in your project namespace:

kubectl apply -f - <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: ${PROJECT_NAMESPACE}
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.1.1"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.1.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: ${SPARK_SERVICE_ACCOUNT}
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
EOF
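After the resource is applied, the operator submits the application and creates driver and executor pods in the project namespace. The commands below are one way to watch progress; the pyspark-pi-driver pod name assumes the default convention of appending -driver to the application name:

# Check the application state (e.g. SUBMITTED, RUNNING, COMPLETED, FAILED).
kubectl -n ${PROJECT_NAMESPACE} get sparkapp pyspark-pi

# Tail the driver log; on success, pi.py prints a line such as "Pi is roughly 3.14...".
kubectl -n ${PROJECT_NAMESPACE} logs -f pyspark-pi-driver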
Clean up
Follow these steps:
- View SparkApplications in all namespaces:

kubectl get sparkapp -A
- Delete a specific SparkApplication:

kubectl -n ${PROJECT_NAMESPACE} delete sparkapp <name of sparkapplication>
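- If you created the RBAC resources from Prerequisites and no longer need them, delete those as well. The resource names below match the example manifest from Prerequisites; skip this step if you used the project’s default service account:

kubectl -n ${PROJECT_NAMESPACE} delete rolebinding spark-role-binding
kubectl -n ${PROJECT_NAMESPACE} delete role spark-role
kubectl -n ${PROJECT_NAMESPACE} delete serviceaccount spark-service-account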