Kaptain includes a Spark Operator and installs it by default, but it is only accessible to Kaptain users. If a global Spark Operator is required on the cluster, Kaptain needs additional configuration to use it and to avoid conflicts when running Spark Applications.
Prerequisites
- A provisioned Konvoy cluster running Konvoy v1.7.0 or above.
Installing Kaptain alongside an existing Spark Operator
To avoid conflicts with an existing Spark Operator installed on the cluster, Kaptain must be installed without its bundled Spark Operator.
Create or update a configuration file named parameters.yaml that includes the following property:
installSparkOperator: "false"
To install Kaptain with the provided parameters, run the following command on the cluster:
kubectl kudo install ./kubeflow-1.3.0_1.2.0.tgz \
  --instance kaptain \
  -P parameters.yaml \
  --namespace kubeflow \
  --create-namespace
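Optionally, verify that the deployment finished before continuing. As a quick check, the KUDO CLI can report the status of the instance's deploy plan; the instance and namespace names below match the install command above:
kubectl kudo plan status --instance kaptain --namespace kubeflow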
Create a ClusterRole to allow Kubeflow users to run Spark Applications and save it to a file named spark-role.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-operator-role
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
      - scheduledsparkapplications
      - sparkapplications/status
      - scheduledsparkapplications/status
    verbs:
      - '*'
  - apiGroups:
      - scheduling.incubator.k8s.io
      - scheduling.sigs.dev
    resources:
      - podgroups
    verbs:
      - '*'
Use kubectl to apply the ClusterRole to the cluster:
kubectl apply -f spark-role.yaml
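Once the ClusterRole is aggregated into the Kubeflow edit role, users can submit Spark Applications to the external operator from their own namespaces. The following is a minimal sketch based on the upstream spark-pi example; the namespace, image, Spark version, jar path, and driver service account are placeholders and must match your Spark Operator installation:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: <user-namespace>   # placeholder: the user's profile namespace
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v3.1.1   # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-driver   # placeholder: a service account allowed to manage executor pods
  executor:
    cores: 1
    instances: 1
    memory: 512m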
Configuring Kaptain to use an external Spark Operator after the installation
By default, the Spark instance installed by Kaptain is only usable by Kaptain users. If all cluster users need access to Spark Applications, you need to install an external Spark Operator and disable the one included with Kaptain. Spark Operator installation steps are available in the KUDO Spark Operator documentation.
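Refer to the KUDO Spark Operator documentation for the supported installation procedure. As a rough sketch only, a KUDO-based installation typically resembles the following, where the operator package name, instance name, and namespace are assumptions that depend on your environment:
kubectl kudo install spark \
  --instance spark-operator \
  --namespace spark-operator \
  --create-namespace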
First, you must disable the default Spark Operator in Kaptain. Create or update a configuration file named parameters.yaml to include the following property:
installSparkOperator: "false"
Update the Kaptain instance to disable the Spark Operator:
kubectl kudo update \
  --instance kaptain \
  -P parameters.yaml \
  --namespace kubeflow
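Before installing the external Spark Operator, you can confirm that Kaptain no longer manages one, for example by listing Spark-related workloads in the kubeflow namespace (the exact resource names depend on the Kaptain release; an empty result indicates the operator was removed):
kubectl get deployments --namespace kubeflow | grep -i spark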
Re-create the ClusterRole to allow Kaptain users to run Spark Applications and save it to a file named spark-role.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-operator-role
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
      - scheduledsparkapplications
      - sparkapplications/status
      - scheduledsparkapplications/status
    verbs:
      - '*'
  - apiGroups:
      - scheduling.incubator.k8s.io
      - scheduling.sigs.dev
    resources:
      - podgroups
    verbs:
      - '*'
Use kubectl to apply the created ClusterRole to the cluster:
kubectl apply -f spark-role.yaml
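To confirm that the permissions were aggregated into Kubeflow's edit role, you can inspect the kubeflow-edit ClusterRole; its rules should now include the Spark Application resources:
kubectl get clusterrole kubeflow-edit -o yaml | grep sparkapplications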
Enabling the default Spark Operator in Kaptain
If you installed Kaptain with the Spark Operator disabled, you can enable it after the installation.
Delete the previously created ClusterRole if it exists:
kubectl delete clusterrole spark-operator-role
Create or update a configuration file named parameters.yaml to include the following property:
installSparkOperator: "true"
Update the Kaptain instance to enable the Spark Operator:
kubectl kudo update --instance kaptain -P parameters.yaml --namespace kubeflow
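After the update completes, the Spark Operator managed by Kaptain should be running again. A quick check is to list its pods in the kubeflow namespace (pod names may differ between Kaptain releases):
kubectl get pods --namespace kubeflow | grep -i spark-operator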