HDFS Kerberos
Kerberos is an authentication system that allows Spark to read and write data securely on a Kerberos-enabled HDFS cluster. As of Mesosphere Spark 2.2.0-2, long-running jobs renew their delegation tokens (authentication credentials). This section assumes you have already set up a Kerberos-enabled HDFS cluster.
Spark installation
Spark (and all Kerberos-enabled components) needs a valid krb5.conf configuration file. You can set up the Spark service to use a single krb5.conf file for all of its drivers. The krb5.conf file tells Spark how to connect to your Kerberos Key Distribution Center (KDC).
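For reference, a minimal krb5.conf might look like the following sketch; the realm name and KDC host are illustrative placeholders, not values from this guide:

```ini
[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com:88
    admin_server = kdc.example.com
  }
```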
To use Kerberos with Spark:
- Base64-encode the krb5.conf file:

  ```shell
  cat krb5.conf | base64 -w 0
  ```
- Add the encoded file as a string value in your JSON configuration file:

  ```json
  {
    "security": {
      "kerberos": {
        "enabled": true,
        "krb5conf": "<base64 encoding>"
      }
    }
  }
  ```

  The JSON configuration will typically also contain `hdfs` parameters similar to the following:

  ```json
  {
    "service": {
      "name": "kerberized-",
      "user": "nobody"
    },
    "hdfs": {
      "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
    },
    "security": {
      "kerberos": {
        "enabled": true,
        "krb5conf": "<base64_encoding>"
      }
    }
  }
  ```
  Alternatively, you can specify the properties of the krb5.conf file with more concise options:

  ```json
  {
    "security": {
      "kerberos": {
        "enabled": true,
        "kdc": {
          "hostname": "<kdc_hostname>",
          "port": <kdc_port>
        },
        "realm": "<kdc_realm>"
      }
    }
  }
  ```
- Install Spark with your custom configuration, here called options.json:

  ```shell
  dcos package install --options=/path/to/options.json
  ```
- Make sure your keytab file is in the DC/OS secret store, under a path that is accessible to the Spark service. Because the keytab is a binary file, you must also base64-encode it on DC/OS 1.10 or earlier. See Using the Secret Store for details.

- If you are using the history server, you must also configure the krb5.conf, principal, and keytab for the history server. Add the Kerberos configuration to the history server's JSON configuration file:

  ```
  {
    "service": {
      "user": "nobody",
      "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
    },
    "security": {
      "kerberos": {
        "enabled": true,
        "krb5conf": "<base64_encoding>",
        "principal": "<Kerberos principal>",   # e.g. @REALM
        "keytab": "<keytab secret path>"       # e.g. -history/hdfs_keytab
      }
    }
  }
  ```

  Alternatively, you can specify the properties of the krb5.conf file:

  ```
  {
    "security": {
      "kerberos": {
        "enabled": true,
        "kdc": {
          "hostname": "<kdc_hostname>",
          "port": <kdc_port>
        },
        "realm": "<kdc_realm>",
        "principal": "<Kerberos principal>",   # e.g. @REALM
        "keytab": "<keytab secret path>"       # e.g. -history/hdfs_keytab
      }
    }
  }
  ```
- Make sure all users have write permission to the history HDFS directory (mode 1777 sets the sticky bit so users cannot delete each other's files). From an HDFS client:

  ```shell
  hdfs dfs -chmod 1777 <history directory>
  ```
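Because both the krb5.conf and the keytab travel as base64 strings, it is worth sanity-checking the encoding locally before installing. A quick round trip, using a stand-in binary file (filenames here are illustrative):

```shell
# Create a stand-in binary "keytab" purely for illustration.
head -c 64 /dev/urandom > example.keytab

# Encode without line wrapping (-w 0), since the JSON options expect a single-line value.
base64 -w 0 example.keytab > example.keytab.b64

# Decode and confirm the bytes survive the round trip.
base64 -d example.keytab.b64 > example.keytab.decoded
cmp -s example.keytab example.keytab.decoded && echo "round trip OK"
```

Note that `-w 0` is a GNU coreutils flag; other base64 implementations may wrap output differently.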
Job submission
To authenticate to a Kerberos KDC, Spark on Mesos supports keytab files and ticket-granting tickets (TGTs). Keytabs are valid indefinitely, while tickets can expire. Keytabs are recommended, especially for long-running streaming jobs.
Controlling the krb5.conf with environment variables
If you did not specify service.security.kerberos.kdc.hostname, service.security.kerberos.kdc.port, and service.security.kerberos.realm at install time, but wish to use a templated krb5.conf on a job submission, you can do so with the following environment variables:
```shell
--conf spark.mesos.driverEnv.SPARK_SECURITY_KERBEROS_KDC_HOSTNAME=<kdc_hostname> \
--conf spark.mesos.driverEnv.SPARK_SECURITY_KERBEROS_KDC_PORT=<kdc_port> \
--conf spark.mesos.driverEnv.SPARK_SECURITY_KERBEROS_REALM=<kerberos_realm> \
```
You can also set the base64-encoded krb5.conf after install time:

```shell
--conf spark.mesos.driverEnv.SPARK_MESOS_KRB5_CONF_BASE64=<krb5.conf_base64_encoding> \
```
Setting the Spark user
By default, when Kerberos is enabled, Spark runs as the OS user corresponding to the primary of the specified Kerberos principal. For example, the principal "alice@LOCAL" maps to the user name "alice". If "alice" is not available as an OS user, either in the Docker image or on the host, specify the Spark user as root or nobody instead:

```shell
--conf spark.mesos.driverEnv.SPARK_USER=<Spark user>
```
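The mapping described above, from a principal's primary component to the default OS user, can be sketched in shell (the principal value is illustrative):

```shell
# The primary is everything before the first '/' (instance)
# or '@' (realm) separator in the principal.
principal="alice@LOCAL"
primary="${principal%%[/@]*}"
echo "$primary"   # prints: alice
```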
Keytab authentication
Submit the job with the keytab:
```shell
dcos run --submit-args="\
--kerberos-principal user@REALM \
--keytab-secret-path //hdfs-keytab \
--conf spark.mesos.driverEnv.SPARK_USER=<user> \
--conf ... --class MySparkJob <url> <args>"
```
TGT authentication
Submit the job with the ticket:
```shell
dcos run --submit-args="\
--kerberos-principal user@REALM \
--tgt-secret-path //tgt \
--conf spark.mesos.driverEnv.SPARK_USER=<user> \
--conf ... --class MySparkJob <url> <args>"
```
You can access external (non-DC/OS) Kerberos-secured HDFS clusters from Spark on Mesos.
Using Kerberos-secured Kafka
Spark can consume data from a Kerberos-enabled Kafka cluster. Connecting Spark to secure Kafka does not require special installation parameters. However, the Kafka cluster does require that the Spark driver and the Spark executors be able to access the following files:
- The client Java Authentication and Authorization Service (JAAS) file. This file is provided using Mesos URIs with:

  ```shell
  --conf spark.mesos.uris=<location_of_jaas>
  ```

- The krb5.conf for your Kerberos setup. As with HDFS, this file is provided as a base64 encoding of the file:

  ```shell
  cat krb5.conf | base64 -w 0
  ```

  The encoded file is assigned to the environment variable KRB5_CONFIG_BASE64 for the driver and the executors:

  ```shell
  --conf spark.mesos.driverEnv.KRB5_CONFIG_BASE64=<base64_encoded_string>
  --conf spark.executorEnv.KRB5_CONFIG_BASE64=<base64_encoded_string>
  ```

- The keytab containing the credentials for accessing the Kafka cluster:

  ```shell
  --conf spark.mesos.containerizer=mesos                           # required for secrets
  --conf spark.mesos.driver.secret.names=<keytab>                  # e.g. /kafka_keytab
  --conf spark.mesos.driver.secret.filenames=<keytab_file_name>    # e.g. kafka.keytab
  --conf spark.mesos.executor.secret.names=<keytab>                # e.g. /kafka_keytab
  --conf spark.mesos.executor.secret.filenames=<keytab_file_name>  # e.g. kafka.keytab
  ```

- Finally, tell Spark to use the JAAS file:

  ```shell
  --conf spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/mnt/mesos/sandbox/<jaas_file>
  --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/mnt/mesos/sandbox/<jaas_file>
  ```
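For reference, a client JAAS file for Kafka's Kerberos (SASL/GSSAPI) login typically looks like the following sketch. The keytab path and principal are placeholders, and the keytab filename must match the value passed in spark.mesos.driver.secret.filenames and spark.mesos.executor.secret.filenames:

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/mnt/mesos/sandbox/kafka.keytab"
  principal="user@REALM";
};
```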