You can customize the Docker image in which Spark runs by extending the standard Spark Docker image. In this way, you can install your own libraries, such as a custom Python library.
To customize your Docker image:
- In your Dockerfile, extend from one of the standard Spark images and add your customizations:

  ```
  FROM mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2
  RUN apt-get update && apt-get install -y python-pip
  RUN pip install requests
  ```
- Build an image from the customized Dockerfile and push it to a registry:

  ```
  docker build -t username/image:tag .
  docker push username/image:tag
  ```
- Reference the custom Docker image with the `--docker-image` option when running a Spark job:

  ```
  dcos spark run --docker-image=myusername/myimage:v1 --submit-args="http://external.website/mysparkapp.py 30"
  ```
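For the custom Python library case mentioned at the start, the same pattern extends to packages that are not on PyPI: copy the library source into the image and install it with pip. The package directory `mylib` and the path `/opt/mylib` below are hypothetical placeholders for your own library; this is a sketch, not an official image layout:

```dockerfile
# Extend the same standard Spark base image as above
FROM mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2

# Install pip (the python-pip package assumes a Python 2 base; adjust for Python 3)
RUN apt-get update && apt-get install -y python-pip

# Copy a local Python package (hypothetical path ./mylib relative to the build context)
COPY mylib /opt/mylib

# Install it into the image so Spark driver and executors can import it
RUN pip install /opt/mylib
```

Because both the Spark driver and the executors run in this image, the library is importable everywhere your job's code executes, without shipping it via `--py-files`.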