Kubeflow Fairing: Build Docker Images from within Jupyter Notebooks
Introduction
Although you can build Docker images by downloading files to your local machine and subsequently pushing the images to a container registry, it is much faster to do so without leaving Jupyter! Kubeflow Fairing makes that possible.
What You Will Learn
In this notebook you will go through the steps involved in building a Docker image from a base image (e.g. TensorFlow or PyTorch) and a custom trainer file that defines your machine learning model.
This image can be used for distributed training or hyperparameter tuning.
You can use the model code you generated with %%writefile in the MNIST with TensorFlow or MNIST with PyTorch tutorials, or a file of your own choosing.
The Docker image builder process stores temporary files in MinIO. MinIO, an open-source S3-compatible object storage service, is already included with your Kubeflow installation.
What You Need
- An executable Python file (e.g. an mnist.py trainer); you can extract it from the MNIST with PyTorch tutorial in case you do not have one handy.
- A container registry to which you have push access.
Please note that this notebook is interactive!
Prerequisites
Kubeflow Fairing must be installed:
%%sh
pip show kubeflow-fairing
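If the command above shows that the package is missing, you can install it from PyPI. The cell below is a sketch; pinning a version that matches your Kubeflow release is left to you:
%%sh
pip install kubeflow-fairing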
Prepare the training code and datasets
The examples in this tutorial require a trainer code file mnist.py and a dataset to be present in the current folder. The code and datasets are already available in the MNIST with TensorFlow and MNIST with PyTorch tutorials and can be reused here. Run one of the following shortcuts to copy the required files.
TensorFlow
%%sh
set -o errexit
jq -j '.cells[] | select (.metadata.tags[]? | contains("trainer_code")) | .source[]' tensorflow-tutorial/MNIST\ with\ TensorFlow.ipynb | sed '1d' > mnist.py
cp -R tensorflow-tutorial/datasets .
PyTorch
%%sh
set -o errexit
jq -j '.cells[] | select (.metadata.tags[]? | contains("trainer_code")) | .source[]' pytorch-tutorial/MNIST\ with\ PyTorch.ipynb | sed '1d' > mnist.py
cp -R pytorch-tutorial/datasets .
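Whichever framework you chose, you can verify that the trainer file and the datasets directory ended up in the current folder (a simple sanity check of the previous cell's output):
%%sh
ls -la mnist.py
ls datasets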
How to Create a Docker Credentials File and Kubernetes ConfigMap
For this tutorial you will need getpass to provide a password interactively without it being immediately visible. It is part of the Python standard library, so there is no need to install it; a simple import will suffice.
import json
import getpass
Please type in the container registry username by running the next cell:
docker_user = input()
Please enter the password for the container registry by executing the following cell:
docker_password = getpass.getpass()
With these details, Base64-encode the username and password and create a Kubernetes ConfigMap named docker-config, which is the name the in-cluster builder expects for the registry credentials.
from base64 import b64encode
docker_credentials = b64encode(f"{docker_user}:{docker_password}".encode()).decode()
js = {"auths": {"https://index.docker.io/v1/": {"auth": docker_credentials}}}
%store json.dumps(js) >config.json
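If the %store magic is not available in your environment, writing the file with plain Python achieves the same result (a minimal sketch using the json module imported earlier):
# Equivalent to the %store line above: write the Docker auth configuration to config.json
with open("config.json", "w") as f:
    json.dump(js, f)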
%%sh
kubectl create configmap docker-config --from-file=config.json
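You can verify that the ConfigMap was created before moving on; note that this only confirms its presence, it does not validate the credentials themselves:
%%sh
kubectl get configmap docker-config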
How to Set up MinIO
from kubeflow.fairing import constants
from kubeflow.fairing.builders.cluster.minio_context import MinioContextSource
s3_endpoint = "minio.kubeflow"
s3_endpoint_url = f"http://{s3_endpoint}"
s3_secret_id = "minio"
s3_secret_key = "minio123"
s3_region = "us-east-1"
# The default Kaniko version (0.14.0) does not work with Kubeflow Fairing
constants.constants.KANIKO_IMAGE = "gcr.io/kaniko-project/executor:v0.19.0"
minio_context_source = MinioContextSource(
endpoint_url=s3_endpoint_url,
minio_secret=s3_secret_id,
minio_secret_key=s3_secret_key,
region_name=s3_region,
)
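The endpoint above assumes that MinIO runs as the minio service in the kubeflow namespace with the default credentials. If the builder later fails to upload its build context, checking the service is a good first step (assuming kubectl access from the notebook):
%%sh
kubectl -n kubeflow get service minio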
How to Build a Docker Image
Let us set some constants to customize the code for your environment.
If you have your own container registry, please set it in REGISTRY.
IMAGE_NAME is the name of the image that will be built and pushed to the REGISTRY.
REGISTRY = "mesosphere"
IMAGE_NAME = "kubeflow"
EPOCHS = "3"
Depending on whether you want the training code to run on a node with a GPU, set USE_GPU to True or False.
USE_GPU = True
This, among other things, determines where the training pods will run:
if USE_GPU:
from kubeflow.fairing.kubernetes import utils as k8s_utils
POD_SPEC_MUTATORS = [k8s_utils.get_resource_mutator(gpu=1)]
else:
POD_SPEC_MUTATORS = None
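The get_resource_mutator helper can also request CPU and memory instead of a GPU. The snippet below is illustrative only; the cpu and memory keyword arguments are assumptions about the helper's signature and may differ between kubeflow-fairing versions:
from kubeflow.fairing.kubernetes import utils as k8s_utils

# Illustrative only: request 2 CPUs and 4 GiB of memory rather than a GPU
cpu_only_mutators = [k8s_utils.get_resource_mutator(cpu=2, memory=4)]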
Next, pick the base image depending on whether your mnist.py file is based on TensorFlow or PyTorch, and run only the matching cell below.
If you use GPUs, you need the -gpu variant that contains the necessary libraries.
TensorFlow
BASE_IMAGE = "mesosphere/kubeflow:1.2.0-tensorflow-2.4.0" + ("-gpu" if USE_GPU else "")
PyTorch
BASE_IMAGE = "mesosphere/kubeflow:1.2.0-pytorch-1.7.1" + ("-gpu" if USE_GPU else "")
This tutorial describes two options for using Kubeflow Fairing:
- If your goal is to run a distributed training job immediately from a notebook, choose Option 1. With it, you build (and push) the image as part of a deployment (e.g. a distributed training job).
- If your goal is to provide a Docker image that includes the code for distributed training or hyperparameter tuning, Option 2 is more appropriate. It does not run the job (with pre-defined arguments) but merely pushes the image to the container registry.
Both options automatically push the image to the registry specified. When you later reference an image built this way in your own job specification (as with Option 2), you set the training command and arguments explicitly in the container specification, for example:
containers:
- name: <name>
  image: <docker-image-built-with-kubeflow-fairing>
  command:
  - python
  - -u
  - mnist.py
  args:
  - --epochs
  - "7"
...
Option 1: Build-Push-Run
Multiple input files (e.g. a trainer and utilities) can be provided in the input_files list, but there can be only one executable file. Within command you must include all the mandatory arguments (i.e. epochs):
from kubeflow import fairing
import glob
fairing.config.set_preprocessor(
"python",
command=["python", "-u", "mnist.py", "--epochs", str(EPOCHS)],
input_files=["mnist.py"] + glob.glob("datasets/**", recursive=True),
path_prefix="/",
executable="mnist.py",
)
fairing.config.set_builder(
name="cluster",
registry=REGISTRY,
base_image=BASE_IMAGE,
image_name=IMAGE_NAME,
context_source=minio_context_source,
)
TensorFlow
The primary configuration options are the chief and worker counts, but feel free to peruse all available parameters of the tfjob deployer.
If your model code is based on PyTorch, please skip this section!
fairing.config.set_deployer(
name="tfjob",
worker_count=2,
chief_count=1,
pod_spec_mutators=POD_SPEC_MUTATORS,
stream_log=False,
)
fairing.config.run()
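Because stream_log is set to False, the training logs are not streamed into the notebook. You can check on the created job with kubectl (assuming it runs in the notebook's namespace):
%%sh
kubectl get tfjobs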
PyTorch
The main configuration options are the master and worker counts, but you can see all options of the pytorchjob deployer.
If your model code is based on TensorFlow, please skip this section!
fairing.config.set_deployer(
name="pytorchjob",
worker_count=2,
master_count=1,
pod_spec_mutators=POD_SPEC_MUTATORS,
)
fairing.config.run()
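You can likewise monitor the created job from outside the notebook (assuming it runs in the notebook's namespace):
%%sh
kubectl get pytorchjobs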
Option 2: Build-and-Push
You can also ‘just’ build a Docker image, that is, build and push it without running it as part of a Kubeflow Fairing deployment, with the following snippet.
from kubeflow.fairing.builders import cluster
from kubeflow.fairing.preprocessors import base as base_preprocessor
import glob
preprocessor = base_preprocessor.BasePreProcessor(
input_files=["mnist.py"] + glob.glob("datasets/**", recursive=True), path_prefix="/", executable="mnist.py"
)
cluster_builder = cluster.cluster.ClusterBuilder(
registry=REGISTRY,
base_image=BASE_IMAGE,
preprocessor=preprocessor,
image_name=IMAGE_NAME,
context_source=minio_context_source,
)
cluster_builder.build()
image_tag = cluster_builder.image_tag
print(f"Published Docker image with tag: {image_tag}")
Since the image is not run immediately, there is no need to specify a deployer; running it is instead done with a YAML specification, as in the container snippet shown earlier. The command is left out of the preprocessor here because Kubeflow Fairing does not set the entrypoint or executable command in the Docker image. You have to set it manually in the specification.