Kubeflow Pipelines: from Training to Serving
Introduction
With Kubeflow Pipelines you can build entire workflows that automate the steps involved in going from training a machine learning model to actually serving an optimized version of it. These steps can be triggered automatically by a CI/CD workflow or on demand from a command line or notebook.
Kubeflow Pipelines (kfp) comes with a user interface for managing and tracking experiments, jobs, and runs.
A pipeline is a description of a machine learning workflow, replete with all inputs and outputs.
In Kubeflow Pipelines, an experiment is a workspace where you can experiment with different configurations of your pipelines.
Experiments are a way to organize runs of jobs into logical groups.
A run is simply a single execution (instance) of a pipeline.
Kubeflow Pipelines also supports recurring runs: repeatable runs of a pipeline.
Based on a so-called run trigger, an instance of a pipeline is started periodically with its run configuration.
As of now, run triggers are time-based (i.e., not event-based).
In the UI, there is a pictorial representation of the runtime execution of a pipeline. This graph consists of one or more steps (i.e., nodes). Between steps, the directed edges (arrows) show the parent/child relationship: A → B means that B depends on A; B cannot start until A has successfully completed.
A component performs a single step in the pipeline (e.g. data ingestion, data preprocessing, data transformation, model training, hyperparameter tuning). It is analogous to a function: it has a name, (metadata) parameters and return values (interface), and a body (implementation). It must therefore be self-contained. Each component must be packaged as a Docker image. Please note that components are independently executed: they do not share the same process and cannot share in-memory data.
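For illustration, here is a minimal sketch (plain Python, with hypothetical names) of what self-contained means in practice: every import and helper the function needs lives inside the function body, and data crosses the component boundary only through files.

```python
def preprocess(data_path: str, output_path: str):
    """Hypothetical component: reads raw records, drops empty fields.

    Everything the function needs (imports, helper functions) lives inside
    the function body, because only this body ends up in the Docker image.
    """
    import json  # imports go inside the function

    def clean(record):  # auxiliary functions are defined inside, too
        return {k: v for k, v in record.items() if v is not None}

    with open(data_path) as f:
        records = json.load(f)
    with open(output_path, "w") as f:
        json.dump([clean(r) for r in records], f)

# Quick local check: components exchange data only via files, never memory.
import json, os, tempfile
workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "raw.json")
cleaned = os.path.join(workdir, "clean.json")
with open(raw, "w") as f:
    json.dump([{"label": 7, "note": None}], f)
preprocess(raw, cleaned)
with open(cleaned) as f:
    result = json.load(f)
```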
What You Will Learn
This notebook trains a simple (MNIST) model in TensorFlow and serves it with KFServing, which is a serverless inference server. What this means is that you do not have to worry about which machines it runs on, networking, autoscaling, health checks, and what have you. Instead, you can focus on what matters to you: the model and a REST API you can call for predictions. If you are familiar with Kubernetes, you can even do out-of-the-box canary deployments, in which a percentage of traffic is directed to the ‘canary (in the coal mine)’ with the latest model to ensure it functions properly before completely rolling out any (potentially problematic) updates.
If you prefer to use a more sophisticated model or a PyTorch-based one, you can check out the relevant notebooks: MNIST with TensorFlow or MNIST with PyTorch.
KFServing reads the model file from MinIO, an open-source S3-compliant object storage tool, which is already included with your Kubeflow installation.
MinIO also holds the input data set for the pipeline, so the pipeline can run without a connection to the Internet.
What You Need
This notebook.
Prerequisites
Ensure Kubeflow Pipelines is available:
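One way to verify this from within the notebook is to probe for the kfp SDK; a minimal sketch (the probe itself is an illustration, not the notebook's original check):

```python
import importlib.util

# Probe for the kfp SDK without crashing the notebook if it is missing.
kfp_available = importlib.util.find_spec("kfp") is not None
if kfp_available:
    import kfp
    print("kfp version:", kfp.__version__)
else:
    print("kfp is not installed; install it with `pip install kfp`")
```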
How to Configure Credentials
In order for KFServing to access MinIO, the credentials must be added to the default service account.
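By way of illustration, such wiring typically consists of a Secret carrying the MinIO keys plus a patched service account. The annotation keys, names, and credentials below are placeholders and vary by KFServing version; treat this as a sketch, not the exact manifest shipped with the tutorial.

```yaml
# Sketch only: annotation keys, endpoint, and credentials are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: minio-s3-secret
  annotations:
    serving.kubeflow.org/s3-endpoint: minio-service.kubeflow:9000
    serving.kubeflow.org/s3-usehttps: "0"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
secrets:
  - name: minio-s3-secret
```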
Copy input data set into MinIO using its CLI
First, configure credentials for mc, the MinIO command line client.
Use it to create a bucket, upload the data set to it, and set its access policy so that the pipeline can download it from MinIO. You may want to change the default bucket names used by this tutorial, since MinIO buckets are global resources shared between all cluster users.
How to Implement Kubeflow Pipelines Components
Components are self-contained pieces of code: Python functions.
Why? Because each component will be packaged as a Docker image. The base image must therefore contain all dependencies. Any dependencies you install manually in the notebook are invisible to the Python function once it is inside the image. The function itself becomes the entrypoint of the image, which is why all auxiliary functions must be defined inside the function. That does cause some unfortunate duplication, but it also means you do not have to worry about the mechanism of packaging.
For the pipeline, you need five components. The first four are defined as self-contained Python functions; the fifth (serving) reuses a pre-defined component:
- Download the MNIST data set
- Train the TensorFlow model
- Evaluate the trained model
- Export the trained model
- Serve the trained model
You will also need the current Kubernetes namespace, which you can retrieve using the following code:
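Inside a pod, the namespace is mounted at a well-known service-account path; a sketch of such a lookup (the "default" fallback is an addition for running the snippet outside a cluster):

```python
from pathlib import Path

# Inside a Kubernetes pod the namespace is mounted at this well-known path;
# the "default" fallback only matters when running outside a cluster.
ns_path = Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace")
NAMESPACE = ns_path.read_text().strip() if ns_path.exists() else "default"
print("Namespace:", NAMESPACE)
```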
Function arguments specified with InputPath and OutputPath are the key to defining dependencies.
For now, it suffices to think of them as the input and output of each step.
How to define dependencies is explained in the next section.
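To see why these path arguments encode dependencies, here is the idea in plain Python (no kfp; names are illustrative): one step's output path becomes the next step's input path, so the second step cannot run before the first.

```python
import os
import tempfile

def download(data_path: str):            # no input path: the first step
    with open(data_path, "w") as f:      # output path: where the step writes
        f.write("raw data")

def train(data_path: str, model_path: str):  # input path + output path
    with open(data_path) as f:               # cannot succeed before download()
        data = f.read()
    with open(model_path, "w") as f:
        f.write(f"model trained on: {data}")

# Simulate the pipeline wiring: train's input is download's output.
workdir = tempfile.mkdtemp()
data_file = os.path.join(workdir, "data.txt")
model_file = os.path.join(workdir, "model.txt")
download(data_file)
train(data_file, model_file)
```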
Component 1: Download the MNIST Data Set
Component 2: Train the Model
For both the training and evaluation, divide the integer-valued pixel values by 255 to scale all values into the [0, 1] (floating-point) range.
This function must be copied into both component functions (cf. normalize_image).
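The scaling itself is a one-liner; a NumPy sketch (the component's actual normalize_image may differ slightly):

```python
import numpy as np

# MNIST pixel values arrive as integers in [0, 255].
images = np.array([[0, 127, 255]], dtype=np.uint8)

def normalize_image(images):
    """Scale integer pixel values into the floating-point range [0, 1]."""
    return images.astype(np.float32) / 255.0

scaled = normalize_image(images)
```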
If you wish to learn more about the model code, please have a look at the MNIST with TensorFlow notebook.
Component 3: Evaluate the Model
Evaluate the model with the following Python function. The metrics metadata (loss and accuracy) is available to the Kubeflow Pipelines UI. All metadata can automatically be visualized with output viewer(s).
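The UI reads these metrics from a JSON document; a sketch of its shape (the KFP v1 output-viewer format, with placeholder values):

```python
import json

# Shape of the metrics metadata the Kubeflow Pipelines UI can render
# (KFP v1 output-viewer format); loss and accuracy values are placeholders.
metrics = {
    "metrics": [
        {"name": "loss", "numberValue": 0.35, "format": "RAW"},
        {"name": "accuracy", "numberValue": 0.91, "format": "PERCENTAGE"},
    ]
}
metrics_json = json.dumps(metrics)
```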
Component 4: Export the Model
Component 5: Serve the Model
Kubeflow Pipelines comes with a set of pre-defined components, which can be imported from a GitHub repository and reused across pipelines without having to define them every time. A copy of the KFServing component is included with the tutorial so that it works in an air-gapped environment. Here’s what the import looks like:
How to Combine the Components into a Pipeline
Note that up to this point you have not yet used the Kubeflow Pipelines SDK!
With the four components (i.e. self-contained functions) defined, wire up the dependencies with Kubeflow Pipelines.
The call components.func_to_container_op(f, base_image=img)(*args) has the following ingredients:
- f is the Python function that defines a component
- img is the base (Docker) image used to package the function
- *args lists the arguments to f
What the *args mean is best explained by going forward through the graph:
- downloadOp is the first step and has no dependencies; it therefore has no InputPath. Its output (i.e., OutputPath) is stored in data_dir.
- trainOp needs the data downloaded from downloadOp, and its signature lists data_dir (input) and model_dir (output). It depends on downloadOp.output (i.e., the previous step’s output) and stores its own outputs in model_dir, which can be used by another step. downloadOp is the parent of trainOp, as required.
- evaluateOp's function takes three arguments: data_dir (i.e., downloadOp.output), model_dir (i.e., trainOp.output), and metrics_path, which is where the function stores its evaluation metrics. That way, evaluateOp can only run after the successful completion of both downloadOp and trainOp.
- exportOp runs the function export_model, which accepts five parameters: model_dir, metrics, export_bucket, model_name, and model_version. Where do you get the model_dir from? It is nothing but trainOp.output. Similarly, metrics is evaluateOp.output. The remaining three arguments are regular Python arguments that are static for the pipeline: they do not depend on any step’s output being available. Hence, they are defined without using InputPath.
- kfservingOp is loaded from the external component, and its order of execution must be specified explicitly with kfservingOp.after(exportOp), which assigns exportOp as its parent.

Just in case it isn’t obvious: this will build the Docker images for you. Each image is based on BASE_IMAGE and includes the Python functions as executable files. Each component can use a different base image, though. This may come in handy if you want to have reusable components for automatic data or model analysis (e.g., to investigate bias).
Note that func_to_container_op also lets you specify packages to be installed at component runtime (packages_to_install) and additional code to execute before the function code (extra_code).
For GPU support, please add the “-gpu” suffix to the base image.
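For instance (the tags are illustrative; pin the TensorFlow version you actually train with):

```python
# CPU base image (illustrative tag; pin the version you train with).
BASE_IMAGE = "tensorflow/tensorflow:2.5.0"
# GPU variant: the same tag with the "-gpu" suffix.
# BASE_IMAGE = "tensorflow/tensorflow:2.5.0-gpu"
```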
Is that it? Not quite!
That leaves the pipeline itself for you to define.
The train_and_serve function defines the dependencies, but you must use the Kubeflow Pipelines domain-specific language (DSL) to register the pipeline with its five components:
Submit the pipeline directly from the notebook:
The pipeline is now running. Wait for it to complete successfully. In the meantime you can use the links above to see the pipelines UI.
The graph will look like this:
If there are any issues with the pipeline definition, this is where they would flare up. Until you submit it, you will not know if your pipeline definition is correct.
How to Predict with the Inference Server
The simplest way to check that the inference server is up and running is with curl (pre-installed on the cluster).
To do so, define a few helper functions for plotting and displaying images:
The inference server expects a JSON payload:
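KFServing’s v1 data plane follows the TensorFlow Serving REST format; a sketch of building such a payload for a single 28×28 MNIST image (the exact input shape depends on how the model was exported, and the all-zero image is a placeholder):

```python
import json

# One 28x28 MNIST image, already normalized to [0, 1] (all-zero placeholder).
image = [[0.0] * 28 for _ in range(28)]

# The v1 data plane expects {"instances": [...]}, one entry per input.
payload = json.dumps({"instances": [image]})
```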
The probabilities for each class (0-9) are shown in the predictions response.
The model believes the image shows a “9”, which indeed it does!
For more details on the URL, please check out this example.
This tutorial includes code from the MinIO Project (“MinIO”), which is © 2015-2021 MinIO, Inc. MinIO is made available subject to the terms and conditions of the GNU Affero General Public License 3.0. The complete source code for the versions of MinIO packaged with Kaptain 2.0.0 are available at these URLs: https://github.com/minio/minio/tree/RELEASE.2021-02-14T04-01-33Z and https://github.com/minio/minio/tree/RELEASE.2022-02-24T22-12-01Z