scikit-learn
models with Kaptain SDK
Developing and deploying Introduction
This notebook shows how to build and deploy ML models by using different popular machine learning frameworks. This example will be based on scikit-learn
library, but because model building will be handled by Kubernetes Job, you can choose to use any other ML library (that can be installed with pip
). Currently, the SDK supports XGBoost, Onnx and LightGBM as well as PyTorch and TensorFlow for model deployment. See Kaptain documentation for more info.
To perform distributed training on a cluster’s resources, conduct experiments with multiple parallel trials to obtain the best hyperparameters, and deploying a trained or tuned model typically requires additional steps, such as building a Docker image and providing framework-specific specifications for Kubernetes. This places the burden on each data scientist to learn all the details of all the components.
Instead of doing all the work, using the Kaptain SDK, you can train
, tune
, and deploy
from within a notebook without having to worry about framework specifics, Kubeflow-native SDKs, or even thinking about Kubernetes
What You Will Learn
The Kaptain SDK provides a data science-friendly user experience from within a notebook that hides all the specifics, and focuses on the model as the main abstraction. A model can be trained, tuned, deployed, and tracked.
The example is based on scikit-learn, but it works equally for other data science frameworks. The original scikit-learn example code can be found in the tutorials Recognizing hand-written digits.
The SDK relies on MinIO, an open-source S3-compliant object storage tool, that is already included with your Kaptain installation.
What You Will Need
All you need is this notebook.
Prerequisites
Before proceeding, check you are using the correct notebook image, that is, scikit-learn is available:
How to Create a Docker Credentials File and Kubernetes Secret
For the tutorial you will need getpass
to provide a password interactively without it being immediately visible.
It is a standard Python library, so there is no need to install it.
A simple import
will suffice.
Please type in the container registry username by running the next cell:
Enter the password for the Docker container registry when prompted by executing the following code:
With these details, base64-encode the username and password and create a Docker configuration file as follows:
Define the trainer file
This step writes a file that will be run inside of the training container rather than than inside of the notebook kernel. It takes the form of a parameterized training script. The training parameters default inside the training program with the user-defined values being passed in when the container is run. The script is responsible for accessing any datasets that are required for the training run into the container.
To use the Kaptain SDK, you need to add code to do three things to the original model code:
- In the block at the end of main() to save the trained model to the cluster’s built-in object storage, MinIO.
- Also in the block at the end of main(), to record the metrics of interest.
- Parameters that will change need to be retrieved from the CLI
Describe the model
The central abstraction of the Kaptain SDK is a Model
class that encapsulates all the configuration and high-level APIs required for the model training, tuning, and serving. Prior to creating an instance of the Model
class, let’s consider an example where we need to specify additional dependecies required for model training, tuning and serving, and also provide minimal required configuration for the model server.
Model dependencies can be provided via pip requirements.txt
. For example:
Below is an example of the minimally required serving configuration for a scikit-learn model. To learn more about advanced configuration options, consult the Kaptain SDK documentation.
Finally, a Model
instance requires providing a base Docker image (base_image
) to use for model training, a target Docker repository and image name (image_name
) to publish trainer code to (packed in a Docker image too), and a list of additional files that are required for the model code or the trainer code (extra_files
).
The id
is a unique identifier of the model.
The identifier shown indicates it is an MNIST model in development.
The fields member
and description
are for humans: to inform your colleagues and yourself of what the model is about.
version
is the models’ own version, so it is easy to identify models by their iteration.
The framework
and framework_version
are for the time being human-friendly metadata.
Since a Docker image is built in the background when you train
or tune
a Model
instance, a base_image
has to be provided.
The name of the final image image_name
must be provided with or without image tag.
If the tag is omitted, a concatenation of model id
, framework
, and framework_version
is used.
The main_file
specifies the name of file that contains the model code, that is, trainer.py
for the purposes of this tutorial.
To specify additional Python packages required for training or serving, provide the path to your requirements file via the requirements
parameter of the Model
class. Details on the format of the requirements file can be found in the pip official documentation.
More details are available with ?Model
.
Train/tune the model
Training the model is as easy as the following function call:
The default gpus
argument is 0, but it is shown here as an explicit option.
Use ?Model.train
to see all supported arguments.
The low accuracy of the model is to make the demonstration of distributed training quicker, as in the next section the model’s hyperparameters are optimized anyway.
Run an experiment
Verify the Model is Exported to MinIO
Deploy the Model
Model is ready to be deployed. A trained model can be deployed as an auto-scalable inference service with a single call. When providing additional serving dependencies, make sure to specify sufficient resources (mostly memory) in order for the server to install them without issues.
In this tutorial, we will be using V2 protocol for sklearn predictor.
Test the Model Endpoint
We can see that the class with label “7” has the largest probability; the neural network correctly predicts the image to be a number 7.
This tutorial includes code from the MinIO Project (“MinIO”), which is © 2015-2021 MinIO, Inc. MinIO is made available subject to the terms and conditions of the GNU Affero General Public License 3.0. The complete source code for the versions of MinIO packaged with Kaptain 2.0.0 are available at these URLs: https://github.com/minio/minio/tree/RELEASE.2021-02-14T04-01-33Z and https://github.com/minio/minio/tree/RELEASE.2022-02-24T22-12-01Z