Table of Contents
- kaptain.config
- kaptain.envs
- kaptain.exceptions
- kaptain.model
- kaptain.model.models
- kaptain.model.frameworks
- kaptain.model.states
- kaptain.hyperparameter
- kaptain.hyperparameter.algorithms
- kaptain.hyperparameter.domains
- kaptain.platform.config
- kaptain.platform.config.provider
- kaptain.platform.config.certificates
- kaptain.platform.config.docker
- kaptain.platform.config.defaults
- kaptain.platform.config.s3
kaptain.config
Config Objects
class Config()
__init__
| __init__(docker_config_provider: ConfigurationProvider, storage_config_provider: ConfigurationProvider, docker_registry_url: Optional[str] = None, docker_registry_certificate_provider: Optional[ConfigurationProvider] = None, base_dir: str = os.getcwd(), base_model_storage_uri: str = "s3://kaptain/models")
Encapsulates platform-specific configuration such as access credentials or AWS endpoints. Config is provided as an argument to the Model and is used to instantiate concrete implementations of lower-level components based on its properties so that users work with a configuration-based API when it comes to fine-tuning the workloads.
Arguments:
docker_config_provider
: The configuration provider for the Docker registry.
storage_config_provider
: The configuration provider for blob storage access. Currently, only S3 and MinIO are supported.
docker_registry_url
: Private custom Docker registry URL to use with provided TLS certificates.
docker_registry_certificate_provider
: The configuration provider for the Docker registry certificate.
base_dir
: Base directory to use for referencing relative file paths of model files. Defaults to the current working directory.
base_model_storage_uri
: Name of a bucket in the remote storage (MinIO or S3) to store the model. Defaults to “s3://kaptain/models”.
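To make the shape of base_model_storage_uri concrete, here is a minimal sketch that splits an S3-style URI into its bucket and key prefix using only the standard library. The helper name split_storage_uri is hypothetical, and whether Kaptain parses the URI this way internally is an assumption.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_storage_uri(uri: str) -> Tuple[str, str]:
    """Split an S3-style URI into (bucket, key prefix).

    Hypothetical helper for illustration; not part of the Kaptain SDK.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/prefix URI, got {uri!r}")
    # The bucket is the URI authority; the rest is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_storage_uri("s3://kaptain/models")
print(bucket, prefix)
```

With the default value, the bucket is "kaptain" and models would be stored under the "models" prefix.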
kaptain.envs
_M Objects
class _M(types.ModuleType)
Environment variables that the user can change at any time.
VERBOSE
| @property
| VERBOSE() -> bool
Backed by the environment variable KAPTAIN_SDK_VERBOSE. When set, pod logs are shown unless this is explicitly overridden.
VERBOSE
| @VERBOSE.setter
| VERBOSE(value: bool) -> None
Sets the VERBOSE flag (environment variable KAPTAIN_SDK_VERBOSE), which enables showing pod logs unless explicitly overridden.
DEBUG
| @property
| DEBUG() -> bool
Backed by the environment variable KAPTAIN_SDK_DEBUG. When set, the stack trace is shown for uncaught exceptions.
LOG_TIMEFORMAT
| @property
| LOG_TIMEFORMAT() -> str
Backed by the environment variable KAPTAIN_SDK_LOG_TIMEFORMAT, which sets the time format used in logs.
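The properties above follow a common pattern: a property that reads its value from the process environment each time it is accessed, so changes take effect immediately. The sketch below illustrates that pattern with a hypothetical EnvSettings class. The accepted truthy spellings and the default time format are assumptions, not documented Kaptain behavior.

```python
import os


class EnvSettings:
    """Hypothetical stand-in for environment-backed settings; not the Kaptain class."""

    # Assumption: which spellings count as "enabled".
    _TRUTHY = {"1", "true", "yes", "on"}

    @property
    def VERBOSE(self) -> bool:
        # Read on every access so a change by the user is picked up at once.
        raw = os.environ.get("KAPTAIN_SDK_VERBOSE", "")
        return raw.strip().lower() in self._TRUTHY

    @VERBOSE.setter
    def VERBOSE(self, value: bool) -> None:
        os.environ["KAPTAIN_SDK_VERBOSE"] = "true" if value else "false"

    @property
    def LOG_TIMEFORMAT(self) -> str:
        # Assumption: the fallback format used when the variable is unset.
        return os.environ.get("KAPTAIN_SDK_LOG_TIMEFORMAT", "%Y-%m-%d %H:%M:%S")


env = EnvSettings()
env.VERBOSE = True
print(env.VERBOSE)
```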
kaptain.exceptions
InvalidModelProperty Objects
class InvalidModelProperty(Exception)
Raised when a model property is None or blank.
UndefinedModelProperty Objects
class UndefinedModelProperty(Exception)
Raised when a model property is not defined.
UnsupportedModelFrameworkException Objects
class UnsupportedModelFrameworkException(Exception)
Raised when a model framework is not supported.
UnsupportedAlgorithmException Objects
class UnsupportedAlgorithmException(Exception)
Raised when a hyperparameter tuning algorithm is not supported.
UnsupportedModelDeploymentException Objects
class UnsupportedModelDeploymentException(Exception)
Raised when a model deployment is not supported.
UnsupportedMetricsTypeException Objects
class UnsupportedMetricsTypeException(Exception)
Raised when a metric type is not supported.
ModelDeploymentException Objects
class ModelDeploymentException(Exception)
Raised in case of a model deployment failure.
ModelValidationException Objects
class ModelValidationException(Exception)
Raised in case the model configuration properties are missing or model is in a state that is unsuitable for the operation invoked on the model.
ImageBuildException Objects
class ImageBuildException(Exception)
Raised in case of an image build failure.
WorkloadDeploymentError Objects
class WorkloadDeploymentError(Exception)
Raised in case of a workload deployment failure, e.g. failed scheduling.
kaptain.model
kaptain.model.models
Model Objects
class Model()
__init__
| __init__(id: str, name: str, description: str, version: str, framework: str, framework_version: str, main_file: str, image_name: str, base_image: str, extra_files: Optional[List[str]] = None, requirements: Optional[str] = None, labels: Optional[List[str]] = None, config: Optional[Config] = None)
A representation of a machine learning model.
When the model is created for the first time, its internal revision is set to a random UUID and its internal state is “untrained”. Once the model is trained or tuned, its state will be updated accordingly, hyperparameter values set, its revision refreshed, and it can be saved or deployed. Each action (train, tune, deploy) alters the revision and is stored in the model tracking database.
Arguments:
id
: Unique identifier of the model, e.g. “dev/mnist”. It is recommended to include the stage of the model (e.g. dev/prod) in the name to make it easier to filter models under active development and in production.
name
: Short name of the model, e.g. “MNIST”. This name is visible in the model tracking database.
description
: Description of the model, e.g. “Digit recognition for MNIST data set”. This description is visible in the model tracking database.
version
: Model version, e.g. “4.5”.
main_file
: Main (Python) file that contains the executable model code, e.g. “trainer.py”.
image_name
: Name of the repository to push the resulting image to, e.g. “kaptain/mnist”. Can also contain an image tag, e.g. “kaptain/mnist:0.0.1-tensorflow-2.2.0”.
extra_files
: Auxiliary files, e.g. [“utils.py”, “data_loader.py”].
requirements
: Additional pip requirements, e.g. [“numpy”, “nltk==3.5”].
framework
: Machine learning library or framework used for the model, e.g. “tensorflow”.
framework_version
: Version of the machine learning library or framework used by the model, e.g. “2.3.2”.
base_image
: Base container image, e.g. “tensorflow-2.3.2”.
labels
: Custom labels for deployment-related metadata, e.g. “dev/mnist-tensorflow”.
config
: Configuration object used for configuring access to Docker registries and blob storage.
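The revision and state lifecycle described above can be pictured with a small stand-in class. This is only a sketch: ModelSketch is hypothetical, the state name "trained" is an assumption (only "untrained" is documented here), and no tracking database is involved.

```python
import uuid
from typing import Any, Dict, Optional


class ModelSketch:
    """Hypothetical stand-in showing revision/state bookkeeping; not the Kaptain Model."""

    def __init__(self) -> None:
        # A new model starts with a random UUID revision and the "untrained" state.
        self.revision = str(uuid.uuid4())
        self.state = "untrained"
        self.hyperparameters: Optional[Dict[str, Any]] = None

    def _record(self, state: str) -> None:
        # Every action updates the state and refreshes the revision.
        self.state = state
        self.revision = str(uuid.uuid4())

    def train(self, hyperparameters: Dict[str, Any]) -> None:
        # Training stores the static hyperparameter values it was given.
        self.hyperparameters = dict(hyperparameters)
        self._record("trained")  # assumption: state name


model = ModelSketch()
before = model.revision
model.train({"learning_rate": 0.01})
print(model.state, model.revision != before)
```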
hyperparameters
| @property
| hyperparameters() -> Optional[Dict[str, Any]]
Hyperparameters of the model as defined through an action:
- Train: uses the static values provided to the training procedure.
- Tune: extracts the recommended values after running multiple experiments.
build
| build(verbose: Optional[bool] = None) -> None
Builds a Docker image with the model training code and dependencies and publishes it to the registry specified in the configuration. A label with the checksum of the model’s content is included in the image. The image is rebuilt only if an image with the same name and checksum is not already present in the registry.
Arguments:
verbose
: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).
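The checksum-based rebuild rule can be sketched as follows. The hash function (SHA-256), the way file names and contents are combined, and the in-memory set standing in for the registry are all assumptions made for illustration; the reference does not state how Kaptain computes the checksum.

```python
import hashlib
from typing import Dict, Set, Tuple


def content_checksum(files: Dict[str, bytes]) -> str:
    """Checksum over file names and contents, independent of dict order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator so names and contents cannot run together
        digest.update(files[name])
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rebuild(image_name: str, checksum: str, registry: Set[Tuple[str, str]]) -> bool:
    """Rebuild only if no image with the same name and checksum is known."""
    return (image_name, checksum) not in registry


files = {"trainer.py": b"print('train')", "utils.py": b""}
registry = {("kaptain/mnist", content_checksum(files))}
print(needs_rebuild("kaptain/mnist", content_checksum(files), registry))

files["trainer.py"] = b"print('train v2')"
print(needs_rebuild("kaptain/mnist", content_checksum(files), registry))
```

The first check finds a matching image and skips the build; after the training code changes, the checksum differs and a rebuild is required.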
train
| train(*args: str, hyperparameters: Dict[str, Any], gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, workers: int = 2, verbose: Optional[bool] = None, **kwargs: str) -> bool
Train a model in a distributed manner.
Simple / advanced resource API
Resources may be specified via the ‘simple’ resource parameters:
model.train(workers=1, cpu=1, memory="2G", gpus=0)
In this case, the model training process will have both the request and limit set for all resource parameters.
More fine-grained resource specification is possible via the ‘resources’ parameter:
model.train(workers=workers, resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))
It is illegal to specify both the ‘resources’ parameter and any of the ‘simple’ resource parameters (gpus, memory, cpu).
Arguments:
args
: Arguments to be passed to the training function.
hyperparameters
: Dictionary of hyperparameter values.
workers
: Number of parallel workers to use (default: 2).
gpus
: Number of GPUs to use (default: 0).
memory
: Amount of memory for each worker (optional).
cpu
: Number of CPUs to use for each worker (optional).
resources
: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).
verbose
: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).
kwargs
: Keyword arguments to be passed to the training function.
Returns:
True if successful, otherwise False
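The rule that the simple and advanced resource APIs are mutually exclusive, and that the simple parameters set both request and limit, can be sketched as below. ResourcesSketch and resolve_resources are hypothetical names; the fields cpu_request, memory_limit and gpu_limit follow the examples above, while cpu_limit and memory_request are assumed by analogy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourcesSketch:
    """Hypothetical stand-in for the Resources type; not the Kaptain class."""

    cpu_request: Optional[str] = None
    cpu_limit: Optional[str] = None  # assumed field name
    memory_request: Optional[str] = None  # assumed field name
    memory_limit: Optional[str] = None
    gpu_limit: Optional[int] = None


def resolve_resources(
    gpus: Optional[int] = None,
    cpu: Optional[str] = None,
    memory: Optional[str] = None,
    resources: Optional[ResourcesSketch] = None,
) -> ResourcesSketch:
    simple_given = any(value is not None for value in (gpus, cpu, memory))
    if resources is not None and simple_given:
        raise ValueError(
            "use either 'resources' or the simple parameters (gpus, cpu, memory), not both"
        )
    if resources is not None:
        return resources
    # Simple API: request and limit are set to the same value.
    return ResourcesSketch(
        cpu_request=cpu,
        cpu_limit=cpu,
        memory_request=memory,
        memory_limit=memory,
        gpu_limit=gpus,
    )


print(resolve_resources(cpu="1", memory="2G", gpus=0))
```

The same rule applies to tune and deploy, which accept the same resource parameters.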
tune
| tune(*args: str, hyperparameters: Dict[str, Domain], objectives: List[str], objective_goal: Optional[float] = None, objective_type: str = "maximize", workers: int = 2, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, trials: int = 16, parallel_trials: int = 2, failed_trials: int = 4, algorithm: Optional[str] = Algorithm.RANDOM.value, algorithm_setting: Optional[dict] = None, verbose: Optional[bool] = None, **kwargs: str) -> bool
Tunes a model with parallel trials and possibly distributed trials.
Simple / advanced resource API
Resources may be specified via the ‘simple’ resource parameters:
model.tune(hyperparameters=params, objectives=objectives, cpu=1, memory="2G", gpus=0)
In this case, the deployed tuning process will have both the request and limit set for all resource parameters.
More fine-grained resource specification is possible via the ‘resources’ parameter:
model.tune(
    hyperparameters=params,
    objectives=objectives,
    resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))
It is illegal to specify both the ‘resources’ parameter and any of the ‘simple’ resource parameters (gpus, memory, cpu).
Arguments:
args
: Arguments to be passed to the training/tuning function.
hyperparameters
: Dictionary of hyperparameters and their specified domains.
objectives
: List of metrics to track in order of importance. The first one listed is used in conjunction with the objective goal and type.
objective_goal
: Main objective’s goal, which when reached causes the tuning to stop. The main objective is the first element in objectives. If None, the tuning will continue until the maximum number of trials has been reached.
objective_type
: Whether to “maximize” or “minimize” the main objective’s value (default: maximize).
workers
: Number of parallel workers to use for each trial (default: 2).
gpus
: Number of GPUs to use (default: 0).
memory
: Amount of memory for each worker (optional).
cpu
: Number of CPUs to use for each worker (optional).
resources
: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).
trials
: Maximum number of trials (default: 16).
parallel_trials
: Maximum number of trials to run in parallel (default: 2).
failed_trials
: Maximum number of failed trials before hyperparameter tuning stops (default: 4).
algorithm
: Algorithm to use for hyperparameter search (default: random).
algorithm_setting
: Algorithm settings. Please see https://www.kubeflow.org/docs/components/hyperparameter-tuning/experiment/ for details.
verbose
: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).
kwargs
: Keyword arguments to be passed to the training/tuning function.
Returns:
True if successful, otherwise False
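The interplay of trials, failed_trials, objective_goal and objective_type can be illustrated with a simplified, sequential simulation. This is a sketch under stated assumptions: run_trials is a hypothetical function, trials are not run in parallel, and whether tuning stops when the failure count reaches or exceeds failed_trials is an assumption.

```python
from typing import Callable, Dict, Optional, Union


def run_trials(
    evaluate: Callable[[int], Optional[float]],
    trials: int = 16,
    failed_trials: int = 4,
    objective_goal: Optional[float] = None,
    objective_type: str = "maximize",
) -> Dict[str, Union[float, int, str, None]]:
    """Simulate the stopping rules; `evaluate` returns None for a failed trial."""
    if objective_type not in ("maximize", "minimize"):
        raise ValueError("objective_type must be 'maximize' or 'minimize'")
    maximize = objective_type == "maximize"
    best: Optional[float] = None
    failures = 0
    trials_run = 0
    reason = "max_trials"
    for index in range(trials):
        value = evaluate(index)
        trials_run += 1
        if value is None:
            failures += 1
            if failures >= failed_trials:  # assumption: stop once the count is reached
                reason = "too_many_failures"
                break
            continue
        if best is None or (value > best if maximize else value < best):
            best = value
        if objective_goal is not None:
            reached = best >= objective_goal if maximize else best <= objective_goal
            if reached:
                reason = "goal_reached"
                break
    return {"best": best, "trials_run": trials_run, "stopped_because": reason}


# Accuracy improves with each trial; tuning stops once 0.9 is reached.
print(run_trials(lambda i: 0.5 + 0.1 * i, objective_goal=0.9))
```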
deploy
| deploy(model_uri: Optional[str] = None, autoscale: int = 2, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, replace: bool = False, **kwargs: str) -> bool
Deploys a model.
Simple / advanced resource API
Resources may be specified via the ‘simple’ resource parameters:
model.deploy(model_uri=uri, cpu=1, memory="2G", gpus=0)
In this case, the deployed model process will have both the request and limit set for all resource parameters.
More fine-grained resource specification is possible via the ‘resources’ parameter:
model.deploy(model_uri=uri, resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))
It is illegal to specify both the ‘resources’ parameter and any of the ‘simple’ resource parameters (gpus, memory, cpu).
Arguments:
model_uri
: URI of the saved model to be loaded. If None, the default location managed by Kaptain is chosen based on the most recent state of the model.
autoscale
: Target concurrency (default: 2).
gpus
: Number of GPUs to use (default: 0).
memory
: Amount of memory for each worker (optional).
cpu
: Number of CPUs to use for each worker (optional).
resources
: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).
replace
: Safety flag to avoid accidental redeployment of the model. If True, the previously deployed model will be replaced. If False, an error will be logged in case the model had been previously deployed.
kwargs
: Keyword arguments for the deployment.
Returns:
True if successful, otherwise False
deploy_canary
| deploy_canary(canary_traffic_percentage: int, model_uri: Optional[str] = None, **kwargs: str) -> None
Deploys a model in a canary with a pre-determined percentage of traffic. A canary deployment allows a model to be run in parallel with a baseline or previous model revision. This allows traffic to be split, so the latest revision can be checked for possible issues with model (e.g. compared to the baseline) or system (e.g. latency) performance. To deploy a model to the canary, a previously deployed model revision must exist.
To deploy canary with 30 percent traffic:
model.deploy_canary(canary_traffic_percentage=30)
To change the canary traffic percentage to 50 (half the traffic):
model.deploy_canary(canary_traffic_percentage=50)
To deploy canary with 30 percent traffic and specified saved model location:
model.deploy_canary(canary_traffic_percentage=30, model_uri=uri)
To change the canary traffic percentage to 50 (half the traffic) for a model deployed from a specified saved location:
model.deploy_canary(canary_traffic_percentage=50, model_uri=uri)
Arguments:
canary_traffic_percentage
: The percentage of traffic to route to the canary model.
model_uri
: URI of the saved model to be loaded. If None, the default location managed by Kaptain is chosen based on the most recent state of the model.
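The traffic split implied by canary_traffic_percentage is simple arithmetic: the baseline receives whatever the canary does not. The sketch below uses a hypothetical traffic_split helper, and the accepted range of 0 to 100 is an assumption.

```python
from typing import Dict


def traffic_split(canary_traffic_percentage: int) -> Dict[str, int]:
    """Return the share of traffic for the canary and the baseline revision."""
    if not 0 <= canary_traffic_percentage <= 100:
        raise ValueError("canary_traffic_percentage must be between 0 and 100")
    return {
        "canary": canary_traffic_percentage,
        "baseline": 100 - canary_traffic_percentage,
    }


print(traffic_split(30))
print(traffic_split(50))
```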
rollback_canary
| rollback_canary() -> None
Undeploy the model from canary and switch 100% of traffic to the previously deployed baseline model.
Raises:
ModelDeploymentException
: if the canary deployment doesn’t exist.
promote_canary
| promote_canary() -> None
Promote the model from canary to serve 100% of traffic.
Raises:
ModelDeploymentException
: if the canary deployment doesn’t exist.
undeploy
| undeploy() -> None
Removes existing deployment and canary deployment of a model.
Raises:
ModelDeploymentException
: in case the model was not previously deployed.
log_data
| log_data(name: str, uri: str, description: Optional[str] = None, features: Optional[List[str]] = None, version: Optional[str] = None) -> None
Logs an input data set to a model execution.
Arguments:
name
: Name of the data set.
uri
: URI of the data set.
description
: Optional description.
features
: List of features used.
version
: Optional version of the data set.
log_metrics
| log_metrics(metrics: dict, metrics_type: str, uri: Optional[str] = None) -> None
Logs model evaluation metrics to a model execution.
Arguments:
metrics
: A dictionary of metric names and their values, e.g. {"accuracy": 0.95, "auc": 0.975}.
metrics_type
: Evaluation type of the metric: training, testing, validation, or production (for deployed models).
uri
: Optional URI to the metrics (e.g. log directory).
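As a rough illustration of the expected inputs, the sketch below checks a metrics dictionary and its evaluation type before logging. The helper check_metrics is hypothetical; the four type names come from the argument description above, and raising ValueError (rather than an SDK exception) is an assumption made to keep the sketch self-contained.

```python
from typing import Dict

# The four evaluation types listed in the argument description.
METRICS_TYPES = ("training", "testing", "validation", "production")


def check_metrics(metrics: Dict[str, float], metrics_type: str) -> Dict[str, float]:
    """Validate a metrics payload; hypothetical helper, not part of the SDK."""
    if metrics_type not in METRICS_TYPES:
        raise ValueError(f"unsupported metrics type: {metrics_type!r}")
    if not metrics:
        raise ValueError("metrics must contain at least one entry")
    # Normalize values to float so they serialize consistently.
    return {name: float(value) for name, value in metrics.items()}


print(check_metrics({"accuracy": 0.95, "auc": 0.975}, "validation"))
```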
kaptain.model.frameworks
ModelFramework Objects
class ModelFramework(Enum)
of
| @staticmethod
| of(framework: Optional[str]) -> Optional["ModelFramework"]
Converts a framework (string) to a ModelFramework enum.
Arguments:
framework
: Model framework or library.
Returns:
ModelFramework enum if the framework is supported.
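The string-to-enum conversion pattern used by of can be sketched with a stand-in enum. FrameworkSketch and its members are hypothetical (the supported frameworks are not listed in this reference), and returning None for an unsupported name, rather than raising, is an assumption based on the Optional return type.

```python
from enum import Enum
from typing import Optional


class FrameworkSketch(Enum):
    """Hypothetical stand-in; the members are assumptions, not the real list."""

    TENSORFLOW = "tensorflow"
    PYTORCH = "pytorch"

    @staticmethod
    def of(framework: Optional[str]) -> Optional["FrameworkSketch"]:
        if framework is None:
            return None
        normalized = framework.strip().lower()
        for member in FrameworkSketch:
            if member.value == normalized:
                return member
        # Assumption: unsupported names yield None.
        return None


print(FrameworkSketch.of("TensorFlow"))
print(FrameworkSketch.of("unknown"))
```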
kaptain.model.states
kaptain.hyperparameter
kaptain.hyperparameter.algorithms
Algorithm Objects
class Algorithm(Enum)
of
| @staticmethod
| of(algorithm: Optional[str]) -> Optional["Algorithm"]
Converts a hyperparameter tuning algorithm (string) to an Algorithm enum.
Arguments:
algorithm
: Name of the hyperparameter tuning algorithm.
Returns:
Algorithm enum if the algorithm is supported.
kaptain.hyperparameter.domains
Double Objects
class Double(Domain)
__init__
| __init__(min: float, max: float)
Defines a floating-point (double) hyperparameter with domain [min, max].
Arguments:
min
: Minimum value
max
: Maximum value
Integer Objects
class Integer(Domain)
__init__
| __init__(min: int, max: int)
Defines an integer (int) hyperparameter with domain [min, max].
Arguments:
min
: Minimum value
max
: Maximum value
Discrete Objects
class Discrete(Domain)
Defines a discrete hyperparameter with a list of allowed floating-point values.
Arguments:
values
: List of allowed floating-point values
Categorical Objects
class Categorical(Domain)
Defines a categorical hyperparameter with a list of allowed string values.
Arguments:
values
: List of allowed string values
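To show how the four domain types describe a search space, the sketch below defines stand-in classes and draws one random configuration from them, in the spirit of random search. All class names and the sample method are hypothetical; the real Domain classes are passed to tune and sampled by the tuning backend.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class DoubleSketch:
    min: float
    max: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.min, self.max)


@dataclass(frozen=True)
class IntegerSketch:
    min: int
    max: int

    def sample(self, rng: random.Random) -> int:
        return rng.randint(self.min, self.max)  # both bounds inclusive


@dataclass(frozen=True)
class DiscreteSketch:
    values: List[float]

    def sample(self, rng: random.Random) -> float:
        return rng.choice(self.values)


@dataclass(frozen=True)
class CategoricalSketch:
    values: List[str]

    def sample(self, rng: random.Random) -> str:
        return rng.choice(self.values)


def sample_trial(space: Dict[str, Any], seed: int) -> Dict[str, Any]:
    """Draw one value per hyperparameter from its domain."""
    rng = random.Random(seed)
    return {name: domain.sample(rng) for name, domain in space.items()}


space = {
    "learning_rate": DoubleSketch(0.001, 0.1),
    "epochs": IntegerSketch(5, 15),
    "dropout": DiscreteSketch([0.1, 0.3, 0.5]),
    "optimizer": CategoricalSketch(["sgd", "adam"]),
}
print(sample_trial(space, seed=0))
```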
kaptain.platform.config
kaptain.platform.config.provider
ConfigurationProvider Objects
class ConfigurationProvider(ABC)
The ConfigurationProvider interface defines high-level functions for translating user-provided credentials for a Docker registry or cloud buckets into Kubernetes Secrets required for distributed building, training, tuning, and serving components.
FileBasedConfigurationProvider Objects
class FileBasedConfigurationProvider(ConfigurationProvider)
The FileBasedConfigurationProvider defines a factory method for creating instances of ConfigurationProvider from a provided configuration file specific to the concrete implementation.
EnvironmentVariableConfigurationProvider Objects
class EnvironmentVariableConfigurationProvider(ConfigurationProvider)
The EnvironmentVariableConfigurationProvider defines a factory method for creating instances of ConfigurationProvider from environment variables specific to the concrete implementation.
kaptain.platform.config.certificates
DockerRegistryCertificateProvider Objects
class DockerRegistryCertificateProvider(FileBasedConfigurationProvider)
__init__
| __init__(certificate_body: str, ceritifcate_path: Optional[str] = None)
Docker Registry Certificate Provider is a container for the certificate of a private Docker registry running with custom/self-signed TLS certificates, which is required for pushing Docker images containing model training code.
Docker Registry Certificate Provider by default loads the configuration from $HOME/.tls/certificate.crt. It is also possible to specify a custom registry certificate.crt location using DockerRegistryCertificateProvider.from_file(path="/path/to/certificate.crt").
The Docker registry certificate.crt file can be created ad-hoc while using a notebook or mounted to the notebook from a Secret. To support mounting of a shared Docker certificate.crt as a volume, the system administrator must create the PodDefault resource with a certificate file to make it available for the user.
Arguments:
certificate_body
: The configuration string in JSON format
certificate_path
: Path to the certificate file (optional)
kaptain.platform.config.docker
DockerConfigurationProvider Objects
class DockerConfigurationProvider(FileBasedConfigurationProvider)
__init__
| __init__(config_json: str)
Docker Configuration Provider is a container for the user’s Docker configuration, which is required for pulling and pushing images used in training and tuning jobs.
Docker Configuration Provider supports standard Docker config.json file of the following format:
{
"auths": {
"https://index.docker.io/v1/": {
"auth": "<username and password in base64>"
}
}
}
The auth field is a base64-encoded string of the form "<username>:<password>". To generate the value for the auth field, use the following command:
echo -n "<username>:<password>" | base64
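The same auth value can be produced in Python, which is convenient when creating config.json ad-hoc from a notebook. This is a minimal sketch using only the standard library; the helper name docker_config_json is hypothetical, and the placeholder credentials must be replaced with real ones.

```python
import base64
import json


def docker_config_json(registry: str, username: str, password: str) -> str:
    """Build the contents of a Docker config.json for a single registry."""
    # Equivalent to: echo -n "<username>:<password>" | base64
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": token}}}, indent=2)


print(docker_config_json("https://index.docker.io/v1/", "<username>", "<password>"))
```

The resulting string has the same structure as the config.json example shown above.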
Docker Configuration Provider by default loads the configuration from $HOME/.docker/config.json. It is also possible to specify a custom config.json location using DockerConfigurationProvider.from_file(path="/path/to/config.json").
The Docker config.json file can be created ad-hoc while using a notebook or mounted to the notebook from a Secret. To support mounting of a shared Docker config.json as a volume, the system administrator must create the PodDefault resource with a pre-populated file to make it available for the user.
Arguments:
config_json
: The configuration string in json format
kaptain.platform.config.defaults
kaptain.platform.config.s3
S3ConfigurationProvider Objects
class S3ConfigurationProvider(FileBasedConfigurationProvider, EnvironmentVariableConfigurationProvider)
__init__
| __init__(aws_access_key_id: str, aws_secret_access_key: str, aws_session_token: Optional[str] = None, region_name: str = _DEFAULT_REGION, s3_endpoint: Optional[str] = None, s3_signature_version: Optional[str] = None, s3_force_path_style: bool = False)
S3-specific configuration provider which supports reading configuration from the AWS configuration file and from environment variables. The provider can be used as a configuration object, or for convenient resolution of the configuration both on the development side and in containers when configuration is passed in the form of environment variables from Kubernetes Secrets.
Constructor arguments represent a subset of the boto3 configuration properties (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html) sufficient for Kaptain.
Arguments:
aws_access_key_id
: The access key to authenticate with S3.
aws_secret_access_key
: The secret key to authenticate with S3.
aws_session_token
: The session token to authenticate with S3.
region_name
: The name of the AWS region.
s3_endpoint
: The complete URL of the S3 endpoint. This parameter is required when working with non-standard, S3-compatible storage solutions such as MinIO. It should be set to a resolvable address of the running server.
s3_signature_version
: The signature version to use when signing requests.
s3_force_path_style
: When enabled, the clients will use path style instead of URL style for accessing buckets.
get_secret_body
| get_secret_body() -> Dict[str, str]
Transforms the configuration properties into a dict of environment variables. The resulting dict will be used for creating a Kubernetes Secret to securely share access credentials between containers.
Returns:
dict of environment variables with associated values
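A possible shape for such a dict is sketched below. The functions secret_body and to_secret_data are hypothetical. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN and AWS_DEFAULT_REGION are the standard AWS environment variable names; S3_ENDPOINT is an assumed name, and the exact keys Kaptain emits are not stated in this reference. The second function shows the base64 encoding that Kubernetes expects for values in a Secret's data field.

```python
import base64
from typing import Dict, Optional


def secret_body(
    aws_access_key_id: str,
    aws_secret_access_key: str,
    aws_session_token: Optional[str] = None,
    region_name: Optional[str] = None,
    s3_endpoint: Optional[str] = None,
) -> Dict[str, str]:
    """Map configuration properties to environment variable names."""
    body = {
        "AWS_ACCESS_KEY_ID": aws_access_key_id,
        "AWS_SECRET_ACCESS_KEY": aws_secret_access_key,
    }
    optional = {
        "AWS_SESSION_TOKEN": aws_session_token,
        "AWS_DEFAULT_REGION": region_name,
        "S3_ENDPOINT": s3_endpoint,  # assumed variable name
    }
    # Only include optional properties that were actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def to_secret_data(body: Dict[str, str]) -> Dict[str, str]:
    """Base64-encode values, as required for a Kubernetes Secret `data` field."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in body.items()
    }


body = secret_body("<access-key>", "<secret-key>", s3_endpoint="http://minio:9000")
print(sorted(body))
print(to_secret_data(body)["S3_ENDPOINT"])
```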