Pipeline serving with kedro-mlflow
Introduction to Mlflow Models
Mlflow Models are a standardised, framework-agnostic format for storing machine learning models. They are designed to be standalone so that they are as portable as possible and can be deployed virtually anywhere. Mlflow provides built-in CLI commands to deploy a Mlflow Model to the most common cloud platforms or to serve it as an API.
A Mlflow Model is composed of:
- a MLModel file, which is a configuration file that tells mlflow how to load the model. This file may also contain the Signature of the model (i.e. the Schema of the input and output of your model, including the column names and order) as well as example data.
- a conda.yml file, which contains the specifications of the conda virtual environment inside which the model should run. It lists the package versions necessary for your model to be executed.
- a model.pkl file (or a python_function.pkl file for a custom model) containing the trained model.
- an artifacts folder containing all other data necessary to execute the model.
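To make the role of the Signature more concrete, here is a minimal sketch of what checking input data against a schema of column names and order can look like. It is not mlflow's implementation: `INPUT_SCHEMA`, `check_against_schema` and the column names are hypothetical and use only the standard library, whereas mlflow's own Signature object is richer.

```python
# Hypothetical, simplified stand-in for a model signature: an ordered list of
# (column name, expected Python type) pairs. This is an assumption made for
# illustration, not the structure mlflow stores.
INPUT_SCHEMA = [("sepal_length", float), ("sepal_width", float)]


def check_against_schema(rows, schema):
    """Return True if every row has the schema's columns, in order, with matching types.

    Raises ValueError on the first mismatch.
    """
    expected_names = [name for name, _ in schema]
    for i, row in enumerate(rows):
        names = list(row)  # dicts preserve insertion order
        if names != expected_names:
            raise ValueError(
                f"row {i}: expected columns {expected_names}, got {names}"
            )
        for name, expected_type in schema:
            if not isinstance(row[name], expected_type):
                raise ValueError(
                    f"row {i}: column '{name}' should be {expected_type.__name__}"
                )
    return True


# A row with the right columns in the right order passes the check.
valid_rows = [{"sepal_length": 5.1, "sepal_width": 3.5}]
is_valid = check_against_schema(valid_rows, INPUT_SCHEMA)
```

Storing such a description next to the model lets whoever serves it reject malformed requests before the model is ever called.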
Mlflow makes it possible to create custom model “flavors” to convert any object to a Mlflow Model, provided we have this information. Inside a Kedro project, the Pipeline and DataCatalog objects contain all this information: as a consequence, it is easy to create a custom model to convert entire Kedro Pipelines to Mlflow Models.
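The idea of wrapping a pipeline and its catalog behind a single prediction method can be sketched as follows. This is an illustration under stated assumptions, not kedro-mlflow's actual code: `PipelineAsModel`, the node tuples and the plain-dict catalog are hypothetical stand-ins for the Kedro Pipeline and DataCatalog objects. The `predict(context, model_input)` signature mirrors the one used by `mlflow.pyfunc.PythonModel`, which a real custom model would subclass; the base class is left out here so the sketch runs without mlflow installed.

```python
class PipelineAsModel:
    """Hypothetical wrapper exposing a pipeline through a predict method."""

    def __init__(self, nodes, catalog, input_name, output_name):
        self.nodes = nodes            # list of (function, input names, output name)
        self.catalog = dict(catalog)  # persisted data: trained model, parameters...
        self.input_name = input_name
        self.output_name = output_name

    def predict(self, context, model_input):
        # Start from the persisted data, then inject the data to predict on.
        data = dict(self.catalog)
        data[self.input_name] = model_input
        # Run the nodes in order, storing each output under its name.
        for func, inputs, output in self.nodes:
            data[output] = func(*(data[name] for name in inputs))
        return data[self.output_name]


def scale(instances, factor):
    return [x * factor for x in instances]


def apply_model(features, weights):
    return [weights["slope"] * x + weights["intercept"] for x in features]


nodes = [
    (scale, ["instances", "factor"], "features"),
    (apply_model, ["features", "weights"], "predictions"),
]
catalog = {"factor": 2.0, "weights": {"slope": 0.5, "intercept": 1.0}}

model = PipelineAsModel(nodes, catalog, input_name="instances", output_name="predictions")
predictions = model.predict(None, [1.0, 2.0, 3.0])  # [2.0, 3.0, 4.0]
```

Everything except the data to predict on is fixed when the wrapper is built, which is why the whole object can be stored and shipped as one model.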
Prerequisites for serving a pipeline
You can log any Kedro Pipeline matching the following requirements:
- One of its inputs must be a pandas.DataFrame, a spark.DataFrame or a numpy.array. This is the input which contains the data to predict on. It can be any Kedro AbstractDataset which loads data in one of these three formats. It can also be a MemoryDataset that is not persisted in the catalog.yml.
- All its other inputs must be persisted on disk (e.g. the machine learning model must already be trained and saved so that it can be exported).
Note: if the pipeline has parameters, they will be persisted before exporting the model, which implies that you will not be able to modify them at runtime. This is a limitation of mlflow.
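The requirements above can be expressed as a small check. The sketch below is a hypothetical helper, not part of kedro-mlflow: `check_servable` and `freeze_parameters` are illustrative names, and dataset names are plain strings. It assumes Kedro's `params:` naming convention for parameters and treats them as always persisted, in line with the note above.

```python
import copy


def check_servable(free_inputs, persisted, input_name):
    """Return a list of problems preventing export; empty if the pipeline is servable."""
    problems = []
    if input_name not in free_inputs:
        problems.append(f"'{input_name}' is not an input of the pipeline")
    for name in sorted(free_inputs - {input_name}):
        if name.startswith("params:"):
            continue  # parameters are persisted at export time
        if name not in persisted:
            problems.append(f"'{name}' must be persisted on disk")
    return problems


def freeze_parameters(params):
    """Snapshot the parameters as they are at export time."""
    return copy.deepcopy(params)


# A servable pipeline: one free input to predict on, a saved model, one parameter.
ok = check_servable(
    free_inputs={"instances", "trained_model", "params:threshold"},
    persisted={"trained_model"},
    input_name="instances",
)

# Not servable: the trained model only lives in memory.
not_ok = check_servable(
    free_inputs={"instances", "trained_model"},
    persisted=set(),
    input_name="instances",
)

# Changing a parameter after export does not affect the exported snapshot.
params = {"threshold": 0.5}
exported_params = freeze_parameters(params)
params["threshold"] = 0.9
```

Here `ok` is an empty list, `not_ok` contains one message about `trained_model`, and `exported_params` still holds the threshold that was set at export time.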