# Pipeline serving with kedro-mlflow

## Introduction to Mlflow Models

[Mlflow Models are a standardised agnostic format to store machine learning models](https://www.mlflow.org/docs/latest/models.html). They intend to be standalone to be as portable as possible to be deployed virtually anywhere and mlflow provides built-in CLI commands to deploy a mlflow model to most common cloud platforms or to create an API.

A Mlflow Model is composed of:
- a ``MLModel`` file which is a configuration file to indicate to mlflow how to load the model. This file may also contain the ``Signature`` of the model (i.e. the ``Schema`` of the input and output of your model, including the columns names and order) as well as example data.  
- a ``conda.yml`` file which contains the specifications of the virtual conda environment inside which the model should run. It contains the packages versions necessary for your model to be executed.
- a ``model.pkl`` (or a ``python_function.pkl`` for custom model) file containing the trained model.  
- an ``artifacts`` folder containing all other data necessary to execute the models

Mlflow enable to create custom models "flavors" to convert any object to a Mlflow Model providing we have these informations. Inside a Kedro prpojects, the ``Pipeline`` and ``DataCatalog`` objects contains all these informations: as a consequence, it is easy to create a custom model to convert entire Kedro ``Pipeline``s to mlflow models.

## Pre-requisite for serving a pipeline

You can log any Kedro ``Pipeline`` matching the following requirements:
- one of its input must be a ``pandas.DataFrame``, a ``spark.DataFrame`` or a ``numpy.array``. This is the **input which contains the data to predict on**. This can be any Kedro ``AbstractDataset`` which loads data in one of the previous three formats. It can also be a ``MemoryDataset`` and not be persisted in the ``catalog.yml``.
- all its other inputs must be persisted on disk (e.g. if the machine learning model must already be trained and saved so we can export it).

```{note}
If the pipeline has parameters, they will be persisted before exporting the model, which implies that you will not be able to modify them at runtime. This is a limitation of ``mlflow<2.6.0``, recently relaxed and that will be adressed by https://github.com/Galileo-Galilei/kedro-mlflow/issues/445.
```