# Motivation

## When should I use kedro-mlflow?

Basically, you should use `kedro-mlflow` in **any `Kedro` project which involves machine learning** / deep learning. As stated in the [introduction](./01_introduction.md), `Kedro`'s current versioning (as of version `0.16.6`) is not sufficient for machine learning projects: it lacks a UI and a ``run`` management system. Besides, the `KedroPipelineModel` ability to serve a kedro pipeline as an API or a batch in one line of code is a great addition for collaboration and transition to production.

If you do not use ``Kedro`` or if you do pure data processing which do not involve *machine learning*, this plugin is not what you are seeking for ;)

## Why should I use kedro-mlflow?

### Benchmark of existing solutions

This paragraph gives a (quick) overview of existing solutions for mlflow integration inside Kedro projects.

``Mlflow`` is very simple to add to any existing code. It is a 2-step process:

- add `log_{XXX}` (either param, artifact, metric or model) functions where they are needed inside the code
- add a `MLProject` at the root of the project to enable CLI execution. This file must contain all the possible execution steps (like the `pipeline.py` / `hooks.py`  in a kedro project).

Including mlflow inside a ``kedro project`` is consequently very easy: the logging functions can be added in the code, and the ``MLProject`` is very simple and is composed almost only of the ``kedro run`` command. You can find examples of such implementations:

- the [medium paper](https://medium.com/quantumblack/deploying-and-versioning-data-pipelines-at-scale-942b1d81b5f5) by QuantumBlack employees.
- the associated [github repo](https://github.com/tgoldenberg/kedro-mlflow-example)
- other examples can be found on Github, but AFAIK all of them follow the very same principles.

### Enforcing Kedro principles

Above implementations have the advantage of being very straightforward and *mlflow compliant*, but they break several ``Kedro`` principles:

- the ``MLFLOW_TRACKING_URI`` which registers the database where runs are logged is declared inside the code instead of a configuration file, which **hinders portability across environments** and makes transition to production more difficult
- the logging of different elements can be put in many places in the ``Kedro`` template (in the code of any function involved in a ``node``, in a ``Hook``, in  the ``ProjectContext``, in a ``transformer``...). This is not compliant with the ``Kedro`` template where any object has a dedicated location. We want to avoid the logging to occur anywhere because:
  - it is **very error-prone** (one can forget to log one parameter)
  - it is **hard to modify** (if you want to remove / add / modify an mlflow action you have to find it in the code)
  - it **prevents reuse** (re-usable function must not contain mlflow specific code unrelated to their functional specificities, only their execution must be tracked).

``kedro-mlflow`` enforces these best practices while implementing a clear interface for each mlflow action in Kedro template. Below chart maps the mlflow action to perform with the Python API provided by ``kedro-mlflow`` and the location in Kedro template where the action should be performed.

| Mlflow action             | Template file   | Python API                                                 |
| :------------------------ | :-------------- | :--------------------------------------------------------- |
| Set up configuration      | ``mlflow.yml``  | ``MlflowHook``                                             |
| Logging parameters        | ``mlflow.yml``  | ``MlflowHook``                                             |
| Logging artifacts         | ``catalog.yml`` | ``MlflowArtifactDataset``                                  |
| Logging models            | ``catalog.yml`` | `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset` |
| Logging metrics           | ``catalog.yml`` | ``MlflowMetricsHistoryDataset``                                   |
| Logging Pipeline as model | ``hooks.py``    | ``KedroPipelineModel`` and ``pipeline_ml_factory``         |

`kedro-mlflow` does not currently provide interface to set tags outside a Kedro ``Pipeline``. Some of above decisions are subject to debate and design decisions (for instance, metrics are often updated in a loop during each epoch / training iteration and it does not always make sense to register the metric between computation steps, e.g. as a an I/O operation after a node run).

```{note}
You do **not** need any ``MLProject`` file to use mlflow inside your Kedro project. As seen in the [introduction](./01_introduction.md), this file overlaps with Kedro configuration files.
```