# Track models ## What is model tracking? MLflow allows to serialize and deserialize models to a common format, track those models in MLflow Tracking and manage them using MLflow Model Registry. Many popular Machine / Deep Learning frameworks have built-in support through what MLflow calls [flavors](https://www.mlflow.org/docs/latest/models.html#built-in-model-flavors). Even if there is no flavor for your framework of choice, it is easy to [create your own flavor](https://www.mlflow.org/docs/latest/models.html#custom-python-models) and integrate it with MLflow. ## How to track models using MLflow in Kedro project? `kedro-mlflow` introduces two new `DataSet` types that can be used in `DataCatalog` called `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset`. The two have very similar API, except that: - the ``MlflowModelTrackingDataset`` is used to load from and save to from the mlflow artifact store. It can [load from any given `model_uri`](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.pyfunc.html?highlight=mlflow%20pyfunc%20load_model#mlflow.pyfunc.load_model), including registered model, local models, models tied to a run... - the ``MlflowModelLocalFileSystemDataset`` is used to load from and save to a given local path. It uses the standard `filepath` argument in the constructor of Kedro DataSets. Note that it **does not log in mlflow**. *Note: If you use ``MlflowModelTrackingDataset``, it will be saved as a "LoggedModel" object and will be linked to your current run. However, you will need to specify the run id to predict with (since it is not persisted locally, it will not pick the latest model by default). You may prefer to combine ``MlflowModelLocalFileSystemDataset`` and ``MlflowArtifactDataset`` to make persist it both locally and remotely, see further.* Suppose you would like to register a `scikit-learn` model of your `DataCatalog` in mlflow, you can use the following yaml API: ```yaml my_sklearn_model: type: kedro_mlflow.io.models.MlflowModelTrackingDataset flavor: mlflow.sklearn ``` More informations on available parameters are available in the [dedicated section](https://kedro-mlflow.readthedocs.io/en/latest/source/05_API/01_python_objects/01_Datasets.html#mlflowmodeltrackingdataset). You are now able to use ``my_sklearn_model`` in your nodes. Since this model is registered in mlflow, you can also leverage the [mlflow model serving abilities](https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve) or [predicting on batch abilities](https://www.mlflow.org/docs/latest/cli.html#mlflow-models-predict), as well as the [mlflow models registry](https://www.mlflow.org/docs/latest/model-registry.html) to manage the lifecycle of this model. ## Frequently asked questions :::{dropdown} How is it working under the hood? **For ``MlflowModelTrackingDataset``** During save, a model object from node output is logged to mlflow using ``log_model`` function of the specified ``flavor``. It is logged as a ``LoggedModel`` mlflow object, and eventually (this will actually always happen if you are using the `kedro run` command, because the `MlflowHook` creates a new run before each pipeline run) linked to your currently active mlflow run. If the `model_uri` is specified, the saving operation will fail. Consequently it will **never be possible to save in a specific mlflow run_id** if you launch a pipeline with the `kedro run` command because the `MlflowHook` creates a new run before each pipeline run. During load, the model is retrieved from the ``model_uri`` if specified, else the previously saved is reloaded. **For ``MlflowModelLocalFileSystemDataset``** During save, a model object from node output is saved locally under specified ``filepath`` using ``save_model`` function of the specified ``flavor``. When model is loaded, the latest version stored locally is read using ``load_model`` function of the specified ``flavor``. You can also load a model from a specific kedro run by specifying the `version` argument to the constructor. ::: :::{dropdown} How can I track a custom MLflow model flavor? To track a custom MLflow model flavor you need to set the `flavor` parameter to import the module of your custom flavor and to specify a [pyfunc workflow](https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#pyfunc-create-custom-workflows) which can be set either to `python_model` or `loader_module`. The former is the more high level and user friendly and is [recommend by mlflow](https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#which-workflow-is-right-for-my-use-case) while the latter offers more control. We haven't tested the integration in `kedro-mlflow` of this second workflow extensively, and it should be used with caution. ```yaml my_custom_model: type: kedro_mlflow.io.models.MlflowModelTrackingDataset flavor: my_package.custom_mlflow_flavor pyfunc_workflow: python_model # or loader_module ``` ::: ### How can I save model locally and log it in MLflow in one step? :::{dropdown} How can I save model locally and log it in MLflow in one step? If you want to save your model both locally and remotely within the same run, you can leverage `MlflowArtifactDataset`: ```yaml sklearn_model: type: kedro_mlflow.io.artifacts.MlflowArtifactDataset dataset: type: kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset flavor: mlflow.sklearn filepath: data/06_models/sklearn_model ``` This might be useful if you want to always read the lastest model saved locally and log it to MLflow each time the new model is being trained for tracking purpose. :::