Migration guide between kedro-mlflow versions#

This page explains how to migrate an existing kedro project to a more recent kedro-mlflow version when that version introduces breaking changes.

Migration from 0.13.x to 0.14.x#

Upgrade mlflow to mlflow>=2.7.0.
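Before upgrading, it can help to check which version is currently installed. The following is a minimal sketch that uses only the standard library; the helper names (`parse_release`, `meets_minimum`, `check_installed`) are hypothetical and are not part of kedro-mlflow or mlflow. It compares only the leading numeric components of a version string and ignores pre-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple:
    """Extract the leading integer components, e.g. '2.10.1rc1' -> (2, 10, 1)."""
    parts = []
    for chunk in version_string.split("."):
        digits = ""
        for char in chunk:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.7' and '2.7.0' compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_installed(package: str, minimum: str) -> bool:
    """Return False if the package is missing or older than `minimum`."""
    try:
        return meets_minimum(version(package), minimum)
    except PackageNotFoundError:
        return False
```

For instance, `check_installed("mlflow", "2.7.0")` reports whether the current environment already satisfies the requirement of this release.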

Migration from 0.12.x to 0.13.x#

Upgrade mlflow to mlflow>=1.30.

Migration from 0.11.x to 0.12.x#

Upgrade your kedro project to kedro>=0.19,<0.20
Rename the following DataSets with the Dataset suffix (without the final capitalized S) in your catalog.yml and change names to make them more explicit:

| Name in kedro_mlflow<=0.11 | Name in kedro_mlflow>=0.12 |
|----------------------------|----------------------------------|
| MlflowArtifactDataSet | MlflowArtifactDataset |
| MlflowAbstractModelDataSet | MlflowAbstractModelDataset |
| MlflowModelRegistryDataSet | MlflowModelRegistryDataset |
| MlflowMetricDataSet | MlflowMetricDataset |
| MlflowMetricHistoryDataSet | MlflowMetricHistoryDataset |
| MlflowModelLoggerDataSet | MlflowModelTrackingDataset |
| MlflowModelSaverDataSet | MlflowModelLocalFileSystemDataset |
| MlflowMetricsDataSet | MlflowMetricsHistoryDataset |

Update your MlflowArtifactDataset catalog entry to rename the data_set key to dataset:

my_dataset:
    type: MlflowArtifactDataset
    dataset:
        type: ...
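For projects with many catalog entries, the renames listed above can be applied mechanically. The following is a minimal sketch, not a tool shipped with kedro-mlflow: the function name `migrate_catalog_text` is hypothetical, the mapping is copied from the names listed for this release, and the function works on the raw text of a catalog file so that comments and formatting are preserved. Note that it rewrites every key named `data_set`, not only those nested under an MlflowArtifactDataset entry, so review the resulting diff before committing.

```python
import re

# Old -> new class names for the 0.11.x -> 0.12.x migration.
RENAMED_DATASETS = {
    "MlflowArtifactDataSet": "MlflowArtifactDataset",
    "MlflowAbstractModelDataSet": "MlflowAbstractModelDataset",
    "MlflowModelRegistryDataSet": "MlflowModelRegistryDataset",
    "MlflowMetricDataSet": "MlflowMetricDataset",
    "MlflowMetricHistoryDataSet": "MlflowMetricHistoryDataset",
    "MlflowModelLoggerDataSet": "MlflowModelTrackingDataset",
    "MlflowModelSaverDataSet": "MlflowModelLocalFileSystemDataset",
    "MlflowMetricsDataSet": "MlflowMetricsHistoryDataset",
}

# Match whole identifiers only, so unrelated names are left untouched.
_NAME_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in RENAMED_DATASETS) + r")\b"
)
# Match a `data_set:` key at the start of a (possibly indented) line.
_KEY_PATTERN = re.compile(r"^(\s*)data_set:", flags=re.MULTILINE)


def migrate_catalog_text(text: str) -> str:
    """Return the catalog text with renamed classes and the `dataset` key."""
    text = _NAME_PATTERN.sub(lambda match: RENAMED_DATASETS[match.group(1)], text)
    return _KEY_PATTERN.sub(r"\1dataset:", text)
```

A typical use would be to read `conf/base/catalog.yml` as text, pass it through `migrate_catalog_text`, and write the result back after inspecting it. Datasets from other packages (for example kedro-datasets) follow their own renaming rules and are deliberately not handled here.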

If you use KedroPipelineModel or pipeline_ml_factory, the default copy_mode is now assign, because this is the most efficient setup (and usually the desired one) when serving a Kedro Pipeline as a Mlflow model. To get back to the previous deepcopy mode, change the entry to:

pipeline_ml_factory(
    training=training_pipeline,
    inference=inference_pipeline,
    kpm_kwargs=dict(copy_mode="deepcopy"),
)

Migration from 0.10.x to 0.11.x#

If you are registering your kedro_mlflow hooks manually (instead of using automatic registering from plugin, which is the default), change your settings.py

from this:

# <your_project>/src/<your_project>/settings.py
from kedro_mlflow.framework.hooks import MlflowNodeHook, MlflowPipelineHook

HOOKS = (MlflowPipelineHook(), MlflowNodeHook())

to this:

# <your_project>/src/<your_project>/settings.py
from kedro_mlflow.framework.hooks import MlflowHook

HOOKS = (MlflowHook(),)

The get_mlflow_config public method has been removed and the mlflow configuration is now automatically stored in the mlflow attribute of KedroContext. If you need to access the mlflow configuration, you can use:

from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_path = Path.cwd()  # assumption: this runs from the root of your kedro project
bootstrap_project(project_path)
with KedroSession.create(
    project_path=project_path,
) as session:
    context = session.load_context()
    print(context.mlflow)  # this is where mlflow configuration is stored

Remove the server.stores_environment_variables key from mlflow.yml. This key was dead (it was never used), and an error is now raised if it is still present in mlflow.yml.
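If you maintain several environments, the check for this dead key can be scripted. The sketch below is an illustration under assumptions: it operates on a configuration that has already been parsed into a Python dict (for example with a YAML loader of your choice), and the helper names `has_dead_key` and `drop_dead_key` are hypothetical, not part of kedro-mlflow.

```python
import copy

DEAD_SECTION = "server"
DEAD_KEY = "stores_environment_variables"


def has_dead_key(mlflow_config: dict) -> bool:
    """Return True if server.stores_environment_variables is still present."""
    section = mlflow_config.get(DEAD_SECTION)
    return isinstance(section, dict) and DEAD_KEY in section


def drop_dead_key(mlflow_config: dict) -> dict:
    """Return a copy of the configuration without the dead key."""
    cleaned = copy.deepcopy(mlflow_config)  # leave the caller's dict untouched
    section = cleaned.get(DEAD_SECTION)
    if isinstance(section, dict):
        section.pop(DEAD_KEY, None)
    return cleaned
```

Editing mlflow.yml by hand remains the simplest fix for a single file; the helper is mainly useful as a pre-upgrade check across many configuration files.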

Migration from 0.9.x to 0.10.x#

You must upgrade your kedro version to kedro>=0.18.1 to use kedro_mlflow>=0.10.

Migration from 0.8.x to 0.9.x#

There are no breaking changes in this patch release unless you retrieve the mlflow configuration manually (e.g. in a script or a jupyter notebook). In that case, the setup() method needs to be called with the context:

from kedro.framework.context import load_context
from kedro_mlflow.config import get_mlflow_config

context = load_context(".")

# the new best practice is just to remove these lines
mlflow_config = get_mlflow_config(context)  # pass context instead of session
mlflow_config.setup(context)  # pass context instead of session

These calls are not necessary: the mlflow config is automatically set up when the context is loaded, so unless you need to access the config manually you can remove these 2 lines.

Migration from 0.7.x to 0.8.x#

Update the mlflow.yml configuration file with the kedro mlflow init --force command.
The first argument of pipeline_ml_factory(pipeline_ml=<your-pipeline-ml>, ...) (resp. KedroPipelineModel(pipeline_ml=<your-pipeline-ml>, ...)) is renamed pipeline. Change the call to pipeline_ml_factory(pipeline=<your-pipeline-ml>, ...) (resp. KedroPipelineModel(pipeline=<your-pipeline-ml>, ...)).
Change the call from pipeline_ml_factory(..., model_signature=<model-signature>, conda_env=<conda-env>, model_name=<model_name>) to pipeline_ml_factory(..., log_model_kwargs=dict(signature=<model-signature>, conda_env=<conda-env>, artifact_path=<model_name>)). Notice that the arguments are renamed to match mlflow's and that they are passed as a dict in log_model_kwargs.
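The argument renames described above can be summarised as a small mapping. The following sketch is only an illustration: `migrate_factory_kwargs` is a hypothetical helper, not part of kedro-mlflow, and it manipulates a plain dict of keyword arguments rather than calling pipeline_ml_factory itself.

```python
# Old keyword -> key expected inside log_model_kwargs (names match mlflow's).
MOVED_TO_LOG_MODEL_KWARGS = {
    "model_signature": "signature",
    "conda_env": "conda_env",
    "model_name": "artifact_path",
}


def migrate_factory_kwargs(old_kwargs: dict) -> dict:
    """Translate 0.7.x keyword arguments into their 0.8.x equivalents."""
    new_kwargs = {}
    log_model_kwargs = {}
    for key, value in old_kwargs.items():
        if key == "pipeline_ml":
            # The first argument is renamed to `pipeline`.
            new_kwargs["pipeline"] = value
        elif key in MOVED_TO_LOG_MODEL_KWARGS:
            log_model_kwargs[MOVED_TO_LOG_MODEL_KWARGS[key]] = value
        else:
            # Any other argument is passed through unchanged.
            new_kwargs[key] = value
    if log_model_kwargs:
        new_kwargs["log_model_kwargs"] = log_model_kwargs
    return new_kwargs
```

For example, `migrate_factory_kwargs({"model_name": "my_model"})` yields `{"log_model_kwargs": {"artifact_path": "my_model"}}`, which mirrors the change to make by hand in your pipeline registration code.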

Migration from 0.6.x to 0.7.x#

If you are working with kedro==0.17.0, update your template to kedro>=0.17.1.

Migration from 0.5.x to 0.6.x#

kedro==0.16.x is no longer supported. You need to update your project template to kedro==0.17.0 template.

Migration from 0.4.x to 0.5.x#

The only breaking change with the previous release is the format of KedroPipelineMLModel class. Hence, if you saved a pipeline as a Mlflow Model with pipeline_ml_factory in kedro-mlflow==0.4.x, loading it (either with MlflowModelTrackingDataset or mlflow.pyfunc.load_model) with kedro-mlflow==0.5.0 installed will raise an error. You will need either to retrain the model or to load it with kedro-mlflow==0.4.x.

Migration from 0.4.0 to 0.4.1#

There are no breaking changes in this patch release unless you retrieve the mlflow configuration manually (e.g. in a script or a jupyter notebook). In that case, you must add an extra call to the setup() method:

from kedro.framework.context import load_context
from kedro_mlflow.config import get_mlflow_config

context = load_context(".")
mlflow_config = get_mlflow_config(context)
mlflow_config.setup()  # <-- add this line, which did not exist in 0.4.0

Migration from 0.3.x to 0.4.x#

Catalog entries#

Replace the following entries:

old	new
`kedro_mlflow.io.MlflowArtifactDataset`	`kedro_mlflow.io.artifacts.MlflowArtifactDataset`
`kedro_mlflow.io.MlflowMetricsHistoryDataset`	`kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset`

Hooks#

Hooks are now auto-registered if you use kedro>=0.16.4. You can remove the following entry from your run.py:

hooks = (MlflowPipelineHook(), MlflowNodeHook())

KedroPipelineModel#

Be aware that if you have saved a pipeline as a mlflow model with pipeline_ml_factory, retraining this pipeline with kedro-mlflow==0.4.0 will lead to a new behaviour. Assuming the name of your output in the DataCatalog was predictions, the output of a registered model will change from:

{
    "predictions":
        {
            "<your model-predictions>"
        }
}

to:

{
    "<your model-predictions>"
}

Thus, parsing the predictions of this model must be updated accordingly.
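If the same client code has to consume both models trained before the upgrade and models retrained after it, one option is to normalise the output before parsing it. The sketch below is an assumption-laden illustration: `unwrap_predictions` is a hypothetical helper, and it treats any dict whose only key is the output name as the old, nested format.

```python
def unwrap_predictions(model_output, output_name="predictions"):
    """Return the bare predictions, whichever format the model produced.

    Old format (kedro-mlflow 0.3.x): {"predictions": <your model-predictions>}
    New format (kedro-mlflow 0.4.x): <your model-predictions>
    """
    is_old_format = (
        isinstance(model_output, dict) and set(model_output) == {output_name}
    )
    if is_old_format:
        return model_output[output_name]
    return model_output
```

This heuristic is ambiguous if the new-format predictions are themselves a dict with a single key equal to the output name; in that case, branch on the kedro-mlflow version used at training time instead.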