Migration guide between kedro-mlflow versions#
This page explains how to migrate an existing kedro project to a more recent kedro-mlflow version that contains breaking changes.
Migration from 0.13.x to 0.14.x#
Upgrade mlflow to mlflow>=2.7.0.
Migration from 0.12.x to 0.13.x#
Upgrade mlflow to mlflow>=1.30.
Migration from 0.11.x to 0.12.x#
Upgrade your kedro project to kedro>=0.19,<0.20.
Rename the following DataSets with the Dataset suffix (without the final capitalized S) in your catalog.yml and change names to make them more explicit:
Name in kedro_mlflow<=0.11 | Name in kedro_mlflow>=0.12 |
---|---|
MlflowArtifactDataSet | MlflowArtifactDataset |
MlflowAbstractModelDataSet | MlflowAbstractModelDataset |
MlflowModelRegistryDataSet | MlflowModelRegistryDataset |
MlflowMetricDataSet | MlflowMetricDataset |
MlflowMetricHistoryDataSet | MlflowMetricHistoryDataset |
MlflowModelLoggerDataSet | MlflowModelTrackingDataset |
MlflowModelSaverDataSet | MlflowModelLocalFileSystemDataset |
MlflowMetricsDataSet | MlflowMetricsHistoryDataset |
Update your MlflowArtifactDataset catalog entry to rename the data_set key to dataset:
my_dataset:
  type: MlflowArtifactDataset
  dataset:
    type: ...
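The class renames listed above can be applied mechanically. The following is a minimal sketch, not part of kedro-mlflow: the helper name rename_datasets is hypothetical, and it assumes the catalog is handled as plain text so that comments and formatting are preserved.

```python
import re

# Old-to-new class names, copied from the rename list in this section.
RENAMES = {
    "MlflowArtifactDataSet": "MlflowArtifactDataset",
    "MlflowAbstractModelDataSet": "MlflowAbstractModelDataset",
    "MlflowModelRegistryDataSet": "MlflowModelRegistryDataset",
    "MlflowMetricDataSet": "MlflowMetricDataset",
    "MlflowMetricHistoryDataSet": "MlflowMetricHistoryDataset",
    "MlflowModelLoggerDataSet": "MlflowModelTrackingDataset",
    "MlflowModelSaverDataSet": "MlflowModelLocalFileSystemDataset",
    "MlflowMetricsDataSet": "MlflowMetricsHistoryDataset",
}

# Word boundaries ensure that only complete class names are matched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_datasets(catalog_text: str) -> str:
    """Return the catalog text with every old class name replaced (hypothetical helper)."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], catalog_text)
```

This sketch only handles class names. The data_set key of MlflowArtifactDataset entries still has to be renamed to dataset by hand, and it is worth reviewing the resulting diff before committing it.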
If you use KedroPipelineModel or pipeline_ml_factory, the default copy_mode is now assign because this is the most efficient setup (and usually the desired one) when serving a Kedro Pipeline as a Mlflow model. To get back to the previous deepcopy mode, change the entry to:
pipeline_ml_factory(
    training=training_pipeline,
    inference=inference_pipeline,
    kpm_kwargs=dict(copy_mode="deepcopy"),
)
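To see why the two modes behave differently, the snippet below illustrates the general Python semantics that the names suggest. It is only an illustration under that assumption, not kedro-mlflow's internal implementation, and the variable names are made up for the example.

```python
import copy

data = {"features": [1, 2, 3]}

assigned = data                # "assign"-style: the same object is passed along, nothing is copied
copied = copy.deepcopy(data)   # "deepcopy"-style: a fully independent copy is created

# Mutating the original object afterwards shows the difference.
data["features"].append(4)

print(assigned["features"])  # [1, 2, 3, 4]
print(copied["features"])    # [1, 2, 3]
```

Skipping the copy is cheaper, which matters when serving large objects, but it means that a node mutating its inputs in place can affect later steps.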
Migration from 0.10.x to 0.11.x#
If you are registering your kedro_mlflow hooks manually (instead of using automatic registering from plugin, which is the default), change your settings.py from this:
# <your_project>/src/<your_project>/settings.py
from kedro_mlflow.framework.hooks import MlflowNodeHook, MlflowPipelineHook
HOOKS = (MlflowPipelineHook(), MlflowNodeHook())
to this:
# <your_project>/src/<your_project>/settings.py
from kedro_mlflow.framework.hooks import MlflowHook
HOOKS = (MlflowHook(),)
The get_mlflow_config public method has been removed and the mlflow configuration is now automatically stored in the mlflow attribute of KedroContext. If you need to access the mlflow configuration, you can use:
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
bootstrap_project(project_path)
with KedroSession.create(
    project_path=project_path,
) as session:
    context = session.load_context()
    print(context.mlflow)  # this is where mlflow configuration is stored
Remove the server.stores_environment_variables key from mlflow.yml. This is a dead key which was unused. It will now throw an error if it is still written in mlflow.yml.
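If several projects need this cleanup, the removal can be scripted. The sketch below assumes that mlflow.yml has already been parsed into a plain dictionary (for example with a YAML parser, not shown here). The helper name drop_dotted_key and the sample values are hypothetical.

```python
def drop_dotted_key(config: dict, dotted_key: str) -> bool:
    """Delete a nested key given as 'a.b.c'; return True if something was removed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.get(name)
        if not isinstance(node, dict):
            return False  # the path does not exist, nothing to remove
    if leaf in node:
        del node[leaf]
        return True
    return False


# Illustrative content only, standing in for a parsed mlflow.yml.
config = {
    "server": {
        "mlflow_tracking_uri": "mlruns",
        "stores_environment_variables": {},
    }
}
removed = drop_dotted_key(config, "server.stores_environment_variables")
```

The function modifies the dictionary in place and leaves every other key untouched, so the result can be written back to mlflow.yml as is.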
Migration from 0.9.x to 0.10.x#
You must upgrade your kedro version to kedro>=0.18.1 to use kedro_mlflow>=0.10.
Migration from 0.8.x to 0.9.x#
There are no breaking changes in this release unless you retrieve the mlflow configuration manually (e.g. in a script or a jupyter notebook). In that case, the setup() method needs to be called with context:
from kedro.framework.context import load_context
from kedro_mlflow.config import get_mlflow_config
context = load_context(".")
# the new best practice is just to remove these lines
mlflow_config = get_mlflow_config(context) # pass context instead of session
mlflow_config.setup(context) # pass context instead of session
These calls are not strictly necessary: the mlflow config is automatically set up when the context is loaded, so unless you need to access the config manually you can remove these two lines.
Migration from 0.7.x to 0.8.x#
Update the mlflow.yml configuration file with the kedro mlflow init --force command.
The first argument of pipeline_ml_factory(pipeline_ml=<your-pipeline-ml>, ...) (resp. KedroPipelineModel(pipeline_ml=<your-pipeline-ml>, ...)) is renamed pipeline. Change the call to pipeline_ml_factory(pipeline=<your-pipeline-ml>) (resp. KedroPipelineModel(pipeline=<your-pipeline-ml>, ...)).
Change the call from pipeline_ml_factory(..., model_signature=<model-signature>, conda_env=<conda-env>, model_name=<model_name>) to pipeline_ml_factory(..., log_model_kwargs=dict(signature=<model-signature>, conda_env=<conda-env>, artifact_path=<model_name>)). Notice that the arguments are renamed to match mlflow's and they are passed as a dict in log_model_kwargs.
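The argument renames described above can be summarised as a small translation step. This is only a sketch: migrate_factory_kwargs is a hypothetical helper that works on a plain dictionary of keyword arguments and does not call kedro-mlflow itself.

```python
# Old top-level argument name -> key expected inside log_model_kwargs.
LOG_MODEL_RENAMES = {
    "model_signature": "signature",
    "conda_env": "conda_env",
    "model_name": "artifact_path",
}


def migrate_factory_kwargs(old_kwargs: dict) -> dict:
    """Translate 0.7.x-style keyword arguments into the 0.8.x layout (hypothetical helper)."""
    new_kwargs = dict(old_kwargs)  # work on a copy, leave the input untouched

    # The first argument is renamed from pipeline_ml to pipeline.
    if "pipeline_ml" in new_kwargs:
        new_kwargs["pipeline"] = new_kwargs.pop("pipeline_ml")

    # The model logging arguments move into the log_model_kwargs dict.
    log_model_kwargs = dict(new_kwargs.pop("log_model_kwargs", {}))
    for old_name, new_name in LOG_MODEL_RENAMES.items():
        if old_name in new_kwargs:
            log_model_kwargs[new_name] = new_kwargs.pop(old_name)
    if log_model_kwargs:
        new_kwargs["log_model_kwargs"] = log_model_kwargs

    return new_kwargs
```

For example, a call that used pipeline_ml, model_signature, conda_env and model_name ends up with a pipeline argument plus a single log_model_kwargs dict holding signature, conda_env and artifact_path.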
Migration from 0.6.x to 0.7.x#
If you are working with kedro==0.17.0, update your template to kedro>=0.17.1.
Migration from 0.5.x to 0.6.x#
kedro==0.16.x is no longer supported. You need to update your project template to the kedro==0.17.0 template.
Migration from 0.4.x to 0.5.x#
The only breaking change with the previous release is the format of the KedroPipelineMLModel class. Hence, if you saved a pipeline as a Mlflow Model with pipeline_ml_factory in kedro-mlflow==0.4.x, loading it (either with MlflowModelTrackingDataset or mlflow.pyfunc.load_model) with kedro-mlflow==0.5.0 installed will raise an error. You will need either to retrain the model or to load it with kedro-mlflow==0.4.x.
Migration from 0.4.0 to 0.4.1#
There are no breaking changes in this patch release unless you retrieve the mlflow configuration manually (e.g. in a script or a jupyter notebook). In that case, you must add an extra call to the setup() method:
from kedro.framework.context import load_context
from kedro_mlflow.config import get_mlflow_config
context = load_context(".")
mlflow_config = get_mlflow_config(context)
mlflow_config.setup()  # <-- add this line, which did not exist in 0.4.0
Migration from 0.3.x to 0.4.x#
Catalog entries#
Replace the following entries:
old | new |
---|---|
kedro_mlflow.io.MlflowArtifactDataset | kedro_mlflow.io.artifacts.MlflowArtifactDataset |
kedro_mlflow.io.MlflowMetricsHistoryDataset | kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset |
Hooks#
Hooks are now auto-registered if you use kedro>=0.16.4. You can remove the following entry from your run.py:
hooks = (MlflowPipelineHook(), MlflowNodeHook())
KedroPipelineModel#
Be aware that if you have saved a pipeline as a mlflow model with pipeline_ml_factory, retraining this pipeline with kedro-mlflow==0.4.0 will lead to a new behaviour. Let us assume the name of your output in the DataCatalog was predictions. The output of a registered model will change from:
{
    "predictions":
    {
        "<your model-predictions>"
    }
}
to:
{
    "<your model-predictions>"
}
Thus, parsing the predictions of this model must be updated accordingly.
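Client code that must handle models saved before and after this change can normalise the response first. The following is a minimal sketch: unwrap_predictions is a hypothetical helper, and it assumes that the old format is a dictionary whose only key is the catalog output name.

```python
def unwrap_predictions(output, output_name="predictions"):
    """Return the bare predictions whether or not they are wrapped under output_name."""
    # Old format: {"predictions": <your model-predictions>}
    if isinstance(output, dict) and set(output) == {output_name}:
        return output[output_name]
    # New format: <your model-predictions> returned directly
    return output


old_style = unwrap_predictions({"predictions": [0.1, 0.9]})
new_style = unwrap_predictions([0.1, 0.9])
```

Both calls return the same list, so downstream parsing only has to deal with one shape. If the model itself legitimately returns a dictionary with a single key named like the output, this heuristic would unwrap it by mistake, so adjust the check to your own data.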