Version metrics
What is metric tracking?
MLflow defines a metric as “a (key, value) pair, where the value is numeric. Each metric can be updated throughout the course of the run (for example, to track how your model’s loss function is converging), and MLflow records and lets you visualize the metric’s full history”.
How to version metrics in a Kedro project?
kedro-mlflow introduces 3 AbstractDataset implementations to manage metrics:

- MlflowMetricDataset, which can log a float as a metric
- MlflowMetricHistoryDataset, which can log the evolution over time of a given metric, e.g. a list or a dict of floats
- MlflowMetricsHistoryDataset, which wraps a dictionary of metrics returned by a node and logs them in MLflow
Saving a single float as a metric with MlflowMetricDataset
The MlflowMetricDataset is an AbstractDataset which enables saving or loading a float as an MLflow metric. You must specify the key (i.e. the name to display in mlflow) when creating the dataset. Some examples follow.

The most basic usage is to create the dataset and save a value:
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricDataset

metric_ds = MlflowMetricDataset(key="my_metric")

with mlflow.start_run():
    metric_ds.save(0.3)  # creates a "my_metric=0.3" value in the "Metrics" section of the mlflow UI
Warning
Unlike MLflow's default behaviour, if there is no active run, no run is created.
You can also specify a run_id instead of logging in the active run:
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricDataset

metric_ds = MlflowMetricDataset(key="my_metric", run_id="123456789")

with mlflow.start_run():
    metric_ds.save(0.3)  # creates a "my_metric=0.3" value in the "Metrics" section of run 123456789
It is also possible to pass load_args and save_args to control which step should be logged (in case you have logged several steps for the same metric). save_args accepts a mode key which can be set to overwrite (the mlflow default) or append. In append mode, if no step is specified, saving the metric will “bump” the last existing step to create a linear history. This is very useful if you have a monitoring pipeline which calculates a metric frequently to check the performance of a deployed model.
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricDataset

metric_ds = MlflowMetricDataset(
    key="my_metric", load_args={"step": 1}, save_args={"mode": "append"}
)

with mlflow.start_run():
    metric_ds.save(0)  # step 0 stored for "my_metric"
    metric_ds.save(0.1)  # step 1 stored for "my_metric"
    metric_ds.save(0.2)  # step 2 stored for "my_metric"
    my_metric = metric_ds.load()  # value=0.1 (step number 1)
Since it is an AbstractDataset, it can be used with the YAML API in your catalog.yml, e.g.:
my_model_metric:
  type: kedro_mlflow.io.metrics.MlflowMetricDataset
  run_id: 123456 # OPTIONAL: you should likely leave it empty to log in the current run
  key: my_awesome_name # OPTIONAL: if not provided, the dataset name will be used (here "my_model_metric")
  load_args:
    step: ... # OPTIONAL: likely not provided, unless you have a very good reason to do so
  save_args:
    step: ... # OPTIONAL: likely not provided, unless you have a very good reason to do so
    mode: append # OPTIONAL: likely better than the default "overwrite". Will be ignored if "step" is provided.
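With this catalog entry, a node that returns a float will have its output logged as an MLflow metric at save time. A minimal sketch (the function and node names below are illustrative, not part of the plugin):

# nodes.py: a hypothetical evaluation node returning a float
def evaluate_model() -> float:
    return 0.92  # e.g. an accuracy score


# pipeline.py
from kedro.pipeline import Pipeline, node


def create_pipeline() -> Pipeline:
    return Pipeline(
        [
            node(
                func=evaluate_model,
                inputs=None,
                outputs="my_model_metric",  # matches the catalog entry above
                name="evaluate_model_node",
            )
        ]
    )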
Saving the evolution of a metric during training with MlflowMetricHistoryDataset
The MlflowMetricHistoryDataset is an AbstractDataset which enables saving or loading the evolution of a metric in various formats. You must specify the key (i.e. the name to display in mlflow) when creating the dataset. Some examples follow.
It enables logging either:
a list of int as a metric with incremental step, e.g
[0.1,0.2,0.3]
withmode=list
for eithersave_args
orload_args
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricHistoryDataset

metric_history_ds = MlflowMetricHistoryDataset(key="my_metric", save_args={"mode": "list"})

with mlflow.start_run():
    metric_history_ds.save([0.1, 0.2, 0.3])  # will be logged with incremental steps
- a dict of {step: value} as a metric, with mode=dict:
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricHistoryDataset

metric_history_ds = MlflowMetricHistoryDataset(key="my_metric", save_args={"mode": "dict"})

with mlflow.start_run():
    metric_history_ds.save({0: 0.1, 1: 0.2, 2: 0.3})  # will be logged with the dict keys as steps
- a list of dicts [{log_metric_arg: value}] as a metric, with mode=history, e.g.:
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricHistoryDataset

metric_history_ds = MlflowMetricHistoryDataset(key="my_metric", save_args={"mode": "history"})

with mlflow.start_run():
    metric_history_ds.save(
        [
            {"step": 0, "value": 0.1, "timestamp": 1345545},
            {"step": 1, "value": 0.2, "timestamp": 1345546},
            {"step": 2, "value": 0.3, "timestamp": 1345547},
        ]
    )
You can combine different modes for save and load, e.g.:
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricHistoryDataset

metric_history_ds = MlflowMetricHistoryDataset(
    key="my_metric", save_args={"mode": "dict"}, load_args={"mode": "list"}
)

with mlflow.start_run():
    metric_history_ds.save({0: 0.1, 1: 0.2, 2: 0.3})  # will be logged with incremental steps
    metric_history_ds.load()  # returns [0.1, 0.2, 0.3]
As usual, since it is an AbstractDataset, it can be used with the YAML API in your catalog.yml, and in this case, the key argument is optional:
my_model_metric:
  type: kedro_mlflow.io.metrics.MlflowMetricHistoryDataset
  run_id: 123456 # OPTIONAL: you should likely leave it empty to log in the current run
  key: my_awesome_name # OPTIONAL: if not provided, the dataset name will be used (here "my_model_metric")
  load_args:
    mode: ... # OPTIONAL: "list" by default, one of {"list", "dict", "history"}
  save_args:
    mode: ... # OPTIONAL: "list" by default, one of {"list", "dict", "history"}
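A typical use with this catalog entry is a training node that collects one loss value per epoch and returns the whole list; with the default mode=list, it is logged with incremental steps. A minimal sketch (the node name and loss values are illustrative):

from typing import List


# a hypothetical training node returning the loss history as a list of floats
def train_model() -> List[float]:
    losses = []
    for epoch in range(3):
        loss = 1.0 / (epoch + 1)  # placeholder for the real training loss
        losses.append(loss)
    return losses  # logged with incremental steps through the "my_model_metric" entry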
Saving several metrics with their entire history with MlflowMetricsHistoryDataset
Since it is an AbstractDataset, it can be used with the YAML API. You can define it in your catalog.yml as:
my_model_metrics:
  type: kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset
You can provide a prefix key, which is useful when you have multiple nodes producing metrics with the same names which you want to distinguish. If you are using the MlflowHook, it will handle that automatically for you by using the metrics dataset name as the prefix. In the example above, the prefix would be my_model_metrics.

Let's look at an example with a custom prefix:
my_model_metrics:
  type: kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset
  prefix: foo
How to return metrics from a node?
Let's assume that you have a node which doesn't have any inputs and returns a dictionary with metrics to log:
from typing import Dict, List, Union


def metrics_node() -> Dict[str, Union[Dict[str, float], List[Dict[str, float]]]]:
    return {
        "metric1": {"value": 1.1, "step": 1},
        "metric2": [{"value": 1.1, "step": 1}, {"value": 1.2, "step": 2}],
    }
As you can see above, kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset can take each metric as either:

- Dict[str, key]
- List[Dict[str, key]]
To store these metrics, we need to define a metrics dataset in the Kedro catalog:
my_model_metrics:
  type: kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset
Within a kedro run, the MlflowHook will automatically prefix the metrics datasets with their name in the catalog. In our example, the metrics will be stored in Mlflow with the following keys: my_model_metrics.metric1 and my_model_metrics.metric2.
It is also possible to provide a prefix manually:
my_model_metrics:
  type: kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset
  prefix: foo
which would result in metrics logged as foo.metric1 and foo.metric2.
As with any entry in the catalog, the metrics dataset must be used in a Kedro pipeline:
from kedro.pipeline import Pipeline, node


def create_pipeline() -> Pipeline:
    return Pipeline(
        [
            node(
                func=metrics_node,
                inputs=None,
                outputs="my_model_metrics",
                name="log_metrics",
            )
        ]
    )
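Once the pipeline has run, you can double-check what was logged by reading the history back with the plain MLflow client. A minimal sketch (the run id is a placeholder to copy from the mlflow UI):

from mlflow.tracking import MlflowClient

client = MlflowClient()
# "<run_id>" is a placeholder: copy the run id of your kedro run from the mlflow UI
for measurement in client.get_metric_history(run_id="<run_id>", key="my_model_metrics.metric2"):
    print(measurement.step, measurement.value)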