Datasets
Artifact Dataset
- class kedro_mlflow.io.artifacts.mlflow_artifact_dataset.MlflowArtifactDataset(dataset: str | dict, run_id: str = None, artifact_path: str = None, credentials: dict[str, Any] = None, metadata: dict[str, Any] | None = None)
Bases: AbstractVersionedDataset
This class is a wrapper for any Kedro AbstractDataset. It decorates the wrapped dataset's save method so that the dataset is logged to MLflow whenever save is called.
- __init__(filepath: PurePosixPath, version: Version | None, exists_function: Callable[[str], bool] | None = None, glob_function: Callable[[str], list[str]] | None = None)
Creates a new instance of AbstractVersionedDataset.
- Parameters:
filepath – Filepath in POSIX format to a file.
version – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated.
exists_function – Function that is used for determining whether a path exists in a filesystem.
glob_function – Function that is used for finding all paths in a filesystem, which match a given pattern.
- load() → Any
MlflowArtifactDataset is a dataset factory and consequently does not implement the abstract methods.
- save(data: Any) → None
MlflowArtifactDataset is a dataset factory and consequently does not implement the abstract methods.
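The save-decoration behaviour described for MlflowArtifactDataset can be illustrated with a small, library-free sketch. It is not kedro-mlflow's implementation: LocalTextDataset, with_artifact_logging and logged_paths are hypothetical names, and a plain list stands in for the artifact store of an MLflow run.

```python
from typing import Any, Callable, List


class LocalTextDataset:
    """Hypothetical stand-in for a Kedro dataset that writes to a local file."""

    def __init__(self, filepath: str) -> None:
        self.filepath = filepath
        self.content: Any = None  # kept in memory so the sketch does not touch the disk

    def save(self, data: Any) -> None:
        self.content = data


def with_artifact_logging(
    dataset: LocalTextDataset, log_artifact: Callable[[str], None]
) -> LocalTextDataset:
    """Wrap dataset.save so the saved file path is also passed to log_artifact."""
    original_save = dataset.save  # bound method captured before it is replaced

    def save_and_log(data: Any) -> None:
        original_save(data)             # 1. regular save of the wrapped dataset
        log_artifact(dataset.filepath)  # 2. then log the saved file

    dataset.save = save_and_log  # override the method on this instance only
    return dataset


logged_paths: List[str] = []  # stand-in for an MLflow run's artifact store
wrapped = with_artifact_logging(LocalTextDataset("data/report.txt"), logged_paths.append)
wrapped.save("hello")
```

After the call to save, the data has been stored by the wrapped dataset and its file path has been recorded once in logged_paths. Node code keeps calling save as usual and does not need to know that logging happens.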
Metrics Dataset
- class kedro_mlflow.io.metrics.mlflow_metric_dataset.MlflowMetricDataset(key: str = None, run_id: str = None, load_args: dict[str, Any] = None, save_args: dict[str, Any] = None, metadata: dict[str, Any] | None = None)
Bases: MlflowAbstractMetricDataset
- DEFAULT_SAVE_MODE = 'overwrite'
- SUPPORTED_SAVE_MODES = {'append', 'overwrite'}
- __init__(key: str = None, run_id: str = None, load_args: dict[str, Any] = None, save_args: dict[str, Any] = None, metadata: dict[str, Any] | None = None)
Initialise MlflowMetricDataset.
- Parameters:
run_id (str) – The ID of the MLflow run where the metric should be logged.
- load() → None
Loads data by delegation to the provided load method.
- Returns:
Data returned by the provided load method.
- Raises:
DatasetError – When underlying load method raises error.
- save(data: float) → None
Saves data by delegation to the provided save method.
- Parameters:
data – the value to be saved by provided save method.
- Raises:
DatasetError – when underlying save method raises error.
FileNotFoundError – when save method got file instead of dir, on Windows.
NotADirectoryError – when save method got file instead of dir, on Unix.
- class kedro_mlflow.io.metrics.mlflow_metric_history_dataset.MlflowMetricHistoryDataset(key: str = None, run_id: str = None, load_args: dict[str, Any] = None, save_args: dict[str, Any] = None, metadata: dict[str, Any] | None = None)
Bases: MlflowAbstractMetricDataset
- __init__(key: str = None, run_id: str = None, load_args: dict[str, Any] = None, save_args: dict[str, Any] = None, metadata: dict[str, Any] | None = None)
Initialise MlflowMetricHistoryDataset.
- Parameters:
run_id (str) – The ID of the MLflow run where the metric should be logged.
- load() → None
Loads data by delegation to the provided load method.
- Returns:
Data returned by the provided load method.
- Raises:
DatasetError – When underlying load method raises error.
- save(data: list[int] | dict[int, float] | list[dict[str, float | str]]) → None
Saves data by delegation to the provided save method.
- Parameters:
data – the value to be saved by provided save method.
- Raises:
DatasetError – when underlying save method raises error.
FileNotFoundError – when save method got file instead of dir, on Windows.
NotADirectoryError – when save method got file instead of dir, on Unix.
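The save signature of MlflowMetricHistoryDataset accepts three input shapes: a list, a dict keyed by integers, or a list of dicts. The sketch below shows one plausible way to read them as (step, value) pairs. It rests on assumptions not stated above: list positions are taken as steps, dict keys are taken as steps, and each record in a list of dicts carries "step" and "value" keys. The helper to_step_value_pairs is hypothetical and not part of kedro-mlflow.

```python
from typing import Dict, List, Tuple, Union

# The three shapes from the save() signature, written with typing aliases.
MetricHistory = Union[List[float], Dict[int, float], List[Dict[str, float]]]


def to_step_value_pairs(data: MetricHistory) -> List[Tuple[int, float]]:
    """Normalise the three accepted shapes into (step, value) pairs (hypothetical helper)."""
    if isinstance(data, dict):
        # Assumption: dict keys are steps, dict values are metric values.
        return sorted((int(step), float(value)) for step, value in data.items())
    if data and isinstance(data[0], dict):
        # Assumption: every record carries "step" and "value" keys.
        return [(int(record["step"]), float(record["value"])) for record in data]
    # Assumption: the position in a plain list is the step.
    return [(step, float(value)) for step, value in enumerate(data)]


from_list = to_step_value_pairs([0.5, 0.4])
from_dict = to_step_value_pairs({0: 0.5, 2: 0.3})
from_records = to_step_value_pairs(
    [{"step": 0, "value": 0.5}, {"step": 1, "value": 0.4}]
)
```

Under these assumptions the list and the list of records above describe the same history, while the dict form can also express non-contiguous steps.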
- class kedro_mlflow.io.metrics.mlflow_metrics_history_dataset.MlflowMetricsHistoryDataset(run_id: str = None, prefix: str | None = None, metadata: dict[str, Any] | None = None)
Bases: AbstractDataset
This class represents an MLflow metrics dataset.
- __init__(run_id: str = None, prefix: str | None = None, metadata: dict[str, Any] | None = None)
Initialise MlflowMetricsHistoryDataset.
- Parameters:
prefix (Optional[str]) – Prefix for metrics logged in MLflow.
run_id (str) – ID of MLflow run.
- load() → dict[str, dict[str, float] | list[dict[str, float]]]
Load MlflowMetricsHistoryDataset.
- Returns:
Dictionary with MLflow metrics dataset.
- Return type:
dict[str, dict[str, float] | list[dict[str, float]]]
- property run_id
Get run id.
If no active run is found, tries to find the last experiment.
Raises a DatasetError exception if the run id can’t be found.
- Returns:
String containing the run_id.
- Return type:
str
- save(data: dict[str, dict[str, float] | list[dict[str, float]]]) → None
Save given MLflow metrics dataset and log it in MLflow as metrics.
- Parameters:
data (Metricsdict) – MLflow metrics dataset.
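The data argument of MlflowMetricsHistoryDataset.save maps each metric name to either a single record or a list of records. The sketch below flattens such a mapping into individual (key, value, step) entries. It assumes that each record has "value" and "step" keys and that the prefix is joined to the metric name with a dot. Both points are assumptions for illustration, and flatten_metrics is a hypothetical helper rather than a kedro-mlflow function.

```python
from typing import Dict, List, Optional, Tuple, Union

MetricItem = Union[Dict[str, float], List[Dict[str, float]]]


def flatten_metrics(
    data: Dict[str, MetricItem], prefix: Optional[str] = None
) -> List[Tuple[str, float, int]]:
    """Turn a metrics mapping into (key, value, step) entries (hypothetical helper)."""
    entries = []
    for name, item in data.items():
        # Assumption: the prefix is joined to the metric name with a dot.
        key = f"{prefix}.{name}" if prefix else name
        # A single record is treated like a one-element list of records.
        records = item if isinstance(item, list) else [item]
        for record in records:
            # Assumption: each record carries "value" and "step" keys.
            entries.append((key, record["value"], record["step"]))
    return entries


metrics = {
    "loss": {"value": 0.5, "step": 0},
    "accuracy": [{"value": 0.8, "step": 0}, {"value": 0.9, "step": 1}],
}
flat = flatten_metrics(metrics, prefix="train")
```

Each resulting entry corresponds to one metric value that would be logged for one step.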
Models Dataset
- class kedro_mlflow.io.models.mlflow_abstract_model_dataset.MlflowAbstractModelDataSet(filepath: str, flavor: str, pyfunc_workflow: str | None = None, load_args: dict[str, Any] = None, save_args: dict[str, Any] = None, version: Version = None, metadata: dict[str, Any] | None = None)
Bases: AbstractVersionedDataset
Abstract base class for model datasets.
- __init__(filepath: str, flavor: str, pyfunc_workflow: str | None = None, load_args: dict[str, Any] = None, save_args: dict[str, Any] = None, version: Version = None, metadata: dict[str, Any] | None = None) → None
Initialize the Kedro MlflowAbstractModelDataSet.
Parameters are passed from the Data Catalog.
During save, the model is first logged to MLflow. During load, the model is pulled from MLflow run with run_id.
- Parameters:
filepath (str) – Path to store the dataset locally.
flavor (str) – Built-in or custom MLflow model flavor module. Must be Python-importable.
pyfunc_workflow (str, optional) – Either python_model or loader_module. See https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#workflows.
load_args (dict[str, Any], optional) – Arguments to load_model function from specified flavor. Defaults to {}.
save_args (dict[str, Any], optional) – Arguments to log_model function from specified flavor. Defaults to {}.
version (Version, optional) – Specific version to load.
metadata – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
- Raises:
DatasetError – When passed flavor does not exist.
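The flavor argument must name a Python-importable module, and an error is raised when it does not exist. A minimal sketch of such a check, assuming it amounts to an import attempt, is shown below. FlavorNotFoundError is a hypothetical stand-in for Kedro's DatasetError, resolve_flavor is a hypothetical helper, and the standard-library json module is used only so that the example runs without MLflow installed.

```python
import importlib
from types import ModuleType


class FlavorNotFoundError(Exception):
    """Hypothetical stand-in for Kedro's DatasetError."""


def resolve_flavor(flavor: str) -> ModuleType:
    """Import the module named by `flavor`, or fail with a descriptive error."""
    try:
        return importlib.import_module(flavor)
    except ImportError as exc:  # also covers ModuleNotFoundError
        raise FlavorNotFoundError(
            f"'{flavor}' is not an importable module."
        ) from exc


# With MLflow installed, a value such as "mlflow.sklearn" would be passed instead.
json_module = resolve_flavor("json")
```

Failing at construction time rather than at the first save or load surfaces a misspelled flavor as soon as the catalog entry is instantiated.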
- class kedro_mlflow.io.models.mlflow_model_tracking_dataset.MlflowModelTrackingDataset(flavor: str, pyfunc_workflow: str | None = None, load_args: dict[str, Any] | None = None, save_args: dict[str, Any] | None = None, metadata: dict[str, Any] | None = None)
Bases: MlflowAbstractModelDataSet
Wrapper for saving, logging and loading all MLflow model flavors.
- __init__(flavor: str, pyfunc_workflow: str | None = None, load_args: dict[str, Any] | None = None, save_args: dict[str, Any] | None = None, metadata: dict[str, Any] | None = None) → None
Initialize the Kedro MlflowModelTrackingDataset.
Parameters are passed from the Data Catalog.
During save, the model is first logged to MLflow. During load, the model is pulled from MLflow through its model_id.
- Parameters:
flavor (str) – Built-in or custom MLflow model flavor module. Must be Python-importable. For example: “mlflow.sklearn”, “mlflow.pyfunc…”
pyfunc_workflow (str, optional) – Either python_model or loader_module. See https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#workflows.
load_args (dict[str, Any], optional) – Arguments to load_model function from specified flavor, see mlflow documentation. Defaults to None.
save_args (dict[str, Any], optional) – Arguments to log_model function from specified flavor, see mlflow documentation. Defaults to None; it is recommended to specify ‘name’.
metadata – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
- Raises:
DatasetError – When passed flavor does not exist.
- load() → LoggedModel
Loads an MLflow model from local path or from MLflow run.
- Returns:
Deserialized model.
- Return type:
LoggedModel
- property model_uri: str | None
- save(model: Any) → None
Saves a model to a local path and then logs it to MLflow.
- Parameters:
model (Any) – A model object supported by the given MLflow flavor.
- class kedro_mlflow.io.models.mlflow_model_local_filesystem_dataset.MlflowModelLocalFileSystemDataset(filepath: str, flavor: str, pyfunc_workflow: str | None = None, load_args: dict[str, Any] = None, save_args: dict[str, Any] = None, log_args: dict[str, Any] = None, version: Version = None, metadata: dict[str, Any] | None = None)
Bases: MlflowAbstractModelDataSet
Wrapper for saving, logging and loading all MLflow model flavors.
- __init__(filepath: str, flavor: str, pyfunc_workflow: str | None = None, load_args: dict[str, Any] = None, save_args: dict[str, Any] = None, log_args: dict[str, Any] = None, version: Version = None, metadata: dict[str, Any] | None = None) → None
Initialize the Kedro MlflowModelLocalFileSystemDataset.
Parameters are passed from the Data Catalog.
During save, the model is saved locally at filepath. During load, the model is loaded from the local filepath.
- Parameters:
flavor (str) – Built-in or custom MLflow model flavor module. Must be Python-importable.
filepath (str) – Path to store the dataset locally.
pyfunc_workflow (str, optional) – Either python_model or loader_module. See https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#workflows.
load_args (dict[str, Any], optional) – Arguments to load_model function from specified flavor. Defaults to None.
save_args (dict[str, Any], optional) – Arguments to save_model function from specified flavor. Defaults to None.
version (Version, optional) – Kedro version to use. Defaults to None.
metadata – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
- Raises:
DatasetError – When passed flavor does not exist.
- load() → Any
Loads an MLflow model from local path or from MLflow run.
- Returns:
Deserialized model.
- Return type:
Any
- save(model: Any) → None
Saves a model to a local path and then logs it to MLflow.
- Parameters:
model (Any) – A model object supported by the given MLflow flavor.
- class kedro_mlflow.io.models.mlflow_model_registry_dataset.MlflowModelRegistryDataset(model_name: str, stage_or_version: str | int | None = None, alias: str | None = None, flavor: str | None = 'mlflow.pyfunc', pyfunc_workflow: str | None = 'python_model', load_args: dict[str, Any] | None = None, metadata: dict[str, Any] | None = None)
Bases: MlflowAbstractModelDataSet
Wrapper for saving, logging and loading all MLflow model flavors.
- __init__(model_name: str, stage_or_version: str | int | None = None, alias: str | None = None, flavor: str | None = 'mlflow.pyfunc', pyfunc_workflow: str | None = 'python_model', load_args: dict[str, Any] | None = None, metadata: dict[str, Any] | None = None) → None
Initialize the Kedro MlflowModelRegistryDataset.
Parameters are passed from the Data Catalog.
During “load”, the model is pulled from MLflow model registry by its name. “save” is not supported.
- Parameters:
model_name (str) – The name of the registered model in the MLflow registry.
stage_or_version (str) – A valid stage (either “staging” or “production”) or version number for the registered model. Defaults to “latest”, which fetches the last version and the highest “stage” available.
flavor (str) – Built-in or custom MLflow model flavor module. Must be Python-importable.
pyfunc_workflow (str, optional) – Either python_model or loader_module. See https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#workflows.
load_args (dict[str, Any], optional) – Arguments to load_model function from specified flavor. Defaults to None.
metadata – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
- Raises:
DatasetError – When passed flavor does not exist.
- load() → Any
Loads an MLflow model from local path or from MLflow run.
- Returns:
Deserialized model.
- Return type:
Any
- save(model: Any) → None
Saves data by delegation to the provided save method.
- Parameters:
data – the value to be saved by provided save method.
- Raises:
DatasetError – when underlying save method raises error.
FileNotFoundError – when save method got file instead of dir, on Windows.
NotADirectoryError – when save method got file instead of dir, on Unix.
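MlflowModelRegistryDataset selects a registered model by name together with a stage, a version number or an alias. MLflow addresses registered models with URIs of the form models:/<name>/<version-or-stage> and models:/<name>@<alias>. The sketch below builds such a URI from the constructor arguments. It is not kedro-mlflow's code: build_registry_uri is a hypothetical helper, and treating the two selectors as mutually exclusive and falling back to "latest" are assumptions.

```python
from typing import Optional, Union


def build_registry_uri(
    model_name: str,
    stage_or_version: Optional[Union[str, int]] = None,
    alias: Optional[str] = None,
) -> str:
    """Build an MLflow registry URI (hypothetical helper, not part of kedro-mlflow)."""
    if alias is not None and stage_or_version is not None:
        # Assumption: the two selectors are mutually exclusive.
        raise ValueError("Specify either 'stage_or_version' or 'alias', not both.")
    if alias is not None:
        return f"models:/{model_name}@{alias}"
    # Assumption: fall back to "latest" when no selector is given.
    selector = stage_or_version if stage_or_version is not None else "latest"
    return f"models:/{model_name}/{selector}"


latest_uri = build_registry_uri("churn_model")
version_uri = build_registry_uri("churn_model", stage_or_version=3)
alias_uri = build_registry_uri("churn_model", alias="champion")
```

A URI of this kind is the sort of value a flavor's load_model function accepts. The model name "churn_model" and the alias "champion" are placeholders.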