@@ -72,6 +72,13 @@ You can use this package as part of your MLOps toolkit or platform (e.g., Model
7272 - [Programming](#programming)
7373 - [Language: Python](#language-python)
7474 - [Version: Pyenv](#version-pyenv)
75+ - [Observability](#observability)
76+ - [Reproducibility: Mlflow Project](#reproducibility-mlflow-project)
77+ - [Monitoring : Mlflow Evaluate](#monitoring--mlflow-evaluate)
78+ - [Alerting: Plyer](#alerting-plyer)
79+ - [Lineage: Mlflow Dataset](#lineage-mlflow-dataset)
80+ - [Explainability: SHAP](#explainability-shap)
81+ - [Infrastructure: Mlflow System Metrics](#infrastructure-mlflow-system-metrics)
7582- [Tips](#tips)
7683 - [AI/ML Practices](#aiml-practices)
7784 - [Data Catalog](#data-catalog)
@@ -150,10 +157,10 @@ job:
150157 KIND: TrainingJob
151158 inputs:
152159 KIND: ParquetReader
153- path: data/inputs.parquet
160+ path: data/inputs_train.parquet
154161 targets:
155162 KIND: ParquetReader
156- path: data/targets.parquet
163+ path: data/targets_train.parquet
157164```
158165
159166This config file instructs the program to start a ` TrainingJob` with 2 parameters:
@@ -173,6 +180,8 @@ $ poetry run [package] confs/tuning.yaml
173180$ poetry run [package] confs/training.yaml
174181$ poetry run [package] confs/promotion.yaml
175182$ poetry run [package] confs/inference.yaml
183+ $ poetry run [package] confs/evaluations.yaml
184+ $ poetry run [package] confs/explanations.yaml
176185```
177186
178187In production, you can build, ship, and run the project as a Python package:
@@ -210,7 +219,7 @@ You can invoke the actions from the [command-line](https://www.pyinvoke.org/) or
210219
211220```bash
212221# execute the project DAG
213- $ inv dags
222+ $ inv projects
214223# create a code archive
215224$ inv packages
216225# list other actions
@@ -231,13 +240,16 @@ $ inv --list
231240- **cleans.coverage** - Clean the coverage tool.
232241- **cleans.dist** - Clean the dist folder.
233242- **cleans.docs** - Clean the docs folder.
243+ - **cleans.environment** - Clean the project environment file.
234244- **cleans.folders** - Run all folders tasks.
235245- **cleans.mlruns** - Clean the mlruns folder.
236246- **cleans.mypy** - Clean the mypy tool.
237247- **cleans.outputs** - Clean the outputs folder.
238248- **cleans.poetry** - Clean poetry lock file.
239249- **cleans.pytest** - Clean the pytest tool.
250+ - **cleans.projects** - Run all projects tasks.
240251- **cleans.python** - Clean python caches and bytecodes.
252+ - **cleans.requirements** - Clean the project requirements file.
241253- **cleans.reset** - Run all tools, folders, and sources tasks.
242254- **cleans.ruff** - Clean the ruff tool.
243255- **cleans.sources** - Run all sources tasks.
@@ -251,8 +263,6 @@ $ inv --list
251263- **containers.build** - Build the container image with the given tag.
252264- **containers.compose** - Start up docker compose.
253265- **containers.run** - Run the container image with the given tag.
254- - **dags.all (dags)** - Run all DAG tasks.
255- - **dags.job** - Run the project for the given job name.
256266- **docs.all (docs)** - Run all docs tasks.
257267- **docs.api** - Document the API with pdoc using the given format and output directory.
258268- **docs.serve** - Serve the API docs with pdoc using the given format and computer port.
@@ -267,6 +277,10 @@ $ inv --list
267277- **mlflow.serve** - Start mlflow server with the given host, port, and backend uri.
268278- **packages.all (packages)** - Run all package tasks.
269279- **packages.build** - Build a python package with the given format.
280+ - **projects.all (projects)** - Run all project tasks.
281+ - **projects.environment** - Export the project environment file.
282+ - **projects.requirements** - Export the project requirements file.
283+ - **projects.run** - Run an mlflow project from the MLproject file.
270284
271285## Workflows
272286
@@ -719,6 +733,82 @@ Select your programming environment.
719733- **Alternatives**:
720734 - Manual installation: time-consuming
721735
736+ ## Observability
737+
738+ ### Reproducibility: [Mlflow Project](https://mlflow.org/docs/latest/projects.html)
739+
740+ - **Motivations**:
741+ - Share a common project format.
742+ - Ensure the project can be reused.
743+ - Avoid randomness in project execution.
744+ - **Limitations**:
745+ - Mlflow Project is best suited for small projects.
746+ - **Alternatives**:
747+ - [DVC](https://dvc.org/): both data and models.
748+ - [Metaflow](https://metaflow.org/): focus on machine learning.
749+ - **[Apache Airflow](https://airflow.apache.org/)**: for large-scale projects.
750+
751+ ### Monitoring : [Mlflow Evaluate](https://mlflow.org/docs/latest/model-evaluation/index.html)
752+
753+ - **Motivations**:
754+ - Compute the model metrics.
755+ - Validate the model against thresholds.
756+ - Perform post-training evaluations.
757+ - **Limitations**:
758+ - Mlflow Evaluate is less feature-rich than its alternatives.
759+ - **Alternatives**:
760+ - **[Giskard](https://www.giskard.ai/)**: open-core and very complete.
761+ - **[Evidently](https://www.evidentlyai.com/)**: open-source with more metrics.
762+ - [Arize AI](https://arize.com/): more feature-rich but less flexible.
763+ - [Grafana](https://grafana.com/): you must do everything yourself.
764+
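The "validate the model against thresholds" motivation can be pictured with a small, library-free sketch. It does not use the Mlflow Evaluate API: `Threshold` and `validate_metrics` are hypothetical names, and the metric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical rule: the bound a metric must respect."""

    value: float
    greater_is_better: bool = True


def validate_metrics(metrics: dict[str, float], thresholds: dict[str, Threshold]) -> list[str]:
    """Return one message per failed threshold; an empty list means the model passes."""
    failures = []
    for name, rule in thresholds.items():
        if name not in metrics:
            failures.append(f"{name}: metric is missing")
            continue
        score = metrics[name]
        # Higher-is-better metrics need score >= bound; the others need score <= bound.
        passed = score >= rule.value if rule.greater_is_better else score <= rule.value
        if not passed:
            bound = ">=" if rule.greater_is_better else "<="
            failures.append(f"{name}: {score} does not satisfy {bound} {rule.value}")
    return failures


# Illustrative values only (not results from this project).
metrics = {"r2_score": 0.91, "mean_squared_error": 12.5}
thresholds = {
    "r2_score": Threshold(value=0.5, greater_is_better=True),
    "mean_squared_error": Threshold(value=10.0, greater_is_better=False),
}
failures = validate_metrics(metrics, thresholds)
print(failures)
```

A job could fail fast when the returned list is not empty, which is the behavior a threshold check is meant to provide whichever evaluation tool computes the metrics.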
765+ ### Alerting: [Plyer](https://github.com/kivy/plyer)
766+
767+ - **Motivations**:
768+ - Simple solution.
769+ - Send notifications to the local system.
770+ - Cross-platform: Mac, Linux, Windows.
771+ - **Limitations**:
772+ - Should not be used for large-scale projects.
773+ - **Alternatives**:
774+ - [Slack](https://slack.com/): for chat-oriented solutions.
775+ - [Datadog](https://www.datadoghq.com/): for infrastructure-oriented solutions.
776+
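As a rough sketch of system alerting with Plyer, the helper below sends a desktop notification through `plyer.notification.notify` and falls back to printing when Plyer is not installed or no notification backend is available (for example, on a headless server). The `send_alert` name, the `"alerts"` application name, and the message text are assumptions made for this example; they are not taken from the package.

```python
def send_alert(title: str, message: str) -> str:
    """Send a desktop notification with Plyer, or print the alert if that fails.

    Returns the channel that was used: "plyer" or "stdout".
    """
    try:
        from plyer import notification  # optional dependency

        notification.notify(title=title, message=message, app_name="alerts", timeout=5)
        return "plyer"
    except Exception:  # Plyer is missing, or the system has no notification backend
        print(f"[ALERT] {title}: {message}")
        return "stdout"


channel = send_alert("Training finished", "The model is ready for review.")
```

The fallback keeps the calling job from crashing when an alert cannot be displayed, which matters because notifications are a side channel and not part of the job's result.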
777+ ### Lineage: [Mlflow Dataset](https://mlflow.org/docs/latest/tracking/data-api.html)
778+
779+ - **Motivations**:
780+ - Store information in Mlflow.
781+ - Track metadata about run datasets.
782+ - Keep the URI of the dataset source (e.g., website).
783+ - **Limitations**:
784+ - Not as feature-rich as alternative solutions.
785+ - **Alternatives**:
786+ - [Databricks Lineage](https://docs.databricks.com/en/admin/system-tables/lineage.html): limited to Databricks.
787+ - [OpenLineage and Marquez](https://marquezproject.github.io/): open-source and flexible.
788+
789+ ### Explainability: [SHAP](https://shap.readthedocs.io/en/latest/)
790+
791+ - **Motivations**:
792+ - Most popular toolkit.
793+ - Support various models (linear, tree, ...).
794+ - Integration with Mlflow through the [SHAP module](https://mlflow.org/docs/latest/python_api/mlflow.shap.html).
795+ - **Limitations**:
796+ - Very slow on large datasets.
797+ - Mlflow SHAP module is not mature enough.
798+ - **Alternatives**:
799+ - [LIME](https://github.com/marcotcr/lime): not maintained anymore.
800+
801+ ### Infrastructure: [Mlflow System Metrics](https://mlflow.org/docs/latest/system-metrics/index.html)
802+
803+ - **Motivations**:
804+ - Track infrastructure information (RAM, CPU, ...).
805+ - Integrated with Mlflow tracking.
806+ - Provide hardware insights.
807+ - **Limitations**:
808+ - Not as mature as alternative solutions.
809+ - **Alternatives**:
810+ - [Datadog](https://www.datadoghq.com/): popular and mature solution.
811+
722812# Tips
723813
724814This section gives some tips and tricks to enrich the development experience.
@@ -736,10 +826,10 @@ This tag can then be associated to a reader/writer implementation in a configura
736826```yaml
737827 inputs:
738828 KIND: ParquetReader
739- path: data/inputs.parquet
829+ path: data/inputs_train.parquet
740830 targets:
741831 KIND: ParquetReader
742- path: data/targets.parquet
832+ path: data/targets_train.parquet
743833```
744834
745835In this package, the implementations are described in `src/[package]/io/datasets.py` and selected by `KIND`.
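
To illustrate how a `KIND` tag can select a reader implementation, here is a minimal sketch that only uses the standard library. The `Reader` base class, the `CsvReader` class, the `READERS` registry, and the `build_reader` helper are hypothetical names for this example; they are not taken from `src/[package]/io/datasets.py`, and the actual package may rely on a different mechanism.

```python
from dataclasses import dataclass


@dataclass
class Reader:
    """Hypothetical base class: every reader is configured with a path."""

    path: str


@dataclass
class ParquetReader(Reader):
    """Stand-in for a parquet reader (no file I/O is performed in this sketch)."""


@dataclass
class CsvReader(Reader):
    """Hypothetical second implementation, added to show the selection."""


# Registry mapping each KIND tag to its implementation class.
READERS: dict[str, type[Reader]] = {cls.__name__: cls for cls in (ParquetReader, CsvReader)}


def build_reader(config: dict) -> Reader:
    """Instantiate the reader named by the KIND key of a parsed config mapping."""
    params = dict(config)  # copy so the caller's mapping is not mutated
    kind = params.pop("KIND")
    try:
        reader_cls = READERS[kind]
    except KeyError:
        raise ValueError(f"Unknown reader KIND: {kind!r}") from None
    return reader_cls(**params)


# Same structure as the `inputs` entry of the YAML configuration.
inputs = build_reader({"KIND": "ParquetReader", "path": "data/inputs_train.parquet"})
print(inputs)
```

With this layout, switching to another storage format only requires changing `KIND` in the configuration file, as long as the new implementation is registered.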