Commit be5bf88

Observability (#15)
1 parent 5598082 commit be5bf88

66 files changed (+13734, -707 lines)

CHANGELOG.md

Lines changed: 26 additions & 0 deletions
@@ -1,3 +1,29 @@
+## v1.1.0 (2024-07-21)
+
+### Feat
+
+- **kpi**: add key performance indicators
+- **mlproject**: add mlflow project and tasks
+- **monitoring**: add mlflow.evaluate API
+- **lineage**: add lineage features through mlflow data api
+- **explanations**: add explainability features and tooling
+- **data**: add train, test, and sample data
+- **notification**: add service and alerts with plyer
+- **observability**: add alerting with plyer notifications
+- **observability**: add infrastructure through mlflow system metrics
+
+### Fix
+
+- **kpi**: add key performance indicators
+- **projects**: change naming convention
+- **evaluation**: add evaluation files
+- **loading**: use version or alias for loading models
+- **warnings**: improve styles and remove warnings
+- **mlflow**: remove input examples following the addition of lineage
+- **paths**: fix path for explanation job
+- **data**: fix models explanations name
+- **data**: add parquet data
+
 ## v1.0.1 (2024-06-28)
 
 ### Fix

MLproject

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# https://mlflow.org/docs/latest/projects.html
+
+name: bikes
+python_env: python_env.yaml
+entry_points:
+  main:
+    parameters:
+      conf_file: path
+    command: "PYTHONPATH=src python -m bikes {conf_file}"
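
The `main` entry point above declares a single `conf_file` parameter and a command template containing a `{conf_file}` placeholder. As a rough illustration of how such a template is filled in, here is a minimal standard-library sketch. It is not MLflow's implementation: `render_command` is a hypothetical helper, and the shell quoting is an assumption made for this example.

```python
import shlex

# Command template copied from the MLproject entry point above.
COMMAND = "PYTHONPATH=src python -m bikes {conf_file}"


def render_command(template: str, **parameters: str) -> str:
    """Fill the template placeholders, shell-quoting each parameter value.

    Hypothetical helper for illustration only; MLflow performs its own
    parameter validation and substitution.
    """
    quoted = {name: shlex.quote(value) for name, value in parameters.items()}
    return template.format(**quoted)


rendered = render_command(COMMAND, conf_file="confs/training.yaml")
print(rendered)
```

With MLflow installed, the equivalent run from the repository root would typically be started with `mlflow run . -P conf_file=confs/training.yaml`.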

README.md

Lines changed: 97 additions & 7 deletions
@@ -72,6 +72,13 @@ You can use this package as part of your MLOps toolkit or platform (e.g., Model
   - [Programming](#programming)
     - [Language: Python](#language-python)
     - [Version: Pyenv](#version-pyenv)
+  - [Observability](#observability)
+    - [Reproducibility: Mlflow Project](#reproducibility-mlflow-project)
+    - [Monitoring : Mlflow Evaluate](#monitoring--mlflow-evaluate)
+    - [Alerting: Plyer](#alerting-plyer)
+    - [Lineage: Mlflow Dataset](#lineage-mlflow-dataset)
+    - [Explainability: SHAP](#explainability-shap)
+    - [Infrastructure: Mlflow System Metrics](#infrastructure-mlflow-system-metrics)
 - [Tips](#tips)
   - [AI/ML Practices](#aiml-practices)
     - [Data Catalog](#data-catalog)
@@ -150,10 +157,10 @@ job:
   KIND: TrainingJob
   inputs:
     KIND: ParquetReader
-    path: data/inputs.parquet
+    path: data/inputs_train.parquet
   targets:
     KIND: ParquetReader
-    path: data/targets.parquet
+    path: data/targets_train.parquet
 ```
 
 This config file instructs the program to start a `TrainingJob` with 2 parameters:
@@ -173,6 +180,8 @@ $ poetry run [package] confs/tuning.yaml
 $ poetry run [package] confs/training.yaml
 $ poetry run [package] confs/promotion.yaml
 $ poetry run [package] confs/inference.yaml
+$ poetry run [package] confs/evaluations.yaml
+$ poetry run [package] confs/explanations.yaml
 ```
 
 In production, you can build, ship, and run the project as a Python package:
@@ -210,7 +219,7 @@ You can invoke the actions from the [command-line](https://www.pyinvoke.org/) or
 
 ```bash
 # execute the project DAG
-$ inv dags
+$ inv projects
 # create a code archive
 $ inv packages
 # list other actions
@@ -231,13 +240,16 @@ $ inv --list
 - **cleans.coverage** - Clean the coverage tool.
 - **cleans.dist** - Clean the dist folder.
 - **cleans.docs** - Clean the docs folder.
+- **cleans.environment** - Clean the project environment file.
 - **cleans.folders** - Run all folders tasks.
 - **cleans.mlruns** - Clean the mlruns folder.
 - **cleans.mypy** - Clean the mypy tool.
 - **cleans.outputs** - Clean the outputs folder.
 - **cleans.poetry** - Clean poetry lock file.
 - **cleans.pytest** - Clean the pytest tool.
+- **cleans.projects** - Run all projects tasks.
 - **cleans.python** - Clean python caches and bytecodes.
+- **cleans.requirements** - Clean the project requirements file.
 - **cleans.reset** - Run all tools, folders, and sources tasks.
 - **cleans.ruff** - Clean the ruff tool.
 - **cleans.sources** - Run all sources tasks.
@@ -251,8 +263,6 @@ $ inv --list
 - **containers.build** - Build the container image with the given tag.
 - **containers.compose** - Start up docker compose.
 - **containers.run** - Run the container image with the given tag.
-- **dags.all (dags)** - Run all DAG tasks.
-- **dags.job** - Run the project for the given job name.
 - **docs.all (docs)** - Run all docs tasks.
 - **docs.api** - Document the API with pdoc using the given format and output directory.
 - **docs.serve** - Serve the API docs with pdoc using the given format and computer port.
@@ -267,6 +277,10 @@ $ inv --list
 - **mlflow.serve** - Start mlflow server with the given host, port, and backend uri.
 - **packages.all (packages)** - Run all package tasks.
 - **packages.build** - Build a python package with the given format.
+- **projects.all (projects)** - Run all project tasks.
+- **projects.environment** - Export the project environment file.
+- **projects.requirements** - Export the project requirements file.
+- **projects.run** - Run an mlflow project from MLproject file.
 
 ## Workflows
 
@@ -719,6 +733,82 @@ Select your programming environment.
 - **Alternatives**:
   - Manual installation: time consuming
 
+## Observability
+
+### Reproducibility: [Mlflow Project](https://mlflow.org/docs/latest/projects.html)
+
+- **Motivations**:
+  - Share common project formats.
+  - Ensure the project can be reused.
+  - Avoid randomness in project execution.
+- **Limitations**:
+  - Mlflow Project is best suited for small projects.
+- **Alternatives**:
+  - [DVC](https://dvc.org/): both data and models.
+  - [Metaflow](https://metaflow.org/): focus on machine learning.
+  - **[Apache Airflow](https://airflow.apache.org/)**: for large scale projects.
+
+### Monitoring : [Mlflow Evaluate](https://mlflow.org/docs/latest/model-evaluation/index.html)
+
+- **Motivations**:
+  - Compute the model metrics.
+  - Validate model with thresholds.
+  - Perform post-training evaluations.
+- **Limitations**:
+  - Mlflow Evaluate is less feature-rich than its alternatives.
+- **Alternatives**:
+  - **[Giskard](https://www.giskard.ai/)**: open-core and super complete.
+  - **[Evidently](https://www.evidentlyai.com/)**: open-source with more metrics.
+  - [Arize AI](https://arize.com/): more feature-rich but less flexible.
+  - [Grafana](https://grafana.com/): you must do everything yourself.
+
+### Alerting: [Plyer](https://github.com/kivy/plyer)
+
+- **Motivations**:
+  - Simple solution.
+  - Send notifications on the local system.
+  - Cross-system: Mac, Linux, Windows.
+- **Limitations**:
+  - Should not be used for large scale projects.
+- **Alternatives**:
+  - [Slack](https://slack.com/): for chat-oriented solutions.
+  - [Datadog](https://www.datadoghq.com/): for infrastructure oriented solutions.
+
+### Lineage: [Mlflow Dataset](https://mlflow.org/docs/latest/tracking/data-api.html)
+
+- **Motivations**:
+  - Store information in Mlflow.
+  - Track metadata about run datasets.
+  - Keep URI of the dataset source (e.g., website).
+- **Limitations**:
+  - Not as feature-rich as alternative solutions.
+- **Alternatives**:
+  - [Databricks Lineage](https://docs.databricks.com/en/admin/system-tables/lineage.html): limited to Databricks.
+  - [OpenLineage and Marquez](https://marquezproject.github.io/): open-source and flexible.
+
+### Explainability: [SHAP](https://shap.readthedocs.io/en/latest/)
+
+- **Motivations**:
+  - Most popular toolkit.
+  - Support various models (linear, model, ...).
+  - Integration with Mlflow through the [SHAP module](https://mlflow.org/docs/latest/python_api/mlflow.shap.html).
+- **Limitations**:
+  - Very slow on large datasets.
+  - Mlflow SHAP module is not mature enough.
+- **Alternatives**:
+  - [LIME](https://github.com/marcotcr/lime): not maintained anymore.
+
+### Infrastructure: [Mlflow System Metrics](https://mlflow.org/docs/latest/system-metrics/index.html)
+
+- **Motivations**:
+  - Track infrastructure information (RAM, CPU, ...).
+  - Integrated with Mlflow tracking.
+  - Provide hardware insights.
+- **Limitations**:
+  - Not as mature as alternative solutions.
+- **Alternatives**:
+  - [Datadog](https://www.datadoghq.com/): popular and mature solution.
+
 # Tips
 
 This section gives some tips and tricks to enrich the development experience.
@@ -736,10 +826,10 @@ This tag can then be associated to a reader/writer implementation in a configura
 ```yaml
 inputs:
   KIND: ParquetReader
-  path: data/inputs.parquet
+  path: data/inputs_train.parquet
 targets:
   KIND: ParquetReader
-  path: data/targets.parquet
+  path: data/targets_train.parquet
 ```
 
 In this package, the implementations are described in `src/[package]/io/datasets.py` and selected by `KIND`.
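
To make the `KIND` selection concrete, here is a minimal standard-library sketch of dispatching a config mapping to a reader or writer class by its `KIND` tag. It is an illustration only, not the code in `src/[package]/io/datasets.py`: the `REGISTRY` dict, the `build` helper, and the dataclass fields are assumptions, with field names taken from the YAML configs in this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParquetReader:
    """Stand-in reader holding only its configuration (no I/O in this sketch)."""

    path: str
    limit: Optional[int] = None


@dataclass
class ParquetWriter:
    """Stand-in writer holding only its configuration (no I/O in this sketch)."""

    path: str


# Hypothetical registry mapping each KIND tag to its implementation class.
REGISTRY = {cls.__name__: cls for cls in (ParquetReader, ParquetWriter)}


def build(config: dict):
    """Instantiate the class named by the KIND key with the remaining keys."""
    options = dict(config)  # copy so the caller's mapping is left untouched
    kind = options.pop("KIND")
    try:
        cls = REGISTRY[kind]
    except KeyError:
        raise ValueError(f"Unknown KIND: {kind!r}") from None
    return cls(**options)


reader = build({"KIND": "ParquetReader", "path": "data/inputs_train.parquet"})
print(reader)
```

Swapping the data source then only requires changing the `KIND` value and its options in the config file, provided a matching class is registered.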

confs/evaluations.yaml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+job:
+  KIND: EvaluationsJob
+  inputs:
+    KIND: ParquetReader
+    path: data/inputs_train.parquet
+  targets:
+    KIND: ParquetReader
+    path: data/targets_train.parquet
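
The README changes in this commit list "Validate model with thresholds" as a motivation for the evaluation job. The sketch below shows the general idea of checking computed metrics against thresholds in plain Python. It does not use MLflow and is not the `EvaluationsJob` implementation: the `Threshold` class, the `failed_thresholds` helper, and all metric names and values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold: a bound plus the direction that counts as good."""

    value: float
    greater_is_better: bool


def failed_thresholds(metrics: dict, thresholds: dict) -> list:
    """Return the names of the metrics that do not satisfy their threshold."""
    failures = []
    for name, threshold in thresholds.items():
        score = metrics[name]
        if threshold.greater_is_better:
            passed = score >= threshold.value
        else:
            passed = score <= threshold.value
        if not passed:
            failures.append(name)
    return failures


# Illustrative numbers only; they are not results from this project.
metrics = {"r2_score": 0.82, "mean_squared_error": 410.0}
thresholds = {
    "r2_score": Threshold(value=0.9, greater_is_better=True),
    "mean_squared_error": Threshold(value=500.0, greater_is_better=False),
}
failures = failed_thresholds(metrics, thresholds)
print(failures)
```

A job built on this idea could fail the run or raise an alert whenever the returned list is not empty.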

confs/explanations.yaml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+job:
+  KIND: ExplanationsJob
+  inputs_samples:
+    KIND: ParquetReader
+    path: data/inputs_test.parquet
+    limit: 100
+  models_explanations:
+    KIND: ParquetWriter
+    path: outputs/models_explanations.parquet
+  samples_explanations:
+    KIND: ParquetWriter
+    path: outputs/samples_explanations.parquet

confs/inference.yaml

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@ job:
   KIND: InferenceJob
   inputs:
     KIND: ParquetReader
-    path: data/inputs.parquet
+    path: data/inputs_test.parquet
   outputs:
     KIND: ParquetWriter
-    path: outputs/predictions.parquet
+    path: outputs/predictions_test.parquet

confs/training.yaml

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@ job:
   KIND: TrainingJob
   inputs:
     KIND: ParquetReader
-    path: data/inputs.parquet
+    path: data/inputs_train.parquet
   targets:
     KIND: ParquetReader
-    path: data/targets.parquet
+    path: data/targets_train.parquet

confs/tuning.yaml

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@ job:
   KIND: TuningJob
   inputs:
     KIND: ParquetReader
-    path: data/inputs.parquet
+    path: data/inputs_train.parquet
   targets:
     KIND: ParquetReader
-    path: data/targets.parquet
+    path: data/targets_train.parquet

data/inputs_test.parquet

54.4 KB
Binary file not shown.
218 KB
Binary file not shown.
