This project was developed as a practical exercise for Machine Learning model tracking and deployment using MLflow. It focuses on building a complete ML pipeline, from preprocessing to model registration and production deployment.
2. Log model hyperparameters (C, regularization, solver) and metrics (accuracy, f1-score) with MLflow.
None → Staging → Production
📁 command-classifier/
│
├── Data.py # Data loading, cleaning, label encoding
├── Transformer.py # Text vectorization using Sentence Transformers
├── Model.py # Logistic Regression + embedding pipeline
├── DecodedPipelineWrapper.py # Adds decoding + probability utilities to the pipeline
├── main.py # Full MLflow lifecycle: training → logging → registry
├── Nova.py # Loads the trained .pkl model and performs live predictions
├── test_model.py # Interactive CLI for testing commands with Nova
├── commands_dataset.csv # Example dataset with text and intents
└── README.md # Project documentation
This module is responsible for loading, validating, and preprocessing the dataset.
- Loads CSV data containing: `text | command | intent`
- Cleans text (lowercase, removes accents, punctuation, and extra spaces)
- Creates a unified label column (`command` + `intent`)
- Encodes labels using `LabelEncoder`
- Handles errors gracefully (missing file, empty CSV, wrong format)
✅ Dataset successfully loaded and processed.
Columns: text, label
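A minimal sketch of that preprocessing flow (the function name `load_and_prepare` is illustrative, not necessarily the exact API of `Data.py`):

```python
import re
import unicodedata

import pandas as pd
from sklearn.preprocessing import LabelEncoder


def load_and_prepare(path: str = "commands_dataset.csv"):
    """Load the CSV, clean the text, and encode command+intent labels."""
    df = pd.read_csv(path)  # expects columns: text, command, intent

    def clean(text: str) -> str:
        text = text.lower()
        # Strip accents, punctuation, and extra whitespace
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
        text = re.sub(r"[^\w\s]", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    df["text"] = df["text"].astype(str).map(clean)
    df["label"] = df["command"].str.strip() + " " + df["intent"].str.strip()

    encoder = LabelEncoder()
    y = encoder.fit_transform(df["label"])
    return df["text"], y, encoder
```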
This module provides the text embedding layer using Sentence Transformers.
- Wraps `SentenceTransformer` from the `sentence-transformers` library.
- Uses the model `'paraphrase-multilingual-mpnet-base-v2'`.
- Automatically detects and uses GPU if available.
- Produces dense numerical embeddings from input text.
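A minimal sketch of what such a scikit-learn-compatible wrapper can look like (assuming the class in `Transformer.py` follows the standard `fit`/`transform` interface):

```python
import torch
from sentence_transformers import SentenceTransformer
from sklearn.base import BaseEstimator, TransformerMixin


class SentenceTransformerVectorizer(BaseEstimator, TransformerMixin):
    """Turns raw text into dense embeddings usable inside a scikit-learn Pipeline."""

    def __init__(self, model_name: str = "paraphrase-multilingual-mpnet-base-v2"):
        self.model_name = model_name
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = SentenceTransformer(model_name, device=device)

    def fit(self, X, y=None):
        # Nothing to learn: the transformer model is already pre-trained.
        return self

    def transform(self, X):
        return self.model.encode(list(X), show_progress_bar=False)
```

The README's usage example below then produces one 768-dimensional embedding per input sentence.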
```python
from Transformer import SentenceTransformerVectorizer

vectorizer = SentenceTransformerVectorizer()
embeddings = vectorizer.transform(["turn off the computer"])
print(embeddings.shape)  # (1, 768)
```

Defines the Logistic Regression model integrated with the transformer-based vectorizer.
[ SentenceTransformerVectorizer ] → [ LogisticRegression ]
- Loads cleaned text and encoded labels via `Data.py`
- Splits data (80% train / 20% test)
- Builds pipeline and trains model
- Evaluates performance using:
  - Accuracy
  - Precision
  - Recall
  - F1-score
- Saves model as `.pkl` for production usage
🏋️ MODEL TRAINING STARTED
🎯 Accuracy: 0.9423
🏆 F1 Score (weighted): 0.9371
✅ Model successfully trained.
💾 Model saved as model.pkl
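A condensed sketch of that training flow (the loader name is taken from the `Data.py` sketch above and is illustrative, not the exact API of `Model.py`):

```python
import pickle

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

from Data import load_and_prepare  # hypothetical loader name, see the Data.py sketch
from Transformer import SentenceTransformerVectorizer

texts, labels, encoder = load_and_prepare("commands_dataset.csv")
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("embeddings", SentenceTransformerVectorizer()),
    ("classifier", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1 (weighted):", f1_score(y_test, y_pred, average="weighted"))

# Saved artifact for production usage (later wrapped for decoded predictions)
with open("model.pkl", "wb") as f:
    pickle.dump(pipeline, f)
```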
A lightweight wrapper to ensure the model returns decoded labels (instead of numeric classes) and prediction probabilities.
- Takes the trained pipeline and fitted `LabelEncoder`
- Provides:
  - `predict(order)` → returns decoded label (string)
  - `probability(order)` → returns confidence in %
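A minimal sketch of how such a wrapper can be built (the real `DecodedPipelineWrapper` may differ in details):

```python
import numpy as np


class DecodedPipelineWrapper:
    """Wraps a fitted pipeline + LabelEncoder to return readable labels and confidences."""

    def __init__(self, pipeline, label_encoder):
        self.pipeline = pipeline
        self.label_encoder = label_encoder

    def predict(self, order: str) -> str:
        encoded = self.pipeline.predict([order])[0]
        return self.label_encoder.inverse_transform([encoded])[0]

    def probability(self, order: str) -> float:
        probs = self.pipeline.predict_proba([order])[0]
        return round(float(np.max(probs)) * 100, 2)
```

The usage example below then returns the decoded label plus a confidence percentage.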
```python
decoded_model = DecodedPipelineWrapper(pipeline, label_encoder)

command = "Play music at 6"
prediction = decoded_model.predict(command)
confidence = decoded_model.probability(command)
print(prediction, confidence)
# Output: "play music scheduled", 97.25%
```

This script handles the entire model lifecycle using MLflow:
- Experiment tracking
- Parameter and metric logging
- Model registration
- Versioning and stage transition
- Initializes MLflow experiment
- Defines model parameters
- Trains the pipeline via `ModelLR`
- Logs all parameters, metrics, and artifacts
- Registers and promotes model to Production
- Uses `infer_signature()` for schema consistency
- Automatically creates new model versions
- Promotes model from Staging → Production
- Assigns aliases for each stage
📦 REGISTERING MODEL IN MLFLOW MODEL REGISTRY
✅ Model version successfully created.
🚀 Model successfully moved to 'Production' stage.
🎯 FULL PROCESS COMPLETED SUCCESSFULLY
| Type | Description |
|---|---|
| Parameters | Logistic Regression hyperparameters |
| Metrics | Accuracy, F1-score, training time |
| Artifacts | Dataset, pickled model |
| Tags | Version, model type, author info |
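A minimal sketch of the MLflow calls involved (the registered model name and tags come from this README; the hyperparameter values, experiment name, and surrounding code are illustrative, not the exact contents of `main.py`):

```python
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from mlflow.tracking import MlflowClient

mlflow.set_experiment("command-classifier")  # illustrative experiment name

with mlflow.start_run() as run:
    # Parameters, metrics, and tags (values shown are placeholders)
    mlflow.log_params({"C": 1.0, "penalty": "l2", "solver": "lbfgs"})
    mlflow.log_metrics({"accuracy": 0.9423, "f1_weighted": 0.9371})
    mlflow.set_tags({"model_type": "LogisticRegression", "author": "Camilo Ramos Cotes"})

    # Artifacts: dataset and the trained pipeline from the training step above
    mlflow.log_artifact("commands_dataset.csv")
    signature = infer_signature(["turn off the computer"], ["turn off computer immediate"])
    mlflow.sklearn.log_model(
        pipeline,  # trained pipeline from Model.py / the training sketch
        artifact_path="model",
        signature=signature,
        registered_model_name="Nova_classifier_model",
    )

# Promote the newest version: None → Staging → Production
client = MlflowClient()
latest = client.get_latest_versions("Nova_classifier_model", stages=["None"])[0]
client.transition_model_version_stage("Nova_classifier_model", latest.version, stage="Staging")
client.transition_model_version_stage("Nova_classifier_model", latest.version, stage="Production")
# Optional alias per stage (MLflow ≥ 2.3)
client.set_registered_model_alias("Nova_classifier_model", "production", latest.version)
```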
Nova acts as a lightweight production client to load and use the trained model.
- Loads the `.pkl` model generated by `ModelLR`
- Handles missing/corrupted models gracefully
- Predicts commands using the decoded pipeline
- Returns both prediction and confidence score
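A minimal sketch of what this client can look like, assuming the pickled object exposes the `predict`/`probability` interface of `DecodedPipelineWrapper` (the actual `Nova.py` may handle errors and timing differently):

```python
import pickle
import time


class Nova:
    """Lightweight production client around the pickled classifier."""

    def __init__(self, model_path: str = "model.pkl"):
        try:
            with open(model_path, "rb") as f:
                self.model = pickle.load(f)
        except (FileNotFoundError, pickle.UnpicklingError) as exc:
            raise RuntimeError(f"Could not load model from {model_path}") from exc

    def predict(self, order: str):
        start = time.time()
        prediction = self.model.predict(order)
        confidence = self.model.probability(order)
        return prediction, confidence, time.time() - start
```

The README's usage example below follows this interface.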
```python
from Nova import Nova

model = Nova("model.pkl")
prediction, confidence, time = model.predict("open chrome browser")
print(prediction, confidence)
```

🧠 Predicted Command: open chrome immediate
📈 Confidence: 98.6%
⏱️ Inference Time: 0.128s
Provides a terminal interface to interact with the deployed model.
- Loads the trained model automatically
- Takes user input command
- Displays prediction, probability, and latency
- Type `exit` to quit the program
🌌 WELCOME TO NOVA'S COMMAND CLASSIFIER 🌌
👉 Enter a command: Turn off the PC
🧠 Detected Command : turn off computer immediate
📈 Confidence Level : 96.73%
⏱️ Processing Time : 0.1423 seconds
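A minimal sketch of such a CLI loop (assuming the `Nova` interface shown above; the actual `test_model.py` may format its output differently):

```python
from Nova import Nova


def main():
    model = Nova("model.pkl")
    print("🌌 WELCOME TO NOVA'S COMMAND CLASSIFIER 🌌")
    while True:
        user_input = input("👉 Enter a command: ").strip()
        if user_input.lower() == "exit":
            break
        prediction, confidence, elapsed = model.predict(user_input)
        print(f"🧠 Detected Command : {prediction}")
        print(f"📈 Confidence Level : {confidence:.2f}%")
        print(f"⏱️ Processing Time  : {elapsed:.4f} seconds")


if __name__ == "__main__":
    main()
```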
| Stage | Description | Script |
|---|---|---|
| 1. Data Preprocessing | Clean, encode, and validate dataset | Data.py |
| 2. Embedding Generation | Convert text to vector embeddings | Transformer.py |
| 3. Model Training | Train LR classifier using embeddings | Model.py |
| 4. Experiment Tracking | Log metrics, params, and artifacts | main.py |
| 5. Model Registration | Register, version, and promote model | main.py |
| 6. Model Serving | Load and serve `.pkl` model | Nova.py |
| 7. CLI Prediction | Real-time command testing | test_model.py |
| Model Name | Version | Stage | Accuracy | F1-score |
|---|---|---|---|---|
| Nova_classifier_model | 3 | Production | 0.945 | 0.940 |
- Python ≥ 3.9
- `mlflow`
- `scikit-learn`
- `pandas`
- `numpy`
- `sentence-transformers`
- `torch`

Install requirements:

```bash
pip install -r requirements.txt
```

Camilo Ramos Cotes
Software Engineer | Machine Learning Enthusiast
📧 camutoxlive20@gmail.com
🔗 https://github.com/Camiloramos2000
After running this full pipeline, you will obtain:
✅ A Logistic Regression command classifier
🧩 Tracked, registered, and versioned in MLflow
🧠 Encoded with SentenceTransformer embeddings
🎯 Wrapped for decoded, human-readable predictions
🚀 Served through a CLI application (“Nova”)
“From dataset to production — every stage tracked and versioned with MLflow.” ✨





