An interactive Python ETL pipeline for cleaning messy sales data, saving to a database, generating summary reports, and visualizing sales trends.
β ETL Process
- Load multi-CSV (monthly sales data)
- Clean data: handle NaN, convert messy
Salescolumn to numeric, remove outliers - Add
Monthcolumn automatically based on file name
β Database Integration
- Save cleaned data to SQLite database
- Auto-create index on
Productcolumn for faster queries - Run SQL queries for total sales per product & month
β Reports & Visualization
- Export sales summary CSV from DB query
- Generate bar chart of total sales per month (
output/sales_per_month.png) - Full logging to
output/etl.logfor traceability
This project helps small businesses, analysts, and data teams:
- Automate monthly sales data consolidation
- Build a clean, queryable sales database
- Generate reports & visual insights for better decisions
output/sales_data.dbβ SQLite database fileoutput/monthly_sales_summary.csvβ Sales summary (DB query)output/sales_per_month.pngβ Bar chart of sales by monthoutput/etl.logβ Detailed ETL process log
1οΈβ£ (Recommended) Set up virtual environment
python -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows2οΈβ£ Install dependencies
pip install -r requirements.txt3οΈβ£ Run ETL pipeline
python etl/main.py- π Python 3.x
- π¦ Pandas
- β SQLAlchemy
- π Matplotlib
Built with clean, modular code β ready for production or extension into dashboards.
β
You can easily integrate this pipeline into Streamlit, BI tools, or cloud databases.
Khairu Ikramendra
πΌ Freelance Dashboard & Data Analytics Developer
π LinkedIn
π Upwork
π¬ Need help customizing this ETL for your business? Feel free to reach out!