A machine learning project using Linear Regression to predict health insurance expenses based on personal and lifestyle data. Built using TensorFlow 2.x and trained on real-world data from insurance.csv.
The dataset contains the following features:
- `age`: Age of primary beneficiary
- `sex`: Gender (`male`, `female`)
- `bmi`: Body mass index
- `children`: Number of dependents
- `smoker`: Whether the person smokes (`yes`, `no`)
- `region`: Residential area in the US (`northeast`, `northwest`, etc.)
- `expenses`: Medical costs billed by health insurance
- One-hot encoding applied to `sex`, `smoker`, and `region` (with `drop_first=True` to avoid the dummy variable trap)
- `expenses` column popped as the target variable
- Train-test split: 80% training / 20% testing
- `StandardScaler` used to normalize feature columns
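The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the rows are made up so the snippet runs without `insurance.csv`, and `random_state=42` and `dtype=float` are assumptions added for reproducibility and clean numeric output.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up rows with the same columns as insurance.csv (illustration only);
# the notebook would load the real file instead, e.g. with pd.read_csv.
df = pd.DataFrame({
    "age": [21, 35, 48, 29, 52, 40, 63, 26, 33, 57],
    "sex": ["female", "male", "male", "female", "male",
            "female", "female", "male", "male", "female"],
    "bmi": [24.1, 30.5, 27.8, 22.3, 33.0, 26.4, 29.9, 31.2, 25.0, 28.6],
    "children": [0, 2, 1, 0, 3, 2, 0, 1, 0, 2],
    "smoker": ["no", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"],
    "region": ["northeast", "northwest", "southeast", "southwest", "northeast",
               "northwest", "southeast", "southwest", "northeast", "southeast"],
    "expenses": [2500.0, 21000.0, 9000.0, 3200.0, 38000.0,
                 7400.0, 14000.0, 4100.0, 19500.0, 11800.0],
})

# One-hot encode the categorical columns, dropping the first level of each.
# dtype=float (an assumption) keeps every feature column numeric.
df = pd.get_dummies(df, columns=["sex", "smoker", "region"],
                    drop_first=True, dtype=float)

# Separate the target from the features.
labels = df.pop("expenses")

# 80/20 split; random_state is an arbitrary choice for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    df, labels, test_size=0.2, random_state=42
)

# Fit the scaler on the training features only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the training split alone keeps test-set statistics from leaking into training.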
Built using TensorFlow Keras Sequential API:
- `Dense(256)` → ReLU
- `Dropout(0.1)`
- `Dense(128)` → ReLU
- `Dropout(0.1)`
- `Dense(64)` → ReLU
- `Dense(1)` → Output layer (regression)
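As a rough sketch, the layer stack described above might be written like this. The input width `n_features = 8` is an assumption: three numeric columns plus five one-hot columns, which holds only if `region` has four levels.

```python
import tensorflow as tf

# Assumed feature count after one-hot encoding (see lead-in); adjust to
# match the actual number of columns in the scaled training data.
n_features = 8

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # single linear unit for the regression output
])

model.summary()
```

The final layer has no activation, so the network can output any real-valued expense estimate.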
Compiled with:
- Loss: Mean Squared Error (MSE)
- Optimizer: Adam
- Metrics: Mean Absolute Error (MAE)
EarlyStopping used to prevent overfitting.
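A minimal sketch of the compile and early-stopping setup, using synthetic data and a small stand-in network so it runs quickly. The `monitor`, `patience`, `restore_best_weights`, `validation_split`, and `epochs` values are assumptions for illustration; the notebook's actual settings are not stated here.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (random features, noisy linear target);
# this is not the insurance dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)).astype("float32")
y = (X @ rng.normal(size=(8,)) + rng.normal(scale=0.1, size=200)).astype("float32")

# Small stand-in network so the example trains in a few seconds.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# MSE loss, Adam optimizer, and MAE as the reported metric.
model.compile(loss="mse", optimizer="adam", metrics=["mae"])

# Stop once validation loss has not improved for `patience` epochs,
# then roll back to the best weights seen (assumed settings).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)

history = model.fit(
    X, y,
    validation_split=0.2,
    epochs=50,
    callbacks=[early_stop],
    verbose=0,
)
print("epochs run:", len(history.history["loss"]))
```

Monitoring validation loss rather than training loss is what lets the callback detect overfitting: training loss keeps falling even after the model stops generalizing.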
- Evaluated on unseen test set
- Achieved MAE < 3500, passing the freeCodeCamp challenge ✅
Example output:
- Load the notebook in Google Colab
- Run all cells (training will auto-start)
- Final cell evaluates the model and displays predictions vs true values on a scatter plot
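The evaluation step in the final cell could look something like the sketch below. The helper names `mean_absolute_error` and `plot_predictions` are hypothetical and the numbers are made up; in the notebook the predictions would presumably come from `model.predict` on the scaled test features.

```python
import numpy as np
import matplotlib.pyplot as plt


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted expenses."""
    y_true = np.asarray(y_true, dtype=float)
    # Flatten (n, 1) prediction arrays so they line up with the labels.
    y_pred = np.asarray(y_pred, dtype=float).reshape(y_true.shape)
    return float(np.mean(np.abs(y_true - y_pred)))


def plot_predictions(y_true, y_pred):
    """Scatter true vs. predicted values with a y = x reference line."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, alpha=0.6)
    lims = [min(np.min(y_true), np.min(y_pred)),
            max(np.max(y_true), np.max(y_pred))]
    ax.plot(lims, lims, color="gray", linestyle="--")
    ax.set_xlabel("True expenses")
    ax.set_ylabel("Predicted expenses")
    return fig


# Made-up values for illustration only.
y_true = np.array([4000.0, 12000.0, 30000.0, 8000.0])
y_pred = np.array([[4500.0], [11000.0], [27000.0], [9000.0]])  # (n, 1), like Keras output

mae = mean_absolute_error(y_true, y_pred)
print(f"MAE: {mae:.2f}")
print("Passes the 3500 threshold:", mae < 3500)

fig = plot_predictions(y_true, y_pred.ravel())
```

Points close to the dashed diagonal are predictions that match the true expense; in Colab the figure renders inline once the cell finishes.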
Train a regression model that can predict healthcare costs within a $3500 error margin on new, unseen data. Mission accomplished.
