A comprehensive end-to-end implementation of a Bayesian linear regression model to predict individual salaries using conjugate Gaussian–Inverse-Gamma priors, optimized Gibbs sampling, and full posterior diagnostics.
This project provides a self-contained Jupyter notebook that:
- Loads and preprocesses salary data with predictors: cost, LSAT, GPA, age, library volume, log(cost), and institutional rank
- Specifies weakly-informative priors for regression coefficients and error variance
- Implements an optimized Gibbs sampler (Cholesky sampling, pre-inversion of constant matrices)
- Runs multiple chains and assesses convergence via trace plots, ACF, Effective Sample Size (ESS), and Gelman–Rubin $\hat R$
- Summarizes posterior distributions (means, medians, SDs, 95% HDIs, sign probabilities)
- Explores joint parameter dependencies (pairplots, correlation heatmaps)
- Performs Posterior Predictive Checks (histogram, KDE, HDI shading)
- Demonstrates model refinements (Student-$t$ likelihood, log-transform, mixture models, heteroskedasticity)
## Key Features

- Conjugate Bayesian setup: closed-form updates for $\beta$ and $\sigma^2$
- Optimized Gibbs sampler: Cholesky draws and precomputed inverses for speed (see the sketch after this list)
- Robust diagnostics: trace plots, ACF, ESS, $\hat R$, and ArviZ integration
- Rich posterior summaries: HDIs, posterior $P(\beta > 0)$, forest plots
- Flexible PPCs: histograms, KDE, rug plots, HDI shading
- Extension recipes: code snippets for Student-$t$ errors, mixtures, transformations
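The conjugate Gaussian–Inverse-Gamma prior gives closed-form full conditionals, so each Gibbs iteration is a direct draw rather than a Metropolis step. The sketch below illustrates the idea; the function and hyperparameter names (`gibbs_blr`, `m0`, `V0_inv`, `a0`, `b0`) and the prior values are illustrative assumptions, not the notebook's exact code.

```python
import numpy as np
from scipy.linalg import cho_solve, solve_triangular

def gibbs_blr(y, X, n_iter=10_000, seed=0):
    """Gibbs sampler for y = X @ beta + eps, eps ~ N(0, sigma2 * I).

    Illustrative weakly-informative priors:
        beta   ~ N(m0, V0)          with m0 = 0, V0 = 100 * I
        sigma2 ~ Inv-Gamma(a0, b0)  with a0 = b0 = 0.01
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Prior hyperparameters (assumed values)
    m0 = np.zeros(p)
    V0_inv = np.eye(p) / 100.0
    a0 = b0 = 0.01

    # Constant matrices: compute once, reuse every iteration
    XtX = X.T @ X
    Xty = X.T @ y

    beta, sigma2 = np.zeros(p), 1.0
    draws_beta = np.empty((n_iter, p))
    draws_sigma2 = np.empty(n_iter)

    for t in range(n_iter):
        # beta | sigma2, y ~ N(mn, Pn^-1) with precision Pn = V0^-1 + XtX / sigma2
        Pn = V0_inv + XtX / sigma2
        L = np.linalg.cholesky(Pn)                    # Pn = L @ L.T
        mn = cho_solve((L, True), V0_inv @ m0 + Xty / sigma2)
        z = rng.standard_normal(p)
        beta = mn + solve_triangular(L.T, z, lower=False)  # Cholesky draw

        # sigma2 | beta, y ~ Inv-Gamma(a0 + n/2, b0 + 0.5 * ||y - X beta||^2)
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * resid @ resid))

        draws_beta[t], draws_sigma2[t] = beta, sigma2

    return draws_beta, draws_sigma2
```

Drawing $\beta$ through the Cholesky factor of the posterior precision avoids forming and inverting the full covariance matrix at every iteration, which is both faster and more numerically stable.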
## Dependencies
- Python 3.8+
- NumPy, SciPy, Pandas
- Matplotlib, Seaborn
- Statsmodels (for ACF)
- ArviZ (for ESS and $\hat R$)
## Environment Setup
```bash
pip install numpy scipy pandas matplotlib seaborn statsmodels arviz
```
## Open & Execute
- Navigate to Linear_Regression.ipynb
- Run all cells in sequential order to reproduce data loading, model specification, Gibbs sampling, diagnostics, posterior summaries, PPCs, and extensions.
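Alternatively, the notebook can be run non-interactively with `jupyter nbconvert --to notebook --execute Linear_Regression.ipynb` (assuming Jupyter is installed in the same environment).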
## Predictor Effects

- Strong positive: Cost, GPA, Library volume (95% HDI excludes zero, $P(\beta > 0) > 0.99$)
- Strong negative: Log(cost), Institutional rank (95% HDI excludes zero, $P(\beta > 0) < 0.01$)
- Ambiguous: LSAT, Age (HDIs straddle zero, moderate $P(\beta > 0)$); see the sketch below for how these summaries are computed
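These summaries are read directly off the posterior draws. A minimal sketch, assuming `draws_beta` is the coefficient array returned by the sampler sketch above and that its columns follow the predictor order listed earlier (names and ordering are illustrative):

```python
import arviz as az
import numpy as np

# draws_beta: (n_draws, p) array of posterior coefficient draws (hypothetical name)
names = ["cost", "LSAT", "GPA", "age", "library_volume", "log_cost", "rank"]

for j, name in enumerate(names):
    draws = draws_beta[:, j]
    lo, hi = az.hdi(draws, hdi_prob=0.95)   # 95% highest-density interval
    p_pos = np.mean(draws > 0)              # posterior P(beta_j > 0)
    print(f"{name:15s} 95% HDI = [{lo:8.3f}, {hi:8.3f}]   P(beta>0) = {p_pos:.3f}")
```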
## Sampling Diagnostics

- High Effective Sample Size (ESS > 70,000; see the ArviZ sketch below)
- Gelman–Rubin $\hat R = 1.00$
- ACF near zero beyond lag 0 → almost independent draws
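These diagnostics can be reproduced with ArviZ. A minimal sketch, assuming `chains` stacks the coefficient draws from several independent Gibbs runs into an array of shape `(n_chains, n_draws, n_coefficients)` (the name and stacking are illustrative):

```python
import arviz as az

# chains: (n_chains, n_draws, n_coefficients) array of beta draws (hypothetical)
idata = az.convert_to_inference_data({"beta": chains})

print(az.ess(idata))      # effective sample size per coefficient
print(az.rhat(idata))     # Gelman–Rubin split R-hat per coefficient
az.plot_trace(idata)      # trace plots and marginal densities
az.plot_autocorr(idata)   # autocorrelation functions per chain
```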
## Predictive Fit
- Posterior predictive checks (sketched below) reveal that the Normal-error model underestimates the multimodality and heavy tails of the observed salary distribution.
- Model refinements (Student-t errors, mixture components, transformations) are recommended to capture sharp peaks and extreme values.
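A posterior predictive check simulates replicated salary data from each retained posterior draw and compares it with the observed distribution. A minimal sketch, assuming `draws_beta` and `draws_sigma2` come from the sampler sketch above and `X`, `y` are the observed design matrix and salaries:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
idx = rng.choice(len(draws_sigma2), size=200, replace=False)  # subsample draws

# One replicated data set per retained posterior draw
y_rep = np.array([
    X @ draws_beta[i] + rng.normal(0.0, np.sqrt(draws_sigma2[i]), size=len(y))
    for i in idx
])

plt.hist(y, bins=30, density=True, alpha=0.6, label="observed")
for rep in y_rep[:50]:
    plt.hist(rep, bins=30, density=True, histtype="step", color="grey", alpha=0.15)
plt.xlabel("salary")
plt.legend()
plt.show()
```

If the replicated histograms consistently miss modes or tails of the observed distribution, that is the signal motivating the refinements above (e.g. a Student-$t$ likelihood or a mixture model).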
Contributions and suggestions are welcome! Please:
- Open an issue to propose enhancements or report bugs
- Submit pull requests with clear descriptions of changes
- Include unit tests for any new sampler or diagnostic functions
Released under the MIT License