A Predictive Analytics Project
Author: A K M Intisar Islam
Course: MATH 5383 – Predictive Analytics
Language: R
This project explores linear classification using logistic regression, implemented through the Iteratively Reweighted Least Squares (IRLS) algorithm. It systematically examines how logistic regression behaves under different conditions—such as linear separability, regularization, outliers, dataset size, and class imbalance.
The project demonstrates how L2 regularization (ridge penalty) enhances the stability, convergence, and generalization of logistic regression, especially when data contain noise or outliers.
Key objectives:
- Implement the IRLS algorithm for logistic regression using Newton–Raphson optimization.
- Compare unregularized and L2-regularized logistic regression.
- Study the effects of dataset size, balance, and outliers.
- Visualize decision boundaries and log-likelihood surfaces.
- Evaluate performance using an 80–20 train–test split.
Synthetic datasets were generated with two Gaussian clusters (a small generation sketch in R follows the experiment list below):
| Parameter | Value |
|---|---|
| Observations | n = 50 (later 500) |
| Predictors | m = 2 (later 4) |
| Class ratio | 50/50 (later 40/60) |
| Standard deviation | 1 |
| Cluster means | (3, 3) and (7, 7) |
Later experiments introduced:
- Larger datasets (n = 500)
- Imbalanced class distributions (40%/60%)
- Artificial outliers to test robustness
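As a rough illustration, a two-cluster generator matching the setup above might look as follows in R. The function name `make_clusters`, its defaults, and the seed are assumptions for this sketch, not the project's actual code.

```r
# Sketch: two Gaussian clusters with means (3, 3) and (7, 7), sd = 1.
# prop controls the class balance (0.5 = balanced, 0.4 = the 40/60 split).
make_clusters <- function(n = 50, means = list(c(3, 3), c(7, 7)),
                          sd = 1, prop = 0.5, seed = 1) {
  set.seed(seed)
  n0 <- round(n * prop)                          # class 0 size
  n1 <- n - n0                                   # class 1 size
  data.frame(
    x1 = c(rnorm(n0, means[[1]][1], sd), rnorm(n1, means[[2]][1], sd)),
    x2 = c(rnorm(n0, means[[1]][2], sd), rnorm(n1, means[[2]][2], sd)),
    y  = c(rep(0, n0), rep(1, n1))
  )
}

dat       <- make_clusters()                     # n = 50, balanced
dat_large <- make_clusters(n = 500, prop = 0.4)  # n = 500, 40/60 imbalance
```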
The IRLS algorithm iteratively updates coefficients using:
$$
\beta^{(k+1)} = \beta^{(k)} - H_k^{-1} g_k
$$

where, with IRLS weight matrix $W = \mathrm{diag}\big(\hat{p}_i (1 - \hat{p}_i)\big)$,

$$
H_k = -X^T W X - \lambda I, \qquad g_k = X^T (y - \hat{p}) - \lambda \beta^{(k)}
$$

(the $\lambda$ terms vanish in the unregularized case, $\lambda = 0$). Iteration stops when

$$
\| \beta^{(k+1)} - \beta^{(k)} \|_2 < \epsilon .
$$
Default parameters (also used in the sketch below):
- Initial coefficients: β^(0) = 0
- Convergence tolerance: ε = 1e−6
- Maximum iterations: 100
- Penalty: λ = 0 (unregularized) or λ = 0.5 (regularized)
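A minimal R sketch of this update rule with the defaults above; the function `irls_logistic` and its interface are illustrative assumptions, not the project's actual implementation.

```r
# Sketch: IRLS / Newton-Raphson for logistic regression with an optional
# ridge (L2) penalty; lambda = 0 reproduces the unregularized fit.
# Note: the intercept is penalized too, mirroring H_k = -X'WX - lambda*I above.
irls_logistic <- function(X, y, lambda = 0, eps = 1e-6, max_iter = 100) {
  X <- cbind(1, as.matrix(X))                   # prepend intercept column
  beta <- rep(0, ncol(X))                       # beta^(0) = 0
  for (k in seq_len(max_iter)) {
    p <- as.vector(1 / (1 + exp(-X %*% beta)))  # fitted probabilities p-hat
    W <- diag(p * (1 - p))                      # IRLS weight matrix
    g <- t(X) %*% (y - p) - lambda * beta       # penalized gradient g_k
    H <- -t(X) %*% W %*% X - lambda * diag(ncol(X))  # penalized Hessian H_k
    beta_new <- as.vector(beta - solve(H, g))   # Newton step
    converged <- sqrt(sum((beta_new - beta)^2)) < eps  # ||change||_2 < eps
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = beta, iterations = k)
}
```

On the small balanced dataset from the earlier sketch, `irls_logistic(dat[, c("x1", "x2")], dat$y, lambda = 0.5)` would return the ridge coefficients together with the iteration count.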
Summary of experimental results:

| Scenario | Regularization | Iterations | Accuracy | Observation |
|---|---|---|---|---|
| Small, balanced | None | 35 | 100% | Perfect separation |
| Small, balanced | L2 | 9 | 100% | Faster, stable |
| Large, balanced | None | 100 | 99% | Stable with more data |
| Imbalanced (40/60) | L2 | 87 | 100% | Robust to imbalance |
| With outliers | L2 | 10 | 82% | Regularization improves robustness |
Regularization reduced coefficient magnitudes and prevented divergence under near-separable or noisy data.
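Assuming the hypothetical helpers above, the effect can be seen by comparing coefficient norms on the well-separated clusters. The iteration cap for the unregularized fit is an assumption made to keep the diverging fit numerically safe; it is not part of the project's setup.

```r
dat_sep <- make_clusters(n = 50)   # clusters at (3,3)/(7,7) are (near-)separable

# Unregularized: coefficients keep growing on separable data (no finite MLE),
# so cap the iterations; raising max_iter inflates the norm further.
fit_unreg <- irls_logistic(dat_sep[, c("x1", "x2")], dat_sep$y,
                           lambda = 0, max_iter = 15)
# Ridge-penalized: converges to finite, stable coefficients.
fit_ridge <- irls_logistic(dat_sep[, c("x1", "x2")], dat_sep$y, lambda = 0.5)

sqrt(sum(fit_unreg$coefficients^2))   # large L2 norm, still growing
sqrt(sum(fit_ridge$coefficients^2))   # much smaller, stable norm
```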
Key visualizations:
- Cluster plots showing decision boundaries (black = unregularized, green = regularized); a plotting sketch follows this list.
- Coefficient trajectories visualized on log-likelihood contours.
- Outlier impact plots showing shifts in decision boundaries.
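A ggplot2 sketch of the first plot type, reusing the hypothetical `dat_sep`, `fit_unreg`, and `fit_ridge` objects from the previous snippet; only the black/green colour convention comes from the project, the rest is assumed.

```r
library(ggplot2)

# For a 2-predictor logistic fit, the decision boundary is
# b0 + b1*x1 + b2*x2 = 0, i.e. x2 = -(b0 + b1*x1) / b2.
boundary <- function(fit) {
  b <- unname(fit$coefficients)
  list(intercept = -b[1] / b[3], slope = -b[2] / b[3])
}
b_u <- boundary(fit_unreg)
b_r <- boundary(fit_ridge)

ggplot(dat_sep, aes(x1, x2, colour = factor(y))) +
  geom_point() +
  geom_abline(intercept = b_u$intercept, slope = b_u$slope, colour = "black") +
  geom_abline(intercept = b_r$intercept, slope = b_r$slope, colour = "green") +
  labs(colour = "class",
       title = "Decision boundaries: unregularized (black) vs ridge (green)")
```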
Performance was evaluated using an 80/20 train–test split (a split-and-score sketch follows this list):
- Clean datasets → 99–100% accuracy
- Datasets with outliers → ≈82% accuracy
- Regularization stabilized solutions without sacrificing predictive power
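A sketch of that evaluation, again assuming the hypothetical `make_clusters` and `irls_logistic` helpers from the earlier snippets.

```r
dat_big <- make_clusters(n = 500, prop = 0.4)          # large, 40/60 imbalanced

set.seed(123)                                          # reproducible split
idx   <- sample(nrow(dat_big), size = floor(0.8 * nrow(dat_big)))  # 80% train
train <- dat_big[idx, ]
test  <- dat_big[-idx, ]

fit <- irls_logistic(train[, c("x1", "x2")], train$y, lambda = 0.5)
b   <- fit$coefficients

eta  <- b[1] + as.matrix(test[, c("x1", "x2")]) %*% b[-1]  # linear predictor
pred <- as.integer(1 / (1 + exp(-eta)) > 0.5)              # classify at 0.5
mean(pred == test$y)                                       # test-set accuracy
```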
Key takeaways:
- Unregularized logistic regression can achieve perfect accuracy but becomes numerically unstable when data are separable or noisy.
- L2 regularization provides finite, stable, and interpretable coefficients.
- Regularization is essential in the presence of outliers, small datasets, or high-dimensional features.
Tools:
- R (v4.x)
- Packages: ggplot2, MASS, and base R's glm()
- Environment: RStudio / Google Colab
References:
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer.
- McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models. Chapman & Hall.