@codeflash-ai codeflash-ai bot commented Nov 24, 2025

📄 25,816% (258.16x) speedup for gradient_descent in src/numerical/optimization.py

⏱️ Runtime : 12.0 seconds → 46.3 milliseconds (best of 95 runs)

📝 Explanation and details

The optimization dramatically improves performance by replacing nested loops with vectorized NumPy operations, achieving a 25,816% speedup (from 12.0 seconds to 46.3 milliseconds).

Key optimizations applied (a sketch of the resulting vectorized implementation follows this list):

  1. Vectorized predictions: Replaced the double nested loop for computing predictions with X.dot(weights), leveraging NumPy's optimized BLAS routines instead of Python loops.

  2. Vectorized gradient calculation: Eliminated another double nested loop by using X.T.dot(errors) / m, which computes the entire gradient vector in one operation.

  3. In-place weight updates: Used vectorized subtraction weights -= learning_rate * gradient instead of element-wise loops.
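
Putting these three changes together, the optimized function presumably looks something like the sketch below (assuming zero-initialized weights and no bias term, which matches the generated tests further down):

```python
import numpy as np


def gradient_descent(X, y, learning_rate=0.01, iterations=1000):
    # Sketch of the vectorized version described above; assumes X is (m, n),
    # y is (m,), and weights start at zero with no bias term.
    m, n = X.shape
    weights = np.zeros(n)
    for _ in range(iterations):
        predictions = X.dot(weights)          # one matrix-vector product instead of a double loop
        errors = predictions - y
        gradient = X.T.dot(errors) / m        # whole gradient vector in a single operation
        weights -= learning_rate * gradient   # in-place vectorized update
    return weights
```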

Why this is faster:

  • NumPy operations execute in optimized C code rather than interpreted Python loops
  • BLAS libraries provide highly optimized matrix operations that utilize CPU cache efficiently
  • Eliminates the overhead of millions of Python loop iterations (the profiler shows ~31M loop iterations in the original code)

Performance characteristics from tests:

  • Excellent for large-scale problems (1000+ samples, 50+ features) where the vectorization advantage is most pronounced
  • Maintains identical numerical behavior across all test cases (basic linear relationships, edge cases, large datasets)
  • Particularly beneficial for typical machine learning workloads with moderate to high iteration counts (500-1000 iterations)

The optimization transforms an O(iterations × m × n) nested loop implementation into efficient matrix operations, making it suitable for production machine learning pipelines where gradient descent is often called repeatedly.
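
For contrast, the pre-optimization implementation being replaced presumably resembled the loop-based sketch below (a reconstruction from the description above, not the repository's exact code; the `_loops` suffix is only there to keep the two versions apart):

```python
import numpy as np


def gradient_descent_loops(X, y, learning_rate=0.01, iterations=1000):
    # Hypothetical reconstruction of the original: every matrix product is spelled
    # out as Python-level loops, i.e. O(iterations * m * n) interpreter work.
    m, n = X.shape
    weights = np.zeros(n)
    for _ in range(iterations):
        predictions = np.zeros(m)
        for i in range(m):                    # double loop for predictions
            for j in range(n):
                predictions[i] += X[i, j] * weights[j]
        errors = predictions - y
        gradient = np.zeros(n)
        for j in range(n):                    # double loop for the gradient
            for i in range(m):
                gradient[j] += X[i, j] * errors[i]
            gradient[j] /= m
        for j in range(n):                    # element-wise weight update
            weights[j] -= learning_rate * gradient[j]
    return weights
```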

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 35 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import numpy as np  # used for matrix operations

# imports
import pytest  # used for our unit tests
from src.numerical.optimization import gradient_descent

# unit tests

# 1. Basic Test Cases


def test_basic_single_feature_perfect_fit():
    # Single feature, perfect fit: y = 2*x
    X = np.array([[1], [2], [3], [4]], dtype=float)
    y = np.array([2, 4, 6, 8], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=500)
    weights = codeflash_output


def test_basic_two_features_perfect_fit():
    # Two features, perfect fit: y = 1*x1 + 3*x2
    X = np.array([[1, 2], [2, 1], [3, 0], [0, 3]], dtype=float)
    y = np.array(
        [1 * 1 + 3 * 2, 1 * 2 + 3 * 1, 1 * 3 + 3 * 0, 1 * 0 + 3 * 3], dtype=float
    )
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=1000)
    weights = codeflash_output


def test_basic_nonzero_initial_weights():
    # The function always starts at zero weights, but test with nonzero y
    X = np.array([[1], [2]], dtype=float)
    y = np.array([2, 4], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=300)
    weights = codeflash_output


def test_basic_learning_rate_effect():
    # Test that smaller learning rate converges slower
    X = np.array([[1], [2]], dtype=float)
    y = np.array([2, 4], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=50)
    weights_fast = codeflash_output
    codeflash_output = gradient_descent(X, y, learning_rate=0.001, iterations=50)
    weights_slow = codeflash_output


def test_basic_iterations_effect():
    # More iterations should improve convergence
    X = np.array([[1], [2]], dtype=float)
    y = np.array([2, 4], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=10)
    weights_few = codeflash_output
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=500)
    weights_many = codeflash_output


# 2. Edge Test Cases


def test_edge_zero_iterations():
    # Zero iterations should return initial weights (all zeros)
    X = np.array([[1, 2], [3, 4]], dtype=float)
    y = np.array([5, 11], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=0)
    weights = codeflash_output


def test_edge_zero_learning_rate():
    # Zero learning rate should result in no change from initial weights
    X = np.array([[1, 2], [3, 4]], dtype=float)
    y = np.array([5, 11], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.0, iterations=100)
    weights = codeflash_output


def test_edge_nan_in_X():
    # NaN in X should propagate to weights
    X = np.array([[np.nan, 2], [3, 4]], dtype=float)
    y = np.array([5, 11], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=10)
    weights = codeflash_output


def test_edge_inf_in_X():
    # Inf in X should propagate to weights
    X = np.array([[np.inf, 2], [3, 4]], dtype=float)
    y = np.array([5, 11], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=10)
    weights = codeflash_output


def test_edge_negative_learning_rate():
    # Negative learning rate should cause weights to diverge from correct solution
    X = np.array([[1], [2]], dtype=float)
    y = np.array([2, 4], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=-0.01, iterations=100)
    weights = codeflash_output


def test_edge_zero_features():
    # X with zero features (n=0)
    X = np.empty((5, 0))
    y = np.array([1, 2, 3, 4, 5], dtype=float)
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=10)
    weights = codeflash_output


def test_edge_all_zeros_X_y():
    # All zeros in X and y should result in zero weights
    X = np.zeros((4, 3))
    y = np.zeros(4)
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=100)
    weights = codeflash_output


# 3. Large Scale Test Cases


def test_large_scale_many_samples():
    # Test with 1000 samples, 3 features, y = 2*x1 + 3*x2 + 4*x3
    np.random.seed(42)
    X = np.random.rand(1000, 3)
    true_weights = np.array([2, 3, 4])
    y = X @ true_weights
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=500)
    weights = codeflash_output
    # Should be close to true_weights
    for i in range(3):
        pass


def test_large_scale_many_features():
    # Test with 50 features, 200 samples, random weights
    np.random.seed(123)
    X = np.random.rand(200, 50)
    true_weights = np.arange(1, 51)
    y = X @ true_weights
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=700)
    weights = codeflash_output


def test_large_scale_noisy_data():
    # Test with noise in y
    np.random.seed(321)
    X = np.random.rand(500, 5)
    true_weights = np.array([1, -2, 3, -4, 5])
    noise = np.random.normal(0, 0.1, size=500)
    y = X @ true_weights + noise
    codeflash_output = gradient_descent(X, y, learning_rate=0.02, iterations=800)
    weights = codeflash_output


def test_large_scale_high_iterations():
    # Test with high iteration count
    np.random.seed(456)
    X = np.random.rand(100, 10)
    true_weights = np.arange(10)
    y = X @ true_weights
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=1000)
    weights = codeflash_output


def test_large_scale_small_learning_rate():
    # Test with very small learning rate
    np.random.seed(789)
    X = np.random.rand(100, 5)
    true_weights = np.array([5, 4, 3, 2, 1])
    y = X @ true_weights
    codeflash_output = gradient_descent(X, y, learning_rate=1e-5, iterations=1000)
    weights = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np

# imports
import pytest  # used for our unit tests
from src.numerical.optimization import gradient_descent

# unit tests

# ------------------ BASIC TEST CASES ------------------


def test_gradient_descent_simple_linear():
    # Test with a simple linear relationship: y = 2x
    X = np.array([[1], [2], [3], [4]])
    y = np.array([2, 4, 6, 8])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=500)
    weights = codeflash_output


def test_gradient_descent_multiple_features():
    # Test with two features: y = 3x1 + 5x2
    X = np.array([[1, 2], [2, 1], [3, 0], [0, 4]])
    y = np.array([13, 11, 9, 20])
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=1000)
    weights = codeflash_output


def test_gradient_descent_zero_learning_rate():
    # Test with zero learning rate: weights should not change from zero
    X = np.array([[1, 2], [3, 4]])
    y = np.array([5, 11])
    codeflash_output = gradient_descent(X, y, learning_rate=0.0, iterations=100)
    weights = codeflash_output


def test_gradient_descent_one_iteration():
    # Test with one iteration: weights should update only once
    X = np.array([[1], [2]])
    y = np.array([2, 4])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=1)
    weights = codeflash_output


# ------------------ EDGE TEST CASES ------------------


def test_gradient_descent_single_sample():
    # Test with only one sample
    X = np.array([[5, 7]])
    y = np.array([31])  # y = 2*5 + 3*7 = 10 + 21 = 31
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=100)
    weights = codeflash_output
    # With one sample, weights should fit perfectly
    expected_weights = np.array([2, 3])


def test_gradient_descent_single_feature():
    # Test with one feature and multiple samples
    X = np.array([[1], [2], [3]])
    y = np.array([2, 4, 6])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=500)
    weights = codeflash_output


def test_gradient_descent_negative_learning_rate():
    # Test with negative learning rate: should diverge
    X = np.array([[1], [2]])
    y = np.array([2, 4])
    codeflash_output = gradient_descent(X, y, learning_rate=-0.1, iterations=10)
    weights = codeflash_output


def test_gradient_descent_zero_iterations():
    # Test with zero iterations: weights should be zero
    X = np.array([[1, 2], [3, 4]])
    y = np.array([5, 11])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=0)
    weights = codeflash_output


def test_gradient_descent_all_zero_X():
    # Test with all features zero
    X = np.zeros((5, 3))
    y = np.array([1, 2, 3, 4, 5])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=10)
    weights = codeflash_output


def test_gradient_descent_all_zero_y():
    # Test with all targets zero
    X = np.array([[1, 2], [3, 4]])
    y = np.zeros(2)
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=100)
    weights = codeflash_output


def test_gradient_descent_high_learning_rate():
    # Test with very high learning rate: weights should diverge
    X = np.array([[1], [2]])
    y = np.array([2, 4])
    codeflash_output = gradient_descent(X, y, learning_rate=10, iterations=10)
    weights = codeflash_output


# ------------------ LARGE SCALE TEST CASES ------------------


def test_gradient_descent_large_scale():
    # Test with large dataset (500 samples, 10 features)
    np.random.seed(42)
    X = np.random.rand(500, 10)
    true_weights = np.arange(1, 11)  # [1, 2, ..., 10]
    y = X @ true_weights
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=500)
    weights = codeflash_output


def test_gradient_descent_large_feature_count():
    # Test with many features (100 features, 50 samples)
    np.random.seed(123)
    X = np.random.rand(50, 100)
    true_weights = np.linspace(1, 2, 100)
    y = X @ true_weights
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=500)
    weights = codeflash_output


def test_gradient_descent_large_iterations():
    # Test with a reasonable large number of iterations
    X = np.array([[1], [2], [3], [4]])
    y = np.array([2, 4, 6, 8])
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=1000)
    weights = codeflash_output


def test_gradient_descent_large_scale_random_targets():
    # Large scale with random targets (no correlation)
    np.random.seed(0)
    X = np.random.rand(1000, 5)
    y = np.random.rand(1000)
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=200)
    weights = codeflash_output


def test_gradient_descent_large_scale_performance():
    # Test performance on large data (timing test, not strict)
    import time

    np.random.seed(1234)
    X = np.random.rand(1000, 10)
    y = np.random.rand(1000)
    start = time.time()
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=100)
    weights = codeflash_output
    elapsed = time.time() - start



To edit these changes, run `git checkout codeflash/optimize-gradient_descent-midt1qzy` and push.

