Changes from all commits
22 commits
033de60
390 convert StepDecay to Numpower
apphp Nov 7, 2025
a02c4a0
390 convert RMSProp to Numpower
apphp Nov 7, 2025
cccfa79
390 added math explanation for step() methods
apphp Nov 7, 2025
f1c55e6
390 convert Momentum to Numpower
apphp Nov 7, 2025
919ce36
390 convert Cyclical to NumPower
apphp Nov 8, 2025
d806494
390 added math formulas to momentum.md
apphp Nov 8, 2025
3fa08ec
390 added math formulas to rms-prop.md
apphp Nov 8, 2025
537b586
390 added math formulas to stochastic.md
apphp Nov 8, 2025
331fb36
390 convert Adam to NumPower
apphp Nov 11, 2025
47ad665
390 refactoring CyclicalTest - added dataprovider for constructor tests
apphp Nov 11, 2025
3575565
390 refactoring CyclicalTest - added dataprovider for constructor tests
apphp Nov 11, 2025
8677c76
390 refactoring AdamTest - added dataprovider for constructor tests
apphp Nov 11, 2025
269405b
390 refactoring MomentumTest - added dataprovider for constructor tests
apphp Nov 11, 2025
aca753e
390 refactoring RMSPropTest - added dataprovider for constructor tests
apphp Nov 11, 2025
e9c4831
390 refactoring StepDecayTest - added dataprovider for constructor tests
apphp Nov 11, 2025
8d3f76a
390 refactoring StochasticTest - added dataprovider for constructor t…
apphp Nov 11, 2025
23397ef
390 convert AdaMax to NumPower
apphp Nov 11, 2025
223a90e
390 convert AdaMax to NumPower
apphp Nov 14, 2025
db1c6db
390 Added warm initialization test for zeroed Adam optimizer caches
apphp Nov 14, 2025
548c055
Code cleanup: removed redundant docblocks, adjusted formatting, and a…
apphp Nov 14, 2025
40cf94b
390 convert AdaGrad to NumPower
apphp Nov 14, 2025
a67655f
390- Fix broken link to the Adam optimizer source file in documentation
apphp Nov 14, 2025
22 changes: 19 additions & 3 deletions docs/neural-network/optimizers/adagrad.md
@@ -1,19 +1,35 @@
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/AdaGrad.php">[source]</a></span>
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/AdaGrad/AdaGrad.php">[source]</a></span>

# AdaGrad
Short for *Adaptive Gradient*, the AdaGrad Optimizer speeds up the learning of parameters that do not change often and slows down the learning of parameters that do enjoy heavy activity. Due to AdaGrad's infinitely decaying step size, training may be slow or fail to converge using a low learning rate.

## Mathematical formulation
Per step (element-wise), AdaGrad accumulates the sum of squared gradients and scales the update by the root of this sum:

$$
\begin{aligned}
\mathbf{n}_t &= \mathbf{n}_{t-1} + \mathbf{g}_t^{2} \\
\Delta{\theta}_t &= \alpha\, \frac{\mathbf{g}_t}{\sqrt{\mathbf{n}_t} + \varepsilon}
\end{aligned}
$$

where:
- $t$ is the current step,
- $\alpha$ is the learning rate (`rate`),
- $\mathbf{g}_t$ is the current gradient, and $\mathbf{g}_t^{2}$ denotes element-wise square,
- $\varepsilon$ is a small constant for numerical stability (in the implementation, the denominator is clipped from below by `EPSILON`).

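To make the accumulation concrete, here is a minimal per-parameter sketch in plain PHP. The helper below is hypothetical and works on scalars; the optimizer itself applies the same arithmetic element-wise to NumPower tensors.

```php
// Hypothetical scalar sketch of a single AdaGrad step (illustration only).
function adaGradStep(float $gradient, float &$cache, float $rate = 0.01, float $epsilon = 1e-8): float
{
    $cache += $gradient ** 2;                              // n_t = n_{t-1} + g_t^2

    return $rate * $gradient / (sqrt($cache) + $epsilon);  // alpha * g_t / (sqrt(n_t) + eps)
}

$cache = 0.0;

$step = adaGradStep(0.5, $cache);   // first update for one parameter with gradient 0.5
$step = adaGradStep(0.25, $cache);  // later updates shrink as the cache grows
```
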
## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
| 1 | rate | 0.01 | float | The learning rate that controls the global step size. |

## Example
```php
use Rubix\ML\NeuralNet\Optimizers\AdaGrad;
use Rubix\ML\NeuralNet\Optimizers\AdaGrad\AdaGrad;

$optimizer = new AdaGrad(0.125);
```

## References
[^1]: J. Duchi et al. (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.
[^1]: J. Duchi et al. (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.
25 changes: 22 additions & 3 deletions docs/neural-network/optimizers/adam.md
@@ -1,8 +1,27 @@
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/Adam.php">[source]</a></span>
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/Adam/Adam.php">[source]</a></span>

# Adam
Short for *Adaptive Moment Estimation*, the Adam Optimizer combines both Momentum and RMS properties. In addition to storing an exponentially decaying average of past squared gradients like [RMSprop](rms-prop.md), Adam also keeps an exponentially decaying average of past gradients, similar to [Momentum](momentum.md). Whereas Momentum can be seen as a ball running down a slope, Adam behaves like a heavy ball with friction.

## Mathematical formulation
Per step (element-wise), Adam maintains exponentially decaying moving averages of the gradient and its element-wise square and uses them to scale the update:

$$
\begin{aligned}
\mathbf{v}_t &= (1 - \beta_1)\,\mathbf{v}_{t-1} + \beta_1\,\mathbf{g}_t \\
\mathbf{n}_t &= (1 - \beta_2)\,\mathbf{n}_{t-1} + \beta_2\,\mathbf{g}_t^{2} \\
\Delta{\theta}_t &= \alpha\, \frac{\mathbf{v}_t}{\sqrt{\mathbf{n}_t} + \varepsilon}
\end{aligned}
$$

where:
- $t$ is the current step,
- $\alpha$ is the learning rate (`rate`),
- $\beta_1$ is the momentum decay (`momentumDecay`),
- $\beta_2$ is the norm decay (`normDecay`),
- $\mathbf{g}_t$ is the current gradient, and $\mathbf{g}_t^{2}$ denotes element-wise square,
- $\varepsilon$ is a small constant for numerical stability (in the implementation, the denominator is clipped from below by `EPSILON`).

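A minimal scalar sketch of this update in plain PHP, assuming a hypothetical helper and cache names; the optimizer itself performs the same arithmetic element-wise on NumPower tensors.

```php
// Hypothetical scalar sketch of a single Adam step, following the formulation above.
function adamStep(float $g, float &$v, float &$n, float $rate, float $beta1, float $beta2, float $epsilon = 1e-8): float
{
    $v = (1.0 - $beta1) * $v + $beta1 * $g;       // velocity: decaying average of gradients
    $n = (1.0 - $beta2) * $n + $beta2 * $g ** 2;  // norm: decaying average of squared gradients

    return $rate * $v / (sqrt($n) + $epsilon);
}

$v = $n = 0.0;  // zeroed caches for one parameter

$step = adamStep(0.5, $v, $n, 0.0001, 0.1, 0.001);
```
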
## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
@@ -12,10 +31,10 @@ Short for *Adaptive Moment Estimation*, the Adam Optimizer combines both Momentu

## Example
```php
use Rubix\ML\NeuralNet\Optimizers\Adam;
use Rubix\ML\NeuralNet\Optimizers\Adam\Adam;

$optimizer = new Adam(0.0001, 0.1, 0.001);
```

## References
[^1]: D. P. Kingma et al. (2014). Adam: A Method for Stochastic Optimization.
[^1]: D. P. Kingma et al. (2014). Adam: A Method for Stochastic Optimization.
25 changes: 22 additions & 3 deletions docs/neural-network/optimizers/adamax.md
@@ -1,8 +1,27 @@
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/AdaMax.php">[source]</a></span>
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/AdaMax/AdaMax.php">[source]</a></span>

# AdaMax
A version of the [Adam](adam.md) optimizer that replaces the RMS property with the infinity norm of the past gradients. As such, AdaMax is generally more suitable for sparse parameter updates and noisy gradients.

## Mathematical formulation
Per step (element-wise), AdaMax maintains an exponentially decaying moving average of the gradient (velocity) and an infinity-norm accumulator of past gradients, and uses them to scale the update:

$$
\begin{aligned}
\mathbf{v}_t &= (1 - \beta_1)\,\mathbf{v}_{t-1} + \beta_1\,\mathbf{g}_t \\
\mathbf{u}_t &= \max\big(\beta_2\,\mathbf{u}_{t-1},\ |\mathbf{g}_t|\big) \\
\Delta{\theta}_t &= \alpha\, \frac{\mathbf{v}_t}{\max(\mathbf{u}_t, \varepsilon)}
\end{aligned}
$$

where:
- $t$ is the current step,
- $\alpha$ is the learning rate (`rate`),
- $\beta_1$ is the momentum decay (`momentumDecay`),
- $\beta_2$ is the norm decay (`normDecay`),
- $\mathbf{g}_t$ is the current gradient and $|\mathbf{g}_t|$ denotes element-wise absolute value,
- $\varepsilon$ is a small constant for numerical stability (in the implementation, the denominator is clipped from below by `EPSILON`).

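The same idea as a scalar sketch in plain PHP, with a hypothetical helper; the real optimizer operates element-wise on NumPower tensors.

```php
// Hypothetical scalar sketch of a single AdaMax step, following the formulation above.
function adaMaxStep(float $g, float &$v, float &$u, float $rate, float $beta1, float $beta2, float $epsilon = 1e-8): float
{
    $v = (1.0 - $beta1) * $v + $beta1 * $g;  // velocity: decaying average of gradients
    $u = max($beta2 * $u, abs($g));          // infinity norm of past gradients

    return $rate * $v / max($u, $epsilon);
}

$v = $u = 0.0;

$step = adaMaxStep(0.5, $v, $u, 0.0001, 0.1, 0.001);
```
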
## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
@@ -12,10 +31,10 @@ A version of the [Adam](adam.md) optimizer that replaces the RMS property with t

## Example
```php
use Rubix\ML\NeuralNet\Optimizers\AdaMax;
use Rubix\ML\NeuralNet\Optimizers\AdaMax\AdaMax;

$optimizer = new AdaMax(0.0001, 0.1, 0.001);
```

## References
[^1]: D. P. Kingma et al. (2014). Adam: A Method for Stochastic Optimization.
[^1]: D. P. Kingma et al. (2014). Adam: A Method for Stochastic Optimization.
26 changes: 23 additions & 3 deletions docs/neural-network/optimizers/cyclical.md
@@ -1,8 +1,28 @@
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/Cyclical.php">[source]</a></span>
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/Cyclical/Cyclical.php">[source]</a></span>

# Cyclical
The Cyclical optimizer uses a global learning rate that cycles between the lower and upper bound over a designated period while also decaying the upper bound by a factor at each step. Cyclical learning rates have been shown to help escape bad local minima and saddle points of the gradient.

## Mathematical formulation
Per step (element-wise), the cyclical learning rate and update are computed as:

$$
\begin{aligned}
\text{cycle} &= \left\lfloor 1 + \frac{t}{2\,\text{steps}} \right\rfloor \\
x &= \left| \frac{t}{\text{steps}} - 2\,\text{cycle} + 1 \right| \\
\text{scale} &= \text{decay}^{\,t} \\
\eta_t &= \text{lower} + (\text{upper} - \text{lower})\,\max\bigl(0,\,1 - x\bigr)\,\text{scale} \\
\Delta\theta_t &= \eta_t\,g_t
\end{aligned}
$$

where:
- $t$ is the current step counter,
- $\text{steps}$ is the number of steps in every half cycle,
- $\text{lower}$ and $\text{upper}$ are the learning rate bounds,
- $\text{decay}$ is the multiplicative decay applied each step,
- $g_t$ is the current gradient.

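A plain-PHP sketch of the schedule for a single step, using a hypothetical helper and an assumed decay value; the optimizer applies the resulting rate to the whole gradient tensor.

```php
// Hypothetical sketch of the cyclical learning rate for one step (illustration only).
function cyclicalStep(float $g, int $t, int $steps, float $lower, float $upper, float $decay): float
{
    $cycle = floor(1 + $t / (2 * $steps));
    $x = abs($t / $steps - 2 * $cycle + 1);
    $scale = $decay ** $t;

    $eta = $lower + ($upper - $lower) * max(0.0, 1.0 - $x) * $scale;

    return $eta * $g;
}

// 0.999 is an assumed decay value for illustration.
$step = cyclicalStep(0.5, 250, 1000, 0.001, 0.005, 0.999);
```
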
## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
@@ -13,10 +33,10 @@ The Cyclical optimizer uses a global learning rate that cycles between the lower

## Example
```php
use Rubix\ML\NeuralNet\Optimizers\Cyclical;
use Rubix\ML\NeuralNet\Optimizers\Cyclical\Cyclical;

$optimizer = new Cyclical(0.001, 0.005, 1000);
```

## References
[^1]: L. N. Smith. (2017). Cyclical Learning Rates for Training Neural Networks.
[^1]: L. N. Smith. (2017). Cyclical Learning Rates for Training Neural Networks.
29 changes: 27 additions & 2 deletions docs/neural-network/optimizers/momentum.md
@@ -1,8 +1,33 @@
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/Momentum.php">[source]</a></span>
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/Momentum/Momentum.php">[source]</a></span>

# Momentum
Momentum accelerates each update step by accumulating velocity from past updates and adding a factor of the previous velocity to the current step. Momentum can help speed up training and escape bad local minima when compared with [Stochastic](stochastic.md) Gradient Descent.

## Mathematical formulation
Per step (element-wise), Momentum updates the velocity and applies it as the parameter step:

$$
\begin{aligned}
\beta &= 1 - \text{decay}, \quad \eta = \text{rate} \\
\text{Velocity update:}\quad v_t &= \beta\,v_{t-1} + \eta\,g_t \\
\text{Returned step:}\quad \Delta\theta_t &= v_t
\end{aligned}
$$

Nesterov lookahead (when `lookahead = true`) is approximated by applying the velocity update a second time:

$$
\begin{aligned}
v_t &\leftarrow \beta\,v_t + \eta\,g_t
\end{aligned}
$$

where:
- $g_t$ is the current gradient,
- $v_t$ is the velocity (accumulated update),
- $\beta$ is the momentum coefficient ($1 - \text{decay}$),
- $\eta$ is the learning rate ($\text{rate}$).

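As a scalar sketch in plain PHP, with a hypothetical helper; the optimizer applies the same update element-wise to NumPower tensors.

```php
// Hypothetical scalar sketch of a single Momentum step, following the formulation above.
function momentumStep(float $g, float &$v, float $rate, float $decay, bool $lookahead = false): float
{
    $beta = 1.0 - $decay;

    $v = $beta * $v + $rate * $g;      // velocity update

    if ($lookahead) {
        $v = $beta * $v + $rate * $g;  // Nesterov lookahead: apply the update a second time
    }

    return $v;
}

$v = 0.0;

$step = momentumStep(0.5, $v, 0.01, 0.1, true);
```
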
## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
@@ -12,7 +37,7 @@ Momentum accelerates each update step by accumulating velocity from past updates

## Example
```php
use Rubix\ML\NeuralNet\Optimizers\Momentum;
use Rubix\ML\NeuralNet\Optimizers\Momentum\Momentum;

$optimizer = new Momentum(0.01, 0.1, true);
```
26 changes: 22 additions & 4 deletions docs/neural-network/optimizers/rms-prop.md
@@ -1,7 +1,25 @@
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/RMSProp.php">[source]</a></span>
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/RMSProp/RMSProp.php">[source]</a></span>

# RMS Prop
An adaptive gradient technique that divides the current gradient over a rolling window of the magnitudes of recent gradients. Unlike [AdaGrad](adagrad.md), RMS Prop does not suffer from an infinitely decaying step size.
An adaptive gradient technique that divides the current gradient over a rolling window of magnitudes of recent gradients. Unlike [AdaGrad](adagrad.md), RMS Prop does not suffer from an infinitely decaying step size.

## Mathematical formulation
Per step (element-wise), RMSProp maintains a running average of squared gradients and scales the step by the root-mean-square:

$$
\begin{aligned}
\rho &= 1 - \text{decay}, \quad \eta = \text{rate} \\
\text{Running average:}\quad v_t &= \rho\,v_{t-1} + (1 - \rho)\,g_t^{\,2} \\
\text{Returned step:}\quad \Delta\theta_t &= \frac{\eta\,g_t}{\max\bigl(\sqrt{v_t},\,\varepsilon\bigr)}
\end{aligned}
$$

where:
- $g_t$ is the current gradient,
- $v_t$ is the running average of squared gradients,
- $\rho$ is the averaging coefficient ($1 - \text{decay}$),
- $\eta$ is the learning rate ($\text{rate}$),
- $\varepsilon$ is a small constant to avoid division by zero (implemented by clipping $\sqrt{v_t}$ to $[\varepsilon, +\infty)$).

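A minimal scalar sketch of one step in plain PHP, with a hypothetical helper name; the optimizer itself computes this element-wise on NumPower tensors.

```php
// Hypothetical scalar sketch of a single RMSProp step, following the formulation above.
function rmsPropStep(float $g, float &$v, float $rate, float $decay, float $epsilon = 1e-8): float
{
    $rho = 1.0 - $decay;

    $v = $rho * $v + (1.0 - $rho) * $g ** 2;      // running average of squared gradients

    return $rate * $g / max(sqrt($v), $epsilon);  // clip the root from below by epsilon
}

$v = 0.0;

$step = rmsPropStep(0.5, $v, 0.01, 0.1);
```
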
## Parameters
| # | Name | Default | Type | Description |
@@ -11,10 +29,10 @@ An adaptive gradient technique that divides the current gradient over a rolling

## Example
```php
use Rubix\ML\NeuralNet\Optimizers\RMSProp;
use Rubix\ML\NeuralNet\Optimizers\RMSProp\RMSProp;

$optimizer = new RMSProp(0.01, 0.1);
```

## References
[^1]: T. Tieleman et al. (2012). Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude.
[^1]: T. Tieleman et al. (2012). Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude.
24 changes: 21 additions & 3 deletions docs/neural-network/optimizers/step-decay.md
@@ -1,8 +1,26 @@
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/StepDecay.php">[source]</a></span>
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/NeuralNet/Optimizers/StepDecay/StepDecay.php">[source]</a></span>

# Step Decay
A learning rate decay optimizer that reduces the global learning rate by a factor whenever it reaches a new *floor*. The number of steps needed to reach a new floor is defined by the *steps* hyper-parameter.

## Mathematical formulation
Per step (element-wise), the Step Decay learning rate and update are:

$$
\begin{aligned}
\text{floor} &= \left\lfloor \frac{t}{k} \right\rfloor \\
\eta_t &= \frac{\eta_0}{1 + \text{floor}\cdot \lambda} \\
\Delta\theta_t &= \eta_t\,g_t
\end{aligned}
$$

where:
- $t$ is the current step number,
- $k$ is the number of steps per floor,
- $\eta_0$ is the initial learning rate ($\text{rate}$),
- $\lambda$ is the decay factor ($\text{decay}$),
- $g_t$ is the current gradient.

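A plain-PHP sketch of the schedule, using a hypothetical helper; the optimizer multiplies the resulting rate into the gradient tensor.

```php
// Hypothetical sketch of the step decay schedule for one update (illustration only).
function stepDecayStep(float $g, int $t, int $k, float $rate, float $lambda): float
{
    $floor = floor($t / $k);                  // number of completed floors

    $eta = $rate / (1.0 + $floor * $lambda);  // decayed learning rate

    return $eta * $g;
}

$step = stepDecayStep(0.5, 125, 50, 0.1, 1e-3);  // t = 125 with 50 steps per floor is floor 2
```
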
## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
@@ -12,7 +30,7 @@ A learning rate decay optimizer that reduces the global learning rate by a facto

## Example
```php
use Rubix\ML\NeuralNet\Optimizers\StepDecay;
use Rubix\ML\NeuralNet\Optimizers\StepDecay\StepDecay;

$optimizer = new StepDecay(0.1, 50, 1e-3);
```
```
14 changes: 14 additions & 0 deletions docs/neural-network/optimizers/stochastic.md
@@ -3,6 +3,20 @@
# Stochastic
A constant learning rate optimizer based on vanilla Stochastic Gradient Descent (SGD).

## Mathematical formulation
Per step (element-wise), the SGD update scales the gradient by a constant learning rate:

$$
\begin{aligned}
\eta &= \text{rate} \\
\Delta\theta_t &= \eta\,g_t
\end{aligned}
$$

where:
- $g_t$ is the current gradient,
- $\eta$ is the learning rate ($\text{rate}$).

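As a minimal plain-PHP sketch with a hypothetical helper:

```php
// Hypothetical scalar sketch of a vanilla SGD step (illustration only).
function sgdStep(float $g, float $rate): float
{
    return $rate * $g;  // constant learning rate times the gradient
}

$step = sgdStep(0.5, 0.01);
```
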
## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|