@@ -16,7 +16,7 @@ For each parameter `p` and its gradient `dp`, this runs `p -= η*dp`.
 - Learning rate (`η`): Amount by which gradients are discounted before updating
   the weights.
 """
-struct Descent{T}
+struct Descent{T} <: AbstractRule
   eta::T
 end
 Descent() = Descent(1f-1)
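Every struct touched in this diff now subtypes `AbstractRule`, so each rule plugs into the same `Optimisers.setup`/`Optimisers.update` workflow used in the docstrings. Below is a minimal sketch of one `Descent` step on a toy NamedTuple "model"; the names `m`, `s`, `g` and the numbers are invented for illustration.

```julia
using Optimisers

m = (w = [1.0, 2.0, 3.0],)              # toy parameters; any nested structure of arrays works
s = Optimisers.setup(Descent(0.1), m)   # per-parameter optimiser state
g = (w = [10.0, 10.0, 10.0],)           # a made-up gradient with the same structure as m
s, m = Optimisers.update(s, m, g)       # one step of p -= η*dp

m.w   # ≈ [0.0, 1.0, 2.0]
```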
@@ -40,7 +40,7 @@ Gradient descent optimizer with learning rate `η` and momentum `ρ`.
 - Momentum (`ρ`): Controls the acceleration of gradient descent in the
   prominent direction, in effect dampening oscillations.
 """
-struct Momentum{T}
+struct Momentum{T} <: AbstractRule
   eta::T
   rho::T
 end
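For intuition about what `eta` and `rho` do, here is the textbook momentum update in plain Julia; `momentum_step!` is an invented helper for illustration and may differ in detail from the package's own `apply!`.

```julia
# Classical "heavy ball" momentum: accumulate a velocity, then step against it.
function momentum_step!(p, v, dp; η = 0.01, ρ = 0.9)
    @. v = ρ * v + η * dp   # velocity remembers the prominent descent direction
    @. p -= v               # stepping along it dampens oscillations
    return p, v
end
```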
@@ -66,7 +66,7 @@ Gradient descent optimizer with learning rate `η` and Nesterov momentum `ρ`.
 - Nesterov momentum (`ρ`): Controls the acceleration of gradient descent in the
   prominent direction, in effect dampening oscillations.
 """
-struct Nesterov{T}
+struct Nesterov{T} <: AbstractRule
   eta::T
   rho::T
 end
@@ -104,7 +104,7 @@ gradients by an estimate of their variance, instead of their second moment.
 - Keyword `centred` (or `centered`): Indicates whether to use centred variant
   of the algorithm.
 """
-struct RMSProp{T}
+struct RMSProp{T} <: AbstractRule
   eta::T
   rho::T
   epsilon::T
@@ -148,7 +148,7 @@
 - Machine epsilon (`ϵ`): Constant to prevent division by zero
   (no need to change default)
 """
-struct Adam{T}
+struct Adam{T} <: AbstractRule
   eta::T
   beta::Tuple{T, T}
   epsilon::T
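As a reminder of what `eta`, `beta`, and `epsilon` feed into, here is the textbook Adam step in plain Julia; `adam_step!` and its defaults are illustrative only, not the package's exact `apply!`.

```julia
# Adam: bias-corrected moving averages of the gradient and its square,
# followed by a per-coordinate scaled step.
function adam_step!(p, m, v, dp, t; η = 0.001, β = (0.9, 0.999), ϵ = 1e-8)
    β1, β2 = β
    @. m = β1 * m + (1 - β1) * dp       # first-moment estimate
    @. v = β2 * v + (1 - β2) * dp^2     # second-moment estimate
    m̂ = m ./ (1 - β1^t)                 # bias corrections matter for small t
    v̂ = v ./ (1 - β2^t)
    @. p -= η * m̂ / (sqrt(v̂) + ϵ)       # ϵ guards the division
    return p, m, v
end
```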
@@ -183,7 +183,7 @@
 - Machine epsilon (`ϵ`): Constant to prevent division by zero
   (no need to change default)
 """
-struct RAdam{T}
+struct RAdam{T} <: AbstractRule
   eta::T
   beta::Tuple{T, T}
   epsilon::T
@@ -224,7 +224,7 @@
 - Machine epsilon (`ϵ`): Constant to prevent division by zero
   (no need to change default)
 """
-struct AdaMax{T}
+struct AdaMax{T} <: AbstractRule
   eta::T
   beta::Tuple{T, T}
   epsilon::T
@@ -258,7 +258,7 @@ is a variant of Adam adding an "optimistic" term suitable for adversarial training.
 - Machine epsilon (`ϵ`): Constant to prevent division by zero
   (no need to change default)
 """
-struct OAdam{T}
+struct OAdam{T} <: AbstractRule
   eta::T
   beta::Tuple{T, T}
   epsilon::T
@@ -293,7 +293,7 @@ Parameters don't need tuning.
 - Machine epsilon (`ϵ`): Constant to prevent division by zero
   (no need to change default)
 """
-struct AdaGrad{T}
+struct AdaGrad{T} <: AbstractRule
   eta::T
   epsilon::T
 end
@@ -323,7 +323,7 @@ Parameters don't need tuning.
 - Machine epsilon (`ϵ`): Constant to prevent division by zero
   (no need to change default)
 """
-struct AdaDelta{T}
+struct AdaDelta{T} <: AbstractRule
   rho::T
   epsilon::T
 end
@@ -357,7 +357,7 @@ optimiser. Parameters don't need tuning.
 - Machine epsilon (`ϵ`): Constant to prevent division by zero
   (no need to change default)
 """
-struct AMSGrad{T}
+struct AMSGrad{T} <: AbstractRule
   eta::T
   beta::Tuple{T, T}
   epsilon::T
@@ -393,7 +393,7 @@ Parameters don't need tuning.
 - Machine epsilon (`ϵ`): Constant to prevent division by zero
   (no need to change default)
 """
-struct NAdam{T}
+struct NAdam{T} <: AbstractRule
   eta::T
   beta::Tuple{T, T}
   epsilon::T
@@ -447,7 +447,7 @@ Adam optimiser.
 - Machine epsilon (`ϵ::Float32`): Constant to prevent division by zero
   (no need to change default)
 """
-struct AdaBelief{T}
+struct AdaBelief{T} <: AbstractRule
   eta::T
   beta::Tuple{T, T}
   epsilon::T
@@ -479,7 +479,7 @@ This is equivalent to adding ``L_2`` regularization with coefficient ``γ`` to the loss.
 # Parameters
 - Weight decay (`γ`): Decay applied to weights during optimisation.
 """
-struct WeightDecay{T}
+struct WeightDecay{T} <: AbstractRule
   gamma::T
 end
 WeightDecay() = WeightDecay(5f-4)
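`WeightDecay` never moves parameters on its own; it adds a `γ .* p` term to the incoming gradient, which is the gradient of a quadratic penalty `(γ/2) * sum(abs2, p)` on the loss. It is therefore normally placed first in a chain, as in this hedged sketch (rule choice and values invented):

```julia
using Optimisers

# Decay rewrites the gradient, then Descent takes the actual step.
rule = OptimiserChain(WeightDecay(1f-4), Descent(0.1))
```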
@@ -499,7 +499,7 @@ Restricts every gradient component to obey `-δ ≤ dx[i] ≤ δ`.

 See also [`ClipNorm`](@ref).
 """
-struct ClipGrad{T<:Real}
+struct ClipGrad{T<:Real} <: AbstractRule
   delta::T
 end
 ClipGrad() = ClipGrad(10f0)
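The per-component effect of `ClipGrad(δ)` is just a clamp; a quick illustration with made-up numbers:

```julia
δ = 10f0
dx = Float32[0.3, -25, 100]
clamp.(dx, -δ, δ)   # Float32[0.3, -10.0, 10.0] is what later rules would see
```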
@@ -524,7 +524,7 @@ which you can turn off with `throw = false`.

 See also [`ClipGrad`](@ref).
 """
-struct ClipNorm{T<:Real}
+struct ClipNorm{T<:Real} <: AbstractRule
   omega::T
   p::T
   throw::Bool
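For intuition, the rescaling driven by `omega` and `p` looks roughly like the sketch below, using `LinearAlgebra.norm`; `clipnorm_sketch` is an invented name and it omits the zero-norm `throw` handling mentioned above.

```julia
using LinearAlgebra

function clipnorm_sketch(dx; ω = 10f0, p = 2)
    n = norm(dx, p)
    n > ω ? dx .* (ω / n) : dx   # shrink so the p-norm is at most ω, else leave unchanged
end
```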
@@ -566,7 +566,7 @@ julia> Optimisers.update(s, m, ([0.3, 1, 7],))[2] # clips before discounting
 ([-0.03, -0.1, -0.1],)
 ```
 """
-struct OptimiserChain{O<:Tuple}
+struct OptimiserChain{O<:Tuple} <: AbstractRule
   opts::O
 end
 OptimiserChain(opts...) = OptimiserChain(opts)
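Because each wrapped rule is itself an `AbstractRule`, the chain composes and runs like any single rule; a hedged sketch that reproduces the doctest result above, with the NamedTuple model invented for illustration:

```julia
using Optimisers

rule = OptimiserChain(ClipGrad(1.0), Descent(0.1))   # gradients flow left to right
m = (w = [0.0, 0.0, 0.0],)
s = Optimisers.setup(rule, m)
s, m = Optimisers.update(s, m, (w = [0.3, 1.0, 7.0],))
m.w   # ≈ [-0.03, -0.1, -0.1], matching the doctest output shown above
```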