@@ -97,8 +97,8 @@ The simplest kind of optimisation using the gradient is termed *gradient descent
(or sometimes *stochastic gradient descent* when it is applied to individual examples
in a loop, not to the entire dataset at once).

-This needs a *learning rate* which is a small number describing how fast to walk downhill,
-usually written as the Greek letter "eta", `η`.
+Gradient descent needs a *learning rate* which is a small number describing how fast to walk downhill,
+usually written as the Greek letter "eta", `η`. This is what it does:

```julia
η = 0.01 # learning rate
@@ -110,16 +110,14 @@ fmap(model, grads[1]) do p, g
end
```

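The diff only shows the first and last lines of that block. Filled out into something runnable, the same hand-written step might look like the sketch below; the `Dense` model, squared-error loss, and random data are placeholders for illustration, not part of the documentation being edited.

```julia
using Flux

# Placeholder model, loss and data, just to make the sketch runnable:
model = Dense(2 => 1)                                # weight matrix and bias vector
x, y = rand(Float32, 2, 10), rand(Float32, 1, 10)
loss(m, x, y) = sum(abs2, m(x) .- y)                 # simple squared-error loss

grads = gradient(m -> loss(m, x, y), model)          # explicit-style gradient; grads[1] is a NamedTuple

η = 0.01  # learning rate
Flux.fmap(model, grads[1]) do p, g
  g === nothing && return p       # skip leaves with no gradient, e.g. the activation function
  p .= p .- η .* g                # subtract the gradient, scaled by the learning rate
end
```

Each parameter array is mutated in place; leaves without a gradient are left untouched.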
-This is wrapped up as a function [`update!`](@ref Flux.Optimise.update!), which can be used as follows:
-
-```julia
-Flux.update!(Descent(0.01), model, grads[1])
-```
+This update of all parameters is wrapped up as a function [`update!`](@ref Flux.Optimise.update!)`(opt, model, grads[1])`.

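Spelled out, the call that the new sentence describes might look like this, reusing the placeholder `model` and `grads` from the sketch above; `Descent(0.01)` is the rule corresponding to the bare learning rate, and `setup` is explained just below:

```julia
opt = Flux.setup(Descent(0.01), model)   # plain gradient descent with η = 0.01
Flux.update!(opt, model, grads[1])       # one in-place step for every parameter array
```
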
There are many other optimisation rules, which adjust the step size and direction.
-Most require some memory of the gradients from earlier steps. The function [`setup`](@ref Flux.Train.setup)
-creates the necessary storage for this, for a particular model. This should be done
-once, before training, and looks like this:
+Most require some memory of the gradients from earlier steps, rather than always
+walking straight downhill. The function [`setup`](@ref Flux.Train.setup) creates the
+necessary storage for this, for a particular model.
+It should be called once, before training, and returns a tree-like object which is the
+first argument of `update!`. Like this:

```julia
# Initialise momentum
@@ -128,7 +126,7 @@ opt = Flux.setup(Adam(0.001), model)
for data in train_set
  ...

-  #
+  # Update both model parameters and optimiser state:
  Flux.update!(opt, model, grads[1])
end
```
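To make the "tree-like object" concrete, one could inspect the state that `setup` returns for a small made-up model; the `Chain` below is hypothetical, and the exact printed form comes from Optimisers.jl:

```julia
using Flux

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))   # hypothetical two-layer model
opt = Flux.setup(Adam(0.001), model)                # state tree mirroring the model's structure

opt.layers[1].weight    # an Optimisers.Leaf holding Adam's momenta for the first weight matrix
opt.layers[1].bias      # likewise for the first layer's bias vector
```
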
@@ -138,7 +136,7 @@ These are listed on the [optimisers](@ref man-optimisers) page.


!!! note "Implicit-style optimiser state"
-    This `setep` makes another tree-like structure. Old versions of Flux did not do this,
+    This `setup` makes another tree-like structure. Old versions of Flux did not do this,
    and instead stored a dictionary-like structure within the optimiser `Adam(0.001)`.
    This was initialised on first use of the version of `update!` for "implicit" parameters.

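For contrast, the implicit pattern this note refers to looked roughly as follows in Flux 0.13; this is a rough sketch of the old API, reusing the placeholder `model`, `loss`, `x`, `y` from the earlier sketch, not something added by this commit:

```julia
# Old "implicit" style (Flux 0.13), shown only for contrast with the explicit style above.
opt = Adam(0.001)                  # optimiser state lives inside `opt`, filled in on first use
ps  = Flux.params(model)           # implicit collection of trainable parameter arrays

gs = gradient(() -> loss(model, x, y), ps)   # gradient of a zero-argument function
Flux.Optimise.update!(opt, ps, gs)           # looks up each parameter's gradient in `gs`
```
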
@@ -266,12 +264,14 @@ for epoch in 1:100
end
```

-
## Implicit vs Explicit

Flux used to handle gradients, training, and optimisation rules quite differently.
The new style described above is called "explicit" by Zygote, and the old style "implicit".
Flux 0.13 is the transitional version which supports both.

-For full details on the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+The blue boxes above describe the changes.
+For more details on training in the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+
+For details about the two gradient modes, see [Zygote's documentation](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1).
