|
 # [How Flux Works: Gradients and Layers](@id man-basics)
 
-## Taking Gradients
+## [Taking Gradients](@id man-taking-gradients)
 
 Flux's core feature is taking gradients of Julia code. The `gradient` function takes another Julia function `f` and a set of arguments, and returns the gradient with respect to each argument. (It's a good idea to try pasting these examples in the Julia terminal.)
 
@@ -29,35 +29,77 @@ julia> gradient(f, [2, 1], [2, 0]) |
 ([0.0, 2.0], [-0.0, -2.0])
 ```
 
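+This result is just the derivative computed by hand: assuming `f` is still
+`f(x, y) = sum((x .- y).^2)` from the lines elided above, its gradients are
+`2 .* (x .- y)` and `-2 .* (x .- y)`. A quick sanity check (a sketch, not one
+of the original doctests):
+
+```
+julia> 2 .* ([2, 1] .- [2, 0]) == gradient(f, [2, 1], [2, 0])[1]  # matches Zygote's answer
+true
+```
+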
-These gradients are based on `x` and `y`. Flux works by instead taking gradients based on the weights and biases that make up the parameters of a model.
+These gradients are with respect to `x` and `y`. In a machine learning model, we instead want gradients with respect to the weights and biases that make up its parameters.
 
-
-Machine learning often can have *hundreds* of parameters, so Flux lets you work with collections of parameters, via the `params` functions. You can get the gradient of all parameters used in a program without explicitly passing them in.
+A machine learning model can easily have *hundreds* of parameter arrays.
+Instead of passing them to `gradient` individually, we can store them together in a structure.
+The simplest example is a named tuple, created by the following syntax:
 
 ```jldoctest basics
-julia> x = [2, 1];
+julia> nt = (a = [2, 1], b = [2, 0], c = tanh);
+
+julia> g(x::NamedTuple) = sum(abs2, x.a .- x.b);
+
+julia> g(nt)
+1
+
+julia> dg_nt = gradient(g, nt)[1]
+(a = [0.0, 2.0], b = [-0.0, -2.0], c = nothing)
+```
+
+Notice that `gradient` has returned a matching structure. The field `dg_nt.a` is the gradient
+for `nt.a`, and so on. Some fields have no gradient, indicated by `nothing`.
 
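+The same structural matching applies to nested containers. A minimal sketch
+(an added illustration, not part of the surrounding doctests):
+
+```
+julia> gradient(x -> x.a[1] * x.b, (a = [1.0, 2.0], b = 3.0))  # gradient keeps the named-tuple shape
+((a = [3.0, 0.0], b = 1.0),)
+```
+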
-julia> y = [2, 0];
+Rather than defining a function like `g` every time (and thinking up a name for it),
+it is often easier to use an anonymous function: this one is `x -> sum(abs2, x.a .- x.b)`.
+Anonymous functions can be defined either with `->` or with `do`,
+and such `do` blocks are often useful if you have a few steps to perform:
+
+```jldoctest basics
+julia> gradient((x, y) -> sum(abs2, x.a ./ y .- x.b), nt, [1, 2])
+((a = [0.0, 0.5], b = [-0.0, -1.0], c = nothing), [-0.0, -0.25])
 
-julia> gs = gradient(Flux.params(x, y)) do
-         f(x, y)
+julia> gradient(nt, [1, 2]) do x, y
+         z = x.a ./ y
+         sum(abs2, z .- x.b)
        end
-Grads(...)
+((a = [0.0, 0.5], b = [-0.0, -1.0], c = nothing), [-0.0, -0.25])
+```
 
-julia> gs[x]
-2-element Vector{Float64}:
- 0.0
- 2.0
+Sometimes you may want to know the value of the function, as well as its gradient.
+Rather than calling the function a second time, you can call [`withgradient`](@ref Zygote.withgradient) instead:
 
-julia> gs[y]
-2-element Vector{Float64}:
- -0.0
- -2.0
 ```
+julia> Flux.withgradient(g, nt)
+(val = 1, grad = ((a = [0.0, 2.0], b = [-0.0, -2.0], c = nothing),))
+```
+
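+Since the result is a named tuple, it destructures in one line, which is handy
+for logging the loss while training. A usage sketch (not part of the original text):
+
+```
+julia> val, grads = Flux.withgradient(g, nt);  # iterating the named tuple yields its values
+
+julia> val
+1
+```
+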
+!!! note "Implicit gradients"
+    Flux used to handle many parameters in a different way, using the [`params`](@ref Flux.params) function.
+    This uses a method of `gradient` which takes a zero-argument function, and returns a dictionary-like
+    object through which the resulting gradients can be looked up:
+
+    ```jldoctest basics
+    julia> x = [2, 1];
+
+    julia> y = [2, 0];
+
+    julia> gs = gradient(Flux.params(x, y)) do
+             f(x, y)
+           end
+    Grads(...)
+
+    julia> gs[x]
+    2-element Vector{Float64}:
+     0.0
+     2.0
 
-Here, `gradient` takes a zero-argument function; no arguments are necessary because the `params` tell it what to differentiate.
+    julia> gs[y]
+    2-element Vector{Float64}:
+     -0.0
+     -2.0
+    ```
 
-This will come in really handy when dealing with big, complicated models. For now, though, let's start with something simple.
 
 ## Building Simple Models
 