Add a quick start example, and change some headings (#2069)
* add quickstart page
* tidy welcome page
* adjust folders, and some headings
* move one page to first section
* tweaks
* say linear regression somewhere, just not in the title
* tweaks
* add emoji for API, re-order
* also mention function names
* activations intro
* move destructure to a new file, along with modules
* tweaks
* tweaks
* sciml link
* less negative spacing
* rm all negative spacing
* better Layer Helpers section
* move Custom Layers to Tutorials section
* fixup
* Apply 3 suggestions
Co-authored-by: Saransh Chopra <saransh0701@gmail.com>
* one more
Co-authored-by: Saransh Chopra <saransh0701@gmail.com>
# [Flat vs. Nested Structures](@id man-destructure)
A Flux model is a nested structure, with parameters stored within many layers. Sometimes you may want a flat representation of them, to interact with functions expecting just one vector. This is provided by `destructure`:
```julia
julia> model = Chain(Dense(2 => 1, tanh), Dense(1 => 1));

julia> flat, rebuild = Flux.destructure(model);  # flat vector of all 5 parameters, plus a rebuilder

julia> rebuild(zeros(5))  # same structure, new parameters
Chain(
  Dense(2 => 1, tanh),                  # 3 parameters  (all zero)
  Dense(1 => 1),                        # 2 parameters  (all zero)
)                   # Total: 4 arrays, 5 parameters, 276 bytes.
```
Both `destructure` and the `Restructure` function can be used within gradient computations. For instance, this computes the Hessian `∂²L/∂θᵢ∂θⱼ` of some loss function, with respect to all parameters of the Flux model. The resulting matrix has off-diagonal entries, which cannot really be expressed in a nested structure:
```julia
julia> x = rand(Float32, 2, 16);

julia> grad = gradient(m -> sum(abs2, m(x)), model)  # nested gradient
```
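To get the flat Hessian itself, one would differentiate a loss written on the flat vector. A minimal sketch of how this might look, reusing `flat` and `rebuild` from the first block and assuming [Zygote.jl](https://github.com/FluxML/Zygote.jl) has been added to the environment (the `loss` helper is illustrative):

```julia
julia> using Zygote

julia> loss(v::Vector) = sum(abs2, rebuild(v)(x));  # the same loss, written on the flat vector

julia> gradient(loss, flat);  # flat gradient: one vector containing all 5 derivatives

julia> Zygote.hessian(loss, flat)  # 5×5 matrix of second derivatives ∂²L/∂θᵢ∂θⱼ
```

Because `loss` accepts a plain vector, any second-order tool that works on vector inputs can be used in the same way.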
Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:
* **Doing the obvious thing**. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
* **Extensible by default**. Flux is written to be highly extensible and flexible while being performant. Extending Flux is as simple as using your own code as part of the model you want - it is all [high-level Julia code](https://github.com/FluxML/Flux.jl/blob/ec16a2c77dbf6ab8b92b0eecd11661be7a62feef/src/layers/recurrent.jl#L131). When in doubt, it’s well worth looking at [the source](https://github.com/FluxML/Flux.jl/tree/master/src). If you need something different, you can easily roll your own.
* **Play nicely with others**. Flux works well with Julia libraries from [images](https://github.com/JuliaImages/Images.jl) to [differential equation solvers](https://github.com/SciML/DifferentialEquations.jl), so you can easily build complex data processing pipelines that integrate Flux models.
## Installation
Download [Julia 1.6](https://julialang.org/downloads/) or later, preferably the current stable release. You can add Flux using Julia's package manager, by typing `] add Flux` in the Julia prompt.
This will automatically install several other packages, including [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) which supports Nvidia GPUs. To directly access some of its functionality, you may want to add `] add CUDA` too. The page on [GPU support](gpu.md) has more details.
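For example, the same steps with the functional Pkg API (adding `CUDA` is optional):

```julia
using Pkg

Pkg.add("Flux")   # equivalent to `] add Flux` at the Pkg prompt
Pkg.add("CUDA")   # optional: direct access to Nvidia GPU functionality
```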
Other closely associated packages, also installed automatically, include [Zygote](https://github.com/FluxML/Zygote.jl), [Optimisers](https://github.com/FluxML/Optimisers.jl), [NNlib](https://github.com/FluxML/NNlib.jl), [Functors](https://github.com/FluxML/Functors.jl) and [MLUtils](https://github.com/JuliaML/MLUtils.jl).
## Learning Flux
The [quick start](models/quickstart.md) page trains a simple neural network.
The rest of this documentation provides a from-scratch introduction to Flux's take on models and how they work, starting with [fitting a line](models/overview.md). Once you understand these docs, congratulations, you also understand [Flux's source code](https://github.com/FluxML/Flux.jl), which is intended to be concise, legible and a good reference for more advanced concepts.
Sections with 📚 contain API listings. The same text is available at the Julia prompt, by typing for example `?gpu`.
If you just want to get started writing models, the [model zoo](https://github.com/FluxML/model-zoo/) gives good starting points for many common ones.
## Community
Everyone is welcome to join our community on the [Julia discourse forum](https://discourse.julialang.org/), or the [slack chat](https://discourse.julialang.org/t/announcing-a-julia-slack/4866) (channel #machine-learning). If you have questions or issues we'll try to help you out.
If you're interested in hacking on Flux, the [source code](https://github.com/FluxML/Flux.jl) is open and easy to understand -- it's all just the same Julia code you work with normally. You might be interested in our [intro issues](https://github.com/FluxML/Flux.jl/labels/good%20first%20issue) to get started, or our [contributing guide](https://github.com/FluxML/Flux.jl/blob/master/CONTRIBUTING.md).
`docs/src/models/activation.md`
These non-linearities used between layers of your model are exported by the [NNlib.jl](https://github.com/FluxML/NNlib.jl) package.
Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call `σ.(xs)`, `relu.(xs)` and so on. Alternatively, they can be passed to a layer like `Dense(784 => 1024, relu)` which will handle this broadcasting.
Functions like [`softmax`](@ref) are sometimes described as activation functions, but not by Flux. They must see all the outputs, and hence cannot be broadcasted. See the next page for details.
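As a small illustration of the difference (a sketch, assuming Flux is loaded):

```julia
using Flux

xs = randn(Float32, 4);

relu.(xs)                # a scalar activation, applied elementwise by broadcasting
Dense(4 => 2, relu)(xs)  # a layer applies the same broadcasting for you
softmax(xs)              # not broadcast: softmax must see the whole array at once
```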
### Alphabetical Listing
```@docs
celu
elu
tanhshrink
trelu
```
### One More
Julia's `Base.Math` also provides `tanh`, which can be used as an activation function.
Note that many Flux layers will automatically replace this with [`NNlib.tanh_fast`](@ref) when called, as Base's `tanh` is slow enough to sometimes be a bottleneck.
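The substitute can also be called directly; a sketch, relying only on the fact that `NNlib` is installed with Flux and is reachable as `Flux.NNlib`:

```julia
using Flux

x = randn(Float32, 100);

tanh.(x)                  # Base's scalar tanh, broadcast over the array
Flux.NNlib.tanh_fast.(x)  # the faster approximation that layers substitute at call time
```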
`docs/src/models/basics.md`
# [How Flux Works: Gradients and Layers](@id man-basics)
## Taking Gradients
```julia
m = Chain(x -> x^2, x -> x + 1)
m(5)  # => 26
```
## Layer Helpers
There is still one problem with this `Affine` layer: Flux does not know to look inside it. This means that [`Flux.train!`](@ref) won't see its parameters, nor will [`gpu`](@ref) be able to move them to your GPU. These features are enabled by the `@functor` macro:
```
Flux.@functor Affine
```
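For example, assuming the `Affine(W, b)` struct defined earlier on this page, a sketch of what the annotation enables:

```julia
a = Affine(rand(Float32, 3, 2), zeros(Float32, 3))

Flux.params(a)  # now a Params collection containing W and b
gpu(a)          # and gpu can move both arrays, when a GPU is available
```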
Finally, most Flux layers make bias optional, and allow you to supply the function used for generating random weights. We can easily add these refinements to the `Affine` layer as follows:
```julia
function Affine((in, out)::Pair; bias=true, init=Flux.randn32)
  W = init(out, in)
  b = Flux.create_bias(W, bias, out)
  Affine(W, b)
end

Affine(3 => 1, bias=false, init=ones) |> gpu
```
`docs/src/models/functors.md`
Flux models are deeply nested structures, and [Functors.jl](https://github.com/FluxML/Functors.jl) provides tools needed to explore such objects, apply functions to the parameters they contain, and re-build them.
New layers should be annotated using the `Functors.@functor` macro. This will enable [`params`](@ref Flux.params) to see the parameters inside, and [`gpu`](@ref) to move them to the GPU.
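A minimal sketch of what this annotation provides, using an illustrative struct that is not part of Flux:

```julia
using Flux  # Functors.jl comes along as a dependency

struct MyLayer  # illustrative layer with a weight matrix and a bias vector
  W
  b
end
Flux.@functor MyLayer

m = MyLayer(rand(2, 2), zeros(2))
Flux.fmap(x -> 2 .* x, m)  # walks the struct, transforms W and b, and rebuilds a MyLayer
```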
`Functors.jl` has its own [notes on basic usage](https://fluxml.ai/Functors.jl/stable/#Basic-Usage-and-Implementation) for more details. Additionally, the [Advanced Model Building and Customisation](../models/advanced.md) page covers the use cases of `Functors` in greater detail.