Print channel dimensions of Dense like those of Conv (#1658)
* print channel dims of Dense like Conv, and accept as input
* do the same for Bilinear
* fix tests
* fix tests
* docstring
* change a few more
* update
* docs
* rm circular ref
* fixup
* news + fixes
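In practice, the change looks like this; a minimal sketch of the two spellings (variable names are illustrative, and the positional form is shown only for comparison with the pre-PR style):

```julia
using Flux

# Both spellings construct the same layer; the Pair form is what this PR
# adds, so that Dense prints its dimensions the way Conv already does.
d_old = Dense(10, 5, σ)    # positional style
d_new = Dense(10 => 5, σ)  # Pair style, printed as Dense(10 => 5, σ)
```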
docs/src/gpu.md: 4 additions & 4 deletions
@@ -39,12 +39,12 @@ Note that we convert both the parameters (`W`, `b`) and the data set (`x`, `y`)
 If you define a structured model, like a `Dense` layer or `Chain`, you just need to convert the internal parameters. Flux provides `fmap`, which allows you to alter all parameters of a model at once.
 
 ```julia
-d = Dense(10, 5, σ)
+d = Dense(10 => 5, σ)
 d = fmap(cu, d)
 d.weight # CuArray
 d(cu(rand(10))) # CuArray output
 
-m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
+m = Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)
 m = fmap(cu, m)
 d(cu(rand(10)))
 ```
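The next hunk's context line mentions Flux's `gpu` convenience function; for orientation, a sketch of the equivalent one-liner (it falls back to the CPU when no functional GPU is available):

```julia
using Flux

# `gpu` recursively moves parameters to the GPU, much like `fmap(cu, m)`
# above, and returns the model unchanged when CUDA is not functional.
m = Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax) |> gpu
```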
@@ -54,8 +54,8 @@ As a convenience, Flux provides the `gpu` function to convert models and data to

docs/src/models/basics.md

 Congratulations! You just built the `Dense` layer that comes with Flux. Flux has many interesting layers available, but they're all things you could have built yourself very easily.
 
-(There is one small difference with `Dense` – for convenience it also takes an activation function, like `Dense(10, 5, σ)`.)
+(There is one small difference with `Dense` – for convenience it also takes an activation function, like `Dense(10 => 5, σ)`.)
 
 ## Stacking It Up
 
 It's pretty common to write models that look something like:
 
 ```julia
-layer1 = Dense(10, 5, σ)
+layer1 = Dense(10 => 5, σ)
 # ...
 model(x) = layer3(layer2(layer1(x)))
 ```
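Filled out, the `# ...` above might continue as follows; `layer2` and `layer3` here are hypothetical stand-ins, not part of the diff:

```julia
using Flux

layer1 = Dense(10 => 5, σ)
layer2 = Dense(5 => 2)    # hypothetical second layer
layer3 = softmax          # hypothetical final step
model(x) = layer3(layer2(layer1(x)))

model(rand(Float32, 10))  # 2-element output
```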
@@ -175,7 +175,7 @@ For long chains, it might be a bit more intuitive to have a list of layers, like
 ```julia
 using Flux
 
-layers = [Dense(10, 5, σ), Dense(5, 2), softmax]
+layers = [Dense(10 => 5, σ), Dense(5 => 2), softmax]
 
 model(x) = foldl((x, m) -> m(x), layers, init = x)
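For readers unfamiliar with `foldl`: it threads `x` through the layers left to right, so the definition above performs exactly the same computation as nesting the calls by hand. A quick sketch of the equivalence:

```julia
using Flux

layers = [Dense(10 => 5, σ), Dense(5 => 2), softmax]
model(x) = foldl((x, m) -> m(x), layers, init = x)

x = rand(Float32, 10)
model(x) == layers[3](layers[2](layers[1](x)))  # true: identical call sequence
```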
@@ -186,8 +186,8 @@ Handily, this is also provided for in Flux:
 
 ```julia
 model2 = Chain(
-  Dense(10, 5, σ),
-  Dense(5, 2),
+  Dense(10 => 5, σ),
+  Dense(5 => 2),
   softmax)
 
 model2(rand(10)) # => 2-element vector
@@ -198,7 +198,7 @@ This quickly starts to look like a high-level deep learning library; yet you can
 A nice property of this approach is that because "models" are just functions (possibly with trainable parameters), you can also see this as simple function composition.
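To make the composition point concrete, a sketch using Julia's built-in `∘` operator (an alternative spelling of the same model, not something this diff adds):

```julia
using Flux

# The rightmost function applies first, mirroring Chain's
# left-to-right order written in reverse.
model = softmax ∘ Dense(5 => 2) ∘ Dense(10 => 5, σ)
model(rand(Float32, 10))
```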
docs/src/models/overview.md: 5 additions & 5 deletions
@@ -43,8 +43,8 @@ Normally, your training and test data come from real world observations, but thi
 Now, build a model to make predictions with `1` input and `1` output:
 
 ```julia
-julia> model = Dense(1, 1)
-Dense(1, 1)
+julia> model = Dense(1 => 1)
+Dense(1 => 1)
 
 julia> model.weight
 1×1 Matrix{Float32}:
@@ -58,10 +58,10 @@ julia> model.bias
 Under the hood, a dense layer is a struct with fields `weight` and `bias`. `weight` represents a weights' matrix and `bias` represents a bias vector. There's another way to think about a model. In Flux, *models are conceptually predictive functions*:
 
 ```julia
-julia> predict = Dense(1, 1)
+julia> predict = Dense(1 => 1)
 ```
 
-`Dense(1, 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.
+`Dense(1 => 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.
 
 This model will already make predictions, though not accurate ones yet:
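To see the `σ(Wx+b)` formula in action, a sketch comparing the layer's output with the formula computed by hand (`Dense(1 => 1)` defaults to the `identity` activation):

```julia
using Flux

predict = Dense(1 => 1)   # activation defaults to identity
x = rand(Float32, 1)

# Dense computes activation.(weight * x .+ bias)
predict(x) ≈ predict.weight * x .+ predict.bias  # true
```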
@@ -185,7 +185,7 @@ The predictions are good. Here's how we got there.
 
 First, we gathered real-world data into the variables `x_train`, `y_train`, `x_test`, and `y_test`. The `x_*` data defines inputs, and the `y_*` data defines outputs. The `*_train` data is for training the model, and the `*_test` data is for verifying the model. Our data was based on the function `4x + 2`.
 
-Then, we built a single input, single output predictive model, `predict = Dense(1, 1)`. The initial predictions weren't accurate, because we had not trained the model yet.
+Then, we built a single input, single output predictive model, `predict = Dense(1 => 1)`. The initial predictions weren't accurate, because we had not trained the model yet.
 
 After building the model, we trained it with `train!(loss, parameters, data, opt)`. The loss function is first, followed by the `parameters` holding the weights and biases of the model, the training data, and the `Descent` optimizer provided by Flux. We ran the training step once, and observed that the parameters changed and the loss went down. Then, we ran the `train!` many times to finish the training process.
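Putting that paragraph into code, a sketch of the full sequence under the tutorial's assumptions (the `4x + 2` data and names like `x_train` follow the overview; the epoch count is illustrative):

```julia
using Flux

x_train = hcat(0:5...)        # 1×6 input matrix, as in the tutorial
y_train = 4 .* x_train .+ 2   # targets from the function 4x + 2

predict = Dense(1 => 1)
loss(x, y) = Flux.Losses.mse(predict(x), y)
parameters = Flux.params(predict)
data = [(x_train, y_train)]
opt = Descent()

train!(loss, parameters, data, opt)  # one training step
for epoch in 1:200                   # then many more, as the text describes
    train!(loss, parameters, data, opt)
end
```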