
Commit eadb491

Merge pull request #36 from huyouare/patch-1
Fix Typo in optimization-2.md and Fix Link in neural-networks-1.md
2 parents: 6c08246 + b2e99ff


2 files changed (+3, -3 lines)


neural-networks-1.md

Lines changed: 1 addition & 1 deletion
@@ -204,5 +204,5 @@ In summary,
 ## Additional References
 
 - [deeplearning.net tutorial](http://www.deeplearning.net/tutorial/mlp.html) with Theano
-- [ConvNetJS]() demos for intuitions
+- [ConvNetJS](http://cs.stanford.edu/people/karpathy/convnetjs/) demos for intuitions
 - [Michael Nielsen's](http://neuralnetworksanddeeplearning.com/chap1.html) tutorials

optimization-2.md

Lines changed: 2 additions & 2 deletions
@@ -77,7 +77,7 @@ f = q * z # f becomes -12
 
 # perform the backward pass (backpropagation) in reverse order:
 # first backprop through f = q * z
-dfdz = q # df/fz = q, so gradient on z becomes 3
+dfdz = q # df/dz = q, so gradient on z becomes 3
 dfdq = z # df/dq = z, so gradient on q becomes -4
 # now backprop through q = x + y
 dfdx = 1.0 * dfdq # dq/dx = 1. And the multiplication here is the chain rule!
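
For readers skimming just this hunk, here is a minimal runnable sketch of the full forward/backward pass the snippet comes from. The input values x = -2, y = 5, z = -4 are taken from the example discussed in these notes; the final dfdy line is not visible in this hunk and is added here by symmetry with dfdx as an assumption.

```python
# forward pass (input values taken from the example in these notes)
x = -2; y = 5; z = -4
q = x + y   # q becomes 3
f = q * z   # f becomes -12

# backward pass (backpropagation) in reverse order:
# first backprop through f = q * z
dfdz = q    # df/dz = q, so gradient on z becomes 3
dfdq = z    # df/dq = z, so gradient on q becomes -4
# now backprop through q = x + y
dfdx = 1.0 * dfdq   # dq/dx = 1, so gradient on x becomes -4 (chain rule)
dfdy = 1.0 * dfdq   # dq/dy = 1, so gradient on y becomes -4 (assumed by symmetry; not shown in this hunk)
```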
@@ -104,7 +104,7 @@ Notice that backpropagation is a beautifully local process. Every gate in a circ
 
 > This extra multiplication (for each input) due to the chain rule can turn a single and relatively useless gate into a cog in a complex circuit such as an entire neural network.
 
-Lets get an intuition for how this works by referring again to the example. The add gate received inputs [-2, 5] and computed output 3. Since the gate is computing the addition operation, its local gradient for both of its inputs is +1. The rest of the circuit computed the final value, which is 12. During the backward pass in which the chain rule is applied recursively backwards through the circuit, the add gate (which is an input to the multiply gate) learns that the gradient for its output was -4. If we anthropomorphize the circuit as wanting to output a higher value (which can help with intuition), then we can think of the circuit as "wanting" the output of the add gate to be lower (due to negative sign), and with a *force* of 4. To continue the recurrence and to chain the gradient, the add gate takes that gradient and multiplies it to all of the local gradients for its inputs (making the gradient on both **x** and **y** 1 * -4 = -4). Notice that this has the desired effect: If **x,y** were to decrease (responding to their negative gradient) then the add gate's output would decrease, which in turn makes the multiply gate's output increase.
+Lets get an intuition for how this works by referring again to the example. The add gate received inputs [-2, 5] and computed output 3. Since the gate is computing the addition operation, its local gradient for both of its inputs is +1. The rest of the circuit computed the final value, which is -12. During the backward pass in which the chain rule is applied recursively backwards through the circuit, the add gate (which is an input to the multiply gate) learns that the gradient for its output was -4. If we anthropomorphize the circuit as wanting to output a higher value (which can help with intuition), then we can think of the circuit as "wanting" the output of the add gate to be lower (due to negative sign), and with a *force* of 4. To continue the recurrence and to chain the gradient, the add gate takes that gradient and multiplies it to all of the local gradients for its inputs (making the gradient on both **x** and **y** 1 * -4 = -4). Notice that this has the desired effect: If **x,y** were to decrease (responding to their negative gradient) then the add gate's output would decrease, which in turn makes the multiply gate's output increase.
 
 Backpropagation can thus be thought of as gates communicating to each other (through the gradient signal) whether they want their outputs to increase or decrease (and how strongly), so as to make the final output value higher.

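As a sanity check (not part of the commit), the gradients of -4 on x and y described in the corrected paragraph can also be verified numerically with finite differences; the helper circuit function below is a hypothetical stand-in for the example circuit (x + y) * z:

```python
# numerical check of the gradients discussed above (finite differences; not part of the diff)
def circuit(x, y, z):
    return (x + y) * z

h = 1e-5
x, y, z = -2.0, 5.0, -4.0
dfdx_num = (circuit(x + h, y, z) - circuit(x, y, z)) / h   # approximately -4
dfdy_num = (circuit(x, y + h, z) - circuit(x, y, z)) / h   # approximately -4
print(dfdx_num, dfdy_num)
```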