Merge pull request #36 from huyouare/patch-1

karpathy · karpathy · commit eadb4919e4e2 · 2015-05-21T20:21:01.000-07:00
Fix Typo in optimization-2.md and Fix Link in neural-networks-1.md
diff --git a/neural-networks-1.md b/neural-networks-1.md
@@ -204,5 +204,5 @@ In summary,
 ## Additional References
 
 - [deeplearning.net tutorial](http://www.deeplearning.net/tutorial/mlp.html) with Theano
-- [ConvNetJS]() demos for intuitions
+- [ConvNetJS](http://cs.stanford.edu/people/karpathy/convnetjs/) demos for intuitions
 - [Michael Nielsen's](http://neuralnetworksanddeeplearning.com/chap1.html) tutorials
diff --git a/optimization-2.md b/optimization-2.md
@@ -77,7 +77,7 @@ f = q * z # f becomes -12
 
 # perform the backward pass (backpropagation) in reverse order:
 # first backprop through f = q * z
-dfdz = q # df/fz = q, so gradient on z becomes 3
+dfdz = q # df/dz = q, so gradient on z becomes 3
 dfdq = z # df/dq = z, so gradient on q becomes -4
 # now backprop through q = x + y
 dfdx = 1.0 * dfdq # dq/dx = 1. And the multiplication here is the chain rule!
@@ -104,7 +104,7 @@ Notice that backpropagation is a beautifully local process. Every gate in a circ
 
 > This extra multiplication (for each input) due to the chain rule can turn a single and relatively useless gate into a cog in a complex circuit such as an entire neural network.
 
-Lets get an intuition for how this works by referring again to the example. The add gate received inputs [-2, 5] and computed output 3. Since the gate is computing the addition operation, its local gradient for both of its inputs is +1. The rest of the circuit computed the final value, which is 12. During the backward pass in which the chain rule is applied recursively backwards through the circuit, the add gate (which is an input to the multiply gate) learns that the gradient for its output was -4. If we anthropomorphize the circuit as wanting to output a higher value (which can help with intuition), then we can think of the circuit as "wanting" the output of the add gate to be lower (due to negative sign), and with a *force* of 4. To continue the recurrence and to chain the gradient, the add gate takes that gradient and multiplies it to all of the local gradients for its inputs (making the gradient on both **x** and **y** 1 * -4 = -4). Notice that this has the desired effect: If **x,y** were to decrease (responding to their negative gradient) then the add gate's output would decrease, which in turn makes the multiply gate's output increase.
+Lets get an intuition for how this works by referring again to the example. The add gate received inputs [-2, 5] and computed output 3. Since the gate is computing the addition operation, its local gradient for both of its inputs is +1. The rest of the circuit computed the final value, which is -12. During the backward pass in which the chain rule is applied recursively backwards through the circuit, the add gate (which is an input to the multiply gate) learns that the gradient for its output was -4. If we anthropomorphize the circuit as wanting to output a higher value (which can help with intuition), then we can think of the circuit as "wanting" the output of the add gate to be lower (due to negative sign), and with a *force* of 4. To continue the recurrence and to chain the gradient, the add gate takes that gradient and multiplies it to all of the local gradients for its inputs (making the gradient on both **x** and **y** 1 * -4 = -4). Notice that this has the desired effect: If **x,y** were to decrease (responding to their negative gradient) then the add gate's output would decrease, which in turn makes the multiply gate's output increase.
 
 Backpropagation can thus be thought of as gates communicating to each other (through the gradient signal) whether they want their outputs to increase or decrease (and how strongly), so as to make the final output value higher.
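
Review note: the corrected value (-12) and the gradients quoted in the paragraph above can be checked with a short script. This is a minimal sketch, not part of the patch. It assumes the circuit is f = (x + y) * z with x = -2, y = 5, z = -4, which is what the values in the diff imply (add gate output 3, final output -12, gradient -4 on the add gate's output). The helper `circuit`, the step sizes `h` and `step`, and the `num_*` names are hypothetical and introduced only for this check.

```python
# Inputs implied by the example in optimization-2.md (assumed, see note above)
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y          # add gate output: 3
f = q * z          # multiply gate output: -12

# backward pass, in reverse order
dfdz = q           # df/dz = q
dfdq = z           # df/dq = z
dfdx = 1.0 * dfdq  # dq/dx = 1, chained with df/dq
dfdy = 1.0 * dfdq  # dq/dy = 1, chained with df/dq

# hypothetical helper: the whole circuit as one function, for a numerical check
def circuit(x, y, z):
    return (x + y) * z

# centered finite differences should agree with the analytic gradients
h = 1e-5
num_dfdx = (circuit(x + h, y, z) - circuit(x - h, y, z)) / (2 * h)
num_dfdy = (circuit(x, y + h, z) - circuit(x, y - h, z)) / (2 * h)
num_dfdz = (circuit(x, y, z + h) - circuit(x, y, z - h)) / (2 * h)

# decreasing x and y slightly (following their negative gradient) raises f
step = 0.01
f_nudged = circuit(x - step, y - step, z)

print(f, dfdx, dfdy, dfdz)
print(num_dfdx, num_dfdy, num_dfdz)
print(f_nudged > f)
```

The forward pass gives f = -12, which agrees with the fix in this hunk, and the nudged output is larger than f, which matches the paragraph's claim that lowering **x,y** makes the multiply gate's output increase.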