lectures/back_prop.md: 10 additions & 11 deletions
@@ -37,7 +37,7 @@ We'll describe the following concepts that are brick and mortar for neural networks:
* an activation function
* a network of neurons
* A neural network as a composition of functions
- * back-propogation and its relationship to the chain rule of differential calculus
+ * back-propagation and its relationship to the chain rule of differential calculus
## A Deep (but not Wide) Artificial Neural Network
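As a note for reviewers, the concepts listed in this hunk can be illustrated with a minimal sketch. Everything below (the scalar layers, the `tanh` activation, and the names `forward`, `weights`, `biases`, together with the parameter values) is an illustrative assumption, not code taken from the lecture:

```python
import numpy as np

def activation(z):
    # an activation function (tanh chosen purely for illustration)
    return np.tanh(z)

def forward(x, weights, biases):
    # a neural network as a composition of functions:
    # x_{k+1} = activation(w_{k+1} * x_k + b_{k+1}), applied layer by layer
    for w, b in zip(weights, biases):
        x = activation(w * x + b)
    return x

# a toy "network of neurons": three scalar layers with made-up parameters
weights = [0.5, -1.2, 0.8]
biases = [0.1, 0.0, -0.3]
print(forward(1.0, weights, biases))
```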
@@ -172,22 +172,22 @@ $$ (eq:sgd)
where $\frac{d {\mathcal L}}{dx_{N+1}}=-\left(x_{N+1}-y\right)$ and $\alpha > 0 $ is a step size.
- (See [this](https://en.wikipedia.org/wiki/Gradient_descent#Description) and [this](https://en.wikipedia.org/wiki/Newton%27s_method)) to gather insights about how stochastic gradient descent
+ (See [this](https://en.wikipedia.org/wiki/Gradient_descent#Description) and [this](https://en.wikipedia.org/wiki/Newton%27s_method) to gather insights about how stochastic gradient descent
relates to Newton's method.)
To implement one step of this parameter update rule, we want the vector of derivatives $\frac{dx_{N+1}}{dp_k}$.
- In the neural network literature, this step is accomplished by what is known as **back propogation**
+ In the neural network literature, this step is accomplished by what is known as **back propagation**.
- ## Back Propogation and the Chain Rule
+ ## Back Propagation and the Chain Rule
Thanks to properties of
* the chain and product rules for differentiation from differential calculus, and
* lower triangular matrices
- back propogation can actually be accomplished in one step by
+ back propagation can actually be accomplished in one step by
* inverting a lower triangular matrix, and
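To make the claim in this hunk concrete, here is a hedged sketch (not the lecture's code; the scalar linear layers, the slopes `a`, and the SciPy call are assumptions) showing how, for a chain of scalar maps, stacking the chain-rule recursions into a unit lower triangular system recovers every intermediate derivative with a single triangular solve:

```python
# Sketch: for a scalar chain x_{k+1} = a_{k+1} x_k, the sensitivities
# e_k = dx_k / dx_1 satisfy e_1 = 1 and e_{k+1} - a_{k+1} e_k = 0,
# i.e. (I - A) e = (1, 0, ..., 0)' with A strictly lower triangular.
import numpy as np
from scipy.linalg import solve_triangular

a = np.array([0.9, 1.1, 0.7, 1.3])          # illustrative layer slopes a_2, ..., a_5
n = len(a) + 1
M = np.eye(n)
M[np.arange(1, n), np.arange(n - 1)] = -a   # subdiagonal entries -a_{k+1}

rhs = np.zeros(n)
rhs[0] = 1.0
e = solve_triangular(M, rhs, lower=True)    # all derivatives from one triangular solve

# cross-check against the plain chain rule: e_k = a_2 * ... * a_k
assert np.allclose(e[1:], np.cumprod(a))
print(e)
```

For the lecture's affine-plus-activation layers the same pattern would apply, with the subdiagonal entries replaced by the layer derivatives evaluated along the forward pass.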
@@ -284,7 +284,7 @@ We can then solve the above problem by applying our update for $p$ multiple times.
Choosing a training set amounts to a choice of measure $\mu$ in the above formulation of our function approximation problem as a minimization problem.
- In this spirit, we shall use a uniform grid of, say, 50 or 200 or $\ldots$ points.
+ In this spirit, we shall use a uniform grid of, say, 50 or 200 points.
There are many possible approaches to the minimization problem posed above:
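A minimal sketch of the training-set choice described just above (the interval endpoints are an assumption for illustration; the lecture fixes its own domain):

```python
import numpy as np

# a uniform grid of 50 training inputs; each point gets equal weight,
# so mu is the discrete uniform measure on the grid
x_train = np.linspace(0.5, 3.0, 50)
mu = np.full(x_train.size, 1.0 / x_train.size)
```

Switching to 200 points only changes the last argument of `np.linspace`.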
@@ -294,7 +294,7 @@ There are many possible approaches to the minimization problem posed above:
It is fun to think about how deepening the neural net for the above example affects the quality of approximation
- * if the network is too deep, you'll run into the [vanishing gradient problem](http://neuralnetworksanddeeplearning.com/chap5.html)
- * other parameters such as the step size and the number of epochs can be as important or more important than the number of layers in the situation considered in this lecture.
+ * If the network is too deep, you'll run into the [vanishing gradient problem](http://neuralnetworksanddeeplearning.com/chap5.html)
+ * Other parameters such as the step size and the number of epochs can be as important or more important than the number of layers in the situation considered in this lecture.
* Indeed, since $f$ is a linear function of $x$, a one-layer network with the identity map as an activation would probably work best.
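A quick, hedged check of that final bullet (the linear target below is illustrative, not necessarily the lecture's $f$): with the identity activation, a one-layer network $x \mapsto wx + b$ is itself affine, so a least-squares fit recovers a linear target essentially exactly.

```python
import numpy as np

x = np.linspace(0.5, 3.0, 50)
y = -3.0 * x + 2.0                        # an illustrative linear target
A = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(w, b)                               # approximately -3.0 and 2.0
```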