Commit 309273e

Merge pull request #219 from natashawatkins/patch-3
Correct typos in back_prop.md
2 parents 2ca3637 + 6d63f03 commit 309273e

File tree

1 file changed: +10 -11 lines changed


lectures/back_prop.md

Lines changed: 10 additions & 11 deletions
@@ -37,7 +37,7 @@ We'll describe the following concepts that are brick and mortar for neural netwo
 * an activation function
 * a network of neurons
 * A neural network as a composition of functions
-* back-propogation and its relationship to the chain rule of differential calculus
+* back-propagation and its relationship to the chain rule of differential calculus
 
 
 ## A Deep (but not Wide) Artificial Neural Network
@@ -172,22 +172,22 @@ $$ (eq:sgd)
 
 where $\frac{d {\mathcal L}}{dx_{N+1}}=-\left(x_{N+1}-y\right)$ and $\alpha > 0 $ is a step size.
 
-(See [this](https://en.wikipedia.org/wiki/Gradient_descent#Description) and [this](https://en.wikipedia.org/wiki/Newton%27s_method)) to gather insights about how stochastic gradient descent
+(See [this](https://en.wikipedia.org/wiki/Gradient_descent#Description) and [this](https://en.wikipedia.org/wiki/Newton%27s_method) to gather insights about how stochastic gradient descent
 relates to Newton's method.)
 
 To implement one step of this parameter update rule, we want the vector of derivatives $\frac{dx_{N+1}}{dp_k}$.
 
-In the neural network literature, this step is accomplished by what is known as **back propogation**
+In the neural network literature, this step is accomplished by what is known as **back propagation**.
 
-## Back Propogation and the Chain Rule
+## Back Propagation and the Chain Rule
 
 Thanks to properties of
 
 * the chain and product rules for differentiation from differential calculus, and
 
 * lower triangular matrices
 
-back propogation can actually be accomplished in one step by
+back propagation can actually be accomplished in one step by
 
 * inverting a lower triangular matrix, and
 
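As a note on the hunk above: the sketch below takes one gradient-descent step on a squared-error loss with `jax.grad`, whose reverse-mode differentiation is the chain-rule machinery that back propagation implements. The toy one-layer model, the starting parameters, the training pair, and the step size are all assumptions made for illustration; this is not the lecture's code.

```python
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Squared-error loss 0.5 * (y - prediction)^2 for a toy one-layer model
    w, b = params
    pred = jnp.tanh(w * x + b)
    return 0.5 * (y - pred) ** 2

# Reverse-mode autodiff (back propagation) gives dL/dp for every parameter at once
grad_loss = jax.grad(loss)

params = (jnp.array(0.3), jnp.array(0.0))   # hypothetical starting parameters
alpha = 0.1                                  # step size

# One stochastic-gradient-descent step on a single (x, y) training pair
g = grad_loss(params, 1.0, 0.5)
params = tuple(p - alpha * g_p for p, g_p in zip(params, g))
print(params)
```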
@@ -284,7 +284,7 @@ We can then solve the above problem by applying our update for $p$ multiple time
 
 Choosing a training set amounts to a choice of measure $\mu$ in the above formulation of our function approximation problem as a minimization problem.
 
-In this spirit, we shall use a uniform grid of, say, 50 or 200 or $\ldots$ points.
+In this spirit, we shall use a uniform grid of, say, 50 or 200 points.
 
 There are many possible approaches to the minimization problem posed above:
 
@@ -294,7 +294,7 @@ There are many possible approaches to the minimization problem posed above:
 
 * something in-between (so-called "mini-batch gradient descent")
 
-The update rule {eq}`eq:sgd` described above amounts to a stochastic gradient descent algorithm
+The update rule {eq}`eq:sgd` described above amounts to a stochastic gradient descent algorithm.
 
 ```{code-cell} ipython3
 from IPython.display import Image
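To connect the "uniform grid of, say, 50 or 200 points" with the batch, stochastic, and mini-batch alternatives listed in the two hunks above, here is a small sketch; the interval, grid size, batch size, and linear target used here are assumptions for illustration only.

```python
import numpy as np

# A uniform grid of training points ("50 or 200 points"); the interval is assumed
x_train = np.linspace(0.5, 3.0, 200)
y_train = -3.0 + 2.0 * x_train          # an assumed linear target function

rng = np.random.default_rng(0)

def minibatches(x, y, batch_size):
    # Shuffle once, then yield successive mini-batches.
    # batch_size=1 is stochastic gradient descent; batch_size=len(x) is batch gradient descent.
    idx = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        take = idx[start:start + batch_size]
        yield x[take], y[take]

for xb, yb in minibatches(x_train, y_train, batch_size=10):
    pass  # a gradient step on (xb, yb) would go here
```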
@@ -356,7 +356,6 @@ def loss(params, x, y):
     preds = xs[-1]
 
     return 1 / 2 * (y - preds) ** 2
-
 ```
 
 ```{code-cell} ipython3
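Only a fragment of the lecture's `loss` function is visible in the hunk above. The sketch below shows one plausible self-contained shape for it; the `forward` helper, the `tanh` activation, and the parameter layout are assumptions introduced for illustration and need not match the lecture's actual definitions.

```python
import jax.numpy as jnp

def forward(params, x):
    # Hypothetical forward pass collecting every layer's output x_1, ..., x_{N+1}
    xs = [x]
    for (w, b) in params:
        xs.append(jnp.tanh(w * xs[-1] + b))
    return xs

def loss(params, x, y):
    # Mirrors the visible fragment: the prediction is the last layer's output
    xs = forward(params, x)
    preds = xs[-1]
    return 1 / 2 * (y - preds) ** 2

print(loss([(1.0, 0.0), (0.5, 0.1)], 0.7, 0.3))
```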
@@ -512,8 +511,8 @@ Image(fig.to_image(format="png"))
 It is fun to think about how deepening the neural net for the above example affects the quality of approximation
 
 
-* if the network is too deep, you'll run into the [vanishing gradient problem](http://neuralnetworksanddeeplearning.com/chap5.html)
-* other parameters such as the step size and the number of epochs can be as important or more important than the number of layers in the situation considered in this lecture.
+* If the network is too deep, you'll run into the [vanishing gradient problem](http://neuralnetworksanddeeplearning.com/chap5.html)
+* Other parameters such as the step size and the number of epochs can be as important or more important than the number of layers in the situation considered in this lecture.
 * Indeed, since $f$ is a linear function of $x$, a one-layer network with the identity map as an activation would probably work best.
 
 
@@ -598,4 +597,4 @@ print(xla_bridge.get_backend().platform)
 **Cloud Environment:** This lecture site is built in a server environment that doesn't have access to a `gpu`
 If you run this lecture locally this lets you know where your code is being executed, either
 via the `cpu` or the `gpu`
-```
+```
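As a footnote to the last hunk's backend check, the snippet below is another way to see where JAX will execute code; `jax.default_backend()` and `jax.devices()` are standard JAX calls, but the snippet itself is an illustration rather than the lecture's cell.

```python
import jax

# Reports "cpu", "gpu", or "tpu" depending on the environment the code runs in
print(jax.default_backend())
print(jax.devices())
```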
