Merge pull request #42 from danielforsyth/typo-1

omoindrot · web-flow · commit 7d74a3d06162 · 2017-05-23T12:04:31.000-07:00
Fixed Typo
diff --git a/neural-networks-3.md b/neural-networks-3.md
@@ -345,7 +345,7 @@ As we've seen, training Neural Networks can involve many hyperparameter settings
 - learning rate decay schedule (such as the decay constant)
 - regularization strength (L2 penalty, dropout strength)
 
-But as saw, there are many more relatively less sensitive hyperparameters, for example in per-parameter adaptive learning methods, the setting of momentum and its schedule, etc. In this section we describe some additional tips and tricks for performing the hyperparameter search:
+But as we saw, there are many more relatively less sensitive hyperparameters, for example in per-parameter adaptive learning methods, the setting of momentum and its schedule, etc. In this section we describe some additional tips and tricks for performing the hyperparameter search:
 
 **Implementation**. Larger Neural Networks typically require a long time to train, so performing hyperparameter search can take many days/weeks. It is important to keep this in mind since it influences the design of your code base. One particular design is to have a **worker** that continuously samples random hyperparameters and performs the optimization. During the training, the worker will keep track of the validation performance after every epoch, and writes a model checkpoint (together with miscellaneous training statistics such as the loss over time) to a file, preferably on a shared file system. It is useful to include the validation performance directly in the filename, so that it is simple to inspect and sort the progress. Then there is a second program which we will call a **master**, which launches or kills workers across a computing cluster, and may additionally inspect the checkpoints written by workers and plot their training statistics, etc.
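The worker/master design described in the context paragraph above can be sketched as follows. This is a minimal, hypothetical sketch, not the course's code. Every name in it is an illustrative assumption: `sample_hyperparameters`, `checkpoint_name`, `run_worker`, `rank_checkpoints`, `toy_train` and `toy_eval`. The sampling ranges are assumptions too. A JSON file of training statistics stands in for a real model checkpoint, and a local directory stands in for the shared file system. The worker runs a bounded number of trials so that it terminates, whereas the worker in the text samples continuously.

```python
import json
import math
import os
import random
import tempfile
from typing import Dict, List


def sample_hyperparameters(rng: random.Random) -> Dict[str, float]:
    """Draw one random configuration (ranges are illustrative assumptions)."""
    return {
        # Learning rate and L2 strength are sampled on a log scale.
        "learning_rate": 10 ** rng.uniform(-6, -1),
        "l2_reg": 10 ** rng.uniform(-5, 1),
        # Dropout probability is sampled uniformly.
        "dropout": rng.uniform(0.0, 0.5),
    }


def checkpoint_name(val_acc: float, epoch: int, trial: int) -> str:
    # Validation accuracy leads the filename, so for accuracies in [0, 1]
    # a plain lexicographic sort of the directory orders checkpoints by quality.
    return f"val{val_acc:.4f}_trial{trial:03d}_epoch{epoch:03d}.json"


def run_worker(train_one_epoch, evaluate, out_dir: str,
               num_trials: int, num_epochs: int, seed: int = 0) -> None:
    """Worker: sample hyperparameters, train, checkpoint after every epoch.

    Bounded by num_trials for demonstration; a real worker would loop forever.
    """
    rng = random.Random(seed)
    for trial in range(num_trials):
        hparams = sample_hyperparameters(rng)
        state = {"hparams": hparams, "loss_history": []}
        for epoch in range(num_epochs):
            loss = train_one_epoch(state, hparams)
            state["loss_history"].append(loss)
            val_acc = evaluate(state, hparams)
            # Write the checkpoint plus training statistics (here just the loss history).
            path = os.path.join(out_dir, checkpoint_name(val_acc, epoch, trial))
            with open(path, "w") as f:
                json.dump({"epoch": epoch, "val_acc": val_acc, **state}, f)


def rank_checkpoints(out_dir: str) -> List[str]:
    """Master-side helper: checkpoint filenames sorted by validation accuracy, best first."""
    names = [n for n in os.listdir(out_dir)
             if n.startswith("val") and n.endswith(".json")]
    # "val0.8123_trial000_epoch001.json" -> 0.8123
    return sorted(names, key=lambda n: float(n[3:].split("_")[0]), reverse=True)


def toy_train(state, hp) -> float:
    # Hypothetical stand-in for one epoch of training; returns a fake, decreasing loss.
    return 1.0 / (len(state["loss_history"]) + 1)


def toy_eval(state, hp) -> float:
    # Hypothetical validation score that prefers learning rates near 1e-3.
    return max(0.0, 1.0 - abs(math.log10(hp["learning_rate"]) + 3) / 5)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        run_worker(toy_train, toy_eval, d, num_trials=3, num_epochs=2)
        for name in rank_checkpoints(d):
            print(name)
```

Putting the validation score at the front of the filename means the master needs no bookkeeping of its own. It can rank progress just by listing the directory.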