
Commit ca734bb

master: teacher forcing and training NMT.
1 parent cf5baa7 commit ca734bb

File tree

2 files changed: 6 additions, 3 deletions

Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/NMT SetUp.md

Lines changed: 6 additions & 3 deletions
5. Keep track of index mappings with word2index and index2word mappings.
6. Use start-of-sentence `<SOS>` and end-of-sentence `<EOS>` tokens to mark the start and end of each sentence.

### Teacher Forcing
Let us assume we want to train an image captioning model, and the ground truth caption for an image is “Two people reading a book”. Our model makes a mistake on the 2nd word, so its 1st and 2nd predictions are “Two” and “birds”, respectively.
1. *Without Teacher Forcing*, we would feed “birds” back to our RNN to predict the 3rd word. Let’s say the 3rd prediction is “flying”. Even though it makes sense for our model to predict “flying” given the input “birds”, it is different from the ground truth.
<br><img src="./images/13. No teacher forcing.png"></img><br>
2. *With Teacher Forcing*, we would feed “people” to our RNN for the 3rd prediction, after computing and recording the loss for the 2nd prediction.

<br><img src="./images/14. with teacher forcing.png"></img><br>
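The difference can be sketched with a toy decoder loop in plain Python. The `decode_step` lookup below is a hypothetical stand-in for a trained RNN step, not the course's model; it is wired to reproduce the “birds” mistake from the example above:

```python
def decode_step(prev_token):
    # Hypothetical stand-in for one RNN decoding step: maps the token
    # fed in at step t to the model's prediction for step t+1.
    next_token = {"<SOS>": "Two", "Two": "birds", "people": "reading",
                  "reading": "a", "a": "book", "birds": "flying"}
    return next_token.get(prev_token, "<EOS>")

def generate(ground_truth, teacher_forcing):
    token, outputs = "<SOS>", []
    for target in ground_truth:
        pred = decode_step(token)
        outputs.append(pred)
        # With teacher forcing, feed the ground-truth token back in,
        # regardless of what the model just predicted; without it, the
        # (possibly wrong) prediction is fed back instead.
        token = target if teacher_forcing else pred
    return outputs

truth = ["Two", "people", "reading", "a", "book"]
print(generate(truth, teacher_forcing=False))  # drifts after the 2nd-word mistake
print(generate(truth, teacher_forcing=True))   # back on track from the 3rd word on
```

In a real training loop, the loss at each step is computed against the ground-truth token before deciding what to feed back, exactly as described in point 2 above.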
## Training NMT
1. The initial `select` makes two copies of each stream: the input tokens (English words), referenced by index 0, and the target tokens (German words), referenced by index 1.
<img src="./images/15. step - 1.png" width="50%"></img><br><br>
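A minimal sketch of what such a select step does, in plain Python. The token ids and the `[0, 1, 0, 1]` copy pattern are illustrative assumptions, not the course's exact code:

```python
def select(streams, indices):
    # Pick (and possibly duplicate) data streams by index.
    return [streams[i] for i in indices]

english_tokens = [12, 47, 5, 1]  # hypothetical token ids, stream 0
german_tokens = [33, 8, 19, 1]   # hypothetical token ids, stream 1

# [0, 1, 0, 1] yields two copies of each stream: one (input, target)
# pair to run through the model, and one pair kept aside, e.g. for
# computing the loss later.
branches = select([english_tokens, german_tokens], [0, 1, 0, 1])
```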