purvasingh96
diff --git a/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/NMT SetUp.md b/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/NMT SetUp.md
Lines changed: 20 additions & 0 deletions
diff --git a/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/Readme.md b/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/Readme.md
Lines changed: 2 additions & 1 deletion
diff --git a/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/images/10. data in NMT.png b/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/images/10. data in NMT.png
181 KB
diff --git a/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/images/11. NMT setup-english.png b/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/images/11. NMT setup-english.png
552 KB
diff --git a/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/images/12. NMT setup - german.png b/Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/images/12. NMT setup - german.png
685 KB
@@ -0,0 +1,20 @@
+# Setup for Machine Translation
+
+## Data in NMT
+
+The figure below shows the English sentence *I'm hungry* on the left and its German equivalent on the right.
+Further down is *I watch the soccer game*, again paired with its German equivalent.
+A real dataset contains a great many of these sentence pairs. Note that the dataset used here is not entirely clean.
+
+<img src="./images/10. data in NMT.png" width="50%"></img><br><br>
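To make the data format concrete, here is a minimal Python sketch that stores a parallel corpus as a list of `(english, german)` pairs. The German sentences are our own illustrative translations, not the ones in the figure, and `clean_pairs` is a hypothetical helper: the text does not say in what way the dataset is unclean, so the sketch only assumes simple problems such as stray whitespace, empty sides, and duplicate pairs.

```python
# A parallel corpus: a list of (english, german) sentence pairs.
# The German sides are illustrative translations, not taken from the figure.
raw_pairs = [
    ("I'm hungry", "Ich habe Hunger"),
    ("I watch the soccer game", "Ich schaue mir das Fußballspiel an"),
    ("  I'm hungry ", "Ich habe  Hunger"),  # same pair with stray whitespace
    ("", "Hallo"),                          # missing English side
]

def clean_pairs(pairs):
    """Hypothetical cleaning step: normalize whitespace, drop empty and duplicate pairs."""
    seen = set()
    cleaned = []
    for english, german in pairs:
        # Collapse runs of whitespace and trim both ends.
        english = " ".join(english.split())
        german = " ".join(german.split())
        # Skip pairs with an empty side or pairs we have already kept.
        if not english or not german or (english, german) in seen:
            continue
        seen.add((english, german))
        cleaned.append((english, german))
    return cleaned

pairs = clean_pairs(raw_pairs)
print(len(pairs))  # 2
```

Real cleaning for NMT corpora is usually more involved (for example, filtering pairs with very different lengths), but the output keeps the same shape: a list of aligned sentence pairs.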
+
+## Pre-requisites
+
+1. *Input*: Take an English sentence as input.
+2. *Tokenization*: Split the sentence into tokens and map each token to a vector. State-of-the-art models use pre-trained word vectors; otherwise, represent each word as a one-hot vector.
+3. *Padding*: Pad the tokenized sequences so that all inputs have the same length.<br><br>
+<img src="./images/11. NMT setup-english.png" width="50%"></img><br><br>
+4. Repeat steps 1-3 for the German sentences as well.<br><br>
+<img src="./images/12. NMT setup - german.png" width="50%"></img><br><br>
+5. Keep track of the vocabulary with `word2index` and `index2word` mappings.
+6. Mark the start and end of each sentence with start-of-sentence `<SOS>` and end-of-sentence `<EOS>` tokens.
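The steps above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: a naive lowercase whitespace tokenizer, index 0 reserved for a hypothetical `<PAD>` token, a hypothetical `<UNK>` token for unseen words, and `<SOS>`/`<EOS>` added to every sentence before padding. The helper names (`tokenize`, `build_vocab`, `encode`, `pad_batch`, `one_hot`) are illustrative and not taken from the original code.

```python
# Hypothetical special tokens; index 0 is reserved for padding.
PAD, SOS, EOS, UNK = "<PAD>", "<SOS>", "<EOS>", "<UNK>"

def tokenize(sentence):
    # Naive lowercase whitespace tokenizer (an assumption; real pipelines
    # often use a proper word or subword tokenizer).
    return sentence.lower().split()

def build_vocab(sentences):
    """Build word2index and index2word mappings from a list of sentences."""
    word2index = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for sentence in sentences:
        for word in tokenize(sentence):
            if word not in word2index:
                word2index[word] = len(word2index)
    index2word = {i: w for w, i in word2index.items()}
    return word2index, index2word

def encode(sentence, word2index):
    # Map tokens to ids (unknown words -> <UNK>) and wrap with <SOS>/<EOS>.
    ids = [word2index.get(w, word2index[UNK]) for w in tokenize(sentence)]
    return [word2index[SOS]] + ids + [word2index[EOS]]

def pad_batch(sequences, pad_id=0):
    # Right-pad every sequence to the length of the longest one.
    max_len = max(len(s) for s in sequences)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]

def one_hot(index, vocab_size):
    # One-hot vector: the fallback when no pre-trained word vectors are used.
    vec = [0] * vocab_size
    vec[index] = 1
    return vec

english = ["I'm hungry", "I watch the soccer game"]
en_w2i, en_i2w = build_vocab(english)
batch = pad_batch([encode(s, en_w2i) for s in english])
print(batch)                              # [[1, 4, 5, 2, 0, 0, 0], [1, 6, 7, 8, 9, 10, 2]]
print([en_i2w[i] for i in batch[0]])      # decode the first padded sequence back to words
print(one_hot(en_w2i["hungry"], len(en_w2i)))
```

For the German side (step 4), calling `build_vocab` and `encode` on the German sentences gives a separate pair of mappings. In PyTorch, `torch.nn.utils.rnn.pad_sequence` and `torch.nn.functional.one_hot` can handle the padding and one-hot steps on tensors, and an `torch.nn.Embedding` layer is the usual replacement for one-hot vectors.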
@@ -81,8 +81,9 @@ In a situation (as shown below) where the grammar of foreign language requires a
 The first four tokens, *the agreements on the*, are straightforward, but then the grammatical structure of French and English diverges. Instead of looking at the corresponding fifth token to translate the French word *zone*, the attention mechanism knows to look further down at the eighth token, which corresponds to the English word *area*.  
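The alignment idea can be illustrated with a toy sketch of scaled dot-product attention weights for a single decoder step. The vectors are made-up numbers chosen so that the query for the fifth target token matches the eighth source key best; real models learn these vectors, and the scoring function used by this chapter's model may differ from the scaled dot product assumed here.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Scaled dot-product score between one decoder query and each encoder key,
    # normalized into a probability distribution over source positions.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Made-up 2-d encoder states (keys) for 8 source tokens.
keys = [
    [1.0, 0.0], [0.8, 0.2], [0.6, 0.1], [0.9, 0.3],
    [0.2, 0.4], [0.5, 0.5], [0.3, 0.2], [0.0, 1.0],
]
# Made-up decoder query for the 5th target token (e.g. "zone").
query = [0.0, 2.0]

weights = attention_weights(query, keys)
best = weights.index(max(weights))
print(f"attends most to source token {best + 1}")  # → attends most to source token 8
```

Source position 5 receives only a small share of the weight here, which is the point of the paragraph: attention does not force target and source tokens to line up position by position.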
 
 
+## Next Up
 
-
+Next, we will learn about the setup required to build an NMT model and the kind of dataset used to train it. The README for this is [NMT SetUp.md](./NMT%20SetUp.md).