In the case of long input sequences, when the end-user stacks up multiple layers of words, the words entered at a later stage are given more importance than the words entered first.<br><br>
This is because the encoder hidden state is of a fixed size, and longer inputs become *bottlenecked* on their way to the decoder.
Hence, inputs that contain short sentences will work for NMT, but long sentences may not work well with a basic seq-to-seq model.
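
As a quick illustration of the bottleneck, here is a minimal sketch (assuming a toy GRU encoder with made-up sizes; the names and dimensions are illustrative, not from the course notebooks). The final encoder hidden state has the same fixed shape no matter how long the source sentence is:

```python
import torch
import torch.nn as nn

# Toy encoder: whatever the source length, the final hidden state is always (1, 1, 16)
encoder = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

short_sentence = torch.randn(1, 4, 8)    # batch of 1, 4 source tokens, 8 features each
long_sentence  = torch.randn(1, 40, 8)   # batch of 1, 40 source tokens

_, h_short = encoder(short_sentence)
_, h_long  = encoder(long_sentence)

# Both print torch.Size([1, 1, 16]): the whole sentence is squeezed into one fixed-size vector
print(h_short.shape, h_long.shape)
```
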
## Word Alignment
Word Alignment is the task of finding the correspondence between source and target words in a pair of sentences that are translations of each other.
<imgsrc="./images/3. word alignment.png"><img> <br><br>
When performing word alignment, your model needs to be able to identify relationships among the words in order to make accurate predictions when the words are out of order or are not exact translations.
In a model that has a vector for each input, there needs to be a way to focus more attention in the right places. Many languages don't translate exactly, word for word, into another language. To be able to align the words correctly, you need to add a layer that helps the decoder understand which inputs are more important for each prediction.
<imgsrc="./images/4. alignment and attention.png"><img> <br><br>
### Attention and Alignment
Below is a step-by-step algorithm for computing attention in NMT (a PyTorch sketch of these steps follows the figure below):
1. *Prepare the encoder hidden states and the decoder hidden state.*
2. *Score each encoder hidden state by taking its dot product with the decoder hidden state.*<br>
2.1. *If one of the scores is higher than the others, it means that this hidden state will have more influence on the output than the others.*
3. *Then you run the scores through a softmax, so each score is transformed into a number between 0 and 1; this gives you your attention distribution.*
4. *Take each encoder hidden state and multiply it by its softmax score, which is a number between 0 and 1; this results in the alignment vectors.*
5. *Now add up everything in the alignment vectors to arrive at what's called the context vector, which is then fed to the decoder.*
<imgsrc="./images/5. Calculating alignment for NMT model.png"><img> <br><br>