
Commit b68bac9

master: How attention works internally.
1 parent bd61283

File tree

1 file changed (+2, -2 lines)
  • Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT


Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/Readme.md

Lines changed: 2 additions & 2 deletions
@@ -70,15 +70,15 @@ The value score (V) is assigned based on the closeness of the match.<br>
 Attention = Softmax(QK^T)V
 ```
 <br><br>
-<img src="./images/7. attention visual - 1.png" width="40%"></img> <img src="./images/8. NMT with attention.png" width="40%"></img> <br><br>
+<img src="./images/7. attention visual - 1.png" width="40%"></img> <img src="./images/8. NMT with attention.png" width="60%"></img> <br><br>

 ### Flexible Attention

 In a situation (as shown below) where the grammar of the foreign language requires a different word order than the source language, attention is flexible enough to find the connection. <br><br>

 <img src="./images/9. flexible attention.png" width="50%"></img><br><br>

-The first four tokens, "the agreements on the", are pretty straightforward, but then the grammatical structure between French and English changes.
+The first four tokens, "the agreements on the", are pretty straightforward, but then the grammatical structure between French and English changes. Now, instead of looking at the corresponding fifth token to translate the French word "zone", the attention knows to look further along at the eighth token, which corresponds to the English word "area".

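Below is a minimal sketch of the `Attention = Softmax(QK^T)V` formula from the diff above, written in PyTorch to match the repository. The function name, tensor shapes, and the toy 5-target/8-source example are illustrative assumptions, not code taken from the NMT notebook.

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Alignment scores: how well each query (target step) matches each key (source step).
    scores = Q @ K.transpose(-2, -1)        # (tgt_len, src_len)
    # Softmax over source positions turns the scores into weights that sum to 1 per row.
    weights = F.softmax(scores, dim=-1)     # (tgt_len, src_len)
    # Each output row is a weighted sum of the values: one context vector per target step.
    return weights @ V, weights             # (tgt_len, d), (tgt_len, src_len)

# Toy example (hypothetical sizes): 5 target tokens attending over 8 source tokens.
d = 16
Q = torch.randn(5, d)   # queries from the decoder
K = torch.randn(8, d)   # keys from the encoder
V = torch.randn(8, d)   # values from the encoder
context, weights = attention(Q, K, V)
print(context.shape)            # torch.Size([5, 16])
# Row i of `weights` shows which source position target token i attends to most.
# Nothing ties it to the diagonal: the fifth target token can put its largest weight
# on the eighth source token, which is the flexible-attention behaviour described above.
print(weights.argmax(dim=-1))
```

The `weights` matrix is what the flexible-attention figure visualises: when the two languages order words differently, the strongest weights move off the diagonal instead of following token position one-to-one.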