
Commit d2515bd

master: dot-product attention.
1 parent 6c63f2a commit d2515bd

File tree

1 file changed: 2 additions, 2 deletions
  • Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models


Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Dot Product Attention.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ Below steps describe in detail as to how a *dot-product attention* works:

1. Let's consider the phrase in English, *"I am happy"*.
First, the word *I* is embedded, to obtain a vector representation that holds continuous values which is unique for every single word.<br><br>
-<img src="../images/7.step - 1.png" width="50%"></img><br>
+<img src="../images/7. step - 1.png" width="50%"></img><br>

2. By feeding three distinct linear layers, you get three different vectors for queries, keys and values.<br><br>
<img src="../images/8. step - 2.png" width="50%"></img><br>

@@ -17,7 +17,7 @@ First, the word *I* is embedded, to obtain a vector representation that holds co
4. Finally the word *happy* to get a third vector and form the queries, keys and values matrix.<br><br>
<img src="../images/10. step - 4.png" width="50%"></img><br>

-5. From both the Q matrix and the K matrix, the attention model calculates weights or scores representing the relative importance of the keys for a specific query.
+5. From both the Q matrix and the K matrix, the attention model calculates weights or scores representing the relative importance of the keys for a specific query.<br><br>
<img src="../images/11. step - 5.png" width="50%"></img><br>

6. These attention weights can be understood as alignment scores as they come from a dot product. <br><br>
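
For context, here is a minimal, self-contained PyTorch sketch of the dot-product attention steps the edited page walks through. The dimensions, layer names and random seed are illustrative assumptions and are not taken from this commit or the repository's notebooks.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes for the three-word phrase "I am happy".
seq_len, d_model, d_k = 3, 8, 8

torch.manual_seed(0)
embeddings = torch.randn(seq_len, d_model)   # step 1: one embedding per word

# Steps 2-4: three distinct linear layers produce the queries, keys and values matrices.
w_q = nn.Linear(d_model, d_k)
w_k = nn.Linear(d_model, d_k)
w_v = nn.Linear(d_model, d_k)
Q, K, V = w_q(embeddings), w_k(embeddings), w_v(embeddings)

# Step 5: dot products of queries with keys give alignment scores;
# scaling by sqrt(d_k) and a softmax turn them into attention weights.
scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
weights = F.softmax(scores, dim=-1)

# Step 6: the weights combine the value vectors into the attention output.
output = weights @ V
print(weights.shape, output.shape)           # torch.Size([3, 3]) torch.Size([3, 8])
```

The division by sqrt(d_k) is the standard scaled variant of dot-product attention; it keeps the scores from growing with the key dimension and saturating the softmax.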
