# Dot Product Attention

The steps below describe in detail how *dot-product attention* works:

*Important: the queries are the German words.*

1. Consider the English phrase *"I am happy"*.
First, the word *I* is embedded to obtain a vector representation of continuous values that is unique to each word.
<img src="../images/7.step - 1.png"></img><br>

2. By feeding the embedding through three distinct linear layers, you get three different vectors: a query, a key and a value (a minimal NumPy sketch follows the figure below).<br><br>
<img src="../images/8. step - 2.png"></img><br>
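
As a rough illustration of steps 1 and 2, here is a minimal NumPy sketch. The vocabulary, the embedding table and the weight matrices `W_q`, `W_k`, `W_v` are made-up placeholders for this example, not values taken from the figures.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_k = 8, 4                          # toy embedding / projection sizes (assumed)
vocab = {"I": 0, "am": 1, "happy": 2}
E = rng.normal(size=(len(vocab), d_model))   # toy embedding table

# Three distinct linear layers (plain weight matrices, biases omitted)
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

x_i = E[vocab["I"]]                                # step 1: embed the word "I"
q_i, k_i, v_i = x_i @ W_q, x_i @ W_k, x_i @ W_v    # step 2: query, key and value vectors
```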

3. Then you can do the same for the word *am* to produce a second set of vectors.<br><br>
<img src="../images/9. step - 3.png"></img><br>

4. Finally, the word *happy* is embedded and projected to get a third set, forming the queries (Q), keys (K) and values (V) matrices.<br><br>
<img src="../images/10. step - 4.png"></img><br>
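
Continuing the same toy sketch (and reusing the placeholder embedding table and weights defined above), steps 3 and 4 repeat the projection for *am* and *happy* and stack the results into one matrix per role, with one row per word:

```python
# Steps 3-4: embed every word of "I am happy" and project each one,
# stacking the results row by row into the Q, K and V matrices.
X = E[[vocab["I"], vocab["am"], vocab["happy"]]]   # shape (3, d_model)

Q = X @ W_q   # queries, shape (3, d_k)
K = X @ W_k   # keys,    shape (3, d_k)
V = X @ W_v   # values,  shape (3, d_k)
```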

5. From the Q and K matrices, the attention model calculates weights, or scores, representing the relative importance of the keys for a specific query.
<img src="../images/11. step - 5.png"></img><br>
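
For step 5, the scores can be sketched as the dot products of every query with every key. The division by the square root of the key dimension is the scaled variant of dot-product attention and is an assumption added here, not something stated above:

```python
# Step 5: entry (i, j) is the dot product of query i with key j.
# The 1/sqrt(d_k) scaling comes from scaled dot-product attention and is optional.
scores = Q @ K.T / np.sqrt(d_k)   # shape (3, 3)
```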

6. These attention weights can be understood as alignment scores, since they come from a dot product.<br><br>
<img src="../images/12. step - 6.png"></img><br>

7. To turn these weights into probabilities, a softmax function is applied.<br><br>
<img src="../images/13. step - 7.png"></img><br>
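
For step 7, continuing the same sketch, a minimal row-wise softmax over the score matrix might look like this:

```python
def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

weights = softmax(scores)   # step 7: each row of weights now sums to 1
```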

8. Finally, multiplying these probabilities by the values gives a weighted sequence, which is the attention result itself; a complete sketch combining all the steps follows the figure.<br><br>
<img src="../images/14. step - 8.png"></img><br>
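
Putting everything together, a self-contained sketch of dot-product attention (again with the optional scaling) could look as follows; the shapes and random inputs are purely illustrative:

```python
import numpy as np

def dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V, one weighted sum of values per query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # steps 5-6: alignment scores
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)           # step 7: probabilities
    return weights @ V                                    # step 8: weighted sequence

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # three toy queries, e.g. one per word of "I am happy"
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = dot_product_attention(Q, K, V)
print(out.shape)              # (3, 4): one attention result per query
```

Each row of `out` is the weighted sum of the value vectors for one query, i.e. the attention result described in step 8.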