
Commit 673e38d

master: dot-product attention.
1 parent 51dd97d commit 673e38d

File tree

10 files changed: +36 -1 lines changed
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Dot Product Attention

The steps below describe in detail how *dot-product attention* works (a code sketch follows the list):

*Important: the queries are German words.*

1. Consider the English phrase *"I am happy"*. First, the word *I* is embedded to obtain a vector representation of continuous values that is unique to every single word.<br><br>
<img src="../images/7.step - 1.png"></img><br>

2. By feeding this embedding through three distinct linear layers, you get three different vectors: a query, a key and a value.<br><br>
<img src="../images/8. step - 2.png"></img><br>

3. Then you can do the same for the word *am* to output a second set of vectors.<br><br>
<img src="../images/9. step - 3.png"></img><br>

4. Finally, do the same for the word *happy* to get a third set and form the query, key and value matrices.<br><br>
<img src="../images/10. step - 4.png"></img><br>

5. From the Q and K matrices, the attention model calculates weights, or scores, representing the relative importance of the keys for a specific query.<br><br>
<img src="../images/11. step - 5.png"></img><br>

6. These attention weights can be understood as alignment scores, since they come from a dot product.<br><br>
<img src="../images/12. step - 6.png"></img><br>

7. To turn these weights into probabilities, a softmax function is applied.<br><br>
<img src="../images/13. step - 7.png"></img><br>

8. Finally, multiplying these probabilities by the values gives a weighted sequence, which is the attention result itself.<br><br>
<img src="../images/14. step - 8.png"></img><br>
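To make these steps concrete, here is a minimal, self-contained PyTorch sketch of (unscaled) dot-product attention, softmax(QKᵀ)V, for the phrase *"I am happy"*. The vocabulary, dimensions and random weights are illustrative assumptions rather than values from the lesson, and for simplicity the queries, keys and values all come from the same English sentence (in the translation setting above, the queries would instead come from the German words):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative toy setup (assumed values, not from the lesson).
vocab = {"I": 0, "am": 1, "happy": 2}
d_model = 8   # size of each word embedding
d_k = 8       # size of each query/key/value vector

# Step 1: embed each word to get a unique vector of continuous values.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=d_model)
tokens = torch.tensor([[vocab["I"], vocab["am"], vocab["happy"]]])   # (1, 3)
x = embedding(tokens)                                                # (1, 3, d_model)

# Steps 2-4: three distinct linear layers turn the embeddings into the
# query, key and value matrices (one row per word).
w_q = nn.Linear(d_model, d_k, bias=False)
w_k = nn.Linear(d_model, d_k, bias=False)
w_v = nn.Linear(d_model, d_k, bias=False)
Q, K, V = w_q(x), w_k(x), w_v(x)                                     # each (1, 3, d_k)

# Steps 5-6: dot products between queries and keys give alignment scores,
# i.e. how important each key is for a given query.
scores = Q @ K.transpose(-2, -1)                                     # (1, 3, 3)

# Step 7: softmax turns the scores into probabilities over the keys.
weights = F.softmax(scores, dim=-1)

# Step 8: multiplying the probabilities by the values yields the weighted
# sequence -- the attention result itself.
attention = weights @ V                                              # (1, 3, d_k)

print(weights)          # each row sums to 1
print(attention.shape)  # torch.Size([1, 3, 8])
```

In practice, Transformer implementations usually divide the scores by √d_k before the softmax (scaled dot-product attention); the steps above describe the unscaled version.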

Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Readme.md

Lines changed: 4 additions & 1 deletion
@@ -52,4 +52,7 @@ Some of the applications of Transformers include:
1. *GPT-2*: Generative Pre-trained Transformer 2.
2. *BERT*: Bidirectional Encoder Representations from Transformers.
3. *T5*: Text-To-Text Transfer Transformer.<br>
<img src="../images/6. T5 model.png" width="50%"></img><br>

## Next-up
Next, we will learn about dot-product attention. You can find the readme file here.
8 binary image files added (step diagrams, 155 KB to 280 KB each).
