Commit fcc0f06

master: dot-product attention.
1 parent 673e38d commit fcc0f06

1 file changed

Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Dot Product Attention.md

Lines changed: 8 additions & 8 deletions
```diff
@@ -6,27 +6,27 @@ Below steps describe in detail as to how a *dot-product attention* works:
 
 1. Let's consider the phrase in English, *"I am happy"*.
 First, the word *I* is embedded to obtain a vector representation of continuous values that is unique to every single word.
-<img src="../images/7.step - 1.png"></img><br>
+<img src="../images/1.step - 1.png" width="50%"></img><br>
 
 2. By feeding this embedding through three distinct linear layers, you get three different vectors for the queries, keys and values.<br><br>
-<img src="../images/8. step - 2.png"></img><br>
+<img src="../images/8. step - 2.png" width="50%"></img><br>
 
 3. Then you can do the same for the word *am* to output a second vector.<br><br>
-<img src="../images/9. step - 3.png"></img><br>
+<img src="../images/9. step - 3.png" width="50%"></img><br>
 
 4. Finally, the word *happy* is embedded to get a third vector and form the query, key and value matrices.<br><br>
-<img src="../images/10. step - 4.png"></img><br>
+<img src="../images/10. step - 4.png" width="50%"></img><br>
 
 5. From both the Q matrix and the K matrix, the attention model calculates weights or scores representing the relative importance of the keys for a specific query.
-<img src="../images/11. step - 5.png"></img><br>
+<img src="../images/11. step - 5.png" width="50%"></img><br>
 
 6. These attention weights can be understood as alignment scores, as they come from a dot product.<br><br>
-<img src="../images/12. step - 6.png"></img><br>
+<img src="../images/12. step - 6.png" width="50%"></img><br>
 
 7. Additionally, to turn these weights into probabilities, a softmax function is required.<br><br>
-<img src="../images/13. step - 7.png"></img><br>
+<img src="../images/13. step - 7.png" width="50%"></img><br>
 
 8. Finally, multiplying these probabilities with the values, you get a weighted sequence, which is the attention result itself.<br><br>
-<img src="../images/14. step - 8.png"></img><br>
+<img src="../images/14. step - 8.png" width="50%"></img><br>
 
 
```
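The file above walks through dot-product attention step by step; as a companion to the diff, here is a minimal PyTorch sketch of the same computation. It is not part of this commit: the tensor sizes, the layer names (`w_q`, `w_k`, `w_v`), and the scaling by the square root of the dimension (standard for scaled dot-product attention, though not called out in the steps) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes (assumptions): 3 tokens for "I am happy", embedding dimension 4.
seq_len, d_model = 3, 4

# Step 1: embed each word (random stand-ins for learned embeddings).
embeddings = torch.randn(seq_len, d_model)

# Steps 2-4: three distinct linear layers produce the Q, K and V matrices.
w_q = torch.nn.Linear(d_model, d_model, bias=False)
w_k = torch.nn.Linear(d_model, d_model, bias=False)
w_v = torch.nn.Linear(d_model, d_model, bias=False)
Q, K, V = w_q(embeddings), w_k(embeddings), w_v(embeddings)

# Steps 5-6: dot products between queries and keys give the alignment scores.
# Scaling by sqrt(d_model) is standard practice, though not shown in the steps above.
scores = Q @ K.transpose(-2, -1) / (d_model ** 0.5)

# Step 7: softmax turns the scores into probabilities over the keys.
weights = F.softmax(scores, dim=-1)

# Step 8: weighting the values by these probabilities gives the attention output.
attention = weights @ V
print(attention.shape)  # torch.Size([3, 4])
```

Each row of `weights` sums to 1 and gives the alignment of one query word against all keys; multiplying by `V` then yields the weighted sequence described in the final step.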
