Commit 9f1a1f0

Update Attention Maths.md
1 parent 81809d9 commit 9f1a1f0

Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Attention Maths.md

Lines changed: 11 additions & 12 deletions
@@ -16,22 +16,21 @@ Lk: no. of keys
 3. dim[K] = [Lk, D];
 4. dim[V] = [Lk, D]
 
-<img src="../images/15.step - 1.png"></img> <br><br>
+<br>
+<img src="../images/15. step - 1.png" width="50%"></img> <br><br>
 
 4. A query Q, will assign each key K, a probability that key K is a match for Q. Similarity is measured by taking dot
-product of vectors. So Q and K are similar iff `Q dot K` is large.
-<img src="../images/16.step - 2.png"></img> <br><br>
+product of vectors. So Q and K are similar iff `Q dot K` is large. <br>
+<img src="../images/16. step - 2.png" width="50%"></img> <br><br>
 
 5. To make attention more focused on best matching keys, use softmax `(softmax(Q.KTranspose))`. Hence, we now calculate a matrix of Q-K probabilities
-often called *attention weights*. The shape of this matrix is `[Lq, Lk]`.
+often called *attention weights*. The shape of this matrix is `[Lq, Lk]`.<br>
 
-6. In the final step, we take values and get weighted sum of values, weighting each value Vi by the probability that the key Ki matches the query.
+6. In the final step, we take values and get weighted sum of values, weighting each value Vi by the probability that the key Ki matches the query.<br>
+<img src="../images/17. step - 3.png" width="50%"></img> <br><br>
 
-7. Finally the attention mechanism calculates the dynamic or alignment weights representing the relative importance of the inputs in this sequence.
-<img src="../images/17.step - 3.png"></img> <br><br>
+7. Finally the attention mechanism calculates the dynamic or alignment weights representing the relative importance of the inputs in this sequence.<br>
+<img src="../images/18. step - 4.png" width="50%"></img> <br><br>
 
-8. Multiplying alignment weights with input sequence (values), will then weight the sequence.
-<img src="../images/18.step - 4.png"></img> <br><br>
-
-9. A single context vector can then be calculated using the sum of weighted vectors.
-<img src="../images/19.step - 5.png"></img> <br><br>
+8. Multiplying alignment weights with input sequence (values), will then weight the sequence. A single context vector can then be calculated using the sum of weighted vectors.<br>
+<img src="../images/19. step - 5.png" width="50%"></img> <br><br>
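Taken together, the steps in this file describe plain (unscaled) dot-product attention: the scores `Q·Kᵀ` are passed through a softmax to get the attention weights, and those weights are used to take a weighted sum of the values. For readers following along, here is a minimal PyTorch sketch of those steps. It is illustrative only, not part of this commit or of the repository's code; the function name `dot_product_attention` and the example sizes are assumptions based on the dimensions listed above (Q: `[Lq, D]`, K: `[Lk, D]`, V: `[Lk, D]`).

```python
import torch
import torch.nn.functional as F

def dot_product_attention(Q, K, V):
    """Sketch of the attention steps described in the notes (hypothetical helper).

    Q: [Lq, D] queries, K: [Lk, D] keys, V: [Lk, D] values.
    Returns the context vectors [Lq, D] and the attention weights [Lq, Lk].
    """
    # Similarity of every query with every key via dot products -> [Lq, Lk]
    scores = torch.matmul(Q, K.transpose(-2, -1))
    # Softmax over the key dimension gives the Q-K probabilities (attention weights)
    weights = F.softmax(scores, dim=-1)
    # Weight each value V_i by its probability and sum -> context vectors [Lq, D]
    context = torch.matmul(weights, V)
    return context, weights

# Example usage with the shapes from the notes (Lq = 3 queries, Lk = 5 keys, D = 4)
Q = torch.randn(3, 4)
K = torch.randn(5, 4)
V = torch.randn(5, 4)
context, weights = dot_product_attention(Q, K, V)
print(context.shape, weights.shape)  # torch.Size([3, 4]) torch.Size([3, 5])
```

Note that standard Transformer attention additionally divides the scores by √D before the softmax; the notes here omit that scaling, so the sketch does as well.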
