
Commit b68bac9

master: How attention works internally.
1 parent bd61283

File tree

1 file changed (+2, -2 lines)
  • Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT


Chapter-wise code/Code - PyTorch/7. Attention Models/1. NMT/Readme.md

Lines changed: 2 additions & 2 deletions
@@ -70,15 +70,15 @@ The value score (V) is assigned based on the closeness of the match.<br>
 Attention = Softmax(QK^T)V
 ```
 <br><br>
-<img src="./images/7. attention visual - 1.png" width="40%"></img> <img src="./images/8. NMT with attention.png" width="40%"></img> <br><br>
+<img src="./images/7. attention visual - 1.png" width="40%"></img> <img src="./images/8. NMT with attention.png" width="60%"></img> <br><br>

 ### Flexible Attention

 In a situation (as shown below) where the grammar of the foreign language requires a different word order than the source language, attention is flexible enough to find the connection. <br><br>

 <img src="./images/9. flexible attention.png" width="50%"></img><br><br>

-The first four tokens, "the agreements on the", are pretty straightforward, but then the grammatical structure between French and English changes.
+The first four tokens, "the agreements on the", are pretty straightforward, but then the grammatical structure between French and English changes. Now, instead of looking at the corresponding fifth token to translate the French word "zone", the attention knows to look further along at the eighth token, which corresponds to the English word "area".

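Below is a minimal sketch of the `Attention = Softmax(QK^T)V` formula from the diff above, written in PyTorch to match the repository. The function name, tensor shapes, and the toy 5-target/8-source example are illustrative assumptions, not code taken from the NMT notebook.

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Alignment scores: how well each query (target step) matches each key (source step).
    scores = Q @ K.transpose(-2, -1)        # (tgt_len, src_len)
    # Softmax over source positions turns the scores into weights that sum to 1 per row.
    weights = F.softmax(scores, dim=-1)     # (tgt_len, src_len)
    # Each output row is a weighted sum of the values: one context vector per target step.
    return weights @ V, weights             # (tgt_len, d), (tgt_len, src_len)

# Toy example (hypothetical sizes): 5 target tokens attending over 8 source tokens.
d = 16
Q = torch.randn(5, d)   # queries from the decoder
K = torch.randn(8, d)   # keys from the encoder
V = torch.randn(8, d)   # values from the encoder
context, weights = attention(Q, K, V)
print(context.shape)            # torch.Size([5, 16])
# Row i of `weights` shows which source position target token i attends to most.
# Nothing ties it to the diagonal: the fifth target token can put its largest weight
# on the eighth source token, which is the flexible-attention behaviour described above.
print(weights.argmax(dim=-1))
```

The `weights` matrix is what the flexible-attention figure visualises: when the two languages order words differently, the strongest weights move off the diagonal instead of following token position one-to-one.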