Commit 1c8b0ef

master: Why transformer is better than basic seq2seq architecture.
1 parent 3baa0c9 commit 1c8b0ef

File tree

1 file changed: 1 addition, 1 deletion
  • Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models


Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Readme.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ TLDR:
4. Transformer differs from sequence-to-sequence models by using multi-head attention layers instead of recurrent layers.<br><br>
<img src="../images/4. multi-head attention.png" width="50%"></img><br>

- 5. Transformers also use positional encoding to capture sequential information. The positional encoding outputs values that are added to the embeddings, so every input word given to the model carries some information about its order and position.
+ 5. Transformers also use positional encoding to capture sequential information. The positional encoding outputs values that are added to the embeddings, so every input word given to the model carries some information about its order and position.<br>
<img src="../images/5. positional encoding.png" width="50%"></img><br>

6. Unlike a recurrent layer, the multi-head attention layer computes the output for each input in the sequence independently, which allows us to parallelize the computation. However, it fails to model the sequential information of a given sequence. That is why the positional encoding stage is incorporated into the transformer model.
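
The parallelism described in points 4 and 6 can be seen directly in code. The snippet below is a minimal sketch, not the repository's own code, using PyTorch's built-in `nn.MultiheadAttention`; the embedding size of 512, 8 heads, batch size 2, and sequence length 10 are illustrative assumptions. A single call attends over every position of the sequence at once, with no per-timestep recurrence.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not taken from the repository).
embed_dim, num_heads = 512, 8
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, 10, embed_dim)   # (batch, seq_len, embed_dim)

# Self-attention: query = key = value = x.
# The whole sequence is processed in one call, so computation
# parallelizes over positions instead of unrolling step by step.
out, weights = attn(x, x, x)
print(out.shape)       # torch.Size([2, 10, 512])
print(weights.shape)   # torch.Size([2, 10, 10]) - each position attends to all positions
```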
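
Point 5's positional encoding can likewise be sketched. The following is an illustrative implementation of the standard sinusoidal encoding from "Attention Is All You Need", added directly to the word embeddings; the function name and dimensions are assumptions, not code from this repository.

```python
import math
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal positional encoding (sketch of the standard formulation)."""
    position = torch.arange(seq_len).unsqueeze(1)                                    # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe

# The encoding is simply added to the word embeddings, so each input word
# carries information about its order and position in the sequence.
seq_len, d_model = 10, 512
embeddings = torch.randn(2, seq_len, d_model)                  # (batch, seq_len, d_model)
inputs = embeddings + positional_encoding(seq_len, d_model)    # broadcasts over the batch
```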
