
Commit 10f7f96

master: multi-head attention overview.
1 parent 8310245 commit 10f7f96

10 files changed (+20 −4 lines)

Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Attention Maths.md

Lines changed: 1 addition & 1 deletion
@@ -46,4 +46,4 @@ often called *attention weights*. The shape of this matrix is `[Lq, Lk]`.<br>
 <img src="../images/20. attention formula.png" width="50%"></img> <br><br>
 
 ## Next Up
-Next, we will learn about different types of attention: causal attention and multi-head attention.
+Next, we will learn about different types of attention: [causal attention](./Causal%20Attention.md) and multi-head attention.
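The context lines in this hunk refer to the *attention weights* matrix of shape `[Lq, Lk]` and the attention formula image. A minimal PyTorch sketch of that computation, assuming the formula in the image is the usual softmax(QKᵀ / √d_k)·V; the function name and tensor sizes below are illustrative, not taken from the repository:

```python
import torch
import torch.nn.functional as F

def dot_product_attention(q, k, v):
    # scores and weights have shape [Lq, Lk]
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)   # the *attention weights* matrix
    return weights @ v                    # shape [Lq, d_v]

q = torch.randn(3, 8)   # Lq = 3 queries of dimension 8
k = torch.randn(5, 8)   # Lk = 5 keys
v = torch.randn(5, 8)   # 5 values
out = dot_product_attention(q, k, v)   # out.shape == torch.Size([3, 8])
```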

Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Causal Attention.md

Lines changed: 4 additions & 1 deletion
@@ -24,4 +24,7 @@ All other values become minus infinity. After a softmax, all minus infinities wi
 
 4. The last step is the same as dot-product attention!
 
-<img src="../images/25. step - 4.png" width="50%"></img><br><br>
+<img src="../images/25. step - 4.png" width="50%"></img><br><br>
+
+## Next Up
+Next, we will learn about multi-head attention. You can find the read-me file for the same [here].
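The hunk above describes the causal-masking step: future positions are set to minus infinity so that, after the softmax, they become zeros. A small PyTorch sketch of that step, assuming an upper-triangular mask over the raw QKᵀ scores; the function and variable names are made up for this example:

```python
import torch
import torch.nn.functional as F

def causal_attention_weights(scores):
    # scores: raw Q @ K.T values for one sequence, shape [L, L]
    L = scores.size(-1)
    mask = torch.ones(L, L).triu(1).bool()            # True above the diagonal (future positions)
    masked = scores.masked_fill(mask, float('-inf'))  # future positions become minus infinity...
    return F.softmax(masked, dim=-1)                  # ...and come out of the softmax as zeros

scores = torch.randn(4, 4)
print(causal_attention_weights(scores))  # each row attends only to itself and earlier positions
```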

Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Dot Product Attention.md

Lines changed: 1 addition & 1 deletion
@@ -31,6 +31,6 @@ First, the word *I* is embedded, to obtain a vector representation that holds co
 
 ## Next Up
 
-Now that we know about attention, next we will learn about *attention maths*. You can find the readme file for the same here.
+Now that we know about attention, next we will learn about *attention maths*. You can find the readme file for the same [here](./Attention%20Maths.md).
 
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Multi-Head Attention
+
+1. The input to multi-head attention is a set of 3 values: Queries, Keys and Values.<br><br>
+<img src="../images/26. step -1 .png"></img><br>
+2. To achieve the multiple lookups, you first apply a fully-connected, dense linear layer to each query, key, and value. This layer creates the representations for the parallel attention heads.<br><br>
+<img src="../images/27. step -2 .png"></img><br>
+3. Here, you split these vectors into a number of heads and perform attention on them as if each head were different.<br><br>
+4. The results of the attention heads are then concatenated back together.<br><br>
+<img src="../images/28. step -3 .png"></img><br>
+5. Finally, the concatenated attention is put through a final fully-connected layer.<br><br>
+<img src="../images/29. step -4 .png"></img><br>
+6. The scaled dot-product is the same as the one used in the dot-product attention model, except for the scale factor: one over the square root of d_k, where d_k is the key/query dimension. This normalization prevents the gradients from becoming extremely small when large values of d_k are used.<br><br>
+<img src="../images/30. step -5 .png"></img><br>

Chapter-wise code/Code - PyTorch/7. Attention Models/2. Neural Text Summarization/1. Transformer Models/Readme.md

Lines changed: 1 addition & 1 deletion
@@ -55,4 +55,4 @@ Some of the applications of Transformers include:
 <img src="../images/6. T5 model.png" width="50%"></img><br>
 
 ## Next-up
-Next, we will learn about dot-product attention. You can find the readme file here.
+Next, we will learn about dot-product attention. You can find the readme file [here](./Dot%20Product%20Attention.md).
5 binary image files changed (48.1 KB, 86.5 KB, 96.4 KB, 107 KB, 201 KB).
