Commit 2edb3fa (parent 408ffbc), branch master: updating table of contents.
9 files changed: +232, -1 lines.

New file (229 additions, 0 deletions):

# Handling Question duplicates using Siamese Networks

In this section, I explore Siamese networks applied to natural language processing and the fundamentals of Trax, and learn how to implement models with different architectures.

## Outline

- [Overview](#0)
- [Part 1: Importing the Data](#1)
  - [1.1 Loading in the data](#1.1)
  - [1.2 Converting a question to a tensor](#1.2)
- [Part 2: Defining the Siamese model](#2)
  - [2.1 Understanding the Siamese Network](#2.1)
  - [2.2 Hard Negative Mining](#2.2)
- [Part 3: Training](#3)
  - [3.1 Training the model](#3.1)
- [Part 4: Evaluation](#4)
  - [4.1 Evaluating our Siamese network](#4.1)
  - [4.2 Classify](#4.2)
- [Part 5: Testing with our own questions](#5)
- [On Siamese networks](#6)

<a name='0'></a>
### Overview

In this section, we will:

- Learn about Siamese networks
- Understand how the triplet loss works
- Understand how to evaluate accuracy
- Use cosine similarity between the model's output vectors
- Use the data generator to get batches of questions
- Predict using our own model

We will start by preprocessing the data. After processing the data, we will build a classifier that identifies whether two questions are the same or not. We will process the data first and then perform padding. Our model will take in the two question embeddings, run them through an LSTM, and then compare the outputs of the two subnetworks using cosine similarity.

<a name='1'></a>

# Part 1: Importing the Data

<a name='1.1'></a>
### 1.1 Loading in the data

We will be using the Quora question answer dataset to build a model that can identify similar questions. This is a useful task because we don't want to have several versions of the same question posted.

<img src="./images/quora_dataset.png"></img>

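A minimal loading sketch is shown below; the file name (`questions.csv`) and the column names are assumptions for illustration, not the notebook's exact setup.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the actual dataset layout.
data = pd.read_csv('questions.csv')            # columns: question1, question2, is_duplicate
duplicates = data[data['is_duplicate'] == 1]   # training uses only duplicate pairs
print(f"Total pairs: {len(data)}, duplicate pairs: {len(duplicates)}")
```
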
<a name='1.2'></a>

### 1.2 Converting a question to a tensor

We will now convert every question to a tensor, or an array of numbers, using the vocabulary built above.

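The sketch below shows one way such a conversion could look; the helper name, the `vocab` dictionary, and the `<UNK>` handling are illustrative assumptions.

```python
import nltk

def question_to_tensor(question, vocab, unk_token='<UNK>'):
    """Encode a question as a list of integer token ids using `vocab` (a sketch)."""
    tokens = nltk.word_tokenize(question.lower())
    return [vocab.get(token, vocab[unk_token]) for token in tokens]

# Toy usage with a tiny vocabulary:
vocab = {'<PAD>': 0, '<UNK>': 1, 'when': 2, 'do': 3}
print(question_to_tensor("When do we need padding?", vocab))  # -> [2, 3, 1, 1, 1, 1]
```
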
<a name='2'></a>

# Part 2: Defining the Siamese model

<a name='2.1'></a>

### 2.1 Understanding the Siamese Network

A Siamese network is a neural network which uses the same weights while working in tandem on two different input vectors to compute comparable output vectors. The Siamese network we are about to implement looks like this:

<img src="./images/siamese_networks.png"></img>

We get the question embedding, run it through an LSTM layer, normalize `v_1` and `v_2`, and finally use a triplet loss (explained below) to get the corresponding cosine similarity for each pair of questions. As usual, we will start by importing the dataset. The triplet loss makes use of a baseline (anchor) input that is compared to a positive (truthy) input and a negative (falsy) input. The distance from the baseline (anchor) input to the positive (truthy) input is minimized, and the distance from the baseline (anchor) input to the negative (falsy) input is maximized. In math equations, we are trying to maximize the following:

<img src="./images/triplet_loss.png"></img>

`A` is the anchor input, for example `q1_1`; `P` is the duplicate input, for example `q2_1`; and `N` is the negative input (the non-duplicate question), for example `q2_2`.<br>
`\alpha` is a margin; we can think of it as a safety net, or by how much we want to push the duplicates away from the non-duplicates.
<br>

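As a concrete illustration, here is a sketch of the basic triplet loss on a single toy triplet, assuming cosine similarity as the score and a made-up margin; the hard-negative-mining refinement used in the notebook comes in Section 2.2.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def basic_triplet_loss(anchor, positive, negative, alpha=0.25):
    # The loss is zero once the positive is at least `alpha` more similar
    # to the anchor than the negative is.
    return max(0.0, cosine_similarity(anchor, negative)
               - cosine_similarity(anchor, positive) + alpha)

A = np.array([1.0, 0.0])   # anchor, e.g. q1_1
P = np.array([0.9, 0.1])   # duplicate, e.g. q2_1
N = np.array([0.0, 1.0])   # non-duplicate, e.g. q2_2
print(basic_triplet_loss(A, P, N))  # 0.0 -- this triplet is already well separated
```
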
<a name='ex02'></a>

### Exercise 02

**Instructions:** Implement the `Siamese` function below, using all the objects explained in the following list.

To implement this model, we will be using `trax`. Concretely, we will be using the following functions:


- `tl.Serial`: Combinator that applies layers serially (by function composition) and allows us to set up the overall structure of the feedforward network. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.combinators.Serial) / [source code](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/combinators.py#L26)
    - We can pass in the layers as arguments to `Serial`, separated by commas.
    - For example: `tl.Serial(tl.Embedding(...), tl.Mean(...), tl.Dense(...), tl.LogSoftmax(...))`


- `tl.Embedding`: Maps discrete tokens to vectors. It will have shape (vocabulary length x dimension of output vectors). The dimension of output vectors (also called `d_feature`) is the number of elements in the word embedding. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Embedding) / [source code](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L113)
    - `tl.Embedding(vocab_size, d_feature)`.
    - `vocab_size` is the number of unique words in the given vocabulary.
    - `d_feature` is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).


- `tl.LSTM`: The LSTM layer. It leverages another Trax layer called [`LSTMCell`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.rnn.LSTMCell). The number of units should be specified and should match the number of elements in the word embedding. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.rnn.LSTM) / [source code](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/rnn.py#L87)
    - `tl.LSTM(n_units)` builds an LSTM layer with `n_units` units.


- `tl.Mean`: Computes the mean across a desired axis. Mean uses one tensor axis to form groups of values and replaces each group with the mean value of that group. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Mean) / [source code](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L276)
    - `tl.Mean(axis=1)` takes the mean over columns.


- `tl.Fn`: A layer with no weights that applies the function `f`, which should be specified using lambda syntax. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.base.Fn) / [source code](https://github.com/google/trax/blob/70f5364dcaf6ec11aabbd918e5f5e4b0f5bfb995/trax/layers/base.py#L576)
    - Here, `f` is the normalization function used for the cosine similarity.
    - `tl.Fn('Normalize', lambda x: normalize(x))` returns a layer with no weights that applies the function `normalize`.


- `tl.Parallel`: A combinator layer (like `Serial`) that applies a list of layers in parallel to its inputs. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.combinators.Parallel) / [source code](https://github.com/google/trax/blob/37aba571a89a8ad86be76a569d0ec4a46bdd8642/trax/layers/combinators.py#L152)

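Putting these pieces together, a minimal sketch of the `Siamese` model could look like the following; the vocabulary size and model dimension are placeholder values.

```python
from trax import layers as tl
from trax.fastmath import numpy as fastnp

def Siamese(vocab_size=41699, d_model=128):
    """Sketch of a Siamese network: the same processor applied to both questions."""

    def normalize(x):
        # L2-normalize each output vector so that cosine similarity
        # later reduces to a simple dot product.
        return x / fastnp.sqrt(fastnp.sum(x * x, axis=-1, keepdims=True))

    q_processor = tl.Serial(
        tl.Embedding(vocab_size, d_model),           # token ids -> embeddings
        tl.LSTM(d_model),                            # run embeddings through an LSTM
        tl.Mean(axis=1),                             # average over the sequence length
        tl.Fn('Normalize', lambda x: normalize(x)),  # normalize v_1 / v_2
    )
    # Parallel applies the same (shared-weight) processor to both questions.
    return tl.Parallel(q_processor, q_processor)
```
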
<a name='2.2'></a>

### 2.2 Hard Negative Mining

We will now implement the `TripletLoss`.<br>
As explained in the lecture, the loss is composed of two terms. One term utilizes the mean of all the non-duplicates, the second utilizes the *closest negative*. Our loss expression is then:

<img src="./images/new_triplet_loss.png"></img>

Further, two sets of instructions are provided. The first set provides a brief description of the task. If that set proves insufficient, a more detailed set can be displayed.

<a name='ex03'></a>

### Exercise 03

**Instructions (Brief):** Here is a list of things we should do: <br>

- As this will be run inside trax, use `fastnp.xyz` when using any `xyz` numpy function
- Use `fastnp.dot` to calculate the similarity matrix `v_1v_2^T` of dimension `batch_size` x `batch_size`
- Take the score of the duplicates on the diagonal with `fastnp.diagonal`
- Use the `trax` functions `fastnp.eye` and `fastnp.maximum` for the identity matrix and the maximum

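A sketch of a `TripletLossFn` following the brief instructions above (variable names and the `margin` default are assumptions):

```python
import functools
from trax import layers as tl
from trax.fastmath import numpy as fastnp

def TripletLossFn(v1, v2, margin=0.25):
    """Sketch of the two-term triplet loss with hard negative mining."""
    scores = fastnp.dot(v1, v2.T)              # batch_size x batch_size similarity matrix
    batch_size = len(scores)
    positive = fastnp.diagonal(scores)         # scores of the true duplicate pairs
    # Mean of the off-diagonal (non-duplicate) scores, row by row.
    negative_zero_on_duplicate = scores * (1.0 - fastnp.eye(batch_size))
    mean_negative = fastnp.sum(negative_zero_on_duplicate, axis=1) / (batch_size - 1)
    # Closest negative: push the diagonal out of range so it is never selected.
    closest_negative = (scores - 2.0 * fastnp.eye(batch_size)).max(axis=1)
    triplet_loss1 = fastnp.maximum(0.0, margin - positive + closest_negative)
    triplet_loss2 = fastnp.maximum(0.0, margin - positive + mean_negative)
    return fastnp.mean(triplet_loss1 + triplet_loss2)

def TripletLoss(margin=0.25):
    # Wrap the function as a Trax layer so it can be used as a loss.
    return tl.Fn('TripletLoss', functools.partial(TripletLossFn, margin=margin))
```
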
<a name='3'></a>

# Part 3: Training

Now we are going to train our model. As usual, we have to define the cost function and the optimizer. We also have to feed in the built model. Before going into the training, we will use a special data setup. We will define the inputs using the data generator we built above. The lambda function acts as a seed to remember the last batch that was given. Run the cell below to get the question-pair inputs.

<a name='3.1'></a>

### 3.1 Training the model

We will now write a function that takes in our model and trains it. To train our model we have to decide how many times we want to iterate over the entire data set; each iteration is defined as an `epoch`. For each epoch, we have to go over all the data, using our training iterator.

<a name='ex04'></a>

### Exercise 04

**Instructions:** Implement the `train_model` function below to train the neural network above. Here is a list of things we should do, as already shown in lecture 7:

- Create `TrainTask` and `EvalTask`
- Create the training loop `trax.supervised.training.Loop`
- Pass in the following depending on the context (train_task or eval_task):
    - `labeled_data=generator`
    - `metrics=[TripletLoss()]`
    - `loss_layer=TripletLoss()`
    - `optimizer=trax.optimizers.Adam` with learning rate of 0.01
    - `lr_schedule=lr_schedule`
    - `output_dir=output_dir`

We will be using our triplet loss function with the Adam optimizer. Please read the [trax](https://trax-ml.readthedocs.io/en/latest/trax.optimizers.html?highlight=adam#trax.optimizers.adam.Adam) documentation to get a full understanding.

This function should return a `training.Loop` object. To read more about this, check the [docs](https://trax-ml.readthedocs.io/en/latest/trax.supervised.html?highlight=loop#trax.supervised.training.Loop).

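A sketch of `train_model` along the lines of the checklist above (argument names and the output directory are assumptions):

```python
import trax
from trax.supervised import training

def train_model(Siamese, TripletLoss, train_generator, val_generator,
                lr_schedule, output_dir='model/'):
    """Sketch of a training loop for the Siamese model."""
    train_task = training.TrainTask(
        labeled_data=train_generator,
        loss_layer=TripletLoss(),
        optimizer=trax.optimizers.Adam(0.01),
        lr_schedule=lr_schedule,
    )
    eval_task = training.EvalTask(
        labeled_data=val_generator,
        metrics=[TripletLoss()],
    )
    # Returns the training.Loop object; call .run(n_steps) on it to train.
    return training.Loop(Siamese(), train_task,
                         eval_tasks=[eval_task],
                         output_dir=output_dir)
```
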
<a name='4'></a>

# Part 4: Evaluation

<a name='4.1'></a>

### 4.1 Evaluating our Siamese network

In this section we will learn how to evaluate a Siamese network. We will first start by loading a pretrained model and then we will use it to predict.

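For example, something along these lines (the checkpoint file name is an assumption):

```python
model = Siamese()
# Restore pretrained weights from a stored checkpoint.
model.init_from_file('model.pkl.gz', weights_only=True)
```
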
<a name='4.2'></a>

### 4.2 Classify
To determine the accuracy of the model, we will utilize the test set that was configured earlier. While in training we used only positive examples, the test data, `Q1_test`, `Q2_test` and `y_test`, is set up as pairs of questions, some of which are duplicates and some are not.
This routine will run all the test question pairs through the model, compute the cosine similarity of each pair, threshold it, and compare the result to `y_test`, the correct response from the data set. The results are accumulated to produce an accuracy.

<a name='ex05'></a>

### Exercise 05

**Instructions**
- Loop through the incoming data in `batch_size` chunks
- Use the data generator to load `q1`, `q2` a batch at a time. **Don't forget to set `shuffle=False`!**
- Copy a `batch_size` chunk of `y` into `y_test`
- Compute `v1`, `v2` using the model
- For each element of the batch:
    - compute the cosine similarity of each pair of entries, `v1[j]`, `v2[j]`
    - determine whether `d > threshold`
    - increment the accuracy if that result matches the expected result (`y_test[j]`)
- Compute the final accuracy and return it

Due to some limitations of this environment, running classify multiple times may result in the kernel failing. If that happens, *Restart Kernel & Clear Output* and then run from the top. During development, consider using a smaller set of data to reduce the number of calls to `model()`.

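A sketch of such a `classify` routine, assuming the `data_generator` built earlier and a trained `model` are available (names and defaults are illustrative):

```python
import numpy as np

def classify(test_Q1, test_Q2, y, threshold, model, vocab, data_generator, batch_size=64):
    """Sketch: accuracy of the Siamese model on labeled question pairs."""
    accuracy = 0
    for i in range(0, len(test_Q1), batch_size):
        # shuffle=False keeps the batches aligned with the labels in y.
        q1, q2 = next(data_generator(test_Q1[i:i + batch_size],
                                     test_Q2[i:i + batch_size],
                                     batch_size, vocab['<PAD>'], shuffle=False))
        y_test = y[i:i + batch_size]
        v1, v2 = model((q1, q2))
        for j in range(batch_size):
            d = np.dot(v1[j], v2[j])   # outputs are normalized, so dot = cosine similarity
            res = d > threshold
            accuracy += (y_test[j] == res)
    return accuracy / len(test_Q1)
```
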
<a name='5'></a>

# Part 5: Testing with our own questions

In this section we will test the model with our own questions. We will write a function `predict` which takes two questions as input and returns `1` or `0` depending on whether the question pair is a duplicate or not.

But first, we build a reverse vocabulary that allows us to map encoded questions back to words:

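For example, a one-liner sketch over the `vocab` dictionary:

```python
# Map token ids back to words so padded/encoded questions can be inspected.
reverse_vocab = {idx: word for word, idx in vocab.items()}
```
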
Write a function `predict` that takes in two questions, the model, and the vocabulary, and returns whether the questions are duplicates (`1`) or not duplicates (`0`) given a similarity threshold.

<a name='ex06'></a>

### Exercise 06

**Instructions:**
- Tokenize our questions using `nltk.word_tokenize`
- Create `Q1`, `Q2` by encoding our questions as lists of numbers using `vocab`
- Pad `Q1`, `Q2` with `next(data_generator([Q1], [Q2], 1, vocab['<PAD>']))`
- Use `model()` to create `v1`, `v2`
- Compute the cosine similarity (dot product) of `v1`, `v2`
- Compute `res` by comparing `d` to the threshold

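A sketch of `predict` following these steps; the `data_generator`, `vocab`, and trained `model` from earlier are assumed to be in scope, and the `<UNK>` fallback is an illustrative assumption.

```python
import nltk
import numpy as np

def predict(question1, question2, threshold, model, vocab, data_generator, verbose=False):
    """Sketch: return 1 if the two questions are predicted to be duplicates, else 0."""
    q1 = [vocab.get(w, vocab['<UNK>']) for w in nltk.word_tokenize(question1)]
    q2 = [vocab.get(w, vocab['<UNK>']) for w in nltk.word_tokenize(question2)]
    # Pad both questions to the same length using the data generator built earlier.
    Q1, Q2 = next(data_generator([q1], [q2], 1, vocab['<PAD>']))
    v1, v2 = model((Q1, Q2))
    d = np.dot(v1[0], v2[0])   # cosine similarity, since the vectors are normalized
    res = d > threshold
    if verbose:
        print(f"d: {float(d):.3f} -> duplicate: {bool(res)}")
    return int(res)
```
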
<a name='6'></a>

### <span style="color:blue"> Output of Siamese Networks </span>

<img src="./images/sample_output_1.png"></img>
<img src="./images/sample_output_2.png"></img>

### <span style="color:blue"> On Siamese networks </span>

Siamese networks are important and useful. Many times there are questions that have already been asked on Quora or other platforms, and we can use Siamese networks to detect such duplicates and avoid repeated questions.

README.md (3 additions, 1 deletion): two new entries added to the projects list in the table of contents:

* [Siamese Networks](./Chapter-wise%20code/Code%20-%20PyTorch/6.%20Natural-Language-Processing/10.%20Text%20Classification/)
* [Question Duplication](./Chapter-wise%20code/Code%20-%20PyTorch/6.%20Natural-Language-Processing/13.%20Siamese%20Networks/Question%20Duplication/Readme.md)