Commit 751e9ba

Update rnn.md
1 parent 6a1a2a0 commit 751e9ba

rnn.md

Lines changed: 13 additions & 11 deletions
@@ -33,7 +33,7 @@ sentiment classification in NLP where we are given a sequence of words of a sent
* An example of a **many-to-many** task is video captioning, where the input is a sequence of video frames and the output is a caption that describes
what was in the video, as shown in the fourth model in Figure 1. Another example of a many-to-many task is machine translation in NLP, where we can have an
RNN that takes a sequence of words of a sentence in English, and this RNN is then asked to produce a sequence of words of a sentence in French.
* There is also a **variation of the many-to-many** task, as shown in the last model in Figure 1,
where the model generates an output at every timestep. An example of this many-to-many task is video classification at the frame level,
where the model classifies every single frame of the video with some number of classes. We should note that we don't want
this prediction to be a function of only the current timestep (the current frame of the video), but also of all the timesteps (frames)
@@ -47,17 +47,20 @@ function of all the timesteps that have come before.
<div class="figcaption"> <b> Figure 1.</b> Different (non-exhaustive) types of Recurrent Neural Network architectures. Red boxes are input vectors. Green boxes are hidden layers. Blue boxes are output vectors.</div>
</div>
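
As a rough illustration of the patterns in Figure 1, the sketch below contrasts them purely in terms of how many input and output vectors are involved. The sizes and dictionary names are hypothetical and only for illustration; they are not part of the original notes.

```python
# Hypothetical sizes: T_in input timesteps, T_out output timesteps,
# D-dimensional input vectors, C-dimensional output vectors.
T_in, T_out, D, C = 8, 6, 10, 3

io_patterns = {
    # e.g. generating a caption from a single image: one input, a sequence of outputs
    "one_to_many":  {"inputs": (1, D),    "outputs": (T_out, C)},
    # e.g. sentiment classification: a sequence of word vectors, one output
    "many_to_one":  {"inputs": (T_in, D), "outputs": (1, C)},
    # e.g. machine translation: input and output sequences of different lengths
    "many_to_many": {"inputs": (T_in, D), "outputs": (T_out, C)},
    # e.g. frame-level video classification: one output per input timestep
    "many_to_many_per_timestep": {"inputs": (T_in, D), "outputs": (T_in, C)},
}
```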

### Why are existing convnets insufficient?
Existing convnets are insufficient to deal with tasks that have inputs and outputs of variable sequence length.
In the example of video captioning, inputs have a variable number of frames (e.g. a 10-minute vs. a 10-hour-long video) and outputs are captions
of variable length. A convnet can only take in inputs of a fixed width and height and cannot generalize over
inputs of different sizes. To tackle this problem, we introduce Recurrent Neural Networks (RNNs).

### Recurrent Neural Network
An RNN is basically a black box (Figure 2): it has an “internal state” that is updated as a sequence is processed. At every single timestep we feed an input vector into the RNN, and it modifies that state as a function of what it receives. When we tune the RNN's weights,
the RNN will show different behavior in terms of how its state evolves as it receives these inputs.
We are also interested in producing an output based on the RNN state, so we can produce these output vectors on top of the RNN (as depicted in Figure 2).
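
A minimal sketch of this black-box interface is shown below. The class name, sizes, and the tanh update inside `step` are illustrative assumptions (the common vanilla-RNN choice), not something this excerpt pins down:

```python
import numpy as np

class VanillaRNN:
    """Black box with an internal state that is updated once per timestep."""
    def __init__(self, input_size, hidden_size, output_size):
        # the weights W that determine how the state evolves
        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01
        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
        self.W_hy = np.random.randn(output_size, hidden_size) * 0.01
        self.h = np.zeros(hidden_size)  # internal state

    def step(self, x):
        # update the state as a function of the previous state and the current input
        self.h = np.tanh(self.W_hh @ self.h + self.W_xh @ x)
        # produce an output vector on top of the current state
        return self.W_hy @ self.h

rnn = VanillaRNN(input_size=10, hidden_size=16, output_size=3)
for x_t in np.random.randn(5, 10):  # feed in one input vector per timestep
    y_t = rnn.step(x_t)
```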

<div class="fig figcenter fighighlight">
<img src="/assets/rnn/rnn_blackbox.png" width="20%" >
<div class="figcaption"><b>Figure 2. </b>Simplified RNN box.</div>
</div>

More precisely, the RNN can be represented as a recurrence formula of some function $$f_W$$ with
@@ -69,8 +72,7 @@ $$

where at every timestep it receives the previous state vector $$h_{t-1}$$ from the previous
timestep $$t-1$$ and the current input vector $$x_t$$, and produces the current state vector
$$h_t$$. A fixed function $$f_W$$ with weights $$W$$ is applied at every single timestep, and that allows us to use
the Recurrent Neural Network on sequences without having to commit to the size of the sequence, because
we apply the exact same function at every single timestep, no matter how long the input or output
sequences are.
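
For concreteness, a standard choice for $$f_W$$ is the “vanilla” RNN, which combines the previous state and the current input with a single affine transformation followed by a tanh nonlinearity, with a separate linear readout for the output vector. This particular parametrization is an assumption for illustration rather than something fixed by the text above:

$$
h_t = f_W(h_{t-1}, x_t) = \tanh\left(W_{hh} h_{t-1} + W_{xh} x_t\right), \qquad y_t = W_{hy} h_t
$$

The same weight matrices $$W_{hh}$$, $$W_{xh}$$ and $$W_{hy}$$ are reused at every timestep, which is exactly why the sequence length does not have to be fixed in advance.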
