* An example of a **many-to-many** task is video captioning, where the input is a sequence of video frames and the output is a caption that describes
what was in the video, as shown in the fourth model in Figure 1. Another example of a many-to-many task is machine translation in NLP, where we can have an
RNN that takes a sequence of words of a sentence in English and is then asked to produce a sequence of words of a sentence in French.
* There is also a **variation of the many-to-many** task, as shown in the last model in Figure 1,
where the model generates an output at every timestep. An example of this many-to-many task is video classification at the frame level,
where the model classifies every single frame of the video with some number of classes. We should note that we want
this prediction to be a function not only of the current timestep (the current frame of the video), but also of all the timesteps (frames)
that have come before; a small code sketch of this setup appears right after Figure 1.

<div class="figcaption"> <b> Figure 1.</b> Different (non-exhaustive) types of Recurrent Neural Network architectures. Red boxes are input vectors. Green boxes are hidden layers. Blue boxes are output vectors.</div>
</div>
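To make the last variant concrete, here is a minimal sketch of frame-level classification with one prediction per timestep, written in plain NumPy. The function `rnn_step`, the weight names, and all sizes are illustrative assumptions, not code from these notes. Because the hidden state is carried from frame to frame, each prediction depends on every frame seen so far.

```python
import numpy as np

np.random.seed(0)

# Hypothetical sizes: D-dimensional frame features, H hidden units, C classes.
D, H, C = 512, 128, 10
Wxh = np.random.randn(H, D) * 0.01   # input-to-hidden weights
Whh = np.random.randn(H, H) * 0.01   # hidden-to-hidden (recurrent) weights
Why = np.random.randn(C, H) * 0.01   # hidden-to-output weights

def rnn_step(x, h):
    """One recurrent update: the new state mixes the current frame with the previous state."""
    return np.tanh(Wxh @ x + Whh @ h)

frames = [np.random.randn(D) for _ in range(7)]  # a toy "video" of 7 frame-feature vectors
h = np.zeros(H)                                  # initial state
per_frame_scores = []
for x in frames:
    h = rnn_step(x, h)                 # the state now summarizes all frames seen so far
    per_frame_scores.append(Why @ h)   # one class-score vector per frame

print(len(per_frame_scores), per_frame_scores[0].shape)  # -> 7 (C,)
```

The same loop runs unchanged on a 10-frame or a 10,000-frame video, which is exactly the flexibility discussed in the next subsection.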
### Why are existing convnets insufficient?
Existing convnets are insufficient for tasks whose inputs and outputs have variable sequence lengths.
In the example of video captioning, the input has a variable number of frames (e.g. a 10-minute versus a 10-hour-long video) and the output is a caption
of variable length. Convnets can only take in inputs of a fixed width and height and cannot generalize over
inputs of different sizes. To tackle this problem, we introduce Recurrent Neural Networks (RNNs).

### Recurrent Neural Network
An RNN is basically a black box (Figure 2) with an “internal state” that is updated as a sequence is processed. At every single timestep we feed an input vector into the RNN, and it modifies that state as a function of what it receives. When we tune the RNN's weights,
the RNN will show different behavior in terms of how its state evolves as it receives these inputs.
We are also interested in producing an output based on the RNN state, so we can produce output vectors on top of the RNN (as depicted in Figure 2).
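As a minimal sketch of this black-box view (assuming the common vanilla-RNN tanh recurrence; the class name, weight names, and sizes below are illustrative, not from these notes), the state lives inside the object, and each call to `step` both updates it and produces an output vector on top of it:

```python
import numpy as np

class VanillaRNN:
    """A toy recurrent black box: it owns a state vector and updates it at every timestep."""

    def __init__(self, input_dim, hidden_dim, output_dim):
        self.h = np.zeros(hidden_dim)                              # the internal state
        self.Wxh = np.random.randn(hidden_dim, input_dim) * 0.01   # input -> hidden weights
        self.Whh = np.random.randn(hidden_dim, hidden_dim) * 0.01  # hidden -> hidden weights
        self.Why = np.random.randn(output_dim, hidden_dim) * 0.01  # hidden -> output weights

    def step(self, x):
        # The new state is a function of the previous state and the current input;
        # changing the weight matrices changes how the state evolves.
        self.h = np.tanh(self.Wxh @ x + self.Whh @ self.h)
        # An output vector is produced on top of the current state.
        return self.Why @ self.h

rnn = VanillaRNN(input_dim=8, hidden_dim=16, output_dim=4)
outputs = [rnn.step(np.random.randn(8)) for _ in range(5)]  # one output vector per timestep
print(len(outputs), outputs[0].shape)  # -> 5 (4,)
```

Note that `step` always consumes one input vector and returns one output vector; which of the Figure 1 patterns you obtain depends only on how often you feed inputs and read outputs.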