
Commit 655b346 (parent d19fbfc)

Update rnn.md

File tree: 1 file changed — rnn.md (18 additions, 15 deletions)
In this lecture note, we're going to be talking about Recurrent Neural Networks (RNNs). One
great thing about RNNs is that they offer a lot of flexibility in how we wire up the neural
network architecture. Normally when we're working with neural networks (Figure 1), we are given a
fixed-sized input vector (red), we process it with some hidden layers (green), and we produce a
fixed-sized output vector (blue), as depicted in the leftmost model ("Vanilla" Neural Networks) in Figure 1.
While **"Vanilla" Neural Networks** receive a single input and produce one label for that input, there are tasks where
the model must produce a sequence of outputs, as shown in the one-to-many model in Figure 1. **Recurrent Neural Networks** allow
us to operate over sequences of inputs, outputs, or both at the same time. An example of a **one-to-many** model is image
captioning, where we are given a fixed-sized image and, through an RNN, produce a sequence of words that describe the content of that image (second model in Figure 1).
An example of a **many-to-one** task is action prediction, where we look at a sequence of video frames instead of a single image and produce
a label for the action happening in the video, as shown in the third model in Figure 1. Another example of a many-to-one task is
sentiment classification in NLP, where we are given the sequence of words of a sentence and then classify its sentiment (e.g. positive or negative).
An example of a **many-to-many** task is video captioning, where the input is a sequence of video frames and the output is a caption describing
what happened in the video, as shown in the fourth model in Figure 1. Another example of a many-to-many task is machine translation in NLP, where an
RNN takes a sequence of words of a sentence in English and is asked to produce
a sequence of words of a sentence in French. There is also a variation of the many-to-many task, shown in the last model in Figure 1,
where the model generates an output at every timestep. An example of this many-to-many task is video classification at the frame level,
where the model classifies every single frame of the video with some number of classes. Note that we don't want
this prediction to be a function of only the current timestep (the current frame of the video), but also of all the timesteps (frames)
that have come before it in the video. In general, RNNs allow us to wire up an architecture where the prediction at every single timestep is a
function of all the timesteps that have come before.

<div class="fig figcenter fighighlight">
<img src="/assets/rnn/types.png" width="100%">
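To make these wiring patterns concrete, here is a minimal sketch of the three recurrent configurations in PyTorch. The note itself gives no code, so everything here is an illustrative assumption: the sizes (`feature_dim`, `hidden_dim`, etc.), the choice of `nn.RNN`, and the dummy start token are ours, not the author's.

```python
import torch
import torch.nn as nn

# Illustrative sizes -- assumptions, not taken from the note.
feature_dim, hidden_dim, num_classes, seq_len, batch = 64, 128, 10, 5, 1

rnn = nn.RNN(input_size=feature_dim, hidden_size=hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, num_classes)  # maps a hidden state to class scores

# many-to-one (e.g. action prediction): read a sequence of frame features,
# keep only the final hidden state, and classify it.
frames = torch.randn(batch, seq_len, feature_dim)
outputs, h_n = rnn(frames)            # outputs: (batch, seq_len, hidden_dim)
video_scores = head(h_n[-1])          # one prediction for the whole video

# many-to-many (e.g. frame-level video classification): classify every
# timestep; outputs[:, t] already depends on frames 0..t via the recurrence.
per_frame_scores = head(outputs)      # (batch, seq_len, num_classes)

# one-to-many (e.g. image captioning): condition on a single input by using
# the encoded image as the initial hidden state, then unroll step by step.
h = torch.randn(1, batch, hidden_dim)            # encoded image as h_0
step_input = torch.zeros(batch, 1, feature_dim)  # placeholder <start> token
word_scores = []
for t in range(seq_len):
    out, h = rnn(step_input, h)                  # one step of the recurrence
    word_scores.append(head(out[:, -1]))         # emit scores for one word
```

In a real captioner, `step_input` at step t would be the embedding of the word emitted at step t−1; the zero tensor here simply keeps the sketch self-contained.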
