rnn.md
+18 −15 (18 additions & 15 deletions)
@@ -21,22 +21,25 @@ Table of Contents:
  In this lecture note, we're going to be talking about Recurrent Neural Networks (RNNs). One
  great thing about RNNs is that they offer a lot of flexibility in how we wire up the neural
  network architecture. Normally when we're working with neural networks (Figure 1), we are given a fixed-sized
- input vector (red), then we process it with some hidden layers (green), and then we produce a
- fixed sized output vector (blue) as depicted in the leftmost model in Figure 1. Recurrent Neural
- Networks allow us to operate over sequences of input, output, or both at the same time. For
- example, in the case of image captioning, we are given a fixed sized image and then through an RNN
- we produce a sequence of words that describe the content of that image (second model in Figure 1).
- Or for example, in the case of sentiment classification in the NLP, we are given a sequence of words
- of the sentence and then we are trying to classify whether the sentiment of that sentence is
- positive or negative (third model in Figure 1). In the case of machine translation, we can have an
+ input vector (red), then we process it with some hidden layers (green), and we produce a
+ fixed-sized output vector (blue) as depicted in the leftmost model ("Vanilla" Neural Networks) in Figure 1.
+ While **"Vanilla" Neural Networks** receive a single input and produce one label for that input, there are tasks where
+ the model produces a sequence of outputs, as shown in the one-to-many model in Figure 1. **Recurrent Neural Networks** allow
+ us to operate over sequences of inputs, outputs, or both at the same time. An example of a **one-to-many** model is image
+ captioning, where we are given a fixed-sized image and produce, through an RNN, a sequence of words that describe the content of that image (second model in Figure 1).
+ An example of a **many-to-one** task is action prediction, where we look at a sequence of video frames instead of a single image and produce
+ a label for the action happening in the video, as shown in the third model in Figure 1. Another example of a many-to-one task is
+ sentiment classification in NLP, where we are given the sequence of words
+ of a sentence and then classify the sentiment (e.g. positive or negative) of that sentence.
+ An example of a **many-to-many** task is video captioning, where the input is a sequence of video frames and the output is a caption that describes
+ what was in the video, as shown in the fourth model in Figure 1. Another example of a many-to-many task is machine translation in NLP, where we can have an
  RNN that takes a sequence of words of a sentence in English, and then this RNN is asked to produce
- a sequence of words of a sentence in French, for example (forth model in Figure 1). As a last case,
- we can have a video classification RNN where we might imagine classifying every single frame of
- video with some number of classes, and most importantly we don't want the prediction to be only a
- function of the current timestep (current frame of the video), but also all the timesteps (frames)
- that have come before it in the video (rightmost model in Figure 1). In general Recurrent Neural
- Networks allow us to wire up an architecture, where the prediction at every single timestep is a
- function of all the timesteps that have come up to that point.
+ a sequence of words of a sentence in French. There is also a variation of the many-to-many task, as shown in the last model in Figure 1,
+ where the model generates an output at every timestep. An example of this many-to-many task is video classification on a frame level,
+ where the model classifies every single frame of the video with some number of classes. We should note that we don't want
+ this prediction to be only a function of the current timestep (the current frame of the video), but also of all the timesteps (frames)
+ that have come before it in the video. In general, RNNs allow us to wire up an architecture where the prediction at every single timestep is a
+ function of all the timesteps that have come before.
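
The recurrence the revised text describes — a per-timestep prediction that depends on every timestep seen so far — can be sketched minimally in NumPy. This is an illustrative sketch, not the note's exact formulation; the names `Wxh`, `Whh`, `Why`, and `rnn_step` are assumptions chosen for this example:

```python
import numpy as np

# Minimal vanilla RNN sketch (hypothetical names, random untrained weights).
# The hidden state h carries information from all previous timesteps, so the
# output at step t is a function of every input seen so far.
rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 8, 3

Wxh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden (the recurrence)
Why = rng.standard_normal((output_size, hidden_size)) * 0.1  # hidden -> output

def rnn_step(x, h):
    """One timestep: mix the current input with the running hidden state."""
    h = np.tanh(Wxh @ x + Whh @ h)  # new hidden state summarizes all inputs so far
    y = Why @ h                     # per-timestep prediction (the many-to-many case)
    return h, y

# Process a sequence of 5 input vectors, emitting one output per timestep.
h = np.zeros(hidden_size)
outputs = []
for x in rng.standard_normal((5, input_size)):
    h, y = rnn_step(x, h)
    outputs.append(y)

print(len(outputs), outputs[0].shape)
```

Dropping `Why` and keeping only the final `h` gives the many-to-one setting (e.g. sentiment classification), while feeding the previous output back in as the next input gives a one-to-many decoder (e.g. image captioning).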