
Commit 4268b1c

committed
RNN update
1 parent 4fe6304 commit 4268b1c

File tree

2 files changed: +71 −2 lines changed


assets/rnn/rnn_blackbox.png

24.6 KB

rnn.md

Lines changed: 71 additions & 2 deletions
@@ -7,6 +7,8 @@ permalink: /rnn/
Table of Contents:

- [Intro to RNN](#intro)
- [RNN example as Character-level language model](#char)
- [Long Short-Term Memory (LSTM)](#lstm)

@@ -31,9 +33,76 @@ a sequence of words of a sentence in French, for example (fourth model in Figure
we can have a video classification RNN where we might imagine classifying every single frame of
video with some number of classes, and most importantly we don't want the prediction to be only a
function of the current timestep (the current frame of the video), but also of all the timesteps
(frames) that have come before it in the video (rightmost model in Figure 1). In general, Recurrent
Neural Networks allow us to wire up an architecture where the prediction at every single timestep
is a function of all the timesteps that have come up to that point.

<div class="fig figcenter fighighlight">
<img src="/assets/rnn/types.png" width="100%">
<div class="figcaption">Figure 1. Different (non-exhaustive) types of Recurrent Neural Network architectures. Red boxes are input vectors. Green boxes are hidden layers. Blue boxes are output vectors.</div>
</div>

A Recurrent Neural Network is basically a black box (Figure 2): it maintains an internal state and
receives an input vector at every timestep. At every single timestep we feed an input vector into
the RNN, and it can modify that state as a function of what it receives. There are weights inside
the RNN, and when we tune those weights the RNN will have a different behavior in terms of how its
state evolves as it receives these inputs. Usually we are also interested in producing an output
based on the RNN state, so we can produce output vectors on top of the RNN (as depicted in Figure 2).

<div class="fig figcenter fighighlight">
<img src="/assets/rnn/rnn_blackbox.png" width="20%">
<div class="figcaption">Figure 2. Simplified RNN box.</div>
</div>

More precisely, an RNN can be represented as a recurrence formula of some function $$f_W$$ with
parameters $$W$$:

$$
h_t = f_W(h_{t-1}, x_t)
$$

where at every timestep it receives the previous state vector $$h_{t-1}$$ from timestep $$t-1$$ and
the current input vector $$x_t$$, and produces the current state vector $$h_t$$. The same function
is used at every single timestep: we have a fixed function $$f_W$$ with weights $$W$$, and we apply
that single function at every timestep. This allows us to use the Recurrent Neural Network on
sequences without having to commit to the length of the sequence, because the exact same function
is applied at every single timestep, no matter how long the input or output sequences are.
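
For example, unrolling the recurrence for the first three timesteps (starting from some initial
state $$h_0$$, e.g. a vector of zeros) makes it explicit that the very same $$f_W$$ is reused at
every step:

$$
h_3 = f_W(f_W(f_W(h_0, x_1), x_2), x_3)
$$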

In the simplest form of an RNN, which we call a Vanilla RNN, the network is just a single hidden
state $$h$$, and the recurrence formula tells us how to update the hidden state $$h$$ as a function
of the previous hidden state and the current input $$x_t$$. In particular, we have two weight
matrices $$W_{hh}$$ and $$W_{xh}$$, which project the hidden state from the previous timestep and
the current input $$x_t$$ respectively; these two projections are summed and squashed with a
$$\tanh$$ function to produce the updated hidden state $$h_t$$ at timestep $$t$$. This recurrence
tells us how $$h$$ will change as a function of its history and of the current input at this
timestep:

$$
h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t)
$$
We can base predictions on top of $$h$$, for example, by using just another matrix projection on top
87+
of the hidden state. This is the simplest complete case in which you can wire up a neural network:
88+

$$
y_t = W_{hy}h_t
$$
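
Putting the two formulas together, here is a minimal NumPy sketch of a single Vanilla RNN step. The
sizes and the random initialization are illustrative assumptions; only the roles and shapes of the
matrices `Whh`, `Wxh` and `Why` follow the notation $$W_{hh}$$, $$W_{xh}$$, $$W_{hy}$$ above:

```python
import numpy as np

hidden_size, input_size, output_size = 100, 50, 10  # illustrative sizes, not from the text

# Weight matrices playing the roles of W_hh, W_xh and W_hy
Whh = 0.01 * np.random.randn(hidden_size, hidden_size)
Wxh = 0.01 * np.random.randn(hidden_size, input_size)
Why = 0.01 * np.random.randn(output_size, hidden_size)

def rnn_step(h_prev, x):
    """One timestep of the Vanilla RNN recurrence."""
    h = np.tanh(Whh @ h_prev + Wxh @ x)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y = Why @ h                          # y_t = W_hy h_t
    return h, y

# The same step function is applied at every timestep, whatever the sequence length
h = np.zeros(hidden_size)
for x_t in [np.random.randn(input_size) for _ in range(5)]:
    h, y = rnn_step(h, x_t)
```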

So far we have described the RNN in terms of abstract vectors $$x, h, y$$; in the following section
we will endow these vectors with concrete semantics.

<a name='char'></a>

## RNN example as Character-level language model

<a name='lstm'></a>

## Long Short-Term Memory (LSTM)
