
Commit 92a0cdd

RNN update
1 parent 5d78e37 commit 92a0cdd

File tree

5 files changed: 44 additions, 2 deletions

assets/rnn/lstm_mformula_1.png (65.4 KB)
assets/rnn/lstm_mformula_2.png (29 KB)
assets/rnn/vanilla_rnn_mformula_1.png (25.3 KB)
assets/rnn/vanilla_rnn_mformula_2.png (13.4 KB)

rnn.md

Lines changed: 44 additions & 2 deletions
@@ -94,13 +94,21 @@
$$
h_t = \text{tanh}(W_{hh}h_{t-1} + W_{xh}x_t)
$$

<div class="fig figcenter">
<img src="/assets/rnn/vanilla_rnn_mformula_1.png" width="80%" >
</div>

We can base predictions on $$h_t$$ by applying just another matrix projection to the hidden state.
This is the simplest complete case in which you can wire up a neural network:

$$
y_t = W_{hy}h_t
$$

<div class="fig figcenter">
<img src="/assets/rnn/vanilla_rnn_mformula_2.png" width="40%" >
</div>
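
To make the recurrence concrete, here is a minimal NumPy sketch of a single step; the sizes, weight
names, and random initialization are illustrative assumptions, not part of the formulation above.

```python
import numpy as np

n, d, k = 4, 3, 2                          # hidden size n, input size d, output size k (illustrative)
rng = np.random.default_rng(0)
W_hh = 0.1 * rng.standard_normal((n, n))   # hidden-to-hidden weights
W_xh = 0.1 * rng.standard_normal((n, d))   # input-to-hidden weights
W_hy = 0.1 * rng.standard_normal((k, n))   # hidden-to-output projection

def rnn_step(h_prev, x_t):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy @ h_t                           # y_t = W_hy h_t
    return h_t, y_t

h = np.zeros(n)                            # initial hidden state
for x_t in rng.standard_normal((5, d)):    # a length-5 sequence of random inputs
    h, y_t = rnn_step(h, x_t)
```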

So far we have shown the RNN in terms of abstract vectors $$x, h, y$$; in the following section we
endow these vectors with semantics.

@@ -238,5 +246,39 @@
In practice, we can treat the exploding gradient problem with gradient clipping, which clips large gradient values to a maximum threshold. However, the vanishing gradient problem still exists in cases where the largest singular value of the $$W_{hh}$$ matrix is less than one, and the LSTM was designed to avoid this problem.
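
As a minimal sketch of the clipping idea, assuming the element-wise variant described above (the
threshold value is an arbitrary illustrative choice; rescaling the whole gradient by its norm is a
common alternative):

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    # Clip each gradient entry to [-threshold, threshold]; the threshold is illustrative.
    return np.clip(grad, -threshold, threshold)
```
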
### LSTM Formulation

The following is the precise formulation of the LSTM. At step $$t$$, there is a hidden state $$h_t$$ and
a cell state $$c_t$$; both are vectors of size $$n$$. One distinction between the LSTM and the vanilla RNN
is this additional cell state $$c_t$$, which can intuitively be thought of as storing long-term information.
The LSTM can read, erase, and write information to and from this cell, and it alters $$c_t$$ through three
special gates $$i, f, o$$, which correspond to the “input”, “forget”, and “output” gates. The values of
these gates vary from closed (0) to open (1). All three gates are vectors of size $$n$$.

At every timestep we have an input vector $$x_t$$, the previous hidden state $$h_{t-1}$$, and the previous
cell state $$c_{t-1}$$; from these, the LSTM computes the hidden state $$h_t$$ and cell state $$c_t$$ at
timestep $$t$$ as follows:

$$
\begin{aligned}
f_t &= \sigma(W_{hf}h_{t-1} + W_{xf}x_t) \\
i_t &= \sigma(W_{hi}h_{t-1} + W_{xi}x_t) \\
o_t &= \sigma(W_{ho}h_{t-1} + W_{xo}x_t) \\
g_t &= \text{tanh}(W_{hg}h_{t-1} + W_{xg}x_t) \\
\end{aligned}
$$

<div class="fig figcenter">
<img src="/assets/rnn/lstm_mformula_1.png" width="50%" >
</div>

$$
\begin{aligned}
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \text{tanh}(c_t) \\
\end{aligned}
$$

<div class="fig figcenter">
<img src="/assets/rnn/lstm_mformula_2.png" width="40%" >
</div>
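
To connect the equations to code, here is a minimal NumPy sketch of one LSTM step. The weight names
mirror the formulas; the sizes and random initialization are illustrative assumptions.

```python
import numpy as np

n, d = 4, 3                                # cell/hidden size n, input size d (illustrative)
rng = np.random.default_rng(0)
# One (W_h*, W_x*) pair per gate, plus the candidate update g.
W_hf, W_xf = 0.1 * rng.standard_normal((n, n)), 0.1 * rng.standard_normal((n, d))
W_hi, W_xi = 0.1 * rng.standard_normal((n, n)), 0.1 * rng.standard_normal((n, d))
W_ho, W_xo = 0.1 * rng.standard_normal((n, n)), 0.1 * rng.standard_normal((n, d))
W_hg, W_xg = 0.1 * rng.standard_normal((n, n)), 0.1 * rng.standard_normal((n, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t):
    f_t = sigmoid(W_hf @ h_prev + W_xf @ x_t)   # forget gate: what to erase from the cell
    i_t = sigmoid(W_hi @ h_prev + W_xi @ x_t)   # input gate: what to write to the cell
    o_t = sigmoid(W_ho @ h_prev + W_xo @ x_t)   # output gate: what to read from the cell
    g_t = np.tanh(W_hg @ h_prev + W_xg @ x_t)   # candidate cell content
    c_t = f_t * c_prev + i_t * g_t              # c_t = f ⊙ c_{t-1} + i ⊙ g
    h_t = o_t * np.tanh(c_t)                    # h_t = o ⊙ tanh(c_t)
    return h_t, c_t
```

Note that, exactly as in the formulas above, no bias terms appear here; practical implementations
usually add one per gate.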
