h_t = \text{tanh}(W_{hh}h_{t-1} + W_{xh}x_t)
$$

<div class="fig figcenter">
  <img src="/assets/rnn/vanilla_rnn_mformula_1.png" width="80%">
</div>

We can make predictions on top of $$h_t$$ by applying just one more matrix projection to the
hidden state. This is the simplest complete recurrent neural network we can wire up:

$$
y_t = W_{hy}h_t
$$

<div class="fig figcenter">
  <img src="/assets/rnn/vanilla_rnn_mformula_2.png" width="40%">
</div>

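To make the recurrence concrete, here is a minimal NumPy sketch of a single Vanilla RNN step followed
by the output projection. The function and variable names are purely illustrative and not taken from
any particular library:

```python
import numpy as np

def vanilla_rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One Vanilla RNN step: update the hidden state, then project it to an output."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy @ h_t                           # y_t = W_hy h_t
    return h_t, y_t

# Toy example: hidden size 4, input size 3, output size 2, sequence length 5.
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):
    h, y = vanilla_rnn_step(x, h, W_xh, W_hh, W_hy)
```
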
So far we have described the RNN in terms of abstract vectors $$x, h, y$$; in the following section
we endow these vectors with concrete semantics.

In practice, we can address the exploding gradient problem with gradient clipping, which rescales
gradient values that exceed a maximum threshold. However, the vanishing gradient problem still exists
whenever the largest singular value of the $$W_{hh}$$ matrix is less than one, and the LSTM was designed
to avoid this problem.
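
As a rough illustration, here is a minimal NumPy sketch of clipping a list of gradient arrays by their
global L2 norm; the threshold value is an arbitrary choice for the example:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale all gradients so that their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```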

### LSTM Formulation

The following is the precise formulation of the LSTM. On step $$t$$, there is a hidden state $$h_t$$ and
a cell state $$c_t$$; both are vectors of size $$n$$. One distinction between the LSTM and the
Vanilla RNN is this additional cell state $$c_t$$, which can intuitively be thought of as storing
long-term information. The LSTM can read, erase, and write information to and from the cell $$c_t$$.
It alters $$c_t$$ through three special gates, $$i, f, o$$, which correspond to the “input”,
“forget”, and “output” gates. The values of these gates vary from closed (0) to open (1), and all
three gates are vectors of size $$n$$.

At every timestep we have an input vector $$x_t$$, the previous hidden state $$h_{t-1}$$, and the previous
cell state $$c_{t-1}$$. The LSTM computes the next hidden state $$h_t$$ and the next cell state $$c_t$$ at
timestep $$t$$ as follows:

$$
\begin{aligned}
f_t &= \sigma(W_{hf}h_{t-1} + W_{xf}x_t) \\
i_t &= \sigma(W_{hi}h_{t-1} + W_{xi}x_t) \\
o_t &= \sigma(W_{ho}h_{t-1} + W_{xo}x_t) \\
g_t &= \text{tanh}(W_{hg}h_{t-1} + W_{xg}x_t) \\
\end{aligned}
$$

<div class="fig figcenter">
  <img src="/assets/rnn/lstm_mformula_1.png" width="50%">
</div>

$$
\begin{aligned}
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \text{tanh}(c_t) \\
\end{aligned}
$$

<div class="fig figcenter">
  <img src="/assets/rnn/lstm_mformula_2.png" width="40%">
</div>
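
Putting the formulation together, here is a minimal NumPy sketch of one LSTM step. Biases are omitted,
as in the equations above, and the parameter names simply mirror the weight matrices in the formulas:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev,
              W_hf, W_xf, W_hi, W_xi, W_ho, W_xo, W_hg, W_xg):
    """One LSTM step: compute the gates, update the cell state, expose the hidden state."""
    f_t = sigmoid(W_hf @ h_prev + W_xf @ x_t)  # forget gate: what to erase from c_{t-1}
    i_t = sigmoid(W_hi @ h_prev + W_xi @ x_t)  # input gate: what to write to the cell
    o_t = sigmoid(W_ho @ h_prev + W_xo @ x_t)  # output gate: what to reveal in h_t
    g_t = np.tanh(W_hg @ h_prev + W_xg @ x_t)  # candidate new cell content
    c_t = f_t * c_prev + i_t * g_t             # c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
    h_t = o_t * np.tanh(c_t)                   # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t
```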