@@ -8,6 +8,7 @@ Table of Contents:

- [Intro to RNN](#intro)
- [RNN example as Character-level language model](#char)
+ - [Multilayer RNNs](#multi)
- [Long-Short Term Memory (LSTM)](#lstm)


@@ -167,6 +168,31 @@ how to scale up the training of the model over larger training dataset.



+ <a name='multi'></a>
+
+ ## Multilayer RNNs
+
+ So far we have only shown RNNs with just one layer. However, we are not limited to single-layer
+ architectures. In practice, RNNs are often used in a more complex manner: they can be stacked in
+ multiple layers, which gives the model more depth, and empirically deeper architectures tend to
+ work better (Figure 4).
+
+ <div class="fig figcenter fighighlight">
+ <img src="/assets/rnn/multilayer_rnn.png" width="40%">
+ <div class="figcaption">Figure 4. Multilayer RNN example.</div>
+ </div>
+
+ For example, Figure 4 shows three separate RNNs, each with its own set of weights, stacked on top of
+ each other: the input of the second RNN (the second RNN layer in Figure 4) is the hidden state vector
+ of the first RNN (the first RNN layer in Figure 4). All stacked RNNs are trained jointly, and the
+ diagram in Figure 4 represents one computational graph.
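+
+ As a rough sketch (not code from these notes; all sizes and variable names below are illustrative
+ assumptions), one forward time step of such a stack can be written in plain numpy: each layer keeps
+ its own `Wxh`, `Whh` and `bh` parameters, and the hidden state of layer `l` is fed in as the input
+ of layer `l+1`.
+
+ ```python
+ import numpy as np
+
+ # Illustrative sizes, not taken from the notes.
+ input_size, hidden_size, num_layers = 10, 20, 3
+
+ np.random.seed(0)
+ layers = []
+ for l in range(num_layers):
+     in_dim = input_size if l == 0 else hidden_size  # layer 0 reads x, higher layers read the layer below
+     layers.append({
+         'Wxh': np.random.randn(hidden_size, in_dim) * 0.01,      # input-to-hidden weights
+         'Whh': np.random.randn(hidden_size, hidden_size) * 0.01, # hidden-to-hidden weights
+         'bh':  np.zeros((hidden_size, 1)),                       # hidden bias
+     })
+
+ def step(x, h_prev):
+     """One time step for the whole stack; h_prev holds one hidden state per layer."""
+     h_new, inp = [], x
+     for l, p in enumerate(layers):
+         h = np.tanh(np.dot(p['Wxh'], inp) + np.dot(p['Whh'], h_prev[l]) + p['bh'])
+         h_new.append(h)
+         inp = h  # this layer's hidden state becomes the next layer's input
+     return h_new
+
+ # Usage: unroll the stack for a few time steps.
+ h = [np.zeros((hidden_size, 1)) for _ in range(num_layers)]
+ for t in range(5):
+     x_t = np.random.randn(input_size, 1)
+     h = step(x_t, h)
+ ```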
+
+
<a name='lstm'></a>

## Long-Short Term Memory (LSTM)
+
+ So far we have seen only a simple recurrence formula for the Vanilla RNN. In practice, we will
+ rarely ever use the Vanilla RNN formula. Instead, we will use what we call a Long-Short Term Memory
+ (LSTM) RNN.
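+
+ As a reminder, the entire Vanilla RNN recurrence is a single tanh update of the hidden state. The
+ minimal sketch below (shapes and names are illustrative assumptions, mirroring the multilayer sketch
+ above) shows the formula that the LSTM will replace.
+
+ ```python
+ import numpy as np
+
+ hidden_size, input_size = 20, 10
+ Wxh = np.random.randn(hidden_size, input_size) * 0.01   # input-to-hidden weights
+ Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden-to-hidden weights
+ bh  = np.zeros((hidden_size, 1))                        # hidden bias
+
+ h = np.zeros((hidden_size, 1))      # previous hidden state
+ x = np.random.randn(input_size, 1)  # current input
+ h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)  # the whole Vanilla RNN recurrence
+ ```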