Lecture 10 Flashcards

Recurrent Neural Network (RNN) and the Vanishing Gradient

1
Q

Recurrent Neural Network (RNN)

A

A Recurrent Neural Network (RNN) is a type of artificial neural network used for processing sequential data such as time series, speech, and text. Unlike feedforward neural networks, RNNs have loops that allow information to persist, which makes them well suited for tasks that require context or memory, such as language modeling, machine translation, and speech recognition. RNNs are trained with backpropagation through time (BPTT), a variant of the backpropagation algorithm used to train feedforward networks.

2
Q

Limitations of standard feedforward networks:

A

* They accept only a fixed-size vector as input and produce a fixed-size vector as output (e.g., probabilities of different classes).
* They use a fixed number of computational steps (e.g., the number of layers in the model).

3
Q

Recurrent Neural Networks

A

Recurrent Neural Networks are networks with loops, allowing information to persist.

4
Q

RNN hidden-state update formula

A

h_t = f_W(h_{t-1}, x_t)

where h_t is the new hidden state, f_W is the function parameterized by the weights W, h_{t-1} is the previous hidden state, and x_t is the input at time step t.
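A minimal NumPy sketch of this recurrence, under the common vanilla-RNN assumption that f_W(h_{t-1}, x_t) = tanh(W_hh · h_{t-1} + W_xh · x_t + b); the dimensions and weight names here are illustrative, not from the lecture.

    import numpy as np

    input_dim, hidden_dim, timesteps = 4, 3, 5  # illustrative sizes

    rng = np.random.default_rng(0)
    W_xh = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
    W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
    b = np.zeros(hidden_dim)

    x = rng.normal(size=(timesteps, input_dim))  # one input sequence x_1 ... x_T
    h = np.zeros(hidden_dim)                     # initial hidden state h_0

    # The same weights are reused at every time step: h_t = f_W(h_{t-1}, x_t)
    for t in range(timesteps):
        h = np.tanh(W_hh @ h + W_xh @ x[t] + b)

    print(h)  # final hidden state h_T

Note that a single set of weights (W_hh, W_xh, b) is shared across all time steps, which is what allows the input length to vary.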

5
Q

How many parameters does the Dense (output) layer have?

A

Output dimension: y

Hidden-state (input) dimension: h

Bias dimension: y

Parameters = shape(y) × shape(h) + shape(y), i.e. the kernel (y × h weights) plus the bias (y weights)
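A hedged Keras check of this count, assuming an illustrative hidden-state dimension h = 16 and output dimension y = 3, so the Dense layer should have 3 × 16 + 3 = 51 parameters:

    from tensorflow import keras

    h, y = 16, 3  # hidden-state and output dimensions (illustrative)

    dense = keras.layers.Dense(y)
    dense.build(input_shape=(None, h))  # kernel has shape (h, y), bias has shape (y,)

    print(dense.count_params())  # 51
    print(y * h + y)             # 51, i.e. shape(y) * shape(h) + shape(y)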

6
Q

Backpropagation in RNNs

A

A recurrent neural network can be imagined as multiple copies of the same network, each passing a message to a successor. Unrolling the loop in this way gives a schematic view of the computation, and it is over this unrolled network that backpropagation through time (BPTT) computes gradients, with the same weights shared across every time step.

7
Q

Vanishing Gradient Problem

A

Gradients shrink as they are propagated back through many time steps, so words from time steps far away are no longer as influential as they should be.

Example:
Michael and Jane met last Saturday. It was a nice sunny day when they saw each other in the park. Michael just saw the doctor two weeks ago. Jane came back from Norway last Monday. Jane offered her best wish to _________. (Predicting "Michael" here depends on context from many time steps earlier.)
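A small NumPy sketch of why this happens, under the simplifying assumption of a scalar hidden state with a tanh activation: backpropagation through time multiplies one tanh derivative (at most 1, usually much smaller) into the gradient per time step, so the signal reaching distant words decays roughly geometrically. (The recurrent weight factor is ignored here for simplicity.)

    import numpy as np

    def tanh_derivative(z):
        return 1.0 - np.tanh(z) ** 2  # always <= 1

    rng = np.random.default_rng(1)
    pre_activations = rng.normal(size=50)  # assumed pre-activation values over 50 time steps

    grad = 1.0
    for t, z in enumerate(pre_activations, start=1):
        grad *= tanh_derivative(z)  # one multiplicative factor per step of BPTT
        if t in (1, 5, 10, 25, 50):
            print(f"after {t:2d} steps the gradient factor is about {grad:.1e}")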

8
Q

Networks with Memory

A
  • A vanilla RNN operates in a “multiplicative” way (repeated tanh or sigmoid activations) to remember previous inputs
  • This can work acceptably if we only need short-term memory
  • Using ReLU can alleviate the vanishing-gradient problem, since its derivative is 1 in the positive region (see the sketch below)
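A quick contrast with the sketch above, assuming the pre-activations stay in ReLU's positive (linear) region: the per-step derivative is exactly 1, so the product of factors does not decay the way repeated tanh derivatives do.

    import numpy as np

    steps = 50
    tanh_factors = 1.0 - np.tanh(np.full(steps, 1.0)) ** 2  # tanh'(1) is about 0.42 at every step
    relu_factors = np.ones(steps)                           # ReLU'(z) = 1 for z > 0

    print(np.prod(tanh_factors))  # vanishes after 50 steps
    print(np.prod(relu_factors))  # stays exactly 1.0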
9
Q

Networks with Memory

A

To extend memory beyond the short term:
* Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997)
* Gated Recurrent Unit (GRU) (Cho et al., 2014)
* Both designs process information in an “additive” way, with gates to control information flow (see the sketch below).
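Both are available as drop-in recurrent layers in Keras; a minimal hedged sketch, with illustrative layer sizes and input shape:

    from tensorflow import keras

    def make_model(recurrent_layer):
        # Illustrative shapes: sequences of 20 steps with 8 features, binary output.
        return keras.Sequential([
            keras.Input(shape=(20, 8)),
            recurrent_layer,
            keras.layers.Dense(1, activation="sigmoid"),
        ])

    lstm_model = make_model(keras.layers.LSTM(16))  # Hochreiter and Schmidhuber, 1997
    gru_model = make_model(keras.layers.GRU(16))    # Cho et al., 2014
    lstm_model.summary()
    gru_model.summary()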

10
Q

Text Generation with RNNs

A

Text generation is a natural candidate for sequential learning: “Based on what was said
before, what’s the next thing that will be (or should be) said?” Because RNNs are good for
using variable length, sequential inputs to predict the output, they are well suited to text
generation tasks where initial “seed” text is used to generate new text.
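A hedged sketch of that seed-and-extend loop with a tiny character vocabulary; the model here is untrained (so its output is gibberish) and exists only so the generation loop is runnable end to end.

    import numpy as np
    from tensorflow import keras

    vocab = list("abcdefgh ")  # toy character vocabulary (illustrative)
    char_to_id = {c: i for i, c in enumerate(vocab)}
    id_to_char = {i: c for c, i in char_to_id.items()}

    seq_len = 8
    model = keras.Sequential([
        keras.Input(shape=(seq_len,)),
        keras.layers.Embedding(len(vocab), 16),
        keras.layers.SimpleRNN(32),
        keras.layers.Dense(len(vocab), activation="softmax"),
    ])

    def generate(seed, n_chars=20):
        text = seed
        for _ in range(n_chars):
            window = text[-seq_len:].rjust(seq_len)           # last seq_len characters, space-padded
            ids = np.array([[char_to_id[c] for c in window]])
            probs = model.predict(ids, verbose=0)[0].astype("float64")
            probs /= probs.sum()                              # renormalize so probabilities sum to 1
            next_id = int(np.random.choice(len(vocab), p=probs))
            text += id_to_char[next_id]                       # append and feed back in
        return text

    print(generate("abc "))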

11
Q

Different Varieties of Sequence Modeling

A

Input: Scalar
Output: Scalar

“Standard” classification/regression problems: this is not sequence modeling.

12
Q

Different Varieties of Sequence Modeling

A

Input: Scalar
Output: Sequence

Example: image to text; question answering; skip-gram analysis

13
Q

Different Varieties of Sequence Modeling

A

Input: Sequence
Output: Scalar

Example: sentence classification, multiple-choice question answering

14
Q

Different Varieties of Sequence Modeling

A

Input: Sequence
Output: Sequence

Example: machine translation, video captioning, open-ended question answering, video question answering
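A hedged Keras sketch of the two sequence-input varieties, using illustrative shapes: with return_sequences=False (the default) the RNN layer emits one vector per sequence (sequence to scalar), while return_sequences=True plus a per-step Dense head emits one output per time step (one common sequence-to-sequence layout; encoder-decoder models used for translation are a further variation).

    import numpy as np
    from tensorflow import keras

    timesteps, features = 10, 6  # illustrative input shape

    # Sequence -> scalar (e.g. sentence classification): keep only the last hidden state.
    seq_to_scalar = keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        keras.layers.SimpleRNN(16),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    # Sequence -> sequence (per-step outputs): keep every hidden state.
    seq_to_seq = keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        keras.layers.SimpleRNN(16, return_sequences=True),
        keras.layers.TimeDistributed(keras.layers.Dense(5, activation="softmax")),
    ])

    x = np.zeros((2, timesteps, features), dtype="float32")  # dummy batch of 2 sequences
    print(seq_to_scalar(x).shape)  # (2, 1)
    print(seq_to_seq(x).shape)     # (2, 10, 5)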

15
Q

Bigram Language Model vs. RNN

A
  • Practical bigram language models require the simplifying Markov assumption: the prediction of the next token depends only on the last predicted token
  • The probability of a sequence Y is then simply a chain of bigram factors, p(y1) p(y2|y1) p(y3|y2) … (sketched numerically below)
  • Long-range dependencies are lost
  • In contrast, an RNN conditions each prediction on the current input and the entirety of the foregoing sequence: p(yt | y1, …, yt-1)
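A small numeric sketch of the bigram chain under the Markov assumption; the probability table is invented purely for illustration.

    # Toy bigram table p(next | previous); values are made up for illustration.
    bigram = {
        ("<s>", "jane"): 0.2,
        ("jane", "offered"): 0.1,
        ("offered", "her"): 0.4,
        ("her", "best"): 0.3,
    }

    # Markov assumption: p(y1, ..., yn) = p(y1) * p(y2 | y1) * p(y3 | y2) * ...
    # Long-range context (e.g. who the wish is for) never enters the product.
    tokens = ["<s>", "jane", "offered", "her", "best"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= bigram[(prev, curr)]
    print(prob)  # 0.2 * 0.1 * 0.4 * 0.3 ≈ 0.0024
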
16
Q

Keras RNN Modules

A

This week we have examined the vanilla RNN, which is implemented in Keras as the SimpleRNN class. Although this vanilla RNN is rarely used in production, for some problems it can produce comparable results to LSTM and GRU while training fewer weights.
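A hedged Keras check of the “fewer weights” point, with illustrative sizes: for the same number of units, LSTM and GRU layers carry several times more parameters than SimpleRNN because of their gates.

    from tensorflow import keras

    timesteps, features, units = 10, 8, 32  # illustrative sizes

    for layer_cls in (keras.layers.SimpleRNN, keras.layers.LSTM, keras.layers.GRU):
        model = keras.Sequential([
            keras.Input(shape=(timesteps, features)),
            layer_cls(units),
        ])
        # SimpleRNN has one weight set; LSTM has four gates, GRU three.
        print(layer_cls.__name__, model.count_params())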

17
Q

Keras Models Including Today’s Class

A
  • Recurrent networks are the third type of supervised deep learning model we have created with Keras so far
  • The models all show the same basic underlying architectural approach, with enhancements to support convolution and recurrence
  • Today’s class covered “vanilla” RNNs and next week we will consider LSTMs and GRUs
18
Q

Simple RNN (From Keras Doc)

A

Key Features
* Vanilla RNN, susceptible to the vanishing or exploding gradient problem as previously discussed
* Uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far
* Can process an input sequence in reverse, via the go_backwards argument
* Supports loop unrolling (which can lead to a large speedup when processing short sequences on CPU), via the unroll argument
* By default, the output of an RNN layer contains a single vector per sample. This vector is the RNN cell output corresponding to the last timestep, containing information about the entire input sequence.

19
Q

Simple RNN (From Keras Doc)

A

Key Arguments
* units: Positive integer, dimensionality of the output space.
* activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. “linear” activation: a(x) = x).
* use_bias: Boolean (default True), whether the layer uses a bias vector.
* kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: glorot_uniform.
* recurrent_initializer: Initializer for the recurrent kernel weights matrix, used for the linear transformation of the recurrent state. Default: orthogonal.
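A hedged sketch putting these features and arguments together; the batch, shapes, and argument values are illustrative, and the defaults shown are those quoted above from the Keras documentation.

    import numpy as np
    from tensorflow import keras

    inputs = np.random.random((4, 10, 6)).astype("float32")  # 4 sequences, 10 timesteps, 6 features

    layer = keras.layers.SimpleRNN(
        units=32,                             # dimensionality of the output space
        activation="tanh",                    # default activation
        use_bias=True,                        # default
        kernel_initializer="glorot_uniform",  # input-to-hidden weights
        recurrent_initializer="orthogonal",   # hidden-to-hidden weights
        go_backwards=False,                   # set True to process the sequence in reverse
        unroll=False,                         # set True to unroll the loop (short sequences, CPU)
    )

    # By default only the last timestep's output vector is returned per sample.
    print(layer(inputs).shape)  # (4, 32)

    # With return_sequences=True, every timestep's output is returned.
    print(keras.layers.SimpleRNN(32, return_sequences=True)(inputs).shape)  # (4, 10, 32)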

20
Q

Keras RNN Architectures

A
  • Having selected a recurrent approach, there are still a variety of decisions to be made about configuring a model
  • This lab demonstrates comparisons between a few of these options on a prediction task
  • The diagnostics demonstrated here are good to include in your own project, as a way of showing that the model configuration fits what you intended to do