Lecture 10 Flashcards

Recurrent Neural Network (RNN) and the Vanishing Gradient

1
Q

Recurrent Neural Network (RNN)

A

A Recurrent Neural Network (RNN) is a type of artificial neural network used for processing sequential data such as time series, speech, and text. Unlike feedforward neural networks, RNNs have loops that allow information to persist, which makes them well suited for tasks that require context or memory, such as language modeling, machine translation, and speech recognition. RNNs are trained with backpropagation through time (BPTT), a variant of the backpropagation algorithm used to train feedforward networks.

2
Q

Limitations of standard feedforward networks:

A

* They accept only a fixed-size vector as input and produce a fixed-size vector as output (e.g., probabilities of different classes).
* They use a fixed number of computational steps (e.g., the number of layers in the model).

3
Q

Recurrent Neural Networks

A

Recurrent Neural Networks are networks with loops, allowing information to persist.

4
Q

RNN hidden-state update formula

A

h_t = f_W(h_{t-1}, x_t)

where h_t is the new hidden state, f_W is the function parameterized by the weights W, h_{t-1} is the previous hidden state, and x_t is the input at time step t.
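A minimal NumPy sketch of this recurrence, under the common vanilla-RNN assumption that f_W(h_{t-1}, x_t) = tanh(W_hh · h_{t-1} + W_xh · x_t + b); the dimensions and weight names here are illustrative, not from the lecture.

    import numpy as np

    input_dim, hidden_dim, timesteps = 4, 3, 5  # illustrative sizes

    rng = np.random.default_rng(0)
    W_xh = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
    W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
    b = np.zeros(hidden_dim)

    x = rng.normal(size=(timesteps, input_dim))  # one input sequence x_1 ... x_T
    h = np.zeros(hidden_dim)                     # initial hidden state h_0

    # The same weights are reused at every time step: h_t = f_W(h_{t-1}, x_t)
    for t in range(timesteps):
        h = np.tanh(W_hh @ h + W_xh @ x[t] + b)

    print(h)  # final hidden state h_T

Note that a single set of weights (W_hh, W_xh, b) is shared across all time steps, which is what allows the input length to vary.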

5
Q

How many parameters does the Dense (output) layer have?

A

Output dimension: y

Hidden-state (input) dimension: h

Bias dimension: y

Parameters = shape(y) × shape(h) + shape(y), i.e. the kernel (y × h weights) plus the bias (y weights)
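A hedged Keras check of this count, assuming an illustrative hidden-state dimension h = 16 and output dimension y = 3, so the Dense layer should have 3 × 16 + 3 = 51 parameters:

    from tensorflow import keras

    h, y = 16, 3  # hidden-state and output dimensions (illustrative)

    dense = keras.layers.Dense(y)
    dense.build(input_shape=(None, h))  # kernel has shape (h, y), bias has shape (y,)

    print(dense.count_params())  # 51
    print(y * h + y)             # 51, i.e. shape(y) * shape(h) + shape(y)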

6
Q

Backpropagation in RNNs

A

A recurrent neural network can be imagined as multiple copies of the same network, each passing a message to a successor. Unrolling the loop in this way gives a schematic view of the computation, and it is over this unrolled network that backpropagation through time (BPTT) computes gradients, with the same weights shared across every time step.

7
Q

Vanishing Gradient Problem

A

Gradients shrink as they are propagated back through many time steps, so words from time steps far away are no longer as influential as they should be.

Example:
Michael and Jane met last Saturday. It was a nice sunny day when they saw each other in the park. Michael just saw the doctor two weeks ago. Jane came back from Norway last Monday. Jane offered her best wish to _________. (Predicting "Michael" here depends on context from many time steps earlier.)
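A small NumPy sketch of why this happens, under the simplifying assumption of a scalar hidden state with a tanh activation: backpropagation through time multiplies one tanh derivative (at most 1, usually much smaller) into the gradient per time step, so the signal reaching distant words decays roughly geometrically. (The recurrent weight factor is ignored here for simplicity.)

    import numpy as np

    def tanh_derivative(z):
        return 1.0 - np.tanh(z) ** 2  # always <= 1

    rng = np.random.default_rng(1)
    pre_activations = rng.normal(size=50)  # assumed pre-activation values over 50 time steps

    grad = 1.0
    for t, z in enumerate(pre_activations, start=1):
        grad *= tanh_derivative(z)  # one multiplicative factor per step of BPTT
        if t in (1, 5, 10, 25, 50):
            print(f"after {t:2d} steps the gradient factor is about {grad:.1e}")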

8
Q

Networks with Memory

A
  • A vanilla RNN operates in a “multiplicative” way (repeated tanh or sigmoid activations) to remember previous inputs
  • This can work acceptably if we only need short-term memory
  • Using ReLU can alleviate the vanishing-gradient problem, since its derivative is 1 in the positive region (see the sketch below)
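A quick contrast with the sketch above, assuming the pre-activations stay in ReLU's positive (linear) region: the per-step derivative is exactly 1, so the product of factors does not decay the way repeated tanh derivatives do.

    import numpy as np

    steps = 50
    tanh_factors = 1.0 - np.tanh(np.full(steps, 1.0)) ** 2  # tanh'(1) is about 0.42 at every step
    relu_factors = np.ones(steps)                           # ReLU'(z) = 1 for z > 0

    print(np.prod(tanh_factors))  # vanishes after 50 steps
    print(np.prod(relu_factors))  # stays exactly 1.0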
9
Q

Networks with Memory

A

To extend memory beyond the short term:
* Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997)
* Gated Recurrent Unit (GRU) (Cho et al., 2014)
* Both designs process information in an “additive” way, with gates to control information flow (see the sketch below).
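Both are available as drop-in recurrent layers in Keras; a minimal hedged sketch, with illustrative layer sizes and input shape:

    from tensorflow import keras

    def make_model(recurrent_layer):
        # Illustrative shapes: sequences of 20 steps with 8 features, binary output.
        return keras.Sequential([
            keras.Input(shape=(20, 8)),
            recurrent_layer,
            keras.layers.Dense(1, activation="sigmoid"),
        ])

    lstm_model = make_model(keras.layers.LSTM(16))  # Hochreiter and Schmidhuber, 1997
    gru_model = make_model(keras.layers.GRU(16))    # Cho et al., 2014
    lstm_model.summary()
    gru_model.summary()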

10
Q

Text Generation with RNNs

A

Text generation is a natural candidate for sequential learning: “Based on what was said
before, what’s the next thing that will be (or should be) said?” Because RNNs are good for
using variable length, sequential inputs to predict the output, they are well suited to text
generation tasks where initial “seed” text is used to generate new text.
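A hedged sketch of that seed-and-extend loop with a tiny character vocabulary; the model here is untrained (so its output is gibberish) and exists only so the generation loop is runnable end to end.

    import numpy as np
    from tensorflow import keras

    vocab = list("abcdefgh ")  # toy character vocabulary (illustrative)
    char_to_id = {c: i for i, c in enumerate(vocab)}
    id_to_char = {i: c for c, i in char_to_id.items()}

    seq_len = 8
    model = keras.Sequential([
        keras.Input(shape=(seq_len,)),
        keras.layers.Embedding(len(vocab), 16),
        keras.layers.SimpleRNN(32),
        keras.layers.Dense(len(vocab), activation="softmax"),
    ])

    def generate(seed, n_chars=20):
        text = seed
        for _ in range(n_chars):
            window = text[-seq_len:].rjust(seq_len)           # last seq_len characters, space-padded
            ids = np.array([[char_to_id[c] for c in window]])
            probs = model.predict(ids, verbose=0)[0].astype("float64")
            probs /= probs.sum()                              # renormalize so probabilities sum to 1
            next_id = int(np.random.choice(len(vocab), p=probs))
            text += id_to_char[next_id]                       # append and feed back in
        return text

    print(generate("abc "))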

11
Q

Different Varieties of Sequence Modeling

A

Input: Scalar
Output: Scalar

“Standard” classification/regression problems: this is not sequence modeling.

12
Q

Different Varieties of Sequence Modeling

A

Input: Scalar
Output: Sequence

Example: image to text; question answering; skip-gram analysis

13
Q

Different Varieties of Sequence Modeling

A

Input: Sequence
Output: Scalar

Example: sentence classification, multiple-choice question answering

14
Q

Different Varieties of Sequence Modeling

A

Input: Sequence
Output: Sequence

Example: machine translation, video captioning, open-ended question answering, video question answering
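A hedged Keras sketch of the two sequence-input varieties, using illustrative shapes: with return_sequences=False (the default) the RNN layer emits one vector per sequence (sequence to scalar), while return_sequences=True plus a per-step Dense head emits one output per time step (one common sequence-to-sequence layout; encoder-decoder models used for translation are a further variation).

    import numpy as np
    from tensorflow import keras

    timesteps, features = 10, 6  # illustrative input shape

    # Sequence -> scalar (e.g. sentence classification): keep only the last hidden state.
    seq_to_scalar = keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        keras.layers.SimpleRNN(16),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    # Sequence -> sequence (per-step outputs): keep every hidden state.
    seq_to_seq = keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        keras.layers.SimpleRNN(16, return_sequences=True),
        keras.layers.TimeDistributed(keras.layers.Dense(5, activation="softmax")),
    ])

    x = np.zeros((2, timesteps, features), dtype="float32")  # dummy batch of 2 sequences
    print(seq_to_scalar(x).shape)  # (2, 1)
    print(seq_to_seq(x).shape)     # (2, 10, 5)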

15
Q

Bigram Language Model vs. RNN

A
  • Practical bigram language models require the simplifying Markov assumption: the prediction of the next token depends only on the last predicted token
  • The probability of a sequence Y is then simply a chain of bigram factors, p(y1) p(y2|y1) p(y3|y2) … (sketched numerically below)
  • Long-range dependencies are lost
  • In contrast, an RNN conditions each prediction on the current input and the entirety of the foregoing sequence: p(yt | y1, …, yt-1)
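A small numeric sketch of the bigram chain under the Markov assumption; the probability table is invented purely for illustration.

    # Toy bigram table p(next | previous); values are made up for illustration.
    bigram = {
        ("<s>", "jane"): 0.2,
        ("jane", "offered"): 0.1,
        ("offered", "her"): 0.4,
        ("her", "best"): 0.3,
    }

    # Markov assumption: p(y1, ..., yn) = p(y1) * p(y2 | y1) * p(y3 | y2) * ...
    # Long-range context (e.g. who the wish is for) never enters the product.
    tokens = ["<s>", "jane", "offered", "her", "best"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= bigram[(prev, curr)]
    print(prob)  # 0.2 * 0.1 * 0.4 * 0.3 ≈ 0.0024
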
16
Q

Keras RNN Modules

A

This week we have examined the vanilla RNN, which is implemented in Keras as the SimpleRNN class. Although this vanilla RNN is rarely used in production, for some problems it can produce comparable results to LSTM and GRU while training fewer weights.
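A hedged Keras check of the “fewer weights” point, with illustrative sizes: for the same number of units, LSTM and GRU layers carry several times more parameters than SimpleRNN because of their gates.

    from tensorflow import keras

    timesteps, features, units = 10, 8, 32  # illustrative sizes

    for layer_cls in (keras.layers.SimpleRNN, keras.layers.LSTM, keras.layers.GRU):
        model = keras.Sequential([
            keras.Input(shape=(timesteps, features)),
            layer_cls(units),
        ])
        # SimpleRNN has one weight set; LSTM has four gates, GRU three.
        print(layer_cls.__name__, model.count_params())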

17
Q

Keras Models Including Today’s Class

A
  • Recurrent networks are the third type of supervised deep learning model we have created with Keras so far
  • The models all show the same basic underlying architectural approach, with enhancements to support convolution and recurrence
  • Today’s class covered “vanilla” RNNs and next week we will consider LSTMs and GRUs
18
Q

Simple RNN (From Keras Doc)

A

Key Features
* Vanilla RNN, susceptible to the vanishing or exploding gradient problem as previously discussed
* Uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far
* Can process an input sequence in reverse, via the go_backwards argument
* Supports loop unrolling (which can lead to a large speedup when processing short sequences on CPU), via the unroll argument
* By default, the output of an RNN layer contains a single vector per sample. This vector is the RNN cell output corresponding to the last timestep, containing information about the entire input sequence.

19
Q

Simple RNN (From Keras Doc)

A

Key Arguments
* units: Positive integer, dimensionality of the output space.
* activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. “linear” activation: a(x) = x).
* use_bias: Boolean (default True), whether the layer uses a bias vector.
* kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: glorot_uniform.
* recurrent_initializer: Initializer for the recurrent kernel weights matrix, used for the linear transformation of the recurrent state. Default: orthogonal.
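A hedged sketch putting these features and arguments together; the batch, shapes, and argument values are illustrative, and the defaults shown are those quoted above from the Keras documentation.

    import numpy as np
    from tensorflow import keras

    inputs = np.random.random((4, 10, 6)).astype("float32")  # 4 sequences, 10 timesteps, 6 features

    layer = keras.layers.SimpleRNN(
        units=32,                             # dimensionality of the output space
        activation="tanh",                    # default activation
        use_bias=True,                        # default
        kernel_initializer="glorot_uniform",  # input-to-hidden weights
        recurrent_initializer="orthogonal",   # hidden-to-hidden weights
        go_backwards=False,                   # set True to process the sequence in reverse
        unroll=False,                         # set True to unroll the loop (short sequences, CPU)
    )

    # By default only the last timestep's output vector is returned per sample.
    print(layer(inputs).shape)  # (4, 32)

    # With return_sequences=True, every timestep's output is returned.
    print(keras.layers.SimpleRNN(32, return_sequences=True)(inputs).shape)  # (4, 10, 32)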

20
Q

Keras RNN Architectures

A
  • Having selected a recurrent approach, there are still a variety of decisions to be made about configuring a model
  • This lab demonstrates comparisons between a few of these options on a prediction task
  • The diagnostics demonstrated here are good to include in your own project, as a way of showing that the model configuration fits what you intended to do