RNNs Flashcards

1
Q

How do Hidden Markov Models take words as input?

A

They process words one at a time

2
Q

What are some limits of state-based models?

A

Supervised ML techniques take fixed-length sequence inputs, but sentences can vary in length

3
Q

What is the workaround for sentences not having a fixed length?

A

We use a sliding window of words

4
Q

What is a problem with using a sliding window of words?

A

It is hard to learn semantic patterns that involve long-range dependencies, since words outside the window cannot influence the prediction

5
Q

Why can a single sentence generate lots of inputs?

A

This is because of the sliding window - if we have the sentence “and thanks for all the fish” and a window size of 3, we can have inputs of “and thanks for”, “thanks for all”, “for all the” and “all the fish”.
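
A minimal sketch of this idea; the helper name and window size below are illustrative, not from the original deck:

def sliding_windows(sentence, window_size=3):
    # Every contiguous run of window_size words becomes one fixed-size input.
    words = sentence.split()
    return [words[i:i + window_size]
            for i in range(len(words) - window_size + 1)]

print(sliding_windows("and thanks for all the fish"))
# [['and', 'thanks', 'for'], ['thanks', 'for', 'all'],
#  ['for', 'all', 'the'], ['all', 'the', 'fish']]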

6
Q

In the image, what is the size of the input?

A

It is 3 times the dimension of the embedding, as we have three word embeddings that are all concatenated together
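
As a rough illustration (the embedding dimension of 4 is made up), concatenating three word embeddings gives a single input vector of size 3 times the embedding dimension:

import numpy as np

d = 4                                              # assumed embedding dimension
window = [np.random.rand(d) for _ in range(3)]     # three word embeddings
x = np.concatenate(window)                         # input to the network
print(x.shape)                                     # (12,) i.e. 3 * d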

7
Q

What are Recurrent Neural Networks based on?

A

Elman Networks

8
Q

What is different about RNNs compared to standard feed-forward NNs?

A

We don't just take the current input; we also factor in the hidden layer values from the previous time step.

9
Q

What happens to the values of the hidden layer at time t-1 when an input is received at time t?

A

The values are provided as input in addition to the current input vector

10
Q

What type of network does the image show?

A

A simple RNN

11
Q

Explain what the image shows

A

It shows how an RNN works: the input vector is multiplied by its weights, and the hidden layer values from the previous time step are multiplied by their own weights. These are aggregated to give the new hidden layer values, which are then multiplied by the output weights to produce the output.

12
Q

How are the hidden layer values computed?

A

An activation function is applied to the combined weighted current input and previous hidden layer values

13
Q

Explain what the image shows

A

It shows that to get the hidden layer values h_t, we multiply the previous hidden layer values h_{t-1} by the weights U and add the current input multiplied by the weights W, passing the result through the activation function. Then, to get the output, the hidden layer values are multiplied by the output weights V and passed through a function f, usually softmax, to give the output vector
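
A minimal NumPy sketch of these update equations; the sizes and the tanh/softmax choices are assumptions for illustration:

import numpy as np

d_in, d_h, d_out = 5, 4, 3               # assumed dimensions
W = np.random.rand(d_h, d_in)            # input -> hidden weights
U = np.random.rand(d_h, d_h)             # previous hidden -> hidden weights
V = np.random.rand(d_out, d_h)           # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, h_prev):
    h_t = np.tanh(U @ h_prev + W @ x_t)  # h_t = g(U h_{t-1} + W x_t)
    y_t = softmax(V @ h_t)               # y_t = f(V h_t), f usually softmax
    return h_t, y_t

h = np.zeros(d_h)
for x in np.random.rand(6, d_in):        # a toy sequence of 6 input vectors
    h, y = rnn_step(x, h)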

14
Q

How does the loss function work in an RNN?

A

It needs h_t and h_{t-1}, which in turn needs h_{t-2}, and so on

15
Q

When using an RNN for language models, what is the input?

A

The input is a sequence of L words from the vocabulary V, where L is the length of the sequence so far; each word is one-hot encoded, giving an input of size L x |V|

16
Q

What is a one-hot vector in regards to a language model?

A

It is a vector of size |V| that is all 0s apart from a 1 at the index where that word appears in the vocabulary
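
For example, with a toy vocabulary (made up here), the one-hot vector for a word has a single 1 at that word's index and 0s everywhere else:

import numpy as np

vocab = ["and", "thanks", "for", "all", "the", "fish"]   # toy vocabulary V

def one_hot(word, vocab):
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

print(one_hot("all", vocab))    # [0. 0. 0. 1. 0. 0.]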

17
Q

When using an RNN with a language model, what is the output Y?

A

It is the predicted next word in the sequence, which is a probability distribution over V

18
Q

What does cross-entropy measure?

A

It measures how well a set of estimated probabilities matches the target class
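
A small worked example with made-up numbers: with a one-hot target, cross-entropy reduces to the negative log of the probability assigned to the correct class:

import numpy as np

y_true = np.array([0, 0, 1, 0])           # one-hot target class
y_hat = np.array([0.1, 0.2, 0.6, 0.1])    # estimated probabilities

loss = -np.sum(y_true * np.log(y_hat))    # cross-entropy
print(loss)                               # -log(0.6), roughly 0.51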

19
Q

What is teacher forcing?

A

Rather than feeding the model's predicted output from prior steps back in as input, we feed in the ground-truth sequence when making the next prediction, which helps the model converge and keeps training on track
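
A minimal sketch of teacher forcing with a toy NumPy language model (all names and sizes are made up): at each step the true token from the training sequence is fed in, not the word the model just predicted:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V, H = 6, 4                                          # toy vocab and hidden sizes
W_in, W_hh, W_out = (np.random.rand(H, V),
                     np.random.rand(H, H),
                     np.random.rand(V, H))

def lm_step(token_id, h):
    x = np.zeros(V); x[token_id] = 1.0               # one-hot input word
    h = np.tanh(W_in @ x + W_hh @ h)
    return softmax(W_out @ h), h                     # next-word distribution, new state

tokens = [0, 1, 2, 3]                                # ground-truth training sequence
h, loss = np.zeros(H), 0.0
for t in range(len(tokens) - 1):
    probs, h = lm_step(tokens[t], h)                 # feed the TRUE token at step t,
    loss += -np.log(probs[tokens[t + 1]])            # not the model's own prediction
print(loss)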

20
Q

How does sequence labelling with RNNs work (e.g. POS tagging)?

A

The input X is a sequence of words

The output Y is POS tag probabilities (the most likely tag is chosen by argmax)

Pre-trained word embeddings can be used

The loss function is a cross-entropy loss function
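
A minimal PyTorch sketch of this setup (the vocabulary size, tag set and dimensions are all made up): the RNN produces one tag distribution per word, and cross-entropy is computed against the gold tags:

import torch
import torch.nn as nn

vocab_size, n_tags, emb_dim, hid_dim = 100, 10, 32, 64   # assumed sizes

embed = nn.Embedding(vocab_size, emb_dim)        # could be initialised from pre-trained embeddings
rnn = nn.RNN(emb_dim, hid_dim, batch_first=True)
tagger = nn.Linear(hid_dim, n_tags)
loss_fn = nn.CrossEntropyLoss()

words = torch.randint(0, vocab_size, (1, 6))     # one sentence of 6 word ids
gold_tags = torch.randint(0, n_tags, (1, 6))     # one POS tag per word

hidden_states, _ = rnn(embed(words))             # (1, 6, hid_dim)
tag_logits = tagger(hidden_states)               # (1, 6, n_tags)
loss = loss_fn(tag_logits.view(-1, n_tags), gold_tags.view(-1))
predicted = tag_logits.argmax(dim=-1)            # most likely tag via argmax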

21
Q

How does autoregressive generation using an RNN work (e.g. text generator)?

A

Input X is a sequence of words so far, starting with start token

Output Y is the next word to be added to X

Pre-trained word embeddings can be used

The loss function is a cross-entropy loss function
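
A minimal PyTorch sketch of autoregressive generation (all sizes and token ids are made up): starting from the start token, the word just generated is appended to the sequence and fed back in as the next input:

import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim = 100, 32, 64       # assumed sizes
START_ID = 0                                     # assumed start-of-sequence token id

embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.RNN(emb_dim, hid_dim, batch_first=True)
out = nn.Linear(hid_dim, vocab_size)

sequence = [START_ID]                            # X: the words generated so far
h = None                                         # hidden state carried between steps
for _ in range(5):                               # generate 5 more words
    x = embed(torch.tensor([[sequence[-1]]]))    # last word is the next input
    states, h = rnn(x, h)
    probs = torch.softmax(out(states[:, -1]), dim=-1)
    sequence.append(torch.argmax(probs, dim=-1).item())   # Y: next word, added to X
print(sequence)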

22
Q

How does sequence classification work with an RNN (e.g. sentence/document classifier)?

A

Input X is a sequence of words in sentence/document

Output Y is a class probability

Use of both an RNN + MLP

Cross-entropy loss function based on the classification result
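
A minimal PyTorch sketch of the RNN + MLP combination (sizes assumed): the RNN's final hidden state summarises the sequence and is passed to a small MLP that outputs class scores:

import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim, n_classes = 100, 32, 64, 2   # assumed sizes

embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.RNN(emb_dim, hid_dim, batch_first=True)
mlp = nn.Sequential(nn.Linear(hid_dim, 32), nn.ReLU(), nn.Linear(32, n_classes))
loss_fn = nn.CrossEntropyLoss()                  # cross-entropy on the classification result

words = torch.randint(0, vocab_size, (1, 8))     # one document of 8 word ids
label = torch.tensor([1])                        # its gold class

_, h_n = rnn(embed(words))                       # final hidden state, (1, 1, hid_dim)
logits = mlp(h_n[-1])                            # class scores, (1, n_classes)
loss = loss_fn(logits, label)
probs = torch.softmax(logits, dim=-1)            # Y: class probability distribution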

23
Q

How do stacked RNNs work?

A

The entire output sequence of one RNN is used as an input for another RNN
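
A short sketch (sizes assumed) of stacking written explicitly as one nn.RNN feeding its whole output sequence into another; nn.RNN's num_layers argument does the same thing internally:

import torch
import torch.nn as nn

emb_dim, hid_dim = 32, 64                        # assumed sizes
x = torch.rand(1, 8, emb_dim)                    # a sequence of 8 input embeddings

rnn1 = nn.RNN(emb_dim, hid_dim, batch_first=True)
rnn2 = nn.RNN(hid_dim, hid_dim, batch_first=True)

out1, _ = rnn1(x)          # the entire output sequence of the first RNN...
out2, _ = rnn2(out1)       # ...is the input sequence of the second RNN

# Built-in equivalent: nn.RNN(emb_dim, hid_dim, num_layers=2, batch_first=True)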

24
Q

What is a positive of using a stacked RNN?

A

It encodes different levels of abstract representation, which allows more sophisticated patterns to be encoded

25
Q

What is a drawback of using a stacked RNN?

A

Adding more RNN layers increases training time

26
Q

What are stacked RNNs an example of?

A

Deep Learning

27
Q

How does a bi-directional RNN work?

A

We have an RNN layer that does a forward pass, a separate RNN layer that does a backwards pass, and then we concatenate the hidden layer values for each position t in the sequence
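
A minimal sketch (sizes assumed): one RNN reads the sequence left to right, another reads it right to left, and their hidden states at each position t are concatenated:

import torch
import torch.nn as nn

emb_dim, hid_dim = 32, 64                        # assumed sizes
x = torch.rand(1, 8, emb_dim)                    # a sequence of 8 input embeddings

fwd = nn.RNN(emb_dim, hid_dim, batch_first=True) # forward-pass RNN
bwd = nn.RNN(emb_dim, hid_dim, batch_first=True) # backward-pass RNN

h_fwd, _ = fwd(x)
h_bwd, _ = bwd(torch.flip(x, dims=[1]))          # run over the reversed sequence
h_bwd = torch.flip(h_bwd, dims=[1])              # re-align to the original positions

h = torch.cat([h_fwd, h_bwd], dim=-1)            # (1, 8, 2 * hid_dim) for each position t
# Built-in equivalent: nn.RNN(emb_dim, hid_dim, batch_first=True, bidirectional=True)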

28
Q

In an RNN, how many sets of weights do we have to update?

A

Three: W, the weights from the input layer to the hidden layer; U, the weights from the previous hidden layer to the current hidden layer; and V, the weights from the hidden layer to the output layer