Lecture 4 Flashcards
What is a recurrent neural network?
A recurrent neural network or RNN is a deep neural network trained on sequential or time series data to create a machine learning (ML) model.
In what situations may inputs to the network be independent?
Some image processing. Eg where we are showing the network 1 image at a time, it doesn’t matter what the previous inputs/predictions were.
If two inputs to the network are independent of each other, their outputs will also be independent.
In what situations may inputs to the network be dependent?
Eg autonomous cars - we want to interpret what we see in terms of what we saw previously. Eg if we saw a lorry previously that is no longer in direct sight, we want to still have the knowledge that it is there.
When inputs/predictions are dependent on previous data, how do we describe this correlation?
Time correlation
What contexts are time correlations very relevant for?
Language modelling (eg predicting the next word, translating).
We build up the sentence word by word, narrowing down the possibilities one word at a time. The prediction will be different if we predict the next word using only the current word vs using the context of the whole sentence.
Particularly in languages like German, where the meaning of a sentence isn’t always clear until the last words, we need to know the context (eg the previous sentences). Knowing context is also a problem for human translators.
In what situations are time correlations often critical?
- Speech recognition eg you miss a word, can work it out
- Handwriting recognition - make a guess based on words around it
- Machine translation
- Object classification
- Prediction of stock market prices - supposedly predict based on what has happened before
Does the standard neural network account for time correlation?
No, the output zi only depends on the input xi, where i represents a time step; ie the prediction made at the ith time step depends only on the ith inputs.
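Written as an equation (a minimal sketch, assuming a single dense layer; W, b and f are illustrative names for the weights, bias and activation, not the lecture's notation):

```latex
z^{(i)} = f\left(W x^{(i)} + b\right)
```

There is no term involving x^{(j)} for j < i, so each prediction is independent of all earlier inputs.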
How do we make the output Z(i+1) depend on inputs x(i+1) and x(i)?
In what scenarios may we want this?
We could give the network inputs from the earlier timepoints as well.
- Self-driving car sees now and what it saw previously
- Sentence prediction: we need more than one word to infer context. However, not all sentences have the same length, so feeding in all the earlier words directly would need a differently sized network each time.
What is the problem with simply having the output Z(i+1) depend on inputs x(i+1) and x(i)?
It depends only on one timestep previously. We generally have to look much further back. We want to account for correlations between points in a sequence.
How can we account for correlations between points in a sequence?
Use the output from point i as an input to point i + 1
Linking up networks, we can take inputs of the current step and outputs of previous runs of the network. This means that the output will depend on the outputs at all previous time steps.
This acts as a very deep neural network
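A minimal sketch of this idea in Python/NumPy (the function names, shapes and the choice of tanh/softmax are illustrative assumptions, not the lecture's exact formulation):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def rnn_step(x_t, z_prev, Wx, Wz, b):
    """One step: the previous output z_prev is fed back in alongside the input x_t.
    Wx has shape (n_hidden, len(x_t) + n_out), Wz has shape (n_out, n_hidden)."""
    h = np.tanh(Wx @ np.concatenate([x_t, z_prev]) + b)   # hidden activations
    return softmax(Wz @ h)                                 # output probabilities

def run_sequence(xs, Wx, Wz, b, n_out):
    z = np.zeros(n_out)            # z(0) = 0: the fed-back output starts at zero
    outputs = []
    for x_t in xs:                 # the same weights are reused at every time step
        z = rnn_step(x_t, z, Wx, Wz, b)
        outputs.append(z)
    return outputs                 # each output depends on all earlier inputs
```

Because each output is fed into the next step, unrolling this loop over a sequence gives the very deep network described above.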
Define a recurrent neural network.
A deep network where the weights are shared between layers.
How are recurrent neural networks often represented?
A single network with the output feeding back in.
It is no longer a purely feed-forward network: you're also feeding outputs back into the network.
Can also be shown with several networks side-by-side
Describe the architecture of a RNN.
Unlike traditional deep neural networks, where each dense layer has distinct weight matrices, RNNs use shared weights across time steps, allowing them to remember information over sequences.
If we consider the perceptron, we have some set of features xi connected to a single hidden layer of nodes. Each feature is connected to each of the nodes. Each node has its own output. This enables features to be combined and activations applied to produce outputs. All of those outputs go into the next layer.
[See flashcard]
In the most general case, we have n1 inputs and n0 outputs. For text recognition, an input xi is a representation of a word and zi is a probability distribution over the vocabulary.
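As an illustration of what "a representation of a word" might look like (one-hot encoding is just one possible choice here, not necessarily the lecture's):

```python
# Illustrative only: a one-hot representation for a tiny, made-up vocabulary.
vocab = ["the", "cat", "sat", "down"]

def one_hot(word):
    v = [0.0] * len(vocab)
    v[vocab.index(word)] = 1.0
    return v

x_i = one_hot("cat")    # input xi: representation of a word, here [0, 1, 0, 0]
# zi would be a probability distribution over the same vocabulary,
# eg [0.10, 0.05, 0.70, 0.15], meaning "sat" is the most likely next word.
```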
What are the sets of weights associated with RNNs?
There are two sets of weights (Wx for the inputs and Wz for the outputs):
- Input features (xi) -> hidden nodes: eg 7 input features and 5 hidden nodes = 35 weights
- Hidden nodes -> outputs (zi): eg 5 hidden nodes and 5 outputs = 25 weights
- Total connections needed = 60
- Then may have biases: +5 = 65
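The arithmetic above, written out (the sizes are the flashcard's example numbers; the placement of the biases on the hidden nodes is an assumption):

```python
n_inputs, n_hidden, n_outputs = 7, 5, 5       # the flashcard's example sizes

wx = n_inputs * n_hidden     # input features -> hidden nodes: 7 * 5 = 35 weights
wz = n_hidden * n_outputs    # hidden nodes -> outputs:        5 * 5 = 25 weights
connections = wx + wz        # total connections needed = 60
biases = n_hidden            # one bias per hidden node (an assumption): +5
print(connections + biases)  # 65
```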
What is the set of activations associated with the RNN for a perceptron?
Z(0) = 0 - we assume the first time we run it, it is zero
[See flashcard]
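The flashcard's equations aren't reproduced here; one plausible form for this setup (an assumption about the exact notation, with the previous output fed back alongside the input) is:

```latex
\begin{aligned}
z^{(0)} &= 0,\\
h^{(i)} &= f\!\left(W_x\,[\,x^{(i)};\, z^{(i-1)}\,] + b\right),\\
z^{(i)} &= \mathrm{softmax}\!\left(W_z\, h^{(i)}\right).
\end{aligned}
```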
What function do we apply in an RNN to get probabilities that add to 1?
Softmax function
What is the formula for the softmax function for a two-state classification problem?
[See flashcard]
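The flashcard itself isn't shown; the standard two-class softmax is:

```latex
p_1 = \frac{e^{a_1}}{e^{a_1} + e^{a_2}}, \qquad
p_2 = \frac{e^{a_2}}{e^{a_1} + e^{a_2}} = 1 - p_1
```

which is equivalent to the logistic sigmoid, p_1 = 1 / (1 + e^{-(a_1 - a_2)}).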
How do we train an RNN?
An RNN is essentially just a (very) deep network and is trained like one.
What does an RNN “unroll” to give?
The network is "unrolled" to give a deep network.
What is a difference between a standard deep neural network and an RNN?
In a standard deep neural network, you would have different weights between nodes but here we have shared weights.
Eg the connection between node 1 and node 2 in "layer 1" has the same weight as the connection between node 1 and node 2 in "layer 2".
How would we find the error of an RNN?
Back propagation gives error derivatives
- The same weight appears several times in the network
We are propagating backwards through the network and time.
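Because the same weight appears at every unrolled step, its error derivatives from all time steps are summed, which is the usual statement of backpropagation through time:

```latex
\frac{\partial E}{\partial W} \;=\; \sum_{t=1}^{T} \frac{\partial E}{\partial W^{(t)}}
```

where W^{(t)} denotes the copy of the shared weight W at unrolled step t.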
What may we encounter with RNNs?
An RNN is a very deep network, trained by back-propagation.
We may encounter the vanishing gradient problem - the further back we go through the network, the more likely the gradient is to be zero. [For a deep neural network, there are even more opportunities for the gradient to go to 0].
This limits the memory of the network: the model is more sensitive to recent inputs than to ones from further back in time. In practice the weights are only updated based on the current run and a few previous runs (rather than all of them).
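A toy illustration of why the gradient vanishes (assuming, for simplicity, a scalar recurrent weight and activation derivatives below 1; the numbers are made up):

```python
grad = 1.0
w, act_deriv = 0.5, 0.9          # made-up recurrent weight and activation derivative
for steps_back in range(1, 21):
    grad *= w * act_deriv        # each step further back multiplies in another factor < 1
print(grad)                      # ~1e-7: inputs 20 steps back barely affect the update
```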
How can we solve the vanishing gradient problem in RNNs?
Solved using long short-term memory (LSTM)
What is long short-term memory (LSTM)?
LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture widely used in deep learning. It has a more complex architecture that can remember (and forget) information over arbitrary intervals.
A traditional RNN has a single hidden state that is passed through time, which can make it difficult for the network to learn long-term dependencies. LSTM models address this problem by introducing a memory cell, which is a container that can hold information for an extended period.
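A minimal sketch of a single LSTM step (the standard gated formulation; the variable names and the stacked-weight layout are illustrative, not a specific library's API):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*n, len(x_t) + n) and b has shape (4*n,),
    where n = h_prev.size; the four gates are stacked into one matrix for brevity."""
    n = h_prev.size
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0*n:1*n])      # forget gate: what to erase from the memory cell
    i = sigmoid(z[1*n:2*n])      # input gate: what new information to store
    o = sigmoid(z[2*n:3*n])      # output gate: what to expose as the hidden state
    g = np.tanh(z[3*n:4*n])      # candidate cell contents
    c = f * c_prev + i * g       # memory cell: can hold information over long intervals
    h = o * np.tanh(c)           # hidden state passed on to the next time step
    return h, c
```

The forget and input gates control what the memory cell keeps or overwrites, which is what lets information (and gradients) survive over long intervals.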