Recurrent Neural Networks Flashcards
(25 cards)
What is the key idea behind Recurrent Neural Networks (RNNs)?
They maintain a hidden state to remember information across time steps.
What type of data are RNNs designed for?
Sequential or time-series data.
Why is a feedforward network unsuitable for time-dependent inputs?
It treats all inputs as independent and ignores temporal order.
How does an RNN incorporate memory?
By passing a hidden state from one timestep to the next.
What does the formula aₜ = f(Wxₜ + Uaₜ₋₁ + b) represent?
The update rule for the RNN hidden state.
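A minimal NumPy sketch of this update rule with f = tanh (the dimensions, random weights, and toy sequence are illustrative assumptions, not from the lecture):

```python
import numpy as np

def rnn_step(x_t, a_prev, W, U, b):
    """One hidden-state update: a_t = tanh(W x_t + U a_{t-1} + b)."""
    return np.tanh(W @ x_t + U @ a_prev + b)

# Illustrative sizes: 3-dimensional inputs, 4-dimensional hidden state.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # input-to-hidden weights
U = rng.standard_normal((4, 4))   # hidden-to-hidden (recurrent) weights
b = np.zeros(4)                   # bias

a = np.zeros(4)                   # initial hidden state, set to zero
for x_t in rng.standard_normal((5, 3)):  # a toy sequence of 5 input vectors
    a = rnn_step(x_t, a, W, U, b)        # the same W, U, b are reused at every timestep
```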
What is parameter sharing in RNNs?
Using the same weights at every timestep to process inputs consistently.
What is the benefit of parameter sharing?
Reduces the number of parameters and improves generalisation.
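A rough illustration of the saving, assuming the toy sizes from the sketch above: the shared cell's parameter count is independent of sequence length, while a feedforward layer reading the flattened sequence grows with it.

```python
input_size, hidden_size, seq_len = 3, 4, 100   # illustrative sizes

# Shared RNN cell: W (hidden x input), U (hidden x hidden), b (hidden),
# reused at all 100 timesteps.
rnn_params = hidden_size * input_size + hidden_size * hidden_size + hidden_size

# Feedforward layer over the flattened sequence: its weight matrix
# grows linearly with the sequence length.
ff_params = hidden_size * (input_size * seq_len) + hidden_size

print(rnn_params)  # 32
print(ff_params)   # 1204
```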
What activation function is commonly used in basic RNNs?
Tanh or ReLU.
What does ‘many-to-one’ RNN architecture mean?
A sequence of inputs produces a single output.
What does ‘many-to-many’ RNN architecture mean?
A sequence of inputs produces a sequence of outputs.
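A sketch of both output modes, continuing the NumPy example above (it reuses rnn_step, W, U, b, and rng; the readout weights V and c are extra assumptions added for illustration):

```python
V = rng.standard_normal((2, 4))   # assumed hidden-to-output readout weights
c = np.zeros(2)

def run_rnn(xs, many_to_many=False):
    """Run the cell over a sequence and return either one output or one per step."""
    a = np.zeros(4)
    outputs = []
    for x_t in xs:
        a = rnn_step(x_t, a, W, U, b)
        outputs.append(V @ a + c)     # an output is available at every timestep
    return outputs if many_to_many else outputs[-1]

xs = rng.standard_normal((5, 3))
y_last = run_rnn(xs)                    # many-to-one: a single output for the whole sequence
y_seq = run_rnn(xs, many_to_many=True)  # many-to-many: one output per input
```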
What is the main limitation of standard RNNs during training?
They suffer from vanishing and exploding gradient problems.
What causes the vanishing gradient problem in RNNs?
Repeated multiplication of gradients by factors smaller than 1 in magnitude during backpropagation through time, which shrinks them exponentially over many timesteps.
What causes the exploding gradient problem in RNNs?
Repeated multiplication of gradients by factors larger than 1 in magnitude, which makes them grow exponentially into very large gradients.
What is the impact of vanishing gradients on learning?
Prevents learning of long-term dependencies.
What is the impact of exploding gradients?
Leads to unstable updates and divergence during training.
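A toy demonstration of both problems: the gradient reaching early timesteps is roughly a product of one factor per step, so factors even slightly below or above 1 vanish or explode over a long sequence (the values 0.9, 1.1, and 50 steps below are illustrative assumptions).

```python
for factor in (0.9, 1.1):
    grad = 1.0
    for _ in range(50):       # 50 steps of repeated multiplication during backpropagation
        grad *= factor
    print(factor, grad)       # 0.9 -> ~0.005 (vanishing), 1.1 -> ~117.4 (exploding)
```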
What does backpropagation through time (BPTT) do?
Unrolls the network across time and computes gradients through each step.
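A minimal sketch of BPTT on a scalar linear RNN, aₜ = w·aₜ₋₁ + xₜ with loss L = a_T (a deliberately simplified stand-in for the lecture's network):

```python
def bptt_scalar(xs, w):
    """Unroll a_t = w*a_{t-1} + x_t, then backpropagate dL/dw for loss L = a_T."""
    # Forward pass: unroll the recurrence and keep every hidden state.
    a = [0.0]                          # initial hidden state, set to zero
    for x in xs:
        a.append(w * a[-1] + x)

    # Backward pass: walk the unrolled graph from the last timestep to the first.
    grad_w, grad_a = 0.0, 1.0          # dL/da_T = 1
    for t in range(len(xs), 0, -1):
        grad_w += grad_a * a[t - 1]    # step t's contribution to dL/dw
        grad_a *= w                    # dL/da_{t-1} = dL/da_t * w
    return grad_w

print(bptt_scalar([7.0, 7.0, 7.0], 0.5))  # 14.0, matching d/dw of 7*w**2 + 7*w + 7 at w = 0.5
```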
In an RNN, what role does the weight wₐ play in memory?
It controls how much past information is carried forward.
What happens if the memory weight wₐ is 0.5 over 2 steps?
The signal quickly diminishes (e.g., 10.5 with input 7).
What happens if the memory weight wₐ is 2.0 over 2 steps?
The signal explodes (e.g., 168 with input 7).
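The exact recurrence behind the values 10.5 and 168 is not given on the cards, so the sketch below only illustrates the underlying effect: whatever signal is stored in the hidden state is rescaled by wₐ at every step, so it fades for wₐ = 0.5 and grows for wₐ = 2.0.

```python
signal = 7.0                      # the example input from the cards
for w_a in (0.5, 2.0):
    carried = signal
    for _ in range(2):            # carry the remembered signal forward for 2 steps
        carried *= w_a            # each step rescales the stored signal by w_a
    print(w_a, carried)           # 0.5 -> 1.75 (fades), 2.0 -> 28.0 (grows)
```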
What kind of real-world task was used to motivate RNNs in this lecture?
Rainfall prediction using radar image sequences.
What is the key advantage of RNNs over feedforward networks for sequences?
They model temporal dependencies through recurrent connections.
What is the initial hidden state a₋₁ usually set to?
Zero.
Why is unrolling the RNN necessary for training?
To apply gradient-based optimisation across all timesteps.
How does an RNN process inputs over time?
Sequentially, one timestep at a time, updating the hidden state.