Recurrent Neural Networks Flashcards
(12 cards)
What kind of data are RNNs designed to process?
Sequential data such as speech, music, text (sentences), DNA sequences, and time series.
Why do we need recurrence (feedback loops) in neural networks for sequences?
To remember past inputs and share learned weights across time steps, which lets the network handle variable-length inputs and use context.
What is a vanilla RNN’s key characteristic in its computational graph?
It contains feedback loops (recurrent connections) that carry the hidden state from one time step to the next; unrolled over time, the same weights are reused at every step.
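A minimal sketch of one recurrent step, assuming a tanh nonlinearity; the weight names (W_xh, W_hh, b_h) are illustrative, not part of the original card.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: the previous hidden state is fed back (the loop)."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 3, 4, 5
W_xh = rng.normal(size=(input_dim, hidden_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # same weights reused at every step
print(h.shape)                                # (4,)
```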
Name four typical RNN input/output structures (by sequence length).
Many-to-one (e.g., sentiment classification), one-to-many (e.g., image captioning), many-to-many (e.g., machine translation), and synchronized many-to-many (e.g., per-frame video classification).
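A rough sketch of how one recurrence supports different input/output structures, depending on which hidden states are read out; the helper name run_rnn is hypothetical.

```python
import numpy as np

def run_rnn(xs, W_xh, W_hh, b_h):
    """Return the hidden state at every time step."""
    h, hs = np.zeros(W_hh.shape[0]), []
    for x_t in xs:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 3))                  # sequence of 6 inputs
W_xh = rng.normal(size=(3, 4)) * 0.1
W_hh = rng.normal(size=(4, 4)) * 0.1
hs = run_rnn(xs, W_xh, W_hh, np.zeros(4))

many_to_one = hs[-1]    # e.g. sentiment: read only the last hidden state
many_to_many = hs       # e.g. per-frame labels: read every hidden state
```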
What is the vanishing gradient problem in RNNs?
Gradients diminish exponentially over long sequences, making it hard for vanilla RNNs to learn long-term dependencies.
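A small numerical illustration, assuming a linear recurrence for simplicity: the gradient that flows back through T steps is repeatedly multiplied by the recurrent Jacobian, so its norm shrinks (or explodes) roughly geometrically with T.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, T = 8, 50
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.3   # small recurrent weights

grad = np.ones(hidden_dim)            # pretend gradient arriving at step T
norms = []
for _ in range(T):
    grad = W_hh.T @ grad              # backprop through one (linear) step
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])            # norm decays roughly geometrically
```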
What are the two gates in a Gated Recurrent Unit (GRU)?
The update gate (how much of the previous state to keep versus replace) and the reset gate (how much of the previous state to use when forming the candidate state).
How does a GRU update its hidden state?
The update gate interpolates between the previous hidden state and a candidate state; the candidate is computed from the current input and the previous state after it has been scaled by the reset gate.
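A minimal sketch of one GRU step under one common convention (z is the update gate, r the reset gate); the weight names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(x_t @ Wz + h_prev @ Uz)               # update gate
    r = sigmoid(x_t @ Wr + h_prev @ Ur)               # reset gate
    h_tilde = np.tanh(x_t @ Wh + (r * h_prev) @ Uh)   # candidate state
    return (1 - z) * h_prev + z * h_tilde             # interpolate old vs. candidate

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
Ws = [rng.normal(size=(d_in, d_h)) for _ in range(3)]
Us = [rng.normal(size=(d_h, d_h)) for _ in range(3)]
h = gru_step(rng.normal(size=d_in), np.zeros(d_h),
             Ws[0], Us[0], Ws[1], Us[1], Ws[2], Us[2])
print(h.shape)   # (4,)
```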
What are the three gates in an LSTM cell?
Input gate, forget gate, and output gate, which control writing to, erasing from, and reading out of the cell state, respectively.
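A minimal sketch of one LSTM step, assuming the standard formulation with a separate cell state c; stacking the four transforms into one weight matrix is an implementation choice, and the names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W: (d_in, 4*d_h), U: (d_h, 4*d_h), b: (4*d_h,), blocks stacked as [i, f, o, g]."""
    d_h = h_prev.shape[0]
    pre = x_t @ W + h_prev @ U + b
    i, f, o = (sigmoid(pre[k * d_h:(k + 1) * d_h]) for k in range(3))
    g = np.tanh(pre[3 * d_h:])         # candidate cell update
    c = f * c_prev + i * g             # forget old memory, write new content
    h = o * np.tanh(c)                 # read from the cell state
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.normal(size=(d_in, 4 * d_h))
U = rng.normal(size=(d_h, 4 * d_h))
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, np.zeros(4 * d_h))
print(h.shape, c.shape)   # (4,) (4,)
```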
Why are LSTMs more complex to train than GRUs or vanilla RNNs?
They have more gates and parameters, increasing computational cost and requiring more data to learn effectively.
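A back-of-the-envelope parameter count makes the "more gates, more parameters" point concrete: a vanilla RNN has one input-plus-recurrent transform, a GRU has three (two gates plus the candidate), and an LSTM has four (three gates plus the candidate). The dimensions below are illustrative.

```python
def rnn_like_params(d_in, d_h, n_blocks):
    """Weights + biases for n_blocks copies of an (x, h) -> h transform."""
    return n_blocks * (d_h * (d_in + d_h) + d_h)

d_in, d_h = 128, 256
print("vanilla RNN:", rnn_like_params(d_in, d_h, 1))   # 1 transform
print("GRU:        ", rnn_like_params(d_in, d_h, 3))   # 2 gates + candidate
print("LSTM:       ", rnn_like_params(d_in, d_h, 4))   # 3 gates + candidate
```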
Compare vanilla RNN, GRU, and LSTM in terms of training difficulty and effectiveness.
Training difficulty: RNN < GRU < LSTM; effectiveness: RNN < GRU ≈ LSTM.
Give real-world examples where LSTMs have been successfully applied.
Google Translate for speech translation, Facebook's daily automatic translations, and Apple's QuickType and Siri text prediction.
What major architecture supplanted RNNs for sequence tasks and why?
Transformers: self-attention lets them process all positions of a sequence in parallel and capture long-range dependencies directly, avoiding the step-by-step recurrence in which gradients vanish.
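A minimal sketch of single-head scaled dot-product self-attention, the mechanism that lets every position attend to every other position in one parallel matrix product; the projection names are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """All positions attend to all positions at once (no recurrence)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (5, 8)
```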