LSTM Flashcards
(25 cards)
What problem do LSTMs solve in RNNs?
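They mitigate the vanishing-gradient problem, which keeps plain RNNs from learning long-range dependencies. Exploding gradients are usually handled separately, for example with gradient clipping.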
What does LSTM stand for?
Long Short-Term Memory.
What is the main innovation in LSTM architecture?
It introduces gates that control memory flow through time.
What are the two types of memory in LSTM?
Short-term (hidden state) and long-term (cell state).
What are the three gates in an LSTM?
Forget gate, input gate, and output gate.
What is the purpose of the forget gate in an LSTM?
To control how much of the previous cell state to retain.
What activation function is used in LSTM gates?
Sigmoid.
What is the range of sigmoid outputs in LSTM gates?
Between 0 and 1.
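A small sketch of why the sigmoid suits gating: it squashes any pre-activation into the interval (0, 1), so each gate value can be read as a "fraction to let through". The input values below are made up purely for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical gate pre-activations, from strongly negative to strongly positive.
pre_activations = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
gate_values = sigmoid(pre_activations)

# Strongly negative inputs give values near 0 (gate closed),
# strongly positive inputs give values near 1 (gate open).
print(gate_values)
```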
What happens if the forget gate outputs 0?
The previous cell state is completely erased for that unit.
What does the input gate control in an LSTM?
How much new information is added to the cell state.
What is the purpose of the candidate memory C̃ₜ?
It represents new information to be added to the cell state.
Why is tanh used for the candidate memory update?
To allow both positive and negative memory contributions.
What equation updates the cell state in an LSTM?
Cₜ = fₜ·Cₜ₋₁ + iₜ·C̃ₜ
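A worked instance of the cell-state update may help. The numbers are invented for illustration, and the products are assumed to be element-wise, as in the standard formulation. Unit 0 keeps its memory untouched, unit 1 forgets everything and takes the candidate, and unit 2 blends the two.

```python
import numpy as np

C_prev  = np.array([2.0, -1.0, 0.5])   # previous cell state C_{t-1}
f_t     = np.array([1.0,  0.0, 0.5])   # forget gate: keep all, keep none, keep half
i_t     = np.array([0.0,  1.0, 0.5])   # input gate: add none, add all, add half
C_tilde = np.array([0.9, -0.3, 0.4])   # candidate memory (a tanh output, so in [-1, 1])

# Element-wise update: old memory scaled by f_t plus new information scaled by i_t.
C_t = f_t * C_prev + i_t * C_tilde
print(C_t)  # [ 2.   -0.3   0.45]
```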
What does the output gate determine in an LSTM?
How much of the cell state is passed to the hidden state.
How is the LSTM hidden state computed?
hₜ = oₜ · tanh(Cₜ)
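Putting the cards so far together, one LSTM timestep can be sketched in NumPy as follows. This assumes the common formulation in which each gate is computed from the concatenation of the previous hidden state and the current input. The function name `lstm_step`, the stacked weight layout (forget, input, candidate, output), and the random weights are illustrative choices, not taken from the cards.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM timestep.

    W has shape (4*H, H + D) and b has shape (4*H,); the four row blocks are
    assumed to be ordered forget, input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])              # forget gate
    i_t = sigmoid(z[H:2 * H])          # input gate
    C_tilde = np.tanh(z[2 * H:3 * H])  # candidate memory
    o_t = sigmoid(z[3 * H:4 * H])      # output gate
    C_t = f_t * C_prev + i_t * C_tilde # cell-state update
    h_t = o_t * np.tanh(C_t)           # hidden state
    return h_t, C_t

# Run five timesteps on random inputs with small random weights.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, C = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, C = lstm_step(x_t, h, C, W, b)
```

Because oₜ lies in (0, 1) and tanh lies in (-1, 1), every entry of the hidden state stays strictly inside (-1, 1), while the cell state itself is not bounded that way.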
Why is tanh used on the cell state for output?
To squash the unbounded cell state into [-1, 1] before the output gate scales it, keeping the hidden state bounded and signed.
How does the LSTM cell state help gradient flow?
It allows additive updates, preserving gradients over time.
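A rough numerical check of this claim, under a simplifying assumption: the gates are held fixed, so the only path from C₀ to C_T is the direct cell-state path. Along that path the derivative of Cₜ with respect to Cₜ₋₁ is just fₜ, so the gradient over T steps is the product of the forget-gate values. All numbers below are illustrative.

```python
import numpy as np

T = 100
f = np.full(T, 0.99)       # forget gate held near 1 (illustrative)
i = np.full(T, 0.1)        # input gate (illustrative)
C_tilde = np.full(T, 0.5)  # candidate memory (illustrative)

def run(C0):
    """Apply the cell-state update T times with fixed gates."""
    C = C0
    for t in range(T):
        C = f[t] * C + i[t] * C_tilde[t]
    return C

# Central finite difference of C_T with respect to C_0.
eps = 1e-6
numeric_grad = (run(1.0 + eps) - run(1.0 - eps)) / (2 * eps)

# Analytic gradient along the direct path: product of forget-gate values.
analytic_grad = np.prod(f)

# For contrast only: a hypothetical per-step shrink factor of 0.5.
shrinking_grad = 0.5 ** T

print(numeric_grad, analytic_grad, shrinking_grad)
```

With the forget gate at 0.99, roughly a third of the gradient survives 100 steps, whereas a per-step factor of 0.5 leaves essentially nothing. In a real LSTM the gates also depend on the hidden state, so this isolates only the direct path.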
What does the LSTM use to control memory flow?
Gates that filter information at each timestep.
How does the LSTM prevent vanishing gradients?
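The additive cell-state update gives gradients a path that is scaled only by the forget gate at each step, so they decay slowly when the forget gate stays near 1.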
What is backpropagation through time (BPTT)?
The algorithm used to compute gradients across timesteps in RNNs and LSTMs.
What is the trade-off of using LSTMs over RNNs?
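LSTMs capture long-range dependencies better but cost more to compute: each unit has four sets of weights (three gates plus the candidate), roughly four times the parameters of a plain RNN of the same size.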
What architecture allows deep LSTM learning?
Stacked or multi-layer LSTMs.
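Stacking can be sketched as feeding each layer's hidden state to the layer above it at the same timestep. The helper names `lstm_step` and `stacked_lstm`, the weight layout, and the sizes are all assumptions made for illustration. Real frameworks provide this through their own LSTM layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM timestep; W row blocks assumed ordered forget, input, candidate, output."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t, i_t, o_t = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[3 * H:])
    C_tilde = np.tanh(z[2 * H:3 * H])
    C_t = f_t * C_prev + i_t * C_tilde
    return o_t * np.tanh(C_t), C_t

def stacked_lstm(xs, layers):
    """Run a stack of LSTM layers over a sequence.

    layers is a list of (W, b) pairs, bottom layer first. Returns the top
    layer's hidden state at every timestep, shape (T, H_top).
    """
    # Each layer keeps its own (hidden state, cell state), initialised to zeros.
    states = [(np.zeros(b.shape[0] // 4), np.zeros(b.shape[0] // 4)) for _, b in layers]
    outputs = []
    for x_t in xs:
        layer_input = x_t
        for k, (W, b) in enumerate(layers):
            h, C = lstm_step(layer_input, *states[k], W, b)
            states[k] = (h, C)
            layer_input = h  # hidden state becomes the next layer's input
        outputs.append(layer_input)
    return np.stack(outputs)

rng = np.random.default_rng(0)
D, H, T = 3, 4, 6
layers = [
    (rng.normal(scale=0.1, size=(4 * H, H + D)), np.zeros(4 * H)),  # layer 1 reads the input
    (rng.normal(scale=0.1, size=(4 * H, H + H)), np.zeros(4 * H)),  # layer 2 reads layer 1
]
top_hidden = stacked_lstm(rng.normal(size=(T, D)), layers)
```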
What is one example of a real-world LSTM use case?
Predicting financial time series like FTSE350.
What kind of data are LSTMs especially good at?
Sequential or time-dependent data.