RNN & Attention Flashcards
(28 cards)
Neural Nets
What is a Feedforward Neural Net?
n features -> W matrix (d x n) -> non-linear func -> d hidden units -> V matrix (C x d) -> softmax -> C probabilities
Representing text by summing/averaging word embeddings
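A minimal numpy sketch of this pipeline, assuming toy dimensions (n, d, C), a hypothetical three-word vocabulary, and random untrained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, C = 4, 5, 3                       # embedding size, hidden units, classes
vocab = {"the": 0, "cat": 1, "sat": 2}  # hypothetical toy vocabulary
E = rng.normal(size=(len(vocab), n))    # word embedding table

W = rng.normal(size=(d, n))             # d x n input-to-hidden matrix
V = rng.normal(size=(C, d))             # C x d hidden-to-output matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Represent the text by averaging its word embeddings (as on the card)
words = ["the", "cat", "sat"]
x = E[[vocab[w] for w in words]].mean(axis=0)   # n features

h = np.tanh(W @ x)       # non-linear func -> d hidden units
p = softmax(V @ h)       # softmax -> C probabilities
print(p, p.sum())        # probabilities sum to 1
```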
Neural Nets
Word Embeddings in NNs
Similar input words get similar vectors
Neural Nets
W Matrix in NNs
Similar output words get similar vectors in the softmax matrix
Neural Nets
Hidden States in NNs
Similar contexts get similar hidden states
Neural Nets
What problems ARE handled by NNs? (vs count-based LMs)
- Can share strength among similar words and contexts
- Can condition on context with intervening words
Neural Nets
What problems ARE NOT handled by NNs?
Cannot handle long-distance dependencies
Recurrent Neural Networks
Sequential Data Examples
- Words in sentences
- Characters in words
- Sentences in a document
Recurrent Neural Networks
Long Distance Dependencies Examples
- Agreement in number, gender
- Selectional preference (e.g., disambiguating rain/reign from context words like clouds/queen)
Recurrent Neural Networks
Recurrent Neural Networks
- Retains information from previous inputs due to the use of hidden states
- Designed to process sequential data, where the order of the data matters (FFNN treats inputs independently)
- At each step, RNN takes in current input with the hidden state from the previous step and updates the hidden state (“remembers”)
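A minimal numpy sketch of the recurrence, assuming toy sizes and random untrained parameters (the concrete update formula is the one on the "How to update hidden state?" card below):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W, V, b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)

xs = rng.normal(size=(5, d_in))        # a sequence of 5 input vectors
h = np.zeros(d_h)                      # initial hidden state
for x_t in xs:                         # same cell applied at every step
    h = np.tanh(W @ x_t + V @ h + b)   # "remembers" via the previous h
print(h)                               # depends on the whole sequence, in order
```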
Recurrent Neural Networks
Unrolling in Time
- RNN updates its hidden vector upon each input
- Unrolling means breaking the recurrent "cell" into one copy per time step
- Makes it easier to see how the network processes a sequence step-by-step
Recurrent Neural Networks
What can RNNs do?
- Sentence classification
- Conditional generation
- Language modeling
- POS tagging
Recurrent Neural Networks
Sentence Classification
Read whole sentence and represent it
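A minimal numpy sketch, assuming toy sizes, random untrained weights, and pre-computed input vectors: read the whole sentence with the RNN, then classify from the final hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, C = 3, 4, 2
W, V, b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
U, b_y = rng.normal(size=(C, d_h)), np.zeros(C)

xs = rng.normal(size=(6, d_in))        # embeddings of a 6-word sentence
h = np.zeros(d_h)
for x_t in xs:                         # read the whole sentence
    h = np.tanh(W @ x_t + V @ h + b)

scores = U @ h + b_y                   # classify from the final representation
probs = np.exp(scores - scores.max())
probs /= probs.sum()                   # softmax over the classes
print(probs)
```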
Recurrent Neural Networks
Conditional Generation
Use the sentence representation to condition the generation of an output (e.g., a translation)
Recurrent Neural Networks
Language Modeling
Read the context up to a point, represent it, and predict the next word from that representation
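A minimal numpy sketch, assuming a hypothetical four-word vocabulary and random untrained weights: the hidden state represents the context read so far, and a softmax over the vocabulary scores the next word.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "the", "cat", "sat"]          # hypothetical toy vocabulary
n_vocab, d_h = len(vocab), 4
E = rng.normal(size=(n_vocab, d_h))           # input word embeddings
W, V, b = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
U, b_y = rng.normal(size=(n_vocab, d_h)), np.zeros(n_vocab)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(d_h)
for w in ["<s>", "the", "cat"]:               # context read up to this point
    x_t = E[vocab.index(w)]
    h = np.tanh(W @ x_t + V @ h + b)          # represent the context
p_next = softmax(U @ h + b_y)                 # distribution over the next word
print(dict(zip(vocab, p_next.round(3))))
```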
Recurrent Neural Networks
POS Tagging
Use sentence and context representation to determine the POS of a word
RNN Training
How to update hidden state?
h_t = tanh( W x_t + V h_{t-1} + b_h )
RNN Training
How to compute output vector?
y_t = softmax( U h_t + b_y )
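A minimal numpy sketch of this formula together with the hidden-state update from the previous card, assuming toy sizes and random untrained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, C = 3, 4, 5
W, V, b_h = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
U, b_y = rng.normal(size=(C, d_h)), np.zeros(C)

x_t = rng.normal(size=d_in)                  # current input
h_prev = np.zeros(d_h)                       # h_{t-1} from the previous step

h_t = np.tanh(W @ x_t + V @ h_prev + b_h)    # hidden state update
scores = U @ h_t + b_y
y_t = np.exp(scores - scores.max())
y_t /= y_t.sum()                             # softmax -> output probabilities
print(h_t, y_t)
```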
RNN Training
How to calculate the loss?
- A loss is computed against the label each time the hidden state is updated (at each input)
- Add up all losses to get total loss
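A minimal numpy sketch of the summed loss, assuming toy per-step softmax outputs, one gold label per time step, and cross-entropy as the per-step loss:

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 4, 3
scores = rng.normal(size=(T, C))
y = np.exp(scores - scores.max(axis=1, keepdims=True))
y /= y.sum(axis=1, keepdims=True)                # per-step output distributions
labels = np.array([0, 2, 1, 1])                  # one gold label per time step

step_losses = -np.log(y[np.arange(T), labels])   # cross-entropy at each step
total_loss = step_losses.sum()                   # add up all losses
print(step_losses, total_loss)
```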
RNN Training
Unrolling
Unrolled graph (of sum of loss) is a well-formed computation graph that we can use to run backpropagation through time
RNN Training
Parameter Tying
- The same parameters (W, V, U) are reused at every time step, and summing the per-step losses means gradients from all steps update these shared parameters
- This sharing is how the model picks up context, POS, and long-distance dependencies
Recurrent Neural Networks
Bi-RNNs
- Runs the RNN in both directions
- Needs two separate parameter sets, one for each direction
- Helpful because context comes from both directions
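A minimal numpy sketch of a Bi-RNN encoder, assuming toy sizes and random untrained weights; the two directions use separate parameters and their hidden states are concatenated per position:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 4

def make_params():
    return rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)

def run_rnn(xs, params):
    W, V, b = params
    h, states = np.zeros(d_h), []
    for x_t in xs:
        h = np.tanh(W @ x_t + V @ h + b)
        states.append(h)
    return states

xs = rng.normal(size=(5, d_in))
fwd = run_rnn(xs, make_params())               # left-to-right model
bwd = run_rnn(xs[::-1], make_params())[::-1]   # right-to-left model, re-aligned
states = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # 2*d_h per position
print(states[0].shape)
```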
RNN Long Short-Term Memory
Vanishing Gradient
- Occurs during backpropagation
- Gradients (used to update the network’s weights) shrink exponentially as they are propagated backward through time
- Makes it hard to determine long-distance dependencies
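A toy numpy illustration, assuming a 1-dimensional RNN h_t = tanh(w*h_{t-1} + x_t): by the chain rule, the gradient of h_t with respect to h_0 is a product of per-step factors w*(1 - h_t^2), which shrinks quickly over long sequences.

```python
import numpy as np

rng = np.random.default_rng(0)
w = 0.9            # recurrent weight of the toy 1-d RNN
h = 0.0            # hidden state
grad = 1.0         # running value of d h_t / d h_0
for t in range(50):
    x_t = rng.normal()
    h = np.tanh(w * h + x_t)
    grad *= w * (1.0 - h ** 2)       # chain-rule factor for this step
    if t in (0, 9, 24, 49):
        print(f"t={t+1:2d}  |d h_t / d h_0| ~ {abs(grad):.2e}")
```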
RNN Long Short-Term Memory
Long Short-Term Memory (LSTM)
- Overcomes vanishing gradients and helps to learn long-term dependencies
- Has memory cell and gates that regulate the flow of information
- Allows network to retain or forget information over long sequences
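A minimal numpy sketch of a single LSTM step, assuming toy sizes and random untrained weights; the sigmoid gates regulate what the memory cell forgets, stores, and exposes:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one weight matrix per gate, each acting on the concatenation [x_t; h_{t-1}]
Wf, Wi, Wo, Wc = (rng.normal(size=(d_h, d_in + d_h)) for _ in range(4))
bf, bi, bo, bc = (np.zeros(d_h) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(Wf @ z + bf)            # forget gate: keep or erase old memory
    i = sigmoid(Wi @ z + bi)            # input gate: how much new info to write
    o = sigmoid(Wo @ z + bo)            # output gate: how much memory to expose
    c_tilde = np.tanh(Wc @ z + bc)      # candidate memory content
    c = f * c_prev + i * c_tilde        # memory cell: additive update helps gradients
    h = o * np.tanh(c)                  # new hidden state
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(6, d_in)):  # run over a short sequence
    h, c = lstm_step(x_t, h, c)
print(h)
```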
Attention
Why do we need attention?
- In a normal RNN, as the input sequence becomes longer, the model struggles to compress all relevant information into a single fixed-size hidden vector
- This results in poor performance on long sequences