Introduction to Transformers for NLP Flashcards
RNN
Recurrent Neural Network
In an RNN, information cycles through a loop: the hidden state computed at one time step is fed back in at the next, so the network carries context forward through the sequence
Bag of Words
A representation of a text as the multiset of its words, keeping word counts but discarding grammar and word order.
n-grams
Contiguous sequences of n words (or tokens) from a text; an n-gram model predicts the next word from the previous n-1 words.
trigram
A trigram model keeps the context of the last two words to predict the next word in the sequence
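A minimal Python sketch of this idea (the toy corpus and names below are illustrative, not from the deck):

    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat lay on the rug".split()

    # Count how often each word follows each two-word context.
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
        counts[(w1, w2)][w3] += 1

    # Predict the most likely next word given the last two words.
    def predict(w1, w2):
        return counts[(w1, w2)].most_common(1)[0][0]

    print(predict("the", "cat"))  # -> "sat" (tied with "lay"; first seen wins)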
LSTM
Long Short-Term Memory; an RNN variant whose gated memory cells let it retain information over long sequences and mitigate the vanishing-gradient problem.
GRU
Gated Recurrent Unit; a simplified variant of the LSTM that merges its gates into an update gate and a reset gate.
Feed-forward Neural Network
A network in which information flows in one direction, from input to output, with no loops or recurrence.
BP Mechanism
Backpropagation; the algorithm that computes the gradient of the loss with respect to every weight by applying the chain rule backward through the network.
Gradient Descent
An optimization algorithm that repeatedly updates the parameters in the direction of the negative gradient of the loss, as sketched below.
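A minimal sketch of gradient descent on a one-variable loss (the function and step size are illustrative):

    # Minimize f(x) = (x - 3)^2; its gradient is f'(x) = 2 * (x - 3).
    x = 0.0             # starting point
    learning_rate = 0.1
    for _ in range(100):
        grad = 2 * (x - 3)          # slope of the loss at the current x
        x -= learning_rate * grad   # step opposite the gradient
    print(x)  # converges toward 3.0, the minimum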
T5 model
Text-to-Text Transfer Transformer; a Transformer encoder-decoder model from Google that frames every NLP task as mapping input text to output text.
seq2seq
A sequence-to-sequence neural network: an encoder-decoder architecture that maps an input sequence to an output sequence, as in machine translation.
multi-head attention
Multiple self-attention modules (heads) running in parallel, each capturing a different kind of relationship between tokens; their outputs are concatenated and linearly projected.
feed-forward
The position-wise feed-forward sublayer in each Transformer block: two linear transformations with a nonlinearity in between, applied to each position independently.
masked multi-head attention
Multi-head attention in the decoder in which future positions are masked out, so each token can attend only to itself and the tokens before it.
linear
The final fully connected layer, which projects the decoder's output into a vector of logits, one score per vocabulary word.
softmax
A function that converts a vector of real numbers into a probability distribution: every output value lies between 0 and 1 and they all sum to 1, as sketched below.
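A minimal NumPy sketch of softmax (subtracting the max is a standard numerical-stability trick, not part of the definition):

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))   # shift by the max so exp() cannot overflow
        return e / e.sum()

    scores = np.array([2.0, 1.0, 0.1])
    probs = softmax(scores)
    print(probs)        # each value between 0 and 1
    print(probs.sum())  # 1.0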
input embeddings
Learned vectors that map each input token to a dense numeric representation before it enters the encoder.
output embeddings
The corresponding learned vectors for the target-side tokens, fed (shifted one position to the right) into the decoder.
tokenize
Split raw text into smaller units (words, subwords, or characters) called tokens.
vectorize
Convert tokens into numeric vectors that a model can process.
positional encoding
A vector added to each token embedding to mark the token's position in the sequence, since attention by itself is order-agnostic; the original Transformer uses fixed sine and cosine functions of varying frequencies, as in the sketch below.
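A sketch of the fixed sinusoidal encoding from the original Transformer paper, assuming an even d_model (the sizes below are illustrative):

    import numpy as np

    def positional_encoding(seq_len, d_model):
        # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
        # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
        positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
        dims = np.arange(0, d_model, 2)[None, :]     # even dimension indices 2i
        angles = positions / np.power(10000.0, dims / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    pe = positional_encoding(seq_len=10, d_model=16)
    print(pe.shape)  # (10, 16); this matrix is added to the token embeddings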
self-attention
Self-attention relates each word in the input to every other word in the same sentence, letting the model learn how much each word should contribute to the representation of the others
query vector
The vector representing the current token's "question": it is compared against every key to score how relevant each other token is.
key vector
The vector each token exposes to be matched against queries; the dot product of a query with a key sets the attention weight between the two tokens, as in the sketch below.
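Queries and keys come together in scaled dot-product attention; a minimal NumPy sketch (sizes illustrative; V holds the value vectors, the companion projection to queries and keys):

    import numpy as np

    def softmax(z, axis=-1):
        e = np.exp(z - z.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    seq_len, d_k = 4, 8
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((seq_len, d_k))  # one query vector per token
    K = rng.standard_normal((seq_len, d_k))  # one key vector per token
    V = rng.standard_normal((seq_len, d_k))  # one value vector per token

    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # how strongly each token attends to every other
    output = weights @ V                       # weighted sum of value vectors
    print(output.shape)  # (4, 8)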