Generative AI Flashcards
(31 cards)
RLHF
Reinforcement learning from human feedback
PEFT
Parameter efficient fine tuning
Self-Attention
- In order to predict the next word accurately, models need to be able to see the whole sentence or whole document
- The transformer architecture unlocked this ability
- Able to pay attention to the meaning of the words it is processing ("attention is all you need"); a minimal sketch follows below
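A minimal NumPy sketch of scaled dot-product self-attention as described in "Attention Is All You Need"; the weight matrices, dimensions, and random inputs here are made up purely for illustration, not taken from any real model.

```python
# Toy scaled dot-product self-attention: every token attends to every
# other token, so the whole sequence is visible at once.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection weights."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])        # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax -> attention weights
    return weights @ v                             # each output mixes information from all positions

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))            # stand-in for token embeddings
out = self_attention(x, *(rng.normal(size=(d_model, d_k)) for _ in range(3)))
print(out.shape)                                   # (4, 8)
```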
Multi-headed Self-Attention
- Multiple sets of self-attention weights, or heads, are learned in parallel, independently of each other
- The outputs of the multi-headed attention layers are fed through a feed-forward network to produce the output of the encoder
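A sketch of the multi-head idea, assuming illustrative sizes (2 heads of dimension 4): each head computes its own attention pattern independently, the head outputs are concatenated, and the result is passed through a feed-forward projection. None of these weights come from a real model.

```python
# Toy multi-headed self-attention with independently learned heads.
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, heads):
    outs = []
    for w_q, w_k, w_v in heads:                      # each head has its own weight set
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        a = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # independent attention pattern per head
        outs.append(a @ v)
    return np.concatenate(outs, axis=-1)             # concatenated head outputs

rng = np.random.default_rng(1)
seq_len, d_model, n_heads, d_head = 4, 8, 2, 4
x = rng.normal(size=(seq_len, d_model))
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3)) for _ in range(n_heads)]
w_ff = rng.normal(size=(n_heads * d_head, d_model))  # stand-in for the feed-forward projection
print((multi_head_attention(x, heads) @ w_ff).shape) # (4, 8)
```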
How many parameters does a model with general knowledge about the world have?
Hundreds of billions
How many examples do you need to fine-tune a model for a single task, like summarizing dialogue or acting as a customer service agent for a single company?
Often just 500-1,000 examples can result in good performance
Context window
The space, measured in tokens, that is available for the prompt
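A toy illustration of the idea: the prompt plus the requested completion must fit within a fixed token budget. The 4096-token limit and the whitespace split are stand-ins, not any particular model's tokenizer or limit.

```python
# Checking whether a prompt fits the (hypothetical) context window.
CONTEXT_WINDOW = 4096              # illustrative limit, in tokens

def fits(prompt: str, max_new_tokens: int) -> bool:
    prompt_tokens = len(prompt.split())   # crude token count for illustration
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW

print(fits("Summarize the following dialogue: ...", max_new_tokens=256))  # True
```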
Inference
- Generating a prediction
- For LLMs, that would be using the model to generate text
Completion
Output of the model
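A toy sketch tying inference and completion together: inference repeatedly predicts the next token and appends it until a stop token; the generated text is the completion. The hard-coded lookup table below is a stand-in for a real LLM.

```python
# Greedy decoding with a toy "model" that maps the last two tokens to the next one.
TOY_MODEL = {
    ("What", "is"): "a",
    ("is", "a"): "transformer",
    ("a", "transformer"): "?",
    ("transformer", "?"): "<eos>",
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = TOY_MODEL.get(tuple(tokens[-2:]), "<eos>")  # predict the next token
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return " ".join(tokens[len(prompt_tokens):])                 # the completion

print(generate(["What", "is"]))   # "a transformer ?"
```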
Entity recognition
A word-level classification task that identifies all the people and places mentioned in a text (example below)
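For reference, a short entity-recognition example using spaCy (this assumes the en_core_web_sm model has been downloaded); an LLM can also be prompted to perform the same task.

```python
# Named entity recognition with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Ada Lovelace worked with Charles Babbage in London.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. "Ada Lovelace PERSON", "London GPE"
```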
Foundational models by decreasing number of parameters
PaLM -> Bloom -> GPT -> LLaMa -> Flan-T5 -> BERT
RNN
- Recurrent neural networks
- Used by previous generations of language models
What’s so important about the transformer architecture?
- The ability to learn the relevance and context of all the words in a sentence
- It can be scaled efficiently to use multi-core GPUs
- It can parallel process input data making use of much larger training datasets
- Dramatically improved the performance of natural language tasks over earlier generations of RNNs
Instruction Fine Tuning
Adapting a pre-trained model to specific tasks by training it further on a dataset of instruction-style prompts paired with desired completions (data-formatting sketch below)
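A sketch of how instruction fine-tuning data is commonly prepared: each example pairs an instruction-style prompt with the desired completion. The template and the example dialogue here are invented for illustration.

```python
# Formatting raw examples into prompt/completion pairs for instruction fine-tuning.
TEMPLATE = "Summarize the following conversation.\n\n{dialogue}\n\nSummary:\n"

raw_examples = [
    {"dialogue": "A: The server is down again. B: I'll restart it now.",
     "summary": "B will restart the crashed server."},
]

training_pairs = [
    {"prompt": TEMPLATE.format(dialogue=ex["dialogue"]), "completion": ex["summary"]}
    for ex in raw_examples
]
print(training_pairs[0]["prompt"] + training_pairs[0]["completion"])
```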
RAG
Retrieval Augmented Generation
Knowledge base data is used for the retrieval portion of the solution
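A minimal RAG sketch: embed the query, retrieve the most relevant knowledge-base passage, and prepend it to the prompt sent to the LLM. The hashing "embedder" and the two knowledge-base entries are toy stand-ins for a real embedding model and document store.

```python
# Toy retrieval step for RAG using cosine similarity over bag-of-words vectors.
import numpy as np

def embed(text, dim=64):
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0                 # toy bag-of-words embedding
    return v / (np.linalg.norm(v) + 1e-9)

knowledge_base = [
    "Refunds are processed within 5 business days.",
    "Support is available 24/7 via chat.",
]
query = "How long do refunds take?"
scores = [embed(doc) @ embed(query) for doc in knowledge_base]  # cosine similarity
best = knowledge_base[int(np.argmax(scores))]                   # retrieval step

prompt = f"Answer using this context:\n{best}\n\nQuestion: {query}"
print(prompt)   # this augmented prompt is what gets sent to the LLM
```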
Origin of the Transformer Architecture
Attention Is All You Need
What are attention weights?
Weights, learned during training, that capture the relevance of each word to every other word in the input
What are the two distinct parts of the transformer architecture
Encoder and decoder
Tokenize
- Convert words to numbers, with each number representing a position in a dictionary of all possible words
- There are multiple tokenization methods; token IDs can correspond to complete words or to parts of words
- The same tokenizer used to train the model must also be used when generating text (toy example below)
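A toy word-level tokenizer to make the idea concrete: each token ID is a position in a fixed vocabulary. Real tokenizers (e.g. BPE) also split words into sub-word pieces; the vocabulary here is made up.

```python
# Mapping words to token IDs and back with a fixed vocabulary.
vocab = {"<unk>": 0, "the": 1, "teacher": 2, "taught": 3, "student": 4}

def tokenize(text):
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

def detokenize(ids):
    inv = {i: w for w, i in vocab.items()}
    return " ".join(inv[i] for i in ids)

ids = tokenize("The teacher taught the student")
print(ids)                 # [1, 2, 3, 1, 4]
print(detokenize(ids))     # "the teacher taught the student"
```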
Embedding Layer
- Trainable vector embedding space
- High-dimensional space where each token is represented as a vector and occupies a unique location within that space
- Each token id in the vocabulary is matched to a multi-dimensional vector
- During model training, the vectors learn to encode the meaning and context of individual tokens in the input sequence
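A sketch of an embedding layer as a lookup table: a trainable matrix with one row per token ID, where looking up a token returns its vector. The random matrix below is illustrative; in a real model these values are learned during training.

```python
# Embedding lookup: token IDs -> d_model-dimensional vectors.
import numpy as np

vocab_size, d_model = 6, 512
rng = np.random.default_rng(2)
embedding_matrix = rng.normal(size=(vocab_size, d_model))    # learned during training

token_ids = [1, 2, 3, 1, 4]                  # output of the tokenizer
token_vectors = embedding_matrix[token_ids]  # one vector per token
print(token_vectors.shape)                   # (5, 512)
```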
What was the vector size in Attention Is All You Need
512 dimensions
Positional encoding
Encodes the position of each token within the sentence or document, so word-order information is preserved (sketched below)
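The sinusoidal positional encoding from "Attention Is All You Need", PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), sketched in NumPy with illustrative sizes.

```python
# Sinusoidal positional encoding for a sequence of length seq_len.
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]              # token positions
    i = np.arange(0, d_model, 2)[None, :]          # dimension pairs
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

print(positional_encoding(seq_len=5, d_model=512).shape)   # (5, 512)
```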
What is passed into the encoder/decoder
- Token embedding vectors summed with their positional encodings
- All positions are processed in parallel
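Putting the last two cards together, a sketch (with illustrative sizes) of the encoder/decoder input: token embeddings and positional encodings, both d_model-dimensional, are summed element-wise, and the vectors for all positions are then processed in parallel by the attention layers.

```python
# Combining token embeddings with sinusoidal positional encodings.
import numpy as np

seq_len, d_model = 5, 512
rng = np.random.default_rng(3)
token_embeddings = rng.normal(size=(seq_len, d_model))   # stand-in for the embedding layer output

pos = np.arange(seq_len)[:, None] / np.power(10000.0, np.arange(0, d_model, 2)[None, :] / d_model)
positional = np.zeros((seq_len, d_model))
positional[:, 0::2], positional[:, 1::2] = np.sin(pos), np.cos(pos)

encoder_input = token_embeddings + positional            # what the attention layers receive
print(encoder_input.shape)                               # (5, 512)
```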