LLM Concepts Flashcards
(49 cards)
What is a Large Language Model (LLM)?
A deep learning model trained on large corpora of text to understand and generate human language.
What architecture do most LLMs use?
The transformer architecture.
What is the transformer model?
A neural network architecture that uses self-attention to process all positions of a sequence in parallel rather than step by step.
What is self-attention?
A mechanism by which each token in a sequence learns how much to attend to every other token when computing its own representation.
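To make this concrete, here is a minimal single-head scaled dot-product self-attention sketch in NumPy; the projection matrices (Wq, Wk, Wv) and the toy dimensions are illustrative assumptions, not anything specified by the cards.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project tokens to queries/keys/values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise relevance of every token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # weighted sum of value vectors

# Toy usage: 4 tokens with 8-dim embeddings (made-up sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)   # shape (4, 8)
```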
What is positional encoding in transformers?
Information added to token embeddings so the model can use word order, since self-attention alone is order-invariant.
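A sketch of the sinusoidal positional encoding from the original transformer paper; the seq_len and d_model values are assumed for illustration.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] uses cos."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dimensions
    pe[:, 1::2] = np.cos(angles)                 # odd dimensions
    return pe

# Added to the token embeddings before the first transformer layer.
pe = sinusoidal_positional_encoding(seq_len=16, d_model=64)
```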
What is a token?
A unit of text, often a word or subword, processed by the model.
What is tokenization?
The process of converting raw text into a sequence of tokens, usually mapped to integer IDs.
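A toy sketch only: real tokenizers (BPE, WordPiece) learn subword merges from data, whereas this hypothetical tokenize helper just maps whitespace-split words to integer IDs to show the text-to-IDs step.

```python
# Toy illustration; real subword tokenizers are far more sophisticated.
def tokenize(text, vocab):
    tokens = text.lower().split()
    return [vocab.setdefault(t, len(vocab)) for t in tokens]

vocab = {}
ids = tokenize("LLMs process text as tokens", vocab)
print(ids)    # [0, 1, 2, 3, 4]
print(vocab)  # {'llms': 0, 'process': 1, 'text': 2, 'as': 3, 'tokens': 4}
```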
What is pretraining in LLMs?
Training the model on large unlabeled text data to learn general language patterns.
What is fine-tuning?
Adapting a pretrained model to a specific task with additional labeled data.
What is masked language modeling?
A training task where some input tokens are hidden and the model must predict them.
What is causal language modeling?
A training task where the model predicts the next token given only the preceding tokens.
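A side-by-side sketch of the two objectives; the token IDs and MASK_ID are made-up values chosen purely for illustration.

```python
# Illustrative token IDs; the sequences and MASK_ID are assumptions for the sketch.
tokens = [5, 17, 42, 8, 99]
MASK_ID = 0

# Masked LM (BERT-style): hide some tokens, predict the originals.
mlm_input  = [5, MASK_ID, 42, 8, MASK_ID]
mlm_target = {1: 17, 4: 99}          # position -> token to recover

# Causal LM (GPT-style): predict each next token from its prefix.
clm_input  = tokens[:-1]             # [5, 17, 42, 8]
clm_target = tokens[1:]              # [17, 42, 8, 99]
```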
What is GPT?
Generative Pre-trained Transformer: a causal language model trained to predict the next token.
What is BERT?
Bidirectional Encoder Representations from Transformers: an encoder-only model pretrained with masked language modeling.
How is GPT different from BERT?
GPT is unidirectional and generative; BERT is bidirectional and better suited to understanding tasks such as classification.
What is zero-shot learning?
Performing a task without any task-specific examples, relying only on the instruction and what the model learned during pretraining.
What is few-shot learning?
Learning a task from only a handful of examples, often supplied directly in the prompt (in-context learning).
What is instruction tuning?
Fine-tuning an LLM on instruction-response pairs so it learns to follow natural-language instructions.
What is prompt engineering?
The craft of designing effective input prompts to guide LLM behavior.
What is a system prompt?
A prompt, set before any user input, that defines the model's role, tone, and constraints for the session.
What is a context window?
The maximum number of tokens an LLM can process at once, counting both the prompt and any generated output.
What is an attention mechanism?
A method that lets models focus on different parts of the input when making predictions.
What is temperature in text generation?
A parameter that controls randomness in sampling; higher values yield more diverse outputs, lower values more deterministic ones.
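A minimal sketch of applying temperature before the softmax, assuming NumPy; the logits values are made up.

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Scale logits before softmax; T < 1 sharpens, T > 1 flattens the distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    return probs / probs.sum()

logits = [2.0, 1.0, 0.5]
print(apply_temperature(logits, 0.5))  # peakier: mass concentrates on the top token
print(apply_temperature(logits, 2.0))  # flatter: more diverse sampling
```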
What is top-k sampling?
A decoding method that samples from the top k most likely next tokens.
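A sketch of top-k in NumPy; the hypothetical top_k_sample helper and the logits values are illustrative assumptions.

```python
import numpy as np

def top_k_sample(logits, k, rng=np.random.default_rng()):
    """Keep the k highest-logit tokens, renormalize, and sample among them."""
    logits = np.asarray(logits, dtype=float)
    top = np.argsort(logits)[-k:]                  # indices of the k most likely tokens
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()
    return rng.choice(top, p=probs)

next_token = top_k_sample([2.0, 1.0, 0.5, -1.0], k=2)  # samples from the 2 best
```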
What is top-p (nucleus) sampling?
A method that samples from the smallest set of tokens with a cumulative probability > p.
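A sketch of nucleus sampling under the same assumptions; top_p_sample is a hypothetical helper, and the cutoff finds the smallest set of most-likely tokens whose cumulative probability reaches p.

```python
import numpy as np

def top_p_sample(logits, p, rng=np.random.default_rng()):
    """Sample from the smallest set of tokens whose cumulative probability exceeds p."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]               # tokens from most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # smallest nucleus covering >= p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)

next_token = top_p_sample([2.0, 1.0, 0.5, -1.0], p=0.9)
```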