Starter: GenAI Flashcards

(100 cards)

1
Q

What does a Temperature value of T=1 effectively correspond to in generation?

Compared to 0 or higher values.

A

It leaves the Softmax output at its default scaling (the logits are divided by 1), so the model samples from the unmodified probability distribution: it is more likely than at T=0 to pick tokens that are not the most probable, but less random than at a higher temperature.
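A minimal sketch (plain Python, illustrative logits only, not tied to any particular model API) of how temperature rescales the softmax before sampling:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to sampling probabilities, scaled by temperature."""
    if temperature == 0:  # T=0 is treated as greedy decoding: always pick the argmax
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s - max(scaled)) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens
logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))  # default scaling (T=1)
print(softmax_with_temperature(logits, 0.3))  # sharper: probability mass concentrates on the top token
print(softmax_with_temperature(logits, 2.0))  # flatter: more randomness when sampling
```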

2
Q

How can a data scientist control the number of tokens generated by an LLM?

A

Use the ‘Max Output Tokens’ parameter.

3
Q

What is the purpose of using a decoder in an LLM?

A

The decoder is responsible for generating the output sequence based on the encoded representation of the input sequence.

4
Q

What is the role of attention in an LLM?

A

Attention allows the decoder to focus on the most relevant parts of the input sequence when generating the output sequence, and allows the encoder to learn relationships between words at the conceptual level.

5
Q

How does attention work in an LLM?

A

Attention is a mechanism that allows the decoder to attend to different parts of the input sequence based on their importance for generating the current output token. This is achieved by calculating a set of attention weights that represent the attention level for each input token. The attention weights are then used to compute a weighted sum of the input tokens, which is then used to generate the current output token.

On the encoder, it helps to learn a representation of the related concepts with different attention heads learning different relationships.
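A small numpy sketch of the weighted-sum idea described above, in scaled dot-product form; the query/key/value matrices are random stand-ins:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # relevance of every input token to every query token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # attention weights sum to 1 per query
    return weights @ V  # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional representations
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8): one context-aware vector per token
```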

6
Q

What is the impact of attention on the performance of an LLM?

A

Attention can significantly improve the performance of an LLM by allowing it to focus on the most relevant parts of the input sequence when generating the output sequence. This can lead to more accurate, coherent, and fluent outputs.

7
Q

What are some challenges associated with attention in LLMs?

A

Attention can be computationally expensive, and it can also be difficult to train effectively. Additionally, attention can sometimes lead to overfitting, which can make the model less robust to new data.

8
Q

What fundamental problem in sequence modeling (like RNNs) does the Transformer’s “Attention” mechanism solve?

A

Attention allows the model to directly weigh the relevance of any word in the input sequence when processing another word, regardless of their distance. This overcomes the difficulty RNNs had with capturing long-range dependencies effectively.

9
Q

Explain the typical journey of an input word through a Transformer.

A
  1. Tokenization: Word is broken into token(s).
  2. Embedding: Token converted to a numerical vector.
  3. Positional Encoding: Information about the token’s position in the sequence is added to the embedding.
  4. Multi-Head Self-Attention: The model calculates attention scores between this token and all other tokens in the input to create a context-aware representation.
  5. Feed-Forward Network: Further processing occurs via a fully connected network to refine the representation.
  6. Encoder-to-Decoder Handoff: The deep, context-aware representation is passed from the Encoder to the Decoder.
  7. Generation: The Decoder receives a start-of-sequence token and then iteratively generates the output sequence until it reaches the max output tokens or emits a stop token.
10
Q

Why is Positional Encoding necessary in Transformers?

A

Because the core Self-Attention mechanism doesn’t inherently consider the order of words. Positional Encodings inject this crucial sequence information into the token representations.
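A short sketch of the sinusoidal positional encodings from the original Transformer paper (one common scheme; learned positional embeddings are another option):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # even embedding dimensions
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Added to the token embeddings so each position gets a unique, order-aware signature
embeddings = np.random.default_rng(0).normal(size=(10, 16))  # 10 tokens, 16-dim embeddings
inputs = embeddings + sinusoidal_positional_encoding(10, 16)
```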

11
Q

How does the Decoder use information from both the input (via Encoder) and its own previously generated output?

A

It uses Masked Self-Attention to consider the previously generated tokens and Cross-Attention to incorporate the contextual information from the Encoder’s final output representations. This combination allows it to predict the next most likely token based on both the input prompt and the output generated so far.

12
Q

What is Tokenization, and why is the choice of strategy important for an LLM?

A

Tokenization is breaking input text into smaller units (tokens) the model processes (e.g., words, sub-words). The strategy impacts the vocabulary size, handling of rare words, and ultimately, model performance and efficiency.

13
Q

Why are Embeddings a cornerstone of how LLMs process language?

A

Embeddings convert discrete tokens into dense numerical vectors where semantic relationships are captured geometrically (similar words have closer vectors). This allows the model to perform mathematical operations that reflect linguistic meaning.
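A tiny illustration of "semantic relationships captured geometrically", using made-up 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical embeddings: 'cat' and 'dog' point in similar directions; 'car' does not
cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.4])
car = np.array([0.1, 0.9, -0.5])

print(cosine_similarity(cat, dog))  # close to 1: semantically similar
print(cosine_similarity(cat, car))  # much lower: semantically distant
```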

14
Q

In the GenAI Project Lifecycle, why is “Define the use case” the critical first step for a data scientist?

A

Clearly defining the use case guides all subsequent decisions: selecting the right foundation model (general vs. specialized), choosing the adaptation strategy (prompting vs. fine-tuning), and defining relevant evaluation metrics.

15
Q

What are the three primary methods for adapting a base LLM for a specific task, post-selection?

A
  1. Prompt Engineering: Crafting effective prompts, potentially with examples (few-shot).
  2. Fine-tuning: Further training the model on a dataset specific to the target task.
  3. Aligning with Human Feedback: Using techniques like Reinforcement Learning from Human Feedback (RLHF) to steer the model towards desired behaviors (e.g., helpfulness, harmlessness). Evaluation follows adaptation.
16
Q

When should a data scientist consider fine-tuning an LLM versus relying solely on prompt engineering?

A

Consider fine-tuning when:
a) Prompt engineering (even few-shot) doesn’t yield sufficient performance.
b) The task requires deep domain-specific knowledge adaptation.
c) You have a suitable dataset and computational resources for training. Prompting is generally faster and less resource-intensive for simpler adaptations.

17
Q

What’s a key trade-off when choosing between a very large foundation model and a smaller, potentially domain-specific one?

A

Large models offer broad knowledge and strong zero/few-shot capabilities but are computationally expensive. Smaller models can be more efficient and potentially achieve higher performance on specific tasks (especially after fine-tuning) but lack the broad general world knowledge of larger models.

18
Q

As a data scientist, when is few-shot prompting likely more effective than zero-shot?

A

Use few-shot when the task requires specific formatting, complex reasoning, or is nuanced. Providing examples guides the model’s output more precisely than instructions alone, especially beneficial for less capable models.

19
Q

What does “In-context learning” (ICL) refer to in LLMs?

A

The model’s ability to learn how to perform a task based only on the examples provided within the prompt itself (zero-shot, single-shot, few-shot), without updating its internal weights.

20
Q

How can a data scientist adjust an LLM’s output to be more deterministic/focused versus more creative/diverse?

A

Top K: Samples from the K most probable tokens. Higher K means more creative but potentially less coherent output.

Top P: Samples from the smallest set of tokens whose cumulative probability exceeds P. Adapts better to the shape of the probability distribution, often preferred for balancing quality and diversity.

Temperature: T=0 always gives the same (greedy) output; T > 1 gives more than the standard amount of randomness/creativity.
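A rough sketch of Top-K and Top-P (nucleus) filtering over a toy next-token distribution; the tokens and probabilities are made up:

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability exceeds p, then renormalize."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}

probs = {"cake": 0.5, "pie": 0.25, "soup": 0.15, "rock": 0.1}  # toy next-token distribution
print(top_k_filter(probs, 2))    # samples only from {'cake', 'pie'}
print(top_p_filter(probs, 0.8))  # keeps tokens until cumulative probability reaches 0.8
```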

21
Q

What are the potential consequences of setting the ‘Max Output Tokens’ parameter incorrectly?

A

Too low: the model’s response may be cut off mid-thought, incomplete, or unable to fulfill the prompt’s requirements. Too high: can lead to unnecessary computation or overly verbose, rambling outputs if the model doesn’t naturally conclude sooner.

22
Q

What does a Temperature value of T=0 effectively correspond to in generation?

A

It makes the Softmax output extremely sharp, essentially forcing the model to always pick the single token with the absolute highest probability, equivalent to Greedy sampling.

24
Q

What do Scaling Laws tell us? What paper proved this?

A

They show model performance improves with increases in dataset size, model parameters, or compute, following a power-law relationship.

OpenAI’s “Scaling Laws for Neural Language Models”.

25
What was the Chinchilla paper's key insight? Name of the paper?
Optimal performance isn't just about the biggest model; it's about balancing model size and training data size. Smaller models trained on more data can outperform larger, under-trained models. Suggests a ~20:1 token-to-parameter ratio may be compute-optimal. Paper: "Training Compute-Optimal Large Language Models".
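A quick back-of-the-envelope sketch of the ~20:1 heuristic; the model sizes below are just examples:

```python
def chinchilla_optimal_tokens(n_parameters, ratio=20):
    """Approximate compute-optimal training tokens under the ~20:1 token-to-parameter heuristic."""
    return n_parameters * ratio

for params in [8e9, 70e9, 175e9]:  # example model sizes in parameters
    tokens = chinchilla_optimal_tokens(params)
    print(f"{params / 1e9:.0f}B params -> ~{tokens / 1e12:.1f}T training tokens")
```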
26
When is further fine-tuning or a domain-specific model needed?
When the target domain has unique vocabulary or word meanings not well-represented in general foundation models (e.g., medicine, finance): legal jargon, medical jargon, internal company acronyms/program names.
27
What's a challenge with domain-specific fine-tuning?
Acquiring sufficient high-quality, specialized training data can be difficult, precisely because the domain's specific language is not commonly found elsewhere.
28
How are the high costs of training managed?
- Quantization: using lower-precision numbers for weights to save memory at training & inference.
- Distributed Training: using multiple GPUs in parallel (e.g., Distributed Data Parallel).
29
What's the overall flow of a GenAI project lifecycle?
It's an iterative process: Scope (Define Use Case) -> Select (Choose/Pre-train Model) -> Adapt & Align (Prompting, Fine-tuning, Evaluate) -> Application Integration (Deploy, Optimize). Expect to cycle between adapting and evaluating.
30
What are some important considerations when choosing a pre-trained model?
1. The architecture, which indicates what it might be good or bad at, e.g. decoders are good at language generation, while encoders are good at sentiment analysis.
2. The size of the model, which helps determine the cost of compute.
3. The training data, specifically whether it included what's needed to be good at the task you're asking it to do.
31
What are the three different transformer architectures? Break each down by: Name / Also Known As / How it Trains / Objective / Good For / Examples.
Encoder-only models:
- Known as: Auto-encoders
- How it trains: Masked Language Modeling - masks a single token & tries to predict it from the context, forming a bi-directional representation of the context
- Objective: uses a "de-noising" objective
- Good for: entity recognition, sentiment analysis, word classification
- Examples: BERT & RoBERTa

Decoder-only models:
- Known as: Auto-regressive
- How it trains: Causal Language Modeling - takes in a sequence & tries to predict only the next token (unidirectionally)
- Objective: predict the next token (full language modeling)
- Good for: language generation; zero-shot inference works well for lots of things at scale
- Examples: GPT, BLOOM

Encoder + Decoder models:
- Known as: Sequence-to-sequence models
- How it trains: Span Corruption - masks random sequences of tokens that are then replaced with a single "sentinel token"
- Objective: the decoder reconstructs the masked span auto-regressively
- Good for: translation, summarization, question answering; good where we have a body of text as both input & output
- Examples: BART, T5
32
What is quantization & what problem does it solve? How does it work?
A method of storing model weights with less precision to make model training & inference less computationally intensive. It works by projecting the 32-bit floats typically used to store model weights into a less precise storage format, like BFLOAT16.
- It is based on the range of the parameter values being stored.
- Most libraries now offer quantization-aware training frameworks that learn the quantization scaling factors during training.
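A minimal PyTorch illustration of the storage side of this idea - casting weights from float32 to bfloat16 halves their memory footprint (this shows only the precision trade-off, not a full quantization-aware training setup):

```python
import torch

weights_fp32 = torch.randn(1000, 1000, dtype=torch.float32)  # stand-in for a layer's weights
weights_bf16 = weights_fp32.to(torch.bfloat16)                # same range as fp32, fewer mantissa bits

print(weights_fp32.element_size() * weights_fp32.nelement())  # 4,000,000 bytes
print(weights_bf16.element_size() * weights_bf16.nelement())  # 2,000,000 bytes
print((weights_fp32 - weights_bf16.to(torch.float32)).abs().max())  # small rounding error introduced
```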
33
What is the core question related to compute-optimal models? What are the key factors?
How do we get the best model performance for the least compute, i.e. a compute-optimal model? Three key ways to improve model performance:
- More data
- More model parameters to make meaning from the data
- More compute for training (higher compute, more time training) - but this is usually the constraint we're optimizing around.
34
What problem does Instruction Fine-tuning solve? What is the largest risk of Instruction Fine-tuning? When would you use IFT?
IFT takes a model that has general world knowledge, usually from next-word prediction, and makes it better at following instructions for specific tasks, which is what these models are generally used for.
Largest risk - Catastrophic Forgetting: when a model, through fine-tuning, forgets crucial or significant information learned during pre-training.
Use IFT when in-context learning (ICL) does not help the model achieve sufficient task performance.
35
What is parameter efficient fine-tuning? What problem(s) does it solve?
PEFT freezes most or all of the existing model parameters and trains only a small subset (or a small number of added parameters) to tune the model to new tasks. This makes fine-tuning less computationally expensive & mitigates catastrophic forgetting, since most model weights are unchanged. LoRA (Low-Rank Adaptation) is a popular method for this.
36
What are the steps of IFT?
1. Get a pre-trained model
2. Assemble a dataset of prompts + completions, where the prompt = instruction + context & the completion = desired result
3. Split the dataset into Train, Validate, Test sets
4. Fine-tune the model on this dataset using a loss function (often with parallel distributed computation, PEFT, & quantization), tune hyperparameters with the validation set, & get model performance with the test set
Output: an Instruct Model
37
When people say fine-tuning, they almost always mean what kind of fine-tuning?
Instruction Fine-tuning
38
How many examples does it take to do effective fine-tuning on a single task? What is the primary risk of fine-tuning on a specific task?
1. 500-1,000
2. Catastrophic Forgetting, i.e. forgetting key information learned in pre-training & failing to be effective on other tasks
39
When does catastrophic forgetting matter? How do you mitigate it?
It matters if you need your LLM to be good at more than just the task you fine-tuned on, e.g. fine-tuned for sentiment analysis but losing the ability to do entity recognition well. Mitigations:
1. PEFT - parameter-efficient fine-tuning, which only changes some parameters associated with the instructions/tasks, or adds adaptive layers that are tuned instead
2. Fine-tune on multiple tasks at the same time: give a mix of instruction prompt + completion tasks in fine-tuning so the model retains a more general ability to complete tasks beyond a single instruction problem
40
What are the challenges & benefits of multi-task instruction fine-tuning?
Challenges: hard to assemble the datasets which often require 50-100k pairs. Benefits: a great way to improve performance after pre-training.
41
What are examples of models that have multi-task instruction fine-tuning? Describe the dataset? What’s the metaphor researchers used for this?
1. FLAN-T5 and FLAN-PaLM are models that have been fine-tuned with the FLAN dataset.
2. FLAN (Fine-tuned LAnguage Net) is the collection used for fine-tuning, built from:
- 473 datasets chosen from other models/papers
- 146 task categories
Example: SamSum - 16k messenger-like conversations with summaries
3. "The metaphorical dessert to the main course of pre-training"
42
What is a way to make instruction datasets more effective & what’s an example?
Instruction datasets can pair the same completion with multiple different instructions that mean the same thing. SamSum examples:
- "Can you summarize what was said in that conversation?"
- "Briefly summarize the dialogue."
- "What were the main points in that conversation?"
Etc.
43
What’s an important consideration for instruction data when doing additional fine-tuning?
Understand the limitations of the original datasets & create or find datasets that most closely match the task(s) the model will perform for your application. E.g. summarizing back-and-forth chats on a social media platform might be quite different from summarizing customer service chats for booking a hotel.
44
How does evaluating Language models differ from regular machine learning?
Regular ML is accurate predictions / total predictions. In language, single-word differences can still be "accurate", and single-word differences can also be completely wrong:
- The dog ran quickly to the store
- The dog ran to the store
- The dog didn't run from the store
We also use benchmarks for overall generalizable performance & metrics (BLEU & ROUGE) for iterative model evaluation, since LLMs are primarily used for more generalizable use cases.
45
What are some classical language modeling metrics and what are they used for?
ROUGE: Recall-Oriented Understudy for Gisting Evaluation - used in summarization; compares a summary to human reference summaries.
BLEU: Bilingual Evaluation Understudy - used in translation; compares to human-generated translations.
46
How should BLEU & ROUGE scores be used?
They are simple metrics used for iteration and diagnostic evaluation on translation & summarization tasks specifically. They shouldn’t be used to report on the overall model performance; that's what benchmarks are for.
47
What is a ROUGE score & how are calculations typically performed?
ROUGE: a suite of summarization metrics that stands for Recall-Oriented Understudy for Gisting Evaluation. The core idea is usually based on recall, measuring how much of the reference text is captured by the predicted text.
ROUGE-1 calculations:
- Recall: # of unigram matches in output / # of unigrams in reference
- Precision: # of unigram matches / # of unigrams in output
- F1: 2 × (precision × recall) / (precision + recall)
ROUGE-2 uses bigrams instead for the same calculations; scores will be lower.
ROUGE-L: instead of picking an n-gram size, use the longest common subsequence length shared by the reference & prediction, e.g. if two grams match then 2, but if three grams match then 3:
- It is cold outside
- It is very cold outside
- L = 2
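A bare-bones ROUGE-1 sketch following the formulas above; real evaluations typically use a library such as rouge-score, and this version ignores stemming and clipping:

```python
def rouge_1(reference, prediction):
    ref_tokens = reference.lower().split()
    pred_tokens = prediction.lower().split()
    matches = sum(1 for tok in pred_tokens if tok in ref_tokens)  # unigram matches (no clipping)
    recall = matches / len(ref_tokens)
    precision = matches / len(pred_tokens)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}

print(rouge_1("It is cold outside", "It is very cold outside"))
# recall = 4/4, precision = 4/5, f1 ≈ 0.89
```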
48
How should ROUGE scores be considered across tasks? What are some limitations of ROUGE scores & how are they mitigated?
Different ROUGE calculations should be used for different tasks. Since it's just counting "matches", nonsense responses can be rated highly:
Reference: It is cold outside
Prediction: cold cold cold cold
This has perfect precision. It can be mitigated through clipping - limiting the number of unigram matches to the maximum number of times the word appears in the reference, i.e. "cold" appears 1 time in the reference, so precision drops from 1 to 1/4 = 0.25.
Still hard:
Reference: it is cold outside
Prediction: outside cold it is
This gets a deceivingly perfect score. As a result, you may need to experiment with different n-gram sizes for different tasks.
49
What generally happens to scores as the n-gram size used in an LLM metric increases?
The scores are generally lower.
50
How is the BLEU score calculated?
Average(precision across a range of n-gram sizes). It is precision-oriented, compared to ROUGE being recall-oriented. Core idea: on average, how many n-grams from the prediction appear in the reference?
Limitations / adjustments:
- Short predictions are favored, so a brevity penalty is introduced.
- Clipping is used, so tokens in the prediction can only be counted correct as many times as they appear in the reference.
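A simplified sketch matching the description above (an average of clipped n-gram precisions); note that the official BLEU combines the precisions with a geometric mean and applies the brevity penalty, both omitted here:

```python
from collections import Counter

def ngram_precision(reference, prediction, n):
    """Clipped n-gram precision: prediction n-grams only count up to their frequency in the reference."""
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    pred_counts = Counter(tuple(prediction[i:i + n]) for i in range(len(prediction) - n + 1))
    clipped = sum(min(count, ref_counts[gram]) for gram, count in pred_counts.items())
    return clipped / max(sum(pred_counts.values()), 1)

def simple_bleu(reference, prediction, max_n=4):
    """Average of clipped precisions for n = 1..max_n (brevity penalty omitted)."""
    return sum(ngram_precision(reference, prediction, n) for n in range(1, max_n + 1)) / max_n

ref = "it is cold outside today".split()
pred = "it is very cold outside".split()
print(simple_bleu(ref, pred))  # (0.8 + 0.5 + 0.0 + 0.0) / 4 = 0.325
```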
51
What is the role of benchmarks for LLMs?
To evaluate model performance. Since often simpler metrics like BLEU & ROUGE are not sufficient to assess overall model performance, benchmarks help to provide a better avenue to report on model performance across a wide range of tasks. Some benchmarks are designed to measure specific tasks.
52
What are some of the common benchmarks & what do they generally measure?
GLUE: General Language Understanding Evaluation
- Description: collection of NLP tasks (sentiment analysis, question answering) from 2018
- Measures: generalized model performance

SuperGLUE
- Description: successor to GLUE, launched in 2019
- Measures: some of GLUE's tasks, additional tasks, and more challenging versions of the same tasks (multi-sentence reasoning, reading comprehension)

MMLU: Massive Multitask Language Understanding
- Description: models must possess extensive world knowledge & problem-solving ability
- Measures: mathematics, law, US history, computer science, i.e. tasks beyond language understanding

BIG-bench
- Description: 204 tasks ranging from linguistics to biology to social bias to software engineering; comes in 3 different sizes

HELM: Holistic Evaluation of Language Models
- Description: aims to improve transparency of models & offer guidance on which models perform well for specific tasks
- Measures: 7 metrics across 16 scenarios
- Metrics: accuracy, calibration, robustness, fairness, bias, toxicity, efficiency
53
What are some considerations when running benchmarks for a model?
- The size of the benchmark, because it can incur significant inference costs; some benchmarks come in various sizes to help ensure researchers continue to have access to run evals on models (their own & industry models).
- The relevance of the benchmark to the task(s) you are attempting to have the model be good at (reasoning, risks, etc.).
- Whether the model has seen the evaluation data during training - if so, it's likely not a good measure of performance.
54
Describe precision & recall in terms of: Analogy / Phrasing ("of X, the Y") / Focus / Question / Formal Equation.
Precision:
- Analogy: I'm trying to catch tuna by fishing; of all of the fish I catch in my net, what % are tuna, i.e. how precise am I?
- Phrasing: "Of the things I predicted [as positive], the percent that are correct."
- Focus: the items you selected or predicted as positive.
- Question: how accurate were my positive predictions? (Minimizing False Positives)
- Formal equation: True Positives / (True Positives + False Positives)

Recall:
- Analogy: of all of the tuna in the lake, how many did I actually catch?
- Phrasing: "Of the things that are actually positive, the percent I predicted."
- Focus: the items that are actually positive in the whole dataset.
- Question: how many of the actual positives did I find? (Minimizing False Negatives)
- Formal equation: True Positives / (True Positives + False Negatives)
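A small worked example with made-up counts, matching the formal equations above:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)  # of my positive predictions, how many were right
    recall = true_positives / (true_positives + false_negatives)     # of the actual positives, how many did I find
    return precision, recall

# Hypothetical tuna-fishing haul: 8 tuna caught, 2 other fish caught, 4 tuna left in the lake
p, r = precision_recall(true_positives=8, false_positives=2, false_negatives=4)
print(p)  # 0.8   -> 80% of what I caught was tuna
print(r)  # ~0.67 -> I caught two-thirds of all the tuna
```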
55
What problem(s) does PEFT solve?
Fine-tuning an entire LLM is costly because it requires a lot of compute, and each different task fine-tuned on creates a large new version of the model to store & use for inference.
Parameter-efficient fine-tuning, by freezing at least some of the model weights (often 80-100%), makes it more efficient to perform fine-tuning & to keep various versions for inference on different tasks.
PEFT also helps prevent catastrophic forgetting because most model weights are kept frozen.
56
What are the tradeoffs to consider among PEFT methods (5)?
- Parameter efficiency: i.e. parameter-to-training-data ratio
- Memory efficiency: i.e. freezing more weights vs. reparameterizing more weights
- Model performance: i.e. forgetting
- Inference costs: i.e. adding weights at inference
- Training speed: i.e. changing more weights
57
What are the three main approaches to PEFT, their benefits/drawbacks, & any subdivisions within them? Paper?
Paper: “Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning”
1. Selective: identify a subset of model weights for tuning & freeze the rest. Can select specific model components, different layers, or specific parameter types.
- Drawbacks: mixed results
2. Reparameterization: create new low-rank representations of the original network weights (e.g. LoRA).
- Benefits: doesn't increase the cost of inference
3. Additive: add new parameters to be trained while keeping the original model frozen. Two sub-types:
- Adapters: add new layers in the encoder/decoder after the attention or feed-forward layers.
- Soft prompting: keep the model fixed but add trainable prompt embeddings to the input, or keep the input fixed & retrain the input embeddings.
58
What is LoRA? How does LoRA work? What are some benefits & considerations of LoRA?
Low-Rank Adaptation is a popular fine-tuning technique classified as reparameterization.
1. The original model weights are frozen (attention & feed-forward layers).
2. Matrices that are much smaller (rank decomposition matrices), but multiply together to the same dimensions as the original parameters, are randomly initialized & then trained during fine-tuning in conjunction with the frozen weights.
3. At the end of training, the low-rank matrices are multiplied together to match the size of the frozen parameters & then integrated into the original parameters by addition: original parameters + (low-rank matrices multiplied together).
Benefits & considerations:
1. It's very efficient for fine-tuning, since often you only need to apply it to the attention layers (where most weights live in LLMs), though it can also be applied to the feed-forward layers.
2. You can often perform LoRA with a single GPU & avoid the need for distributed training.
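A rough PyTorch sketch of the LoRA idea for a single frozen linear layer; the dimensions and rank mirror the 512×64 example used later in this deck, and real projects typically use a library such as Hugging Face PEFT rather than hand-rolling this:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base_layer: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():
            p.requires_grad_(False)                               # 1. freeze the original weights
        in_dim, out_dim = base_layer.in_features, base_layer.out_features
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)   # 2. small trainable rank-decomposition matrices
        self.B = nn.Parameter(torch.zeros(out_dim, rank))         #    B starts at zero so the update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        # 3. frozen path + low-rank update (B @ A has the same shape as the frozen weight matrix)
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 64), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8*512 + 64*8 = 4,608 trainable parameters vs. 512*64 = 32,768 frozen weights
```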
59
What is the Rank of a matrix? What is rank decomposition? What’s a good analogy for this?
The rank of a matrix essentially represents the number of linearly independent rows or columns it contains. It gives a sense of the "information content" within the matrix. A full-rank matrix has its maximum possible rank, while a low-rank matrix has a rank significantly lower than its dimensions.
Rank decomposition is the process of expressing a matrix as a product of two or more matrices with lower ranks. If you have a large matrix A, you can decompose it into matrices B and C such that A = BC. The ranks of B and C are typically lower than the rank of A, hence the term "low-rank decomposition."
Analogy:
- The matrix as a sign: think of a matrix as a large, detailed sign you want to create, with lots of intricate designs and elements.
- Rank as essential design elements: the "rank" of the matrix is like the number of truly essential design elements you need to create that sign. If the sign is just variations of a few basic shapes and colors, its "rank" is low - you don't need many unique instructions to create it. If it has a lot of unique details and complexity, its "rank" is high.
- Rank decomposition as using stencils: instead of drawing every detail by hand, you create a few simple stencils (the smaller, lower-rank matrices), each representing a basic pattern or element, then overlay and combine them in different ways to create the final complex sign.
In this analogy:
- The original, complex sign is the original matrix.
- The simple stencils are the rank decomposition matrices.
- The number of truly unique stencils you need is the rank of the original sign (matrix).
60
Given a transformer that is 512X64, you want to use LoRA with a rank of 8. What are the dimensions of the LoRA matrices?
Matrix A: 8×64; Matrix B: 512×8. That's 512 + 4,096 = 4,608 trainable parameters instead of the original 512 × 64 = 32,768 - roughly an 86% reduction in parameters to train.
61
If you want to use LoRA for multi-task fine-tuning, what consideration do you have? How does the performance of LoRA compare to something like a pre-trained only FLAN-T5 model and a fully fine-tuned version of that model?
You may want to fine-tune a separate set of LoRA matrices for each task (unique parameters), then switch them out at inference time based on the task being performed.
Performance: roughly 77% better than the pre-trained-only version (measured by ROUGE), and only about 3% worse than a fully fine-tuned version - a trade-off that is worth it in most cases.
62
How do you choose the rank of the LoRA matrices?
Still an area of active research, but generally there seems to be a cliff above r=16, so the recommendation is r between 4 & 32.
63
What is soft-prompting? How is it different from prompt engineering? How would you do it for multiple tasks?
A method of parameter-efficient fine-tuning that is "additive": additional tokens (20 to 100), with the same embedding length as other tokens, are included in the input prompt and randomly initialized. This is also known as prompt tuning. Their weights are learned over time while the rest of the model weights are kept frozen, allowing "tokens" that don't actually correspond to language to be used in instruction fine-tuning.
Prompt engineering manipulates the words of the input prompt, while soft prompting lets the algorithm learn representations that don't correspond to actual input tokens.
You can train different soft-prompt tokens for different tasks and then choose at inference time which set to use.
64
When is soft-prompting effective?
Soft prompting is effective when:
- Computational resources for full fine-tuning are not available or not worth the cost (it can be ~100X less expensive; soft prompting usually updates only 10k-100k parameters)
- The model is of sufficient size (typically >10 million parameters); it becomes on par with full fine-tuning above ~10 billion parameters
- The interpretability of the input tokens (which don't correspond to language) is not critical
65
How do you handle interpretability for soft-prompt tokens?
It’s difficult but the words with the closest representations within the space are generally considered to be similar concepts
66
What is QLoRA and why use it?
Quantized Low-Rank Adaptation: combines quantization with LoRA to further reduce the memory footprint of fine-tuning.
67
What approach do most people mean when they say “we will do PEFT for this model”?
Typically LoRA. LoRA, at a very high level, allows the user to fine-tune their model using fewer compute resources (in some cases, a single GPU)
68
What is RLHF? What is it primarily used for?
Reinforcement Learning from Human Feedback is used to help align models to human values after fine-tuning, and it is a key lever in the responsible AI toolkit.
69
What are some of the topics for using LLMs in applications beyond the training processes covered?
Using LLMs as reasoning engines that have access to tools (via APIs), like search, and can choose which tools to use and when, vs. just using LLMs for fact generation. RAG: retrieval-augmented generation, to ground the LLM with specific context based on what's been asked.
70
What are the three areas RLHF drives alignment & what values is it focused on?
Areas it mitigates: toxicity, dangerous information, aggressiveness.
Values it targets: honesty, helpfulness, harmlessness (HHH).
While fine-tuning often makes language more human-like, it might not make it honest, helpful, & harmless to the degree we need.
71
How does RLHF relate to fine-tuning & when does it occur? What’s the impact of it on performance?
RLHF is a type of fine-tuning but generally happens after instruction fine-tuning. It generally leads to better performance than a pre-trained-only or instruction fine-tuned model alone.
72
What are the key components of reinforcement learning? How does it progress over time?
Agent: the actor, which has a "policy" for how to navigate the environment and which it can update by taking an "action".
Environment: what can be modified by the agent. The environment has a state.
Reward Function: the way the agent understands whether changes to the environment were good or bad.
It progresses over time by taking somewhat random actions, which change the environment; the agent gets feedback via the reward function, which it then uses to update its policy so that future actions align better with the reward function. It starts out random & becomes more aligned through iteration.
73
What is a good example of RLHF? How does an LLM fit into this paradigm?
Tic-tac-toe:
- Agent: a policy on how to play the game
- Environment: the tic-tac-toe board
- Objective: win the game
- Reward: getting closer to winning the game

LLM:
- Agent: the instruct LLM, with the policy being the LLM itself
- Objective: generate aligned text
- Environment: the LLM context window where the prompt can be entered
- State: what's contained in the current context window
- Action: generating text, with the action space being the tokens in the vocabulary to choose from; how it generates text depends on the existing context & the probabilities it learned during training
- Reward: often human feedback or a derivation of human feedback - humans reviewing output for a specific measure (honest, harmless, helpful), OR a supervised model that's been trained on human feedback to provide it at scale, a.k.a. the reward model
74
In the context of language modeling the sequence of actions & states is called? Compared to the classical?
Rollout for LLMs Playouts classically
75
What are the process steps for humans to provide feedback for RLHF? What are considerations when choosing who should provide the feedback and how to set them up for success?
1. Choose an instruction fine-tuned model that's suitable for the task (often something that's been trained on multiple relevant tasks & has some world knowledge).
2. Have the model generate multiple completions for each prompt.
3. Choose what you will have people evaluate the model for (helpfulness, harmlessness, honesty).
4. Have multiple people rank the completions for each prompt against the evaluation criteria, with different people ranking the same completions.
You want people with diverse & representative skill sets & backgrounds, to ensure the reward model trained on their inputs covers your target measures in a well-rounded manner. The instructions they receive should be detailed & clear, otherwise raters may provide conflicting rankings on clear completions.
76
What are some important aspects of the instructions provided to people doing RLHF?
- Rank the completions.
- Assess based on X, Y. You can use Z tool.
- In case of a tie, do A.
- For nonsensical completions, do B.
77
Give an example of RLHF provided for the prompt “my house is too hot”.
The model might provide the following responses:
- "There is nothing you can do about hot houses"
- "You can cool your house with air conditioning"
- "It is not too hot"
Criterion: helpfulness
Rankings from 3 labelers:
- Option 1: 2, 2, 2
- Option 2: 1, 1, 3
- Option 3: 3, 3, 1
The third labeler probably misunderstood the instructions.
78
How is RLHF ranking data prepared for model training?
Ranks are turned into pairwise training data for the reward model: each completion is paired with every other completion, so N completions yield N(N-1)/2 distinct pairs (N choose 2). Within each pair, the preferred completion is assigned a 1 and placed first, & the non-preferred completion is assigned a 0 and placed second.
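A small sketch of turning one prompt's human rankings into pairwise reward-model training examples; the completions and ranks reuse the "my house is too hot" example above:

```python
from itertools import combinations

completions = {
    "There is nothing you can do about hot houses": 2,
    "You can cool your house with air conditioning": 1,
    "It is not too hot": 3,
}  # completion -> human rank (1 = most preferred)

pairs = []
for (text_a, rank_a), (text_b, rank_b) in combinations(completions.items(), 2):
    preferred, rejected = (text_a, text_b) if rank_a < rank_b else (text_b, text_a)
    pairs.append({"chosen": preferred, "rejected": rejected})  # chosen gets label 1, rejected gets 0

print(len(pairs))  # 3 completions -> 3 distinct pairs (N choose 2)
```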
79
What is the role of the reward model in the RLHF process? What is the model type for rewards models generally in LLMs?
To encode the feedback from humans and then take their place in the tuning process, providing that feedback at scale to tune the model. The reward model is usually itself a language model, like BERT.
80
How does the reward model relate to the reward function? What’s the role of logits?
The reward model takes the possible completions as input and outputs the preferred option via logits; during training it minimizes a loss based on the difference between the reward values of the preferred and non-preferred completions.
Logits are the non-normalized precursor to the binary class prediction of the reward model. The positive-class logit value is what's used in the reward function; it is a precursor to a probability (the transformation happens by applying a softmax).
81
What are the steps once you have your reward model to fine-tune an instruct model?
1. Pass a prompt to the instruct model to generate a completion.
2. The reward model evaluates the completion, with a more positive score being better.
3. The loss relative to the reward function is evaluated.
4. A reinforcement learning policy algorithm (Proximal Policy Optimization) feeds this back to the instruct model and adjusts its weights to tune it.
5. The updated instruct model generates a new, more closely aligned completion.
Repeat until a threshold is met or the maximum number of steps is reached.
82
What is the proximal policy optimization algorithm?
It is the reinforcement learning algorithm that, when paired with the reward-model loop, provides the policy (i.e. the LLM) with the updates required for its model weights to better align with the reward function.
83
What is reward hacking? What’s an example?
When a model undergoing RLHF noticeably degrades its performance on the original task by learning to provide responses that maximize the reward, even if its task-completion performance suffers.
Example: a model trained to provide product reviews undergoes RLHF for toxicity; it could go from saying:
- "This product is… [a dumpster fire]"
to
- "This product is… [really the most awesome product ever]"
The second completion generates a better reward even though the original task performance degrades.
84
How do you protect a model from reward hacking?
You keep a copy of the original model with frozen weights as a reference model & then compare the RLHF-updated model's completions against the reference model's outputs. The comparison uses the KL divergence metric, which is incorporated into the RLHF process and kept small so the model stays true to its original task.
85
What is KL Divergence? How does it work in RLHF? What’s a challenge & mitigation strategy for it?
Kullback-Leibler divergence - a statistical comparison of how different two probability distributions are. For an LLM it's calculated across all of the tokens in the vocabulary (after applying a softmax over the full vocabulary), which is quite computationally expensive, so it should be done on GPUs.
In RLHF, a KL divergence penalty is added to the reward function. The challenge is that it requires keeping two LLMs in memory (the frozen reference model and the model being updated); with PEFT, you often don't need two full copies and can instead keep one LLM in memory plus the modified adapter weights.
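A tiny numerical illustration of KL divergence between two toy next-token distributions; in RLHF it is computed over the full vocabulary for the updated model versus the frozen reference model:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))."""
    return sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0)

reference_probs = [0.5, 0.3, 0.2]  # frozen reference model's token probabilities (toy example)
updated_probs = [0.6, 0.3, 0.1]    # RLHF-updated model's probabilities for the same tokens

penalty = kl_divergence(updated_probs, reference_probs)
print(penalty)  # small value; grows as the updated model drifts from the reference
```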
86
What is a key challenge in scaling RLHF & what solution/research area has been proposed? What’s the process?
Challenge: gathering the thousands/millions of pieces of human feedback needed to train the reward model.
Proposed solution: reinforcement learning with AI feedback via self-supervision, based on Constitutional AI. Proposed by Anthropic in 2022, Constitutional AI gives an LLM a set of prompts (rules) that make up its "constitution", meant to help it govern values & make trade-offs among them, like prioritizing harmlessness over helpfulness. Paper: "Constitutional AI: Harmlessness from AI Feedback".
Part 1: supervised learning - fine-tune from self-critique & revised responses
1. Provide the LLM with its constitution.
2. A red team tries to get the model to elicit bad responses (like "How do I make a bomb?").
3. Have the model critique its own responses using the constitution.
4. Have the model regenerate the response based on its critique.
5. Fine-tune the model on the revised responses.
Part 2: ask the model which response is preferred, the original or the revised one, & use RL based on those preference pairs to encode the LLM's preferences.
87
What are three methods for optimizing LLMs for deployment? What general objective do these solve? How do each of these work?
Model Distillation, Post-Training Quantization, Model Pruning.
General objective: computational efficiency and storage for inference, by reducing model size.
- Model Distillation: the most popular approach. The fine-tuned LLM (teacher) has its weights frozen and creates predictions for the training dataset; a student LLM then creates its own predictions and is trained via a "distillation loss" to learn the knowledge of the teacher LLM.
- Post-Training Quantization (PTQ): similar to quantization-aware training, which converts model weights to a lower-precision format during training, PTQ does the same thing after training, for inference.
- Model Pruning: weights that are close to 0 are pruned from the model, resulting in a smaller model & often improved performance (though this is usually minimally effective for LLMs).
88
What are the detailed components of model distillation? What model type is distillation typically effective for?
- Use the teacher LLM with frozen weights to generate labels from the training data, a.k.a. soft labels.
- Use the student LLM to generate predictions from the training data, a.k.a. soft predictions.
- Knowledge distillation: the distillation loss function compares the probability distributions of teacher and student & tries to minimize the loss. A Temperature parameter can be set, where a higher value lets the student model learn to be more creative & a lower value (<1) lets it learn to mimic the teacher model more exactly.
- Since we also have the "ground truth" data from the original training dataset, the student model also learns to predict "hard predictions", which are compared against "hard labels"; here the T parameter always equals 1 and is not varied.
Distillation is mostly beneficial for encoder models, where a solid representation is generated, and not decoder-only models.
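A rough PyTorch sketch of the temperature-scaled distillation loss described above; the logits are random stand-ins, and a real setup would combine this with the hard-label loss at T=1:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)           # teacher's soft labels
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)   # student's soft predictions
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

teacher_logits = torch.randn(4, 32000)  # 4 examples, hypothetical 32k-token vocabulary
student_logits = torch.randn(4, 32000)
print(distillation_loss(student_logits, teacher_logits))
```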
89
What are some of the details of PTQ?
Post-Training Quantization:
- Takes 32-bit float representations and often converts them to 8 bits.
- Can be applied to both model weights & model activations (though activations tend to have a larger impact on performance).
90
What are the ways that pruning is applied? What’s the typical impact?
Pruning can be applied:
- During full training
- With PEFT/LoRA
- Post-training
It usually has low impact on LLMs since it only removes weights at or near 0, of which there are typically few.
91
What kinds of user queries do LLMs often struggle with that can be mitigated via clever orchestration in applications? How are these mitigated?
LLMs often struggle with:
- Out-of-date information
- Math - since the model is just predicting the next token
- Making up facts that aren't actually true
These can be mitigated via tool use to access:
- Up-to-date information
- The ability to complete mathematical operations
- Fact lookup via RAG, to ground responses in sources
92
What paper was published by Facebook in 2020 about RAG? What did it demonstrate & what was the setup?
"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks."
Setup: encode the user query → look it up against a vector database to retrieve relevant information → pass the relevant information plus the user query to the LLM for completion.
93
What does RAG help you mitigate? What considerations are there for using RAG?
Mitigates: knowledge cutoffs & hallucination.
Consideration: data must fit in the context window, so documents often need to be broken up into chunks for embedding.
94
What is the difference between a vector store & a vector database?
A vector database is just a vector store (i.e. text + vector representation) where there is also a unique key for each vector representation. This enables citation.
95
What does an LLM need to have success in an application setting?
- A plan: a set of actions/steps to follow.
- A specific format for its output: e.g. a SQL query.
- Validated actions: if it's taking actions, its outputs need to be validated by the user to ensure it's taking the correct actions.
96
What is Chain-of-Thought Prompting?
Via in-context learning, provide examples in which the reasoning steps used to reach the answer are written out. This primes the model to do the same when producing its own results, leading the LLM to behave more like a person when handling reasoning problems.
97
What is Program Aided Language Modeling? How does it work?
Using a program to enhance a language model's ability to provide precise answers that depend on mathematical operations.
- Via in-context learning, the model is shown chain-of-thought examples where the reasoning is a pairing of the problem breakdown (as comments) with the associated code.
- A new question is put at the end of the example prompt so that code steps are generated for it.
- The code steps are then passed to a Python interpreter for computation.
- The answer is then combined with the PAL-formatted prompt and passed back to the LLM so it can incorporate the answer in context.
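A minimal sketch of what a PAL-style prompt can look like; the example problems and variable names are illustrative, not taken from a specific implementation:

```python
# The chain of thought is written as comments paired with executable Python; the model is
# asked to continue the pattern for the new question, and the generated code is then run
# by a Python interpreter to get the exact answer.
pal_prompt = '''
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many does he have now?
# Roger started with 5 balls.
tennis_balls = 5
# 2 cans of 3 tennis balls each is 6 more balls.
bought_balls = 2 * 3
# The answer is the total.
answer = tennis_balls + bought_balls

Q: A bakery had 23 loaves, sold 15, and baked 7 more. How many loaves are left?
'''
```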
98
What research paper came up with ReAct & by who? What is it & what are it’s components?
“ReAct: Synergizing Reasoning and Acting in Language Models” - Princeton & Google.
A framework to help LLMs plan out & execute more complex workflows. It combines CoT reasoning with action planning. Components:
- Question: what the model is asked
- Thought: a reasoning step on how the model might address the question
- Action: choosing from a set of actions it could take (lookup, search, or finish)
- Observation: incorporating information from the Action back into the prompt
The Thought → Action → Observation sequence repeats until the model is confident it has the answer and chooses the "finish" action.
Benchmarks:
- HotPotQA: multi-step question answering from Wikipedia sources
- FEVER: uses Wikipedia passages to verify facts
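A heavily simplified sketch of the Thought → Action → Observation loop; call_llm and wikipedia_search are hypothetical stand-ins for an LLM call and a search tool, not real APIs:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that returns the next Thought/Action block."""
    raise NotImplementedError

def wikipedia_search(query: str) -> str:
    """Hypothetical stand-in for a search tool."""
    raise NotImplementedError

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(prompt)                   # model emits "Thought: ..." and "Action: ..."
        prompt += step + "\n"
        if "Action: Finish" in step:              # model is confident it has the answer
            return step.split("Action: Finish")[-1].strip("[] \n")
        if "Action: Search" in step:
            query = step.split("Action: Search")[-1].strip("[] \n")
            observation = wikipedia_search(query)  # tool result is fed back into the context
            prompt += f"Observation: {observation}\n"
    return "No answer within step budget"
```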
99
What does a set of instructions for the ReAct framework look like? And what is their relevance?
Do X by using Thought, Action, and Observation steps. Thought can reason about the current situation & Action can be one of the following:
1. First action [Lookup]
2. Second action [Search]
3. Final action [Finish]
Here are some examples…
Their relevance: they define the structure of the loop and constrain the model to a valid action space.
100
What is LangChain?
A framework that allows breaking LLM applications into modular components, making it easier to implement patterns like ReAct.