Bare Min Must Know Flashcards by Christianna Clark

What is Top-P?

What is Top-K?

What is Temperature?

What is Chain of Thought Prompting?

What is Least to Most Prompting?

What is Self-ask prompting?

ReAct prompting

Iterative Prompting

See https://cobusgreyling.medium.com/12-prompt-engineering-techniques-644481c857aa to fill in the answers for the prompting cards.

How do you mitigate latency in GenAI?

On the model side: Knowledge Distillation, Quantization.

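Note: 4-bit quantization compresses parameters, and sometimes intermediate calculations, from high-precision numbers such as 32-bit floats down to 4 bits. This can reduce the model size significantly: going from 32 bits to 4 bits per parameter is up to an 8x reduction in weight storage, before overhead such as scaling factors.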
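To make the size reduction concrete, here is a minimal sketch of symmetric 4-bit quantization in plain Python. It is a toy under stated assumptions, not how any particular library does it: the function names (quantize_4bit, pack_nibbles, dequantize), the single per-tensor scale, and the sample weights are all made up for illustration.

```python
import struct

def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def pack_nibbles(q):
    """Store two 4-bit values per byte; odd-length inputs are padded with a zero."""
    if len(q) % 2:
        q = q + [0]
    return bytes(((a & 0xF) << 4) | (b & 0xF) for a, b in zip(q[0::2], q[1::2]))

def dequantize(q, scale):
    """Recover approximate float values from the 4-bit integers."""
    return [v * scale for v in q]

# Hypothetical weights, chosen only for the example.
weights = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66, -0.25, 1.05]
q, scale = quantize_4bit(weights)
packed = pack_nibbles(q)
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 4 bytes per float32
restored = dequantize(q, scale)

print(fp32_bytes, "bytes as float32 vs", len(packed), "bytes packed as 4-bit")
```

The restored values are only approximations of the originals (each is off by at most half the scale), which is the accuracy cost paid for the smaller size.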

On the token processing side: parallel processing of tokens, and caching frequently generated tokens.
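As a rough illustration of the caching idea, the sketch below memoizes whole responses for repeated prompts. This is an assumption about what the card means: "caching frequently generated tokens" could equally refer to a token-level cache inside the model, such as a key-value cache. Both slow_generate and cached_generate are hypothetical names, and slow_generate is only a stand-in for a real model call.

```python
from functools import lru_cache

# Counts how many times the (pretend) model is actually invoked.
calls = {"model": 0}

def slow_generate(prompt: str) -> str:
    """Hypothetical stand-in for an expensive model call."""
    calls["model"] += 1
    return prompt.upper()

@lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    """Return a stored response when the same prompt has been seen before."""
    return slow_generate(prompt)

first = cached_generate("what is top-k?")
second = cached_generate("what is top-k?")  # served from the cache, no model call

print(calls["model"], cached_generate.cache_info().hits)
```

A cache like this only helps when prompts repeat exactly and when returning the same answer each time is acceptable, so it fits deterministic or low-temperature use better than creative generation.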

What is Grounding?

It’s a way to keep the LLM on track with the “story” we’re trying to tell; it helps the model remember why we’re working on the problem.

How does grounding work?

Similar to RAG: a retriever fetches relevant documents based on the user input.
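The retriever step can be sketched with a toy keyword-overlap ranking. Everything here is an assumption made for illustration: the function names (tokenize, retrieve, build_prompt), the word-overlap scoring, the sample documents, and the prompt wording. Real systems typically rank with embeddings or a search index instead.

```python
def tokenize(text):
    """Lowercase and split on whitespace; crude, but enough for a toy example."""
    return set(text.lower().split())

def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Place the retrieved documents ahead of the question so the model answers from them."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical document store.
docs = [
    "Quantization reduces model size by lowering numeric precision.",
    "Top-K sampling keeps the K most likely tokens.",
    "Grounding ties model answers to retrieved source documents.",
]
query = "how does quantization reduce model size"
prompt = build_prompt(query, docs, k=1)
print(prompt)
```

With k=1, only the quantization document is placed in the prompt, so the model is steered toward that source when it answers.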

Difference between RAG and Grounding
