Bare Min Must Know Flashcards

1
Q

What is Top-P?

A
2
Q

What is Top-K?

A
3
Q

What is Temperature?

A
4
Q

What is Chain of Thought Prompting?

A
5
Q

What is Least to Most Prompting?

A
6
Q

What is Self-ask prompting?

A
7
Q

What is ReAct prompting?

A
8
Q

What is Iterative Prompting?

A

To fill in: see https://cobusgreyling.medium.com/12-prompt-engineering-techniques-644481c857aa for the prompting techniques.

9
Q

How do you mitigate latency in GenAI?

A

On the model side: Knowledge Distillation, Quantization.

Note: 4-bit quantization compresses the model's parameters, and sometimes intermediate calculations, from high-precision numbers such as 32-bit floats down to 4 bits. Going from 32 bits to 4 bits per parameter cuts parameter storage by up to 8x.
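
To make the float-to-4-bit mapping concrete, here is a minimal sketch in plain Python. It assumes a simple symmetric scheme with one shared scale per list of weights; real libraries use more elaborate schemes, and the function names and sample weights here are made up for illustration.

```python
def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Choose the scale so the largest magnitude lands on the integer 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quantized]


# Illustrative weights only, not taken from any real model.
weights = [0.82, -0.41, 0.05, -0.77, 0.30]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)

# Each restored value is within half a quantization step of the original.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The trade-off is visible in `max_error`: each weight now needs only 4 bits, but it can be off by up to half of `scale`.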

On the token-processing side: parallel processing of tokens, and caching frequently generated tokens.

10
Q

What is Grounding?

A

It’s a way to keep the LLM on track with the “story” we’re trying to tell. It helps the model remember why we’re working on the problem.

11
Q

How does grounding work?

A

Similar to RAG: a retriever fetches documents relevant to the user's input and supplies them to the model.
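
A toy sketch of that retrieve-then-prompt flow follows. It assumes a naive word-overlap score in place of a real retriever (production systems typically use embeddings or a search index), and the documents, function names, and prompt wording are all made up for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace; deliberately crude."""
    return set(text.lower().split())


def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(query, documents):
    """Put the retrieved text in front of the question."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Use only this context to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Made-up documents for illustration.
documents = [
    "The return policy allows refunds within 30 days of purchase.",
    "Shipping takes five business days for domestic orders.",
]
query = "What is the return policy for refunds?"
prompt = build_grounded_prompt(query, documents)
```

Here the retriever picks the return-policy document, so the model sees only the text that bears on the question.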

12
Q

What is the difference between RAG and Grounding?

A