Lesson 2 Flashcards

1
Q

Describe in 2 sentences how an LLM like ChatGPT generates text

A

An LLM predicts the next token conditional on the context (the preceding tokens). This is applied recursively: each generated token is appended to the context and used to predict the following one.
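A minimal sketch of this recursive loop, assuming a hypothetical `next_token_probs(context)` helper (not a real API) that returns the model's probability distribution over token ids:

```python
import numpy as np

def generate(context: list[int], next_token_probs, max_new_tokens: int = 20) -> list[int]:
    """Recursive next-token generation: sample, append, repeat."""
    for _ in range(max_new_tokens):
        probs = next_token_probs(context)                    # P(x_n | context)
        token = int(np.random.choice(len(probs), p=probs))   # sample the next token
        context = context + [token]                          # feed it back as new context
    return context
```

Each sampled token becomes part of the context for the next prediction, which is what "recursively" refers to.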

2
Q

In the expression P(x_n | x_{n−1}, …, x_{n−k}), identify
(a) the next token, (b) the context and (c) give an expression for the context length.

A

(a) The next token is represented by x_n.
(b) The context is x_{n−1}, x_{n−2}, …, x_{n−k}: the preceding tokens the model uses to predict x_n.
(c) The context length is k, as indicated by the subscripts: k is the number of preceding tokens considered when predicting the next token x_n.
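As a worked complement (standard chain-rule notation, not stated on the card): the probability of an entire sequence factorizes into exactly these next-token conditionals.

```latex
% Chain rule with a context window of k tokens: each factor is the
% P(x_n | x_{n-1}, ..., x_{n-k}) from the question.
P(x_1, \dots, x_N) \approx \prod_{n=1}^{N} P\left(x_n \mid x_{n-1}, \dots, x_{n-k}\right)
```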

3
Q

An English text has a length of 3000 words.
(a) Calculate approximately how many tokens will represent the text.
(b) If the text is translated into Italian, will the number change? How?

A

(a) Assuming an average of about 1.3 tokens per English word: 1.3 × 3000 ≈ 3900 tokens.

(b) Yes, the number will be higher. The 1.3 tokens/word average holds for English, which tokenizers are optimized for; Italian words are split into more tokens, roughly 2 tokens/word, giving about 2 × 3000 = 6000 tokens.
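To check the tokens-per-word estimate empirically, a sketch using OpenAI's tiktoken library (`pip install tiktoken`; "cl100k_base" is the encoding used by GPT-3.5/GPT-4, and the exact ratios depend on the text):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["The quick brown fox jumps over the lazy dog.",
             "La rapida volpe marrone salta sopra il cane pigro."]:
    n_words = len(text.split())
    n_tokens = len(enc.encode(text))   # encode() returns the list of token ids
    print(f"{n_tokens / n_words:.2f} tokens/word: {text!r}")
```

The Italian sentence typically shows a noticeably higher tokens-per-word ratio than the English one.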

4
Q

Approximately which Top-p value will you select to generate…
(a) a poem and (b) a technical description?

A

(a) Poem: Top-p ≈ 0.9 (admit more diverse tokens for creative output).

(b) Technical description: Top-p ≈ 0.4 (keep the output focused and precise).

5
Q

State the elements of the P.R.O.P.E.R. framework.

A

Persona – which role should it take? E.g. professor, helpful assistant, critic, …

Request – what task should it fulfill?

Operation – in which way / using which method?

Presentation – which tone/style/format for the result? E.g. informal, short, table, …

Examples – provide a template for the output.

Refinement – give feedback, iterate and improve.
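An illustrative prompt assembled from these elements (my own example, not from the deck): "You are an experienced statistics professor (Persona). Explain the difference between Top-K and Top-p sampling (Request) by walking through a small numeric example (Operation). Keep it informal and under 150 words (Presentation), formatted as 'Top-K: … / Top-p: …' (Examples)." You would then give feedback on the answer and iterate (Refinement).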

6
Q

Which tasks will ChatGPT 3.5 (free version) be able to fulfill?

(a) Write a summary of the 1400-page bestseller classic “War and Peace” by Leo Tolstoy

(b) Integrate x² dx

(c) Summarize a 10-page PDF document (you have copied and pasted the text)

(d) Create a detailed plan for a friend’s wedding

(e) Summarize the political events of last year

(f) Propose a balanced portfolio of US stocks

A

(a) no
(b) yes
(c) no
(d) yes
(e) no
(f) yes

7
Q

What are n-grams?

A

A contiguous sequence of n words in a text
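A minimal sketch of extracting word-level n-grams, as defined above:

```python
def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return all contiguous sequences of n words."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("the cat sat on the mat", 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```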

8
Q

What is an LLM?

A

Large Language Model:
a statistical model that predicts the next token, applied recursively

9
Q

What is a token?

A

The unit of account for LLMs: text is split into tokens (words or word pieces), and quantities such as context length are measured in tokens

10
Q

LLMs can… (at least 3 things)

A

▶ Produce convincing text
▶ Incorporate provided information (in-context learning)
▶ Reproduce standard facts / textbook knowledge
▶ Transform and translate information
▶ Present output in many forms

11
Q

LLMs cannot… (at least 3)

A

▶ Analyze a problem like humans do
▶ Understand your cultural/implicit context
▶ Know everything (rare facts, news)
▶ Run code, reason logically, access the web, or do symbolic calculations

12
Q

Procedure for estimating P(·)

A

Procedure:
1. Data: entire Wikipedia, StackExchange, GitHub (40 TB for GPT-4)
2. Tokenization
3. Pretraining: token prediction (masking) ← gradient descent
4. Finetuning: specific datasets or tasks
5. Parameter adjustment: based on evaluation of (3) and (4)
6. Iterate: repeat steps (3) to (5) until the model converges
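A minimal sketch of the pretraining objective in step 3 (next-token prediction trained by gradient descent), using a deliberately toy PyTorch model as a stand-in for the real architecture:

```python
import torch
import torch.nn.functional as F

vocab_size, embed_dim = 100, 32
# Toy stand-in for the real network: embedding + linear head.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, embed_dim),
    torch.nn.Linear(embed_dim, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, 16))  # a batch of training token ids
logits = model(tokens[:, :-1])                  # predict from every prefix position
targets = tokens[:, 1:]                         # target at each position = next token
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                 # gradients for the descent step
```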

13
Q

“Top K”

A

= set of the K tokens with the largest probabilities; sampling is restricted to this set
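A sketch of Top-K filtering over a toy distribution (keep the K most probable tokens, renormalize):

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    """Zero out all but the k most probable tokens, then renormalize."""
    filtered = np.zeros_like(probs)
    top = np.argsort(probs)[-k:]       # indices of the k largest probabilities
    filtered[top] = probs[top]
    return filtered / filtered.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(top_k_filter(probs, k=2))        # only the two most probable tokens survive
```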

14
Q

“Top p” (nucleus)

A

= smallest set of tokens whose probabilities sum above the threshold: ∑ p_i > p_threshold
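A matching sketch of Top-p (nucleus) filtering: sort tokens by probability and keep the smallest prefix whose cumulative probability exceeds the threshold.

```python
import numpy as np

def top_p_filter(probs: np.ndarray, p_threshold: float) -> np.ndarray:
    """Keep the smallest set of tokens with cumulative probability > p_threshold."""
    order = np.argsort(probs)[::-1]                              # descending probability
    cumulative = np.cumsum(probs[order])
    nucleus_size = int(np.searchsorted(cumulative, p_threshold)) + 1
    filtered = np.zeros_like(probs)
    filtered[order[:nucleus_size]] = probs[order[:nucleus_size]]
    return filtered / filtered.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(top_p_filter(probs, p_threshold=0.8))   # keeps 0.5, 0.2, 0.15 (sum 0.85)
```

A low threshold (e.g. 0.4) keeps only the most likely tokens; a high one (e.g. 0.9) admits more variety, which is the basis for the poem vs. technical-description choice in card 4.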

15
Q

Context length of ChatGPT 3.5 and 4.0:

A

▶ ChatGPT 3.5 – 4096 tokens
▶ ChatGPT 4.0 – 8192 tokens
