Large Language Models Flashcards

(27 cards)

1
Q

How good are people at predicting information based on faces?

Election example

A

-People were shown two faces for a fraction of a second and asked to predict which one would win a political election
-It doesn't matter how long you look at the face
-Results were similar for 100 ms, 250 ms, and unlimited time
-About 55% of people guessed accurately based on a fraction of a second

A similar experiment was conducted for who is more likely to be promoted to general; there was a positive correlation between facial dominance score and the percentage of people who thought the person would become a general

2
Q

What is a loss function designed for in an LLM?

A

for a specific task

3
Q

What is the primary critique of AI?

A

Many AI models are designed for one specific application, although the media makes it seem very close to human cognition (it is still very far)

4
Q

When was the first time AI wasn’t designed for one thing?

A

ChatGPT

5
Q

Artificial General Intelligence

A

A very general mental capability that involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas

6
Q

What is a foundation model?

A

-Not trained to do one thing; it is trained to act like AGI
-Trained with one loss function applied to all tasks

7
Q

What is the master loss function for AGI and how do you build it?

A

Next word prediction; built by turning words into numbers
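A minimal sketch of what "turning words into numbers" means; the vocabulary and sentence are invented examples, not from any real model.

```python
# Sketch: turning words into numbers so "next word prediction" becomes
# a task a neural network can score. Vocabulary is invented for illustration.
sentence = "the cat sat on the".split()
vocab = sorted(set(sentence) | {"mat", "dog"})
word_to_id = {w: i for i, w in enumerate(vocab)}

# The model never sees strings, only these integer IDs.
ids = [word_to_id[w] for w in sentence]
print(ids)  # [5, 0, 4, 3, 5]

# The "master loss" asks: given ids[:-1], how probable was ids[-1]?
context, target = ids[:-1], ids[-1]
```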

8
Q

What are the input and output in an LLM?

A

The beginning of a sentence is the input; the word the model predicts is the output

9
Q

Explain the temperature analogy for LLMs

A

Low temperature means the model behaves nearly deterministically, favoring the most probable words
High temperature means the model behaves very randomly
If the temperature is 0, meaning no randomness, the model will give the same answer to the same prompt every time

10
Q

What are the neurons in an LLM?

A

Each output neuron corresponds to one word in the dictionary

11
Q

Explain how temperature affects the LLM

A

-The user sets the temperature in the code
-The model gives a probability for each possible next word
-If temp > 0, meaning there is some randomness, the model will generally pick the words with high probability, but low-probability words also have a chance

There is an upper limit on the temperature. At the highest limit, the model will just spit out random words
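The temperature mechanics above can be sketched as temperature-scaled softmax sampling; this is an illustrative sketch, not any particular model's API.

```python
import math
import random

def sample_next_word(logits, temperature):
    """Pick a word index from raw scores; temperature reshapes the distribution."""
    if temperature == 0:              # no randomness: always the top-scoring word
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)                   # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # High-probability words are usually picked, but every word has a chance
    return random.choices(range(len(logits)), weights=probs)[0]

print(sample_next_word([2.0, 1.0, 0.1], temperature=0))  # always index 0
```

With a very large temperature the scaled scores all collapse toward each other, so the choice approaches uniform randomness, matching the "spit out random words" behavior above.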

12
Q

Explain how an LLM works

A

The first word is generated from the prompt, then the text is sent through the model again; after each word is added, the response is sent back through the model

E.g. if it gives you an answer of 10,000 words, the response was run through the model 10,000 times
It bypasses how we acquire knowledge and goes directly to the outcome of our collective intelligence
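The generate-append-repeat loop described above can be sketched as follows; `predict_next` is a toy stand-in for a real model, cycling through canned words.

```python
canned = {0: "Hello", 1: "there", 2: "friend"}

def predict_next(text, prompt_len):
    """Stand-in for a real model: returns the next canned word."""
    n = len(text.split()) - prompt_len      # words generated so far
    return canned.get(n, "<end>")

def generate(prompt, max_words=10):
    text = prompt
    plen = len(prompt.split())
    for _ in range(max_words):              # one full model pass per word
        word = predict_next(text, plen)
        if word == "<end>":
            break
        text += " " + word                  # append, then feed back in
    return text

print(generate("Say hi:"))                  # Say hi: Hello there friend
```

A 10,000-word answer means 10,000 trips around this loop, each pass reprocessing everything generated so far.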

13
Q

Explain how an LLM uses vectors.

A

-Each word is converted into a vector (each word is represented by a long list of zeroes, except for one spot that is a one; the position of the 1 corresponds to the word)

-Like an image is made of pixels, the entire sentence is just a stack of vectors, using one-hot representation

-The one-hot representation is the input and is condensed into a much shorter code (this code is the embedding in the middle)

-Then you make a prediction; when looking at a sentence, you look at the words before and after the highlighted word, allowing you to make a pair of data: given the current word, what should the previous word be?

If the dictionary contained 500,000 words, there'd need to be 499,999 zeroes in the vector
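The one-hot idea above, sketched with a tiny five-word dictionary instead of 500,000 words:

```python
# One-hot sketch: a tiny invented dictionary for illustration.
vocab = ["cat", "dog", "tired", "exhausted", "run"]

def one_hot(word):
    vec = [0] * len(vocab)        # all zeroes...
    vec[vocab.index(word)] = 1    # ...except a single 1 at the word's position
    return vec

print(one_hot("tired"))           # [0, 0, 1, 0, 0]
```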

14
Q

*How are LLMs trained?

A

The model gathers its data from all sentences that are available to it (e.g. journal articles, news articles)
This is the training data and the ground truth
You ask the model to predict the next word and then compare it to the ground truth
Once trained, the neural network transforms each one-hot representation into a code

This is unsupervised learning because the data itself is its own label
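A sketch of why the data is "its own label": every position in ordinary text yields a (context, target) training pair with no human labeling. The sentence is an invented example.

```python
# Each pair: the words so far are the input; the actual next word is the
# ground truth the model's prediction is compared against.
text = "the model predicts the next word".split()

pairs = [(text[:i], text[i]) for i in range(1, len(text))]
for context, target in pairs[:2]:
    print(context, "->", target)
```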

15
Q

*What is the input layer in an LLM?

A

One word from the one-hot representation is the input layer, not multiple words

i.e. also the embedding

16
Q

What is the output in an LLM?

A

The probability of each word based on the training data

17
Q

Will adjectives have a similar code?

A

Yes, the code will be similar although the input layer will be very different

E.g. “tired” and “exhausted” will appear completely different in the input layer and have different one-hot representations, but the code will actually be very similar because they are surrounded by similar words in sentences

18
Q

Vector

A

An arrow starting from the origin, pointing to the coordinates

-Traditional AI would search the dataset, and the efficiency of AI was based on how fast it could search. Now, instead of searching through the dataset, you use embeddings of the vectors

This is how generative AI views language

19
Q

Basics of vector math

A

-You can subtract two vectors to get another vector
-e.g. C - A = B

e.g. China - Beijing ≈ Russia - Moscow
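The analogy above, sketched with made-up 2-D embeddings (real embeddings have hundreds of dimensions; these numbers are invented purely so the arithmetic works out):

```python
# Invented 2-D "embeddings" chosen so country - capital comes out the same.
emb = {
    "China":  [5.0, 3.0], "Beijing": [5.0, 1.0],
    "Russia": [2.0, 3.0], "Moscow":  [2.0, 1.0],
}

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

# "country minus capital" points in the same direction for both pairs
print(sub(emb["China"], emb["Beijing"]))   # [0.0, 2.0]
print(sub(emb["Russia"], emb["Moscow"]))   # [0.0, 2.0]
```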

20
Q

How is the position of the vector determined?

A

By their embedding (in a 2-D example, the x and y coordinates)

21
Q

What was originally the purpose of using language in neural networks?

A

For computer vision

22
Q

Language, vectors, and bias

A

The patterns of how words are associated with each other reveal a lot about human society

The bias is not in the research or the algorithm; it's in the use of our language

E.g. computer programmer - man + woman = homemaker

23
Q

How does an LLM process sentences?

A

When the LLM processes a prompt, it sees the entire prompt all at once. It is not like human reading, where we process sentences word by word

*Makes the computation much faster
*The complexity of a document doesn't make a difference to an LLM as long as the documents are the same length (e.g. a children's book vs. a quantum physics textbook are processed the same)

24
Q

What is a qualitative difference between human thinking and next word prediction?

A

AI prioritizes satisfying the request at the expense of making sense

E.g. Prompt: write a poem where the last sentence is the first sentence but with the words in reverse order

25
Q

What is the 20 questions game and how does it relate to AI?

A

-A classic guessing game where one player thinks of an object and the others ask up to 20 yes-or-no questions to identify it
-The key to playing is that you have to hold the object in your mind
-You can play the game with GPT (but GPT can't hold the object in a mind)
-To play, it relies on the conversation history, but if you refresh, it no longer holds the image
-But if you just keep asking it questions without refreshing, the answers will be consistent

Footnote: The model determined that the user was trying to figure out whether it could hold an object
26
Q

Where does the code come from when generating images from prompts?

A

The code comes from the long prompt
You extract the code from the sentence and then feed it into the image generator
27
Q

How do you generate an image from a prompt?

A

-The image is put through an encoder to generate an image code
-The text is put through an encoder to generate a code
-The codes generated from these two should be similar because they represent the same thing
-You can add an image projector and a text projector to embed them jointly
-The final products should be the same
-If they're different, there's an error and you train the neural network more (backpropagation)
-In this method there is no ground truth; you just want the two codes to be identical
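The joint-embedding idea described in the card, as a toy sketch; both "encoders" here are invented stand-ins, not a real model's API.

```python
def image_encoder(image):
    return image["features"]                 # pretend encoder output

def text_encoder(text):
    return [len(w) for w in text.split()]    # toy "code": word lengths

def alignment_error(img_code, txt_code):
    # Squared distance between the two codes. Training (backpropagation)
    # pushes this toward zero; there is no separate ground truth.
    return sum((a - b) ** 2 for a, b in zip(img_code, txt_code))

photo_of_cat_on_mat = {"features": [3, 3]}   # invented image features
err = alignment_error(image_encoder(photo_of_cat_on_mat),
                      text_encoder("cat mat"))
print(err)                                   # 0 when the codes already match
```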