Large Language Models Flashcards
(27 cards)
How are people at predicting information based on faces?
Election example
-People were exposed to two faces for a fraction of a second and asked to predict which one would win a political election
-It doesn't matter how long you look at the face
-Results were similar for 100 ms, 250 ms, and unlimited time
-About 55% of participants guessed correctly, even from only a fraction of a second of exposure
A similar experiment asked who was more likely to be promoted to general; there was a positive correlation between facial dominance score and the percentage of people who thought the person would become a general
What is a loss function designed for in an LLM?
for a specific task
What is the primary critique of AI?
Many AI models are designed for one specific application, although the media makes it seem very close to human cognition (it is still very far)
When was the first time AI wasn’t designed for one thing?
ChatGPT
Artificial General Intelligence
A very general mental capability that involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas
What is a foundation model?
-Not trained to do one thing; it is trained to act like AGI
-Trained with one loss function applied across all of these tasks
What is the master loss function for AGI and how do you build it?
Next word prediction; built by turning words into numbers
What is the input and output in an LLM?
The beginning of a sentence is the input; the word the model predicts is the output
Explain the temperature analogy for LLMs
Low temperature means the model behaves more deterministically (it favors the most probable words)
High temperature means the model behaves very randomly
If the temperature is 0, meaning no randomness, the model will give the same answer to the same prompt every time
What are the neurons in an LLM?
Each neuron corresponds to one word in the dictionary
Explain how temperature affects the LLM
-User sets the temperature in the code
-Model gives a probability for each candidate next word
-If temp > 0, meaning there is some randomness, the model will generally pick words with high probability, but low-probability words also have a chance
-There is an upper limit on the temperature; at the highest setting, the model just spits out random words (see the sketch below)
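A minimal Python sketch of how temperature reshapes the next-word probabilities before sampling; the toy vocabulary, scores, and the sample_next_word helper are made up for illustration, not an actual model.

```python
import numpy as np

def sample_next_word(logits, vocab, temperature, rng):
    """Turn raw scores (logits) into probabilities and pick one word."""
    if temperature == 0:
        # No randomness: always take the highest-scoring word.
        return vocab[int(np.argmax(logits))]
    # Dividing by a small temperature sharpens the distribution (more deterministic);
    # dividing by a large temperature flattens it (more random).
    probs = np.exp(logits / temperature)
    probs = probs / probs.sum()
    return rng.choice(vocab, p=probs)

vocab = ["mat", "roof", "moon", "banana"]        # toy dictionary
logits = np.array([3.0, 2.0, 0.5, -1.0])         # scores for "The cat sat on the ..."
rng = np.random.default_rng(0)

print(sample_next_word(logits, vocab, 0, rng))                         # always "mat"
print([sample_next_word(logits, vocab, 0.7, rng) for _ in range(5)])   # mostly "mat"
print([sample_next_word(logits, vocab, 5.0, rng) for _ in range(5)])   # close to random
```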
Explain how an LLM works
The first word is generated from the prompt, then the text is sent through the model again; after each new word is added, the response is sent back through the model
E.g. if it gives you an answer of 10,000 words, the response was run through the model 10,000 times
It bypasses how we acquire knowledge and goes directly to the outcome of our collective intelligence
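A minimal sketch of this word-by-word generation loop; predict_next_word is a hypothetical stand-in for the trained network, only to show that the growing text is fed back through the model once per new word.

```python
# Hypothetical stand-in for the trained network: maps the text so far to one next word.
def predict_next_word(text_so_far):
    canned = {
        "Roses are": "red,",
        "Roses are red,": "violets",
        "Roses are red, violets": "are",
    }
    return canned.get(text_so_far, "blue.")

def generate(prompt, n_words):
    text = prompt
    for _ in range(n_words):           # one full pass through the "model" per word
        next_word = predict_next_word(text)
        text = text + " " + next_word  # append the word and feed the longer text back in
    return text

print(generate("Roses are", 4))        # Roses are red, violets are blue.
```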
Explain how an LLM uses vectors.
-Each word is converted into a vector (each word is represented by a long list of zeros, except for one spot that is a 1; the position of the 1 corresponds to the word)
-Like an image is made of pixels, the entire sentence is just a stack of vectors, using the one-hot representation
-The one-hot representation is the input and is condensed into a much shorter code (this code, in the middle of the network, is the embedding)
-Then you make a prediction: when looking at a sentence, you look at the words before and after the highlighted word, which gives you a pair of data; given the current word, what should the previous word be?
If the dictionary contained 500,000 words, there would be 499,999 zeros in the vector (see the sketch below)
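A minimal sketch of the one-hot representation, assuming a made-up five-word dictionary; the embedding matrix here is random, just to show how a long one-hot vector is condensed into a much shorter code.

```python
import numpy as np

# Toy dictionary; a real one could have hundreds of thousands of words.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """All zeros except a single 1 at the word's position in the dictionary."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

# Like pixels in an image, a sentence is just a stack of these vectors.
sentence = ["the", "cat", "sat"]
print(np.stack([one_hot(w) for w in sentence]))

# A (here randomly initialised) embedding matrix condenses each long one-hot
# vector into a much shorter code, e.g. 2 numbers instead of len(vocab).
embedding_matrix = np.random.default_rng(0).normal(size=(len(vocab), 2))
print(one_hot("cat") @ embedding_matrix)   # the short "code" for "cat"
```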
*How are LLMs trained?
It gathers its data from all the sentences available to it (e.g. journal articles, news articles)
This is the training data and the ground truth
You ask the model to make a prediction on the next word and then compare it to the ground truth
Once trained, the neural network transforms each one-hot representation into a code
This is unsupervised learning because the data itself is its own label (see the sketch below)
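A minimal sketch of how (context, next word) training pairs fall out of the text itself; the two-sentence corpus is invented, and the actual network, loss, and optimizer are omitted.

```python
# Every sentence in the corpus yields (context, next word) pairs "for free":
# the ground-truth label is simply the word that actually comes next.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]

pairs = []
for sentence in corpus:
    words = sentence.split()
    for i in range(1, len(words)):
        pairs.append((words[:i], words[i]))   # (context so far, true next word)

print(pairs[0])   # (['the'], 'cat')
print(pairs[3])   # (['the', 'cat', 'sat', 'on'], 'the')

# Training then means: the model predicts the next word from the context,
# the prediction is compared to the ground-truth word, and the weights are
# adjusted to reduce the loss; no human labelling is needed.
```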
*What is the input layer in an LLM?
One word's one-hot representation is the input layer, not multiple words
i.e. this is what gets condensed into the embedding
What is the output in an LLM?
The probability of each word based on the training data
Will adjectives with similar meanings have a similar code?
Yes, the code will be similar although the input layer will be very different
E.g. "tired" and "exhausted" appear completely different in the input layer and have different one-hot representations, but their codes will be very similar because they are surrounded by similar words in sentences (see the sketch below)
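A minimal sketch comparing codes with cosine similarity; the numbers for "tired", "exhausted", and "banana" are invented to illustrate the idea and are not real embeddings.

```python
import numpy as np

def cosine_similarity(a, b):
    """Close to 1 when two codes point in the same direction; lower when they differ."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Invented 3-number "codes" (real embeddings have hundreds of dimensions).
codes = {
    "tired":     np.array([0.9, 0.1, -0.4]),
    "exhausted": np.array([0.8, 0.2, -0.5]),
    "banana":    np.array([-0.3, 0.9, 0.6]),
}

print(cosine_similarity(codes["tired"], codes["exhausted"]))  # high (~0.98): similar contexts
print(cosine_similarity(codes["tired"], codes["banana"]))     # low (negative here): different contexts
```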
Vector
an arrow starting from the origin, pointing to the word's coordinates
-Traditional AI would search the dataset, and the efficiency of the AI was based on how fast it could search. Now, instead of searching through the dataset, you need embeddings of the vectors
This is how generative AI views language
Basics of vector math
-you can subtract two vectors to get another vector
-e.g. C - A = B
e.g. China - Beijing = Russia - Moscow
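A minimal sketch of this vector arithmetic with toy 2-D coordinates chosen so the offsets line up; real embeddings learn such regularities from text, in many more dimensions.

```python
import numpy as np

# Toy 2-D embeddings chosen so the country-minus-capital offset matches.
china,  beijing = np.array([5.0, 2.0]), np.array([4.0, 1.0])
russia, moscow  = np.array([6.0, 3.0]), np.array([5.0, 2.0])

print(china - beijing)             # [1. 1.]
print(russia - moscow)             # [1. 1.]  -> same offset: China - Beijing = Russia - Moscow

# Solving the analogy: which vector relates to Beijing as Russia relates to Moscow?
print(russia - moscow + beijing)   # [5. 2.] == china
```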
How is the position of the vector determined?
by its embedding (its coordinates, e.g. x and y in two dimensions)
What was the original purpose of using language in neural networks?
For computer vision
Language, vectors, and bias
The patterns of how words are associated with each other reveal a lot about human society
The bias is not in the research or the algorithm, it’s in the use of our language
E.g. computer programmer - man + woman = homemaker
How does an LLM process sentences?
When the LLM processes a prompt, it sees the entire prompt at once; it is not like human reading, where we process sentences word by word
*Makes the computation much faster
*The complexity of a document doesn't make a difference to an LLM as long as the documents are the same length (e.g. a children's book and a quantum physics textbook of equal length are processed the same)
What is a qualitative difference between human thinking and next word prediction?
AI prioritizes satisfying the request at the expense of making sense
E.g. Prompt: write a poem where the last sentence is the first sentence but with the words in reverse order