lecture 10 Flashcards

(29 cards)

1
Q

What major AI breakthrough occurred in November 2022?

A

The introduction of ChatGPT.

2
Q

What is the fundamental technology behind ChatGPT?

A

The Transformer model.

3
Q

When was the Transformer model introduced?

A

In 2017, in the paper "Attention Is All You Need".

4
Q

What makes Transformer models powerful?

A

They use self-attention and scale effectively with large datasets.

5
Q

What are sequence models used for?

A

Processing sequential data like language, time series, and speech.

6
Q

What are two common sequence models before Transformers?

A

Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs).

7
Q

What is a limitation of RNNs?

A

They require sequential processing, making them slow.

8
Q

What is a limitation of CNNs for sequences?

A

They have a limited memory and cannot capture long-range dependencies effectively.

9
Q

What advantage does self-attention provide?

A

It allows parallel processing and captures long-range dependencies.

10
Q

What is the basic function of self-attention?

A

Each output is computed as a weighted sum of all input values.

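The "weighted sum of all input values" idea can be sketched in a few lines of NumPy (an illustrative sketch, not from the lecture; the input vectors are made-up toy numbers):

```python
import numpy as np

# Toy sequence of 3 inputs, each a 4-dimensional vector (made-up values).
x = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 2.0],
              [1.0, 1.0, 1.0, 1.0]])

# Raw attention scores: similarity of every input with every other input.
scores = x @ x.T                      # shape (3, 3)

# Normalize each row so the weights sum to 1.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each output is a weighted sum of all input vectors.
outputs = weights @ x                 # shape (3, 4)
```

Note that the weights depend on the inputs themselves, which is exactly the "computed dynamically" property described in card 11.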
11
Q

How are self-attention weights determined?

A

They are computed dynamically from the input itself.

12
Q

What is the primary benefit of self-attention?

A

It captures dependencies between all elements in a sequence efficiently.

13
Q

What does the term ‘transformer’ refer to in deep learning?

A

A model architecture that relies on self-attention and feedforward layers.

14
Q

What is a key benefit of Transformers over RNNs?

A

Transformers allow parallel computation, reducing training time.

15
Q

What operation is central to self-attention?

A

Computing similarity between input elements to determine their importance.

16
Q

What mathematical operation is used in self-attention?

A

Dot-product attention.

17
Q

What mechanism normalizes attention weights?

A

The softmax function.

18
Q

What does the softmax function do in self-attention?

A

It converts raw scores into probabilities that sum to 1.
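The softmax step from cards 17 and 18 can be shown directly (a minimal sketch assuming NumPy; the raw scores are made-up):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

raw_scores = np.array([2.0, 1.0, 0.1])
probs = softmax(raw_scores)
# probs is non-negative and sums to 1, so it can act as attention weights.
```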

19
Q

What are the three main components of self-attention?

A

Query, Key, and Value matrices.

20
Q

What does the Query (Q) matrix represent?

A

It captures how much attention an input should give to others.

21
Q

What does the Key (K) matrix represent?

A

It determines how much an input should be attended to by others.

22
Q

What does the Value (V) matrix represent?

A

It holds the actual information that will be aggregated.

23
Q

What is scaled dot-product attention?

A

A modification of dot-product attention that scales down large values for stability.

24
Q

What is the benefit of multi-head attention?

A

It allows the model to focus on different aspects of the sequence simultaneously.

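The "different aspects simultaneously" idea of multi-head attention can be sketched by splitting the feature dimension across heads (an illustrative simplification: real models use learned per-head projections, whereas this sketch just slices the input):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention for a single head.
    scores = (Q @ K.T) / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(x, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        # Each head attends over its own slice of the features,
        # so different heads can focus on different patterns.
        heads.append(attention(x[:, sl], x[:, sl], x[:, sl]))
    # Concatenate the per-head outputs back to the full dimension.
    return np.concatenate(heads, axis=-1)

x = np.random.default_rng(1).normal(size=(5, 8))
out = multi_head_attention(x, num_heads=2)    # shape (5, 8)
```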
25
Q

What is positional encoding in Transformers?

A

A technique to introduce order information into the input sequence.

26
Q

Why is positional encoding necessary?

A

Because self-attention does not inherently preserve word order.

27
Q

What type of functions are used for positional encoding?

A

Sine and cosine functions with different frequencies.

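The sine/cosine scheme can be sketched in NumPy (a minimal sketch of the sinusoidal encoding from the original Transformer paper; the sequence length and dimension are arbitrary):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sine on even feature indices, cosine on odd ones, with frequencies
    # that decrease geometrically along the feature dimension.
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model // 2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=10, d_model=16)   # shape (10, 16)
```

Each position gets a unique pattern, and the encoding is simply added to the token embeddings so that self-attention can distinguish positions.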
28
Q

What is the role of feedforward layers in Transformers?

A

They apply transformations to each position independently after self-attention.

29
Q

What is the key takeaway from Transformers?

A

They revolutionized sequence processing by enabling efficient parallelization and the modeling of long-range dependencies.