lecture 10 Flashcards

(29 cards)

1
Q

What major AI breakthrough occurred in November 2022?

A

The introduction of ChatGPT.

2
Q

What is the fundamental technology behind ChatGPT?

A

The Transformer model.

3
Q

When was the Transformer model introduced?

A

In 2017, in the paper "Attention Is All You Need".

4
Q

What makes Transformer models powerful?

A

They use self-attention and scale effectively with large datasets.

5
Q

What are sequence models used for?

A

Processing sequential data like language, time series, and speech.

6
Q

What are two common sequence models before Transformers?

A

Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs).

7
Q

What is a limitation of RNNs?

A

They require sequential processing, making them slow.

8
Q

What is a limitation of CNNs for sequences?

A

They have a limited memory and cannot capture long-range dependencies effectively.

9
Q

What advantage does self-attention provide?

A

It allows parallel processing and captures long-range dependencies.

10
Q

What is the basic function of self-attention?

A

Each output is computed as a weighted sum of all input values.

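The "weighted sum of all input values" idea can be sketched in a few lines of NumPy (an illustrative sketch, not from the lecture; the input vectors are made-up toy numbers):

```python
import numpy as np

# Toy sequence of 3 inputs, each a 4-dimensional vector (made-up values).
x = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 2.0],
              [1.0, 1.0, 1.0, 1.0]])

# Raw attention scores: similarity of every input with every other input.
scores = x @ x.T                      # shape (3, 3)

# Normalize each row so the weights sum to 1.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each output is a weighted sum of all input vectors.
outputs = weights @ x                 # shape (3, 4)
```

Note that the weights depend on the inputs themselves, which is exactly the "computed dynamically" property described in card 11.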
11
Q

How are self-attention weights determined?

A

They are computed dynamically from the input itself.

12
Q

What is the primary benefit of self-attention?

A

It captures dependencies between all elements in a sequence efficiently.

13
Q

What does the term ‘transformer’ refer to in deep learning?

A

A model architecture that relies on self-attention and feedforward layers.

14
Q

What is a key benefit of Transformers over RNNs?

A

Transformers allow parallel computation, reducing training time.

15
Q

What operation is central to self-attention?

A

Computing similarity between input elements to determine their importance.

16
Q

What mathematical operation is used in self-attention?

A

Dot-product attention.

17
Q

What mechanism normalizes attention weights?

A

The softmax function.

18
Q

What does the softmax function do in self-attention?

A

It converts raw scores into probabilities that sum to 1.
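The softmax step from cards 17 and 18 can be shown directly (a minimal sketch assuming NumPy; the raw scores are made-up):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

raw_scores = np.array([2.0, 1.0, 0.1])
probs = softmax(raw_scores)
# probs is non-negative and sums to 1, so it can act as attention weights.
```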

19
Q

What are the three main components of self-attention?

A

Query, Key, and Value matrices.

20
Q

What does the Query (Q) matrix represent?

A

It captures how much attention an input should give to others.

21
Q

What does the Key (K) matrix represent?

A

It determines how much an input should be attended to by others.

22
Q

What does the Value (V) matrix represent?

A

It holds the actual information that will be aggregated.

23
Q

What is scaled dot-product attention?

A

A modification of dot-product attention that scales down large values for stability.

24
Q

What is the benefit of multi-head attention?

A

It allows the model to focus on different aspects of the sequence simultaneously.

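The "different aspects simultaneously" idea of multi-head attention can be sketched by splitting the feature dimension across heads (an illustrative simplification: real models use learned per-head projections, whereas this sketch just slices the input):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention for a single head.
    scores = (Q @ K.T) / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(x, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        # Each head attends over its own slice of the features,
        # so different heads can focus on different patterns.
        heads.append(attention(x[:, sl], x[:, sl], x[:, sl]))
    # Concatenate the per-head outputs back to the full dimension.
    return np.concatenate(heads, axis=-1)

x = np.random.default_rng(1).normal(size=(5, 8))
out = multi_head_attention(x, num_heads=2)    # shape (5, 8)
```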
25
Q

What is positional encoding in Transformers?

A

A technique to introduce order information into the input sequence.

26
Q

Why is positional encoding necessary?

A

Because self-attention does not inherently preserve word order.

27
Q

What type of functions are used for positional encoding?

A

Sine and cosine functions with different frequencies.

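The sine/cosine scheme can be sketched in NumPy (a minimal sketch of the sinusoidal encoding from the original Transformer paper; the sequence length and dimension are arbitrary):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sine on even feature indices, cosine on odd ones, with frequencies
    # that decrease geometrically along the feature dimension.
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model // 2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=10, d_model=16)   # shape (10, 16)
```

Each position gets a unique pattern, and the encoding is simply added to the token embeddings so that self-attention can distinguish positions.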
28
Q

What is the role of feedforward layers in Transformers?

A

They apply transformations to each position independently after self-attention.

29
Q

What is the key takeaway from Transformers?

A

They revolutionized sequence processing by enabling efficient parallelization and the modeling of long-range dependencies.