Models & Architecture Flashcards

(40 cards)

1
Q

What is a vector

A

It is a mathematical representation of a word or sequence of words
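A minimal sketch (using numpy; the dimension and values are purely illustrative) of a word stored as a vector:

import numpy as np

# Hypothetical 4-dimensional embedding for the word "cat"; real embeddings are learned, not hand-set
cat_vector = np.array([0.21, -0.47, 0.93, 0.05])
print(cat_vector.shape)  # (4,)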

2
Q

Is a GenAI model a neural network?

A

True.

3
Q

What is an autoencoder

A

It is a type of neural network used for unsupervised learning, and it plays an important role in Generative AI (GenAI). Its primary purpose is to learn efficient, low-dimensional representations of input data (encoding) and then reconstruct the original data from these representations (decoding).
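A minimal PyTorch sketch of the idea (the layer sizes and the 784-dimensional input are illustrative assumptions, not from the card):

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compress the input to a low-dimensional latent code
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        # Decoder: reconstruct the original input from the latent code
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(16, 784)              # dummy batch of 16 flattened 28x28 "images"
loss = nn.MSELoss()(model(x), x)     # reconstruction loss compares output to input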

4
Q

What is an Encoder?

A

Transforms the input data into a compressed, lower-dimensional latent representation.
For example, an image might be reduced from a high-dimensional array to a compact vector.

5
Q

What is a Decoder?

A

Reconstructs the original data from the compressed latent representation.
It attempts to recreate the input as accurately as possible.

6
Q

Autoencoders are foundational in Generative AI for tasks such as:

A

Image Generation: VAEs are used to generate realistic images by learning latent representations of image datasets.

Data Augmentation: Autoencoders can generate variations of existing data to improve training in machine learning models.

Anomaly Detection: In applications like fraud detection or medical imaging, autoencoders are used to flag data points that differ significantly from learned patterns.

Style Transfer: Latent representations from autoencoders can be manipulated to alter attributes like style, color, or texture in images.

7
Q

GANs are composed of two models:

A

Generator (generates images intended to fool the discriminator) and Discriminator (predicts whether an image is real or generated)
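A hedged PyTorch sketch of the two components (the sizes, activations, and flattened 784-dimensional images are illustrative assumptions):

import torch
import torch.nn as nn

# Generator: maps random noise to a fake sample it hopes will fool the discriminator
generator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
# Discriminator: outputs the probability that a sample is real rather than generated
discriminator = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())

noise = torch.randn(16, 64)
fake_images = generator(noise)
real_prob = discriminator(fake_images)   # close to 0 means "fake", close to 1 means "real"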

8
Q

What is Classification in terms of vectors

A

It refers to the process of categorizing data points represented as vectors into predefined groups or classes. In machine learning and data science, this involves mathematical techniques to determine which class a given vector belongs to based on its features.

9
Q

What is Normalization in the context of Generative AI (GenAI)

A

It is the process of adjusting or scaling data to improve the performance, stability, and generalization capabilities of models. This concept applies across various stages of GenAI, from preprocessing input data to normalizing intermediate representations within the model itself.

10
Q

What are the two main features of representing data as vectors?

A

Each data point is represented as a vector in a high-dimensional space.

For example, a vector x = [x1, x2, …, xn]
could represent features like height, weight, and age for a classification task.
The dimensions of the vector correspond to the number of features in the data.

11
Q

Why Normalize Inputs?

A

Prevents certain features with large values from dominating the training process.
Speeds up convergence by stabilizing the optimization process.
Reduces the likelihood of numerical instability
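A small numpy sketch of two common input normalizations (the data values are made up for illustration):

import numpy as np

X = np.array([[170.0, 65.0, 30.0],    # height (cm), weight (kg), age (years)
              [180.0, 90.0, 45.0],
              [160.0, 55.0, 22.0]])

min_max = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # scales each feature to [0, 1]
z_score = (X - X.mean(axis=0)) / X.std(axis=0)                   # zero mean, unit variance per feature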

12
Q

What are features

A

Features are the measurable properties of data that describe its underlying characteristics. In GenAI, these features are often represented as vectors to serve as input to or output from models

13
Q

What is Cosine distance

A

is a measure used in mathematics and machine learning to calculate the dissimilarity between two vectors. It is derived from the cosine similarity, which measures how similar two vectors are based on the cosine of the angle between them. Cosine distance is used when we are more interested in the “direction” of the vectors rather than their magnitude.

Cosine distance = 1 − cos(θ) = 1 − (A ⋅ B) / (∥A∥ ∥B∥)

14
Q

What is Cosine Similarity

A

The cosine similarity between two vectors A and B is defined as:

Cosine similarity = cos(θ) = (A ⋅ B) / (∥A∥ ∥B∥)

Where:
A⋅B is the dot product of A and B.
∥A∥ and ∥B∥ are the magnitudes (Euclidean norms) of the vectors.
The result ranges from:

+1: perfectly similar (pointing in the same direction).
0: completely orthogonal (no similarity).
−1: perfectly opposite directions.

15
Q

What is word embedding?

A

It is a technique used in natural language processing (NLP) to represent words or phrases as dense vectors of numbers. These vectors capture the semantic meaning of words by placing similar words closer together in the embedding space. Word embeddings are foundational for many NLP tasks, including language modeling, machine translation, and sentiment analysis.

16
Q

What is the cosine similarity for vector A = [1,0,-1] and B = [0,1,-1]

A

Dot product:
A ⋅ B = (1×0) + (0×1) + (−1×−1) = 0 + 0 + 1 = 1

Magnitudes:

∥A∥ = (1² + 0² + (−1)²)^(1/2) = √2
∥B∥ = (0² + 1² + (−1)²)^(1/2) = √2

Cosine similarity:

cos(θ) = 1 / (√2 × √2) = 0.5

Cosine distance: 1 − 0.5 = 0.5
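The same calculation checked with numpy:

import numpy as np

A = np.array([1, 0, -1])
B = np.array([0, 1, -1])
cos_sim = A.dot(B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(cos_sim)           # 0.5, so cosine distance = 1 - 0.5 = 0.5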

17
Q

Word2Vec is supported by

A

Neural networks

18
Q

Word2Vec is a popular algorithm introduced by Google, which uses two main approaches:

A

Continuous Bag of Words (CBOW): Predicts a word given its surrounding context.
Skip-Gram: Predicts the surrounding words given a specific word.
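A hedged sketch using the gensim library (assuming gensim 4.x; the toy sentences and parameter values are illustrative):

from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"], ["the", "lazy", "dog"]]
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # sg=0 -> CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> Skip-Gram
print(cbow.wv["fox"].shape)   # (50,)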

19
Q

What is a Context Window?

A

A context window defines the span of words around a target word that are used during training to learn its embedding. For example, in the sentence:

“The quick brown fox jumps over the lazy dog.”
If the target word is “fox” and the context window size is 2, the words “quick”, “brown”, “jumps”, and “over” are included in the context window.
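A short Python sketch of extracting that context window (the sentence and window size come from the card; the code itself is illustrative):

sentence = "The quick brown fox jumps over the lazy dog".split()
target_index = sentence.index("fox")
window = 2

# Words up to `window` positions before and after the target word
context = sentence[max(0, target_index - window):target_index] + sentence[target_index + 1:target_index + 1 + window]
print(context)   # ['quick', 'brown', 'jumps', 'over']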

20
Q

What is supervised learning?

A

It is learning in which the labels are provided with the training data.

21
Q

What are the components of a network

A

Input layer
Hidden layer
Output layer
The word (output) to be predicted

22
Q

To multiply matrices, they must have the following structure

A

(1 × 5) × (5 × 1) = (1 × 1): the inner dimensions must match, and the result takes the outer dimensions, i.e. (m × n) × (n × p) = (m × p).
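A numpy check of the shape rule (the ones-filled matrices are just placeholders):

import numpy as np

A = np.ones((1, 5))    # 1 x 5
B = np.ones((5, 1))    # 5 x 1
C = A @ B              # inner dimensions (5 and 5) match, result takes the outer dimensions
print(C.shape)         # (1, 1)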

23
Q

What is The Continuous Bag of Words (CBOW) model

A

It is a method for generating word embeddings, introduced as part of the Word2Vec framework by Google. In CBOW, the goal is to predict a target word based on its surrounding context words. It is an efficient approach for learning dense, low-dimensional vector representations of words.

24
Q

The CBOW Model Overview

A

Input: A context window of words surrounding a target word.
Output: The target word.
Objective: Minimize the error in predicting the target word given its context words.

25
Q

How does CBOW work?

A

Define the Context Window: Choose a window size c that specifies how many words before and after the target word are considered. For example, if c = 2 and the sentence is "The quick brown fox jumps", then for the target word "brown" the context is ["The", "quick", "fox", "jumps"].
One-Hot Encoding of Words: Represent each word in the vocabulary as a one-hot vector whose dimension equals the vocabulary size. For example, if the vocabulary size is 10,000, each word is a vector of length 10,000 with a single 1 indicating the word.
Input Representation: Compute the average of the one-hot encoded vectors of the context words. This is the input to the model.
Hidden Layer Transformation: Pass the input through a hidden layer with weight matrix W, producing a lower-dimensional dense vector h = Wᵀ · x̄, where x̄ is the averaged one-hot context vector. This is the embedding space.
Output Layer and Softmax: Multiply the hidden vector by a second set of weights to produce a score for each word in the vocabulary, then apply the softmax function to convert these scores into probabilities, representing the likelihood of each word being the target.
Prediction and Loss Calculation: Compare the predicted probability of the target word to the actual target word using a loss function (typically cross-entropy loss), and adjust the weights using backpropagation to reduce the loss.
Training: Repeat the process for every word in the corpus across multiple epochs; the embeddings are refined iteratively until the model converges.
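A toy numpy sketch of one CBOW forward pass (the vocabulary, dimensions, and random weights are all illustrative assumptions):

import numpy as np

vocab = ["the", "quick", "brown", "fox", "jumps"]
V, D = len(vocab), 3                    # vocabulary size, embedding dimension
W_in = np.random.rand(V, D)             # input-to-hidden weights (the embeddings being learned)
W_out = np.random.rand(D, V)            # hidden-to-output weights

def one_hot(word):
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

context = ["the", "quick", "fox", "jumps"]               # context for target "brown", window = 2
x_avg = np.mean([one_hot(w) for w in context], axis=0)   # average the one-hot context vectors
h = x_avg @ W_in                                         # hidden layer: dense embedding
scores = h @ W_out
probs = np.exp(scores) / np.exp(scores).sum()            # softmax over the vocabulary
print(vocab[int(np.argmax(probs))])                      # predicted target word (untrained, so arbitrary)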
26
Q

How does word embedding work for "I love machine learning models"?

A

Context window size c = 2. Target: "machine". Context: ["I", "love", "learning", "models"].
The process: convert the context words ("I", "love", "learning", "models") to one-hot vectors, average the one-hot vectors, pass the averaged vector through the hidden layer to obtain a dense embedding, and use the output layer to predict "machine".
27
Q

What is the main purpose of neural networks in the word embedding context?

A

They learn the weights that become the word embeddings.
28
Q

What is Self-Attention?

A

Self-attention computes the importance of each word in a sequence relative to every other word in the same sequence. Unlike static word embeddings (e.g., Word2Vec, GloVe), self-attention produces contextualized embeddings, where a word's representation changes depending on the surrounding context.
29
Q

How does self-attention handle context in these two sentences: "The bank of the river was serene." / "She deposited money at the bank."

A

The word "bank" has different meanings. Using self-attention, the model considers surrounding words like "river" (sentence 1) or "money" (sentence 2) to generate contextually relevant embeddings.
30
Q

What are similarity values?

A

They quantify how closely two data points (e.g., words, sentences, or other entities) resemble each other. These values are central to many machine learning, natural language processing (NLP), and information retrieval tasks. They are calculated using similarity metrics, which measure the relationship between two vectors, sets, or other data representations.
31
Q

How can we write down the similarity values for "She wore a ring"?

A

E'ring = s1 · Eshe + s2 · Ewore + s3 · Ea + s4 · Ering
where Ew is the embedding of word w and s1…s4 are the similarity (attention) weights, so the contextualized embedding of "ring" is a weighted sum of all the word embeddings in the sentence.
32
Q

Similarities are calculated by using

A

cos(θ) (cosine similarity)
33
Q

In self-attention, each word embedding plays different roles, such as

A

Query, Key, and Value
34
Q

The combination of Query and Key forms

A

The dot product QKᵀ (the similarity scores)
35
Q

What is the purpose of softmax?

A

To normalize the similarity scores so that they sum to 1, turning them into attention weights.
36
Q

The softmax output matrix combined with the Values forms

A

The new (contextualized) embedding matrix: the softmax weights are multiplied by the Value vectors.
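A numpy sketch of scaled dot-product self-attention, tying the Query/Key/Value, softmax, and Value-weighting cards together (shapes and random values are illustrative assumptions):

import numpy as np

np.random.seed(0)
seq_len, d = 4, 8                        # 4 tokens, embedding dimension 8
X = np.random.rand(seq_len, d)           # input embeddings
W_q, W_k, W_v = (np.random.rand(d, d) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v      # each word takes on the Query, Key, and Value roles
scores = Q @ K.T / np.sqrt(d)            # dot product of Queries and Keys = similarity scores
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax: each row sums to 1
contextual = weights @ V                 # softmax weights times the Values = contextualized embeddings
print(contextual.shape)                  # (4, 8)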
37
Q

What is the Transformer model?

A

It is a deep learning architecture introduced by Vaswani et al. in their 2017 paper "Attention Is All You Need." It revolutionized natural language processing (NLP) and machine learning tasks by introducing the self-attention mechanism, which allows the model to weigh the importance of different input elements dynamically.
38
Q

What are the key features of the Transformer model?

A

Self-Attention Mechanism: enables the model to capture relationships between words in a sequence regardless of their distance from each other; each word (or token) attends to every other word, creating context-aware representations.
Positional Encoding: since Transformers do not process data sequentially (like RNNs), positional encodings are added to the input embeddings to retain order information.
Encoder-Decoder Structure: the encoder processes the input sequence and generates a contextual representation for each token; the decoder uses the encoder's outputs and prior predictions to generate the target sequence.
Multi-Head Attention: multiple attention mechanisms run in parallel, allowing the model to focus on different aspects of the sequence simultaneously.
Feedforward Layers: after the attention mechanism, each token's representation passes through a feedforward neural network to enhance its expressiveness.
Layer Normalization and Residual Connections: help stabilize training and enable deep architectures by preventing vanishing/exploding gradients.
Scalability and Parallelization: unlike RNNs, Transformers process input sequences in parallel, leading to significant improvements in training speed.
39
Q

What is an autoencoder?

A

It is a neural network that learns to compress and reconstruct input data.