Math Flashcards

Basic math questions for an ML theory quiz (12 cards)

1
Q

Dot Product

What’s the geometric interpretation of the dot product of two vectors?

A

Projection: The dot product of A and B equals the magnitude of A multiplied by the (signed) length of the projection of B onto A (or vice versa). The projection of one vector onto another is the length of the shadow cast by one vector onto the other when the light source is perpendicular to the vector being projected upon.

Angle: The dot product also reveals information about the angle between the two vectors:

  • If the dot product is positive, the angle between the vectors is acute (less than 90 degrees).
  • If the dot product is zero, the vectors are orthogonal (perpendicular) to each other (angle is 90 degrees).
  • If the dot product is negative, the angle between the vectors is obtuse (greater than 90 degrees).
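A minimal sketch (assuming NumPy) of both interpretations, computing the projection length and the angle from the dot product:

```python
import numpy as np

a = np.array([3.0, 0.0])
b = np.array([2.0, 2.0])

dot = np.dot(a, b)                                    # 6.0
proj_b_on_a = dot / np.linalg.norm(a)                 # signed length of B's projection onto A: 2.0
cos_angle = dot / (np.linalg.norm(a) * np.linalg.norm(b))
angle_deg = np.degrees(np.arccos(cos_angle))          # 45.0 -> positive dot product, acute angle

print(dot, proj_b_on_a, angle_deg)
```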
2
Q

Dot Product

Given a vector, find the vector of unit length such that its dot product with the given vector is maximized.

Assume an arbitrary (non-zero) vector.

A

tldr: It’s the normalized vector itself.

The dot product of two vectors is maximized when the vectors point in the same direction.

Since we’re looking for a unit vector (a vector with length 1), we want to find a vector that has the same direction as the given vector but with a magnitude of 1.

Let the given vector be v. Here’s the process to find the unit vector u that maximizes the dot product with v:

Normalize:
Divide each component of v by its magnitude (length):
u = v / ||v||
where ||v|| is the magnitude of v, calculated as:
||v|| = sqrt(v₁² + v₂² + ... + vₙ²)

Magnitude: Dividing a vector by its magnitude scales it down (or up, if the magnitude is less than 1) to have a length of 1. This ensures that the resulting vector u is a unit vector.
Direction: Dividing a vector by a scalar (its magnitude) doesn’t change its direction. Therefore, u retains the same direction as v.

Key Points

The dot product of a vector with itself is the square of its magnitude.
The maximum value of the dot product of a vector with a unit vector is the magnitude of the vector.
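A minimal sketch, assuming NumPy, showing that the normalized vector attains the maximum and that the maximum equals ||v||:

```python
import numpy as np

v = np.array([3.0, 4.0])
u = v / np.linalg.norm(v)        # unit vector in the direction of v

print(np.linalg.norm(u))         # 1.0 (u is a unit vector)
print(np.dot(u, v))              # 5.0, which equals ||v||

# Any other unit vector yields a smaller dot product:
w = np.array([1.0, 0.0])
print(np.dot(w, v))              # 3.0 < 5.0
```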

3
Q

Outer product/Cross Product

Given two vectors, how do you calculate their outer (cross) product?

A

There are a few ways to calculate the cross product:

Determinant Method (Most common)

If a = (a₁, a₂, a₃) and b = (b₁, b₂, b₃), the cross product is:

a × b = |  i   j   k  |
        | a₁  a₂  a₃ |
        | b₁  b₂  b₃ |

where i, j, and k are the unit vectors along the x, y, and z axes respectively.

Expand this determinant to get the resulting vector’s components.

Algebraic Formula

a × b = (a₂b₃ - a₃b₂) i - (a₁b₃ - a₃b₁) j + (a₁b₂ - a₂b₁) k
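A minimal sketch, assuming NumPy, that applies this formula via np.cross and verifies the result is orthogonal to both inputs:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

c = np.cross(a, b)                    # [-3.,  6., -3.] from the algebraic formula above
print(np.dot(c, a), np.dot(c, b))     # 0.0 0.0 -> c is perpendicular to both a and b
```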

Geometric Interpretation
Perpendicular Vector:
The resulting vector from the cross product is perpendicular (orthogonal) to both of the original vectors A and B. This is useful in determining the normal vector to a plane defined by two vectors.

Computer Graphics:
* Surface Normals: Determining the direction a surface is facing.
* Lighting: Calculating how light reflects off of surfaces.

4
Q

Give an example of how the outer product can be useful in ML.

A

Feature Interaction in Polynomial Regression
In machine learning, the outer product can be used to capture interactions between features, which is crucial for polynomial regression and higher-order feature interactions.

Context
Suppose you are working on a regression problem where you suspect that the interaction between features plays a significant role in predicting the target variable. For example, in a housing price prediction model, the interaction between the number of bedrooms and the size of the house might be important.

  1. Get the Input Features: Collect the feature vector x for each example (e.g., number of bedrooms and house size).
  2. Compute the Outer Product: Take the outer product of the feature vector with itself, producing a matrix of all pairwise products xᵢxⱼ.
  3. Feature Engineering: Use the elements of this outer product matrix as new interaction features in your regression model (see the sketch below).
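A minimal sketch, assuming NumPy; the feature vector (bedrooms, size) is purely illustrative:

```python
import numpy as np

x = np.array([3.0, 120.0])            # hypothetical features: [bedrooms, size_m2]
interactions = np.outer(x, x)         # matrix of all pairwise products x_i * x_j

# Keep the upper triangle to avoid duplicating symmetric terms, then append
# these interaction terms to the original features.
iu = np.triu_indices(len(x))
augmented = np.concatenate([x, interactions[iu]])
print(augmented)                      # [3., 120., 9., 360., 14400.]
```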

Benefits
Capturing Non-linear Relationships: By including interaction terms and higher-order terms, the model can capture non-linear relationships between features that a simple linear model might miss.
Enhanced Predictive Power: Incorporating these interactions can lead to better model performance, especially when the target variable depends on the combined effect of multiple features.
General Applications in ML

Neural Networks (Bilinear Layers): In certain types of neural networks, bilinear layers use the outer product to model interactions between different sets of features or embeddings. This is common in tasks like relation extraction in natural language processing.

Tensor Factorization (Recommendation Systems): Tensor factorization techniques decompose higher-order interaction tensors into factor matrices, which can be used to predict missing entries in recommendation systems.

Covariance Matrices:
The outer product is used to compute covariance matrices in statistical analysis and principal component analysis (PCA). The covariance matrix provides insights into the linear relationships between features.

Important Considerations

While powerful, the outer product also has some limitations:

High Dimensionality: The resulting matrix can be very large, especially for high-dimensional embeddings, leading to computational challenges.

Sparsity: Many elements in the outer product matrix may be zero or near-zero, making storage and computation inefficient.

Interpretation: The meaning of individual elements in the outer product matrix can be difficult to interpret directly.

5
Q

What does it mean for two vectors to be linearly independent?

A

Two vectors are considered linearly independent if neither of them can be expressed as a scalar multiple of the other. In more formal terms:

Linear Independence

Vectors u and v are linearly independent if the only solution to the equation:

c₁u + c₂v = 0
is the trivial solution where c₁ = 0 and c₂ = 0.

Geometric Interpretation

Geometrically, two linearly independent vectors are not parallel (neither is a scalar multiple of the other). In 2D they span the whole plane; in 3D they span a plane rather than just a line.

Alternative Explanations

Spanning: Two linearly independent vectors span a plane in 3D space or the entire space in 2D space.
Determinant: For two 2D vectors, arrange them as the rows (or columns) of a 2×2 matrix; the determinant of that matrix is non-zero if and only if the vectors are linearly independent.
Unique Representation: Any vector in the space spanned by two linearly independent vectors can be uniquely represented as a linear combination of those two vectors.
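A minimal sketch, assuming NumPy, of the determinant/rank check described above:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
w = np.array([2.0, 4.0])              # w = 2u, so u and w are dependent

print(np.linalg.det(np.stack([u, v])))          # -2.0 (non-zero -> independent)
print(np.linalg.det(np.stack([u, w])))          #  0.0 (zero -> dependent)
print(np.linalg.matrix_rank(np.stack([u, v])))  #  2 (rank equals number of vectors)
```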

6
Q

Why is Linear Independence Important?

A

Basis of a Vector Space: A set of linearly independent vectors that span a vector space form a basis for that space. This allows us to represent any vector in the space as a unique linear combination of the basis vectors.

Solving Systems of Equations: Linear independence of vectors is crucial for determining the existence and uniqueness of solutions to systems of linear equations.

Dimension of a Vector Space: The number of linearly independent vectors in a basis determines the dimension of a vector space.

Machine Learning: Linear independence is used in various machine learning techniques, such as principal component analysis (PCA) for dimensionality reduction.

Feature Selection:
In machine learning, having linearly independent features is crucial for building effective models. If features are linearly dependent (i.e., one feature can be expressed as a linear combination of others), it can lead to multicollinearity, which can negatively impact the performance of models.

Multicollinearity: When features are highly correlated, it becomes difficult to determine the individual effect of each feature on the target variable. This can make the model coefficients unstable and reduce interpretability.

Dimensionality Reduction:
Techniques like Principal Component Analysis (PCA) rely on linear independence to identify the most important directions (principal components) in the data. These principal components are linearly independent and capture most of the variance in the dataset, enabling dimensionality reduction without significant loss of information.

Covariance Matrix
In many machine learning algorithms, especially in those involving statistical methods, the covariance matrix is used. For the covariance matrix to be invertible (a requirement in many algorithms like Linear Discriminant Analysis), the features must be linearly independent.

Neural Networks:
In neural networks, linear independence of the input features (or the learned representations in hidden layers) can help prevent issues like vanishing or exploding gradients, leading to more stable and efficient training.
Clustering:
In clustering algorithms, linear independence of the feature vectors representing data points can influence the formation of clusters. Linearly independent features can lead to more distinct and well-separated clusters.
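As a quick illustration (a minimal sketch assuming NumPy and a made-up feature matrix), a rank lower than the number of columns flags linearly dependent features:

```python
import numpy as np

# Three features for three samples; the second column is exactly 2x the first.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.0, 5.0]])

print(np.linalg.matrix_rank(X))   # 2 < 3 columns -> redundant (collinear) features
```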

7
Q

Given two sets of vectors A = {a₁, a₂, a₃, …, aₙ} and B = {b₁, b₂, b₃, …, bₘ}, how do you check that they share the same basis?

A

tldr.
If two sets of vectors share the same basis:

The sets span the same subspace.
The vectors within each set are linearly independent.
The sets contain the same number of vectors, equal to the dimension of the subspace they span.

Long Read:

Core Concept

Two sets of vectors share the same basis if and only if:

Spanning the Same Subspace: Both sets of vectors span the same subspace. This means every vector in one set can be expressed as a linear combination of vectors from the other set, and vice-versa.

Linear Independence: Both sets of vectors are linearly independent. This means no vector within a set can be expressed as a linear combination of the other vectors within the same set.
Methods to Check

Here are a few approaches to verify if two sets of vectors share a basis:

A) Checking Span and Linear Independence

Span: Try to express each vector from set A as a linear combination of vectors from set B. Then, do the reverse, expressing each vector from set B as a linear combination of vectors from set A. If successful in both directions, the sets span the same subspace.
Linear Independence: Check that each set is linearly independent. You can do this by forming a matrix with the vectors as rows (or columns) and computing its rank (or its determinant, if the matrix happens to be square). If the rank equals the number of vectors (equivalently, the determinant is non-zero), the vectors are linearly independent.

B) Row Reduction (Gaussian Elimination): Row reduce the matrix formed by each set of vectors. Comparing the resulting row spaces (for example, checking that stacking all vectors from both sets does not increase the rank) tells you whether the two sets span the same subspace.

C) Dimension of Subspaces

  • Find Dimensions: Determine the dimension of the subspace spanned by each set of vectors. You can do this by finding the rank of the matrix formed by the vectors.
  • Compare Dimensions: If the dimensions of the subspaces spanned by both sets are equal, and the number of vectors in each set equals the dimension, then the sets share a basis.
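A minimal sketch, assuming NumPy, combining these checks: the sets span the same subspace when stacking them does not increase the rank, and they share a basis when each set is additionally independent with size equal to that rank:

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],        # rows a1, a2
              [0.0, 1.0, 0.0]])
B = np.array([[1.0, 1.0, 0.0],        # rows b1, b2
              [1.0, -1.0, 0.0]])

rA = np.linalg.matrix_rank(A)
rB = np.linalg.matrix_rank(B)
r_both = np.linalg.matrix_rank(np.vstack([A, B]))

same_subspace = (rA == rB == r_both)
share_basis = same_subspace and rA == len(A) and rB == len(B)
print(same_subspace, share_basis)     # True True: both sets are bases of the x-y plane
```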

Important Considerations

  • Order: The order of the vectors in the sets doesn’t matter for determining if they share a basis.
  • Number of Vectors: If the number of vectors in the two sets is different, they cannot share the same basis.
8
Q

Given n vectors, each of d dimensions, what is the dimension of their span?

A

The dimension of the span of n vectors, each of d dimensions, is at most the minimum of n and d.

It can be less than this minimum if the vectors are not linearly independent.

Here’s why:

Upper Bound: The dimension of the span cannot exceed the number of vectors (n) because each vector contributes at most one dimension to the span. Similarly, the dimension cannot exceed the dimension of the individual vectors (d) since they live within a d-dimensional space.

Linear Independence: If all n vectors are linearly independent, then the dimension of their span is exactly n (which can only happen when n ≤ d). This is because linearly independent vectors form a basis for their span.

Linear Dependence: If some vectors are linearly dependent on others (i.e., one or more vectors can be expressed as linear combinations of the others), then the dimension of the span will be less than the minimum of n and d. The number of linearly independent vectors in the set determines the actual dimension of the span.

The dimension of the span of a set of vectors is equal to the number of linearly independent vectors within that set.

Longer Read:
Span: The span of a set of vectors is the set of all possible linear combinations of those vectors. It represents the subspace that these vectors “fill up.”

Dimension: The dimension of a vector space is the number of vectors in a basis for that space. A basis is a set of linearly independent vectors that span the entire space.
Steps to Determine the Dimension of the Span
List the Vectors: Identify all the vectors in the set.

Form a Matrix: Arrange the vectors as columns (or rows) in a matrix.

Determine Linear Independence:
Perform Gaussian elimination to row reduce the matrix to its row echelon form (REF) or reduced row echelon form (RREF).
Count the number of pivot columns (columns with leading 1s in RREF) or non-zero rows in REF.

If the matrix is square, you can also compute its determinant: a determinant of 0 means the vectors are linearly dependent.

Count the Independent Vectors:
The number of pivot columns (or non-zero rows) gives the number of linearly independent vectors.

Alternate:

How to Determine the Dimension of the Span

Form a Matrix: Create a matrix where each row is one of the n vectors. This matrix will have dimensions n × d.

Row Reduce: Perform row operations (Gaussian elimination) on the matrix to bring it into row echelon form or reduced row echelon form.

Count Non-Zero Rows (Rank): The number of non-zero rows in the row-reduced matrix is the rank of the matrix, and it represents the dimension of the span of the original n vectors.
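A minimal sketch, assuming NumPy, where the rank of the stacked vectors gives the dimension of their span directly:

```python
import numpy as np

vectors = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, 1.0],
                    [1.0, 1.0, 3.0]])   # third row = first + second (dependent)

print(np.linalg.matrix_rank(vectors))   # 2 -> the span is a plane, not all of R^3
```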

9
Q

What is Span of a set of vectors?

A

Span: The span of a set of vectors is the set of all possible linear combinations of those vectors. It represents the subspace that these vectors “fill up.”

10
Q

What is basis of vector space?

A

The basis of a vector space is a set of vectors that satisfies two important properties:

Spanning: The basis vectors span the entire vector space. This means that any vector in the space can be expressed as a linear combination of the basis vectors.
Linear Independence: The basis vectors are linearly independent. This means that none of the basis vectors can be expressed as a linear combination of the other basis vectors.
In simpler terms, a basis is a minimal set of vectors that can “build” any other vector in the space through scaling and adding.

11
Q

What’s a norm? What are the L0, L1, L2, Lp, and L∞ norms?

A

In mathematics, a norm is a function that assigns a non-negative length or size to a vector.

It generalizes the concept of absolute value for numbers to vectors in vector spaces. A norm must satisfy the following properties:

Non-negativity: The norm of a vector is always greater than or equal to zero, and it is zero only if the vector is the zero vector.
Scalar Multiplication: The norm of a scalar multiple of a vector is equal to the absolute value of the scalar times the norm of the vector.
Triangle Inequality: The norm of the sum of two vectors is less than or equal to the sum of their norms.

Summary of Norms
- L0 norm: Counts the number of non-zero entries in the vector.
- L1 norm: Sum of the absolute values of the vector components.
- L2 norm: Square root of the sum of the squares of the vector components (Euclidean distance).
- Lp norm: Generalization of the L1 and L2 norms; the p-th root of the sum of the p-th powers of the absolute values of the vector components.
- L∞ norm: Maximum absolute value among the components of the vector.

Applications in Machine Learning
- Regularization: Norms are used in regularization techniques to prevent overfitting. For example, Lasso regression uses the L1 norm, and Ridge regression uses the L2 norm.
- Optimization: Norms are often used in the cost functions that need to be minimized in optimization problems.
- Sparsity: The L0 and L1 norms are particularly useful in promoting sparsity in models, which can lead to more interpretable models and reduced computational costs.

Understanding these norms and their properties is fundamental in many areas of machine learning and optimization.

L0 Norm (Not a True Norm):

Definition: The L0 “norm” of a vector is the number of non-zero elements in the vector.
Technically, it’s not a true norm because it doesn’t satisfy the scalar multiplication property.
Use: Often used as a measure of sparsity in machine learning and signal processing.

L1 Norm (Manhattan Distance, Lasso):

Definition: The sum of the absolute values of the vector’s components.
Formula: ||x||₁ = |x₁| + |x₂| + … + |xₙ|
Use: Common in machine learning for regularization (Lasso), feature selection, and robust regression.

L2 Norm (Euclidean Distance, Ridge):

Definition: The square root of the sum of the squares of the vector’s components.
Formula: ||x||₂ = √(x₁² + x₂² + … + xₙ²)
Use: Widely used to measure distances between vectors, error calculations, and regularization (Ridge regression).

L-infinity Norm (Maximum Norm):

Definition: The maximum of the absolute values of the vector’s components.
Formula: ||x||∞ = max(|x₁|, |x₂|, …, |xₙ|)
Use: Can be used in applications where the largest magnitude component is of interest, like in some optimization problems.
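A minimal sketch, assuming NumPy, computing each of these norms for one vector:

```python
import numpy as np

x = np.array([0.0, -3.0, 4.0])

l0   = np.count_nonzero(x)              # 2   (L0 "norm": number of non-zero entries)
l1   = np.linalg.norm(x, ord=1)         # 7.0 (sum of absolute values)
l2   = np.linalg.norm(x, ord=2)         # 5.0 (Euclidean length)
linf = np.linalg.norm(x, ord=np.inf)    # 4.0 (largest absolute component)

print(l0, l1, l2, linf)
```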

Why Are Norms Useful in Machine Learning?

Norms are used extensively in machine learning for various purposes:

Regularization: Adding a penalty term based on the L1 or L2 norm of the model’s weights can help prevent overfitting and improve generalization.
Loss Functions: Norms are used in defining loss functions (e.g., mean squared error is based on the L2 norm) to quantify the difference between predicted and actual values.
Feature Scaling: Normalizing features using norms can improve the performance and convergence of some machine learning algorithms.
Distance Metrics: Norms provide a way to measure distances or similarities between data points, which is essential for clustering, classification, and recommendation systems.

12
Q

How do norm and metric differ? Given a norm, make a metric. Given a metric, can we make a norm?

A

Norms vs. Metrics: Key Differences

|            | Norm                                                     | Metric                                                                     |
|------------|----------------------------------------------------------|----------------------------------------------------------------------------|
| Measures   | Length/magnitude of a single vector                      | Distance between two points/elements                                      |
| Notation   | ||v|| (double bars)                                      | d(x, y)                                                                    |
| Properties | Non-negativity, scalar homogeneity, triangle inequality | Non-negativity, identity of indiscernibles, symmetry, triangle inequality |

From Norm to Metric:

Given a norm ||.|| defined on a vector space V, you can easily construct a metric d on V as follows:

d(x, y) = ||x - y|| 

This metric measures the distance between two vectors x and y as the length of their difference vector. It intuitively satisfies the metric properties.

Example:

Consider the Euclidean norm (L2 norm) in 2D space:

||(x1, y1)||₂ = √(x1² + y1²)

The corresponding Euclidean metric would be:

d((x1, y1), (x2, y2)) = ||(x1, y1) - (x2, y2)||₂ = √((x1 - x2)² + (y1 - y2)²)

This is the familiar distance formula we use to calculate distances between points in the plane.
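A minimal sketch, assuming NumPy, of the norm-induced metric d(x, y) = ||x − y||:

```python
import numpy as np

def metric_from_norm(x, y, ord=2):
    """Distance induced by a norm: d(x, y) = ||x - y||."""
    return np.linalg.norm(np.asarray(x) - np.asarray(y), ord=ord)

print(metric_from_norm([1.0, 1.0], [4.0, 5.0]))          # 5.0 (Euclidean / L2 distance)
print(metric_from_norm([1.0, 1.0], [4.0, 5.0], ord=1))   # 7.0 (Manhattan / L1 distance)
```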

From Metric to Norm (Not Always Possible):

Given a metric d on a set X, it’s not always possible to construct a norm. A norm requires the underlying set to be a vector space, and the metric must satisfy additional properties:

  • Translation Invariance: d(x + z, y + z) = d(x, y) for all x, y, z in V.
  • Homogeneity: d(αx, αy) = |α|d(x, y) for all x, y in V and scalar α.

If these conditions hold, then you can define a norm using the metric:

||x|| = d(x, 0)

where 0 is the zero vector of the vector space.

Example:

The discrete metric, where the distance between any two distinct points is 1, does not satisfy the homogeneity property (scaling two distinct points by a factor α leaves their distance at 1 rather than scaling it by |α|). Therefore, you cannot derive a norm from the discrete metric.

In Summary:

  • You can always create a metric from a norm, but not vice versa.
  • To go from a metric to a norm, the underlying set must be a vector space, and the metric must satisfy additional properties.