PCA Flashcards

(30 cards)

1
Q

What is the primary purpose of PCA?

A

To reduce dimensionality while preserving as much variance as possible.

2
Q

What does PCA transform data into?

A

A new coordinate system aligned with directions of maximum variance.

3
Q

What is the name of the directions found by PCA?

A

Principal Components.

4
Q

Why do we use dimensionality reduction?

A

To compress data, remove redundancy, visualize, and denoise.

5
Q

What shape does the covariance matrix describe?

A

The geometric shape of the data cloud in feature space.

6
Q

What does the diagonal of a covariance matrix represent?

A

The variance of each individual feature.

7
Q

What do off-diagonal entries in a covariance matrix represent?

A

The covariance between pairs of features.

8
Q

What does it mean for data to be ‘white noise’?

A

It is uncorrelated, has zero mean, and unit variance.

9
Q

What is the goal of whitening?

A

To transform data so its covariance matrix becomes the identity matrix.

10
Q

What is the formula for the multivariate Gaussian distribution?

A

P(x) = (1 / sqrt((2π)^D |Σ|)) * exp(-0.5 * (x - μ)^T Σ⁻¹ (x - μ))
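A quick way to sanity-check this formula is to evaluate it directly and compare against SciPy's built-in density; the mean μ, covariance Σ, and query point x below are arbitrary example values, and D is the data dimension.

```python
# Sketch: numerically checking the multivariate Gaussian density formula
# against scipy.stats.multivariate_normal. mu, Sigma and x are made-up values.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])

D = len(mu)
diff = x - mu
# P(x) = (1 / sqrt((2*pi)^D * |Sigma|)) * exp(-0.5 * diff^T Sigma^-1 diff)
norm_const = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
p_manual = norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

p_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(p_manual, p_scipy)  # the two values should agree
```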

11
Q

What does an eigenvector of Σ represent in PCA?

A

A principal direction of variance in the data.

12
Q

What does the corresponding eigenvalue represent?

A

The amount of variance captured in that principal direction.

13
Q

What is the first step of PCA?

A

Center the data by subtracting the mean.

14
Q

How is the covariance matrix computed after centering?

A

Σ = (1 / (n - 1)) * BᵀB

15
Q

What is B in the PCA algorithm?

A

The mean-centered data matrix (X - mean).

16
Q

What does projecting data onto eigenvectors achieve?

A

Transforms data into a decorrelated space with ranked variance.
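The steps in cards 13 through 16 can be sketched in a few lines of NumPy; the data matrix X below is made up, with samples in rows and features in columns.

```python
# Sketch of the PCA pipeline from cards 13-16: center, compute the covariance,
# eigendecompose, and project. X is an arbitrary example data matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # n = 100 samples, d = 3 features

B = X - X.mean(axis=0)                     # step 1: mean-center the data
Sigma = (B.T @ B) / (len(X) - 1)           # step 2: covariance, (1/(n-1)) * B^T B

eigvals, eigvecs = np.linalg.eigh(Sigma)   # step 3: eigenvectors/eigenvalues of Sigma
order = np.argsort(eigvals)[::-1]          # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = B @ eigvecs                            # step 4: project onto the eigenvectors

# The projected data is decorrelated: its covariance is (approximately) diagonal,
# with the eigenvalues on the diagonal.
print(np.round(np.cov(Z, rowvar=False), 6))
```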

17
Q

What does whitening do in PCA?

A

Removes correlations and scales components to unit variance.

18
Q

What matrix operation is used to whiten data?

A

Multiply the projected data by D^{-1/2}, where D is the diagonal matrix of eigenvalues.
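A minimal sketch of that operation, assuming the same eigendecomposition-based setup as the earlier snippet (the data X is again made up):

```python
# Sketch of PCA whitening: project onto the eigenvectors, then scale each
# component by 1/sqrt(eigenvalue), i.e. multiply by D^{-1/2}.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 0.5]])
X = rng.normal(size=(100, 3)) @ A          # correlated example data

B = X - X.mean(axis=0)
Sigma = (B.T @ B) / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(Sigma)

W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals))  # whitening matrix V D^{-1/2}
Z_white = B @ W

# After whitening, the covariance is (approximately) the identity matrix.
print(np.round(np.cov(Z_white, rowvar=False), 6))
```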

19
Q

What do principal component ‘loadings’ mean?

A

They are eigenvectors scaled by the square roots of their eigenvalues (i.e., by each component's standard deviation).

20
Q

What does PCA seek to maximize when choosing projection directions?

A

The variance of the projected data.

21
Q

What is the rank of the PCA-transformed dataset if we keep k components?

A

At most k; it equals k whenever the data has at least k nonzero eigenvalues.

22
Q

What shape is the projection matrix if we reduce to k dimensions?

A

V_k ∈ ℝ^{d × k}, where d is the original dimension.

23
Q

What is an advantage of using PCA before a classifier?

A

It can reduce noise and remove multicollinearity.

24
Q

What happens if you remove PCs with low variance in images?

A

You may remove noise while preserving structure.

25

Q

Why does PCA work well for image compression?

A

Most variance (information) is captured in a few components.
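As a rough illustration, fitting PCA on scikit-learn's small 8x8 digit images shows that a handful of components already captures most of the variance; the choice of 16 components here is arbitrary.

```python
# Sketch: PCA compression of small images (sklearn's 8x8 digits), keeping
# an arbitrary 16 of the 64 pixel dimensions.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                       # shape (1797, 64); each row is an 8x8 image
pca = PCA(n_components=16).fit(X)

print(pca.explained_variance_ratio_.sum())   # fraction of variance kept by 16 components
X_compressed = pca.transform(X)              # (1797, 16) compressed representation
X_restored = pca.inverse_transform(X_compressed)  # approximate (1797, 64) reconstruction
```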
26

Q

What is a geometric interpretation of PCA?

A

Finding the plane or hyperplane that best fits the data.
27

Q

Why is PCA an unsupervised method?

A

Because it does not use class labels or outputs, only feature structure.
28

Q

What is the role of SVD in PCA?

A

SVD can be used to compute PCA efficiently and stably.
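A brief sketch of that route, using made-up data: the right singular vectors of the mean-centered matrix are the principal directions, and the squared singular values divided by (n - 1) are the eigenvalues of the covariance matrix.

```python
# Sketch: computing PCA via the SVD of the centered data matrix B = U S V^T.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))             # arbitrary example data
B = X - X.mean(axis=0)                    # mean-centered data matrix

U, s, Vt = np.linalg.svd(B, full_matrices=False)

components = Vt                           # rows are the principal directions
eigvals = s**2 / (len(X) - 1)             # variances along each principal direction
Z = B @ Vt.T                              # projected (decorrelated) data

# These eigenvalues match those of the covariance matrix (1/(n-1)) * B^T B.
print(np.round(eigvals, 4))
```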
29

Q

What does PCA assume about feature relationships?

A

That directions with high variance are the most informative.
30

Q

What happens when you project data onto the top k principal components?

A

You get a compressed version with minimal information loss.