9 Dimensionality Reduction Flashcards

(18 cards)

1
Q

What does PCA stand for?

A

Principal Component Analysis

2
Q

What does PCA aim to do with data?

A

Reduce the dimensionality of the data while preserving as much of its variance as possible

3
Q

What does PCA aim to maximise when choosing principal components?

A

The variance of the projected data; equivalently, it minimises the squared reconstruction error ‖X − XVVᵀ‖² under VᵀV = I.

4
Q

Write the formula for projecting data X onto the first q principal components.

A

Z = XV, where the columns of V are the top q eigenvectors of XᵀX for centred X (equivalently, the top q right‑singular vectors of X).
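
As a concrete illustration, here is a minimal NumPy sketch of this projection on synthetic data; the names X, V, Z and q simply mirror the formula above, and the last line evaluates the reconstruction error from card 3.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 observations, 5 features (synthetic)
Xc = X - X.mean(axis=0)                  # centre the columns first (see card 5)

q = 2                                    # number of principal components to keep
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt[:q].T                             # top-q right-singular vectors as columns

Z = Xc @ V                               # projected data, shape (100, 2)
recon_error = np.linalg.norm(Xc - Z @ V.T) ** 2   # the quantity minimised in card 3
```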

5
Q

Why must data usually be centred before applying PCA?

A

Without centring, the first PC tends to align with the mean vector, capturing the total sum of squares rather than the directions of maximum variance.
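
A small NumPy sketch of this effect on made-up data: the spread is along the first axis, but the offset mean points along the second, so the uncentred leading singular vector chases the mean.

```python
import numpy as np

rng = np.random.default_rng(1)
# Spread is along axis 0, but the mean points along axis 1.
X = rng.normal(scale=[3.0, 0.3], size=(500, 2)) + np.array([0.0, 50.0])

# First right-singular vector without centring: dominated by the mean direction.
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)
print("uncentred PC1:", Vt_raw[0])     # roughly [0, 1]

# After centring, PC1 recovers the direction of maximum variance.
Xc = X - X.mean(axis=0)
_, _, Vt_c = np.linalg.svd(Xc, full_matrices=False)
print("centred PC1:  ", Vt_c[0])       # roughly [1, 0] (up to sign)
```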

6
Q

When is centring typically not applied in dimensionality reduction?

A

When methods require non‑negativity (e.g. NMF) or when raw counts would lose meaning if shifted below zero.

7
Q

In sparse PCA, what does imposing an L1 penalty on Z achieve?

A

Each observation loads on only a few components, yielding a sparse latent representation useful for compression or clustering.

8
Q

In sparse PCA, what does imposing an L1 penalty on V achieve?

A

Each principal component depends on a small subset of original features, improving interpretability and acting like feature selection.
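
One way to see the two penalties from this card and the previous one, assuming scikit-learn on synthetic data: SparsePCA places the L1 penalty on the components (V), while DictionaryLearning places it on the codes (Z). This is only a sketch; the exact fraction of zeros depends on alpha.

```python
import numpy as np
from sklearn.decomposition import SparsePCA, DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
X -= X.mean(axis=0)

# L1 penalty on V: each component uses only a few of the 30 features.
spca = SparsePCA(n_components=5, alpha=2.0, random_state=0)
spca.fit(X)
print("fraction of zeros in loadings V:", np.mean(spca.components_ == 0))

# L1 penalty on Z: each observation is encoded by only a few components.
dl = DictionaryLearning(n_components=5, alpha=2.0,
                        transform_algorithm="lasso_lars", transform_alpha=2.0,
                        random_state=0)
Z = dl.fit_transform(X)
print("fraction of zeros in codes Z:   ", np.mean(Z == 0))
```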

9
Q

What information does a scree plot convey?

A

Ordered eigenvalues (variance explained) so one can pick an ‘elbow’ where additional PCs add little extra variance.
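
A quick sketch of a scree plot with NumPy and matplotlib on synthetic correlated data; the variance-explained values are just squared singular values normalised to sum to one.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))  # correlated features
Xc = X - X.mean(axis=0)

_, S, _ = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)        # proportion of variance explained by each PC

plt.plot(np.arange(1, len(explained) + 1), explained, "o-")
plt.xlabel("principal component")
plt.ylabel("proportion of variance explained")
plt.title("Scree plot: look for the elbow")
plt.show()
```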

10
Q

State two reasons NMF is popular for topic modelling.

A

1) Non‑negativity yields parts‑based, interpretable factors; 2) minimising ‖X − ZVᵀ‖² with Z, V ≥ 0 gives a soft clustering of documents and terms as a by‑product.
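
A compact topic-modelling sketch assuming scikit-learn and a made-up four-document corpus; scikit-learn factorises X ≈ WH, so W plays the role of Z and H the role of Vᵀ in this card.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)                      # non-negative document-term matrix

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)                         # documents x topics  (Z in the card)
H = nmf.components_                              # topics x terms      (V^T in the card)

terms = vec.get_feature_names_out()
for k, topic in enumerate(H):
    top = topic.argsort()[::-1][:3]              # three highest-weight terms per topic
    print(f"topic {k}:", [terms[i] for i in top])
```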

11
Q

Key difference between NMF and PCA loadings.

A

NMF loadings are non‑negative and often sparse, while PCA loadings can be positive, negative, and dense.

12
Q

Define ‘perplexity’ in t‑SNE.

A

A user‑set knob (≈ number of effective neighbours) that controls σᵢ bandwidths and thus the size of local neighbourhoods preserved.

13
Q

What loss does t‑SNE minimise?

A

The Kullback–Leibler divergence between high‑dimensional pairwise similarities pᵢⱼ and low‑dimensional qᵢⱼ.
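
A short sketch tying this card and the previous one together, assuming scikit-learn's TSNE on synthetic blobs: perplexity is the user-set knob passed to the constructor, and the optimised KL divergence is exposed after fitting.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=300, n_features=20, centers=4, random_state=0)

# Perplexity ~ effective number of neighbours; must be smaller than n_samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
Y = tsne.fit_transform(X)              # 2-D embedding

# The optimised objective: KL(P || Q) between pairwise similarities.
print("final KL divergence:", tsne.kl_divergence_)
```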

14
Q

List two limitations of t‑SNE.

A

1) Poor global distance preservation; 2) High computational cost and strong dependence on perplexity.

15
Q

What are UMAP’s two most important hyperparameters and their roles?

A

n_neighbors sets neighbourhood size (local vs global balance); min_dist controls how closely points can pack in the embedding (cluster tightness).
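
A minimal sketch assuming the third-party umap-learn package and scikit-learn's digits dataset; the two hyperparameters in this card map directly onto constructor arguments.

```python
import umap                                    # pip install umap-learn (assumed)
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)

reducer = umap.UMAP(
    n_neighbors=15,   # larger -> more global structure, smaller -> more local detail
    min_dist=0.1,     # smaller -> tighter, more densely packed clusters
    random_state=42,
)
embedding = reducer.fit_transform(X)           # shape (n_samples, 2)
```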

16
Q

Name one theoretical and one practical advantage of UMAP over t‑SNE.

A

Theory: optimises a cross‑entropy based on manifold topology; Practice: faster, scales to larger data and preserves more global structure.

17
Q

True/False: UMAP and t‑SNE exactly preserve Euclidean distances between distant clusters.

A

False — both focus on topology/relative similarity; exact global distances are not guaranteed.

18
Q

How does PCA relate to the SVD of X?

A

If the centred X = USVᵀ, the columns of V (the right‑singular vectors) are the principal component directions, and the squared singular values equal the eigenvalues of XᵀX.
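
A NumPy check of this identity on centred synthetic data; eigenvectors are compared up to sign, since both decompositions fix each direction only up to ±1.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)                           # centred, as PCA assumes

U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)    # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print(np.allclose(S**2, eigvals))                     # squared singular values = eigenvalues
print(np.allclose(np.abs(Vt.T), np.abs(eigvecs)))     # same directions, up to sign
```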