9 Dimensionality Reduction Flashcards
(18 cards)
What does PCA stand for?
Principal Component Analysis
What does PCA aim to do with data?
Reduce the dimensionality of the data while preserving as much of its variance as possible
What does PCA aim to maximise when choosing principal components?
The variance of the projected data; equivalently, it minimises the squared reconstruction error ‖X − XVVᵀ‖²_F under VᵀV = I.
Write the formula for projecting data X onto the first q principal components.
Z = XV, where V contains the top q eigenvectors of XᵀX (equivalently, the top q right‑singular vectors of X), with X centred.
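A minimal NumPy sketch of this projection; the data shape and q = 2 are illustrative, not part of the card.

```python
import numpy as np

# Project centred data onto its top-q principal directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)            # centre the data first

q = 2
# Right-singular vectors of X = eigenvectors of XᵀX.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:q].T                      # top-q directions, shape (5, q)

Z = X @ V                         # projected data, shape (100, q)
```

The columns of V are orthonormal, and the first projected coordinate carries at least as much variance as the second.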
Why must data usually be centred before applying PCA?
Without centring, the first PC may align with the mean vector and capture total sums‑of‑squares instead of true variance directions.
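A small numerical demonstration of this failure mode; the mean vector and noise scale are made up for the example.

```python
import numpy as np

# With a large mean offset and small noise, the first singular vector
# of the *uncentred* data simply points at the mean, not at a
# direction of high variance.
rng = np.random.default_rng(1)
mean = np.array([10.0, 0.0, 0.0])
X = mean + 0.1 * rng.normal(size=(200, 3))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]                                   # unit-norm first direction
cos = abs(pc1 @ mean) / np.linalg.norm(mean)  # |cosine| with mean direction
# cos ≈ 1: the "component" recovers the mean, not a variance direction.
```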
When is centring typically not applied in dimensionality reduction?
When methods require non‑negativity (e.g. NMF) or when raw counts would lose meaning if shifted below zero.
In sparse PCA, what does imposing an L1 penalty on Z achieve?
Each observation loads on only a few components, yielding a sparse latent representation useful for compression or clustering.
In sparse PCA, what does imposing an L1 penalty on V achieve?
Each principal component depends on a small subset of original features, improving interpretability and acting like feature selection.
What information does a scree plot convey?
Ordered eigenvalues (variance explained) so one can pick an ‘elbow’ where additional PCs add little extra variance.
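A sketch of the quantities a scree plot charts, on synthetic data built to have an obvious elbow (the per-direction scales are arbitrary).

```python
import numpy as np

# Two strong directions plus weak noise in eight more.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10)) * np.array([5, 3] + [0.5] * 8)
X = X - X.mean(axis=0)

# Eigenvalues of the covariance, in decreasing order, as fractions
# of the total variance -- the values a scree plot displays.
eigvals = np.linalg.svd(X, compute_uv=False) ** 2 / (len(X) - 1)
explained = eigvals / eigvals.sum()
# Plotting `explained` would show an elbow after PC 2.
```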
State two reasons NMF is popular for topic modelling.
1) Non‑negativity yields parts‑based, interpretable factors; 2) It automatically clusters documents/terms while minimising ‖X − ZVᵀ‖² with Z,V ≥ 0.
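A minimal NMF sketch via the classic Lee–Seung multiplicative updates, minimising ‖X − ZVᵀ‖² with Z, V ≥ 0 in the card's notation; the matrix sizes, rank k, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((50, 20))              # non-negative "document-term" matrix

k, eps = 4, 1e-9
Z = rng.random((50, k))               # document loadings
V = rng.random((20, k))               # term loadings
err0 = np.linalg.norm(X - Z @ V.T)    # error at initialisation

# Multiplicative updates keep Z and V non-negative by construction
# and monotonically decrease the squared reconstruction error.
for _ in range(200):
    Z *= (X @ V) / (Z @ (V.T @ V) + eps)
    V *= (X.T @ Z) / (V @ (Z.T @ Z) + eps)

err = np.linalg.norm(X - Z @ V.T)
```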
Key difference between NMF and PCA loadings.
NMF loadings are non‑negative and often sparse, while PCA loadings can be positive, negative, and dense.
Define ‘perplexity’ in t‑SNE.
A user‑set knob (≈ number of effective neighbours) that controls σᵢ bandwidths and thus the size of local neighbourhoods preserved.
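A sketch of how perplexity measures "effective neighbours": it is 2 raised to the entropy of a point's conditional neighbour distribution, so a uniform distribution over m neighbours has perplexity exactly m. The helper names are made up for the example.

```python
import numpy as np

def perplexity(p, eps=1e-12):
    # Perplexity = 2^H(p), with H the Shannon entropy in bits.
    H = -np.sum(p * np.log2(p + eps))
    return 2.0 ** H

def conditional_p(d2, sigma):
    # Gaussian similarities p_{j|i} from squared distances to point i;
    # sigma is the bandwidth t-SNE tunes to hit the target perplexity.
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()
```

Larger σᵢ flattens the distribution, raising the perplexity, which is why the user-set perplexity fixes the neighbourhood size.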
What loss does t‑SNE minimise?
The Kullback–Leibler divergence between high‑dimensional pairwise similarities pᵢⱼ and low‑dimensional qᵢⱼ.
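A sketch of the objective only, not the full optimiser: the KL divergence between pairwise similarity matrices P and Q, with Q built from the Student-t kernel t-SNE uses in the embedding space. Both matrices are assumed to be valid distributions over pairs i ≠ j.

```python
import numpy as np

def tsne_kl(P, Q, eps=1e-12):
    mask = ~np.eye(len(P), dtype=bool)       # exclude i = j terms
    p, q = P[mask], Q[mask]
    return np.sum(p * np.log((p + eps) / (q + eps)))

def q_matrix(Y):
    # Student-t similarities q_ij from embedding coordinates Y.
    d2 = ((Y[:, None] - Y[None, :]) ** 2).sum(-1)
    w = 1.0 / (1.0 + d2)
    np.fill_diagonal(w, 0.0)
    return w / w.sum()
```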
List two limitations of t‑SNE.
1) Poor global distance preservation; 2) High computational cost and strong dependence on perplexity.
What are UMAP’s two most important hyperparameters and their roles?
n_neighbors sets neighbourhood size (local vs global balance); min_dist controls how closely points can pack in the embedding (cluster tightness).
Name one theoretical and one practical advantage of UMAP over t‑SNE.
Theory: optimises a cross‑entropy based on manifold topology; Practice: faster, scales to larger data and preserves more global structure.
True/False: UMAP and t‑SNE exactly preserve Euclidean distances between distant clusters.
False — both focus on topology/relative similarity; exact global distances are not guaranteed.
How does PCA relate to the SVD of X?
If X = USVᵀ, the right‑singular vectors V are the principal component directions and singular values squared equal eigenvalues of XᵀX.
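A numerical check of this card on an arbitrary centred matrix: the squared singular values match the eigenvalues of XᵀX, and the SVD factors reconstruct X.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 6))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# eigh returns eigenvalues in ascending order; svd returns singular
# values in descending order, so sort before comparing.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
```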