9 Dimensionality Reduction Flashcards
(18 cards)
What does PCA stand for?
Principal Component Analysis
What does PCA aim to do with data?
Reduce the dimensionality of the data while preserving as much of its variance as possible
What does PCA aim to maximise when choosing principal components?
The variance of the projected data; equivalently, it minimises the squared reconstruction error ‖X − XVVᵀ‖²_F under VᵀV = I.
Write the formula for projecting data X onto the first q principal components.
Z = XV, where V contains the top q eigenvectors of XᵀX (equivalently, the top q right‑singular vectors of X), with X centred.
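A minimal NumPy sketch of this projection; the data shape and q = 2 are illustrative, not part of the card.

```python
import numpy as np

# Project centred data onto its top-q principal directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)            # centre the data first

q = 2
# Right-singular vectors of X = eigenvectors of XᵀX.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:q].T                      # top-q directions, shape (5, q)

Z = X @ V                         # projected data, shape (100, q)
```

The columns of V are orthonormal, and the first projected coordinate carries at least as much variance as the second.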
Why must data usually be centred before applying PCA?
Without centring, the first PC may align with the mean vector and capture total sums‑of‑squares instead of true variance directions.
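A small numerical demonstration of this failure mode; the mean vector and noise scale are made up for the example.

```python
import numpy as np

# With a large mean offset and small noise, the first singular vector
# of the *uncentred* data simply points at the mean, not at a
# direction of high variance.
rng = np.random.default_rng(1)
mean = np.array([10.0, 0.0, 0.0])
X = mean + 0.1 * rng.normal(size=(200, 3))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]                                   # unit-norm first direction
cos = abs(pc1 @ mean) / np.linalg.norm(mean)  # |cosine| with mean direction
# cos ≈ 1: the "component" recovers the mean, not a variance direction.
```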
When is centring typically not applied in dimensionality reduction?
When methods require non‑negativity (e.g. NMF) or when raw counts would lose meaning if shifted below zero.
In sparse PCA, what does imposing an L1 penalty on Z achieve?
Each observation loads on only a few components, yielding a sparse latent representation useful for compression or clustering.
In sparse PCA, what does imposing an L1 penalty on V achieve?
Each principal component depends on a small subset of original features, improving interpretability and acting like feature selection.
What information does a scree plot convey?
Ordered eigenvalues (variance explained) so one can pick an ‘elbow’ where additional PCs add little extra variance.
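A sketch of the quantities a scree plot charts, on synthetic data built to have an obvious elbow (the per-direction scales are arbitrary).

```python
import numpy as np

# Two strong directions plus weak noise in eight more.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10)) * np.array([5, 3] + [0.5] * 8)
X = X - X.mean(axis=0)

# Eigenvalues of the covariance, in decreasing order, as fractions
# of the total variance -- the values a scree plot displays.
eigvals = np.linalg.svd(X, compute_uv=False) ** 2 / (len(X) - 1)
explained = eigvals / eigvals.sum()
# Plotting `explained` would show an elbow after PC 2.
```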
State two reasons NMF is popular for topic modelling.
1) Non‑negativity yields parts‑based, interpretable factors; 2) It automatically clusters documents/terms while minimising ‖X − ZVᵀ‖² with Z,V ≥ 0.
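A minimal NMF sketch via the classic Lee–Seung multiplicative updates, minimising ‖X − ZVᵀ‖² with Z, V ≥ 0 in the card's notation; the matrix sizes, rank k, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((50, 20))              # non-negative "document-term" matrix

k, eps = 4, 1e-9
Z = rng.random((50, k))               # document loadings
V = rng.random((20, k))               # term loadings
err0 = np.linalg.norm(X - Z @ V.T)    # error at initialisation

# Multiplicative updates keep Z and V non-negative by construction
# and monotonically decrease the squared reconstruction error.
for _ in range(200):
    Z *= (X @ V) / (Z @ (V.T @ V) + eps)
    V *= (X.T @ Z) / (V @ (Z.T @ Z) + eps)

err = np.linalg.norm(X - Z @ V.T)
```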
Key difference between NMF and PCA loadings.
NMF loadings are non‑negative and often sparse, while PCA loadings can be positive, negative, and dense.
Define ‘perplexity’ in t‑SNE.
A user‑set knob (≈ number of effective neighbours) that controls σᵢ bandwidths and thus the size of local neighbourhoods preserved.
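A sketch of how perplexity measures "effective neighbours": it is 2 raised to the entropy of a point's conditional neighbour distribution, so a uniform distribution over m neighbours has perplexity exactly m. The helper names are made up for the example.

```python
import numpy as np

def perplexity(p, eps=1e-12):
    # Perplexity = 2^H(p), with H the Shannon entropy in bits.
    H = -np.sum(p * np.log2(p + eps))
    return 2.0 ** H

def conditional_p(d2, sigma):
    # Gaussian similarities p_{j|i} from squared distances to point i;
    # sigma is the bandwidth t-SNE tunes to hit the target perplexity.
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()
```

Larger σᵢ flattens the distribution, raising the perplexity, which is why the user-set perplexity fixes the neighbourhood size.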
What loss does t‑SNE minimise?
The Kullback–Leibler divergence between high‑dimensional pairwise similarities pᵢⱼ and low‑dimensional qᵢⱼ.
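A sketch of the objective only, not the full optimiser: the KL divergence between pairwise similarity matrices P and Q, with Q built from the Student-t kernel t-SNE uses in the embedding space. Both matrices are assumed to be valid distributions over pairs i ≠ j.

```python
import numpy as np

def tsne_kl(P, Q, eps=1e-12):
    mask = ~np.eye(len(P), dtype=bool)       # exclude i = j terms
    p, q = P[mask], Q[mask]
    return np.sum(p * np.log((p + eps) / (q + eps)))

def q_matrix(Y):
    # Student-t similarities q_ij from embedding coordinates Y.
    d2 = ((Y[:, None] - Y[None, :]) ** 2).sum(-1)
    w = 1.0 / (1.0 + d2)
    np.fill_diagonal(w, 0.0)
    return w / w.sum()
```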
List two limitations of t‑SNE.
1) Poor global distance preservation; 2) High computational cost and strong dependence on perplexity.
What are UMAP’s two most important hyperparameters and their roles?
n_neighbors sets neighbourhood size (local vs global balance); min_dist controls how closely points can pack in the embedding (cluster tightness).
Name one theoretical and one practical advantage of UMAP over t‑SNE.
Theory: optimises a cross‑entropy based on manifold topology; Practice: faster, scales to larger data and preserves more global structure.
True/False: UMAP and t‑SNE exactly preserve Euclidean distances between distant clusters.
False — both focus on topology/relative similarity; exact global distances are not guaranteed.
How does PCA relate to the SVD of X?
If X = USVᵀ, the right‑singular vectors V are the principal component directions and singular values squared equal eigenvalues of XᵀX.
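A numerical check of this card on an arbitrary centred matrix: the squared singular values match the eigenvalues of XᵀX, and the SVD factors reconstruct X.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 6))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# eigh returns eigenvalues in ascending order; svd returns singular
# values in descending order, so sort before comparing.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
```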