Chapter 8 Flashcards
(27 cards)
Why is Dimension Reduction beneficial
- Data sets may have thousands or millions of features (dimensions)
- Reducing features can make large problems tractable
- Feature reduction techniques identify how to keep most of the information
Opportunities for reducing features
– Some features are more important than others
– Some features might be highly correlated
Dimensionality reduction projects data set from ______ space to _____ space
high-dimensional, lower-dimensional
The Curse of Dimensionality
- High-dimensional space is difficult to understand with intuition
- More “extreme” points
- Sparser coverage
Projection (slice of dimension)
Training data can be projected onto a lower-dimensional subspace
– Resulting data set has lower dimensionality
– Subspace has own coordinate system (not just elimination of dimension)
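A minimal sketch of such a projection in plain NumPy, using made-up data and an arbitrarily chosen 2D subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # toy 3D training data

# Two orthonormal basis vectors spanning a 2D subspace (here simply the x-y plane)
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

X2d = X @ W                        # coordinates expressed in the subspace's own 2D coordinate system
print(X2d.shape)                   # (100, 2)
```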
What are the challenges of dimension reduction with projection
Projection may not work for all data sets
– The “Swiss roll” presents challenges due to its twisted structure
What is a manifold in the context of manifold learning?
A d-dimensional manifold is a d-dimensional shape that is bent or twisted within an n-dimensional space (where d < n) and locally resembles a d-dimensional hyperplane.
Why is the Swiss roll dataset commonly used in manifold learning?
It’s a 2D shape twisted into 3D space, illustrating how low-dimensional manifolds can exist within higher-dimensional spaces.
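A quick sketch of generating the Swiss roll with scikit-learn's make_swiss_roll (sample count and noise level are arbitrary choices):

```python
from sklearn.datasets import make_swiss_roll

# X is (1000, 3): a 2D sheet rolled up in 3D space; t is the position along the unrolled sheet
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)
print(X.shape)  # (1000, 3)
```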
What does PCA stand for
Principal Component Analysis
What is PCA
– PCA identifies the hyperplane that lies closest to the data
– PCA projects the data onto that hyperplane
- Determines the slice (axis) with the highest variance
What does Explained Variance Ratio tell you
Explained Variance Ratio states how much of the data set's variance lies along each axis
– Highest ratio for the first principal component, then decreasing for each subsequent component
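A minimal scikit-learn sketch of PCA and its explained variance ratio, using made-up data with deliberately unequal variance per axis:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) @ np.diag([5.0, 2.0, 0.5])  # toy data, most variance on the first axis

pca = PCA(n_components=2)
X2d = pca.fit_transform(X)                # project onto the 2D hyperplane closest to the data
print(X2d.shape)                          # (200, 2)
print(pca.explained_variance_ratio_)      # largest share on the first component, then the second
```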
What is the common amount of desired variance to preserve
95%
Besides choosing a specific variance, how else can you choose a sufficient number of dimensions
Look for the elbow in the (cumulative) explained variance curve
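Both approaches sketched with scikit-learn; X_train is assumed to be any NumPy feature matrix (placeholder data here):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 50))      # placeholder training data

# Option 1: let PCA keep however many components preserve 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)
print(pca.n_components_)

# Option 2: compute the cumulative explained variance curve and look for the elbow
cumsum = np.cumsum(PCA().fit(X_train).explained_variance_ratio_)
d = int(np.argmax(cumsum >= 0.95)) + 1    # smallest d reaching 95%, useful alongside an elbow plot
print(d)
```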
PCA for Compression
- Data set is smaller after dimensionality reduction
– Example: reducing MNIST (784 features) to ~150 features retains 95% of the variance at ~20% of the original size
- Data can be reconstructed from the lower-dimensional representation
– Reconstruction is lossy and creates reconstruction error
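A compression and reconstruction sketch via inverse_transform; the data is a random stand-in with MNIST's 784 features, and the component count mirrors the ~150 figure above:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 784))                 # stand-in for MNIST's 784 pixel features

pca = PCA(n_components=154)
X_reduced = pca.fit_transform(X)                 # compressed representation
X_recovered = pca.inverse_transform(X_reduced)   # lossy reconstruction back to 784 features

reconstruction_error = np.mean(np.square(X_recovered - X))   # mean squared reconstruction error
print(X_reduced.shape, reconstruction_error)
```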
Incremental PCA
Incremental PCA (IPCA) processes one mini-batch at a time, whereas the standard SVD algorithm needs the entire training set in memory
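A minimal IncrementalPCA sketch; the data and batch count are arbitrary:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 50))            # placeholder data

n_batches = 10
inc_pca = IncrementalPCA(n_components=10)
for X_batch in np.array_split(X_train, n_batches):
    inc_pca.partial_fit(X_batch)                 # one mini-batch at a time, never the full set

X_reduced = inc_pca.transform(X_train)
print(X_reduced.shape)                           # (1000, 10)
```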
Kernel PCA
- Based on “kernel trick” from SVM
– Implicit mapping to very high-dimensional space for nonlinear classification
– A linear decision boundary in the high-dimensional space corresponds to a complex decision boundary in the original space
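A Kernel PCA sketch with an RBF kernel on the Swiss roll; the kernel and gamma here are arbitrary choices:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

# The RBF kernel implicitly maps the data to a very high-dimensional space before projecting
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)
print(X_reduced.shape)  # (1000, 2)
```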
What type of learning algorithm is Kernel PCA (kPCA)?
It is an unsupervised learning algorithm.
What is a major challenge when using Kernel PCA?
Choosing the right kernel and hyperparameters, since there is no clear performance metric in unsupervised learning.
How can we find suitable kernels and parameters in Kernel PCA?
By using parameter search methods like GridSearchCV, often based on downstream task performance.
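One possible setup: a pipeline ending in a classifier, with GridSearchCV tuning the kernel and gamma by downstream accuracy (the labels, classifier, and grid here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)
y = (t > 6.9).astype(int)                      # made-up binary labels for the downstream task

clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression(max_iter=1000)),
])
param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]
grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)                # kernel and gamma giving the best classification score
```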
What is Locally Linear Embedding (LLE)?
A non-linear dimensionality reduction technique that preserves local relationships between data points.
Is LLE a projection-based method?
No, LLE does not rely on projections like PCA.
How does LLE represent each data point?
As a linear combination of its nearest neighbors.
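A minimal LLE sketch unrolling the Swiss roll; the neighbor count is a typical but arbitrary choice:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

# Each point is modeled as a linear combination of its 10 nearest neighbors, then embedded in 2D
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)
print(X_unrolled.shape)  # (1000, 2)
```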
What types of data structures does LLE work well with?
Twisted or curved manifolds with relatively low noise.
What is a limitation of LLE?
It does not preserve global distances well, only local structure, and it does not scale well to large data sets.