Chapter 8 Flashcards

(27 cards)

1
Q

Why is dimensionality reduction beneficial?

A
  • Data sets may have thousands or millions of features (dimensions)
  • Reducing features can make large problems tractable
  • Feature reduction techniques identify how to keep most information
2
Q

Opportunities for reducing features

A

– Some features are more important than others
– Some features might be highly correlated

3
Q

Dimensionality reduction projects data set
from ______ space to _____ space

A

high-dimensional, lower-dimensional

4
Q

The Curse of Dimensionality

A

– High-dimensional space is difficult to understand intuitively
– More “extreme” points
– Sparser coverage
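
A small NumPy sketch (the sample counts and dimensions are illustrative) of why high-dimensional space behaves this way: with the same number of random points, average pairwise distances grow and coverage gets sparser as the dimensionality increases.

    import numpy as np

    rng = np.random.default_rng(42)

    for d in (2, 10, 100, 1000):
        # Sample 100 random points in the d-dimensional unit hypercube
        X = rng.random((100, d))
        # Average Euclidean distance between two random points grows with d,
        # so the same number of points covers the space ever more sparsely
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        print(d, dists[np.triu_indices(100, k=1)].mean())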

5
Q

Projection (slice of dimension)

A

Training data can be projected to lower-dimensional subspace
– Resulting data set has lower dimensionality
– Subspace has own coordinate system (not just elimination of dimension)
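
A minimal NumPy sketch of projecting training data onto a 2D subspace with its own coordinate system (the data X and the basis W are illustrative, not taken from any particular data set):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))          # 100 instances in 3D

    # Orthonormal basis spanning a tilted 2D plane (not an axis-aligned slice)
    w1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
    w2 = np.array([0.0, 0.0, 1.0])
    W = np.column_stack([w1, w2])          # shape (3, 2)

    X2D = X @ W        # coordinates in the subspace's own 2D coordinate system
    print(X2D.shape)   # (100, 2)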

6
Q

What are the challenges of dimensionality reduction with projection?

A

Projection may not work for all data sets
– The “Swiss roll” presents challenges due to its twisted shape

7
Q

What is a manifold in the context of manifold learning?

A

A d-dimensional manifold is a part of an n-dimensional space (where d < n) that locally resembles a d-dimensional hyperplane.

8
Q

Why is the Swiss roll dataset commonly used in manifold learning?

A

It’s a 2D shape twisted into 3D space, illustrating how low-dimensional manifolds can exist within higher-dimensional spaces.
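
A minimal sketch generating the Swiss roll with scikit-learn's make_swiss_roll (sample size, noise, and seed are illustrative):

    from sklearn.datasets import make_swiss_roll

    # X has 3 features (the roll lives in 3D); t is the position along the
    # underlying 2D manifold that the roll is twisted from
    X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)
    print(X.shape)   # (1000, 3)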

9
Q

What does PCA stand for

A

Principal Component Analysis

10
Q

What is PCA

A

– PCA identifies the hyperplane that lies closest to the data
– PCA projects the data onto that hyperplane
– It determines the axes (principal components) that preserve the highest variance
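
A minimal scikit-learn sketch of PCA projecting a data set onto a 2D hyperplane (the data X and the component count are illustrative):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))     # illustrative 5-dimensional data set

    pca = PCA(n_components=2)         # keep the 2 axes with the highest variance
    X2D = pca.fit_transform(X)        # project X onto that 2D hyperplane
    print(X2D.shape)                  # (200, 2)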

11
Q

What does Explained Variance Ratio tell you

A

The explained variance ratio states how much of the data set's variance lies along each principal component's axis
– The first principal component has the highest ratio, followed by the second, and so on
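
A minimal scikit-learn sketch of reading the ratios off a fitted PCA model (the data X is illustrative, with deliberately unequal variance per feature):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5)) * [5, 3, 1, 0.5, 0.1]   # unequal variance per feature

    pca = PCA(n_components=3).fit(X)
    # Fraction of the data set's variance lying along each principal component,
    # sorted from the first (largest) component downward
    print(pca.explained_variance_ratio_)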

12
Q

What is a commonly chosen amount of variance to preserve?

A

95%
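
A minimal scikit-learn sketch, since PCA accepts a float between 0 and 1 for n_components to mean "preserve this fraction of the variance" (the data X is illustrative):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20)) * np.linspace(10, 0.1, 20)

    # Keep however many components are needed to preserve 95% of the variance
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X)
    print(pca.n_components_)          # number of dimensions actually kept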

13
Q

Besides choosing a specific amount of variance to preserve, how else can you choose a sufficient number of dimensions?

A

Look for the elbow in the curve of cumulative explained variance plotted against the number of dimensions
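
A minimal sketch of plotting cumulative explained variance against the number of dimensions to look for the elbow (matplotlib is assumed and the data X is illustrative):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20)) * np.linspace(10, 0.1, 20)

    pca = PCA().fit(X)                            # keep all components
    cumsum = np.cumsum(pca.explained_variance_ratio_)

    plt.plot(range(1, len(cumsum) + 1), cumsum)   # look for where the curve flattens
    plt.xlabel("Number of dimensions")
    plt.ylabel("Cumulative explained variance")
    plt.show()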

14
Q

PCA for Compression

A
  • Data set is smaller after dimensionality reduction
    – Example: MNIST reduced to ~150 features retains 95% of the variance at ~20% of the original size
  • Data can be reconstructed from the lower-dimensional representation
    – Reconstruction is lossy and introduces a reconstruction error
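
A minimal scikit-learn sketch of compressing a data set and reconstructing it with inverse_transform (the data X is illustrative):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50)) * np.linspace(10, 0.1, 50)

    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X)                 # compressed representation
    X_recovered = pca.inverse_transform(X_reduced)   # back to the original space

    # Mean squared distance between originals and reconstructions (lossy)
    reconstruction_error = np.mean(np.sum((X - X_recovered) ** 2, axis=1))
    print(X_reduced.shape, reconstruction_error)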
15
Q

Incremental PCA

A

Incremental PCA (IPCA) splits the training set into mini-batches and processes one mini-batch at a time
– Useful because the standard SVD algorithm needs the entire training set in memory
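
A minimal scikit-learn sketch of Incremental PCA fed with mini-batches (the batch count and data X are illustrative):

    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))

    ipca = IncrementalPCA(n_components=5)
    for X_batch in np.array_split(X, 10):     # 10 mini-batches
        ipca.partial_fit(X_batch)             # never needs the whole set in memory

    X_reduced = ipca.transform(X)
    print(X_reduced.shape)                    # (1000, 5)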

16
Q

Kernel PCA

A
  • Based on the “kernel trick” from SVMs
    – Implicit mapping to a very high-dimensional space for nonlinear classification
    – A linear decision boundary in the high-dimensional space corresponds to a complex decision boundary in the original space
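
A minimal scikit-learn sketch of Kernel PCA with an RBF kernel (the kernel choice, gamma, and the Swiss-roll data are illustrative):

    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import KernelPCA

    X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

    # Implicitly maps X to a very high-dimensional feature space via the RBF
    # kernel, then does linear PCA there; the result is a nonlinear projection
    # in the original space
    rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
    X_reduced = rbf_pca.fit_transform(X)
    print(X_reduced.shape)   # (1000, 2)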
17
Q

What type of learning algorithm is Kernel PCA (kPCA)?

A

It is an unsupervised learning algorithm.

18
Q

What is a major challenge when using Kernel PCA?

A

Choosing the right kernel and hyperparameters, since there is no clear performance metric in unsupervised learning.

19
Q

How can we find suitable kernels and parameters in Kernel PCA?

A

By using parameter search methods like GridSearchCV, often based on downstream task performance.
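
A minimal sketch of tuning the kernel and gamma through a downstream classifier (the pipeline, labels y, and parameter grid are illustrative):

    import numpy as np
    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import KernelPCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)
    y = (t > 6.9).astype(int)                 # illustrative binary labels

    clf = Pipeline([
        ("kpca", KernelPCA(n_components=2)),
        ("log_reg", LogisticRegression(max_iter=1000)),
    ])
    param_grid = [{
        "kpca__gamma": np.linspace(0.03, 0.05, 10),
        "kpca__kernel": ["rbf", "sigmoid"],
    }]
    grid_search = GridSearchCV(clf, param_grid, cv=3)
    grid_search.fit(X, y)
    print(grid_search.best_params_)   # kernel/gamma giving the best downstream accuracy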

20
Q

What is Locally Linear Embedding (LLE)?

A

A non-linear dimensionality reduction technique that preserves local relationships between data points.

21
Q

Is LLE a projection-based method?

A

No, LLE does not rely on projections like PCA.

22
Q

How does LLE represent each data point?

A

As a linear combination of its nearest neighbors.

23
Q

What types of data structures does LLE work well with?

A

Twisted or curved manifolds with relatively low noise.

24
Q

What is a limitation of LLE?

A

It does not preserve global distances well, only local structure, and it does not scale well to large data sets.

25
Q

Define LLE operation in 2 steps

A

Step 1: Represent instances relative to their closest neighbors
– For each training instance, identify its k closest neighbors
– Reconstruct the instance as a linear function of those neighbors
– Generate a weight matrix that contains all the reconstruction information
Step 2: Map training instances to a lower-dimensional space
– Similar to the first step, but keep the weights fixed and find the positions of the instances
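
A minimal scikit-learn sketch of LLE unrolling the Swiss roll (the neighbor and component counts are illustrative):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

    # Step 1 (internally): encode each instance as weights over its 10 nearest neighbors
    # Step 2 (internally): find 2D positions that preserve those weights
    lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
    X_unrolled = lle.fit_transform(X)
    print(X_unrolled.shape)   # (1000, 2)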
26
Q

Other Dimensionality Reduction Techniques

A

– Multidimensional Scaling (MDS): reduces dimensionality while trying to preserve the distances between instances
– Isomap: builds a graph connecting neighboring instances and preserves geodesic distances along it
– t-Distributed Stochastic Neighbor Embedding (t-SNE): keeps similar instances close together (mainly used to visualize clusters)
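
A minimal scikit-learn sketch applying all three (sample size and parameters are illustrative; these methods can be slow on large data sets):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import MDS, Isomap, TSNE

    X, t = make_swiss_roll(n_samples=500, noise=0.1, random_state=42)

    X_mds = MDS(n_components=2, random_state=42).fit_transform(X)    # preserves pairwise distances
    X_iso = Isomap(n_components=2).fit_transform(X)                  # preserves geodesic distances
    X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)  # keeps similar instances close
    print(X_mds.shape, X_iso.shape, X_tsne.shape)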
27