Chapter 8 Flashcards
(27 cards)
Why is Dimension Reduction beneficial
- Data sets may have thousands or millions of features (dimensions)
- Reducing features can make large problems tractable
- Feature reduction techniques identify how to keep most of the information
Opportunities for reducing features
– Some features are more important than others
– Some features might be highly correlated
Dimensionality reduction projects data set from ______ space to _____ space
high-dimensional, lower-dimensional
The Curse of Dimensionality
- High-dimensional space is difficult to understand with intuition
- More “extreme” points
- Sparser coverage
Projection (slice of dimension)
Training data can be projected onto a lower-dimensional subspace
– Resulting data set has lower dimensionality
– Subspace has own coordinate system (not just elimination of dimension)
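A minimal sketch of such a projection in plain NumPy, using made-up data and an arbitrarily chosen 2D subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # toy 3D training data

# Two orthonormal basis vectors spanning a 2D subspace (here simply the x-y plane)
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

X2d = X @ W                        # coordinates expressed in the subspace's own 2D coordinate system
print(X2d.shape)                   # (100, 2)
```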
What are the challenges of dimension reduction with projection
Projection may not work for all data sets
– The “Swiss roll” presents challenges due to its twisted structure
What is a manifold in the context of manifold learning?
A d-dimensional manifold is a d-dimensional shape that is bent or twisted within an n-dimensional space (where d < n) and locally resembles a d-dimensional hyperplane.
Why is the Swiss roll dataset commonly used in manifold learning?
It’s a 2D shape twisted into 3D space, illustrating how low-dimensional manifolds can exist within higher-dimensional spaces.
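A quick sketch of generating the Swiss roll with scikit-learn's make_swiss_roll (sample count and noise level are arbitrary choices):

```python
from sklearn.datasets import make_swiss_roll

# X is (1000, 3): a 2D sheet rolled up in 3D space; t is the position along the unrolled sheet
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)
print(X.shape)  # (1000, 3)
```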
What does PCA stand for
Principal Component Analysis
What is PCA
– PCA identifies the hyperplane that lies closest to the data
– PCA projects the data onto that hyperplane
- Determines the slice (axis) with the highest variance
What does Explained Variance Ratio tell you
Explained Variance Ratio states how much of the data set's variance lies along each axis
– Highest ratio for the first principal component, then decreasing for each subsequent component
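A minimal scikit-learn sketch of PCA and its explained variance ratio, using made-up data with deliberately unequal variance per axis:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) @ np.diag([5.0, 2.0, 0.5])  # toy data, most variance on the first axis

pca = PCA(n_components=2)
X2d = pca.fit_transform(X)                # project onto the 2D hyperplane closest to the data
print(X2d.shape)                          # (200, 2)
print(pca.explained_variance_ratio_)      # largest share on the first component, then the second
```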
What is the common amount of desired variance to preserve
95%
Besides choosing a specific variance, how else can you choose a sufficient number of dimensions
Look for the elbow in the (cumulative) explained variance curve
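Both approaches sketched with scikit-learn; X_train is assumed to be any NumPy feature matrix (placeholder data here):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 50))      # placeholder training data

# Option 1: let PCA keep however many components preserve 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)
print(pca.n_components_)

# Option 2: compute the cumulative explained variance curve and look for the elbow
cumsum = np.cumsum(PCA().fit(X_train).explained_variance_ratio_)
d = int(np.argmax(cumsum >= 0.95)) + 1    # smallest d reaching 95%, useful alongside an elbow plot
print(d)
```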
PCA for Compression
- Data set is smaller after dimensionality reduction
– Example: reducing MNIST (784 features) to ~150 features retains 95% of the variance at ~20% of the original size
- Data can be reconstructed from the lower-dimensional representation
– Reconstruction is lossy and creates reconstruction error
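A compression and reconstruction sketch via inverse_transform; the data is a random stand-in with MNIST's 784 features, and the component count mirrors the ~150 figure above:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 784))                 # stand-in for MNIST's 784 pixel features

pca = PCA(n_components=154)
X_reduced = pca.fit_transform(X)                 # compressed representation
X_recovered = pca.inverse_transform(X_reduced)   # lossy reconstruction back to 784 features

reconstruction_error = np.mean(np.square(X_recovered - X))   # mean squared reconstruction error
print(X_reduced.shape, reconstruction_error)
```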
Incremental PCA
Incremental PCA (IPCA) processes one mini-batch at a time, whereas the standard SVD algorithm needs the entire training set in memory
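A minimal IncrementalPCA sketch; the data and batch count are arbitrary:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 50))            # placeholder data

n_batches = 10
inc_pca = IncrementalPCA(n_components=10)
for X_batch in np.array_split(X_train, n_batches):
    inc_pca.partial_fit(X_batch)                 # one mini-batch at a time, never the full set

X_reduced = inc_pca.transform(X_train)
print(X_reduced.shape)                           # (1000, 10)
```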
Kernel PCA
- Based on “kernel trick” from SVM
– Implicit mapping to very high-dimensional space for nonlinear classification
– A linear decision boundary in the high-dimensional space corresponds to a complex decision boundary in the original space
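A Kernel PCA sketch with an RBF kernel on the Swiss roll; the kernel and gamma here are arbitrary choices:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

# The RBF kernel implicitly maps the data to a very high-dimensional space before projecting
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)
print(X_reduced.shape)  # (1000, 2)
```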
What type of learning algorithm is Kernel PCA (kPCA)?
It is an unsupervised learning algorithm.
What is a major challenge when using Kernel PCA?
Choosing the right kernel and hyperparameters, since there is no clear performance metric in unsupervised learning.
How can we find suitable kernels and parameters in Kernel PCA?
By using parameter search methods like GridSearchCV, often based on downstream task performance.
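One possible setup: a pipeline ending in a classifier, with GridSearchCV tuning the kernel and gamma by downstream accuracy (the labels, classifier, and grid here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)
y = (t > 6.9).astype(int)                      # made-up binary labels for the downstream task

clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression(max_iter=1000)),
])
param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]
grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)                # kernel and gamma giving the best classification score
```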
What is Locally Linear Embedding (LLE)?
A non-linear dimensionality reduction technique that preserves local relationships between data points.
Is LLE a projection-based method?
No, LLE does not rely on projections like PCA.
How does LLE represent each data point?
As a linear combination of its nearest neighbors.
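A minimal LLE sketch unrolling the Swiss roll; the neighbor count is a typical but arbitrary choice:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

# Each point is modeled as a linear combination of its 10 nearest neighbors, then embedded in 2D
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)
print(X_unrolled.shape)  # (1000, 2)
```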
What types of data structures does LLE work well with?
Twisted or curved manifolds with relatively low noise.
What is a limitation of LLE?
It does not preserve global distances well, only local structure, and it does not scale well to large data sets.