Dimensionality Reduction Flashcards

1
Q

PCA

A

Principal Component Analysis
A feature-extraction method: the data is transformed from a high-dimensional space to a lower-dimensional space.
With PCA the transformation is linear; many non-linear methods also exist.

PCA aims to detect correlation between variables; reducing the dimensionality only makes sense if strong correlations between variables exist.
Goal: find the directions of maximum variance in the high-dimensional data and project it onto a lower-dimensional subspace while retaining most of the information.

Choose the k components for the new subspace by computing the eigenvalues of the covariance matrix (equivalently, the correlation matrix if the data is standardized). The larger an eigenvalue, the more of the variance its component captures, and the corresponding eigenvector indicates the new direction. Thus take the top k.
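The steps above can be sketched in NumPy (a minimal illustration on made-up data; the array shapes and k=2 are arbitrary choices, not from the card):

```python
import numpy as np

# Hypothetical toy data: 100 samples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# 1. Center the data (PCA assumes zero-mean features)
Xc = X - X.mean(axis=0)

# 2. Covariance matrix (d x d)
cov = np.cov(Xc, rowvar=False)

# 3. Eigen-decomposition; eigh is appropriate because cov is symmetric
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort by descending eigenvalue and keep the top k eigenvectors
order = np.argsort(eigvals)[::-1]
k = 2
W = eigvecs[:, order[:k]]   # projection matrix, d x k

# 5. Project the data onto the k-dimensional subspace
X_reduced = Xc @ W
print(X_reduced.shape)  # (100, 2)
```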

2
Q

Why dimensionality reduction ?

A
  • Removes redundancies and simplifies the dataset, making it easier to understand.
  • It’s easier to visualize low-dimensional data.
  • It reduces storage space for large datasets (because of fewer features).
  • It reduces time for computationally intensive tasks (again, because of fewer features).
  • Reducing dimensionality can help avoid overfitting in supervised learning tasks.
3
Q

projection matrix

A

Project the data onto the new subspace by multiplying it with the projection matrix, whose columns are the eigenvectors of the chosen components.
Each eigenvector indicates one direction of the new subspace.
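As a minimal sketch (the toy data and hand-picked projection matrix are illustrative assumptions, not from the card), the projection itself is a single matrix product:

```python
import numpy as np

# Hypothetical centered data: 4 samples x 3 features
Xc = np.arange(12, dtype=float).reshape(4, 3)
Xc -= Xc.mean(axis=0)

# Toy projection matrix W (3 x 2): in real PCA its columns would be
# the top-k eigenvectors of the covariance matrix
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Each row of Xc is mapped into the subspace spanned by W's columns
Z = Xc @ W
print(Z.shape)  # (4, 2)
```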

4
Q

explained variance

A

Each principal component has an eigenvalue (from the covariance matrix) that indicates how much of the data’s variance it captures.
The explained variance ratio of a component is its eigenvalue scaled by the sum of all eigenvalues, i.e. the fraction of the total variance that component explains.
The cumulative explained variance shows how much of the total variance the chosen top-k components explain together.
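A minimal sketch of this computation (the eigenvalues are made-up example numbers):

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted descending
eigvals = np.array([4.0, 2.0, 1.0, 0.5, 0.5])

# Explained variance ratio: each eigenvalue scaled by the total
ratio = eigvals / eigvals.sum()

# Cumulative variance: fraction explained by the top-k components together
cumulative = np.cumsum(ratio)
print(ratio)       # [0.5    0.25   0.125  0.0625 0.0625]
print(cumulative)  # [0.5    0.75   0.875  0.9375 1.    ]
```

Here the first two components already explain 75% of the variance, so keeping k=2 may be a reasonable trade-off.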

5
Q

Matrix Factorization

A

Breaking one matrix down into a product of multiple matrices.
e.g. market-basket purchases (an n customers-by-d products matrix)

  • Singular Value Decomposition (SVD)
6
Q

SVD

A

Singular Value Decomposition (SVD)
An algorithm that decomposes a matrix A into the product of two orthogonal (unitary) matrices and a diagonal matrix: A = UΣVᵀ. Truncating Σ to its largest singular values yields the best lower-rank (i.e. smaller/simpler) approximation of the original matrix A.

Thus if A contains user ratings of movies (each row a user, each column a movie), SVD splits it into a user-factor matrix and a movie-factor matrix. The diagonal matrix indicates how important each latent feature is, so we can reduce the dimension by keeping only the most important ones.
We can also make predictions by recombining the factor matrices.
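A minimal sketch of truncated SVD with NumPy (the ratings matrix and k=2 are made-up examples):

```python
import numpy as np

# Hypothetical ratings matrix: 4 users x 3 movies
A = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 0.0],
              [1.0, 1.0, 5.0],
              [0.0, 1.0, 4.0]])

# Full (thin) SVD: A = U @ diag(s) @ Vt, singular values sorted descending
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncate to rank k: keep only the k largest singular values.
# This gives the best rank-k approximation of A (Eckart-Young theorem).
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(A_k.shape)  # (4, 3)
```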
