L12 - Dimensionality Reduction Flashcards

1
Q

What is meant by dimensionality reduction?

A

The reduction of feature count within a data set.

2
Q

In what 3 ways does dimensionality reduction improve the creation and running of ML models?

A

Saves time, saves money, and removes irrelevant data.

3
Q

What are the 2 methods of dimensionality reduction? Define each…

A
  1. Feature extraction -> Extract useful combinations of features from the data.
  2. Feature selection -> Analyse all features of the data to establish the relevant ones.
4
Q

What are the 3 methods for feature selection?

A
  1. Filter Methods
  2. Wrapper Methods
  3. Embedded Methods
5
Q

Explain the Filter Method…

A
  • Method of feature selection for dimensionality reduction (see the sketch below)
    1 -> Bring features to the same scale through normalisation (standardisation would force every variance to 1, defeating the threshold)
    2 -> Choose a variance threshold
    3 -> Calculate the variance of each feature, dropping those that fall below the threshold
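A minimal sketch of this filter, assuming scikit-learn and a placeholder random dataset; the 0.01 threshold is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import VarianceThreshold

# Placeholder dataset: 100 samples, 10 features (illustrative only)
X = np.random.rand(100, 10)

# 1 -> Bring features to the same scale (min-max normalisation; z-score
#      standardisation would give every feature variance 1)
X_scaled = MinMaxScaler().fit_transform(X)

# 2 -> Choose a variance threshold (0.01 is an arbitrary choice here)
# 3 -> Drop every feature whose variance falls below it
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X_scaled)
print(X_reduced.shape)  # fewer columns if any feature was near-constant
```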
6
Q

Explain the Wrapper Method…

A
  1. Method of feature selection for dimensionality reduction
  2. Can be conducted via Forward Search (FS) or Recursive Feature Elimination (RFE)
  3. Both FS and RFE run a "battle royale": many candidate models are trained and compared to establish the best features
  4. Both stop once a model containing the desired number of best features remains
7
Q

Explain how Forward Search works in Wrapper Method of Feature Selection…

A
  1. Create N models with 1 feature each
  2. Find the best feature, e.g. Feature 3
  3. Create N-1 models, each with the previous best feature (F3) plus one other feature, e.g. Model1(F3, F1), Model2(F3, F2), etc.
  4. Repeat until we have models with the desired number of features and can choose the best one (see the sketch below)
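A sketch of forward search using scikit-learn's SequentialFeatureSelector; the iris dataset, logistic-regression estimator, and choice of 3 features are all illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Greedily add the feature whose inclusion gives the best cross-validated
# score, starting from the empty set, until 3 features are selected
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask over the original features
```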
8
Q

Explain how Recursive Feature Elimination works in Wrapper Method of Feature Selection…

A
  1. The reverse of Forward Search
  2. Start with N models, each with N-1 features (each model drops a different feature)
  3. Repeatedly remove the single worst feature from each model
  4. Results in a final best model with M features (see the sketch below)
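A sketch using scikit-learn's RFE class; the dataset, estimator, and M = 2 surviving features are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Fit on all features, repeatedly discard the one the model ranks as least
# important, and stop when only n_features_to_select remain
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
selector.fit(X, y)
print(selector.support_)  # True for the M surviving features
print(selector.ranking_)  # 1 = kept; larger ranks were eliminated earlier
```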
9
Q

Explain the Embedded Method

A
  1. Method of feature selection for dimensionality reduction
  2. Use decision trees to establish the best features
  3. Then use random forests to aggregate the results of the decision trees (see the sketch below)
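A minimal sketch using random-forest feature importances, assuming scikit-learn; the dataset and the 0.1 importance cut-off are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Each tree's splits score the features; the forest aggregates those scores
# into a single impurity-based importance per feature
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Keep only features above an importance cut-off (0.1 is arbitrary here)
keep = forest.feature_importances_ > 0.1
X_reduced = X[:, keep]
print(forest.feature_importances_, X_reduced.shape)
```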
10
Q

What is a Random Forest?

A

An aggregation of decision trees

11
Q

What are the 2 types of methods for Feature Extraction?

A
  1. Linear
  2. Non-Linear
12
Q

What is the main Linear method for feature extraction? Explain it…

A
  1. Principal Component Analysis (PCA)
  2. Find an orthogonal coordinate transformation such that each new coordinate captures as much of the remaining variance as possible
  3. This creates N new variables, named Principal Components
  4. Principal Components are linear combinations of the original coordinates
  5. The orthogonal coordinate with the most variation is the most informative (see the sketch below)
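A minimal sketch with scikit-learn's PCA; the iris dataset and the choice of 2 components are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)      # keep the 2 most informative components
X_new = pca.fit_transform(X)   # data re-expressed in PC coordinates

print(pca.components_)                # each PC as a linear combination of the original features
print(pca.explained_variance_ratio_)  # the first PC captures the most variation
```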
13
Q

What is the worst case scenario of PCA?

A
  1. When all variables are equally important but uncorrelated: the variance is then spread evenly across all principal components.
  2. No component can be dropped without losing information, so PCA gives us no reduction.
14
Q

What are the steps of PCA?

A
  1. Generate the covariance matrix from the (centred) dataset X
  2. Diagonalise the covariance matrix to obtain its eigenvalues and its eigenvector matrix V
  3. Multiply XV to express the data in principal-component coordinates
  4. Take the first K principal components, i.e. those with the largest eigenvalues.
  5. This gives us a K-dimensional representation of the data, having extracted the K most important features.
  6. The dimensionality reduction comes from discarding the least important principal components (see the sketch below)
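A sketch of these steps in plain NumPy, assuming a placeholder Gaussian dataset and K = 2; a production implementation would typically use an SVD instead:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # placeholder data: 100 samples, 5 features
X = X - X.mean(axis=0)             # centre the data first

# 1. Generate the covariance matrix
C = np.cov(X, rowvar=False)

# 2. Diagonalise it (eigh suits symmetric matrices; sort eigenvalues descending)
eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# 3. Multiply XV to rotate the data into principal-component coordinates
scores = X @ V

# 4./5. Take the first K principal components (largest eigenvalues)
K = 2
X_reduced = scores[:, :K]
print(X_reduced.shape)             # (100, 2): a K-dimensional representation
print(eigvals / eigvals.sum())     # fraction of variance captured by each PC
```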
15
Q

What do the Eigenvalues represent in PCA?

A

The variance captured by each principal component.

16
Q

What are the input and output of PCA?

A

Input -> High-D data
Output -> Low-D data

17
Q

What are the 2 main Non-linear methods for feature extraction?

A
  1. t-SNE
  2. UMAP
18
Q

Explain t-SNE

A

A non-linear method for feature extraction

1 - Calculate the distribution of distances across the N points and call this D

2 - Scatter N points randomly in 2 or 3 dimensions

3 - Move the N points around until their distance distribution resembles D (see the sketch below)
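A sketch using scikit-learn's t-SNE; the digits dataset and the perplexity value are illustrative assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)  # N points in 64 dimensions

# Embed into 2 dimensions; perplexity sets the neighbourhood size used when
# matching the low-dimensional distance distribution to the original D
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)  # (1797, 2)
```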

19
Q

What are some issues with t-SNE?

A
  • Distances between faraway points are meaningless
  • Poor scaling due to high memory usage
20
Q

Explain UMAP

A
  1. A non-linear method for feature extraction
  2. Runs faster and uses less memory than t-SNE (see the sketch below)
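A sketch assuming the third-party umap-learn package is installed; the dataset and hyperparameter values are illustrative:

```python
import umap  # third-party package: umap-learn
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)

# Same input/output shape as t-SNE, but typically faster and lighter on memory
reducer = umap.UMAP(n_components=2, n_neighbors=15, random_state=0)
X_2d = reducer.fit_transform(X)
print(X_2d.shape)  # (1797, 2)
```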
21
Q

What are some issues with both t-SNE and UMAP?

A
  1. Hyperparameter dependence
  2. Cluster sizes and distances between clusters mean nothing
  3. The X and Y axes are almost impossible to interpret