Dimensionality Reduction Flashcards

(15 cards)

1
Q

What is the aim of Dimensionality Reduction?

A

To remove noise from the data, keep only the features that actually matter, and make models more efficient.

2
Q

What are the two types of Dimensionality Reduction?

A
  1. Feature Selection
  2. Feature Reduction (also known as Feature Extraction)
3
Q

What is an example of a filter method?

A

Variance Thresholding

4
Q

How does variance thresholding work?

A

Calculate the variance of each feature, then drop the features whose variance falls below a chosen threshold. The idea is that low-variance features carry less information.
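
As a minimal sketch of this filter, the snippet below computes each column's population variance with the standard library and keeps the columns strictly above a threshold. The function name `variance_threshold`, the sample rows, and the threshold of 0.01 are illustrative assumptions, not part of the card.

```python
from statistics import pvariance

def variance_threshold(rows, threshold):
    """Return indices of the columns whose variance is strictly above `threshold`."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]

# Hypothetical data: column 1 is constant and column 2 barely varies.
rows = [
    [1.0, 5.0, 0.10],
    [2.0, 5.0, 0.11],
    [3.0, 5.0, 0.10],
    [4.0, 5.0, 0.12],
]
kept = variance_threshold(rows, threshold=0.01)
reduced = [[row[i] for i in kept] for row in rows]
```

Variance depends on a feature's scale, so features measured in different units would normally be put on a comparable scale before one threshold is applied to all of them.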

5
Q

What are two examples of wrapper methods?

A
  1. Forward Search
  2. Recursive Feature Elimination
6
Q

What is a wrapper method?

A

A method that searches for an optimal feature subset tailored to a particular algorithm.

7
Q

What are the main steps for a forward search?

A
  1. Create n models with one feature each.
  2. Select the best one.
  3. Create n-1 models, each adding one of the remaining features to the selected feature.
  4. Select the best one.
  5. Proceed until you have chosen m features.
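
The greedy loop described in these steps might be sketched as follows. The scoring function is an assumption: here it is the leave-one-out accuracy of a 1-nearest-neighbour classifier, but any higher-is-better score for a feature subset would do. The names `forward_search` and `loo_1nn_accuracy` and the toy data are hypothetical.

```python
def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `subset` columns."""
    correct = 0
    for i in range(len(X)):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((X[i][f] - X[k][f]) ** 2 for f in subset))
        correct += y[nearest] == y[i]
    return correct / len(X)

def forward_search(n_features, m, score):
    """Greedily choose m feature indices; `score(subset)` is higher-is-better."""
    selected = []
    remaining = list(range(n_features))
    while len(selected) < m and remaining:
        # One candidate model per remaining feature: current subset plus that feature.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [[0.0, 5.0, 0.3], [0.1, 9.0, 0.1], [0.2, 1.0, 0.2],
     [1.0, 6.0, 0.2], [1.1, 0.0, 0.3], [1.2, 10.0, 0.1]]
y = [0, 0, 0, 1, 1, 1]
selected = forward_search(3, 2, lambda subset: loo_1nn_accuracy(X, y, subset))
```

Scoring on held-out data (here leave-one-out) matters, because a score computed on the training rows would tend to reward adding any feature at all.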
8
Q

What are the main steps for a Recursive Feature Elimination?

A
  1. Create n models, each with n-1 features (a different feature left out of each).
  2. Select the best one.
  3. Create n-1 models, each removing one more feature.
  4. Select the best one.
  5. Proceed until you have removed m features.
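
A sketch of the elimination loop in these steps is shown below. It compares candidate models directly, which is often called backward elimination; scikit-learn's `RFE` class instead ranks features by a fitted model's coefficients or importances and drops the weakest. The scorer, function names, and toy data are illustrative assumptions.

```python
def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `subset` columns."""
    correct = 0
    for i in range(len(X)):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((X[i][f] - X[k][f]) ** 2 for f in subset))
        correct += y[nearest] == y[i]
    return correct / len(X)

def backward_elimination(n_features, m, score):
    """Greedily remove m features; `score(subset)` is higher-is-better."""
    kept = list(range(n_features))
    removed = []
    while len(removed) < m and len(kept) > 1:
        # One candidate model per kept feature: the current subset without it.
        # The feature whose removal gives the best score is dropped.
        drop = max(kept, key=lambda f: score([g for g in kept if g != f]))
        kept.remove(drop)
        removed.append(drop)
    return kept, removed

# Hypothetical data: only feature 0 separates the two classes.
X = [[0.0, 5.0, 0.3], [0.1, 9.0, 0.1], [0.2, 1.0, 0.2],
     [1.0, 6.0, 0.2], [1.1, 0.0, 0.3], [1.2, 10.0, 0.1]]
y = [0, 0, 0, 1, 1, 1]
kept, removed = backward_elimination(3, 2, lambda s: loo_1nn_accuracy(X, y, s))
```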
9
Q

What are three techniques for splitting data for a Decision Tree?

A
  1. Gini impurity
  2. Information gain
  3. Variance reduction
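
To make the three criteria concrete, here is a small sketch that scores one candidate split under each of them. The helper name `impurity_decrease` and the toy labels and targets are assumptions for illustration; information gain is taken as the decrease in entropy, and variance reduction applies to numeric (regression) targets.

```python
from collections import Counter
from math import log2
from statistics import pvariance

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def impurity_decrease(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    return impurity(parent) - sum(len(ch) / n * impurity(ch) for ch in children)

# A split that separates the two classes perfectly.
parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]
gini_decrease = impurity_decrease(gini, parent, [left, right])
info_gain = impurity_decrease(entropy, parent, [left, right])

# The same idea for a regression target, using variance as the impurity.
targets = [1.0, 1.0, 3.0, 3.0]
var_reduction = impurity_decrease(pvariance, targets, [[1.0, 1.0], [3.0, 3.0]])
```

A tree learner would evaluate many candidate splits this way and keep the one with the largest decrease.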
10
Q

What are the two types of Feature Extraction?

A

Linear Methods and Non-Linear Methods

11
Q

How does Principal Component Analysis (PCA) work?

A

It finds an orthogonal coordinate transformation such that every new coordinate is “maximally informative”: each one captures as much of the remaining variance in the data as possible.
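
As a rough illustration, the sketch below finds only the first new coordinate: it centres the data, builds the covariance matrix, and uses power iteration to approximate the direction of greatest variance. Power iteration is one of several ways to get this direction (libraries typically use an SVD), and the function name and sample points are assumptions.

```python
import math

def first_principal_component(rows, iterations=200):
    """Approximate the top eigenvector of the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centered

# Hypothetical points lying close to the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
direction, centered = first_principal_component(rows)

# Project each centred point onto the direction: a 2-D to 1-D reduction.
scores = [sum(c[j] * direction[j] for j in range(2)) for c in centered]
```

Further components would be found the same way after removing the variance already explained, with each new direction orthogonal to the earlier ones.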

12
Q

What are t-SNE and UMAP?

A

They are common methods for visualising high-dimensional datasets, but they are suited to data visualisation only.

13
Q

What is the process for t-SNE?

A
  1. Take the distribution of distances between the N points in the dataset. Call that D.
  2. Scatter N points in 2 or 3 dimensions, randomly.
  3. Move those N points around until the distribution of distances between them resembles D.
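
The three steps can be sketched in plain Python as below. In t-SNE the "distribution of distances" is a set of neighbour similarities (Gaussian in the original space, Student-t in the embedding), and "resembles" is measured by the KL divergence between them. This sketch makes several simplifying assumptions that real implementations do not: a fixed `sigma` instead of a perplexity search, plain gradient descent without momentum or early exaggeration, and a tiny hypothetical dataset.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def high_dim_affinities(X, sigma=1.0):
    """Step 1: symmetric Gaussian affinities P that sum to 1 (fixed sigma assumed)."""
    n = len(X)
    cond = []
    for i in range(n):
        row = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
               for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def low_dim_affinities(Y):
    """Student-t affinities Q that sum to 1, plus the unnormalised weights W."""
    n = len(Y)
    W = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
         for i in range(n)]
    total = sum(sum(row) for row in W)
    return [[w / total for w in row] for row in W], W

def kl_divergence(P, Q):
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)

def tsne_sketch(X, dims=2, steps=500, lr=0.2, seed=0):
    rng = random.Random(seed)
    n = len(X)
    P = high_dim_affinities(X)
    # Step 2: scatter the points randomly in the low-dimensional space.
    Y = [[rng.gauss(0.0, 0.01) for _ in range(dims)] for _ in range(n)]
    history = []
    for _ in range(steps):
        # Step 3: move the points along the negative gradient of KL(P || Q).
        Q, W = low_dim_affinities(Y)
        history.append(kl_divergence(P, Q))
        Y = [[Y[i][k] - lr * 4.0 * sum((P[i][j] - Q[i][j]) * W[i][j] * (Y[i][k] - Y[j][k])
                                       for j in range(n))
              for k in range(dims)]
             for i in range(n)]
    return Y, history

# Hypothetical data: two tight groups of three points in 3-D.
X = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0],
     [3.0, 3.0, 3.0], [3.5, 3.0, 3.0], [3.0, 3.5, 3.0]]
Y, history = tsne_sketch(X)
```

Here `history` records the KL divergence at each step, which gives a simple way to check that the embedding is moving toward the target similarities.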
14
Q

What is the advantage of UMAP over t-SNE?

A

It works in a broadly similar way, but it runs faster, uses less memory, and has no problem embedding into more than 3 dimensions. It can also preserve both local and global structure.

15
Q

What are the problems with t-SNE and UMAP?

A
  1. Both depend heavily on their hyperparameters.
  2. Cluster sizes and distances between clusters mean nothing.
  3. The x and y axes are basically impossible to interpret.