Dimensionality Reduction Flashcards

1
Q

Curse of Dimensionality

A

Increasing the number of features does not always improve classification accuracy; in fact, it may make it worse.

2
Q

Two main routes to reduce dimensionality

A

Feature extraction
Feature selection

3
Q

Application of dimensionality reduction

A

Customer relationship management
Text mining
Image retrieval
Microarray data analysis
Protein classification
Face recognition
Handwritten digit recognition
Intrusion detection

4
Q

Feature Selection

A

A process that chooses an optimal subset of features according to an objective function

Objectives: reduce dimensionality, remove noise, and improve learning speed, predictive accuracy, and simplicity

Think stepwise / forward / backward regressions
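A minimal forward-selection sketch using scikit-learn's SequentialFeatureSelector; the dataset, estimator, and number of features are illustrative choices, not part of the card:

# Forward selection: greedily add the feature that most improves the cross-validated score.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,   # keep the 5 best features (arbitrary choice)
    direction="forward",      # "backward" starts from all features and removes
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the selected features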

5
Q

Feature Extraction

A

The mapping of the original high-dimensional data to a lower-dimensional space

Goals can change based on end usage:
Unsupervised learning - minimize information loss (PCA)
Supervised learning - maximize class discrimination (LDA)

Think PCA
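A minimal PCA sketch with scikit-learn; the dataset and the 2-component choice are illustrative:

# PCA: project the data onto the directions of maximum variance.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale

pca = PCA(n_components=2)                      # keep 2 components (arbitrary choice)
X_reduced = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)           # fraction of variance each component preserves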

6
Q

Pros of feature reduction

A

All of the original features are used, although not necessarily in their original form: they are combined (typically linearly) into new features.

In feature selection, only a subset of the original features is kept.

7
Q

Feature selection methods

A

Remove features with missing values (see the sketch after this list)
Remove features with low variance
Remove highly correlated features
Univariate feature selection
Feature selection using SelectFromModel
Filter methods
Wrapper methods
Embedded methods
Hybrid methods
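A hedged sketch of the first two bullets (missing values, low variance); the thresholds and the toy DataFrame are made up for illustration:

import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0],
                   "b": [1.0, 1.0, 1.0, 1.0],
                   "c": [0.1, 0.9, 0.4, 0.7]})

# Drop features with too many missing values (illustrative 50% cut-off).
df = df.loc[:, df.isna().mean() < 0.5]

# Drop near-constant features (illustrative variance cut-off).
selector = VarianceThreshold(threshold=0.01)
selector.fit(df.fillna(df.mean()))
print(df.columns[selector.get_support()].tolist())   # ['a', 'c'] -- 'b' is constant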

8
Q

Univariate feature selection

A

Selecting the best features based on univariate statistical tests, e.g. sklearn's SelectKBest
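A minimal SelectKBest sketch; k=10 and the ANOVA F-test scorer are illustrative choices:

# Univariate selection: score each feature independently against the target.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=10)   # keep the 10 highest-scoring features
X_new = selector.fit_transform(X, y)
print(X.shape, "->", X_new.shape)                    # (569, 30) -> (569, 10)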

9
Q

Filter Methods for Feature Selection

A

Filter based on:
Information gain
Chi-squared test
Fisher's score
Correlation coefficient

10
Q

Information gain

A

Calculates the reduction in entropy from a transformation of the dataset, e.g. from splitting it on a given feature
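A small hand-rolled sketch of information gain for one binary split; the toy labels and split mask are made up:

import numpy as np

def entropy(labels):
    # Shannon entropy of a label array.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, mask):
    # Reduction in entropy when splitting `labels` by a boolean `mask`.
    left, right = labels[mask], labels[~mask]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

y = np.array([0, 0, 1, 1, 1, 0])
split = np.array([True, True, True, False, False, False])   # toy split on some feature
print(information_gain(y, split))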

11
Q

Fisher Score

A

Fisher's score is one of the most widely used supervised feature selection methods.

The algorithm returns the ranks of the variables based on their Fisher's scores
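A rough sketch of the Fisher score for a single feature, assuming the common between-class / within-class variance form (the exact formula varies slightly across texts, and the toy data is made up):

import numpy as np

def fisher_score(x, y):
    # Ratio of between-class scatter to within-class scatter for one feature x.
    classes = np.unique(y)
    mean_all = x.mean()
    between = sum(x[y == c].size * (x[y == c].mean() - mean_all) ** 2 for c in classes)
    within = sum(x[y == c].size * x[y == c].var() for c in classes)
    return between / within

x = np.array([1.0, 1.2, 0.9, 3.1, 3.0, 2.8])
y = np.array([0, 0, 0, 1, 1, 1])
print(fisher_score(x, y))   # a large score means the feature separates the classes well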

12
Q

Correlation Coefficient

A

Good features should be correlated with the target but uncorrelated among themselves (think of the correlation grid/heatmap)
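A sketch of dropping one feature from each highly correlated pair with pandas; the 0.9 cut-off and toy data are illustrative:

import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.9):
    # Drop one feature from every pair whose absolute correlation exceeds the threshold.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({"a": a,
                   "almost_a": a + rng.normal(scale=0.01, size=100),
                   "b": rng.normal(size=100)})
print(drop_correlated(df).columns.tolist())   # 'almost_a' is dropped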

13
Q

Wrapper Methods

A

Generally yields better results than filter methods because it can account for feature interactions. Candidate feature subsets are generated by a search strategy (often greedy) and each subset is evaluated by training a model against an evaluation criterion.

Forward selection (start with no features and repeatedly add the best remaining predictor), backward selection (start with all features and remove the weakest one at each step), exhaustive search (tries all combinations), recursive feature elimination (selects features by recursively considering smaller and smaller sets of features)
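A minimal recursive feature elimination (RFE) sketch with scikit-learn; the estimator, dataset, and feature count are illustrative:

# RFE: repeatedly fit the model and drop the weakest features.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)
rfe.fit(X, y)
print(rfe.ranking_)   # rank 1 marks selected features; higher ranks were eliminated earlier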

14
Q

Embedded methods

A

These methods encompass the benefits of both wrapper and filter methods, by including interactions of features but also maintaining a reasonable computational cost.

LASSO, Random Forest Importance

15
Q

LASSO

A

Often more accurate than unregularized regression

Uses shrinkage, where coefficient estimates are shrunk towards a central point (for the lasso, towards zero)

Encourages simple, sparse models

Well suited for models showing high levels of multicollinearity

Regularization consists of adding a penalty to the parameters of the machine learning model to reduce its freedom, i.e. to avoid overfitting. In linear model regularization, the penalty is applied to the coefficients. Lasso (L1) is able to shrink some of the coefficients to exactly zero, so those features can be removed from the model.
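A short sketch of the L1 penalty zeroing out coefficients; the synthetic data and alpha value are illustrative:

# Lasso (L1) regression: coefficients of weak features shrink exactly to zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0)             # larger alpha = stronger shrinkage
lasso.fit(X, y)
print(np.flatnonzero(lasso.coef_))   # indices of the features that survived the penalty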

16
Q

Random Forest Importance

A

Random forests naturally rank features by how well they improve node purity, i.e. how much they decrease the Gini impurity, averaged over all trees. Pruning trees below a particular node can then create a subset of the most important features.
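A minimal importance sketch; using SelectFromModel to keep features above the mean importance is one common follow-up (the dataset and forest size are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(forest.feature_importances_.round(3))       # mean decrease in Gini impurity per feature

selector = SelectFromModel(forest, prefit=True)   # default threshold: mean importance
print(selector.transform(X).shape)                # only the more important features remain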

17
Q

Feature Extraction

A

y = f(x): map each high-dimensional feature vector x to a lower-dimensional representation y

Goals -
Minimize information loss: represent the data as accurately as possible in the lower dimensional space

Maximize discriminatory information: Enhance the class-discriminatory information in the lower dimensional space

18
Q

Popular feature extraction methods

A

PCA (principal component analysis) - seeks to preserve information
LDA (linear discriminant analysis) - seeks to maximize discrimination
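A minimal side-by-side sketch of the two; reducing both to 2 components is an arbitrary illustrative choice:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                              # unsupervised: preserve variance
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # supervised: separate classes
print(X_pca.shape, X_lda.shape)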