Feature Selection Flashcards

1. Q: Pearson Correlation

A: Filter-Based Feature Selection

Pearson’s correlation statistic, or Pearson’s correlation coefficient, is also known in statistics as the r value. For any two variables, it returns a value between -1 and 1 that indicates the strength and direction of their linear correlation.

Pearson’s correlation coefficient is computed by taking the covariance of two variables and dividing by the product of their standard deviations. The coefficient is not affected by changes of scale in the two variables.
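
As a quick illustration of that definition, here is a minimal sketch in Python that cross-checks the covariance-over-standard-deviations formula against scipy.stats.pearsonr (the data values are made up):

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative data: y grows roughly linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# r = cov(x, y) / (std(x) * std(y)), using the sample (ddof=1) estimators.
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_scipy, p_value = pearsonr(x, y)

print(r_manual, r_scipy)  # both close to 1; rescaling x or y leaves r unchanged
```

Multiplying either variable by a constant scales its covariance and its standard deviation by the same factor, which is why r is unaffected by changes of scale.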

2. Q: Mutual Information

A: Filter-Based Feature Selection

The mutual information score measures the contribution of a variable toward reducing uncertainty about the value of another variable; in feature selection, that other variable is typically the label. Many variations of the mutual information score have been devised to suit different distributions.

The mutual information score is particularly useful in feature selection because it captures both linear and non-linear dependence between a feature and the target, and it can be estimated even in datasets with many dimensions.
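
A hedged sketch of how this is typically used in practice, via scikit-learn’s mutual_info_classif (the synthetic dataset is purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic dataset: 5 features, only a few carry information about the label.
X, y = make_classification(n_samples=500, n_features=5,
                           n_informative=2, n_redundant=1,
                           random_state=0)

# Estimate the mutual information between each feature and the label;
# higher scores mean the feature reduces more uncertainty about y.
mi = mutual_info_classif(X, y, random_state=0)
for i, score in enumerate(mi):
    print(f"feature {i}: MI = {score:.3f}")
```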

3. Q: Kendall Correlation

A: Filter-Based Feature Selection

Kendall’s rank correlation is one of several statistics that measure the relationship between rankings of different ordinal variables, or between different rankings of the same variable. In other words, it measures how similar two orderings are when the data are ranked by each of the quantities. Both this coefficient and Spearman’s correlation coefficient are non-parametric measures, making them suitable for data that is not normally distributed.
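
A small sketch with scipy.stats.kendalltau, comparing two rankings of the same five items (the rankings are invented for illustration):

```python
from scipy.stats import kendalltau

# Two judges rank the same five items.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 3, 5, 4]

tau, p_value = kendalltau(judge_a, judge_b)
print(tau)  # 0.6: 8 of the 10 item pairs are ordered the same way by both judges
```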

4. Q: Spearman Correlation

A: Filter-Based Feature Selection

Spearman’s coefficient is a non-parametric measure of statistical dependence between two variables, sometimes denoted by the Greek letter rho. It expresses the degree to which two variables are monotonically related. It is also called the Spearman rank correlation because it operates on ranks, which also makes it usable with ordinal variables.
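
A brief sketch contrasting Spearman’s rho with Pearson’s r on a monotonic but non-linear relationship, using scipy.stats (the values are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

# A perfectly monotonic but non-linear relationship: y = x**3.
x = np.arange(1, 11, dtype=float)
y = x ** 3

rho, p_value = spearmanr(x, y)
print(rho)                # 1.0: the ranks agree exactly, so rho is perfect
print(pearsonr(x, y)[0])  # below 1.0: Pearson only captures linear association
```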

5. Q: Chi-Squared

A: Filter-Based Feature Selection

The two-way chi-squared test is a statistical method that measures how far observed counts are from the counts expected under independence. The test assumes that the variables are random and drawn from an adequate sample of independent observations. The resulting chi-squared statistic indicates how far the results are from the expected (random) result.
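
A minimal sketch of a two-way test using scipy.stats.chi2_contingency on a contingency table of feature category versus class label (the counts are made up):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = feature category, columns = class label.
observed = np.array([[30, 10],
                     [15, 45]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)  # a large statistic / small p-value means the counts
                      # are far from what independence (randomness) predicts
```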

6. Q: Fisher Score

A: Filter-Based Feature Selection

The Fisher score is sometimes termed the information score, because it represents the amount of information that one variable provides about some unknown parameter on which it depends. (It should not be confused with Fisher’s method, also called the Fisher combined probability test, which is a separate technique for combining p-values.)

The score is defined as the gradient of the log-likelihood with respect to the parameter. Because the expected value of the score is zero, the Fisher information equals the variance of the score: the larger that variance, the more information the observed data carry about the parameter.
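
For filter-based feature selection, the Fisher score of a feature is commonly computed as the ratio of between-class spread to within-class variance, so features whose class means are well separated score highly. A minimal sketch under that definition (the fisher_score helper is hypothetical, not a library function):

```python
import numpy as np

def fisher_score(X, y):
    """Hypothetical helper: between-class spread / within-class variance, per feature."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]                      # rows belonging to class c
        n_c = Xc.shape[0]
        numerator += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        denominator += n_c * Xc.var(axis=0)
    return numerator / denominator

X = np.array([[1.0, 5.0], [1.2, 3.0], [3.0, 4.9], [3.1, 3.2]])
y = np.array([0, 0, 1, 1])
print(fisher_score(X, y))  # the first feature separates the classes far better
```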

7. Q: Count-Based

A: Filter-Based Feature Selection

Count-based feature selection is a simple yet relatively powerful way of finding information about predictors. The idea is straightforward: by calculating counts of the individual values within a column, you can get a picture of the distribution and weight of those values, and from that, understand which columns contain the most important information.

Count-based feature selection is an unsupervised method, meaning you don’t need a label column. It can also reduce the dimensionality of the data with little loss of information.
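
A minimal sketch of the counting idea with pandas; note that no label column is involved (the column and its values are invented):

```python
import pandas as pd

# Toy categorical column.
df = pd.DataFrame({"browser": ["chrome", "chrome", "firefox", "safari",
                               "chrome", "firefox", "chrome", "other"]})

# Counting individual values reveals the column's distribution and weight.
print(df["browser"].value_counts())

# A column dominated by a single value (or one where every value is unique)
# carries little usable signal and is a candidate for removal.
```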

8. Q: Fisher Linear Discriminant Analysis

A: This method is often used for dimensionality reduction, because it projects a set of features onto a smaller feature space while preserving the information that discriminates between classes. This not only reduces computational costs for a given classification task, but can help prevent overfitting.

To generate the scores, you provide a label column and set of numerical feature columns as inputs. The algorithm determines the optimal combination of the input columns that linearly separates each group of data while minimizing the distances within each group. The module returns a dataset containing the compact, transformed features, along with a transformation that you can save and apply to another dataset.
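
A hedged sketch of the same idea with scikit-learn’s LinearDiscriminantAnalysis, which likewise takes a label column plus numerical features and returns a compact, reusable transformation:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 numerical features, 3 classes

# Project onto a smaller feature space that preserves class separation.
lda = LinearDiscriminantAnalysis(n_components=2)
X_compact = lda.fit_transform(X, y)
print(X.shape, "->", X_compact.shape)  # (150, 4) -> (150, 2)

# The fitted transformation can be saved and applied to another dataset:
# X_new_compact = lda.transform(X_new)
```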

9. Q: Permutation Feature Importance

A: In this module, feature values are randomly shuffled, one column at a time, and the performance of the model is measured before and after. You can choose one of the standard metrics provided to measure performance.

The scores that the module returns represent the change in the performance of a trained model, after permutation. Important features are usually more sensitive to the shuffling process, and will thus result in higher importance scores.
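
A minimal sketch using scikit-learn’s permutation_importance, which implements the same shuffle-and-remeasure idea (the model and metric choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature column in turn and measure the drop in accuracy;
# features whose shuffling hurts performance most score highest.
result = permutation_importance(model, X_test, y_test, scoring="accuracy",
                                n_repeats=10, random_state=0)
for i, drop in enumerate(result.importances_mean):
    print(f"feature {i}: importance = {drop:.3f}")
```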
