Feature Selection Flashcards

1. Q: Pearson Correlation

A: Filter-Based Feature Selection

Pearson’s correlation statistic, or Pearson’s correlation coefficient, is also known in statistics as the r value. For any two variables, it returns a value between -1 and 1 that indicates the strength and direction of their linear correlation.

Pearson’s correlation coefficient is computed by taking the covariance of two variables and dividing by the product of their standard deviations. The coefficient is not affected by changes of scale in the two variables.
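
As a quick illustration of that definition, here is a minimal sketch in Python that cross-checks the covariance-over-standard-deviations formula against scipy.stats.pearsonr (the data values are made up):

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative data: y grows roughly linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# r = cov(x, y) / (std(x) * std(y)), using the sample (ddof=1) estimators.
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_scipy, p_value = pearsonr(x, y)

print(r_manual, r_scipy)  # both close to 1; rescaling x or y leaves r unchanged
```

Multiplying either variable by a constant scales its covariance and its standard deviation by the same factor, which is why r is unaffected by changes of scale.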

2. Q: Mutual Information

A: Filter-Based Feature Selection

The mutual information score measures the contribution of a variable toward reducing uncertainty about the value of another variable; in feature selection, that other variable is typically the label. Many variations of the mutual information score have been devised to suit different distributions.

The mutual information score is particularly useful in feature selection because it captures both linear and non-linear dependence between a feature and the target, and it can be estimated even in datasets with many dimensions.
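
A hedged sketch of how this is typically used in practice, via scikit-learn’s mutual_info_classif (the synthetic dataset is purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic dataset: 5 features, only a few carry information about the label.
X, y = make_classification(n_samples=500, n_features=5,
                           n_informative=2, n_redundant=1,
                           random_state=0)

# Estimate the mutual information between each feature and the label;
# higher scores mean the feature reduces more uncertainty about y.
mi = mutual_info_classif(X, y, random_state=0)
for i, score in enumerate(mi):
    print(f"feature {i}: MI = {score:.3f}")
```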

3. Q: Kendall Correlation

A: Filter-Based Feature Selection

Kendall’s rank correlation is one of several statistics that measure the relationship between rankings of different ordinal variables, or between different rankings of the same variable. In other words, it measures how similar two orderings are when the data are ranked by each of the quantities. Both this coefficient and Spearman’s correlation coefficient are non-parametric measures, making them suitable for data that is not normally distributed.
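
A small sketch with scipy.stats.kendalltau, comparing two rankings of the same five items (the rankings are invented for illustration):

```python
from scipy.stats import kendalltau

# Two judges rank the same five items.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 3, 5, 4]

tau, p_value = kendalltau(judge_a, judge_b)
print(tau)  # 0.6: 8 of the 10 item pairs are ordered the same way by both judges
```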

4. Q: Spearman Correlation

A: Filter-Based Feature Selection

Spearman’s coefficient is a non-parametric measure of statistical dependence between two variables, sometimes denoted by the Greek letter rho. It expresses the degree to which two variables are monotonically related. It is also called the Spearman rank correlation because it operates on ranks, which also makes it usable with ordinal variables.
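
A brief sketch contrasting Spearman’s rho with Pearson’s r on a monotonic but non-linear relationship, using scipy.stats (the values are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

# A perfectly monotonic but non-linear relationship: y = x**3.
x = np.arange(1, 11, dtype=float)
y = x ** 3

rho, p_value = spearmanr(x, y)
print(rho)                # 1.0: the ranks agree exactly, so rho is perfect
print(pearsonr(x, y)[0])  # below 1.0: Pearson only captures linear association
```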

5. Q: Chi-Squared

A: Filter-Based Feature Selection

The two-way chi-squared test is a statistical method that measures how far observed counts are from the counts expected under independence. The test assumes that the variables are random and drawn from an adequate sample of independent observations. The resulting chi-squared statistic indicates how far the results are from the expected (random) result.
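
A minimal sketch of a two-way test using scipy.stats.chi2_contingency on a contingency table of feature category versus class label (the counts are made up):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = feature category, columns = class label.
observed = np.array([[30, 10],
                     [15, 45]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)  # a large statistic / small p-value means the counts
                      # are far from what independence (randomness) predicts
```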

6. Q: Fisher Score

A: Filter-Based Feature Selection

The Fisher score is sometimes termed the information score, because it represents the amount of information that one variable provides about some unknown parameter on which it depends. (It should not be confused with Fisher’s method, also called the Fisher combined probability test, which is a separate technique for combining p-values.)

The score is defined as the gradient of the log-likelihood with respect to the parameter. Because the expected value of the score is zero, the Fisher information equals the variance of the score: the larger that variance, the more information the observed data carry about the parameter.
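
For filter-based feature selection, the Fisher score of a feature is commonly computed as the ratio of between-class spread to within-class variance, so features whose class means are well separated score highly. A minimal sketch under that definition (the fisher_score helper is hypothetical, not a library function):

```python
import numpy as np

def fisher_score(X, y):
    """Hypothetical helper: between-class spread / within-class variance, per feature."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]                      # rows belonging to class c
        n_c = Xc.shape[0]
        numerator += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        denominator += n_c * Xc.var(axis=0)
    return numerator / denominator

X = np.array([[1.0, 5.0], [1.2, 3.0], [3.0, 4.9], [3.1, 3.2]])
y = np.array([0, 0, 1, 1])
print(fisher_score(X, y))  # the first feature separates the classes far better
```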

7. Q: Count-Based

A: Filter-Based Feature Selection

Count-based feature selection is a simple yet relatively powerful way of finding information about predictors. The idea is straightforward: by calculating counts of the individual values within a column, you can get a picture of the distribution and weight of those values, and from that, understand which columns contain the most important information.

Count-based feature selection is an unsupervised method, meaning you don’t need a label column. It can also reduce the dimensionality of the data with little loss of information.
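
A minimal sketch of the counting idea with pandas; note that no label column is involved (the column and its values are invented):

```python
import pandas as pd

# Toy categorical column.
df = pd.DataFrame({"browser": ["chrome", "chrome", "firefox", "safari",
                               "chrome", "firefox", "chrome", "other"]})

# Counting individual values reveals the column's distribution and weight.
print(df["browser"].value_counts())

# A column dominated by a single value (or one where every value is unique)
# carries little usable signal and is a candidate for removal.
```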

8. Q: Fisher Linear Discriminant Analysis

A: This method is often used for dimensionality reduction, because it projects a set of features onto a smaller feature space while preserving the information that discriminates between classes. This not only reduces computational costs for a given classification task, but can help prevent overfitting.

To generate the scores, you provide a label column and set of numerical feature columns as inputs. The algorithm determines the optimal combination of the input columns that linearly separates each group of data while minimizing the distances within each group. The module returns a dataset containing the compact, transformed features, along with a transformation that you can save and apply to another dataset.
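
A hedged sketch of the same idea with scikit-learn’s LinearDiscriminantAnalysis, which likewise takes a label column plus numerical features and returns a compact, reusable transformation:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 numerical features, 3 classes

# Project onto a smaller feature space that preserves class separation.
lda = LinearDiscriminantAnalysis(n_components=2)
X_compact = lda.fit_transform(X, y)
print(X.shape, "->", X_compact.shape)  # (150, 4) -> (150, 2)

# The fitted transformation can be saved and applied to another dataset:
# X_new_compact = lda.transform(X_new)
```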

9. Q: Permutation Feature Importance

A: In this module, feature values are randomly shuffled, one column at a time, and the performance of the model is measured before and after. You can choose one of the standard metrics provided to measure performance.

The scores that the module returns represent the change in the performance of a trained model, after permutation. Important features are usually more sensitive to the shuffling process, and will thus result in higher importance scores.
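
A minimal sketch using scikit-learn’s permutation_importance, which implements the same shuffle-and-remeasure idea (the model and metric choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature column in turn and measure the drop in accuracy;
# features whose shuffling hurts performance most score highest.
result = permutation_importance(model, X_test, y_test, scoring="accuracy",
                                n_repeats=10, random_state=0)
for i, drop in enumerate(result.importances_mean):
    print(f"feature {i}: importance = {drop:.3f}")
```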
