What Is Feature Selection Flashcards

1
Q

HOW DO STATISTICAL-BASED FEATURE SELECTION METHODS WORK? P128

A

They evaluate the relationship between each input variable and the target variable using statistics, and select the input variables that have the strongest relationship with the target variable.
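
A minimal sketch of the idea, assuming numeric inputs and Pearson's correlation as the statistic (the data, variable names, and choice of statistic are illustrative assumptions):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                            # 5 numeric input variables
y = 3 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=100)   # target driven by features 0 and 3

# Score each input against the target, then keep the strongest-scoring inputs.
scores = [abs(pearsonr(X[:, i], y)[0]) for i in range(X.shape[1])]
strongest = np.argsort(scores)[::-1][:2]
print(scores, strongest)
```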

2
Q

HOW MANY MAIN TYPES OF FEATURE SELECTION TECHNIQUES ARE THERE? WHAT ARE THEIR NAMES? P128

A

2: supervised and unsupervised

3
Q

WHAT ARE THE TYPES OF SUPERVISED METHODS OF FEATURE SELECTION? P128

A

Wrapper, Filter, Intrinsic

4
Q

HOW DO FILTER-BASED FEATURE SELECTION METHODS WORK? P128

A

By using statistical measures to score the correlation or dependence between each input variable and the target variable; the highest-scoring inputs are then kept.

5
Q

WHAT IS THE DIFFERENCE BETWEEN SUPERVISED AND UNSUPERVISED FEATURE SELECTION METHODS? P129

A

Whether or not features are selected based on the TARGET VARIABLE: unsupervised selection does not use the target variable, while supervised selection does.

6
Q

DEFINE UNSUPERVISED FEATURE SELECTION TECHNIQUES? GIVE EXAMPLES P129

A

Feature selection techniques that ignore the target variable, such as methods that remove redundant variables using correlation, or remove features that have few values or low variance.
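
A minimal sketch of both examples; the toy data and the cut-offs for "low variance" and "highly correlated" are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(1)
df = pd.DataFrame({"a": rng.normal(size=200),
                   "b": rng.normal(size=200),
                   "constant": np.zeros(200)})
df["a_copy"] = df["a"] + rng.normal(scale=0.01, size=200)   # redundant with "a"

# 1) Drop zero-variance features (no target variable involved).
vt = VarianceThreshold(threshold=0.0)
df = df[df.columns[vt.fit(df).get_support()]]

# 2) Drop one feature from each highly correlated pair.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
df = df.drop(columns=[c for c in upper.columns if (upper[c] > 0.95).any()])
print(df.columns.tolist())   # the constant column and one of the correlated pair are gone
```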

7
Q

EXPLAIN THE TYPES OF SUPERVISED FEATURE SELECTION? P129-P130

A

Intrinsic: algorithms that perform automatic feature selection during training (as part of learning the model), such as penalized regression models like Lasso (sketched below) and decision trees, including ensembles of decision trees like random forest.
Filter: select subsets of features based on their relationship with the target.
Wrapper: search for subsets of features that perform well according to a predictive model. These methods create many models with different subsets of input features and select the features that result in the best-performing model according to a performance metric.
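
A minimal sketch of the intrinsic case using Lasso; the data, the alpha value, and reading non-zero coefficients as "selected" are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

# Lasso's L1 penalty drives some coefficients to exactly zero while the model
# is being fit, so feature selection happens as part of training.
model = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(model.coef_)
print("selected feature indices:", selected)
```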

8
Q

…. IS A TYPE OF FEATURE SELECTION WHICH IS UNCONCERNED WITH THE VARIABLE TYPES, BUT CAN BE COMPUTATIONALLY EXPENSIVE? P129

A

Wrapper

9
Q

WHAT IS THE DEFINITION OF UNIVARIATE STATISTICAL MEASURES? P131

A

The statistical measures used in filter-based feature selection are generally calculated one input variable at a time with the target variable; hence, they are called univariate statistical measures.

10
Q

WHAT ARE THE MOST COMMON UNIVARIATE STATISTICAL MEASURES FOR NUMERIC INPUT-NUMERIC OUTPUT FILTER-BASED FEATURE SELECTION? P132

A

Pearson’s correlation coefficient (linear)
Spearman’s rank coefficient (nonlinear): consider Spearman’s rank-order correlation when you have pairs of continuous variables whose relationship does not follow a straight line, or pairs of ordinal data.
Mutual Information
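
A minimal sketch of the three measures on a single pair of numeric variables; the cubic relationship is an illustrative assumption chosen to show why a rank-based measure can help:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = x ** 3 + rng.normal(scale=0.1, size=300)   # monotonic but not linear

print("Pearson:", pearsonr(x, y)[0])                                    # linear association
print("Spearman:", spearmanr(x, y)[0])                                  # monotonic (rank-based)
print("Mutual info:", mutual_info_regression(x.reshape(-1, 1), y)[0])   # general dependence
```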

11
Q

WHAT ARE THE MOST COMMON UNIVARIATE STATISTICAL MEASURES FOR NUMERIC INPUT-CATEGORICAL OUTPUT FILTER-BASED FEATURE SELECTION? P132

A

ANOVA correlation coefficient (linear)
Kendall’s rank coefficient (nonlinear): Assumes that the categorical variable is ordinal
Mutual Information
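
A minimal sketch of the ANOVA F-test and mutual information in scikit-learn (Kendall's tau would come from scipy.stats.kendalltau and presumes an ordinal target); the synthetic data is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=5, n_informative=2, random_state=0)

f_scores, p_values = f_classif(X, y)    # ANOVA F-statistic per numeric input
mi_scores = mutual_info_classif(X, y)   # mutual information per input
print(f_scores, mi_scores)
```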

12
Q

WHAT ARE THE MOST COMMON UNIVARIATE STATISTICAL MEASURES FOR CATEGORICAL INPUT-CATEGORICAL OUTPUT FILTER-BASED FEATURE SELECTION? P132

A

Chi-Squared test (contingency tables)
Mutual Information
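
A minimal sketch with toy categorical data; ordinal-encoding the inputs is an assumption of this sketch, since sklearn's chi2 expects non-negative numeric features:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.feature_selection import chi2, mutual_info_classif

X_raw = np.array([["red", "small"], ["blue", "large"], ["red", "large"],
                  ["green", "small"], ["blue", "small"], ["green", "large"]])
y = np.array([1, 0, 1, 0, 0, 1])

X = OrdinalEncoder().fit_transform(X_raw)    # categories -> 0..k-1
chi2_scores, p_values = chi2(X, y)           # chi-squared test per feature
mi_scores = mutual_info_classif(X, y, discrete_features=True)
print(chi2_scores, mi_scores)
```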

13
Q

WHAT ARE THE METHODS FOR USING A WRAPPER AS A FEATURE SELECTION TECHNIQUE? P133

A

Tree-Searching Methods (depth-first, breadth-first, etc.).
Stochastic Global Search (simulated annealing, genetic algorithm).
Step-Wise Models.
RFE (Recursive Feature Elimination); see the sketch below.
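
A minimal sketch of the last item, RFE, which repeatedly fits a model and prunes the weakest features; the estimator and data are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=3, random_state=0)

# Fit, rank features by the model's coefficients, drop the weakest, and repeat
# until only n_features_to_select remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)
print("selected mask:", rfe.support_)
print("ranking (1 = selected):", rfe.ranking_)
```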

14
Q

HOW CAN YOU CHOOSE THE NUMBER OF VARIABLES (FEATURES) IN THE FILTER-BASED FEATURE SELECTION METHODS OF SKLEARN? (2 WAYS) P134

A

There are two main techniques for filtering input variables. The first is to rank variables by their score and select the k top input variables with the largest scores.

The second is to convert the scores into a percentage of the largest score and select all features above a minimum percentile.
In sklearn: SelectKBest for the top k variables, SelectPercentile for the top percentile.
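
A minimal sketch of both sklearn classes; the scoring function (ANOVA F-test) and the synthetic data are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

X, y = make_classification(n_samples=300, n_features=20, n_informative=4, random_state=0)

X_top_k = SelectKBest(score_func=f_classif, k=4).fit_transform(X, y)                   # keep the k best
X_top_pct = SelectPercentile(score_func=f_classif, percentile=20).fit_transform(X, y)  # keep the top 20%
print(X_top_k.shape, X_top_pct.shape)   # (300, 4) and (300, 4)
```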

15
Q

WHAT ARE THE ASSUMPTIONS OF PEARSON’S CORRELATION? P134

A

Gaussian distribution and linear relationship
