Chapter 6: Feature Selection Flashcards

1
Q

what is the aim of feature selection?

A

automatically identify meaningful, smaller subsets of the feature variables

2
Q

why do different types of models have different best feature sets?

A

different models draw different types of boundaries and allow different degrees of flexibility when you change their parameters

3
Q

for d features, how many possible feature sets are there?

A

2^d

4
Q

what is combinatorial optimisation?

A

finding the best point in a discrete (here, binary) search space; for feature selection, choosing one of the 2^d possible feature subsets

5
Q

give the steps of wrapper method for feature selection

A

1. start with an initial guess for a good set of features

2. train and test a model (perhaps with cross-validation)

3. if the test error is deemed good enough, stop

4. otherwise, choose a new set of features and return to step 2

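The steps above can be sketched as a generic loop in Python; `score` is a hypothetical evaluator (e.g. cross-validated test error), and the "choose a new set" step here simply adds one unused feature, standing in for whatever search strategy is actually used:

```python
def wrapper_select(features, score, threshold):
    """Generic wrapper loop: guess a subset, evaluate it, repeat.
    `score` maps a feature subset to a test error (lower is better)."""
    current = set(features[:1])              # step 1: initial guess
    while True:
        err = score(current)                 # step 2: train/test a model
        if err <= threshold:                 # step 3: good enough -> stop
            return current, err
        # step 4: choose a new candidate set (here: add one unused feature)
        remaining = [f for f in features if f not in current]
        if not remaining:
            return current, err
        current = current | {remaining[0]}
```

Any search strategy (greedy, genetic, simulated annealing) fits this skeleton; only step 4 changes.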
6
Q

name some wrapper methods

A

greedy search
genetic algorithm
simulated annealing
branch and bound

7
Q

what is forward selection?

A

add features greedily and sequentially: find which of the remaining features improves our model the most and add it permanently to our set

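A minimal sketch of this greedy loop, assuming a hypothetical `score` function that returns a quality measure (higher is better) for a candidate subset:

```python
def forward_selection(all_features, score, k):
    """Greedy forward selection: each round, permanently add the one
    remaining feature whose inclusion improves the score the most."""
    selected = []
    while len(selected) < k:
        best = max((f for f in all_features if f not in selected),
                   key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected
```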
8
Q

what is backward elimination?

A

sequentially evaluate removing features and discard the one that damages performance the least.

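The mirror image of forward selection can be sketched the same way; again `score` is a hypothetical subset evaluator (higher is better):

```python
def backward_elimination(all_features, score, k):
    """Greedy backward elimination: each round, drop the feature whose
    removal hurts the score the least."""
    selected = list(all_features)
    while len(selected) > k:
        # the best subset after removal identifies the least useful feature
        worst = max(selected,
                    key=lambda f: score([g for g in selected if g != f]))
        selected.remove(worst)
    return selected
```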
9
Q

what is stepwise, or floating selection?

A

wrapper method that combines forward and backward selection

two steps forward and one step back

10
Q

what are filter methods?

A

find out how useful a feature is without training any models

11
Q

describe Pearson's correlation coefficient equation in words

A

covariance of the two variables divided by the product of their standard deviations

12
Q

Pearson's correlation coefficient, r = ?

A

r = sum((x - xmean)(y - ymean)) / sqrt( sum((x - xmean)^2) x sum((y - ymean)^2) )

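As a sketch, the formula can be computed directly in Python from plain lists (`pearson_r` is an illustrative helper name):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of cross-deviations over the square root of
    the product of the summed squared deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den
```

A perfectly linear relation gives r = +1 (or -1 when decreasing), which is why features are ranked by |r|.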
13
Q

how do we rank features?

A

In order of the absolute value of the correlation coefficient

14
Q

what type of correlation does Pearson's measure?

A

linear

15
Q

what is entropy?

A

a measure of uncertainty: the expected information content of a random variable's outcomes

16
Q

what is information gain?

A

it quantifies the reduction in uncertainty (entropy) about the target gained by knowing a new feature

17
Q

information gain I(X;Y) = ?

A

H(Y) - H(Y|X)
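The formula can be sketched for discrete features in Python; `entropy` and `information_gain` are illustrative helper names, assuming features and labels arrive as plain lists:

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum p log2(p) over the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(feature, labels):
    """I(X;Y) = H(Y) - H(Y|X): how much knowing the (discrete)
    feature value reduces uncertainty about the label."""
    n = len(labels)
    h_y_given_x = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        h_y_given_x += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_y_given_x
```

A feature that perfectly predicts the label has I(X;Y) = H(Y); an irrelevant feature has I(X;Y) = 0.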

18
Q

another name for information gain is?

A

mutual information

19
Q

when we are measuring the information gain of each feature, I(X;Y), we are really calculating? (hint: relation to Y)

A

the mutual information between the feature and the target label

20
Q

what is the major advantage of information gain?

A

detects nonlinearities

21
Q

what is the major disadvantage of information gain?

A

may choose redundant features, including very similar features that may be detrimental to the learning algorithm

22
Q

if i use a wrapper method, with N samples and d features and use a greedy search, how many models will I have to create

A

d(d+1)/2 (d candidate models in the first round, d-1 in the second, and so on: d + (d-1) + ... + 1)

23
Q

why do we perform feature selection (3)

A

logistical - there may be too much data to process

interpretability - we may collect more data than is useful

overfitting - inclusion of too many features could mean we overfit

24
Q

pros (3) and cons (2) of forward and backward selection (wrapper method)

A
Pros:
  • Impact of a feature on the ML classifier is explicit
  • For each subset of features, we know exactly how well the model performs
  • Better than exhaustive search

Cons:
  • No guarantee of finding the best solution
  • Need to train and evaluate lots of models
25
Q

what two types of filter are there?

A

univariate and multivariate

26
Q

what is a univariate filter?

A

evaluate each feature independently

27
Q

what is a multivariate filter?

A

evaluate features in the context of others

28
Q

what filtering metric can we use when the data is divided into 2 classes, rather than a regression?

A

Fisher score

29
Q

give the equation for Fisher score, F = ?

A

F = (m1 - m2)^2 / (v1 + v2)
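A minimal sketch of the equation in Python, assuming each class's values for one feature arrive as a plain list (`fisher_score` is an illustrative name; population variance is used):

```python
def fisher_score(class1, class2):
    """F = (m1 - m2)^2 / (v1 + v2): between-class scatter over
    within-class scatter, for a single feature across two classes."""
    n1, n2 = len(class1), len(class2)
    m1, m2 = sum(class1) / n1, sum(class2) / n2
    v1 = sum((x - m1) ** 2 for x in class1) / n1
    v2 = sum((x - m2) ** 2 for x in class2) / n2
    return (m1 - m2) ** 2 / (v1 + v2)
```

Well-separated classes with tight clusters give a high F; overlapping classes give a low F.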
30
Q

in words, the Fisher score puts ..... over ...

A

between-class scatter, (m1 - m2)^2, over within-class scatter, (v1 + v2)

31
Q

what is the main disadvantage of the Fisher score?

A

it only works on single features

32
Q

J(X) is the mutual information of X; what are the possibilities of what X can be?

A

a single feature, or the joint distribution of two or more features

33
Q

what are embedded methods for feature selection?

A

in embedded methods, the feature selection algorithm is integrated as part of the learning algorithm; embedded methods combine the qualities of filter and wrapper methods

34
Q

which wrapper method combines forward and backward selection in two steps forward and one step back?

A

stepwise / floating selection