Chapter 6 - Feature Selection Flashcards
what is the aim of feature selection?
automatically identify a meaningful, smaller subset of the feature variables
why do different types of models have different best feature sets?
different models draw different types of boundaries and allow different degrees of flexibility when you change their parameters
for d features, how many possible feature sets are there?
2^d
what is combinatorial optimisation?
finding the best solution in a discrete search space; here the space is binary, since each of the d features is either in or out of the subset
give the steps of the wrapper method for feature selection
1. start with an initial guess for a good set of features
2. train and test a model (possibly with cross-validation)
3. if the test error is deemed good enough, stop
4. otherwise, choose a new set of features and go to step 2
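A minimal sketch of this loop, assuming scikit-learn; the dataset, classifier, search strategy (random subsets) and stopping threshold are all illustrative choices, not part of the definition:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
d = X.shape[1]

def evaluate(features):
    # step 2: train and test a model on just this feature subset (with cross-val)
    model = LogisticRegression(max_iter=5000)
    return cross_val_score(model, X[:, features], y, cv=5).mean()

rng = np.random.default_rng(0)
best_score, best_set = -np.inf, None
subset = np.flatnonzero(rng.random(d) < 0.5)      # step 1: an initial guess
for _ in range(20):
    score = evaluate(subset)
    if score > best_score:
        best_score, best_set = score, subset
    if best_score >= 0.97:                        # step 3: good enough, stop
        break
    subset = np.flatnonzero(rng.random(d) < 0.5)  # step 4: new set, back to step 2
    if subset.size == 0:
        subset = np.array([0])

print(best_set, best_score)
```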
name some wrapper methods
greedy search
genetic algorithm
simulated annealing
branch and bound
what is forward selection?
add features greedily and sequentially. Find which of the remaining ones improves our model the most and add it permanently to our set
what is backward elimination?
start with the full feature set; sequentially evaluate removing each feature and permanently discard the one that damages performance the least.
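A sketch of greedy forward selection under the same assumptions (scikit-learn, illustrative dataset and classifier); backward elimination is the mirror image, starting from the full set and dropping the least damaging feature each round:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

def score(features):
    model = LogisticRegression(max_iter=5000)
    return cross_val_score(model, X[:, features], y, cv=5).mean()

selected, remaining = [], list(range(X.shape[1]))
for _ in range(3):  # illustrative budget: grow the set to 3 features
    # evaluate adding each remaining feature to the current set
    best = max(remaining, key=lambda f: score(selected + [f]))
    selected.append(best)   # permanently add the one that improves the model most
    remaining.remove(best)
    print(selected, round(score(selected), 3))
```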
what is stepwise, or floating selection?
wrapper method that combines forward and backward selection
two steps forward and one step back
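For reference, scikit-learn ships a ready-made wrapper for the forward/backward variants (it does not implement the floating variant; to my knowledge the mlxtend library offers that):

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_wine(return_X_y=True)
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=3,   # illustrative budget
    direction="forward",      # "backward" gives backward elimination
    cv=5,
)
sfs.fit(X, y)
print(sfs.get_support(indices=True))  # indices of the selected features
```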
what are filter methods?
find out how useful a feature is without training any models
Describe Pearson's correlation coefficient equation in words
covariance of the two variables divided by the product of their standard deviations
Pearson's correlation coefficient, r = ?
r = sum((x - xmean)(y - ymean)) / sqrt( sum((x - xmean)^2) × sum((y - ymean)^2) )
how do we rank features?
In order of the absolute value of the correlation coefficient
what type of correlation does Pearson's r measure?
linear
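A short sketch of this filter with NumPy on synthetic data (only features 0 and 2 actually drive y here, so they should top the ranking):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

def pearson_r(x, y):
    # r = sum((x - xmean)(y - ymean)) / sqrt(sum((x - xmean)^2) * sum((y - ymean)^2))
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

r = np.array([pearson_r(X[:, j], y) for j in range(X.shape[1])])
print(np.argsort(-np.abs(r)))  # rank features by |r|, strongest first
```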
what is entropy?
a measure of the uncertainty of a random variable; for a label Y, H(Y) = -sum p(y) log2 p(y)
what is information gain?
it quantifies the reduction in uncertainty (entropy) about the target that we get from knowing a new feature
information gain I(X;Y) = ?
H(Y) - H(Y|X)
another name for information gain is?
mutual information
when we are measuring the information gain of each feature, I(X;Y), we are really calculating? (hint: relation to Y)
the mutual information between the feature and the target label
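A sketch of I(X;Y) = H(Y) - H(Y|X) for a discrete feature and label (toy values; continuous features would need discretising first, or an estimator such as scikit-learn's mutual_info_classif):

```python
import numpy as np

def entropy(labels):
    # H(Y) = -sum over y of p(y) log2 p(y)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(x, y):
    # I(X;Y) = H(Y) - H(Y|X), where H(Y|X) = sum over v of p(x=v) * H(Y | x=v)
    h_cond = sum((x == v).mean() * entropy(y[x == v]) for v in np.unique(x))
    return entropy(y) - h_cond

x = np.array([0, 0, 1, 1, 1, 0])  # a binary feature (toy data)
y = np.array([0, 0, 1, 1, 0, 0])  # the target label
print(information_gain(x, y))     # ~0.46 bits of uncertainty removed
```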
what is the major advantage of information gain?
it can detect nonlinear relationships between a feature and the target, which Pearson's r cannot
what is the major disadvantage of information gain?
it scores each feature on its own, so it may choose redundant features; including several very similar features can be detrimental to the learning algorithm
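A quick illustration of that disadvantage, assuming scikit-learn's mutual_info_classif: because the filter scores each feature independently, an exact duplicate earns the same top score and nothing flags the redundancy:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=500)
y = (x ^ (rng.random(500) < 0.1)).astype(int)  # y is x with ~10% label noise
X = np.column_stack([x, x, rng.integers(0, 2, size=500)])  # column 1 duplicates column 0

scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)
print(scores)  # both identical columns score high; the noise column scores ~0
```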
if I use a wrapper method with N samples and d features and use a greedy search, how many models will I have to create?
d(d+1)/2: the first pass evaluates d candidate models, the next pass d - 1, and so on down to 1, so d + (d-1) + ... + 1 = d(d+1)/2 (e.g. 55 models for d = 10)
why do we perform feature selection (3)
logistical - there may be too much data to process
interpretability - we may collect more data than is useful, and a smaller feature set is easier to interpret
overfitting - inclusion of too many features could mean we overfit
pros (3) and cons (2) of forward and backward selection (wrapper methods)
Pros:
- the impact of a feature on the ML classifier is explicit
- for each subset of features, we know exactly how well the model performs
- far cheaper than an exhaustive search over all 2^d subsets
Cons:
- no guarantee of finding the best solution
- need to train and evaluate lots of models