9 - Feature Selection Flashcards

1
Q

What are the two main ways to do feature selection?

A

Wrapper methods and filtering

2
Q

What are wrapper methods?

A

Wrapper methods are feature selection methods that choose the subset of attributes giving the best performance on the development data, by repeatedly training and evaluating a model on candidate subsets

3
Q

What are the advantages of wrapper methods?

A

They build a feature set with optimal performance on the development data

4
Q

What are the disadvantages of wrapper methods?

A

They take a very long time: exhaustively evaluating all 2^m subsets of m attributes is intractable

5
Q

What are more practical wrapper methods?

A

Greedy wrapper method
Ablation wrapper method

6
Q

What is the greedy wrapper approach?

A

Train and evaluate the model on each single attribute and choose the best one. Then combine the best attribute(s) with each remaining attribute and choose the best combination. Stop when accuracy no longer improves
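A minimal Python sketch of this forward search; the `evaluate` callback and the toy scoring function are hypothetical stand-ins for training a real model and scoring it on development data:

```python
# Hypothetical sketch of greedy forward selection. `evaluate` stands in for
# training a model on the chosen attributes and scoring it on dev data.
def greedy_select(attributes, evaluate):
    selected, best_score = [], float("-inf")
    remaining = list(attributes)
    while remaining:
        # Try adding each remaining attribute to the current subset.
        score, attr = max((evaluate(selected + [a]), a) for a in remaining)
        if score <= best_score:  # stop when accuracy no longer improves
            break
        best_score = score
        selected.append(attr)
        remaining.remove(attr)
    return selected, best_score

# Toy "dev accuracy": attributes 0 and 2 are useful, the rest add noise.
def toy_eval(subset):
    useful = {0: 0.3, 2: 0.2}
    return 0.5 + sum(useful.get(a, -0.01) for a in subset)

best, score = greedy_select(range(5), toy_eval)  # picks 0, then 2, then stops
```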

7
Q

What are the disadvantages of greedy wrapper approach?

A

Still requires on the order of m^2/2 model evaluations for m attributes, and usually converges to a suboptimal feature set

8
Q

What is the ablation approach?

A

Start with the entire feature set. Remove each attribute in turn and assess performance on the remaining set, discarding the attribute whose removal hurts least. Stop when performance degrades significantly
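A minimal Python sketch of this backward elimination; as before, `evaluate` and the toy scoring function are hypothetical stand-ins for a real model:

```python
# Hypothetical sketch of ablation (backward elimination). `evaluate` stands in
# for scoring a model trained on the given attribute subset.
def ablate(attributes, evaluate, tolerance=0.01):
    current = list(attributes)
    score = evaluate(current)
    while len(current) > 1:
        # Try removing each attribute; keep the removal that hurts least.
        best_score, removed = max(
            (evaluate([a for a in current if a != r]), r) for r in current
        )
        if best_score < score - tolerance:  # stop when performance degrades
            break
        score = best_score
        current.remove(removed)
    return current, score

# Toy "dev accuracy": attributes 0 and 2 are useful, the rest add noise.
def toy_eval(subset):
    useful = {0: 0.3, 2: 0.2}
    return 0.5 + sum(useful.get(a, -0.01) for a in subset)

kept, score = ablate(range(5), toy_eval)  # irrelevant attributes get dropped
```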

9
Q

What are the advantages of ablation method?

A

Quickly removes irrelevant attributes

10
Q

What is pointwise mutual information?

A

PMI(A,C) = log2( P(A,C) / (P(A)·P(C)) ). We want feature values with high PMI with the class
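A small Python sketch of estimating PMI from counts; the counts used in the example are made-up illustration values:

```python
import math

# Sketch: estimating PMI(a, c) for a binary feature value and a class from
# counts. The counts below are hypothetical illustration values.
def pmi(n_ac, n_a, n_c, n_total):
    p_ac = n_ac / n_total  # P(a, c)
    p_a = n_a / n_total    # P(a)
    p_c = n_c / n_total    # P(c)
    return math.log2(p_ac / (p_a * p_c))

# Feature in 50 of 1000 docs, class in 100, both together in 40:
score = pmi(40, 50, 100, 1000)  # log2(0.04 / (0.05 * 0.1)) = log2(8) = 3.0
```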

11
Q

What are the disadvantages of ablation method?

A

Still takes O(m^2) evaluations, and assumes features are independent.

12
Q

What are feature filtering methods?

A

Methods that evaluate the goodness of each feature individually, keeping the features that best predict the class

13
Q

What makes a single feature good?

A

Being well correlated with the class, inversely correlated with the class, or well correlated with the absence of the class

14
Q

What is mutual information?

A

The weighted average of the PMI over all feature/class value combinations:

MI(A,C) = P(a,c)·PMI(a,c) + P(¬a,c)·PMI(¬a,c) + P(a,¬c)·PMI(a,¬c) + P(¬a,¬c)·PMI(¬a,¬c)
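A Python sketch of this weighted average for a binary feature and class, using hypothetical counts:

```python
import math

# Sketch: mutual information of a binary feature A and binary class C as the
# weighted average of the four PMI terms. `counts` maps (a, c) pairs in
# {0,1} x {0,1} to joint counts; margins are assumed nonzero.
def mutual_information(counts, n):
    p_a = {1: sum(counts[(1, c)] for c in (0, 1)) / n}
    p_a[0] = 1 - p_a[1]
    p_c = {1: sum(counts[(a, 1)] for a in (0, 1)) / n}
    p_c[0] = 1 - p_c[1]
    mi = 0.0
    for (a, c), k in counts.items():
        p_joint = k / n
        if p_joint > 0:  # a zero cell contributes 0 to the sum
            mi += p_joint * math.log2(p_joint / (p_a[a] * p_c[c]))
    return mi

# A perfectly informative binary feature (A always equals C) has MI = 1 bit;
# an independent one has MI = 0.
perfect = mutual_information({(1, 1): 50, (0, 0): 50, (1, 0): 0, (0, 1): 0}, 100)
independent = mutual_information({(1, 1): 25, (1, 0): 25, (0, 1): 25, (0, 0): 25}, 100)
```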

15
Q

What are alternatives to mutual information?

A

Chi-square

16
Q

What is the principle of Chi-square?

A

Compare the observed value O(w) with the expected value E(w) under the assumption that the feature and the class are independent
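A Python sketch of the chi-square statistic for a binary feature, using made-up counts; expected counts come from the margins under independence:

```python
# Sketch: chi-square statistic for a binary feature vs. a binary class.
# `counts` maps (a, c) in {0,1} x {0,1} to observed counts O; expected counts
# E are computed from the margins under independence (assumed nonzero).
def chi_square(counts, n):
    row = {a: sum(counts[(a, c)] for c in (0, 1)) for a in (0, 1)}
    col = {c: sum(counts[(a, c)] for a in (0, 1)) for c in (0, 1)}
    stat = 0.0
    for (a, c), observed in counts.items():
        expected = row[a] * col[c] / n  # E under independence
        stat += (observed - expected) ** 2 / expected
    return stat

# Observed equals expected -> statistic 0 (feature tells us nothing);
# a skewed table gives a large statistic.
x2_indep = chi_square({(1, 1): 25, (1, 0): 25, (0, 1): 25, (0, 0): 25}, 100)
x2_dep = chi_square({(1, 1): 40, (1, 0): 10, (0, 1): 10, (0, 0): 40}, 100)
```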

17
Q

How do we conduct feature selection on nominal attributes?

A

Either treat the attribute as multiple binary attributes (one per value), or modify the definition of mutual information to handle multiple values

18
Q

How do we conduct feature selection on continuous attributes?

A

Estimate the probabilities using a Gaussian distribution

19
Q

How do we conduct feature selection on ordinal attributes?

A

Treat as binary, treat as continuous or treat as nominal

20
Q

What are the disadvantages of mutual information?

A

It is biased towards rare, uninformative features

21
Q

What is an unsupervised feature selection method used for text documents?

A

Term frequency - inverse document frequency (TF-IDF). This helps us find words that are relevant to a particular document within a given document collection.
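A minimal Python sketch of TF-IDF on a tiny hypothetical collection; the documents and the plain-log weighting are illustration choices (real systems often use smoothed or normalised variants):

```python
import math

# Sketch of TF-IDF: a term scores high when it is frequent in the document
# but present in few other documents of the collection.
def tf_idf(term, doc, collection):
    tf = doc.count(term)                          # term frequency
    df = sum(1 for d in collection if term in d)  # document frequency
    return tf * math.log(len(collection) / df)   # tf * inverse document freq

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "cat"]]
common = tf_idf("the", docs[2], docs)       # in every doc: idf = log(1) = 0
distinctive = tf_idf("cat", docs[2], docs)  # frequent here, rarer elsewhere
```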

22
Q

What are examples of models that do feature selection inherently?

A

Decision trees
Regression models with regularisation
Neural networks