Attribute Selection and Imbalanced Classes Flashcards
What are the two approaches that can be used to select attributes?
Filter and Wrapper
What is the Filter approach?
Attribute selection is INDEPENDANT of classification algorithm to be applied later.
- Evaluates the quality of candidate attribute
subset without use the target classification
algorithm
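A minimal sketch of a filter method, assuming discrete attributes and using information gain as the evaluation measure. The card does not name a measure; information gain is one common choice, and the function names here are hypothetical. This version ranks attributes individually, the simplest kind of filter (filters that score whole subsets also exist). No classifier appears anywhere, which is what makes it a filter.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy obtained by splitting on one discrete attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def filter_select(rows, labels, k):
    """Score every attribute on its own and keep the indices of the top k.
    Only the data and class labels are used; no classifier is trained."""
    n_attrs = len(rows[0])
    scores = [information_gain([r[j] for r in rows], labels) for j in range(n_attrs)]
    ranked = sorted(range(n_attrs), key=lambda j: -scores[j])
    return ranked[:k]


rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
print(filter_select(rows, labels, k=1))  # [0]
```

Attribute 0 separates the classes perfectly (gain 1.0 bit) while attribute 1 tells us nothing (gain 0.0), so attribute 0 is kept.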
What is the Wrapper approach?
Attribute selection is TAILORED to the classification algorithm to be applied later.
- The quality of a candidate attribute subset is evaluated by the classification accuracy (on training data) of the target classification algorithm when run with that subset.
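A minimal sketch of wrapper-style evaluation, assuming numeric attributes. A nearest-centroid classifier stands in for "the target classification algorithm" (a hypothetical choice; any classifier could be plugged in), and a subset is scored by its accuracy on the training data, as the card describes.

```python
from math import dist


def nearest_centroid_accuracy(rows, labels, subset):
    """Hypothetical target classifier: nearest-centroid, fitted and scored
    on the same training data using only the attributes in `subset`."""
    projected = [[r[j] for j in subset] for r in rows]
    # Fit: one centroid (mean vector) per class, in a fixed class order.
    centroids = {}
    for cls in sorted(set(labels)):
        members = [p for p, y in zip(projected, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    # Predict each training row as the class of its closest centroid.
    correct = sum(
        min(centroids, key=lambda c: dist(p, centroids[c])) == y
        for p, y in zip(projected, labels)
    )
    return correct / len(labels)


rows = [(0.0, 3.0), (0.2, 1.0), (1.0, 2.0), (1.2, 0.0)]
labels = ["a", "a", "b", "b"]
for subset in ([0], [1], [0, 1]):
    print(subset, nearest_centroid_accuracy(rows, labels, subset))
```

On this toy data attribute 0 alone scores 1.0, attribute 1 alone scores 0.5, and adding attribute 1 to attribute 0 drops the score back to 0.5, so a wrapper would keep only attribute 0. Scoring on the same data the classifier was fitted to can be optimistic; cross-validation on the training data is a common alternative.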
What are the main components of most attribute selection methods?
A search method
An evaluation function
What are the pros and cons of Sequential Attribute Selection Methods?
Pros:
- Forwards and backwards selection are simple to understand and implement.
- Forwards sequential selection is fast.
Cons:
- Backwards selection is quite slow
- Both are heuristic methods - no guarantee of finding the optimal solution
- Both are greedy (they do not cope well with attribute interactions)
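The two components from the card above (a search method and an evaluation function) map directly onto code. In this sketch the search method is greedy sequential forward or backward search, and the evaluation function is passed in as a parameter, so it could be a filter score or a wrapper accuracy. toy_eval is a hypothetical evaluation function in which attributes 0 and 1 are only useful together, to illustrate the point about interactions.

```python
def forward_selection(n_attrs, evaluate):
    """Greedy sequential forward selection: start empty and add the single
    best attribute each round while the score strictly improves."""
    selected = []
    best_score = evaluate(selected)
    while len(selected) < n_attrs:
        candidates = [j for j in range(n_attrs) if j not in selected]
        scores = {j: evaluate(selected + [j]) for j in candidates}
        j = max(scores, key=scores.get)
        if scores[j] <= best_score:
            break  # no single addition improves the score: stop
        selected.append(j)
        best_score = scores[j]
    return sorted(selected), best_score


def backward_selection(n_attrs, evaluate):
    """Greedy sequential backward elimination: start with all attributes and
    drop one each round while the score does not get worse."""
    selected = list(range(n_attrs))
    best_score = evaluate(selected)
    while len(selected) > 1:
        scores = {j: evaluate([k for k in selected if k != j]) for j in selected}
        j = max(scores, key=scores.get)
        if scores[j] < best_score:
            break  # every possible removal hurts: stop
        selected.remove(j)
        best_score = scores[j]
    return selected, best_score


def toy_eval(subset):
    """Hypothetical score: attribute 2 helps a little on its own,
    attributes 0 and 1 help a lot, but only together (an interaction)."""
    s = set(subset)
    if {0, 1} <= s:
        return 1.0
    return 0.6 if 2 in s else 0.5


print(forward_selection(3, toy_eval))   # ([2], 0.6)
print(backward_selection(3, toy_eval))  # ([0, 1], 1.0)
```

Forward selection stops at [2] because adding 0 or 1 on its own never improves the score, so it never reaches the useful pair; backward elimination starts from the full set and keeps [0, 1] in this particular toy case. Being greedy, neither is guaranteed to find the optimal subset in general.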
What is the approach that can be used to deal with imbalanced classes?
Re-sampling techniques
What are the different Re-sampling techniques?
Under-sampling the majority class
Over-sampling the minority class
Hybrid approaches
How does Under-sampling work?
Remove randomly chosen instances from the majority class. This:
- Throws away a lot of relevant information.
- Reduces the time taken to run the classification algorithm.
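A minimal sketch of random under-sampling, assuming the goal is a 1:1 class ratio (the card does not fix a target ratio). The function and parameter names are hypothetical; the seeded random.Random only makes runs repeatable.

```python
import random
from collections import Counter


def random_undersample(rows, labels, majority, seed=0):
    """Randomly drop majority-class instances so that the majority class ends
    up no larger than all other classes combined (1:1 in the two-class case)."""
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    other_idx = [i for i, y in enumerate(labels) if y != majority]
    kept = rng.sample(majority_idx, k=min(len(majority_idx), len(other_idx)))
    keep = sorted(kept + other_idx)  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


rows = [[float(i)] for i in range(10)]
labels = ["neg"] * 8 + ["pos"] * 2
new_rows, new_labels = random_undersample(rows, labels, majority="neg")
print(Counter(new_labels))  # 2 'neg' and 2 'pos' remain
```

The six discarded "neg" rows never reach the classifier, which accounts for both the information loss and the speed-up listed on the card.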
How does Over-sampling work?
Duplicate minority class instances chosen at random or create new synthetic minority class instances. This:
- Avoids the loss of instances associated with under-sampling.
- Introduces redundancy (duplicated instances) or new synthetic instances whose minority class labels may be noisy.
- Increases the time taken to run the classification algorithm.
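This sketch covers only the duplication variant of over-sampling; synthetic instances are the subject of the SMOTE cards. It assumes a 1:1 target ratio, and the names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate randomly chosen minority-class instances (sampling with
    replacement) until the minority class matches all other classes combined."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_other = len(labels) - len(minority_idx)
    extra = rng.choices(minority_idx, k=max(0, n_other - len(minority_idx)))
    idx = list(range(len(labels))) + extra  # originals first, then copies
    return [rows[i] for i in idx], [labels[i] for i in idx]


rows = [[float(i)] for i in range(10)]
labels = ["neg"] * 8 + ["pos"] * 2
new_rows, new_labels = random_oversample(rows, labels, minority="pos")
print(Counter(new_labels))  # 8 'neg' and 8 'pos'
```

The six extra "pos" rows are exact copies of rows 8 and 9, which is the redundancy the card mentions, and the training set grows from 10 to 16 rows.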
What is SMOTE?
Synthetic Minority Oversampling TEchnique
What does SMOTE do?
Creates synthetic instances along the line segments joining a minority class instance and some or all of its nearest neighbours from the same (minority) class.
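A minimal SMOTE-style sketch, assuming numeric attributes and Euclidean distance. It chooses base instances at random (implementations differ on whether they iterate over every minority instance or sample them), then places each new point at a random position on the segment to one of the base instance's k nearest minority neighbours. The function name and parameters are hypothetical.

```python
import random
from math import dist


def smote(minority_rows, n_synthetic, k=5, seed=0):
    """SMOTE-style sketch: each synthetic instance is a random point on the
    segment between a minority instance and one of its k nearest neighbours
    from the minority class."""
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    k = min(k, len(minority_rows) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority_rows))
        x = minority_rows[i]
        # Indices of the k nearest minority neighbours of x (x itself excluded).
        others = [j for j in range(len(minority_rows)) if j != i]
        neighbours = sorted(others, key=lambda j: dist(x, minority_rows[j]))[:k]
        nn = minority_rows[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new_points = smote(minority, n_synthetic=5, k=2)
for p in new_points:
    print(p)  # every point lies on the line y = x, between 0 and 3
```

Unlike duplication, the new points are not copies of existing ones, but a point interpolated between two minority instances can land in a region that really belongs to the majority class, which is the "potentially noisy" risk noted under over-sampling. k = 5 is a common default; it is, for example, the default of k_neighbors in imbalanced-learn's SMOTE.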