Ensemble Classifier Flashcards

1
Q

Why combine classifiers?

A

If we use only one classifier, we are stuck with the bias inherent in that algorithm.

2
Q

ensemble learning

A

constructs a set of base classifiers from a given set of training data and aggregates their outputs into a single meta-classifier, so that:

• the combination of many weak classifiers can be at least as good as one strong classifier
• the combination of a selection of strong classifiers is (usually) at least as good as the best of the base classifiers
3
Q

voting

A

• for a nominal class set, run multiple base classifiers over the test data and select the class predicted by the most base classifiers (majority vote)

• for a continuous class set, average over the numeric predictions of the base classifiers
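
A minimal sketch of both voting schemes in Python, assuming scikit-learn is available; the choice of base classifiers here is illustrative, not from the card:

```python
# Hard (majority) voting over three illustrative base classifiers (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Nominal class: each base classifier casts a vote; the most-predicted class wins.
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="hard",  # "hard" = majority vote over predicted labels
)
vote.fit(X, y)
print(vote.predict(X[:5]))

# Continuous class: average the numeric predictions instead
# (e.g. sklearn.ensemble.VotingRegressor performs exactly this averaging).
```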
4
Q

Approaches to Classifier Combination

A
  1. Instance manipulation (most common)
  2. Feature manipulation (most common)
  3. Class label manipulation
  4. Algorithm manipulation
5
Q

Instance manipulation

A

generate multiple training datasets through sampling, and train a base classifier over each
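
As an illustration of instance manipulation (a sketch, not from the card), draw several resampled training sets and fit one base classifier per sample:

```python
# Instance manipulation: k resampled training sets, one base classifier each
# (assumes scikit-learn and NumPy; decision trees are an arbitrary choice of base learner).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)

base_classifiers = []
for _ in range(5):                              # k = 5 base classifiers
    idx = rng.integers(0, len(X), size=len(X))  # sample instances with replacement
    clf = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    base_classifiers.append(clf)
```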

6
Q

Feature manipulation

A

generate multiple training datasets through different feature subsets, and train a base classifier over each
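
A matching sketch for feature manipulation (same assumptions as above): each base classifier sees a different random subset of the features:

```python
# Feature manipulation: each base classifier is trained on a random feature subset
# (assumes scikit-learn and NumPy; a subset size of 4 is an arbitrary illustration).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=12, random_state=0)
rng = np.random.default_rng(0)

models = []
for _ in range(5):
    feats = rng.choice(X.shape[1], size=4, replace=False)        # pick 4 of the 12 features
    clf = DecisionTreeClassifier(random_state=0).fit(X[:, feats], y)
    models.append((feats, clf))  # keep the feature subset so it can be reused at prediction time
```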

7
Q

Class label manipulation

A

generate multiple training datasets by manipulating the class labels in an irreversible manner

8
Q

Algorithm manipulation

A

semi-randomly “tweak” internal parameters within a given algorithm to generate multiple base classifiers over a given dataset

9
Q

4 popular ensemble methods

A
  1. Stacking
  2. Bagging
  3. Random Forests
  4. Boosting
10
Q

Stacking

A

Basic intuition: “smooth” errors over a range of algorithms with different biases

• Method 1: simple voting; this presupposes the classifiers have equal performance

• Method 2: train a classifier over the outputs of the base classifiers (meta-classification); train it using nested cross-validation to reduce bias (the meta-classifier is usually Logistic Regression)
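
A minimal sketch of Method 2, assuming scikit-learn; StackingClassifier generates the base-classifier outputs via cross-validation and trains a Logistic Regression meta-classifier on them:

```python
# Stacking: heterogeneous base classifiers + a Logistic Regression meta-classifier
# trained on their cross-validated outputs (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

stack = StackingClassifier(
    estimators=[("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-classifier
    cv=5,  # base outputs come from 5-fold cross-validation, reducing bias
)
stack.fit(X, y)
print(stack.score(X, y))
```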

11
Q

Pros of stacking

A

• Mathematically simple but computationally expensive method
• Able to combine heterogeneous classifiers with varying performance
• Generally, stacking results in as good or better results than the best of the base classifiers
• Widely seen in applied research; less interest within theoretical circles (esp. statistical learning)

12
Q

bagging/bootstrap aggregating

A

Basic intuition: the more data, the better the performance
(lower the variance), so how can we get ever more data out
of a fixed training dataset?

Construct “novel” datasets through a combination of random sampling and replacement:
• Randomly sample the original dataset of N instances N times, with replacement (the same instance can be selected over and over again)
• Thus, we get a new dataset of the same size, where any individual instance is absent with probability (1 − 1/N)^N ≈ 0.37
• Construct k such random datasets for k base classifiers, and arrive at a prediction via voting
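
A sketch of bagging under the assumptions above (scikit-learn, decision-tree base learners); BaggingClassifier performs the bootstrap sampling and voting described on this card:

```python
# Bagging: k bootstrap samples of size N, one decision tree per sample, majority vote
# (assumes scikit-learn; 50 trees is an arbitrary choice of k).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

N = 500
X, y = make_classification(n_samples=N, n_features=10, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),  # unstable (high-variance) base learner
    n_estimators=50,           # k base classifiers
    max_samples=1.0,           # each bootstrap sample has the same size as the original (N)
    bootstrap=True,            # sample with replacement
    random_state=0,
)
bag.fit(X, y)

# Chance that a given instance is absent from one bootstrap sample: (1 - 1/N)^N ≈ 0.37
print((1 - 1 / N) ** N)
```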

13
Q

bagging: choice of base classification algorithm

A


• The same classification algorithm is used throughout
• As bagging is aimed at minimising variance through sampling, the base algorithm should be unstable (i.e. high-variance)
• High variance: Decision Trees (if a few instances are excluded, the whole model might be different)
• Low variance: SVM (a hard margin is largely unaffected by resampling, and a soft margin would not help much), Logistic Regression (the overall result would not change much anyway)

14
Q

Pros of Bagging

A

Pros:
• Simple method based on sampling and voting
• Possibility to parallelise computation of individual base classifiers
• Highly effective over noisy datasets (outliers may vanish)
• Performance is generally significantly better than the base classifiers (esp. DT) and only occasionally substantially worse

15
Q

Random Tree

A

A “Random Tree” is a Decision Tree where:
• At each node, only a random subset of the possible attributes is considered
• This attempts to control for unhelpful attributes in the feature set (which a standard DT does not do)
• It is much faster to build than a “deterministic” Decision Tree, but increases model variance (which suits bagging, since bagging works best with high-variance base classifiers)
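
A sketch of the random-attribute idea in scikit-learn terms (an assumption about tooling, not from the card): the max_features parameter of a decision tree restricts each split to a random subset of attributes:

```python
# A "random tree": at each node, only a random subset of attributes is considered
# (assumes scikit-learn; max_features="sqrt" means sqrt(16) = 4 attributes per split here).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

random_tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
random_tree.fit(X, y)
```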

16
Q

Random Forests

A
An ensemble of Random Trees (many trees = forest)
• Each tree is built using a different Bagged training dataset
• As with Bagging the combined classification is via voting
• The idea behind them is to minimise overall model variance,
without introducing (combined) model bias
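
A minimal usage sketch, assuming scikit-learn; RandomForestClassifier combines bagged training sets, random attribute subsets at each node, and voting:

```python
# Random Forest: bagged random trees combined by voting (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of random trees in the forest
    max_features="sqrt",  # random attribute subset considered at each node
    bootstrap=True,       # each tree is trained on a bagged (bootstrap) sample
    n_jobs=-1,            # trees are independent, so training parallelises
    random_state=0,
)
rf.fit(X, y)
print(rf.score(X, y))
```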
17
Q

Pros & Cons of RF

A

Pros:
• Generally a very strong performer
• parallelisable & efficient
• Robust to overfitting

Cons:
• Interpretability sacrificed

18
Q

Boosting

A

Basic intuition: tune base classifiers to focus on the “hard to classify” instances

Iteratively change the distribution and weights of
training instances to reflect the performance of the classifier on the previous iteration
• start with each training instance having a 1/N
probability of being included in the sample
• over T iterations, train a classifier and update the weight of each instance according to whether it is correctly classified
• combine the base classifiers via weighted voting
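
A simplified sketch of the reweighting loop described above (scikit-learn/NumPy assumed; the doubling update is an illustration, not the exact AdaBoost rule):

```python
# Boosting sketch: weights start at 1/N; misclassified ("hard") instances are up-weighted
# each iteration. This is a simplified illustration, not the exact AdaBoost update.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
N, T = len(X), 10
w = np.full(N, 1 / N)  # start: every instance weighted 1/N

classifiers = []
for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)  # decision stump
    stump.fit(X, y, sample_weight=w)
    wrong = stump.predict(X) != y
    w[wrong] *= 2.0   # up-weight the "hard to classify" instances
    w /= w.sum()      # renormalise so the weights form a distribution
    classifiers.append(stump)
# The final prediction would combine these classifiers via weighted voting.
```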

19
Q

AdaBoost

A

α (alpha) = importance of classifier C_i = the weight associated with C_i's vote

If the error rate is low, α is large and positive; if the error rate rises above 0.5, α becomes negative.

  • Base classification algorithm: decision stumps (1-R) or decision trees
  • Reinitialise the instance weights whenever the error rate ε_i > 0.5
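
For reference, the usual textbook form of these quantities (the exact formulas are not on the card; this is the standard AdaBoost definition):

```latex
% Standard AdaBoost definitions (textbook form, not copied from the card).
% \epsilon_i : weighted error rate of base classifier C_i
% \alpha_i   : weight of C_i's vote; large and positive when \epsilon_i is small,
%              negative when \epsilon_i > 0.5
\[
  \alpha_i = \frac{1}{2}\,\ln\!\left(\frac{1-\epsilon_i}{\epsilon_i}\right),
  \qquad
  w_j \leftarrow \frac{w_j}{Z} \times
  \begin{cases}
    e^{-\alpha_i} & \text{if instance $j$ is correctly classified,}\\
    e^{\alpha_i}  & \text{if instance $j$ is misclassified,}
  \end{cases}
\]
% Z normalises the updated weights so that they sum to 1.
```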
20
Q

Pros & Cons of Boosting

A

• Mathematically complicated but computationally cheap method based on iterative sampling and weighted voting
• Still more computationally expensive than bagging, because the iterations are sequential and cannot be parallelised
• The method has guaranteed performance in the form of error bounds over the training data
• Interesting convergence effect: the test error can keep decreasing even after the training error has converged
• In practical applications, boosting has a tendency to overfit

21
Q

Bagging/RF vs. Boosting

A
Bagging/RF
• Parallel sampling 
• Simple voting 
• Single classification algorithm 
• Minimise variance 
• Not prone to overfitting 
Boosting
• Iterative sampling 
• Weighted voting 
• Single classification algorithm 
• Minimise (instance) bias
• Prone to overfitting