7.1 Classifier Combination Flashcards

1
Q

What is boosting?

A
  • Intuition: tune base classifiers to focus on the hard-to-classify instances
  • Method: iteratively change the distribution and weights of the training instances to reflect the classifier's performance in the previous iteration (see the sketch below)
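
A minimal sketch of this re-weighting loop in the style of AdaBoost (one concrete boosting algorithm); the dataset, number of rounds, and stump base learners are illustrative assumptions, not part of the card:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    y_pm = np.where(y == 1, 1, -1)           # AdaBoost-style labels in {-1, +1}

    n = len(y)
    w = np.full(n, 1.0 / n)                  # start from a uniform instance distribution
    learners, alphas = [], []

    for t in range(20):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y_pm, sample_weight=w)  # base learner focuses on currently heavy instances
        pred = stump.predict(X)
        err = np.sum(w[pred != y_pm])        # weighted error of this round's base classifier
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        w *= np.exp(-alpha * y_pm * pred)    # up-weight misclassified, down-weight correct instances
        w /= w.sum()                         # renormalise to keep a valid distribution
        learners.append(stump)
        alphas.append(alpha)

    # Combine base learners by weighted voting (better rounds get larger alpha)
    ensemble = np.sign(sum(a * m.predict(X) for a, m in zip(alphas, learners)))
    print("training accuracy:", np.mean(ensemble == y_pm))
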
2
Q

What is Bagging?

A

Bagging = bootstrap aggregating

  • Intuition: the more data, the better the performance (the lower the variance), so how can we get more data out of a fixed training dataset?
  • Method: construct new datasets by randomly sampling instances from the original training set with replacement (see the sketch below)
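
A small sketch of bootstrap sampling and aggregation (the dataset, tree base learners, and ensemble size are illustrative assumptions, not part of the card):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    rng = np.random.default_rng(0)
    n = len(y)

    models = []
    for _ in range(10):
        # Bootstrap: draw n instances uniformly *with replacement*, so each new
        # dataset has the original size but a different mix of instances.
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

    # Aggregate: plain (unweighted) majority vote over the base classifiers.
    votes = np.stack([m.predict(X) for m in models])      # shape: (n_models, n_samples)
    bagged = (votes.mean(axis=0) >= 0.5).astype(int)      # majority vote for binary 0/1 labels
    print("training accuracy:", np.mean(bagged == y))
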
3
Q

What are the techniques that use the instance manipulation approach to combine classifiers?

A
  • Boosting
  • Bagging
  • Random Forests

Bagging constructs multiple new datasets through random sampling of instances with replacement and trains one classifier per dataset. Random Forest adopts the same bagging technique to generate multiple datasets for different random trees. Boosting also iteratively samples instances from the training dataset (to train multiple classifiers) while assigning higher weights to the instances that were not correctly classified in the previous iteration.

In contrast, stacking introduces a meta-classifier to decide which base classifiers to rely on.
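
As an aside (not part of the card), scikit-learn ships a ready-made class for each of these techniques; the base learners and parameter values below are arbitrary illustrative choices:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                  RandomForestClassifier, StackingClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=500, random_state=0)

    # Instance manipulation: each of these ensembles resamples or re-weights the training instances.
    models = {
        "bagging": BaggingClassifier(n_estimators=50, random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
        # Stacking, in contrast, trains a meta-classifier over the base classifiers' outputs.
        "stacking": StackingClassifier(
            estimators=[("nb", GaussianNB()),
                        ("rf", RandomForestClassifier(n_estimators=50, random_state=0))],
            final_estimator=LogisticRegression(),
        ),
    }

    for name, model in models.items():
        print(name, model.fit(X, y).score(X, y))
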

4
Q

Which of the following statement(s) are TRUE about ensemble learning?

A
  • An ensemble of classifiers may not be able to outperform any of its individual base learners.
  • Combining meaningful base learners improves the generalizability of the model.

Ensembling diverse, meaningful base learners typically yields better results and better-generalizing models. However, ensembling does not guarantee improved performance.

5
Q

Which of the following statement(s) are TRUE about Random Forest?

A
  • Random Forest adopts both feature manipulation and instance manipulation approaches.

Random forest adopts instance manipulation to train multiple random trees using different bagged datasets. For each random tree, feature manipulation is used to consider different feature combinations at different nodes. By training multiple random trees with different bagged datasets, random forest reduces the variance (not the bias). The predictions made by a random tree can be explained by following the decisions made along the tree. However, combining multiple random trees using a voting mechanism (i.e., random forest) degrades the interpretability of the overall logic.
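
A minimal sketch of how these two kinds of manipulation surface as parameters in scikit-learn's RandomForestClassifier (the dataset and parameter values are illustrative assumptions, not part of the card):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    forest = RandomForestClassifier(
        n_estimators=100,
        bootstrap=True,        # instance manipulation: each tree gets its own bagged dataset
        max_features="sqrt",   # feature manipulation: each split considers a random feature subset
        random_state=0,
    )
    forest.fit(X, y)
    print("training accuracy:", forest.score(X, y))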

6
Q

Which of the following statement(s) are TRUE about Boosting?

A

Boosting assigns higher weights to better-performing base learners

Boosting combines base learners by weighted voting, where each learner's weight reflects its importance. Boosting is an instance manipulation technique in which the wrongly predicted (i.e., difficult) samples are iteratively emphasized.

7
Q

Suppose there are 3 independent binary classifiers C1, C2, and C3, with error rates 0.3, 0.2, and 0.2 respectively. If the classifiers are combined by majority voting, what is the error rate of the combined classifier?

A

0.136

To make an error, the combined classifier needs at least two of the three classifiers to be wrong. There are four scenarios that produce a wrong prediction (each row gives the probability of that scenario):

  • {C1, C2} wrong: 0.3 * 0.2 * (1 - 0.2) = 0.048
  • {C1, C3} wrong: 0.3 * (1 - 0.2) * 0.2 = 0.048
  • {C2, C3} wrong: (1 - 0.3) * 0.2 * 0.2 = 0.028
  • {C1, C2, C3} wrong: 0.3 * 0.2 * 0.2 = 0.012

Thus, the error rate of the combined classifier is 0.048 + 0.048 + 0.028 + 0.012 = 0.136.
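
The same number can be checked by enumerating all correct/incorrect patterns of the three independent classifiers (a small verification sketch, not part of the card):

    from itertools import product

    error_rates = [0.3, 0.2, 0.2]                       # C1, C2, C3

    combined_error = 0.0
    for wrong in product([False, True], repeat=3):      # every pattern of wrong/right classifiers
        p = 1.0
        for e, w in zip(error_rates, wrong):
            p *= e if w else (1 - e)                    # independence: multiply the probabilities
        if sum(wrong) >= 2:                             # majority vote fails only if 2 or 3 are wrong
            combined_error += p

    print(round(combined_error, 3))                     # 0.136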

8
Q

What is a random forest?

A

An ensemble of random trees (many trees = a forest)

  • Each tree is built using a different bagged training dataset
  • The combined classification is obtained by voting
9
Q

What is stacking?

A

Intuition: smooth out errors over a range of algorithms with different biases

Method 1: simple voting, but which classifier do we trust?

Method 2: train a meta-classifier (level-1 model) over the outputs of the base classifiers (level-0 models); see the sketch below

  • learn which base classifiers are the reliable ones, and how to combine their outputs
  • train using nested cross-validation to reduce bias
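
A minimal sketch using scikit-learn's StackingClassifier, which fits the level-1 model on cross-validated outputs of the level-0 models (the base learners, meta-classifier, and cv value are illustrative assumptions, not part of the card):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Level-0 models: algorithms with different biases.
    base = [
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ]

    # Level-1 meta-classifier, fit on the base models' cross-validated predictions
    # (cv=5) so it learns which base classifiers to rely on without overfitting to them.
    stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression(), cv=5)
    stack.fit(X_tr, y_tr)
    print("test accuracy:", stack.score(X_te, y_te))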