19-Ensemble learning Flashcards

1
Q

What is ensemble learning?

A

Ensemble learning constructs a set of base classifiers from a given set of training data and aggregates their outputs into a single meta-classifier
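
A common way to aggregate the outputs is a majority vote over the base classifiers' predictions. A minimal sketch, assuming numpy is available (the function name is illustrative):

```python
import numpy as np

def majority_vote(predictions):
    """Combine base classifier outputs into one meta-prediction.

    predictions: shape (n_classifiers, n_samples), integer class labels.
    Returns the most common label per sample (ties -> lowest label).
    """
    predictions = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in predictions.T])

# Three base classifiers vote on four samples
preds = [[1, 0, 1, 1],
         [1, 1, 0, 1],
         [0, 0, 1, 1]]
print(majority_vote(preds))  # -> [1 0 1 1]
```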

2
Q

What are the approaches for ensemble learning?

A

Instance manipulation
Feature manipulation
Class label manipulation
Algorithm manipulation

3
Q

What is instance manipulation?

A

Generate multiple training datasets through sampling and train a base classifier over each dataset
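
For example, bootstrap sampling draws N instances with replacement to build each dataset. A minimal sketch with numpy (the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10)  # stand-in for 10 training instances

# Each resampled dataset would train its own base classifier
for _ in range(3):
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    print(X[idx])
```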

4
Q

What is feature manipulation in the context of ensemble learning?

A

Generate multiple training datasets through different feature subsets and train a base classifier over each dataset
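
A minimal sketch of drawing random feature subsets with numpy (the subset size is an illustrative choice); each base classifier would then be trained on X[:, subset]:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 8, 3  # k features per base classifier (illustrative)

subsets = [rng.choice(n_features, size=k, replace=False) for _ in range(4)]
print(subsets)  # four column subsets, one per base classifier
```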

5
Q

What is class label manipulation?

A

Generate multiple training datasets by manipulating the class labels in a reversible manner
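
One well-known instance is error-correcting output codes (ECOC): each class is encoded as a bit string, one binary base classifier is trained per bit, and predictions are decoded back to the nearest codeword, so the relabelling is reversible. A minimal sketch, assuming scikit-learn is available:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OutputCodeClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# code_size controls how many binary problems are created per class
ecoc = OutputCodeClassifier(DecisionTreeClassifier(), code_size=2, random_state=0)
print(ecoc.fit(X, y).predict(X[:5]))
```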

6
Q

What is algorithm manipulation?

A

Semi-randomly tweak internal parameters within an algorithm to generate multiple base classifiers over a given dataset
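
A minimal sketch, assuming scikit-learn: the same algorithm is run on the same dataset, and only its internal randomness is varied to obtain distinct base classifiers:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# splitter="random" chooses split thresholds semi-randomly, so each
# seed produces a different base tree over the one dataset
trees = [DecisionTreeClassifier(splitter="random", random_state=s).fit(X, y)
         for s in range(5)]
print([t.get_depth() for t in trees])  # the trees genuinely differ
```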

7
Q

What is the intuition behind ensemble learning?

A
1. A combination of many weak classifiers can be at least as good as one strong classifier
2. A combination of strong classifiers is at least as good as the best of the base classifiers

8
Q

What is the relationship between base and ensemble classifiers error if they’re independent?

A

A logit (sigmoid-shaped) curve: with n independent base classifiers that each err with probability ε, a majority vote is wrong only when more than half of them are, so the ensemble error is the sum over k > n/2 of C(n, k) ε^k (1 - ε)^(n - k), which is far below ε whenever ε < 0.5
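
A quick check of this formula (the values of n and ε here are illustrative):

```python
from math import comb

def ensemble_error(n, eps):
    """Majority-vote error of n independent base classifiers,
    each erring with probability eps (n odd, so no ties)."""
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# 25 independent classifiers, each 35% wrong -> ensemble ~6% wrong
print(round(ensemble_error(25, 0.35), 3))
```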

9
Q

What is the relationship between base and ensemble classifiers error if they’re identical?

A

Linear: identical base classifiers all make the same predictions, so combining them changes nothing and the ensemble error equals the base classifier error

10
Q

What is stacking?

A

Use different algorithms to train multiple (level-0) base classifiers
Use the base classifiers' predictions over held-out samples as features to train a (level-1) meta-classifier that makes the final prediction
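
A minimal sketch, assuming scikit-learn, whose StackingClassifier trains the level-1 model on cross-validated predictions from the level-0 classifiers:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Heterogeneous level-0 classifiers; logistic regression as meta-classifier
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),
)
print(stack.fit(X, y).predict(X[:5]))
```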

11
Q

What are the pros of stacking?

A

Mathematically simple
Able to combine heterogeneous classifiers
Generally performs as well as or better than the best of the base classifiers

12
Q

What are the cons of stacking?

A

Computationally expensive

13
Q

What is bagging?

A

Bagging (bootstrap aggregating) is used to reduce variance.

Create multiple training datasets via bootstrap sampling, train a classifier with the same algorithm over each dataset, and combine the predictions by voting or averaging
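
A minimal sketch, assuming scikit-learn (the estimator count is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 50 trees, each trained on its own bootstrap sample; their
# predictions are combined by voting across the ensemble
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
print(bag.fit(X, y).predict(X[:5]))
```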

14
Q

How do we generate the datasets for bagging?

A

Randomly sample the original dataset (N instances) N times, with replacement. Any individual instance is absent from a given sample with probability (1 - 1/N)^N, which approaches 1/e ≈ 0.37 for large N
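
A quick check that (1 - 1/N)^N approaches 1/e:

```python
import math

for n in (10, 100, 1000, 10_000):
    print(n, round((1 - 1 / n) ** n, 4))  # probability an instance is absent
print("1/e =", round(1 / math.e, 4))      # the large-N limit, ~0.3679
```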

15
Q

What is the benefit of bagging?

A

Possibility to parallelise computation of individual base classifiers
Highly effective over noisy datasets
Produces the best results on unstable models that have high variance and low bias

16
Q

What is a random forest?

A

An ensemble of random trees, each trained on a bootstrap sample of the data. Random trees are decision trees where only a random subset of the attributes is considered at each node
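
A minimal sketch, assuming scikit-learn, where max_features caps how many attributes each split may consider:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# max_features="sqrt": each node considers only sqrt(n_features) attributes
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(rf.fit(X, y).predict(X[:5]))
```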

17
Q

What are the benefits of random forest?

A

Parallelisable and robust to overfitting

18
Q

What are the cons of random forest?

A

Sacrifices interpretability

19
Q

What is boosting?

A

Iteratively reweight the training instances to train the next base classifier, and combine the base classifiers via weighted voting
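
A minimal sketch of one AdaBoost-style reweighting round with numpy (the function name and toy inputs are illustrative):

```python
import numpy as np

def boosting_round(weights, misclassified):
    """One reweighting step: weights sum to 1; misclassified is a
    boolean mask of instances the latest base classifier got wrong.
    Returns (alpha, new_weights): the classifier's vote weight and
    the renormalised instance weights for the next round."""
    eps = weights[misclassified].sum()           # weighted error rate
    alpha = 0.5 * np.log((1 - eps) / eps)        # classifier vote weight
    # up-weight mistakes, down-weight correct instances, renormalise
    new_w = weights * np.exp(np.where(misclassified, alpha, -alpha))
    return alpha, new_w / new_w.sum()

w = np.full(5, 0.2)  # start uniform
alpha, w = boosting_round(w, np.array([True, False, False, False, False]))
print(round(alpha, 3), np.round(w, 3))  # -> 0.693 [0.5 0.125 0.125 0.125 0.125]
```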

20
Q

What is adaboost?

A

AdaBoost (Adaptive Boosting) is the classic boosting algorithm: each round trains a base classifier on the reweighted data, increases the weights of misclassified instances, and assigns the classifier a vote weight based on its weighted error; the final prediction is a weighted majority vote
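
A minimal usage sketch, assuming scikit-learn (its default base learner is a depth-1 decision tree, a "stump"):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

X, y = load_iris(return_X_y=True)

ada = AdaBoostClassifier(n_estimators=50, random_state=0)
print(ada.fit(X, y).predict(X[:5]))
```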

21
Q

What’s the difference between how boosting and bagging build models?

A

Bagging builds models in parallel. Boosting builds models sequentially

22
Q

What’s the difference between how boosting and bagging sample data?

A

Bagging resamples data points with replacement. Boosting uses iterative sampling: it reweights data points by modifying their distribution

23
Q

What’s the difference between how boosting and bagging weight models?

A

Bagging gives every model the same weight. Boosting weights each model by its performance

24
Q

What’s the difference between how boosting and bagging impact evaluation?

A

Bagging reduces variance. Boosting reduces bias

25
Q

What’s the difference between how boosting and bagging fit data?

A

Bagging is not prone to overfitting. Boosting is prone to overfitting