Flashcards in Ensemble Deck (19):

1

## How do ensemble methods work?

### They combine the predictions of several estimators built with a given learning algorithm in order to improve generalizability and robustness over a single estimator.

2

## What kind of methods are used?

###
- averaging - reduces the variance of 'strong' (low-bias, high-variance) estimators

- boosting - reduces the bias of 'weak' (high-bias, low-variance) estimators

3

## How does averaging work?

### It works by building several estimators independently and averaging their predictions to reduce the variance.
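As a minimal sketch (assuming scikit-learn and NumPy are available), averaging can be done by hand: train several trees independently on random resamples, then average their predicted class probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Build several estimators independently, each on its own resample.
probas = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))           # random resample
    tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    probas.append(tree.predict_proba(X))

# The averaging step: mean of the individual predictions.
avg_proba = np.mean(probas, axis=0)
y_pred = avg_proba.argmax(axis=1)
print("training accuracy:", (y_pred == y).mean())
```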

4

## How does boosting work?

### It works by building several estimators sequentially, each one correcting its predecessors, so that the combined estimator has reduced bias.
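A sketch using scikit-learn's AdaBoostClassifier, one common boosting implementation (the dataset and parameters here are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stumps (depth-1 trees) are fit one after another; samples the earlier
# stumps misclassified receive more weight in later fits.
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                           n_estimators=50, random_state=0)
boost.fit(X_tr, y_tr)
print("test accuracy:", boost.score(X_te, y_te))
```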

5

## What is 'Pasting' - averaging methods?

### An averaging algorithm whose independent estimators are each trained on a random subset of samples drawn without replacement from the dataset.
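In scikit-learn terms (a sketch using BaggingClassifier; the dataset and parameters are illustrative), pasting corresponds to sampling rows without replacement:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pasting: each tree sees 50% of the samples, drawn WITHOUT replacement.
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                            max_samples=0.5, bootstrap=False,
                            random_state=0)
pasting.fit(X, y)
print("training accuracy:", pasting.score(X, y))
```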

6

## What is 'Bagging'?

### For averaging methods, this means that each estimator's random subset of samples is drawn with replacement.
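The same sketch as for pasting, except the samples are drawn with replacement (`bootstrap=True` in scikit-learn's BaggingClassifier; parameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: each tree sees a random subset of samples drawn WITH replacement.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                            max_samples=0.5, bootstrap=True,
                            random_state=0)
bagging.fit(X, y)
print("training accuracy:", bagging.score(X, y))
```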

7

## What is 'Random Subspaces' - averaging methods?

### An averaging algorithm whose independent estimators are each trained on a random subset of the features (rather than of the samples).
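A sketch of random subspaces with scikit-learn's BaggingClassifier (illustrative parameters): all samples, but only a random fraction of the features per estimator.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Random Subspaces: every tree sees all samples but only a random
# 50% subset of the features.
subspaces = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                              max_samples=1.0, max_features=0.5,
                              bootstrap=False, random_state=0)
subspaces.fit(X, y)
print("training accuracy:", subspaces.score(X, y))
```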

8

## What is 'Random Patches' - averaging methods?

### An averaging algorithm whose independent estimators are each trained on random subsets of both the samples and the features.
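Combining the two previous ideas in one BaggingClassifier sketch (illustrative parameters; sampling modes can be varied):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Random Patches: every tree sees a random subset of the samples AND
# a random subset of the features.
patches = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                            max_samples=0.7, max_features=0.5,
                            random_state=0)
patches.fit(X, y)
print("training accuracy:", patches.score(X, y))
```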

9

## RandomForests and Extra-Trees - averaging methods

### They are perturb-and-combine methods: they construct a diverse set of randomized decision trees and then average their prediction results.

10

## Differences between RandomForests and ExtraTrees

###
During tree construction:

- in RFs the node split is picked as the best split among a random subset of the features

- in ETs thresholds are drawn at random for each candidate feature, and the best of these randomly generated splits is picked for the node split
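Both are available in scikit-learn with near-identical interfaces, so the difference is easy to compare side by side (dataset and parameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# RF: best split searched among a random subset of features per node.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
# ET: thresholds drawn at random per candidate feature; best one kept.
et = ExtraTreesClassifier(n_estimators=100, random_state=0)

print("RF:", cross_val_score(rf, X, y, cv=5).mean())
print("ET:", cross_val_score(et, X, y, cv=5).mean())
```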

11

## What is bias?

###
It is the error from erroneous assumptions in the learning algorithm.

High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).

12

## What is variance?

###
It is error from sensitivity to small fluctuations in the training set.

High variance can cause overfitting: modeling the random noise in the training data, rather than the intended outputs.

13

## What is a decision tree?

### It is a model for predicting a dependent variable Y from an independent variable X by checking a sequence of splits.

14

## What is a decision tree split?

###
A split is a condition or query on a single independent variable that is either true or false.

Splits are arranged as a tree in which each internal node has two children: left when the condition is true, right when it is false.
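A split can be sketched in plain Python (the helper name and row layout here are hypothetical):

```python
# Hypothetical helper: route rows to the left child when the condition
# on a single feature is true, to the right child when it is false.
def split(rows, feature, threshold):
    left = [r for r in rows if r[feature] <= threshold]   # condition true
    right = [r for r in rows if r[feature] > threshold]   # condition false
    return left, right

rows = [(1.0, 'a'), (2.5, 'b'), (3.0, 'b')]
left, right = split(rows, 0, 2.0)
print(left)   # [(1.0, 'a')]
print(right)  # [(2.5, 'b'), (3.0, 'b')]
```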

15

## Boosting intuition

### It minimizes bias by combining estimators that have low variance and high bias (e.g. shallow decision trees).

16

## Bagging intuition

### It minimizes variance by combining estimators that have low bias and high variance (e.g. fully grown decision trees).

17

## What is Gini impurity?

###
- a measure of how often a randomly chosen element from a set will be classified incorrectly

- the classification is done randomly according to the distribution of the labels in the set
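The definition above translates directly into a few lines of Python (function name is illustrative):

```python
from collections import Counter

def gini(labels):
    # 1 minus the probability that a randomly chosen element and a random
    # label (drawn from the set's label distribution) agree.
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(['a', 'a', 'b', 'b']))  # 0.5  (maximally mixed, 2 classes)
print(gini(['a', 'a', 'a', 'a']))  # 0.0  (pure set)
```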

18

## What is information gain?

###
- a measure given by the difference between the entropy of the target variable and the entropy of the target variable conditioned on another variable (e.g. a regressor):

IG(T, rgr) = H(T) - H(T | rgr)
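The formula can be checked with a small pure-Python sketch (function names are illustrative; the conditional entropy is the group-size-weighted average of the per-group entropies):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(target, rgr):
    # IG(T, rgr) = H(T) - H(T | rgr): the entropy of the target minus
    # its average entropy within each group of equal rgr values.
    groups = {}
    for t, r in zip(target, rgr):
        groups.setdefault(r, []).append(t)
    n = len(target)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - h_cond

T = [0, 0, 1, 1]
f = ['x', 'x', 'y', 'y']        # perfectly predicts T
print(information_gain(T, f))   # 1.0 (one full bit gained)
```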

19