Week 3: Regression & Classification (Tree-Based Models) Flashcards

1
Q

Algorithm 1R

A

A method for generating one-attribute classification rules based on the frequencies of individual attribute values.

  1. For each attribute:
    1.1 For each value v of the attribute:
      1.1.1 Find c_freq(v), the most frequent class among instances with attribute value v.
      1.1.2 Build the rule "if attribute = v then class = c_freq(v)".
  2. Calculate the error rate of each attribute's rule set and choose the attribute whose rules have the smallest error rate.
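A minimal Python sketch of 1R (illustrative only; the data layout, with rows as attribute-value tuples and a parallel list of class labels, is an assumption, not the module's code):

  from collections import Counter

  def one_r(rows, labels):
      """Return (best attribute index, {value: class}) with the lowest error."""
      best = None
      for a in range(len(rows[0])):
          # Most frequent class for each value of attribute a.
          by_value = {}
          for row, label in zip(rows, labels):
              by_value.setdefault(row[a], Counter())[label] += 1
          rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
          # Error count of this attribute's rule set over all instances.
          errors = sum(label != rules[row[a]] for row, label in zip(rows, labels))
          if best is None or errors < best[0]:
              best = (errors, a, rules)
      return best[1], best[2]
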
2
Q

Accuracy for Classification Rules

A

accuracy = number of correctly predicted instances / number of all instances covered

3
Q

Gini Coefficient/Impurity Measure

A

G = \sum_{k=1}^C Pr(k)^2, where Pr(k) is the proportion of instances of class k in the node (this is the form used in practice for the module; the standard Gini impurity is 1 - G). The higher the Gini coefficient, the purer the split. Splits should maximise the Gini coefficient.
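A short Python check of this definition (a sketch; the class counts are illustrative):

  def gini(counts):
      # Module's convention: sum of squared class proportions (higher = purer).
      total = sum(counts)
      return sum((c / total) ** 2 for c in counts)

  gini([5, 5])   # 0.5 (maximally impure two-class node)
  gini([10, 0])  # 1.0 (pure node)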

4
Q

Entropy

A

E = -\sum_{k=1}^C Pr(k) \cdot \log_2 (Pr(k))

The lower the entropy, the purer the split. Splits should minimise entropy. Higher entropy means higher disorder.
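The corresponding Python sketch (class counts are illustrative; empty classes contribute nothing, by the usual 0 log 0 = 0 convention):

  from math import log2

  def entropy(counts):
      # Shannon entropy of the class counts in a node (lower = purer).
      total = sum(counts)
      return -sum((c / total) * log2(c / total) for c in counts if c > 0)

  entropy([5, 5])   # 1.0 bit (maximally impure two-class node)
  entropy([10, 0])  # 0.0 bits (pure node)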

5
Q

Information Gain

A

IG = E(v) - [f_1 E(v_1) + … + f_\ell E(v_{\ell})]

v = parent node
v_1,…,v_{\ell} = child nodes
f_k = proportion of the parent's instances that fall into child node v_k

Information gain is the reduction in entropy achieved by the split; ideally, it should be maximised.
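A self-contained sketch computing the gain of a split from class counts (the entropy helper repeats the card above; the example split is illustrative):

  from math import log2

  def entropy(counts):
      total = sum(counts)
      return -sum((c / total) * log2(c / total) for c in counts if c > 0)

  def information_gain(parent, children):
      # parent: class counts at v; children: class counts at v_1 .. v_l.
      n = sum(parent)
      weighted = sum(sum(child) / n * entropy(child) for child in children)
      return entropy(parent) - weighted

  # Splitting a [6 A, 6 B] node into [6, 0] and [0, 6] gains a full bit.
  information_gain([6, 6], [[6, 0], [0, 6]])  # 1.0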

6
Q

Chi-square

A

\chi^2 = \sum_{k=1}^C \frac{(O_k - E_k)^2}{E_k}

Observed frequency O_k = the observed cell value in the contingency table

Expected frequency E_k = (Column Total × Row Total) / (Grand Total)
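A sketch applying this cell by cell over a full contingency table (rows = child nodes, columns = classes; the counts are illustrative):

  def chi_square(table):
      # table: rows of observed cell counts.
      row_totals = [sum(row) for row in table]
      col_totals = [sum(col) for col in zip(*table)]
      grand = sum(row_totals)
      chi2 = 0.0
      for i, row in enumerate(table):
          for j, observed in enumerate(row):
              expected = row_totals[i] * col_totals[j] / grand
              chi2 += (observed - expected) ** 2 / expected
      return chi2

  chi_square([[10, 2], [3, 9]])  # larger values = stronger association with class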

7
Q

Classification and Regression Trees (CART)

A

Input: Numerical and Categorical
Split Selection: Gini Index
Pruning: Adjusted Error Rate

This method grows binary trees for as long as it can find new splits that increase purity. It then replaces any subtree whose leaves all make the same classification with their common parent, exploiting the trade-off between the number of leaves and the error rate.
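A minimal CART-style sketch using scikit-learn (assumed available; sklearn's tree is an approximation of CART, and its cost-complexity pruning via ccp_alpha stands in for the adjusted-error-rate pruning described above):

  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  # criterion="gini" selects splits by the Gini index; ccp_alpha > 0 prunes
  # subtrees whose error reduction does not justify their extra leaves.
  tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01).fit(X, y)
  print(tree.get_n_leaves())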

8
Q

Iterative Dichotomiser (ID3)

A

Input: categorical only
Split Selection: Information Gain
Pruning: No

This method (known as windowing) randomly chooses a subset of the training set and builds the decision tree from this subset. It then evaluates the tree on the remaining instances. If the error rate is sufficiently low, it terminates; otherwise, it adds the misclassified instances to the subset and repeats.
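A sketch of that windowing loop, with any fit/predict-style tree builder standing in for ID3 (the function names, window size, and error threshold are illustrative assumptions):

  import random

  def windowing(build_tree, rows, labels, window=50, max_error=0.05):
      # build_tree(rows, labels) -> model with .predict(rows); window <= len(rows).
      idx = set(random.sample(range(len(rows)), window))
      while True:
          model = build_tree([rows[i] for i in idx], [labels[i] for i in idx])
          rest = [i for i in range(len(rows)) if i not in idx]
          wrong = [i for i in rest if model.predict([rows[i]])[0] != labels[i]]
          if not rest or len(wrong) / len(rest) <= max_error:
              return model
          idx.update(wrong)  # add misclassified instances and rebuild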

9
Q

C5.0

A

Input: numerical and categorical
Split Selection: Information Gain
Pruning: Error rate computation with statistical confidence intervals

This method is similar to CART, but can make multiway splits on categorical variables.

10
Q

Decision Trees

A

Pros:
1. Can handle mixed data types.
2. Can deal with missing values.

Cons:
1. Decision boundaries are typically parallel to the input axes, since each split tests a single attribute.
2. Overfitting can occur if the tree is not pruned or its growth is not otherwise limited.

11
Q

Leave-One-Out Cross-Validation

A

An extreme case of N-fold cross-validation in which N equals the number of instances: each fold holds a single instance, which is left out of the training process entirely and used as the validation set. Since no random sampling is involved, there is no sampling bias.
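A sketch with scikit-learn (assumed available); LeaveOneOut behaves like KFold with as many folds as instances:

  from sklearn.datasets import load_iris
  from sklearn.model_selection import LeaveOneOut, cross_val_score
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=LeaveOneOut())
  print(scores.mean())  # one validation score per instance, averaged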

12
Q

Bootstrapping

A

Unlike cross-validation, which uses sampling without replacement, bootstrapping uses sampling with replacement. In each trial, n instances are drawn from the n-instance dataset; the drawn instances form the training set and the never-drawn instances form the test set. Each draw picks a given instance with probability 1/n, so the probability that an instance is never drawn is (1 - 1/n)^n \approx e^{-1} \approx 0.368 for large n. Hence, as n approaches infinity, the training set contains approximately 63.2% of the distinct instances.
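A quick empirical check of the 63.2% figure (a sketch assuming numpy is available):

  import numpy as np

  n = 10_000
  sample = np.random.randint(0, n, size=n)    # draw n indices with replacement
  print(len(np.unique(sample)) / n)           # ~0.632 of instances are drawn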

13
Q

True Positive Rate

A

(True Positives) / (True Positives + False Negatives)

14
Q

False Positive Rate

A

(False Positives) / (False Positives + True Negatives)
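Both rates computed in one short sketch (the 0/1 label vectors and the helper name are illustrative):

  def rates(actual, predicted, positive=1):
      tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
      fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
      fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
      tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
      return tp / (tp + fn), fp / (fp + tn)   # (TPR, FPR)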

15
Q

Cohen’s Kappa

A

This metric measures the level of agreement between two classifiers, beyond the agreement expected by chance.

\kappa = \frac{Pr(a) - Pr(e)}{1 - Pr(e)}

If there are two classifiers, C and C’, and two classes, A and B:

aa: number of instances C and C’ both classify as A
bb: number of instances C and C’ both classify as B
ab, ba: numbers of instances on which C and C’ disagree

Assuming there are n instances covered by C and C’:

Pr(a) = \frac{aa + bb}{n} (observed agreement)

Pr(e) = (\frac{aa + ba}{n}) \times (\frac{aa + ab}{n}) + (\frac{bb + ab}{n}) \times (\frac{bb + ba}{n}) (agreement expected by chance)
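A short sketch computing kappa directly from the four agreement counts (the counts are illustrative):

  def cohens_kappa(aa, ab, ba, bb):
      n = aa + ab + ba + bb
      pr_a = (aa + bb) / n
      pr_e = ((aa + ba) / n) * ((aa + ab) / n) + ((bb + ab) / n) * ((bb + ba) / n)
      return (pr_a - pr_e) / (1 - pr_e)

  cohens_kappa(20, 5, 10, 15)  # kappa in [-1, 1]; 1 = perfect agreement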

16
Q

Lift

A

Evaluates a classifier by comparing results with and without the classifier on subsets of the instances.

  1. Sort the instances in descending order of the classifier's predicted probability for the target output value.
  2. Given a sample size, select that number of instances that the model rates as most likely to have the target output value.
  3. Calculate the lift factor by counting the number of true positive instances in the sample and dividing by the sample size.

The greater the gap between the lift curve and the expected (random-sampling) line, the better.
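A sketch of steps 1-3 for several sample sizes, following the counting definition above (the 0/1 labels and predicted probabilities are illustrative):

  def lift_points(probs, labels, sample_sizes):
      # Fraction of positives among the k highest-scored instances.
      order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
      return [sum(labels[order[i]] for i in range(k)) / k for k in sample_sizes]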

17
Q

Receiver Operating Characteristic (ROC)

A

This metric captures the relationship between the true positive rate and the false positive rate; the name comes from detecting signals over a noisy channel.

The ROC curve looks similar to the lift curve, but unlike Lift it also factors in the false positive rate. The greater the area under the curve (AUC), the more accurate the model.
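A sketch using scikit-learn's metrics (assumed available; the labels and scores are illustrative):

  from sklearn.metrics import roc_auc_score, roc_curve

  labels = [0, 0, 1, 1]
  scores = [0.1, 0.4, 0.35, 0.8]            # predicted probabilities of class 1
  fpr, tpr, thresholds = roc_curve(labels, scores)
  print(roc_auc_score(labels, scores))      # area under the ROC curve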

18
Q

Stratification

A

The process of sampling data to create training and test sets so that the class proportions in each set match those of the full dataset, rather than relying on purely random sampling.
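A one-step sketch with scikit-learn (assumed available); the stratify argument keeps the class proportions equal in both splits:

  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.3, stratify=y, random_state=0)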

19
Q

Purity

A

A measure of how mixed the classes are within a node of a decision tree: a node is pure if all of its instances belong to a single class, and becomes less pure as more classes appear in it.