Data Mining - Lecture Performance Measures Flashcards

1
Q

What do errors based on the training set tell us?

A

About the model’s fit.

2
Q

What do errors based on the validation set tell us?

A

The ability to predict new data.

These errors are called prediction errors.

3
Q

Which three types of outcome do we deal with (and evaluate) in this course?

A
  1. A predicted numerical value
  2. A predicted class membership
  3. The probability of class membership
4
Q

How do we measure prediction accuracy for numerical prediction?

A

We compute the error for each record and summarize the errors with one of the following measures (a computation sketch follows the list):

  1. Mean absolute error (MAE)
  2. Mean error (ME)
  3. Mean percentage error (MPE)
  4. Mean absolute percentage error (MAPE)
  5. Root mean squared error (RMSE)
  6. Lift chart
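A minimal sketch of how the first five measures could be computed, assuming numpy arrays actual and predicted of equal length (the function and variable names are illustrative, not from the lecture):

import numpy as np

def numeric_error_measures(actual, predicted):
    # Error for each record: e_i = actual_i - predicted_i
    errors = actual - predicted
    n = len(errors)

    mae  = np.sum(np.abs(errors)) / n                  # mean absolute error
    me   = np.sum(errors) / n                          # mean error (signs kept)
    mpe  = 100 * np.sum(errors / actual) / n           # mean percentage error
    mape = 100 * np.sum(np.abs(errors / actual)) / n   # mean absolute percentage error
    rmse = np.sqrt(np.sum(errors ** 2) / n)            # root mean squared error

    return {"MAE": mae, "ME": me, "MPE": mpe, "MAPE": mape, "RMSE": rmse}

# Hypothetical example values:
actual = np.array([10.0, 12.0, 8.0, 15.0])
predicted = np.array([11.0, 10.5, 9.0, 14.0])
print(numeric_error_measures(actual, predicted))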
5
Q

How do you compute the mean absolute error?

A

For each record you compute the error (the actual value minus the predicted value). You disregard the sign of each error and sum up all the absolute errors.

You multiply the sum by (1/n).
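As a LaTeX formula (a sketch of the same computation, writing e_i = y_i - \hat{y}_i for the error of record i and n for the number of records; this notation is introduced here, not in the lecture):

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert e_i \rvert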

6
Q

How do you compute the mean error?

A

You sum up all errors (keeping their signs) and multiply the sum of errors by (1/n). Because positive and negative errors can cancel out, this measures bias rather than overall accuracy.
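In the same notation as the MAE formula above:

\mathrm{ME} = \frac{1}{n} \sum_{i=1}^{n} e_i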

7
Q

How do you compute the mean percentage error?

A

You divide each record's error by that record's actual value (y_i). You do this for all records and sum the ratios.

You multiply the sum by (1/n) and then by 100 to express it as a percentage.
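In the same notation (with y_i the actual value of record i):

\mathrm{MPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{e_i}{y_i}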

8
Q

How do you compute the mean absolute percentage error?

A

Same as the mean percentage error, except that you take the absolute value of each ratio (error / actual value) before summing, so the signs are disregarded.
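In the same notation:

\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left\lvert \frac{e_i}{y_i} \right\rvert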

9
Q

How do you compute the Root mean squared error?

A

You square each individual error and sum the squares. You multiply the sum by (1/n) and then take the square root of the result.
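In the same notation:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}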

10
Q

What is a lift chart?

A

It is a chart that compares a model against a baseline model that uses no predictor information, to see which subset of records gives the highest cumulative predicted values.

On the x-axis you put the percentage of samples (ranked from the highest to the lowest predicted value).
On the y-axis you put the cumulative response, i.e. how much of the positive outcome the model has captured so far.

Then you can see how your model performs. It can be that for a small percentage of the samples you already get a higher than average prediction/response rate.
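A minimal sketch of how such a lift (cumulative gains) chart could be built, assuming arrays actual and predicted for the validation records (all names and numbers are illustrative, not from the lecture):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical validation outcomes and model predictions.
actual = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])
predicted = np.array([0.10, 0.90, 0.30, 0.80, 0.70, 0.20, 0.40, 0.95, 0.05, 0.60])

# Rank the records from the highest to the lowest predicted value.
order = np.argsort(-predicted)
cumulative_actual = np.cumsum(actual[order])

# Baseline: a model with no predictive information captures the outcome
# in proportion to the fraction of records taken.
n = len(actual)
fraction = np.arange(1, n + 1) / n
baseline = fraction * actual.sum()

plt.plot(fraction * 100, cumulative_actual, label="model")
plt.plot(fraction * 100, baseline, linestyle="--", label="baseline (no model)")
plt.xlabel("Percentage of samples (ranked by predicted value)")
plt.ylabel("Cumulative actual outcome")
plt.legend()
plt.show()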

11
Q

How can we evaluate classifier performance?

A
  1. We make a confusion matrix (see the sketch after this list).
  2. Based on that confusion matrix we can compute accuracy, precision, recall and F1.
  3. We can also make a ROC curve (based on confusion matrices at different classification cut-offs).
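A minimal sketch of steps 1 and 2, assuming lists of true labels and predicted labels with 1 = positive class and 0 = negative class (names are illustrative, not from the lecture):

import numpy as np

def classifier_metrics(y_true, y_pred):
    # Build a 2x2 confusion matrix and derive the common measures from it.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))   # type I errors
    fn = np.sum((y_true == 1) & (y_pred == 0))   # type II errors

    accuracy  = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 / (1 / recall + 1 / precision)  # equivalently 2PR / (P + R)

    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "F1": f1}

# Hypothetical example:
print(classifier_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))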
12
Q

What is a misclassification?

A

If your model puts records in the wrong class:

False Negative and False Positive

13
Q

What is a confusion matrix?

A

A matrix with the predicted classes on the x-axis (columns) and the actual classes on the y-axis (rows). It lists the TP, FP, TN, FN.
It reads like:

                  Predicted negative | Predicted positive
Actual negative:  True Negative      | False Positive
Actual positive:  False Negative     | True Positive

14
Q

What is a type I error?

A

False Positive

15
Q

What is a type II error?

A

False Negative

16
Q

How do you compute accuracy to evaluate classifier performance?

A

(True Positive + True Negative) / n

17
Q

What is a ROC Curve?

A

A diagram in which you can plot multiple models to evaluate and compare their performance.

On the x-axis you have the False Positive Rate.
On the y-axis you have the True Positive Rate.
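A minimal sketch of how a ROC curve and its area under the curve could be computed with scikit-learn, assuming true labels and predicted probabilities for the positive class (the arrays are hypothetical):

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Hypothetical true labels and predicted probabilities for the positive class.
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))   # 0.5 = random guessing, 1.0 = ideal

plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guessing")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()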

18
Q

What happens if (TP rate, FP rate) in the ROC curve is (0, 0)?

A

The model declares everything as a negative class

19
Q

What happens if (TP rate, FP rate) in the ROC curve is (1, 1)?

A

It declares everything as a positive class

20
Q

What happens if (TP rate, FP rate) in the ROC curve is (1, 0)?

A

The ideal situation: every positive is detected and there are no false positives.

21
Q

What is the area under the ROC curve (AUC) for random guessing, and what is it in the ideal situation?

A

Random: 0.5
Ideal: 1.0

So your model's AUC should lie between those two, as close to 1.0 as possible.

22
Q

What is a limitation of accuracy?

A

If you have many records in one class and only a few in the other class, the accuracy can still be high even if the model never assigns anything to the minority class.

23
Q

How can you avoid this?

Refers to accuracy and its limitation

A

By using a cost matrix as well (a worked example follows):

  • You compute the accuracy.
  • You also compute the total cost: (TP * cost TP) + (FP * cost FP) + (TN * cost TN) + (FN * cost FN)
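A small worked example with made-up numbers (not from the lecture): suppose TP = 40, FP = 10, TN = 940, FN = 10, with cost 0 for correct predictions, cost 1 per false positive and cost 100 per false negative. Then:

\text{total cost} = 40 \cdot 0 + 10 \cdot 1 + 940 \cdot 0 + 10 \cdot 100 = 1010

while the accuracy is (40 + 940) / 1000 = 98%, so a model can look very accurate and still be costly.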
24
Q

What is the Kappa statistics in multiclass prediction?

A

You can use it if you have the confusion matrix of the actual predictor and the confusion matrix of a random predictor.
-> It measures the improvement compared to the random predictor.

(success rate of actual predictor - success rate of random predictor) / (1 - success rate of random predictor)
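As a LaTeX formula (writing p_o for the success rate of the actual predictor and p_e for the success rate of the random predictor; the symbols are just notation introduced here):

\kappa = \frac{p_o - p_e}{1 - p_e}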

25
Q

What is Recall?

A

The ability of the model to find all of the items of the class.

(True Positive) / (True Positive + False Negative)

26
Q

What is Precision?

A

The ability of the model to only label items as the class when they actually belong to it.

(True Positives) / (True Positives + False Positives)

27
Q

What is the F-Measure?

A

Takes into account both Recall and Precision (it is their harmonic mean).

F = 2 / ((1/R) + (1/P)) = 2PR / (P + R)