Machine Learning - Model Evaluation Flashcards

1
Q

Accuracy

A

Percent of all predictions that were correct.
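
In confusion-matrix terms (TP, TN, FP, FN):

Accuracy = (TP + TN) / (TP + TN + FP + FN)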

2
Q

Confusion Matrix

A

A matrix showing the predicted versus actual classifications. A confusion matrix is of size L x L, where L is the number of distinct label values; each row corresponds to an actual class and is cross-tabulated against columns for the predicted classes.
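
A minimal sketch of building such a matrix with NumPy, assuming labels are encoded as integers 0..L-1 (y_true and y_pred are hypothetical inputs):

    import numpy as np

    def confusion_matrix(y_true, y_pred, L):
        # Rows index the actual class, columns the predicted class
        cm = np.zeros((L, L), dtype=int)
        for actual, predicted in zip(y_true, y_pred):
            cm[actual, predicted] += 1
        return cm

    # Toy usage with L = 3 classes
    print(confusion_matrix([0, 1, 2, 2, 1, 0], [0, 2, 2, 2, 1, 1], L=3))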

3
Q

Cross-validation: Overview

A

A method for estimating the accuracy of an inducer by dividing the data into k mutually exclusive subsets, or folds, of approximately equal size. The inducer is trained and tested k times. Each time it is trained on the data set minus a fold and tested on that fold. The accuracy estimate is the average accuracy for the k folds.
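
A minimal k-fold sketch using scikit-learn; logistic regression on the iris data is just a stand-in inducer:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    X, y = load_iris(return_X_y=True)
    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        # Train on all folds but the held-out one, then test on it
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    print(np.mean(scores))  # the cross-validated accuracy estimate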

4
Q

Cross-validation: How

A

Leave-one-out cross-validation

K-fold cross-validation

Training and validation data sets have to be drawn from the same population

The step of choosing the kernel parameters of an SVM should be cross-validated as well (see the nested cross-validation sketch below)
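
A minimal sketch of cross-validating the parameter choice itself via nested cross-validation in scikit-learn (the parameter grid is illustrative):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    # Inner loop: choose the SVM kernel parameters by cross-validation
    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=3)
    # Outer loop: estimate the accuracy of the whole procedure,
    # including the parameter-selection step
    print(cross_val_score(inner, X, y, cv=5).mean())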

5
Q

Model Comparison

A

6
Q

Model Evaluation: Adjusted R^2 (R-Square)

A

The method preferred by statisticians for determining which variables to include in a model. It is a modified version of R^2 that penalizes each new variable on the basis of how many have already been admitted. By construction, R^2 always increases as you add new variables, which results in models that over-fit the data and have poor predictive ability. Adjusted R^2 yields more parsimonious models that admit a new variable only if the improvement in fit is larger than the penalty, which serves the ultimate goal of out-of-sample prediction. (Submitted by Santiago Perez)
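
With n observations and p predictors:

Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)

The penalty grows with p, so a new variable raises Adjusted R^2 only if it improves the fit by more than the penalty.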

7
Q

Model Evaluation: Decision tables

A

The simplest way of expressing output from machine learning: each cell in the table holds the decision that results from the conditions represented by its row and column.
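
A toy example (the conditions and decisions are illustrative):

                     Windy = no    Windy = yes
    Outlook = sunny  Play          Play
    Outlook = rainy  Play          Don't play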

8
Q

Model Evaluation: Mis-classification error

A

Test error for classification, defined as the summed error over the test set: each false prediction (predicted class differs from the true class) contributes one unit of error.
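
For m test examples and hypothesis h, often reported as a fraction:

Test error = (number of examples with h(x) != y) / m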

9
Q

Model Evaluation: Negative class

A

The class denoting absence of the condition we are looking for (conventionally labeled 0), e.g., not having the symptoms.

10
Q

Model Evaluation: Positive class

A

The class denoting presence of the condition we are looking for, conventionally labeled 1.

11
Q

Model Evaluation: Precision

A

Of all predicted positives, how many are actually positive?
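
In confusion-matrix terms:

Precision = TP / (TP + FP)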

12
Q

Model Evaluation: Recall

A

Of all actual positives, how many were predicted as positive?
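
In confusion-matrix terms:

Recall = TP / (TP + FN)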

13
Q

Model Evaluation: True negative

A

The hypothesis correctly predicts a negative output.

14
Q

Model Evaluation: True positive

A

The hypothesis correctly predicts a positive output.

15
Q

Model Selection Algorithm

A

An algorithm that automatically selects a good model function for a given dataset.

16
Q

Precision

A

Percent of predicted Positives that were correct.

17
Q

P-values

A

The probability, under the null hypothesis, of obtaining a result at least as extreme as the one actually observed; small p-values suggest the observed effect is unlikely to be due to chance alone.

18
Q

Receiver Operating Characteristic (ROC) Curves

A

A plot of the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the classification threshold is varied; the area under the curve (AUC) summarizes performance across all thresholds.

19
Q

Sensitivity

A

aka Recall or True positive rate. Percent of actual Positives that were correctly predicted.
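
In confusion-matrix terms:

Sensitivity = TP / (TP + FN)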

20
Q

Specificity

A

aka True negative rate. Percent of actual Negatives that were correctly predicted.
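
In confusion-matrix terms:

Specificity = TN / (TN + FP)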