Chapter 3: Machine Learning Experiments Flashcards

(36 cards)

1
Q

Describe the accuracy of a classification model

A

number of correctly classified samples / total number of samples

2
Q

describe the error of a classification model

A

number of incorrectly classified samples / total number of samples

3
Q

what is the problem with error and accuracy? what is a better alternative?

A

they are unreliable for imbalanced data

a better alternative is the confusion matrix

4
Q

how can we compute accuracy from a confusion matrix

A

sum of the diagonal (correctly classified samples) / total number of samples; for a binary classifier, (TP + TN) / (TP + TN + FP + FN)
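
A minimal sketch of this computation, assuming a binary confusion matrix in the [[TN, FP], [FN, TP]] layout (scikit-learn's convention; the counts are hypothetical):

```python
import numpy as np

# Hypothetical binary confusion matrix: [[TN, FP], [FN, TP]]
cm = np.array([[50, 10],
               [5, 35]])

# accuracy = correct predictions (the diagonal) / total samples
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.85
```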

5
Q

what do precision and recall measure about a classifier

A

its ability to classify positive samples

6
Q

recall =

A

TP / (TP + FN)

7
Q

Precision =

A

TP / (TP + FP)

8
Q

what is the f_1 score

A

the harmonic mean of precision and recall: 2 * (precision * recall) / (precision + recall)
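
A short sketch tying the last three cards together, using hypothetical TP/FP/FN counts:

```python
# Hypothetical counts from a binary confusion matrix
tp, fp, fn = 35, 10, 5

precision = tp / (tp + fp)  # ability to avoid false positives
recall = tp / (tp + fn)     # ability to find all positive samples
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```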

9
Q

what is specificity

A

a measure of a classifier's ability to correctly classify negative samples

10
Q

specificity =

A

TN / (TN + FP)

11
Q

1 - specificity is otherwise known as

A

false positive rate

12
Q

what is ROC analysis

A

applies to binary classifiers. we plot sensitivity (true positive rate) against 1 - specificity (false positive rate) as the decision threshold varies.

we want the area under the curve (AUC) to be as close to 1 as possible
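
A minimal sketch with scikit-learn (the labels and scores below are hypothetical):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical ground-truth labels and predicted positive-class scores
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

# Sweeping a threshold over y_score gives one (FPR, TPR) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))  # area under the curve; closer to 1 is better
```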

13
Q

what is error in a regression model

A

difference between predicted and desired output

14
Q

list types of error for a regression model

A

root mean square error (RMSE)
mean absolute error (MAE)
mean absolute percentage error (MAPE)
sum of squares error (SSE)
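
A compact NumPy sketch of all four metrics on hypothetical predictions:

```python
import numpy as np

# Hypothetical desired outputs and model predictions
y = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

err = y - y_hat
sse = np.sum(err ** 2)                 # sum of squares error
rmse = np.sqrt(np.mean(err ** 2))      # root mean square error
mae = np.mean(np.abs(err))             # mean absolute error
mape = np.mean(np.abs(err / y)) * 100  # mean absolute percentage error (%)
print(sse, rmse, mae, mape)
```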

15
Q

what is the coefficient of determination

A

the R^2 score in the single-output case:

R^2 = 1 - SSE / sum_i (y_i - ȳ)^2, where ȳ = (1/n) sum_i y_i is the mean of the desired outputs
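
Continuing the hypothetical NumPy sketch above (reusing y, y_hat, and sse):

```python
# R^2 = 1 - SSE / total sum of squares around the mean of y
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / ss_tot
print(r2)
```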

16
Q

what is sample error

A

the error computed using a performance metric from a set of samples

17
Q

what is true error

A

the probability a random sample is misclassified

18
Q

how are true and sample error different in regression

A

the true error is the expectation of the error over the data distribution, rather than a misclassification probability

19
Q

how do we get bias and variance values

A

from the expected squared prediction error

20
Q

what is bias error

A

(y - E[f])^2

repeat with different sets of training data and measure how far the average prediction is from the true value

21
Q

what is variance error

A

E[(f - E[f])^2]

repeat with different sets of training data and measure how much the prediction varies
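
Cards 19-21 fit together as the standard bias-variance decomposition of the expected squared prediction error; a sketch of the identity, where the σ^2 noise term is an assumption for noisy targets:

```
E[(y - f)^2] = (y - E[f])^2 + E[(f - E[f])^2] + σ^2
             =     bias^2   +     variance    + noise
```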

22
Q

what is overfitting

A

the model is overly complex: low bias, high variance

23
Q

what is underfitting

A

the model is too simple: high bias, low variance

24
Q

what is a confidence interval

A

a range quantifying how good an estimate of the true error is provided by the sample error

25

Q

what is a z test

A

a statistical test for comparing the error rates of two classifiers

26

Q

what is the confidence interval of a classifier

A

a range [error - a, error + a], where a = z_p * sqrt(error * (1 - error) / n) and n is the number of test samples

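A minimal sketch of this interval, assuming a 95% confidence level (z_p ≈ 1.96) and hypothetical values for the sample error and n:

```python
import math

error = 0.15  # hypothetical sample error
n = 200       # hypothetical number of test samples
z_p = 1.96    # z value for a 95% confidence level

a = z_p * math.sqrt(error * (1 - error) / n)
print(error - a, error + a)  # the confidence interval
```
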
27

Q

describe the steps of a z test

A

1. calculate z_p = d / σ, where d = error_A - error_B and σ = sqrt(σ_A^2 + σ_B^2)
2. look up p for z_p in a standard normal table
3. confidence c = 1 - (1 - p)/2

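A sketch of steps 1-2 with SciPy, assuming each classifier's variance is σ^2 = error * (1 - error) / n (the error rates and sample counts are hypothetical):

```python
import math
from scipy.stats import norm

# Hypothetical sample errors of two classifiers and their test-set sizes
error_a, n_a = 0.15, 200
error_b, n_b = 0.22, 200

# Step 1: z statistic for the difference in errors
d = error_a - error_b
sigma = math.sqrt(error_a * (1 - error_a) / n_a + error_b * (1 - error_b) / n_b)
z = d / sigma

# Step 2: read p from the standard normal CDF instead of a table
p = norm.cdf(z)
print(z, p)
```
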
28

Q

what is a hypothesis

A

a model trained on a sample set

29

Q

how do we evaluate a model

A

multiple train-test trials and average the error rates

30

Q

what methods for data splitting are there

A

holdout
random subsampling
k fold cross validation
leave one out
bootstrap

31

Q

what is holdout

A

we have a data set split into a single test and train division

32

Q

what is random subsampling

A

we make k splits, each selecting a number of test samples at random; the rest are training samples

33

Q

what is k fold cross validation

A

we split the data into k partitions. all examples are used in train and test, but only in a test set once. low k = not enough trials, high k = small test set -> high variance

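A minimal sketch with scikit-learn's KFold (the dataset and k are hypothetical):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # hypothetical dataset of 10 samples
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Each sample appears in exactly one test fold across the 5 splits
for train_idx, test_idx in kf.split(X):
    print(len(train_idx), len(test_idx))  # 8 train, 2 test per fold
```
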
34

Q

what is leave one out

A

like k fold cross validation but with k = n, so only one sample is in the test set each time

35

Q

what is bootstrap

A

randomly select m samples, use these for training. use the rest for testing

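A sketch of one bootstrap split with NumPy, assuming the usual convention that the m training samples are drawn with replacement and the untouched (out-of-bag) samples form the test set:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # hypothetical dataset size
m = n   # the bootstrap typically draws n samples

# Draw training indices with replacement; duplicates are expected
train_idx = rng.integers(0, n, size=m)
# Out-of-bag samples (never drawn) become the test set
test_idx = np.setdiff1d(np.arange(n), train_idx)
print(train_idx, test_idx)
```
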
36

Q

what is hyperparameter selection

A

train models with different hyperparameter values and choose the one with the least error

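A minimal sketch of such a search with scikit-learn, estimating each candidate's error by cross-validation; the classifier, data, and candidate values are all hypothetical:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: 100 samples, 4 features, label from the first feature's sign
X = np.random.default_rng(0).normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)

# Try several values of the hyperparameter k and keep the least-error model
best_k, best_err = None, 1.0
for k in (1, 3, 5, 7):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    err = 1 - acc
    if err < best_err:
        best_k, best_err = k, err
print(best_k, best_err)
```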