Chapter 3: Machine Learning Experiments Flashcards

(36 cards)

1
Q

Describe the accuracy of a classification model

A

number of correctly classified samples / total number of samples

2
Q

describe the error of a classification model

A

number of incorrectly classified samples / total number of samples

3
Q

what is the problem with error and accuracy? what is a better alternative?

A

they are unreliable for imbalanced data

a better alternative is the confusion matrix

4
Q

how can we compute accuracy from a confusion matrix

A

sum of the diagonal (correctly classified samples) / total number of samples; for a binary classifier, (TP + TN) / (TP + TN + FP + FN)
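
A minimal sketch of this computation, assuming a binary confusion matrix in the [[TN, FP], [FN, TP]] layout (scikit-learn's convention; the counts are hypothetical):

```python
import numpy as np

# Hypothetical binary confusion matrix: [[TN, FP], [FN, TP]]
cm = np.array([[50, 10],
               [5, 35]])

# accuracy = correct predictions (the diagonal) / total samples
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.85
```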

5
Q

what do precision and recall measure about a classifier

A

its ability to classify positive samples

6
Q

recall =

A

TP / (TP + FN)

7
Q

Precision =

A

TP / (TP + FP)

8
Q

what is the f_1 score

A

the harmonic mean of precision and recall: 2 * (precision * recall) / (precision + recall)
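
A short sketch tying the last three cards together, using hypothetical TP/FP/FN counts:

```python
# Hypothetical counts from a binary confusion matrix
tp, fp, fn = 35, 10, 5

precision = tp / (tp + fp)  # ability to avoid false positives
recall = tp / (tp + fn)     # ability to find all positive samples
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```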

9
Q

what is specificity

A

a measure of a classifier's ability to correctly classify negative samples

10
Q

specificity =

A

TN / (TN + FP)

11
Q

1 - specificity is otherwise known as

A

false positive rate

12
Q

what is ROC analysis

A

applies to binary classifiers. we plot sensitivity (true positive rate) against 1 - specificity (false positive rate) as the decision threshold varies.

we want the area under the curve (AUC) to be as close to 1 as possible
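
A minimal sketch with scikit-learn (the labels and scores below are hypothetical):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical ground-truth labels and predicted positive-class scores
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

# Sweeping a threshold over y_score gives one (FPR, TPR) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))  # area under the curve; closer to 1 is better
```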

13
Q

what is error in a regression model

A

difference between predicted and desired output

14
Q

list types of error for a regression model

A

root mean square error (RMSE)
mean absolute error (MAE)
mean absolute percentage error (MAPE)
sum of squares error (SSE)
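
A compact NumPy sketch of all four metrics on hypothetical predictions:

```python
import numpy as np

# Hypothetical desired outputs and model predictions
y = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

err = y - y_hat
sse = np.sum(err ** 2)                 # sum of squares error
rmse = np.sqrt(np.mean(err ** 2))      # root mean square error
mae = np.mean(np.abs(err))             # mean absolute error
mape = np.mean(np.abs(err / y)) * 100  # mean absolute percentage error (%)
print(sse, rmse, mae, mape)
```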

15
Q

what is the coefficient of determination

A

the R^2 score in the single-output case:

R^2 = 1 - SSE / sum_i (y_i - ȳ)^2, where ȳ = (1/n) sum_i y_i is the mean of the desired outputs
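
Continuing the hypothetical NumPy sketch above (reusing y, y_hat, and sse):

```python
# R^2 = 1 - SSE / total sum of squares around the mean of y
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / ss_tot
print(r2)
```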

16
Q

what is sample error

A

the error computed using a performance metric from a set of samples

17
Q

what is true error

A

the probability a random sample is misclassified

18
Q

how are true and sample error different in regression

A

the true error is the expectation of the error over the data distribution, rather than a misclassification probability

19
Q

how do we get bias and variance values

A

from the expected squared prediction error

20
Q

what is bias error

A

(y - E[f])^2

repeat with different sets of training data and measure how far the average prediction is from the true value

21
Q

what is variance error

A

E[(f - E[f])^2]

repeat with different sets of training data and measure how much the prediction varies
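
Cards 19-21 fit together as the standard bias-variance decomposition of the expected squared prediction error; a sketch of the identity, where the σ^2 noise term is an assumption for noisy targets:

```
E[(y - f)^2] = (y - E[f])^2 + E[(f - E[f])^2] + σ^2
             =     bias^2   +     variance    + noise
```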

22
Q

what is overfitting

A

the model is overly complex: low bias, high variance

23
Q

what is underfitting

A

the model is too simple: high bias, low variance

24
Q

what is a confidence interval

A

a range quantifying how good an estimate of the true error is provided by the sample error

25

Q

what is a z test

A

a statistical test for comparing the error rates of two classifiers

26

Q

what is the confidence interval of a classifier

A

a range [error - a, error + a], where a = z_p * sqrt(error * (1 - error) / n) and n is the number of test samples

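A minimal sketch of this interval, assuming a 95% confidence level (z_p ≈ 1.96) and hypothetical values for the sample error and n:

```python
import math

error = 0.15  # hypothetical sample error
n = 200       # hypothetical number of test samples
z_p = 1.96    # z value for a 95% confidence level

a = z_p * math.sqrt(error * (1 - error) / n)
print(error - a, error + a)  # the confidence interval
```
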
27

Q

describe the steps of a z test

A

1. calculate z_p = d / σ, where d = error_A - error_B and σ = sqrt(σ_A^2 + σ_B^2)
2. look up p for z_p in a standard normal table
3. confidence c = 1 - (1 - p)/2

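A sketch of steps 1-2 with SciPy, assuming each classifier's variance is σ^2 = error * (1 - error) / n (the error rates and sample counts are hypothetical):

```python
import math
from scipy.stats import norm

# Hypothetical sample errors of two classifiers and their test-set sizes
error_a, n_a = 0.15, 200
error_b, n_b = 0.22, 200

# Step 1: z statistic for the difference in errors
d = error_a - error_b
sigma = math.sqrt(error_a * (1 - error_a) / n_a + error_b * (1 - error_b) / n_b)
z = d / sigma

# Step 2: read p from the standard normal CDF instead of a table
p = norm.cdf(z)
print(z, p)
```
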
28

Q

what is a hypothesis

A

a model trained on a sample set

29

Q

how do we evaluate a model

A

multiple train-test trials and average the error rates

30

Q

what methods for data splitting are there

A

holdout
random subsampling
k fold cross validation
leave one out
bootstrap

31

Q

what is holdout

A

we have a data set split into a single test and train division

32

Q

what is random subsampling

A

we make k splits, each selecting a number of test samples at random; the rest are training samples

33

Q

what is k fold cross validation

A

we split the data into k partitions. all examples are used in train and test, but only in a test set once. low k = not enough trials, high k = small test set -> high variance

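A minimal sketch with scikit-learn's KFold (the dataset and k are hypothetical):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # hypothetical dataset of 10 samples
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Each sample appears in exactly one test fold across the 5 splits
for train_idx, test_idx in kf.split(X):
    print(len(train_idx), len(test_idx))  # 8 train, 2 test per fold
```
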
34

Q

what is leave one out

A

like k fold cross validation but with k = n, so only one sample is in the test set each time

35

Q

what is bootstrap

A

randomly select m samples, use these for training. use the rest for testing

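A sketch of one bootstrap split with NumPy, assuming the usual convention that the m training samples are drawn with replacement and the untouched (out-of-bag) samples form the test set:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # hypothetical dataset size
m = n   # the bootstrap typically draws n samples

# Draw training indices with replacement; duplicates are expected
train_idx = rng.integers(0, n, size=m)
# Out-of-bag samples (never drawn) become the test set
test_idx = np.setdiff1d(np.arange(n), train_idx)
print(train_idx, test_idx)
```
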
36

Q

what is hyperparameter selection

A

train models with different hyperparameter values and choose the one with the least error

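A minimal sketch of such a search with scikit-learn, estimating each candidate's error by cross-validation; the classifier, data, and candidate values are all hypothetical:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: 100 samples, 4 features, label from the first feature's sign
X = np.random.default_rng(0).normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)

# Try several values of the hyperparameter k and keep the least-error model
best_k, best_err = None, 1.0
for k in (1, 3, 5, 7):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    err = 1 - acc
    if err < best_err:
        best_k, best_err = k, err
print(best_k, best_err)
```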