Midterm Flashcards

(36 cards)

1
Q

when to use ML?

A

too hard to hardcode by hand
automation
problem is changing frequently

2
Q

when to not use ML?

A

algorithm already exists
not enough data
ethical concerns
requires explanations, not just predictions

3
Q

what is the no free lunch theorem?

A

no model works best for every dataset

4
Q

what does regularization do?

A

lowers variance without raising bias much

5
Q

how to pick hyperparameters?

A

grid search, random search
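A minimal sketch of the two search strategies, in plain Python. The `score` function is a stand-in assumption: in practice it would train a model with the given hyperparameters and return a validation score.

```python
import itertools
import random

# Hypothetical score function (assumption for illustration): in practice
# this would train a model and return validation accuracy.
def score(lr, depth):
    return -(lr - 0.1) ** 2 - (depth - 4) ** 2

# Grid search: try every combination of the listed values.
grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
best_grid = max(itertools.product(grid["lr"], grid["depth"]),
                key=lambda p: score(*p))

# Random search: sample the same number of candidates from ranges instead.
random.seed(0)
candidates = [(random.uniform(0.01, 1.0), random.randint(2, 8))
              for _ in range(9)]
best_rand = max(candidates, key=lambda p: score(*p))
```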

6
Q

what is softmax regression?

A

train multiple logistic regressions
make sure probabilities all sum to 1
higher temperature means a less confident (flatter) probability distribution
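A small sketch of the softmax with a temperature parameter, assuming raw logits as input:

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by temperature; higher temperature flattens the
    # distribution (less confident predictions).
    z = [l / temperature for l in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

p1 = softmax([2.0, 1.0, 0.1], temperature=1.0)
p5 = softmax([2.0, 1.0, 0.1], temperature=5.0)
```

The probabilities always sum to 1, and raising the temperature shrinks the gap between the top class and the rest.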

7
Q

what is logistic regression?

A

turns linear regression into classification
outputs a probability
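A minimal sketch of that idea (weights and inputs here are made-up numbers): take the linear-regression output and squash it through the sigmoid to get a probability.

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1), so the output can be
    # read as the probability of the positive class.
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    # Linear regression output passed through the sigmoid.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

p = predict_proba([0.8, -0.5], bias=0.1, x=[1.0, 2.0])
label = 1 if p >= 0.5 else 0
```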

8
Q

hard margin SVM

A

tries to perfectly separate the two classes with the widest possible margin; fails if the classes are not linearly separable

9
Q

soft margin SVM

A

allow some points to be misclassified, as long as the mistakes are not too large

10
Q

kernels in svm

A

lets the model capture non-linear relationships without explicitly transforming the data (the kernel trick)
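A sketch of two common kernels computed directly on the raw inputs; the feature map they correspond to is never built explicitly:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    # Similarity corresponding to an implicit infinite-dimensional
    # feature map; we only ever compute the kernel value itself.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def polynomial_kernel(x, y, degree=3, c=1.0):
    # (x . y + c)^degree, equivalent to a polynomial feature map.
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** degree

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])  # identical points
k_far = rbf_kernel([1.0, 2.0], [5.0, 6.0])   # distant points
```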

11
Q

OvR vs OvO

A

OvR - train one model per class, each predicting whether an example belongs to that class or not
OvO - train one model for each possible pair of classes and compare

12
Q

MSE and MAE formulas

A

take difference, square/take absolute value, sum up, divide by n
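The recipe above, written out (example numbers are made up):

```python
def mse(y_true, y_pred):
    # Mean squared error: difference, squared, summed, divided by n.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean absolute error: difference, absolute value, summed, divided by n.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
```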

13
Q

advantages of MSE and MAE

A

MSE - differentiable, good for learning
MAE - result is interpretable, simple, less sensitive to outliers

14
Q

L1 regularization

A

drives the least important weights to exactly 0, giving sparse weights

15
Q

L2 regularization

A

shrinks all weights toward 0, keeping them small but rarely exactly 0

16
Q

Elastic net

A

combines l1 and l2
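The three penalty terms side by side, as a sketch (the `lam` and `alpha` names are just illustrative; alpha=1 gives pure L1, alpha=0 pure L2):

```python
def l1_penalty(weights, lam):
    # Sum of absolute values; pushes unimportant weights to exactly 0.
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # Sum of squares; shrinks all weights toward 0 but rarely to exactly 0.
    return lam * sum(w ** 2 for w in weights)

def elastic_net_penalty(weights, lam, alpha):
    # Weighted mix of the L1 and L2 penalties.
    return (alpha * l1_penalty(weights, lam)
            + (1 - alpha) * l2_penalty(weights, lam))

w = [0.5, -2.0, 0.0]
```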

17
Q

l0 regularization

A

penalizes the number of nonzero weights, pushing for as many exact 0s as possible (hard to optimize directly)

18
Q

what do gini impurity and entropy measure?

A

how impure the data is
a lower number means the data is more pure
decision trees prefer splits that produce purer child nodes
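Both measures computed directly from a node's class labels, as a sketch:

```python
import math

def gini(labels):
    # Gini impurity: 1 - sum(p_k^2); 0 means the node is pure.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    # Entropy: -sum(p_k * log2(p_k)); 0 means the node is pure.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

pure = ["a", "a", "a", "a"]
mixed = ["a", "a", "b", "b"]
```

A pure node scores 0 under both; a 50/50 node scores 0.5 (gini) and 1.0 (entropy).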

19
Q

pros and cons of decision trees

A

pros
- not much feature engineering
- interpretable
- applies to many tasks
cons
- can overfit
- high variance

20
Q

what is bagging

A

training lots of models of the same type, each on a different subset of the data
bagging samples with replacement (bootstrap sampling)
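The resampling step sketched in plain Python; each base model would be trained on one resample and their predictions averaged or majority-voted:

```python
import random

def bootstrap_sample(data, rng):
    # Draw n points WITH replacement: some points repeat,
    # others are left out of the resample entirely.
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)
data = list(range(10))
samples = [bootstrap_sample(data, rng) for _ in range(3)]
```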

21
Q

what is boosting

A

training many weak learners in sequence, each trained to correct the errors of the previous ones

22
Q

types of kernels

A

polynomial (quadratic is the degree-2 case), rbf

23
Q

when to use cross validation?

A

when the data set is small
when we want to do hyperparameter tuning
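A sketch of how k-fold cross validation splits the data: k roughly equal, non-overlapping folds, each used once as the validation set while the rest train.

```python
def kfold_indices(n, k):
    # Partition indices 0..n-1 into k roughly equal folds.
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 3)
# Fold i is the validation set on round i; the other folds are training data.
```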

24
Q

why have a validation set?

A

to make sure the model is not overfitted (hyperparameters)

25
Q

what is inductive bias?

A

bias from the assumptions a model makes by virtue of its design
26
Q

type 1 error

A

false positive, predicted positive when it should have been negative
27
Q

type 2 error

A

false negative, predicted negative when it should have been positive
28
Q

accuracy

A

correct predictions divided by total predictions
29
Q

recall

A

also called true positive rate
TP / (TP + FN)
how many of the actual positives were found
30
Q

false positive rate

A

FP / (FP + TN)
31
Q

precision

A

TP / (TP + FP)
how many of the predicted positives were actually positive
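The four rate metrics from the cards above, computed from confusion-matrix counts (the example counts are made up):

```python
def metrics(tp, fp, fn, tn):
    # Precision: of everything predicted positive, how much was right.
    precision = tp / (tp + fp)
    # Recall / true positive rate: of all actual positives, how many found.
    recall = tp / (tp + fn)
    # False positive rate: of all actual negatives, how many flagged positive.
    fpr = fp / (fp + tn)
    # Accuracy: correct predictions over total predictions.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, fpr, accuracy

p, r, fpr, acc = metrics(tp=8, fp=2, fn=4, tn=6)
```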
32
Q

how to set up a confusion matrix

A

actual on top, predicted on the side
positive/positive in the top left
33
Q

why does the precision-recall tradeoff exist?

A

precision and recall are roughly inversely related: predicting positive more often raises recall but lowers precision
34
Q

ensemble learning for classifiers

A

hard voting - predict the class with the most votes
soft voting - predict the class with the highest average probability
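A sketch of both voting schemes, with made-up model outputs; note they can disagree on the same inputs:

```python
from collections import Counter

def hard_vote(predictions):
    # Each model casts one class vote; take the majority class.
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(prob_lists):
    # Average each class's predicted probability across models,
    # then take the class with the highest average.
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / len(prob_lists)
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])

h = hard_vote([1, 0, 1])                                   # majority says 1
s = soft_vote([[0.4, 0.6], [0.9, 0.1], [0.45, 0.55]])      # averages favor 0
```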
35
Q

what is the base rate fallacy?

A

occurs when the positive label is rare; even with a good model, most positive predictions are then false positives
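A worked example with assumed numbers (1% base rate, 99% recall, 5% false positive rate): most positive predictions are still wrong.

```python
population = 10_000
positives = population // 100       # 1% base rate: 100 actual positives
negatives = population - positives  # 9,900 actual negatives

tp = positives * 99 // 100          # 99% recall: 99 true positives
fp = negatives * 5 // 100           # 5% FPR: 495 false positives

# Of all positive predictions, only a small fraction are correct.
precision = tp / (tp + fp)
```

Here precision is 99 / 594, roughly 17%, despite the test itself looking strong.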
36