Final Stuff Flashcards

(27 cards)

1
Q

MSE and MAE formulas

A

true - predicted, square (MSE) or take absolute value (MAE), sum up, divide by n
MSE = (1/n) Σ(yᵢ - ŷᵢ)², MAE = (1/n) Σ|yᵢ - ŷᵢ|
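
A minimal sketch of both in plain Python (function names are illustrative):

def mse(y_true, y_pred):
    # mean of squared differences: (1/n) * sum((t - p)^2)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # mean of absolute differences: (1/n) * sum(|t - p|)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([3, 5], [2, 7]))  # (1 + 4) / 2 = 2.5
print(mae([3, 5], [2, 7]))  # (1 + 2) / 2 = 1.5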

2
Q

advantages of MSE and MAE

A

MSE - differentiable everywhere, works well with gradient-based learning
MAE - result is interpretable (same units as the target), simple, less sensitive to outliers

3
Q

accuracy

A

correct predictions vs total predictions
in a confusion matrix: sum of the diagonal vs total

4
Q

recall/true positive rate

A

also called true positive rate
TP / (TP + FN)
out of all the actual positives, how many did the model get correct (recall)
first column of the confusion matrix

5
Q

false positive rate

A

FP / (FP + TN)
out of all the actual negatives, how many did the model incorrectly flag as positive
second column of the confusion matrix

6
Q

precision

A

TP / (TP + FP)
how many predicted positives were truly positive?
across the first row of the confusion matrix

7
Q

how to set up confusion matrix

A

actual on top (columns), predicted on the side (rows)
positive-positive (TP) in the top left
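
A sketch tying cards 3-7 together, using this layout (rows = predicted, columns = actual) with made-up counts:

# confusion matrix layout:   actual +   actual -
#            predicted +        TP         FP
#            predicted -        FN         TN
tp, fp = 40, 10  # first row: predicted positive
fn, tn = 5, 45   # second row: predicted negative
total = tp + fp + fn + tn

accuracy  = (tp + tn) / total   # diagonal / total
recall    = tp / (tp + fn)      # first column: actual positives
fpr       = fp / (fp + tn)      # second column: actual negatives
precision = tp / (tp + fp)      # first row: predicted positives
print(accuracy, recall, fpr, precision)  # 0.85 0.888... 0.1818... 0.8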

8
Q

false negative

A

we predict negative, but that is false (actually positive)

9
Q

false positive

A

we predict positive, but that is false (actually negative)

10
Q

types of transformers

A

encoder, decoder, encoder-decoder

11
Q

high level of how to train an LLM

A

pretraining - predict the next token on large amounts of unlabeled text
supervised fine-tuning - train on prompts paired with good responses
reinforcement learning from human feedback (RLHF) - humans rank responses and the model is optimized toward the preferred ones

12
Q

normalization vs regularization

A

normalization - rescaling inputs or activations so features are on the same scale
regularization - constraining the model (e.g. penalizing large weights) so it doesn't overfit
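
For instance, a quick sketch of input standardization, one common form of normalization (data is made up):

import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
# rescale each feature (column) to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std)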

13
Q

types of normalization

A

batch norm, L2 norm

14
Q

what are support vectors in SVM

A

the data points that are closest to the hyperplane; they alone determine the margin

15
Q

what is out of bag evaluation

A

evaluate each tree on the bootstrap rows it didn't see during training (its out-of-bag samples)
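
A sketch with scikit-learn, where oob_score=True does this automatically (synthetic data for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
# each tree is scored only on the rows left out of its bootstrap sample
print(forest.oob_score_)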

16
Q

what is calibration?

A

making sure the model's output probabilities reflect true confidence, e.g. predictions made with 80% confidence should be correct about 80% of the time
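
One common tool is scikit-learn's CalibratedClassifierCV, sketched here on synthetic data (method="sigmoid" is Platt scaling; "isotonic" is the other built-in option):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)
# wrap an uncalibrated classifier and learn a score -> probability mapping
calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
calibrated.fit(X, y)
print(calibrated.predict_proba(X[:3]))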

17
Q

2 examples of non parametric models

A

knn, decision trees
don’t make assumptions about the underlying distribution

18
Q

what is LIME?

A

a technique that fits a simple, interpretable model locally around a single data point to explain a complex model's prediction there (Local Interpretable Model-agnostic Explanations)

19
Q

what are proxy models?

A

simpler, interpretable models trained to mimic the behavior of a complex model

20
Q

why does the vanishing/exploding gradient problem occur?

A

backpropagation multiplies gradients layer by layer as we move to earlier layers; multiplying lots of small numbers drives the gradient toward zero (vanishing), lots of large numbers blows it up (exploding)
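
A tiny numeric illustration of a 50-layer chain of per-layer gradient factors:

print(0.9 ** 50)  # ~0.005: slightly-small factors -> vanishing gradient
print(1.1 ** 50)  # ~117:   slightly-large factors -> exploding gradient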

21
Q

types of autoencoders

A

deep (multiple layers), sparse, variational, etc.

22
Q

what is the loss function for logistic regression?

A

binary cross entropy (log loss); it is convex, so gradient descent won't get stuck in bad local minima
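
A minimal sketch of binary cross entropy over predicted probabilities (plain Python):

import math

def bce(y_true, p_pred):
    # mean of -[y*log(p) + (1-y)*log(1-p)]
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

print(bce([1, 0], [0.9, 0.2]))  # ~0.16: confident and correct
print(bce([1, 0], [0.2, 0.9]))  # ~1.96: confident and wrong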

23
Q

what are scaling laws?

A

empirical relationships that predict loss from model size and dataset size
how much will throwing more resources at the model improve it?
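
One widely cited concrete form is the Chinchilla fit (Hoffmann et al. 2022), where N is parameter count, D is training tokens, and E, A, B, α, β are empirically fitted constants:

L(N, D) = E + A/N^α + B/D^β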

24
Q

three svm kernels

A

linear, polynomial, RBF
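
A sketch of what each kernel computes between two points x and z (gamma, r, d are hyperparameters; the values here are illustrative):

import numpy as np

def linear(x, z):
    return x @ z

def polynomial(x, z, gamma=1.0, r=1.0, d=3):
    return (gamma * (x @ z) + r) ** d

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.linalg.norm(x - z) ** 2)

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear(x, z), polynomial(x, z), rbf(x, z))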

25
Q

why use kernels in SVM?

A

transform the data into a higher-dimensional space where it becomes linearly separable

26
Q

what are sampling layers in cnns?

A

pooling layers
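
A sketch of 2x2 max pooling with NumPy (the reshape trick assumes even dimensions):

import numpy as np

def max_pool_2x2(a):
    # take the max of each non-overlapping 2x2 block
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16).reshape(4, 4)
print(max_pool_2x2(a))  # [[ 5  7]
                        #  [13 15]]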

27
Q

what is grokking?

A

when test error falls long after training error has fallen; the model suddenly generalizes well