Final Stuff 2 Flashcards
(27 cards)
what is LIME?
Local Interpretable Model-agnostic Explanations: a technique that fits a simple, interpretable model locally around a single data point to explain a black-box model's prediction there
why does the vanishing/exploding gradient problem occur?
backpropagation multiplies per-layer gradients together (chain rule) as we move to earlier layers; multiplying many small numbers drives the gradient toward zero, and multiplying many large numbers blows it up
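A tiny plain-Python sketch (illustrative factors, not real gradients): the chain-rule product shrinks to nothing or blows up depending on the per-layer factor.

```python
# Illustrative only: the chain rule multiplies one gradient factor per layer.
depth = 50
vanish, explode = 1.0, 1.0
for _ in range(depth):
    vanish *= 0.5   # per-layer factor < 1 -> product shrinks
    explode *= 1.5  # per-layer factor > 1 -> product grows

print(f"{vanish:.3e}")   # ~8.9e-16: gradient has effectively vanished
print(f"{explode:.3e}")  # ~6.4e+08: gradient has exploded
```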
what is grokking?
when test error finally falls long after training error has fallen: the model suddenly starts generalizing well late in training.
what are support vectors in SVM
the data points that are closest to the hyperplane; they define the margin and hence the decision boundary
how to set up confusion matrix
actual labels on top (columns), predicted labels on the side (rows)
positive/positive in the top left
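A quick sketch of that layout (my own labels, not a library convention):

```python
# Columns = actual, rows = predicted, positive/positive top left:
#                 actual +   actual -
# predicted +        TP         FP
# predicted -        FN         TN
matrix = [["TP", "FP"],   # predicted-positive row
          ["FN", "TN"]]   # predicted-negative row
for label, row in zip(["pred +", "pred -"], matrix):
    print(label, row)
```

Note that scikit-learn's confusion_matrix uses the transposed convention (rows = actual), so check which layout your tooling assumes.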
what is the loss function for logistic regression?
binary cross-entropy (log loss); it is convex in the weights, so gradient descent converges to the global minimum
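A minimal numpy sketch of binary cross-entropy (toy labels and probabilities):

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """-(1/n) * sum(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.4])  # toy predicted probabilities
print(binary_cross_entropy(y_true, y_prob))  # ~0.400
```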
types of transformers
encoder, decoder, encoder-decoder
false positive rate
FP / (FP + TN)
second column of the confusion matrix (all actual negatives)
2 examples of non-parametric models
kNN, decision trees
they don't assume a fixed functional form for the underlying distribution; model complexity can grow with the data
recall/true positive rate
also called true positive rate
TP / (TP + FN)
out of all the actual positives, how many did the model get correct? (recall)
first column of the confusion matrix (all actual positives)
false positive
we predict positive, but that is false (actually negative)
advantages of MSE and MAE
MSE - differentiable everywhere, good for gradient-based learning
MAE - result is interpretable (same units as the target), simple, less sensitive to outliers
what is calibration?
making sure the model's predicted probabilities reflect how often it is actually correct (e.g., predictions made at 70% confidence should be right about 70% of the time)
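A rough numpy sketch of a calibration check (toy values, arbitrary two-bin split): within each confidence bin, the mean predicted probability should roughly match the observed positive rate.

```python
import numpy as np

y_prob = np.array([0.1, 0.2, 0.15, 0.8, 0.75, 0.9, 0.85, 0.6])  # toy probs
y_true = np.array([0,   0,   1,    1,   1,    1,   0,    1])    # toy labels

low = y_prob < 0.5  # split into a low- and a high-confidence bin
for name, mask in [("low bin", low), ("high bin", ~low)]:
    print(f"{name}: mean predicted {y_prob[mask].mean():.2f}, "
          f"observed positive rate {y_true[mask].mean():.2f}")
```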
three svm kernels
linear, polynomial, RBF
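A minimal scikit-learn sketch showing all three (toy data from make_classification):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

for kernel in ["linear", "poly", "rbf"]:  # the three kernels on the card
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))  # training accuracy, just to show usage
```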
what are the downsampling (subsampling) layers in CNNs?
pooling layers
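A minimal numpy sketch of 2x2 max pooling with stride 2, the most common pooling operation:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling, stride 2, on an (H, W) array with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5  7]
#  [13 15]]
```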
types of autoencoders
deep (multiple layers), sparse, variational, etc.
precision
TP / (TP + FP)
how many predicted positives were truly positive?
across the first row of the confusion matrix (all predicted positives)
accuracy
correct predictions vs total: (TP + TN) / total
the diagonal of the confusion matrix vs the total
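A small sketch tying the FPR, recall, precision, and accuracy cards together, computed from toy confusion-matrix counts:

```python
tp, fp, fn, tn = 40, 10, 5, 45  # toy counts

recall    = tp / (tp + fn)                   # first column (actual +)
fpr       = fp / (fp + tn)                   # second column (actual -)
precision = tp / (tp + fp)                   # first row (predicted +)
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # diagonal over total

print(f"recall={recall:.2f} fpr={fpr:.2f} "
      f"precision={precision:.2f} accuracy={accuracy:.2f}")
# recall=0.89 fpr=0.18 precision=0.80 accuracy=0.85
```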
what are scaling laws?
empirical relationships that take model size and data and predict loss
they answer: how much will throwing more resources at the model improve it?
MSE and MAE formulas
MSE = (1/n) Σ (y_i − ŷ_i)², MAE = (1/n) Σ |y_i − ŷ_i| (true minus predicted, square or take the absolute value, sum up, divide by n)
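A minimal numpy sketch of both (toy values):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # toy targets
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # toy predictions

err = y_true - y_pred
mse = np.mean(err ** 2)     # square, then average
mae = np.mean(np.abs(err))  # absolute value, then average
print(mse, mae)  # 0.375 0.5
```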
normalization vs regularization
normalization - rescaling features (or activations) so they are on the same scale
regularization - constrains the model so it doesn't overfit
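A minimal numpy sketch contrasting the two (toy arrays; lam is a hypothetical penalty strength):

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])  # toy features on very different scales
w = np.array([0.5, -0.2])     # toy model weights

# Normalization: rescale features to zero mean, unit variance.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Regularization: add a penalty on weight size to the loss (L2 shown).
lam = 0.1
l2_penalty = lam * np.sum(w ** 2)

print(X_norm.std(axis=0))  # both features now have unit scale
print(l2_penalty)          # extra loss term that discourages large weights
```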
false negative
we predict negative, but that is false (actually positive)
types of normalization
batch norm, L2 normalization
what is out of bag evaluation
evaluate each tree on the bootstrap samples that weren't used to train it (its out-of-bag data), giving a built-in validation estimate
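A minimal scikit-learn sketch: with oob_score=True, each sample is scored using only the trees whose bootstrap sample excluded it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)
print(forest.oob_score_)  # accuracy estimated from out-of-bag samples
```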