Final Stuff 2 Flashcards
(27 cards)
what is LIME?
Local Interpretable Model-agnostic Explanations: a technique that fits a simple, interpretable model locally around a single data point to explain a black-box model's prediction there
why does the vanishing/exploding gradient problem occur?
backpropagation multiplies per-layer gradients together (chain rule) as we move to earlier layers; multiplying many small numbers drives the gradient toward zero, and multiplying many large numbers blows it up
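A tiny plain-Python sketch (illustrative factors, not real gradients): the chain-rule product shrinks to nothing or blows up depending on the per-layer factor.

```python
# Illustrative only: the chain rule multiplies one gradient factor per layer.
depth = 50
vanish, explode = 1.0, 1.0
for _ in range(depth):
    vanish *= 0.5   # per-layer factor < 1 -> product shrinks
    explode *= 1.5  # per-layer factor > 1 -> product grows

print(f"{vanish:.3e}")   # ~8.9e-16: gradient has effectively vanished
print(f"{explode:.3e}")  # ~6.4e+08: gradient has exploded
```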
what is grokking?
when test error finally falls long after training error has fallen: the model suddenly starts generalizing well late in training.
what are support vectors in SVM
the data points that are closest to the hyperplane; they define the margin and hence the decision boundary
how to set up confusion matrix
actual labels on top (columns), predicted labels on the side (rows)
positive/positive in the top left
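A quick sketch of that layout (my own labels, not a library convention):

```python
# Columns = actual, rows = predicted, positive/positive top left:
#                 actual +   actual -
# predicted +        TP         FP
# predicted -        FN         TN
matrix = [["TP", "FP"],   # predicted-positive row
          ["FN", "TN"]]   # predicted-negative row
for label, row in zip(["pred +", "pred -"], matrix):
    print(label, row)
```

Note that scikit-learn's confusion_matrix uses the transposed convention (rows = actual), so check which layout your tooling assumes.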
what is the loss function for logistic regression?
binary cross-entropy (log loss); it is convex in the weights, so gradient descent converges to the global minimum
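A minimal numpy sketch of binary cross-entropy (toy labels and probabilities):

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """-(1/n) * sum(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.4])  # toy predicted probabilities
print(binary_cross_entropy(y_true, y_prob))  # ~0.400
```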
types of transformers
encoder, decoder, encoder-decoder
false positive rate
FP / (FP + TN)
second column of the confusion matrix (all actual negatives)
2 examples of non-parametric models
kNN, decision trees
they don't assume a fixed functional form for the underlying distribution; model complexity can grow with the data
recall/true positive rate
also called true positive rate
TP / (TP + FN)
out of all the actual positives, how many did the model get correct? (recall)
first column of the confusion matrix (all actual positives)
false positive
we predict positive, but that is false (actually negative)
advantages of MSE and MAE
MSE - differentiable everywhere, good for gradient-based learning
MAE - result is interpretable (same units as the target), simple, less sensitive to outliers
what is calibration?
making sure the model's predicted probabilities reflect how often it is actually correct (e.g., predictions made at 70% confidence should be right about 70% of the time)
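A rough numpy sketch of a calibration check (toy values, arbitrary two-bin split): within each confidence bin, the mean predicted probability should roughly match the observed positive rate.

```python
import numpy as np

y_prob = np.array([0.1, 0.2, 0.15, 0.8, 0.75, 0.9, 0.85, 0.6])  # toy probs
y_true = np.array([0,   0,   1,    1,   1,    1,   0,    1])    # toy labels

low = y_prob < 0.5  # split into a low- and a high-confidence bin
for name, mask in [("low bin", low), ("high bin", ~low)]:
    print(f"{name}: mean predicted {y_prob[mask].mean():.2f}, "
          f"observed positive rate {y_true[mask].mean():.2f}")
```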
three svm kernels
linear, polynomial, RBF
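A minimal scikit-learn sketch showing all three (toy data from make_classification):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

for kernel in ["linear", "poly", "rbf"]:  # the three kernels on the card
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))  # training accuracy, just to show usage
```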
what are the downsampling (subsampling) layers in CNNs?
pooling layers
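A minimal numpy sketch of 2x2 max pooling with stride 2, the most common pooling operation:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling, stride 2, on an (H, W) array with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5  7]
#  [13 15]]
```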
types of autoencoders
deep (multiple layers), sparse, variational, etc.
precision
TP / (TP + FP)
how many predicted positives were truly positive?
across the first row of the confusion matrix (all predicted positives)
accuracy
correct predictions vs total: (TP + TN) / total
the diagonal of the confusion matrix vs the total
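A small sketch tying the FPR, recall, precision, and accuracy cards together, computed from toy confusion-matrix counts:

```python
tp, fp, fn, tn = 40, 10, 5, 45  # toy counts

recall    = tp / (tp + fn)                   # first column (actual +)
fpr       = fp / (fp + tn)                   # second column (actual -)
precision = tp / (tp + fp)                   # first row (predicted +)
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # diagonal over total

print(f"recall={recall:.2f} fpr={fpr:.2f} "
      f"precision={precision:.2f} accuracy={accuracy:.2f}")
# recall=0.89 fpr=0.18 precision=0.80 accuracy=0.85
```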
what are scaling laws?
empirical relationships that take model size and data and predict loss
they answer: how much will throwing more resources at the model improve it?
MSE and MAE formulas
MSE = (1/n) Σ (y_i − ŷ_i)², MAE = (1/n) Σ |y_i − ŷ_i| (true minus predicted, square or take the absolute value, sum up, divide by n)
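A minimal numpy sketch of both (toy values):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # toy targets
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # toy predictions

err = y_true - y_pred
mse = np.mean(err ** 2)     # square, then average
mae = np.mean(np.abs(err))  # absolute value, then average
print(mse, mae)  # 0.375 0.5
```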
normalization vs regularization
normalization - rescaling features (or activations) so they are on the same scale
regularization - constrains the model so it doesn't overfit
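A minimal numpy sketch contrasting the two (toy arrays; lam is a hypothetical penalty strength):

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])  # toy features on very different scales
w = np.array([0.5, -0.2])     # toy model weights

# Normalization: rescale features to zero mean, unit variance.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Regularization: add a penalty on weight size to the loss (L2 shown).
lam = 0.1
l2_penalty = lam * np.sum(w ** 2)

print(X_norm.std(axis=0))  # both features now have unit scale
print(l2_penalty)          # extra loss term that discourages large weights
```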
false negative
we predict negative, but that is false (actually positive)
types of normalization
batch norm, L2 normalization
what is out of bag evaluation
evaluate each tree on the bootstrap samples that weren't used to train it (its out-of-bag data), giving a built-in validation estimate
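A minimal scikit-learn sketch: with oob_score=True, each sample is scored using only the trees whose bootstrap sample excluded it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)
print(forest.oob_score_)  # accuracy estimated from out-of-bag samples
```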