Machine Learning NanoDegree Flashcards
(23 cards)
apply bias and variance to ( underfitting || overfitting )
bias - underfitting, variance - overfitting
define the harmonic mean for (x, y)
2xy / (x + y)
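A minimal sketch of the formula above; the harmonic mean is dominated by the smaller of the two values:

```python
def harmonic_mean(x, y):
    # 2xy / (x + y); equal inputs return themselves,
    # unequal inputs are pulled toward the smaller value
    return 2 * x * y / (x + y)

print(harmonic_mean(4, 4))  # 4.0
print(harmonic_mean(1, 9))  # 1.8, much closer to 1 than the arithmetic mean (5)
```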
what is the f1 score
the harmonic mean of precision and recall - raises a flag if either value is small
what is precision
the percent of labeled positives that are actually positive
what is recall
the percent of actual positives that are labeled positive
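The three cards above can be sketched from confusion-matrix counts (the counts here are illustrative):

```python
def precision(tp, fp):
    # fraction of predicted positives that are actually positive
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of actual positives that are labeled positive
    return tp / (tp + fn)

def f1(p, r):
    # harmonic mean of precision and recall
    return 2 * p * r / (p + r)

p = precision(tp=8, fp=2)   # 0.8
r = recall(tp=8, fn=8)      # 0.5
print(f1(p, r))             # ≈ 0.615, pulled toward the smaller value
```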
What is fbeta score?
an F score that allows weighting towards either precision or recall: beta = 1 gives the harmonic mean (F1), beta > 1 weights towards recall, 0 < beta < 1 weights towards precision
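The standard F-beta formula, sketched to show how beta shifts the score between precision and recall:

```python
def fbeta(p, r, beta):
    # F_beta = (1 + beta^2) * p * r / (beta^2 * p + r)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

p, r = 0.9, 0.3
print(fbeta(p, r, 1))    # 0.45, the plain harmonic mean (F1)
print(fbeta(p, r, 2))    # ≈ 0.346, pulled toward recall (0.3)
print(fbeta(p, r, 0.5))  # ≈ 0.643, pulled toward precision (0.9)
```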
what is an ROC curve and how do you interpret it?
a plot of the true positive rate against the false positive rate across classification thresholds. The area under the curve close to 1 is good, 0.5 is random guessing
What is r2 score and how do you interpret it?
measures how much better a regression model fits the data than simply averaging all the points: r2 = 1 - SSE(model) / SSE(mean). close to 1 is good, close to 0 means no better than the mean
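A minimal sketch of the R² computation (data values are illustrative):

```python
def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # error of just predicting the mean
    return 1 - ss_res / ss_tot

y_true = [1, 2, 3, 4]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r2_score(y_true, y_pred))  # 0.98, much better than the mean
```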
What is the point of having a bias node in a NN layer?
To provide the constant or intercept.
What is the ‘perceptron trick’ to get a line to move closer to a point?
if a negative point is labeled positive, subtract the point vector (with a 1 appended for the bias) times the learning rate from the line's coefficients; if a positive point is labeled negative, add it instead
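A sketch of one step of the trick (names and numbers are illustrative): nudge the line w·x + b toward a misclassified point by the learning rate:

```python
def perceptron_step(w, b, point, label, lr=0.1):
    # predict 1 if w·x + b >= 0 else 0
    pred = 1 if sum(wi * xi for wi, xi in zip(w, point)) + b >= 0 else 0
    if pred == 1 and label == 0:
        # negative point labeled positive: subtract point (and 1 for bias) times lr
        w = [wi - lr * xi for wi, xi in zip(w, point)]
        b -= lr
    elif pred == 0 and label == 1:
        # positive point labeled negative: add instead
        w = [wi + lr * xi for wi, xi in zip(w, point)]
        b += lr
    return w, b

w, b = perceptron_step([1.0, 1.0], 0.0, point=[2.0, 1.0], label=0)
print(w, b)  # [0.8, 0.9] -0.1 — the line moved toward the misclassified point
```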
What is the formula for multi-class entropy?
-sum(p[i] * log2(p[i]) for each class i)
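The formula above as a runnable sketch over a list of class probabilities:

```python
import math

def entropy(probs):
    # -sum(p_i * log2(p_i)); skip zero probabilities, whose contribution is 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 — two equally likely classes, maximal uncertainty
print(entropy([1.0]))        # 0.0 — a certain outcome carries no entropy
print(entropy([0.25] * 4))   # 2.0 — four equally likely classes
```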
what does ‘naive’ refer to in naive bayes?
assuming that all features are conditionally independent of each other given the class.
a function must be ___ not ___ in order to be optimized
continuous, discrete
describe l2 regularization, including its alternate name
also called ridge regression, l2 regularization adds the sum of the squared coefficients to the cost function, scaled by a hyperparameter lambda. This penalizes the model for being too complex and reduces overfitting.
describe l1 regularization, including its alternate name
also called lasso regression, l1 regularization adds the sum of the absolute values of the coefficients to the cost function, scaled by a hyperparameter lambda. This penalizes the model for being too complex and reduces overfitting. It drives less important coefficients to 0 and is thus suitable for feature selection.
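The two penalties above, sketched as additions to a base cost (the coefficients, lambda, and base cost are illustrative):

```python
def l1_penalty(coefs, lam):
    # lasso: lambda * sum of absolute coefficient values
    return lam * sum(abs(c) for c in coefs)

def l2_penalty(coefs, lam):
    # ridge: lambda * sum of squared coefficient values
    return lam * sum(c ** 2 for c in coefs)

coefs = [3.0, -0.5, 0.0]
base_cost = 1.2  # hypothetical unregularized cost (e.g. MSE)
print(base_cost + l1_penalty(coefs, lam=0.1))  # 1.55
print(base_cost + l2_penalty(coefs, lam=0.1))  # 2.125 — large coefficients hurt more under l2
```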
What is a polynomial kernel and what does degree refer to?
A polynomial kernel of degree 2 projects two-dimensional data into 5 dimensions by adding the terms x^2, xy, and y^2. Higher-degree polynomials add more exponents and combinations, and therefore more dimensions.
is softmax for n=2 the same as sigmoid activation ?
yes
how is softmax defined?
for a vector of scores z, softmax(z[i]) = e^z[i] / sum(e^z[j] for all j)
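A sketch of the definition, also checking the earlier card that softmax for n=2 matches sigmoid (for logits (z, 0), the first softmax output is exactly sigmoid(z)):

```python
import math

def softmax(z):
    # e^z_i / sum of e^z_j; outputs are positive and sum to 1
    exps = [math.exp(zi) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

z = 1.3
print(softmax([z, 0.0])[0])  # e^z / (e^z + 1) ...
print(sigmoid(z))            # ... which is the same value as 1 / (1 + e^-z)
```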
What is cross entropy ?
-sum(y[i] * log(p[i])) over classes; with one-hot labels this reduces to -log(P) for the true class
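A minimal sketch with a one-hot label vector (the probabilities are illustrative):

```python
import math

def cross_entropy(y, p):
    # -sum(y_i * log(p_i)); with one-hot y, only the true class term survives
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

y = [0, 1, 0]          # true class is index 1
p = [0.2, 0.7, 0.1]    # model's predicted probabilities
print(cross_entropy(y, p))  # same as -log(0.7)
print(-math.log(0.7))
```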
What is the chain rule?
the derivative of a composition of functions is equal to the product of the derivatives of each of the functions
What is a monotonic function?
A function which is either entirely non-decreasing or non-increasing.
What is early stopping?
Stop training when the cross-validation error starts to increase
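A minimal early-stopping sketch (names, patience value, and error curve are illustrative): stop once validation error stops improving for a few epochs, and keep the best epoch seen:

```python
def train_with_early_stopping(val_errors, patience=2):
    # val_errors stands in for the per-epoch validation error of a real training loop
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # error has stopped improving: stop training
    return best_epoch, best

# validation error dips, then rises: training stops and epoch 2 is kept
print(train_with_early_stopping([0.9, 0.5, 0.3, 0.4, 0.6]))  # (2, 0.3)
```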
how do you use a nn for regression instead of classification?
remove the final activation function and let it return the result of the last layer.