# Logistic Regression Flashcards

What is a disadvantage of a linear model for a binary response?

Predicted probabilities may fall below 0 or above 1.
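A minimal sketch of this problem on simulated data. The data, the coefficients -2 and 3, and all variable names are assumptions made for illustration; none of them come from the flashcards.

```r
# Simulated data (an assumption for illustration): binary outcome driven by x
set.seed(1)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = plogis(-2 + 3 * x))

# Linear probability model: ordinary least squares on the 0/1 outcome
lpm <- lm(y ~ x)
lpm_pred <- fitted(lpm)

# Logistic regression on the same data for comparison
logit_fit <- glm(y ~ x, family = binomial)
logit_pred <- fitted(logit_fit)

range(lpm_pred)    # the linear fit is unbounded, so it can leave [0, 1]
range(logit_pred)  # the logistic fit stays between 0 and 1
```

Because the linear fit is a straight line in x, extreme values of x push its predictions outside the valid range for a probability.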

What does logit(p) equal?

ln(p/(1-p)) = β0 + β1*x (β1 is the expected increase in log-odds when x increases by one unit)
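The logit and its inverse can be sketched in a few lines of R. The helper names `logit` and `inv_logit` are hypothetical; `qlogis()` and `plogis()` are the equivalent base R functions.

```r
# Hand-written logit and inverse logit (helper names are illustrative)
logit <- function(p) log(p / (1 - p))
inv_logit <- function(eta) 1 / (1 + exp(-eta))

p <- c(0.1, 0.5, 0.9)
log_odds <- logit(p)

# Base R provides the same transformations
all.equal(log_odds, qlogis(p))       # qlogis is the logit
all.equal(inv_logit(log_odds), p)    # the inverse recovers p
all.equal(plogis(log_odds), p)       # plogis is the inverse logit
```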

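What is the intercept on the odds scale?

e^β0 (the odds when x = 0)

What is the slope on the odds scale?

e^β1 (the factor by which the odds are multiplied when x increases by one unit)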
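A short numerical check of both interpretations, using hypothetical coefficients (β0 = -3 and β1 = 0.5 are assumptions chosen only for illustration):

```r
# Hypothetical coefficients, not estimated from any data
b0 <- -3
b1 <- 0.5

# Odds implied by the model ln(p/(1-p)) = b0 + b1 * x
odds <- function(x) exp(b0 + b1 * x)

odds_at_zero <- odds(0)                 # equals exp(b0)
odds_ratio_low <- odds(1) / odds(0)     # equals exp(b1)
odds_ratio_high <- odds(11) / odds(10)  # also exp(b1): same ratio at any x
```

The ratio is the same wherever x starts, which is why e^β1 is read as a multiplicative effect on the odds.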

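Can the estimated β1 be interpreted as the change in the probability that Y = 1 associated with a one-unit change in X?

No. Only the log-odds are linear in X; the probability is not, so the change in probability from a one-unit change in X depends on the starting value of X.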
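To illustrate, the sketch below evaluates the change in probability from a one-unit increase in x at three starting points. The coefficients are hypothetical (assumed values, not estimates).

```r
# Hypothetical coefficients, chosen only for illustration
b0 <- -3
b1 <- 0.5

# Probability implied by the logistic model
prob <- function(x) plogis(b0 + b1 * x)

# Change in probability for a one-unit increase, at different starting values
x0 <- c(0, 6, 12)
delta_p <- prob(x0 + 1) - prob(x0)
delta_p  # largest near p = 0.5 (x0 = 6), smaller in both tails
```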

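What is sensitivity?

TP/P, the share of actual positives that are correctly identified (prioritize it when a FN is more costly than a FP). Raise sensitivity by classifying more observations as 'Yes': fewer FN but more FP, so specificity is reduced.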

What is the true positive rate?

TP/P (= sensitivity = 1 - Type II error rate)

What is the false positive rate?

FP/N (= 1 - specificity = Type I error rate)

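What is the positive predictive value?

TP/P̂ (precision), where P̂ is the number of observations predicted positive

What is the negative predictive value?

TN/N̂, where N̂ is the number of observations predicted negative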
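The rates above can be computed from a confusion matrix as in the following sketch. The ten observations and the helper `classification_rates` are made up for illustration.

```r
# Toy data (assumed): 4 actual positives, 6 actual negatives
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
prob   <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1)

# Hypothetical helper: classify as 'Yes' when prob >= threshold, then
# compute the rates from the four confusion-matrix cells
classification_rates <- function(actual, prob, threshold = 0.5) {
  predicted <- as.integer(prob >= threshold)
  tp <- sum(predicted == 1 & actual == 1)
  fp <- sum(predicted == 1 & actual == 0)
  tn <- sum(predicted == 0 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  c(sensitivity = tp / (tp + fn),   # TP / P
    specificity = tn / (tn + fp),   # TN / N
    fpr         = fp / (fp + tn),   # FP / N = 1 - specificity
    precision   = tp / (tp + fp),   # TP / predicted positives
    npv         = tn / (tn + fn))   # TN / predicted negatives
}

at_half    <- classification_rates(actual, prob, threshold = 0.5)
at_quarter <- classification_rates(actual, prob, threshold = 0.25)
```

Lowering the threshold from 0.5 to 0.25 classifies more observations as 'Yes': in this toy data sensitivity rises from 3/4 to 1 while specificity drops from 5/6 to 4/6.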

What does the ROC (Receiver Operating Characteristic) curve trace out?

The true positive rate (y-axis) against the false positive rate (x-axis) as the probability threshold varies from 0 to 1.

AUC is the area under the ROC curve. What does it measure?

It measures the overall performance of the classifier across all thresholds (max AUC = 1); the larger the AUC, the better the classifier.
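A minimal sketch of how the ROC points and the AUC could be computed by hand. The toy data and the helpers `roc_points` and `auc_trapezoid` are assumptions for illustration; the area is approximated with the trapezoid rule.

```r
# Toy data (assumed): 4 actual positives, 6 actual negatives
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
score  <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1)

# Hypothetical helper: TPR and FPR at every distinct threshold.
# Inf and -Inf are added so the curve starts at (0, 0) and ends at (1, 1).
roc_points <- function(actual, score) {
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE), -Inf)
  tpr <- sapply(thresholds, function(t) sum(score >= t & actual == 1) / sum(actual == 1))
  fpr <- sapply(thresholds, function(t) sum(score >= t & actual == 0) / sum(actual == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Hypothetical helper: area under the curve by the trapezoid rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_points(actual, score)
auc <- auc_trapezoid(roc)
auc
```

Calling `plot(roc$fpr, roc$tpr, type = "s")` would draw the curve for these points.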

What is the chance line?

The 45-degree diagonal of the ROC plot, which is what random guessing produces (AUC = 0.5). No useful classifier should fall below this line.
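As a rough check, the sketch below scores simulated observations with pure noise and with an informative score. Everything here is an assumption for illustration, including the helper `rank_auc`, which computes the AUC from ranks (the Mann-Whitney form) rather than by tracing the curve.

```r
# Hypothetical helper: AUC from the ranks of the scores (Mann-Whitney form)
rank_auc <- function(actual, score) {
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  r <- rank(score)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated data (assumed): outcomes plus two competing scores
set.seed(7)
actual <- rbinom(2000, size = 1, prob = 0.5)
random_score <- runif(2000)                             # unrelated to the outcome
informative_score <- plogis(2 * actual + rnorm(2000))   # related to the outcome

auc_random <- rank_auc(actual, random_score)            # should sit close to 0.5
auc_informative <- rank_auc(actual, informative_score)  # should sit well above 0.5
```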

For cross-validation of a classifier, what is used instead of the MSE?

The number of misclassified observations.
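A sketch of k-fold cross-validation that counts misclassified observations in each fold. The simulated data, the choice of k = 5, and the 0.5 threshold are all assumptions for illustration.

```r
# Simulated data (assumed) with a binary response
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 2 * x))
dat <- data.frame(x = x, y = y)

# Randomly assign each observation to one of k folds
k <- 5
fold <- sample(rep(1:k, length.out = n))

misclassified <- numeric(k)
for (i in 1:k) {
  train    <- dat[fold != i, ]
  held_out <- dat[fold == i, ]
  fit <- glm(y ~ x, data = train, family = binomial)
  p_hat <- predict(fit, newdata = held_out, type = "response")
  y_hat <- as.integer(p_hat > 0.5)               # classify at threshold 0.5
  misclassified[i] <- sum(y_hat != held_out$y)   # error count replaces the MSE
}

total_errors <- sum(misclassified)
cv_error_rate <- total_errors / n   # share of all observations misclassified
```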

How do you convert a factor variable to numeric for a linear regression? (The fitted values can be negative, so this model is ignored.)

library(ISLR)  # assumption: the Default data set comes from the ISLR package

Default$default_yes = ifelse(Default$default == "Yes", 1, 0)

lm_fit = lm(default_yes ~ balance, data = Default)

summary(lm_fit)