# 06.b Logistic Regression Flashcards

1
Q

What is Logistic Regression

A

Logistic regression is a supervised predictive technique. It is used to describe data and to explain the relationship between one categorical dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables by estimating probabilities using a logistic function.

2
Q

What type of output variable comes from Logistic Regression

A

A categorical one. When the outcome variable is categorical in nature, logistic regression can be used to predict the likelihood of an outcome based on the input variables.

3
Q

Name four use cases for Logistic Regression

A

Medical
Finance
Marketing
Engineering

4
Q

What shape is the common Logistic Curve

A

An S-shaped curve. It approaches zero at the bottom left and one at the top right, with an S shape joining the two corners.

5
Q

What is the Logistic Function (equation)

A

f(y) = e^y / (1+e^y) for -infinity < y < +infinity
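A minimal sketch of the function on this card in R, evaluated at a few arbitrary values of y to show the S shape. The name `logistic` is chosen here for illustration; `plogis()` is the built-in equivalent that ships with R.

```r
# Logistic function as written on the card: f(y) = e^y / (1 + e^y)
logistic <- function(y) exp(y) / (1 + exp(y))

# A few arbitrary inputs, from very negative to very positive
y <- c(-10, -2, 0, 2, 10)
p <- logistic(y)
print(round(p, 4))

# plogis() computes the same function
print(all.equal(p, plogis(y)))
```

The output stays strictly between 0 and 1 and passes through 0.5 at y = 0.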

6
Q

What is MLE in terms of Logistic Regression

A

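MLE stands for Maximum Likelihood Estimation. It is the method used to fit the model: the coefficients are chosen to maximise the likelihood of the observed outcomes.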

7
Q

What does churn mean

A

Churn refers to the likelihood that a customer will switch to another company

8
Q

Which function should you use for Logistic Regression in R

A

The Generalised Linear Model function glm()

OutputDF = glm(Churned ~ Age + Married + Cust_Years + Churned_Contacts, data = churn_input, family = binomial(link = "logit"))
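The call above needs a `churn_input` data frame, which the card does not show. The sketch below simulates a stand-in so the call can be run end to end. The column values and the coefficients used to generate `Churned` are made up purely for illustration, and `model` is just a name chosen here for the fitted object.

```r
# Simulated stand-in for churn_input; the real data set is not shown on the card
set.seed(42)
n <- 500
churn_input <- data.frame(
  Age              = round(runif(n, 18, 70)),
  Married          = rbinom(n, 1, 0.5),
  Cust_Years       = round(runif(n, 0, 10)),
  Churned_Contacts = rpois(n, 1)
)

# Made-up coefficients, used only to generate an outcome column
log_odds <- 1.5 - 0.05 * churn_input$Age - 0.2 * churn_input$Cust_Years +
  0.6 * churn_input$Churned_Contacts
churn_input$Churned <- rbinom(n, 1, plogis(log_odds))

# Same formula and family as on the card
model <- glm(Churned ~ Age + Married + Cust_Years + Churned_Contacts,
             data = churn_input, family = binomial(link = "logit"))

# Coefficients, null and residual deviance, and AIC
print(summary(model))

# Predicted churn probabilities for the first few customers
print(head(predict(model, type = "response")))
```

`predict(..., type = "response")` returns probabilities; the default `type = "link"` returns the linear predictor (log-odds) instead.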

9
Q

Describe Odds

A

The Odds of something happening are the chances of it happening divided by the chances of it not happening: odds = p / (1 - p).

10
Q

Describe Probability

A

The Probability of something happening is the chances of A happening divided by the chances of all possible results.
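A small worked example of the difference between odds and probability, assuming a fair six-sided die and the event "roll a six". The die is an illustration chosen here, not part of the original cards.

```r
# Rolling a six with a fair die: 1 favourable result, 5 unfavourable, 6 in total
favourable   <- 1
unfavourable <- 5

probability <- favourable / (favourable + unfavourable)  # 1 out of 6
odds        <- favourable / unfavourable                 # 1 to 5

# Converting between the two
odds_from_p <- probability / (1 - probability)
p_from_odds <- odds / (1 + odds)

print(c(probability = probability, odds = odds))
print(c(odds_from_p = odds_from_p, p_from_odds = p_from_odds))
```

Probability is bounded by 0 and 1, while odds can be any non-negative number.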

11
Q

Once you have calculated the Generalised Linear Model for y which equation should you use to calculate the probability

A

p = e^y / (1 + e^y)
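A sketch of the step from the linear predictor y to a probability, using the logistic function with 1 + e^y in the denominator. The coefficients and the customer's values below are hypothetical and are not taken from any fitted model.

```r
# Hypothetical coefficients and one hypothetical customer, for illustration only
intercept  <- -1.0
b_age      <- -0.02
b_contacts <-  0.8
age        <- 40
contacts   <- 2

# Linear predictor: this is the log-odds, not yet a probability
y <- intercept + b_age * age + b_contacts * contacts

# Logistic function turns the log-odds into a probability
p <- exp(y) / (1 + exp(y))
print(c(y = y, p = p))

# Going back: the log of the odds recovers y
print(log(p / (1 - p)))
```

Here y works out to -0.2, giving a probability of roughly 0.45.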

12
Q

What is the Akaike Information Criterion (AIC)

A

You can look at AIC as the counterpart of adjusted R-squared in multiple linear regression. It is an important indicator of model fit and follows the rule: the smaller the better. AIC penalises increasing the number of coefficients in the model, which helps to avoid over-fitting.
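A sketch of how the penalty works, using the standard definition AIC = 2k - 2 * log-likelihood, where k is the number of estimated coefficients. The data are simulated, and `x2` is deliberately generated as noise unrelated to the outcome; all variable names are chosen here for illustration.

```r
# Simulated data: x1 drives the outcome, x2 is pure noise
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 + 1.2 * x1))

small <- glm(y ~ x1,      family = binomial)
large <- glm(y ~ x1 + x2, family = binomial)

# AIC by hand: 2 * (number of coefficients) - 2 * log-likelihood
k          <- length(coef(small))
manual_aic <- 2 * k - 2 * as.numeric(logLik(small))
print(c(manual = manual_aic, builtin = AIC(small)))

# Compare the two models: the extra coefficient costs 2 points of AIC,
# so it only pays off if it improves the log-likelihood enough
print(c(small = AIC(small), large = AIC(large)))
```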

13
Q

In Logistic Regression what is the Null Deviance

A

The Null Deviance is the deviance of a model whose likelihood function is based only on the intercept term, with no predictors

14
Q

In Logistic Regression what is the Residual Deviance

A

The Residual Deviance is the deviance of the model whose likelihood function is based on the parameters in the specified logistic model

15
Q

In Logistic Regression how do you calculate a Pseudo - R squared

A

Pseudo R Squared = 1 - (Residual Deviance / Null Deviance)
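A sketch of where the two deviances come from in R and how they combine into the pseudo R squared. A `glm` object stores them as `null.deviance` and `deviance`. The data are simulated and the variable names are chosen here for illustration.

```r
# Simulated data with one predictor
set.seed(7)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + 1.0 * x))

model <- glm(y ~ x, family = binomial)

null_dev     <- model$null.deviance  # intercept-only model
residual_dev <- model$deviance       # model with x included

pseudo_r2 <- 1 - residual_dev / null_dev
print(c(null = null_dev, residual = residual_dev, pseudo_r2 = pseudo_r2))

# Cross-check: the null deviance matches an explicitly fitted intercept-only model
intercept_only <- glm(y ~ 1, family = binomial)
print(all.equal(null_dev, deviance(intercept_only)))
```

The closer the residual deviance is to the null deviance, the closer the pseudo R squared is to zero, meaning the predictors add little over the intercept alone.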

16
Q

How is the Deviance of an observation calculated

A

-2 * log (likelihood of that observation)
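A sketch of the per-observation calculation for a binary outcome, assuming `log` means the natural logarithm and that the likelihood of an observation is the predicted probability of the class that actually occurred. The outcomes and predicted probabilities are hypothetical.

```r
# Hypothetical outcomes (1 = event happened) and predicted probabilities of the event
actual    <- c(1, 0, 1, 1, 0)
predicted <- c(0.9, 0.2, 0.6, 0.3, 0.5)

# Likelihood of each observation: p when the outcome is 1, (1 - p) when it is 0
likelihood <- ifelse(actual == 1, predicted, 1 - predicted)

# Deviance of each observation
obs_deviance <- -2 * log(likelihood)
print(round(obs_deviance, 3))

# Summing over the observations gives the deviance of the whole set of predictions
total_deviance <- sum(obs_deviance)
print(total_deviance)
```

The fourth observation contributes the most: the event happened, but it was given a probability of only 0.3.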

17
Q

What is a confusion matrix

A

A table of Actual Class (AC) against Predicted Class (PC) showing true and false positives and negatives

| | PC Positives (1) | PC Negatives (0) |
|---|---|---|
| AC Positives (1) | True Pos | False Neg |
| AC Negatives (0) | False Pos | True Neg |

A good classifier should have high True (Pos & Neg) and low False (Pos & Neg)
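A sketch of building this table in R with `table()`, using hypothetical actual and predicted classes. Setting the factor levels to `c(1, 0)` puts the positives first, matching the layout described on the card.

```r
# Hypothetical actual and predicted classes (1 = positive, 0 = negative)
actual    <- c(1, 1, 1, 0, 0, 0, 1, 0, 0, 1)
predicted <- c(1, 0, 1, 0, 1, 0, 1, 0, 0, 1)

# Rows are the Actual Class (AC), columns the Predicted Class (PC)
cm <- table(
  AC = factor(actual,    levels = c(1, 0)),
  PC = factor(predicted, levels = c(1, 0))
)
print(cm)

# Pull out the four cells by row and column label
TP <- cm["1", "1"]
FN <- cm["1", "0"]
FP <- cm["0", "1"]
TN <- cm["0", "0"]
print(c(TP = TP, FN = FN, FP = FP, TN = TN))
```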

18
Q

What is the true positive rate (TPR)

A

TPR = TP / (TP +FN)

All the TP divided by all the actual Positives

19
Q

What is the false positive rate (FPR)

A

FPR = FP / ( FP + TN)

All the FP divided by all the actual Negatives

20
Q

What is the true negative rate (TNR)

A

TNR = TN / (FP + TN)

All the TN divided by all the actual Negatives

21
Q

What is the false negative rate (FNR)

A

FNR = FN / (TP + FN)

All of the FN divided by all the actual Positives
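The four rates on cards 18 to 21 can be checked together. The sketch below uses hypothetical confusion-matrix counts chosen for easy arithmetic.

```r
# Hypothetical counts: 50 actual positives and 50 actual negatives
TP <- 40; FN <- 10
FP <- 5;  TN <- 45

TPR <- TP / (TP + FN)  # share of actual positives that were caught
FNR <- FN / (TP + FN)  # share of actual positives that were missed
FPR <- FP / (FP + TN)  # share of actual negatives wrongly flagged
TNR <- TN / (FP + TN)  # share of actual negatives correctly left alone

print(c(TPR = TPR, FNR = FNR, FPR = FPR, TNR = TNR))

# Each pair shares a denominator, so each pair sums to 1
print(c(TPR + FNR, FPR + TNR))
```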

22
Q

What is Accuracy of a Confusion Matrix

A

Accuracy = (TP + TN) / (TP + TN + FP + FN)

So the correct ones / everything

23
Q

What is Precision of a Confusion Matrix

A

Precision = TP / (TP + FP)

P for Precision, all of the P’s! TP / all of the predicted positives

24
Q

What is Recall of a Confusion Matrix

A

Recall = TP / ( TP + FN )

Which is the same as the TPR

25
Q

How do you calculate the F Score of a Confusion Matrix

A

FScore = 2 x ((Precision x Recall) / (Precision + Recall))
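A sketch computing Accuracy, Precision, Recall and the F Score from one set of hypothetical counts, so the four measures on cards 22 to 25 can be compared side by side.

```r
# Hypothetical confusion-matrix counts
TP <- 40; FN <- 10
FP <- 5;  TN <- 45

accuracy  <- (TP + TN) / (TP + TN + FP + FN)  # correct ones / everything
precision <- TP / (TP + FP)                   # TP / all predicted positives
recall    <- TP / (TP + FN)                   # TP / all actual positives (the TPR)

f_score <- 2 * (precision * recall) / (precision + recall)

print(round(c(accuracy = accuracy, precision = precision,
              recall = recall, f_score = f_score), 4))
```

The F Score is the harmonic mean of Precision and Recall, so it sits between the two and is pulled towards the lower one.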

26
Q

When would you consider using Ridge Regression or Lasso Regression

A

In the case of multicollinearity you could consider using Ridge or Lasso regression because they apply penalties based on the size of the coefficients in an effort to reduce the impact of the multicollinearity

27
Q

What is a ROC curve

A

It is a Receiver Operating Characteristic curve
It is a plot of the TPR against the FPR
A 45 degree line represents a classifier that is no better than guessing (as many correct as wrong)
A straight line up the y axis and a flat line across the top represents 100% accuracy

ROC Curve is looking at the Positives!
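A sketch of how the points on the curve are produced: sweep the classification threshold and record the FPR and TPR at each step. The labels and scores are hypothetical, and the area under the curve is estimated with the trapezoid rule; the names `roc` and `auc` are chosen here for illustration.

```r
# Hypothetical labels (1 = positive) and predicted probabilities
actual <- c(1, 1, 0, 1, 0, 1, 0, 0, 1, 0)
score  <- c(0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10)

# Sweep the threshold from above the highest score down to zero
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE), 0)

# One (FPR, TPR) point per threshold
roc <- t(sapply(thresholds, function(th) {
  pred <- as.integer(score >= th)
  c(FPR = sum(pred == 1 & actual == 0) / sum(actual == 0),
    TPR = sum(pred == 1 & actual == 1) / sum(actual == 1))
}))
print(roc)

# Area under the curve by the trapezoid rule: 0.5 matches the 45 degree line,
# 1 matches the perfect classifier
fpr <- roc[, "FPR"]
tpr <- roc[, "TPR"]
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)

# To draw it:
# plot(fpr, tpr, type = "b", xlab = "FPR", ylab = "TPR"); abline(0, 1, lty = 2)
```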

28
Q

In logistic regression what is the default threshold

A

50% (a predicted probability above 0.5 is classed as a positive)

29
Q

What is another name for a leaf node

A

A class label