Section 2 Logistic Regression Flashcards
(35 cards)
What is w0 in linear regression?
Parameter w0 is the intercept, allowing for any fixed offset in the data. It is often called the bias.
How are the regression coefficients found in linear regression?
Maximisation of the log-likelihood is equivalent to minimisation of the squared-error loss function: both lead to the same optimisation problem and hence the same coefficient estimates.
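As a minimal sketch of this equivalence (the data and parameter values below are illustrative, not from the course), the least-squares solution can be computed directly from the normal equations, and under Gaussian errors it coincides with the maximum-likelihood estimate:

```python
import numpy as np

# Illustrative data: intercept column plus one predictor, Gaussian noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
true_w = np.array([2.0, -1.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

# Minimise the squared loss ||y - Xw||^2 via the normal equations X'X w = X'y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)  # close to [2.0, -1.5]
```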
What is the motivation for logistic regression?
Logistic regression models a binary categorical target variable given a collection of predictors. The motivation for logistic regression is to map the real line to the interval (0, 1) in order to model Pr(Y = 1 | X). Logistic regression provides an interpretable model where we can understand how the traits of individual observations contribute to their classification.
In linear regression what is the error assumption?
We assume the errors in our linear regression model are normally distributed with zero mean and constant variance σ².
Explain classification
Predicting the value of a categorical Y using inputs X can be called classification, since we assign the observation to a category, or class.
What are linear regression models used for
Linear regression models the relationship between a numerical response (taking values on the real line) and multiple predictors using a linear model. This cannot be used directly for a binary/categorical response.
What is logistic regression?
Logistic regression is used to model a binary categorical variable given a collection of predictors. It maps the real line to the interval (0, 1). The logistic regression model defines the probability p as a function of the parameters and predictors, in terms of the logistic function.
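A minimal sketch of the logistic function (the parameter values w0 and w1 below are illustrative): it squashes any logit value on the real line into a probability in (0, 1).

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# p = Pr(Y = 1 | X = x) as a function of the logit w0 + w1 * x.
w0, w1 = -1.0, 2.0   # illustrative parameter values
x = 0.5
p = logistic(w0 + w1 * x)  # logit = 0, so p = 0.5
```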
Interpret w0
The intercept w0 is the value of the logit when the predictors xi are zero. The intercept w0 is sometimes called the bias.
Interpret W weights
The coefficient wj measures the effect of the variable Xj on the logit: a one-unit increase in Xj changes the logit by wj, holding the other predictors fixed.
R function to fit a logistic regression
glm() is used to fit logistic regression models (with family = binomial). glm() fits generalised linear models, which allow the response variable to follow distributions other than the normal.
What does R return using glm function?
Estimates of the weight parameters, standard errors of these estimates, z-values and p-values.
Define a hypothesis test to determine if a variable has a significant effect on the target variable
To determine if a variable Xj has a significant effect on the target variable Y , we may wish to perform the hypothesis test:
H0: wj = 0 vs HA: wj ≠ 0
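This test can be carried out with the Wald statistic reported by glm(): z is the estimate divided by its standard error, and the p-value comes from the standard normal approximation. A minimal sketch (the estimate 0.8 and standard error 0.25 below are made-up numbers for illustration):

```python
import math

def wald_test(w_hat, se):
    """Wald test of H0: wj = 0 from a coefficient estimate and its std error."""
    z = w_hat / se
    # Two-sided p-value under the standard normal approximation,
    # using the normal CDF written via the error function.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

z, p = wald_test(0.8, 0.25)  # illustrative numbers: z = 3.2
```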
Name three estimation options for parameter estimates
Maximum likelihood estimation, minimisation of the least-squares loss function, or minimisation of the mean squared error loss function (the last two differ only by a factor of 1/n and give the same estimates).
What is the response variable modelled as under logistic regression?
The response variable is modelled by a Bernoulli distribution given the values of the input variables.
How is wj generally estimated?
By maximising the log-likelihood or minimising a loss function. No closed-form solution for wj is available, so the optimisation is performed numerically using software. In the GLM literature, maximisation of the logistic regression log-likelihood is performed using the Newton-Raphson algorithm.
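A sketch of the Newton-Raphson iteration for logistic regression (the function name, data and settings below are illustrative, not the course's code). Each step solves a weighted least-squares-style linear system built from the gradient and Hessian of the log-likelihood:

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25):
    """Maximise the logistic regression log-likelihood by Newton-Raphson."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # fitted probabilities
        W = p * (1.0 - p)                  # Bernoulli variances (Hessian weights)
        grad = X.T @ (y - p)               # gradient of the log-likelihood
        hess = X.T @ (X * W[:, None])      # negative Hessian
        w = w + np.linalg.solve(hess, grad)  # Newton step
    return w

# Illustrative simulated data: intercept plus one predictor.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_w = np.array([0.5, 1.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
w_hat = fit_logistic_newton(X, y)  # close to [0.5, 1.0]
```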
Why would mean squared error loss function not make sense for logistic regression data
Mean squared error measures squared distances to a numerical target; for a target variable that only takes the values 0 and 1 this is not a natural measure of fit, so a likelihood-based (cross-entropy) loss is used instead.
Explain information theory
Information theory revolves around the quantification of the information of an event with respect to its likelihood of happening: the rarer the event, the more information its occurrence carries.
Define entropy
Entropy is used to quantify the information over the entire probability distribution P: H(P) = −Σᵢ pᵢ log(pᵢ). Entropy allows us to measure and quantify the variability of a categorical variable, by giving a measure of how spread out the probability values are.
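A minimal sketch of this definition (the distributions below are illustrative): a uniform distribution is maximally spread out, so it has the largest entropy, while a peaked distribution has low entropy.

```python
import math

def entropy(P):
    """Entropy H(P) = -sum_i p_i * log(p_i), in nats; 0 * log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in P if p > 0)

h_uniform = entropy([0.25, 0.25, 0.25, 0.25])  # equals log(4), the maximum
h_peaked = entropy([0.97, 0.01, 0.01, 0.01])   # much smaller
```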
Define cross entropy
Suppose P denotes the probability distribution of interest, while Q is a probability distribution used to estimate P.
Cross-entropy measures the expected "surprisal" of an observer with probabilities Q after seeing data actually generated according to probabilities P: H(P, Q) = −Σᵢ pᵢ log(qᵢ). Replacing one set of probabilities by another from a different distribution quantifies the difference between the two distributions.
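A minimal sketch (the distributions P and Q below are illustrative): the cross-entropy H(P, Q) is never smaller than the entropy H(P) = H(P, P), with equality only when Q = P (Gibbs' inequality), which is why it works as a measure of mismatch.

```python
import math

def cross_entropy(P, Q):
    """Cross-entropy H(P, Q) = -sum_i p_i * log(q_i), in nats."""
    return -sum(p * math.log(q) for p, q in zip(P, Q) if p > 0)

P = [0.7, 0.2, 0.1]        # true distribution (illustrative)
Q = [0.4, 0.4, 0.2]        # estimating distribution (illustrative)
hpq = cross_entropy(P, Q)  # expected surprisal of an observer using Q
hpp = cross_entropy(P, P)  # equals the entropy H(P), the minimum possible
```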
What is the gradient of a loss function
The gradient ∇l(w; D) of the loss function is the vector of all partial derivatives: ∇l(w; D) = (∂l/∂w0, ∂l/∂w1, …, ∂l/∂wp).
Explain gradient descent concept
Using the information from the gradient, a first-derivative-based algorithm can be devised to efficiently locate a local minimum by moving in the direction of the negative gradient, as it is always "downhill". This iterative optimization is called gradient descent.
What is eta in gradient descent optimisation algorithm
The learning rate η determines the size of the step and is usually set to be small: if the steps are too big you risk overshooting the optimal point, while if they are too small the optimisation will take a long time. The steps are not necessarily the same every time, but their size is proportional to η.
When does gradient descent algorithm converge
The gradient descent algorithm converges when all the elements of the gradient are (numerically) zero.
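Putting these cards together, a sketch of gradient descent for logistic regression with the cross-entropy loss (the function names, data and the values of η and the tolerance below are illustrative): at each step the weights move against the gradient, and the loop stops once every element of the gradient is numerically zero.

```python
import numpy as np

def gradient_descent(X, y, eta=0.5, n_iter=10000, tol=1e-6):
    """Minimise the mean cross-entropy loss of logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # current fitted probabilities
        grad = X.T @ (p - y) / len(y)      # gradient of the mean cross-entropy
        if np.all(np.abs(grad) < tol):     # converged: gradient numerically zero
            break
        w = w - eta * grad                 # step in the negative gradient direction
    return w

# Illustrative simulated data: intercept plus one predictor, true w = (0, 1).
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(400), rng.normal(size=400)])
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-(X @ np.array([0.0, 1.0]))))).astype(float)
w_hat = gradient_descent(X, y)  # close to [0.0, 1.0]
```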
Explain concept of complete separation
Complete separation in a logistic regression, sometimes also referred to as perfect prediction, happens when the outcome variable separates a predictor variable completely.
Logistic regression tries to fit a sigmoid curve to the data.
The coefficient w measures the slope (steepness) of the curve. As the value of w increases to ∞, we get a better and better fit to completely separated data (a clear vertical boundary between the two data sections), so the maximum likelihood estimate does not converge to a finite value.