Multiple regression and logistic regression


Flashcards in Multiple regression and logistic regression Deck (16)

Describe multiple regression

With a single explanatory variable we would carry out a simple linear regression.
With several explanatory variables we use multiple regression, a more general model that includes more than one explanatory variable at a time: y = b0 + b1x1 + b2x2 + ... + bkxk + error
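A minimal sketch of how the coefficients b0, b1, ..., bk are estimated by ordinary least squares, written in plain Python so it has no dependencies. It solves the normal equations (X'X)b = X'y with Gaussian elimination; the function names `fit_ols` and `solve` are illustrative, not from any library. In practice a statistics package would be used instead.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, last unknown first.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + list(row) for row in X]  # column of ones for the intercept b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2, so the fit should recover [1, 2, 3].
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
print(fit_ols(X, y))
```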


What are the assumptions for multiple regression?

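the relationship between the response and each explanatory variable is linear
residuals are normally distributed
residuals have mean 0
residuals have constant variance (homoscedasticity)
observations are independent
explanatory variables (predictors) are measured without error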
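Residual assumptions are usually checked with plots, but a rough numerical sketch can make the ideas concrete. This assumes you already have observed values `y` and fitted values from a model; `residual_summary` is a hypothetical helper. It reports the residual mean (which should be close to 0) and the ratio of residual variance in the upper versus lower half of the fitted values, a crude hint of non-constant variance when far from 1. It does not test normality or independence.

```python
from statistics import mean, pvariance


def residual_summary(y, fitted):
    """Crude checks on residuals e = y - fitted; needs at least 4 observations."""
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Order residuals by fitted value, then compare the spread of the two halves.
    ordered = [e for _, e in sorted(zip(fitted, resid))]
    half = len(ordered) // 2
    ratio = pvariance(ordered[half:]) / pvariance(ordered[:half])
    return {"mean": mean(resid), "variance_ratio": ratio}


# Hypothetical observed and fitted values.
summary = residual_summary([1, 2, 3, 4, 5, 6], [1.1, 1.9, 3.2, 3.8, 5.1, 5.9])
print(summary)
```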


What does the regression line indicate?

The regression line (a plane or surface when there are several explanatory variables) indicates the nature of the relationship between the explanatory variables and the response


What does the coefficient for each variable represent?

it gives the expected change in the response for a one-unit increase in that variable, assuming the other variables do not change
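A small sketch of this interpretation, using hypothetical coefficients rather than a fitted model: raising x1 by one unit while holding x2 fixed changes the prediction by exactly b1. The `predict` helper is illustrative.

```python
def predict(coefs, x):
    """Prediction from y = b0 + b1*x1 + ... + bk*xk, with coefs = [b0, b1, ..., bk]."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))


coefs = [10.0, 2.5, -1.0]  # hypothetical b0, b1, b2
base = predict(coefs, [4.0, 7.0])
bumped = predict(coefs, [5.0, 7.0])  # x1 increased by one unit, x2 unchanged
print(bumped - base)  # 2.5, the value of b1
```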


What can the coefficients' standard errors be used for?

to create confidence intervals for the coefficients, and to test whether each coefficient differs from 0
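A confidence interval for a coefficient has the form b ± crit × SE. This sketch takes the critical value as a parameter, since the exact value comes from the t distribution on n − k − 1 degrees of freedom (n observations, k explanatory variables), which the Python standard library does not provide. If `crit` is omitted, it falls back to the normal quantile as a large-sample approximation; `coef_ci` is a hypothetical helper name.

```python
from statistics import NormalDist


def coef_ci(b, se, crit=None, level=0.95):
    """Confidence interval b +/- crit * se.

    crit should be the t critical value on n - k - 1 degrees of freedom;
    if omitted, the normal quantile is used as a large-sample approximation.
    """
    if crit is None:
        crit = NormalDist().inv_cdf(0.5 + level / 2)
    return b - crit * se, b + crit * se


print(coef_ci(2.5, 0.5, crit=2.0))  # (1.5, 3.5)
print(coef_ci(2.5, 0.5))            # normal approximation, crit about 1.96
```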


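What is the null hypothesis (H0) in multiple regression?

for each coefficient, H0: the true coefficient = 0 (the variable adds nothing once the other variables are in the model)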
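The test of H0 for a single coefficient uses t = b / SE. This sketch computes t and a two-sided p-value from the normal distribution, which is only a large-sample approximation to the t distribution on n − k − 1 degrees of freedom that regression software would use. `coef_test` is a hypothetical helper name.

```python
from statistics import NormalDist


def coef_test(b, se):
    """Test of H0: coefficient = 0.

    Returns t = b / se and an approximate two-sided p-value
    (normal approximation to the t distribution).
    """
    t = b / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


print(coef_test(2.5, 0.5))  # large t, small p: evidence against H0
print(coef_test(0.1, 1.0))  # small t, large p: no evidence against H0
```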


Why is the adjusted R squared used instead of the raw value in multiple regression?

The ordinary R squared never decreases when another variable is added to the model, regardless of whether the variable has any predictive ability


Describe adjusted R squared

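the most commonly used selection criterion
it adjusts the estimated proportion of variance explained by the model for the number of explanatory variables, so it can fall when a variable that adds little is included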
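A sketch of the standard adjustment, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), for n observations and k explanatory variables. The R² values below are hypothetical and chosen to show the point of the previous card: a fourth variable raises R² slightly, but adjusted R² goes down.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


with_three = adjusted_r2(0.600, n=30, k=3)
with_four = adjusted_r2(0.605, n=30, k=4)  # R^2 rose a little, adjusted R^2 falls
print(with_three, with_four)
```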


What are the other selection criteria used in multiple regression?

predicted R squared
Mallows' Cp
Principle of parsimony


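Describe predicted R squared

based on cross-validation: each observation is left out in turn, the model is refitted and used to predict it
takes overfitting into account, because a model that fits noise in the sample predicts left-out observations poorly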
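A minimal sketch of predicted R² = 1 − PRESS/SST, where PRESS is the sum of squared leave-one-out prediction errors. To keep it short it uses a single explanatory variable; the same loop applies to a multiple regression fit. Function names and data are illustrative.

```python
from statistics import mean


def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def predicted_r2(xs, ys):
    """Predicted R^2 = 1 - PRESS / SST using leave-one-out cross-validation."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = mean(ys)
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - press / sst


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # hypothetical, roughly linear data
print(predicted_r2(xs, ys))
```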


Describe Mallows' Cp

penalises models for having too many variables; models with a small Cp, ideally close to the number of parameters p, are preferred
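A sketch of the usual formula, Cp = SSE_p / MSE_full − n + 2p, where SSE_p is the residual sum of squares of the candidate model, MSE_full is the residual mean square of the model containing all candidate variables, and p counts the candidate model's parameters including the intercept. The numbers are hypothetical. A useful check: the full model always gives Cp = p exactly.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp = SSE_p / MSE_full - n + 2p (p includes the intercept)."""
    return sse_p / mse_full - n + 2 * p


n = 30
sse_full, p_full = 52.0, 4           # hypothetical full model: 3 variables + intercept
mse_full = sse_full / (n - p_full)   # residual mean square = 2.0
print(mallows_cp(56.0, mse_full, n, p=3))          # candidate with 2 variables
print(mallows_cp(sse_full, mse_full, n, p_full))   # full model: equals p_full
```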


Describe the principle of parsimony

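keep it simple: of models that explain the data about equally well, prefer the one with fewer explanatory variables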


What are model searching methods?

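stepwise regression - usually based on forward selection (add variables one at a time) or backward elimination (start with all variables and remove them one at a time)
best subsets regression - fits every possible combination of the explanatory variables and compares the models using a selection criterion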
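A sketch contrasting the two search strategies. Both take a `score` function where higher is better, standing in for a criterion such as adjusted R² computed from a fitted model; here the scores come from a hypothetical lookup table instead of real fits, and the variable names are made up. Best subsets checks all 2^k − 1 models, while forward selection greedily adds one variable at a time and stops when the score no longer improves, so it can miss the best model.

```python
from itertools import combinations


def best_subsets(variables, score):
    """Score every non-empty subset of the variables and return the best one."""
    candidates = (
        subset
        for size in range(1, len(variables) + 1)
        for subset in combinations(variables, size)
    )
    return max(candidates, key=score)


def forward_selection(variables, score):
    """Add the variable that most improves the score; stop when none improves it."""
    chosen, best = (), float("-inf")
    remaining = list(variables)
    while remaining:
        trial = max(remaining, key=lambda v: score(chosen + (v,)))
        trial_score = score(chosen + (trial,))
        if trial_score <= best:
            break
        chosen, best = chosen + (trial,), trial_score
        remaining.remove(trial)
    return chosen


# Hypothetical criterion values for each model (e.g. adjusted R^2).
TOY_SCORES = {
    frozenset({"age"}): 0.30, frozenset({"bmi"}): 0.25, frozenset({"smoker"}): 0.10,
    frozenset({"age", "bmi"}): 0.45, frozenset({"age", "smoker"}): 0.33,
    frozenset({"bmi", "smoker"}): 0.28, frozenset({"age", "bmi", "smoker"}): 0.44,
}


def score(subset):
    return TOY_SCORES[frozenset(subset)]


variables = ["age", "bmi", "smoker"]
print(best_subsets(variables, score), forward_selection(variables, score))
```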


What are the potential pitfalls of multiple regression?

overfitting - too many variables for a small sample size
typically the sample size should be at least 10 times the number of variables considered
the final number of variables in the model should not exceed the square root of the sample size
collinearity - explanatory variables that are highly correlated with each other; it is usually best to remove one of them before fitting the regression
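A sketch of the two rules of thumb above plus a simple collinearity screen based on pairwise Pearson correlations. The 0.9 cut-off and the helper names are assumptions for illustration, not fixed standards; formal diagnostics such as variance inflation factors would normally be used as well.

```python
import math
from statistics import mean


def sample_size_checks(n, k_considered, k_final):
    """n >= 10 * variables considered; final variables <= sqrt(n)."""
    return {
        "enough_data": n >= 10 * k_considered,
        "final_model_small_enough": k_final <= math.sqrt(n),
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.9):
    """Flag pairs of explanatory variables with |r| above an assumed threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical predictors: height recorded twice in different units.
columns = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "age": [30, 25, 40, 35],
}
print(sample_size_checks(100, 8, 6), collinear_pairs(columns))
```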


Describe logistic regression

A logistic regression model is appropriate when we are interested in modelling a binary response or dependent variable; it models the log odds of the outcome, log(p / (1 - p)), as a linear function of the explanatory variables
The response can be related to one or more risk factors or covariates, which may be either categorical or continuous
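A sketch of how a fitted logistic model turns a covariate into a probability, using hypothetical coefficients for the model log(p / (1 − p)) = b0 + b1·x. It also shows that exp(b1) is the odds ratio for a one-unit increase in x. Fitting the coefficients (usually by maximum likelihood) is left to statistical software and not shown here.

```python
import math


def logistic(z):
    """Inverse logit: maps the linear predictor to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


def odds(p):
    return p / (1 - p)


b0, b1 = -2.0, 0.8  # hypothetical fitted coefficients
p3 = logistic(b0 + b1 * 3.0)
p4 = logistic(b0 + b1 * 4.0)
# The ratio of odds for x = 4 versus x = 3 equals exp(b1), the odds ratio.
print(p3, p4, odds(p4) / odds(p3), math.exp(b1))
```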


Describe multiple logistic regression

the model can be extended to examine the influence of many explanatory variables at once
needs a large sample size
checking the assumptions and diagnosing errors is tricky, so seek statistical help