W3 Multiple Linear Regression Flashcards
what is multiple linear regression
a model of the linear relationship between the dependent variable Y and 2 or more independent x variables
equation of line in multiple regression
Y = B0 + B1X1 + B2X2 + … + BkXk + E
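A minimal sketch of how the fitted equation produces a prediction. The function name and the coefficient values are hypothetical, chosen only for illustration; the error term E is left out because it is not known when predicting.

```python
def predict(intercept, slopes, xs):
    """Return b0 + b1*x1 + ... + bk*xk for one observation."""
    if len(slopes) != len(xs):
        raise ValueError("need one x value per slope coefficient")
    return intercept + sum(b * x for b, x in zip(slopes, xs))

# Hypothetical fitted model: Y = 10 + 2*X1 - 0.5*X2
y_hat = predict(10.0, [2.0, -0.5], [3.0, 4.0])
print(y_hat)  # 10 + 6 - 2 = 14.0
```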
what is r^2, the coefficient of multiple determination
the proportion of the variation in Y that is explained by the set of independent x variables
why is using adjusted r^2 more reliable
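plain r^2 never decreases when a new x variable is added to the model, even a useless one; adjusted r^2 accounts for the degrees of freedom lost, so it rises only if the new variable improves the fit enough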
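A short sketch of the usual adjusted r^2 formula, 1 - (1 - r^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of x variables. The numbers are made up to show a case where r^2 rises slightly but adjusted r^2 falls.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for k independent variables fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: a third variable nudges r^2 from 0.750 to 0.752 (n = 20).
before = adjusted_r2(0.750, n=20, k=2)
after = adjusted_r2(0.752, n=20, k=3)
print(before, after)  # adjusted r^2 drops even though r^2 went up
```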
what is a residual error
difference between the actual (observed) Y value and the Y value predicted by the model
should residuals be random or not
random
confidence interval for the population slope B
sample slope b +- critical t value * standard error of b, using t with n - k - 1 degrees of freedom
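The interval arithmetic can be sketched as below. The slope, standard error and sample size are hypothetical, and the critical t value is assumed to come from a t table (2.1098 is the two-tailed 95% value for 17 degrees of freedom).

```python
def slope_confidence_interval(b, std_error, t_critical):
    """Return (lower, upper) limits for the population slope."""
    margin = t_critical * std_error
    return b - margin, b + margin

# Hypothetical: b = 2.5, standard error = 0.4, n = 20, k = 2, so df = 17.
low, high = slope_confidence_interval(b=2.5, std_error=0.4, t_critical=2.1098)
print(low, high)
```

If the interval does not contain 0, the variable has a significant linear relationship with Y at that confidence level.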
how to know when to reject the null hypothesis
reject when the F statistic is greater than the critical F value
how to test contributions of a single variable
fit the model with all variables
fit the model with all variables except the one we're testing, then compare the two
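One common way to compare the two fits is a partial F test: the drop in the regression sum of squares (SSR) when the variable is removed, divided by the MSE of the full model. This is a sketch under that assumption; the SSR, MSE and critical F values are hypothetical (4.45 is taken as the table value for 1 and 17 degrees of freedom at the 0.05 level).

```python
def partial_f_single(ssr_all, ssr_without, mse_all):
    """Partial F statistic for the contribution of one variable."""
    return (ssr_all - ssr_without) / mse_all

# Hypothetical sums of squares from the two fitted models.
f_stat = partial_f_single(ssr_all=120.0, ssr_without=100.0, mse_all=4.0)
critical_f = 4.45  # assumed table lookup, not computed here
reject = f_stat > critical_f
print(f_stat, reject)  # 5.0 True
```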
why might you want to test contributions of a single variable
the variable may only look useful because of other variables already in the model; the test shows whether it adds explanatory power beyond them
what does the coefficient of partial determination tell us
the proportion of the variation in Y explained by 1 variable when the others are held constant
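A sketch of the standard formula: the variable's contribution to SSR divided by (SST - SSR(all) + that contribution). The function name and all sums of squares are hypothetical.

```python
def partial_determination(ssr_all, ssr_without, sst):
    """Coefficient of partial determination for one variable."""
    contribution = ssr_all - ssr_without  # SSR added by the variable
    return contribution / (sst - ssr_all + contribution)

# Hypothetical: SST = 188, SSR(all) = 120, SSR(without the variable) = 100.
r2_partial = partial_determination(ssr_all=120.0, ssr_without=100.0, sst=188.0)
print(r2_partial)  # 20 / 88
```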
when are dummy variables used
when data is categorical
what two values does a dummy variable take as an x in the equation, in place of numerical data
1 for present
0 for absent
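A small sketch of the 1/0 coding. The category labels and the helper name are made up for illustration.

```python
def to_dummy(values, present):
    """Code each value as 1 if it equals the 'present' category, else 0."""
    return [1 if v == present else 0 for v in values]

# Hypothetical categorical column: does the house have a fireplace?
fireplace = ["yes", "no", "no", "yes"]
x_dummy = to_dummy(fireplace, present="yes")
print(x_dummy)  # [1, 0, 0, 1]
```

In the fitted equation the dummy's coefficient then acts as a shift in predicted Y between the two categories, with the other x variables held constant.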
how to test interaction between independent variables
F = [(ssr(all) - ssr(all except new variables)) / number of new variables] / mse(all)
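The partial F statistic for a set of new terms (here, interaction terms) can be sketched as follows, assuming the extra SSR is divided by the number of new terms before dividing by the full model's MSE. With a single interaction term that divisor is 1. All numbers are hypothetical.

```python
def partial_f(ssr_full, ssr_reduced, n_new_terms, mse_full):
    """Partial F statistic for n_new_terms added to the reduced model."""
    if n_new_terms < 1:
        raise ValueError("need at least one new term")
    return ((ssr_full - ssr_reduced) / n_new_terms) / mse_full

# Hypothetical: adding one interaction term X1*X2 raises SSR from 120 to 130.
f_interaction = partial_f(ssr_full=130.0, ssr_reduced=120.0,
                          n_new_terms=1, mse_full=3.5)
print(f_interaction)
```

The result is compared with the critical F value in the same way as any other F test.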
when is logistic regression used
when the Y variable is binary (a dummy variable), e.g.
prefer A or B
voted or didn’t vote
In what industry is logistic regression used
machine learning and AI
what is odds ratio
prob of event of interest / (1 - prob of event of interest)
what is estimated odds ratio
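e^ln(estimated odds ratio), where ln(estimated odds ratio) = b0 + b1X1 + b2X2 + … + bkXk from the fitted logistic model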
what is estimated probability
estimated odds ratio / (1 + estimated odds ratio)
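The three logistic regression quantities above can be chained as in this sketch. It assumes the log of the estimated odds ratio is the linear combination b0 + b1X1 + ... from a fitted model; the coefficients and x value are hypothetical.

```python
import math

def odds_ratio(prob):
    """Odds of the event of interest given its probability."""
    return prob / (1 - prob)

def estimated_odds_ratio(intercept, slopes, xs):
    """e raised to the fitted log odds b0 + b1*x1 + ... + bk*xk."""
    log_odds = intercept + sum(b * x for b, x in zip(slopes, xs))
    return math.exp(log_odds)

def estimated_probability(est_odds):
    """Convert an estimated odds ratio back to a probability."""
    return est_odds / (1 + est_odds)

odds = odds_ratio(0.75)                              # 0.75 / 0.25 = 3.0
est_odds = estimated_odds_ratio(-1.0, [0.5], [2.0])  # log odds = 0, so odds = 1.0
est_prob = estimated_probability(est_odds)           # 1 / 2 = 0.5
print(odds, est_odds, est_prob)
```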
when testing to see if a non-linear model should be used, how do we pick the model with the best fit
the one with the highest r^2 (use adjusted r^2 when the models have different numbers of x variables)
two types of transformations to transform non-linear models into linear ones
square root
log
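A sketch of both transformations on made-up data. The log example assumes an exponential relationship Y = e^(b0 + b1X), so that ln(Y) is linear in X; the helper `fit_line` is a hypothetical one-variable least squares fit written only for this illustration.

```python
import math

def fit_line(xs, ys):
    """Least squares intercept and slope for one x variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]  # hypothetical exponential data

# Log transformation: regress ln(Y) on X to recover a straight line.
b0, b1 = fit_line(xs, [math.log(y) for y in ys])
print(b0, b1)  # close to 0.5 and 0.3

# Square root transformation: use sqrt(X) as the x variable instead of X.
sqrt_xs = [math.sqrt(x) for x in xs]
```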
what is the problem with collinearity in regression
you cannot hold one variable constant while another changes because the variables are so closely related, so their separate effects on Y cannot be isolated
what to do in the case of collinear variables
avoid using them together in the regression, or choose one to include
indications that collinearity has happened
incorrect signs on coefficients
large change in the value of an existing coefficient when a new variable is added
standard error of the estimate (the model's estimated standard deviation) increases when a new variable is added