Week 5 DSE Flashcards
(29 cards)
What is the disadvantage of running simple linear regressions separately instead of multiple?
ignores potential confounding factors or synergy effects, leading to misleading results
If there are 2 or more independent variables, how do we find the best fit?
use least squares to find the regression plane that best fits the data.
If there are 2 predictors, how many parameters are we supposed to estimate
3
(in general, p + 1)
What is the error term of the ith point?
ϵi = yi − ŷi
What does b2 =188 mean?
Holding the expenditure on x1 constant, each 1-unit increase in x2 increases predicted sales by 188 units
How many degrees of freedom does RSE have for a multiple linear model?
n − (p + 1), i.e. RSE = sqrt( RSS / (n − (p + 1)) )
What is R^2?
fraction of variance explained
What is a flaw of R^2 in multilinear regression?
value never decreases, even if we add redundant variables to the regression model.
What causes the flaw in R^2 in multilinear regression?
least squares solves for the coefficients such that RSS is minimized
If a variable does not improve model fit, its estimated coefficient will be zero, BUT R^2 remains unchanged.
If a variable improves model fit, its estimated coefficient will be nonzero ⇒ R^2 increases.
So R^2 cannot decrease; it can only stay the same or increase
Use adjusted R^2
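A quick check of this flaw in R (a sketch using the built-in mtcars data; the noise variable is invented purely for illustration):

```r
# R^2 never decreases when a predictor is added, even pure noise
set.seed(1)
noise <- rnorm(nrow(mtcars))                 # random, unrelated to mpg
fit1  <- summary(lm(mpg ~ wt, data = mtcars))
fit2  <- summary(lm(mpg ~ wt + noise, data = mtcars))
fit2$r.squared >= fit1$r.squared             # TRUE: R^2 same or higher
c(fit1$adj.r.squared, fit2$adj.r.squared)    # adjusted R^2 can drop
```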
Can adjusted R^2 be negative?
Yes
What is the formula for the penalization factor?
It is always ______
(n − 1) / ( n − (p + 1) )
It is always larger than 1 (whenever p ≥ 1)
How does adjusted R^2 compare to R^2?
- always smaller (for p ≥ 1), and can be negative
- when a new independent variable is added, adjusted R^2 can DECREASE if the variable adds too little explanatory power
- INCLUDES PENALIZATION FACTOR: adjusted R^2 = 1 − (1 − R^2) · (n − 1) / ( n − (p + 1) )
What are H0 and H1 for multiple linear regression?
H0 : β1 = β2 = … = βp = 0
H1 : at least one βj is not zero.
What test is used for multiple regression hypothesis testing?
F-statistic
refer to the P-VALUE of the F-statistic in the R output table
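In R this looks like the following (a sketch with the built-in mtcars data):

```r
# Fit a multiple regression and read off the overall F-test
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)   # bottom line reports "F-statistic: ..." and its p-value
```

A small p-value means we reject H0 that all slope coefficients are zero.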
What is interpolation?
Predicting Y for a value of X that is within the range of the original data
What is extrapolation?
Predicting Y for a value of X that is outside the range
How does overfitting arise?
Using too many variables or too complex of a model can often lead to overfitting
How do you know when you are overfitting?
Won't be able to predict any other new observations
Only able to fit the current data perfectly
What is the problem of collinearity in multiple linear regression?
In observational data, we can only hold factors constant figuratively (statistically).
However, when two or more Xs are highly correlated, we can't even hold factors constant figuratively
What are 2 solutions to collinearity?
Remove the redundant variable. You will need subject knowledge to understand which one is redundant.
Combine the collinear variables into a single variable
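One way to spot collinearity in R (a sketch with the built-in mtcars data; the combined "size" variable is a made-up illustration of the second solution):

```r
# Pairwise correlations among candidate predictors
round(cor(mtcars[, c("disp", "wt", "hp")]), 2)
# disp and wt are strongly correlated, so either drop one (using
# subject knowledge), or combine them, e.g. a standardized sum:
size <- scale(mtcars$disp) + scale(mtcars$wt)
```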
How to write operations within linear regression model equations in R?
use I()
lm_mpg2 = lm(mpg ~ horsepower + I(horsepower^2), data = Auto)
If x1= a and x2= a^2, is it still a linear model? Why?
Yes. The model is still linear in the coefficients (the betas), even though it is nonlinear in x
How to exclude variables in R?
use . -
eg lm(mpg ~ . - name, ..)
. - name means every variable EXCEPT name
If you have a categorical variable with n categories, how many coefficients for it will you see in R?
n − 1
If you know n − 1 of the indicators, you can deduce the last one (e.g. with 3 categories, if the first 2 indicators are 0, the last must be 1); the omitted category becomes the baseline
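A minimal R sketch (hypothetical toy data) showing n − 1 dummy coefficients for an n-level factor:

```r
# 3-level factor -> R reports only 2 dummy coefficients
df <- data.frame(
  y     = c(2, 4, 6, 8, 10, 12),
  group = factor(c("A", "A", "B", "B", "C", "C"))
)
coef(lm(y ~ group, data = df))
# coefficient names: (Intercept), groupB, groupC -- "A" is the baseline
```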