Week 5 Flashcards
(41 cards)
What model is even simpler than the univariate regression model?
It is called the intercept-only model: a regression model without any predictors.
yi = β0 + ϵi
Can you guess what β0 is equal to? That is, what is the best prediction for yi
if you don't know anything else?
Answer: β0 = µy
For the intercept-only model, what is the only hypothesis we can test?
H0 : β0 = 0
H1 : β0 does not equal 0.
What are these hypotheses equivalent to in a one-sample t-test?
H0 : β0 = 0
H1 : β0 does not equal 0.
H0 : µy = 0
H1 : µy does not equal 0.
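A quick numerical check of the cards above, as a sketch: fitting the intercept-only model by ordinary least squares gives β0 equal to the sample mean, and the t statistic for β0 matches the one-sample t statistic for H0: µy = 0. The data values are hypothetical, and NumPy's np.linalg.lstsq is used for the fit as an assumption; any OLS routine would do.

```python
import numpy as np

# Hypothetical sample of criterion scores (illustrative only).
y = np.array([2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5])
n = len(y)

# Intercept-only model: the design matrix is a single column of ones.
X = np.ones((n, 1))
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0 = b[0]

# Residual variance uses df_residual = n - 1 (one estimated parameter).
resid = y - X @ b
s2 = resid @ resid / (n - 1)
se_b0 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
t_regression = b0 / se_b0

# One-sample t statistic for H0: mu_y = 0, computed directly.
t_one_sample = y.mean() / (y.std(ddof=1) / np.sqrt(n))

print(b0, y.mean())                # the OLS intercept equals the sample mean
print(t_regression, t_one_sample)  # the two t statistics are identical
```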
Univariate Model:
Equation?
What does β1 mean?
What tests?
yi = β0 + β1x1 + ϵi
§ β1: for a one-unit increase in x1, Y is expected to increase by β1 units.
§ The t-test for the regression coefficient, the t-test for the correlation coefficient, and the F-test for the overall model fit (or R-squared) are equivalent (F = t²).
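A minimal sketch of the equivalence in the univariate model, assuming hypothetical data and NumPy: the t statistic for the slope equals the t statistic for the Pearson correlation, and squaring it gives the overall F statistic.

```python
import numpy as np

# Hypothetical data for one predictor x1 and criterion y (illustrative only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 2.9, 4.1, 4.0, 5.6, 5.9, 7.2, 7.1])
n, p = len(y), 1

X = np.column_stack([np.ones(n), x1])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

ss_total = np.sum((y - y.mean()) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)
ss_regression = ss_total - ss_residual

# t-test for the slope b1.
ms_residual = ss_residual / (n - p - 1)
se_b1 = np.sqrt(ms_residual * np.linalg.inv(X.T @ X)[1, 1])
t_slope = b[1] / se_b1

# t-test for the Pearson correlation r between x1 and y.
r = np.corrcoef(x1, y)[0, 1]
t_corr = r * np.sqrt((n - 2) / (1 - r ** 2))

# F-test for the overall model fit.
F = (ss_regression / p) / ms_residual

print(t_slope, t_corr)  # equal
print(t_slope ** 2, F)  # equal: F = t^2 with one predictor
```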
Bivariate Model:
Equation?
What does β1 mean?
What tests?
yi = β0 + β1x1 + β2x2 + ϵi
§ β1: holding x2 constant, for a one-unit increase in x1, Y is expected to increase by β1 units.
§ The t-test for a partial regression coefficient is not equivalent to the F-test for the overall model (or R-squared).
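A sketch contrasting the bivariate case, assuming hypothetical data and NumPy. It checks that raising x1 by one unit at a fixed x2 changes the prediction by exactly b1, and shows that t² for the partial coefficient no longer matches the overall F.

```python
import numpy as np

# Hypothetical data with two predictors (illustrative only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
y = np.array([6.2, 2.1, 8.3, 1.8, 10.4, 18.1, 4.2, 12.3])
n, p = len(y), 2

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Holding x2 constant": raise x1 from 3 to 4 at a fixed x2 value.
x2_fixed = 4.0
pred_before = b[0] + b[1] * 3.0 + b[2] * x2_fixed
pred_after = b[0] + b[1] * 4.0 + b[2] * x2_fixed
print(pred_after - pred_before, b[1])  # the change equals b1

# t-test for the partial coefficient b1.
resid = y - X @ b
ss_residual = resid @ resid
ms_residual = ss_residual / (n - p - 1)
se_b1 = np.sqrt(ms_residual * np.linalg.inv(X.T @ X)[1, 1])
t_b1 = b[1] / se_b1

# F-test for the overall model (both predictors jointly).
ss_total = np.sum((y - y.mean()) ** 2)
F = ((ss_total - ss_residual) / p) / ms_residual

print(t_b1 ** 2, F)  # generally differ once there are two predictors
```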
The F-test for the univariate and bivariate regression tests whether…
the variance in the criterion variable can be significantly accounted for by all the predictors together.
H0: ρ²yŷ = 0
H1: ρ²yŷ > 0
What does ρ²yŷ equal at the population level?
SSregression/SStotal
Another way of looking at the F-test is that it compares the current model with the intercept-only model.
ρ²yŷ = 1 − SSresidual(current model)/SSresidual(intercept-only model), because the residual SS of the intercept-only model equals SStotal.
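The model-comparison reading of the F-test can be sketched as follows, assuming hypothetical data, NumPy, and a small helper ss_residual written here for illustration. The residual SS of the intercept-only model equals SStotal, and the F statistic computed from the two residual SS values matches the one computed from R-squared.

```python
import numpy as np

def ss_residual(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

# Hypothetical data (illustrative only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
y = np.array([2.5, 2.0, 4.8, 3.1, 6.0, 9.2, 5.1, 8.3])
n, p = len(y), 2

ones = np.ones(n)
ss_res_intercept_only = ss_residual(ones[:, None], y)  # equals SStotal
ss_res_current = ss_residual(np.column_stack([ones, x1, x2]), y)

ss_total = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res_current / ss_res_intercept_only

# F as a model comparison: reduction in residual SS per added predictor,
# divided by the residual mean square of the current model.
F_compare = ((ss_res_intercept_only - ss_res_current) / p) / (ss_res_current / (n - p - 1))
# The same F written in terms of R-squared.
F_from_r2 = (r2 / p) / ((1 - r2) / (n - p - 1))

print(ss_res_intercept_only, ss_total)  # identical
print(F_compare, F_from_r2)             # identical
```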
In the intercept-only model, can you do an F test?
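No. The intercept-only model is the baseline that the F-test compares against; it has no predictors, so there is no explained variance to test.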
Why is the unadjusted R^2 not a good measure of fit?
Because the sample R-squared r²yŷ is a biased estimator of the population R-squared ρ²yŷ.
§ Over repeated studies, the sample R-squared r²yŷ tends to be higher than ρ²yŷ.
The sample R-squared r²yŷ tends to increase as the number of predictors (denoted by p) increases.
§ As p increases, the model tends to overfit.
§ An overfitted model is very unstable: its estimates vary widely across repeated samples (the fitted line follows the sample points too closely).
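A small simulation sketch of why the unadjusted R-squared rewards extra predictors, assuming NumPy and made-up data: predictors that are pure noise are added one at a time, and the sample R-squared never goes down even though none of them is related to y.

```python
import numpy as np

def r_squared(X, y):
    """Unadjusted R-squared, 1 - SSresidual/SStotal, for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 30
x_real = rng.normal(size=n)
y = 0.5 * x_real + rng.normal(size=n)

# Append predictors that are pure noise, unrelated to y by construction.
noise = rng.normal(size=(n, 10))
r2_by_p = [r_squared(np.column_stack([x_real, noise[:, :k]]), y) for k in range(11)]

for p, r2 in enumerate(r2_by_p, start=1):
    print(p, round(r2, 3))  # non-decreasing as p grows
```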
Bias-Variance Tradeoff
Define bias and variance.
Explain how adding predictors influences each.
In any statistical modelling, there is a bias-variance tradeoff.
Bias: how well the model fits the current data (less bias means a better fit).
§ Less bias means smaller residuals.
§ Observed and predicted values are similar.
§ More of the variance in the criterion variable can be explained by the predictors.
§ In regression, adding more predictors usually reduces bias.
Variance: how much your estimates vary across repeated samples.
§ Large variance implies large standard errors and more prediction error.
§ In regression, adding more predictors usually increases the standard errors and the prediction error.
§ Recall multicollinearity.
Underfitted Model - bias and variance?
High bias; low variance
Overfitted Model - bias and variance?
Low bias; high variance.
Both underfitted and overfitted models have _________ prediction error
large
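One way to see the bias-variance tradeoff is a simulation over repeated samples. As an assumption for illustration, this sketch swaps in polynomial terms of a single x as the "added predictors" and uses a hypothetical true relation sin(2πx) with normal errors; degree 1 plays the underfitted model and degree 9 the overfitted one. It relies on NumPy's Polynomial.fit.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
f_true = np.sin(2 * np.pi * x)  # hypothetical true population relation
sigma = 0.3                     # hypothetical error SD
n_reps = 500

def bias_variance(degree):
    """Squared bias and variance of fitted values across repeated samples."""
    preds = np.empty((n_reps, x.size))
    for r in range(n_reps):
        y = f_true + rng.normal(scale=sigma, size=x.size)  # a fresh sample
        preds[r] = Polynomial.fit(x, y, degree)(x)
    bias2 = np.mean((preds.mean(axis=0) - f_true) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias2, variance

results = {degree: bias_variance(degree) for degree in (1, 3, 9)}
for degree, (bias2, variance) in results.items():
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {variance:.3f}")
```

Expect the degree-1 fit to show the larger squared bias and the degree-9 fit the larger variance; the exact numbers depend on the seed.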
The unadjusted R2 tends to favor ______________ models even
though they are not good.
overfitted
The goal of the adjusted R2 is…
to provide a more balanced evaluation of the fit relative to the number of predictors.
Unadjusted R^2 formula
r²yŷ = 1 − (SSresidual/SStotal) = SSregression/SStotal
Adjusted R^2 formula
adjusted r²yŷ = 1 − (SSresidual/dfresidual)/(SStotal/dftotal), where dfresidual = n − p − 1 and dftotal = n − 1
As the number of predictors, relative to sample size, increases, the R-squared is adjusted how?
Downward, by a larger amount.
In short, adjusted R squared adjusts the unadjusted R-squared downward to provide a better evaluation of fit.
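The two R-squared formulas can be computed side by side in a short sketch, assuming NumPy and simulated data in which only the first of eight predictors matters. The gap r2 − adj is the downward adjustment; since it equals (1 − R²)·p/(n − p − 1), it tends to grow as p increases relative to n.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (unadjusted, adjusted) R-squared for an OLS fit with an intercept.

    X holds the p predictor columns; the intercept column is added here.
    """
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    ss_residual = np.sum((y - Xd @ b) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_residual / ss_total
    df_residual, df_total = n - p - 1, n - 1
    adj_r2 = 1 - (ss_residual / df_residual) / (ss_total / df_total)
    return r2, adj_r2

rng = np.random.default_rng(2)
n = 25
X = rng.normal(size=(n, 8))
y = 0.6 * X[:, 0] + rng.normal(size=n)  # only the first predictor matters

for p in (1, 4, 8):
    r2, adj = r2_and_adjusted(X[:, :p], y)
    print(p, round(r2, 3), round(adj, 3), "downward adjustment:", round(r2 - adj, 3))
```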
We know that in the population model, the error term is…
What is the notation?
A random variable with a normal distribution, with mean 0 and variance σ²
ϵi ~ N(0, σ²)
deterministic view
The deterministic view assumes that the variability of the criterion variable can be fully accounted for by a list of predictors at the population level; therefore, there is no error term in the population.
stochastic view
The stochastic view assumes that the variability of the criterion variable CANNOT be fully accounted for by a list of predictors at the population level; therefore, there should be an error term in the population.
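A sketch contrasting the two views with simulated data, assuming NumPy and hypothetical population values β0 = 1, β1 = 0.5 and σ = 1. Under the deterministic view y has no error term, so R-squared is 1; under the stochastic view ϵi ~ N(0, σ²) is added and some variability stays unexplained.

```python
import numpy as np

def r_squared(x, y):
    """Unadjusted R-squared of a simple OLS regression of y on x."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n, beta0, beta1, sigma = 200, 1.0, 0.5, 1.0  # hypothetical population values
x = rng.normal(size=n)

# Deterministic view: y is fully determined by x, so there is no error term.
y_det = beta0 + beta1 * x
# Stochastic view: y = beta0 + beta1*x + e, with e ~ N(0, sigma^2).
e = rng.normal(loc=0.0, scale=sigma, size=n)
y_sto = beta0 + beta1 * x + e

print(r_squared(x, y_det))      # 1 (up to rounding)
print(r_squared(x, y_sto))      # below 1: some variability is left unexplained
print(e.mean(), e.var(ddof=1))  # close to 0 and sigma^2
```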
Modern statistics takes which view?
stochastic view.
There are two fundamentally different interpretations of the
regression coefficients.
- Descriptively as an empirical association
- Causally as a structural relation