Regression Analysis, Time Series Analysis Flashcards
(33 cards)
How is the correlation coefficient notated?
- If measured for a population, it is called ρ (rho)
- If estimated from a sample, we use r; i.e. r estimates ρ
What is true of the correlation coefficient?
- Correlation and covariance are appropriate for use with continuous variables whose distributions have the same shape (e.g. both normally distributed)
- If these assumptions are not met, r will be ‘deflated’ and underestimates ρ.
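A minimal Python sketch of estimating ρ with the sample correlation r; the data values are made up for illustration:

```python
# Sample correlation r: covariance divided by the product of the
# standard deviations, all computed from the sample.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
r = cov / (sx * sy)  # r estimates the population correlation rho
print(r)
```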
What are the data in a regression analysis?
- One continuous response variable (called y - dependent variable, response variable)
- One or more continuous explanatory variables (called x - independent variable, explanatory variable, predictor variable, regressor variable)
What is true of εi?
Mean of εi will be zero
What does E(Y|X=x) = β0 + β1x mean?
The expected value of Y when X = x is β0 + β1x
What is the true/population regression line?
- Yi = β0 + β1xi + εi
- β0 and β1 are constants to be estimated
- εi is a random variable with mean = 0 if our line is going through the middle of our data
How is the population regression line estimated?
- ŷ = b0 + b1x
- b0 and b1 are estimated values
- ŷ is a fitted value of the response
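A sketch of estimating b0 and b1 by least squares, with made-up data generated exactly from y = 1 + 2x so the fit recovers b0 = 1 and b1 = 2:

```python
# Least squares estimates for the simple regression line y-hat = b0 + b1*x:
#   b1 = Sxy / Sxx,  b0 = ybar - b1 * xbar
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # made up: exactly 1 + 2x

n = len(x)
mx = sum(x) / n
my = sum(y) / n

sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sxy / sxx
b0 = my - b1 * mx

y_hat = [b0 + b1 * xi for xi in x]  # fitted values of the response
print(b0, b1)
```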
What is a residual?
vertical distance between observed response and fitted value of response
How are residuals estimated?
ri estimates εi, the error variable
What is SSE?
The error sum of squares
SSE = Σ(yi - ŷi)², summed from i = 1 to n
What error assumptions do we make in regression analysis?
- In our fitting we assume the errors have a particular distribution - that is, ε ~ N(0, σε2)
- Normal distribution
- Mean = 0
- Constant variance = σε2
- Errors associated with any two y values are independent
What is sε?
- sε = standard error of the estimate
- Interpretation - standard deviation of residuals; standard error in predicting Y from the regression equation - best definition: standard deviation around prediction line
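The SSE and sε cards can be illustrated together; a Python sketch with made-up data, fitting the line by least squares as above:

```python
# SSE = sum of squared residuals; s_e = sqrt(SSE / (n - 2)) is the
# standard error of the estimate (standard deviation around the line).
import math

x = [1, 2, 3, 4]
y = [2, 3, 5, 6]  # made-up data
n = len(x)
mx, my = sum(x) / n, sum(y) / n

b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

# Residuals r_i = y_i - yhat_i estimate the errors eps_i
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in resid)
s_e = math.sqrt(sse / (n - 2))
print(sse, s_e)
```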
What are the t stats in regression analysis output?
T = test statistic for testing that the population intercept/slope equals 0 against a two-sided alternative; it is compared to a t distribution with n - 2 degrees of freedom. A p-value reported as P = 0.000 means the intercept/slope is significantly different from 0
What is S in regression analysis output?
Standard Error of the Regression (S) = the average distance that observed values fall from the regression line
What is R^2?
- Determines the strength of the association
- coefficient of determination
- measures the proportion of total variation explained, i.e.
- = explained variation / total variation = SSreg / SSy = (correlation coefficient)²
- Will be between 0 and 1; a value close to 1 indicates most of the variation in y is explained by the regression equation
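A sketch showing R² computed both ways - as explained/total variation and as the squared correlation coefficient - using made-up data:

```python
# R^2 = 1 - SSE/SSy (explained / total variation), and also r^2.
import math

x = [1, 2, 3, 4]
y = [2, 3, 5, 6]  # made-up data
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
ssy = sum((yi - my) ** 2 for yi in y)  # total variation in y

b1 = sxy / sxx
b0 = my - b1 * mx
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r2 = 1 - sse / ssy            # proportion of variation explained
r = sxy / math.sqrt(sxx * ssy)  # correlation coefficient
print(r2, r ** 2)             # the two routes agree
```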
What is important about R?
r = ±√R², where the sign of r matches the sign of the slope
What is Homoscedasticity?
If variation is constant (residuals show constant spread around zero), called homoscedastic
What is Heteroscedasticity?
If variation is non-constant (residuals show varying spread around zero), called heteroscedastic
What is true about Large Standardised Residuals?
Minitab flags “Large Standardised Residuals” with an R - about 5% of observations should be flagged if the residuals are normally distributed
What must be true to make predictions from a regression analysis?
- High R-sq, small std error of estimate
- All assumptions appear valid
- Predictions should only be made for values inside the observed limits
What does β1 represent in a multiple regression with 2 predictors?
β1 represents the expected change in Y when X1 is increased by one unit, but X2 is held constant or otherwise controlled
What is meant by additive effects of multiple regression?
Combined effects of X1 and X2 are additive - if both X1 and X2 are increased by one unit, expected change in Y would be ( β1 + β2 )
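A sketch of fitting a two-predictor multiple regression via the normal equations, with made-up data generated exactly as y = 1 + 2x1 + 3x2, so the additive effect b1 + b2 comes out as 5:

```python
# Multiple regression by solving the normal equations (X'X)b = X'y.
x1 = [0, 1, 2, 0, 1, 2]
x2 = [0, 0, 0, 1, 1, 1]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # made-up, no noise

# Design matrix rows: [intercept, x1, x2]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]

def solve3(A, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination."""
    A = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

b0, b1, b2 = solve3(XtX, Xty)
# Increasing both x1 and x2 by one unit changes the expected y by b1 + b2
print(b0, b1, b2, b1 + b2)
```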
What must be true for us to find a Least Squares solution for a multiple regression?
- Number of predictors is less than number of observations
- None of the independent variables are perfectly correlated with each other
What is true of the coefficient of multiple determination?
- Will go up as we add more explanatory terms to the model whether they are important or not
- Often we use adjusted R-sq - it compensates for adding more variables, so it is lower than R-Sq when the added variables are not “important”
- So, if comparing models with differing numbers of predictors, use Adjusted R-Sq to compare how much variation in response is explained by model
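A sketch of the adjusted R-sq formula, adj R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors; the example numbers are hypothetical:

```python
# Adjusted R-sq penalises extra predictors, unlike plain R-sq.
def adj_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a second, unimportant predictor bumps R-sq only slightly
# (0.80 -> 0.81), yet adjusted R-sq goes DOWN:
one_pred = adj_r2(0.80, 20, 1)
two_pred = adj_r2(0.81, 20, 2)
print(one_pred, two_pred)
```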