Week 6 theoretical questions Flashcards
In assessing the linear relationship between two interval variables, what is common between Pearson’s coefficient of correlation and simple linear regression analysis?
Common: Both methods can indicate whether there exists a linear relationship between the two interval variables and, if yes, the direction (positive or negative) of the linear relationship.
In assessing the linear relationship between two interval variables, what is different between Pearson’s coefficient of correlation and simple linear regression analysis?
Different: Pearson’s coefficient of correlation measures the strength of the linear relationship on the scale [-1, 1], while the slope estimate in simple linear regression analysis measures the expected change in y for a one-unit change in x.
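The two answers above can be illustrated numerically. This is a minimal sketch with made-up data: it computes Pearson's r and the OLS slope for the same sample, showing that both indicate direction (they always share the same sign) while r alone is bounded in [-1, 1].

```python
import numpy as np

# Hypothetical sample with a positive linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson's r: strength and direction of the linear relationship, in [-1, 1].
r = np.corrcoef(x, y)[0, 1]

# Simple linear regression slope: expected change in y per one-unit change in x.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# The two statistics are linked by r = b1 * (s_x / s_y), so they share a sign.
print(round(r, 4), round(b1, 4))
```

Because r rescales the slope by the ratio of standard deviations, r conveys strength on a fixed scale while b1 keeps the original units of y per unit of x.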
What is the ANOVA test for linear regression?
The ANOVA test is essentially an F-test comparing the mean square for regression (explained variation) against the mean square for error (unexplained variation), i.e. F = MSR/MSE.
What conclusion can we obtain from this test?
The test result tells us whether the variation in the outcome variable explained by the regression model is sufficiently large relative to the unexplained variation (a sufficiently large ratio corresponds to rejecting the null hypothesis in the F-test).
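The F-test above can be worked through by hand. This sketch (with made-up data) fits a simple regression, splits the variation into SSR and SSE, and forms F = MSR/MSE with 1 and n-2 degrees of freedom:

```python
import numpy as np
from scipy import stats

# Hypothetical data; fit y = b0 + b1*x by least squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)   # variation explained by the model
sse = np.sum((y - y_hat) ** 2)          # unexplained (residual) variation
k = 1                                   # one explanatory variable
msr, mse = ssr / k, sse / (n - k - 1)

f_stat = msr / mse
p_value = stats.f.sf(f_stat, k, n - k - 1)  # P(F > f_stat) under the null
print(f_stat, p_value)
```

A small p-value means the explained variation is too large relative to the unexplained variation to be attributed to chance, so we reject the null of no linear relationship.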
What is the difference between prediction interval and confidence interval in making prediction based on an estimated linear regression model?
A prediction interval is used when one is interested in estimating a single value of the dependent variable for given value(s) of the explanatory variable(s), while a confidence interval is used when one is interested in estimating the mean of all values of the dependent variable for given value(s) of the explanatory variable(s). All else being equal, the confidence interval is always narrower than the prediction interval.
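The width difference comes from one extra term in the standard error. In this sketch (made-up data, 95% level), the prediction-interval half-width adds "1 +" inside the square root to account for the variability of an individual observation around the mean:

```python
import numpy as np
from scipy import stats

# Hypothetical data; compare interval half-widths at x0 = 4.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.2, 2.9, 4.1, 4.8, 6.2, 6.9, 8.1, 8.8])
n = len(x)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # residual std. error
sxx = np.sum((x - x.mean()) ** 2)
t = stats.t.ppf(0.975, n - 2)

x0 = 4.0
# Confidence-interval half-width: for the MEAN of y at x0.
ci = t * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
# Prediction-interval half-width: for a SINGLE new y at x0 (extra "1 +" term).
pi = t * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
print(ci, pi)
```

Since the prediction interval's radicand is the confidence interval's plus 1, the prediction interval is strictly wider at every x0.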
What is the meaning of ceteris paribus?
Ceteris paribus: Other relevant factors being equal (all else being equal; holding all other relevant factors constant)
Why is multiple linear regression analysis (compared to simple linear regression analysis) more able to make ceteris paribus inference?
Reason: By modeling the dependent variable as a function of multiple independent variables, multiple linear regression analysis can explicitly control for many other factors that simultaneously affect the dependent variable when we assess the effect of the focal independent variable on the dependent variable.
Why do we have adjusted R^2 in multiple linear regression analysis?
Adjusted R^2 addresses a drawback of the ordinary R^2: as more variables (even irrelevant ones) are added to the model, R^2 never decreases. Adjusted R^2 imposes a penalty for each additional independent variable, so it increases only if the added variables explain a sufficiently large amount of extra variation in y.
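This behavior can be demonstrated with simulated data. The sketch below fits a model with and without an irrelevant noise variable, using the usual formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); the plain R^2 can only go up when the noise variable is added, while the penalty keeps adjusted R^2 at or below R^2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
noise_var = rng.normal(size=n)             # irrelevant variable, unrelated to y
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def r2_and_adj(predictors, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1 - sse / sst
    k = X.shape[1] - 1                     # number of independent variables
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)
    return r2, adj

r2_a, adj_a = r2_and_adj([x1], y)
r2_b, adj_b = r2_and_adj([x1, noise_var], y)
print(r2_b >= r2_a)                        # plain R^2 never decreases
```

Adjusted R^2 rises only when the new variable's extra explained variation outweighs the loss of a degree of freedom, which an irrelevant variable typically fails to do.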
What is the issue of multicollinearity?
Multicollinearity is an issue in multiple linear regression analysis where two or more of the independent variables are strongly correlated. In other words, at least one of the independent variables can be closely approximated by a linear function of the other independent variables.
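One common diagnostic (not mentioned above, so treat this as a supplementary sketch) is the variance inflation factor: regress each independent variable on the others and compute VIF = 1/(1 - R^2). The simulated data below makes x2 almost a linear function of x1, producing a very large VIF:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a linear function of x1

# Regress x1 on the remaining predictor(s) to get R^2, then VIF = 1/(1 - R^2).
# A large VIF (a common rule of thumb is > 10) signals multicollinearity.
X_other = np.column_stack([np.ones(n), x2])
beta, *_ = np.linalg.lstsq(X_other, x1, rcond=None)
resid = x1 - X_other @ beta
r2 = 1 - np.sum(resid ** 2) / np.sum((x1 - x1.mean()) ** 2)
vif = 1 / (1 - r2)
print(vif)
```

When multicollinearity is present, the coefficient estimates of the correlated variables become unstable, with inflated standard errors, even though the model's overall fit may be unaffected.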
How is a categorical variable with k levels included as an explanatory variable in a linear regression model? How do we interpret the estimated coefficients?
k-1 dummy variables should be created corresponding to k-1 levels of the categorical variable. That is, one (arbitrary) level has to be omitted. Then the k-1 dummy variables, rather than the original variable, are included in the regression model.
The estimated coefficient of any of the k-1 dummy variables indicates, ceteris paribus (all other factors being the same), the expected mean difference in the outcome variable between the focal category and the omitted category.
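The dummy-coding scheme can be verified on a toy example. This sketch uses a hypothetical categorical variable with k = 3 levels ("A", "B", "C"), omits "A" as the reference category, and fits a regression on the k - 1 = 2 dummies; with no other covariates, the coefficients recover the group-mean differences exactly:

```python
import numpy as np

# Hypothetical categorical variable with k = 3 levels; "A" is the omitted
# (reference) category, so we create k - 1 = 2 dummies, for "B" and "C".
group = np.array(["A", "A", "B", "B", "C", "C"])
y = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0])

d_b = (group == "B").astype(float)
d_c = (group == "C").astype(float)
X = np.column_stack([np.ones(len(y)), d_b, d_c])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta[0] is the mean of the omitted category "A"; beta[1] and beta[2] are
# the mean differences of "B" and "C" relative to "A".
print(beta)
```

Here the group means are 1.5 ("A"), 4.5 ("B"), and 8.0 ("C"), so the fitted coefficients are 1.5, 3.0, and 6.5: each dummy coefficient is the expected mean difference from the omitted category, exactly as the interpretation above states.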