Untitled Deck Flashcards
(100 cards)
What is the primary purpose of multiple linear regression?
To model the relationship between one dependent variable and two or more independent variables.
In the equation Y = β0 + β1X1 + β2X2 + ε, what does β0 represent?
The intercept; the expected value of Y when all independent variables are zero.
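For concreteness, a minimal sketch of fitting this model in Python (statsmodels is an assumed library choice; the data are simulated purely for illustration):

```python
# Minimal sketch: fit Y = β0 + β1X1 + β2X2 + ε on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 + 1.5 * X1 - 0.7 * X2 + rng.normal(scale=0.5, size=n)  # true β0 = 2.0

X = sm.add_constant(np.column_stack([X1, X2]))  # column of 1s carries β0
fit = sm.OLS(Y, X).fit()
print(fit.params)  # [β0, β1, β2]; β0 estimates E[Y] when X1 = X2 = 0
```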
How does multiple linear regression differ from simple linear regression?
Multiple linear regression involves two or more independent variables, while simple linear regression involves only one.
What assumption is made about the relationship between the dependent and independent variables in multiple linear regression?
The relationship is assumed to be linear.
What is multicollinearity, and why is it problematic in multiple linear regression?
Multicollinearity occurs when independent variables are highly correlated, making it difficult to assess the individual effect of each predictor.
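A sketch of checking this with variance inflation factors (VIFs), again assuming statsmodels; here X2 is simulated to nearly duplicate X1, so its VIF should be large (common rules of thumb flag values above roughly 5 to 10):

```python
# Sketch: large VIFs flag predictors that are nearly linear combinations of others.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
X1 = rng.normal(size=200)
X2 = X1 + rng.normal(scale=0.05, size=200)  # almost identical to X1
X = sm.add_constant(np.column_stack([X1, X2]))

for i in (1, 2):  # column 0 is the constant, so skip it
    print(f"VIF for X{i}: {variance_inflation_factor(X, i):.1f}")
```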
What does the coefficient β1 represent in a multiple linear regression model?
The expected change in the dependent variable for a one-unit increase in X1, holding other variables constant.
What is the purpose of the error term ε in a regression model?
It accounts for the variability in Y that cannot be explained by the linear relationship with the predictors.
How is the goodness-of-fit of a multiple linear regression model typically assessed?
By examining the R-squared value, which indicates the proportion of variance in the dependent variable explained by the model.
What is the difference between R-squared and adjusted R-squared?
Adjusted R-squared penalizes R-squared for the number of predictors in the model; unlike R-squared, which never decreases when a predictor is added, it falls when a new predictor does not improve the fit enough, making it the better measure for comparing models with different numbers of predictors.
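A sketch recomputing adjusted R-squared from its definition, 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors; here two of the three simulated predictors are deliberately useless:

```python
# Sketch: adjusted R² applies a penalty that grows with the predictor count p.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.normal(size=(n, p))
Y = 1.0 + 0.8 * X[:, 0] + rng.normal(size=n)  # only the first predictor matters

fit = sm.OLS(Y, sm.add_constant(X)).fit()
adj_manual = 1 - (1 - fit.rsquared) * (n - 1) / (n - p - 1)
print(fit.rsquared, fit.rsquared_adj, adj_manual)  # last two agree; both < R²
```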
Why might adding more predictors to a regression model not always lead to a better model?
Adding unnecessary predictors can lead to overfitting, where the model captures noise rather than the underlying relationship.
What is the purpose of ANOVA in the context of regression analysis?
To assess the overall significance of the regression model by comparing the model variance to the residual variance.
In an ANOVA table, what does the ‘Sum of Squares’ represent?
Each Sum of Squares entry quantifies the variation attributable to one source: the total variation in the dependent variable is partitioned into a component explained by the regression model and a residual (error) component.
What does a significant F-statistic in an ANOVA table indicate about a regression model?
It suggests that at least one predictor variable significantly explains variation in the dependent variable.
How is the Mean Square Error (MSE) calculated in an ANOVA table?
By dividing the Sum of Squares for residuals by its corresponding degrees of freedom.
What is the null hypothesis tested by the overall F-test in regression ANOVA?
That all slope coefficients (every β except the intercept β0) are equal to zero, implying no linear relationship between the predictors and the dependent variable.
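A sketch of the overall F-test as a ratio of mean squares, F = (SSR/df_model)/(SSE/df_residual), checked against the value statsmodels reports. One naming caveat: the statsmodels attribute called `ssr` is the residual sum of squares, the opposite of these cards' SSR:

```python
# Sketch: the F statistic is mean square for regression over mean square error.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 2))
Y = 0.5 + X @ np.array([1.2, -0.4]) + rng.normal(size=80)

fit = sm.OLS(Y, sm.add_constant(X)).fit()
msr = fit.ess / fit.df_model   # regression SS / its degrees of freedom
mse = fit.ssr / fit.df_resid   # residual SS / its degrees of freedom (the MSE)
print(msr / mse, fit.fvalue)   # the two F values agree
print(fit.f_pvalue)            # small p-value → reject H0: all slopes are zero
```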
In the context of ANOVA, what does the term ‘degrees of freedom’ refer to?
The number of independent values or quantities that can vary in the analysis, associated with the sources of variation.
Why is the principle of parsimony important when interpreting ANOVA results in regression?
It emphasizes choosing the simplest model that adequately explains the data, avoiding overfitting with unnecessary predictors.
What does the Total Sum of Squares (SST) represent in an ANOVA table?
The total variation in the dependent variable around its mean.
How is the Regression Sum of Squares (SSR) interpreted in ANOVA?
It measures the portion of total variation explained by the regression model.
What is the Residual Sum of Squares (SSE) in an ANOVA context?
It quantifies the variation in the dependent variable that remains unexplained by the regression model.
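A sketch verifying the decomposition SST = SSR + SSE and the identity R² = SSR/SST on simulated data (naming follows these cards: SSR is the regression sum of squares, SSE the residual sum of squares):

```python
# Sketch: the total variation splits exactly into explained and residual parts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 2))
Y = 1.0 + X @ np.array([0.9, 0.3]) + rng.normal(size=60)

fit = sm.OLS(Y, sm.add_constant(X)).fit()
sst = fit.centered_tss  # total SS around the mean of Y
ssr = fit.ess           # explained (regression) SS: the cards' SSR
sse = fit.ssr           # residual SS: the cards' SSE

print(np.isclose(sst, ssr + sse))           # True
print(np.isclose(fit.rsquared, ssr / sst))  # True
```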
What is the Akaike Information Criterion (AIC) used for in model selection?
To compare models by balancing goodness-of-fit and complexity, penalizing models with more parameters.
How does the Bayesian Information Criterion (BIC) differ from AIC in model selection?
BIC's penalty per parameter is ln(n) rather than AIC's 2, so it penalizes model complexity more strictly and tends to favor simpler models than AIC does.
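A sketch comparing the two criteria on nested models, assuming statsmodels (simulated data; only the first of three predictors carries signal). Lower values are better for both:

```python
# Sketch: BIC punishes the extra, useless predictors more heavily than AIC,
# since its per-parameter penalty grows with the sample size.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 100
X = rng.normal(size=(n, 3))
Y = 1.0 + 0.8 * X[:, 0] + rng.normal(size=n)

small = sm.OLS(Y, sm.add_constant(X[:, :1])).fit()  # 1 predictor
large = sm.OLS(Y, sm.add_constant(X)).fit()         # all 3 predictors

print(small.aic, large.aic)  # the small model should usually win
print(small.bic, large.bic)  # and by a wider margin under BIC
```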
What are the four primary assumptions of multiple linear regression?
Linearity, independence, homoscedasticity (constant variance), and normality of residuals.
How can you visually assess the linearity assumption in regression analysis?
By plotting residuals against fitted values; a random scatter suggests linearity.
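A sketch of that diagnostic plot, assuming matplotlib; a patternless horizontal band around zero supports linearity, and roughly constant spread supports homoscedasticity:

```python
# Sketch: residuals-vs-fitted plot; curvature suggests nonlinearity,
# a funnel shape suggests non-constant variance.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 2))
Y = 1.0 + X @ np.array([1.0, -0.5]) + rng.normal(size=100)

fit = sm.OLS(Y, sm.add_constant(X)).fit()
plt.scatter(fit.fittedvalues, fit.resid, s=12)
plt.axhline(0, color="gray", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```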