Wrong Questions: Linear Models Flashcards
T/F: Error terms are considered to have a dimensionless measure.
False. The error term is not dimensionless. Since it is defined as ei = Yi − β0 − β1Xi, it has the same units as the response variable.
T/F: The error representation is based on the Poisson theory of errors.
False. The error representation is based on the Gaussian theory of errors. The error terms follow a Gaussian/normal distribution.
T/F: Error terms are also known as disturbance terms.
True. The Frees text (page 31) states that error terms are also called disturbance terms; the two names are used interchangeably.
T/F: Error terms are observable quantities.
False. Error terms are unobservable because the true parameters β0 and β1 are unknown. Only the residuals, computed from the fitted model, are observable.
T/F: A model with a higher sum of squared errors has a higher total sum of squares compared to a model with lower sum of squared errors.
False. The total sum of squares depends only on the observed responses, not on the fitted model, so it is the same for both models.
T/F: The validation set approach is a special case of k-fold cross-validation.
False. Neither the validation set approach nor k-fold CV is a special case of each other.
Note that LOOCV is a special case of k-fold CV with k = n.
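To make the LOOCV-as-special-case point concrete, here is a minimal sketch. The helper `kfold_indices` is hypothetical (not from the text): it splits n indices into k folds, and with k = n every validation fold contains exactly one observation, which is LOOCV.

```python
# Hypothetical helper: split indices 0..n-1 into k consecutive folds.
def kfold_indices(n, k):
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

n = 6
print(kfold_indices(n, 3))  # 3-fold CV: folds of size 2
print(kfold_indices(n, n))  # k = n: one observation per fold, i.e. LOOCV
```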
T/F: The validation set approach is conceptually complex to implement.
False. The validation set approach is conceptually simple and easy to implement.
T/F: Performing the validation set approach multiple times always yields the same results.
False. While performing LOOCV multiple times always yields the same results, this is not true for the validation set approach, where results vary due to randomness in the split.
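A small simulation (assumed setup, simulated data) illustrates this: repeating the validation set approach with different random splits produces different test-error estimates for the same data and model.

```python
import numpy as np

# Simulated data for illustration: y = 2 + 1.5x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)

def validation_mse(seed):
    # Randomly split the data in half, fit on one half, evaluate on the other
    idx = np.random.default_rng(seed).permutation(len(x))
    train, val = idx[:50], idx[50:]
    b1, b0 = np.polyfit(x[train], y[train], 1)   # fit on the training half
    resid = y[val] - (b0 + b1 * x[val])          # predict on the validation half
    return np.mean(resid ** 2)

estimates = [validation_mse(s) for s in range(5)]
print(estimates)  # the estimate varies from split to split
```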
T/F: The validation error rate will tend to underestimate the test error rate.
False. One of the major drawbacks of the validation set approach is that the model is fit on only a subset of the observations (the training set), so the validation error rate tends to overestimate the test error rate.
LOOCV trains on n − 1 observations each time, so it does not have this issue as much.
T/F: The validation set approach has higher bias than leave-one-out cross-validation.
True. The LOOCV approach has lower bias than the validation set approach since almost all data is used in the training set, meaning it does not overestimate the test error rate as much as the validation set approach.
T/F: The validation set approach is conceptually simple and straightforward to implement.
True
T/F: The validation estimate of the test error rate can exhibit high variability, depending on the composition of observations in the training and validation sets.
True
T/F: The model is trained using only a subset of the observations, specifically those in the training set rather than the validation set.
True
T/F: Given that statistical methods typically perform worse when trained on fewer observations, this implies that the validation set error rate may tend to underestimate the test error rate for the model fitted on the entire dataset.
False. It tends to overestimate, not underestimate, the test error rate, because the model is trained on fewer observations than the full dataset.
T/F: The leverage for each observation in a linear model must be between 1/n and 1.
True. For a linear model with an intercept, each leverage hi satisfies 1/n ≤ hi ≤ 1.
T/F: The n leverages in a linear model must sum to the number of explanatory variables.
False. The leverages must sum to p + 1, the number of predictors plus one for the intercept.
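Both leverage facts can be checked numerically. This is a sketch with simulated data (the setup is assumed, not from the text): leverages are the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ, they sum to p + 1, and each lies in [1/n, 1].

```python
import numpy as np

# Simulated design matrix: intercept column plus p random predictors
rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# Hat matrix and its diagonal (the leverages)
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)

print(round(leverages.sum(), 6))   # sums to p + 1 = 4
print(leverages.min() >= 1 / n)    # each leverage is at least 1/n
print(leverages.max() <= 1)        # and at most 1
```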
T/F: If an explanatory variable is uncorrelated with all other explanatory variables, the corresponding variance inflation factor would be zero.
False. If an explanatory variable is uncorrelated with all other explanatory variables, the corresponding variance inflation factor would be 1.
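A quick sketch makes the lower bound visible (simulated data, assumed setup): VIFj = 1/(1 − Rj²), where Rj² comes from regressing predictor j on the other predictors. For an uncorrelated predictor Rj² ≈ 0, so the VIF is near 1, its minimum; it is never 0.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)              # uncorrelated with x1 and x2

def vif(target, others):
    # Regress one predictor on the rest and compute 1 / (1 - R^2)
    X = np.column_stack([np.ones(n)] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    tss = (target - target.mean()) @ (target - target.mean())
    r2 = 1 - (resid @ resid) / tss
    return 1 / (1 - r2)

print(vif(x3, [x1, x2]))  # close to 1: x3 is uncorrelated with the others
print(vif(x1, [x2, x3]))  # large: x1 is nearly collinear with x2
```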
T/F: In best subset selection the predictors in the k-variable model must be a subset of the predictors in the (k+1)-variable model.
False. The predictors in the k-variable model do not need to be a subset of those in the (k+1)-variable model.
T/F: In best subset selection, if p is the number of potential predictors, then 2^(p-1) models have to be fitted.
False. The correct number of models that need to be fitted is 2^p, not 2^(p-1).
T/F: In best subset selection, the residual sum of squares of the k-variable model is always lower than that of the (k+1)-variable model.
False. The residual sum of squares of the best k-variable model must be higher than or equal to that of the best (k+1)-variable model: adding a variable can never increase the minimized RSS. No nesting assumption is needed, since the best (k+1)-variable model does at least as well as the best k-variable model with any extra variable appended.
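This monotonicity can be demonstrated with a small best-subset search (simulated data, assumed setup): the minimum RSS over all k-variable models never increases as k grows.

```python
import numpy as np
from itertools import combinations

# Simulated data: y depends on the first two of p = 4 predictors
rng = np.random.default_rng(3)
n, p = 50, 4
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

def rss(cols):
    # Residual sum of squares for the model using the given predictor columns
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ beta
    return r @ r

# Best (lowest) RSS among all k-variable models, for k = 0..p
best = [min(rss(c) for c in combinations(range(p), k)) for k in range(p + 1)]
print(all(best[k] >= best[k + 1] for k in range(p)))  # RSS never increases with k
```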
T/F: In each step of best subset selection, the most statistically significant variable is dropped.
False. Best subset selection does not drop variables one at a time; it fits all possible models and picks the best model of each size. Dropping the least statistically significant variable at each step describes backward stepwise selection.
T/F: In high-dimensional settings, best subset selection is computationally infeasible.
True. In high-dimensional settings, the computational complexity of fitting all possible models makes best subset selection infeasible.