Quantitative Methods Flashcards
(56 cards)
Overfitting
When a machine learning model learns the input and target data too precisely, making the system more likely to discover false relationships or unsubstantiated patterns that lead to prediction errors.
Adjusted R2
Goodness-of-fit measure that adjusts the coefficient of determination, R2, for the number of independent variables in the model.
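As a quick sketch of the adjustment, adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables (the numbers below are hypothetical):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1),
    for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a regressor that barely raises R2 can lower adjusted R2:
print(adjusted_r2(0.700, 60, 3))  # 3 regressors
print(adjusted_r2(0.705, 60, 4))  # a 4th regressor that adds little
```

Unlike R2, this measure can fall when a new variable contributes too little explanatory power to justify the lost degree of freedom.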
Akaike’s Information Criterion (AIC)
A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model best suited for prediction.
Analysis of Variance (ANOVA)
A table that presents the sums of squares, degrees of freedom, mean squares, and F-statistic for a regression model.
Coefficient of Determination
A measure of how well data points fit a linear statistical model, expressed in percentage terms from 0% to 100%. Also known as R-squared (R2).
General Linear F-Test
A test statistic used to assess the goodness of fit of an entire regression model; it tests all independent variables in the model jointly.
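A minimal sketch of the overall F-statistic built from ANOVA-table quantities, F = MSR/MSE = (SSR/k) / (SSE/(n − k − 1)), where SSR is the regression sum of squares and SSE the error sum of squares (the values below are hypothetical):

```python
def overall_f(ssr, sse, n, k):
    """Overall F-statistic: F = (SSR/k) / (SSE/(n - k - 1)).
    SSR = regression sum of squares, SSE = error sum of squares."""
    msr = ssr / k            # mean square regression
    mse = sse / (n - k - 1)  # mean square error
    return msr / mse

# Hypothetical ANOVA values: SSR = 240, SSE = 60, n = 34, k = 3
print(overall_f(240.0, 60.0, 34, 3))  # MSR = 80, MSE = 2 -> F = 40
```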
Joint Test of Hypotheses
A test of hypotheses that specify values for two or more regression coefficients simultaneously.
Nested Models
Models in which one regression model has a subset of the independent variables of another regression model.
Restricted Model
A regression model with a subset of the complete set of independent variables.
Schwarz’s Bayesian Information Criterion (BIC or SBC)
A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model with the best goodness of fit.
Unrestricted Model
A regression model with the complete set of independent variables.
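Restricted and unrestricted nested models are compared with an F-test on the q excluded variables, F = [(SSE_R − SSE_U)/q] / [SSE_U/(n − k − 1)], where k is the number of regressors in the unrestricted model. A sketch with hypothetical sums of squares:

```python
def nested_f(sse_r, sse_u, q, n, k):
    """F-test of q restrictions on a nested model:
    F = ((SSE_R - SSE_U)/q) / (SSE_U/(n - k - 1)),
    k = regressors in the unrestricted model."""
    return ((sse_r - sse_u) / q) / (sse_u / (n - k - 1))

# Dropping 2 of 4 regressors raises SSE from 100 to 120 (hypothetical):
print(nested_f(120.0, 100.0, q=2, n=53, k=4))  # -> 4.8
```

A large F suggests the excluded variables jointly add explanatory power, so the unrestricted model is preferred.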
What are the principles for proper regression model specification?
Economic reasoning behind variable choices, parsimony, good out-of-sample performance, appropriate model functional form, no violations of regression assumptions
These principles ensure the model is effective and reliable in making predictions.
What are common failures in regression functional form?
Omitted variables, inappropriate form of variables, inappropriate variable scaling, inappropriate data pooling
These failures can lead to violations of regression assumptions, impacting model accuracy.
What is heteroskedasticity in regression analysis?
When the variance of regression errors differs across observations
It can affect the validity of statistical inferences made from the model.
What is unconditional heteroskedasticity?
When the error variance is not correlated with the independent variables
It does not create major problems for statistical inference.
What is conditional heteroskedasticity?
When the error variance is correlated with the values of the independent variables
This type of heteroskedasticity can lead to inflated t-statistics and Type I errors.
How is conditional heteroskedasticity detected?
Using the Breusch–Pagan (BP) test
This test assesses the correlation between error variance and independent variable values.
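A rough sketch of the BP statistic, n·R2 from regressing the squared OLS residuals on the regressors (an illustrative implementation on simulated data, not library code):

```python
import numpy as np

def breusch_pagan(X, resid):
    """BP statistic: n * R^2 from regressing squared residuals on X.
    X includes a constant column; under homoskedasticity the statistic
    is approximately chi-square with (number of slope variables) df."""
    u2 = resid ** 2
    beta, *_ = np.linalg.lstsq(X, u2, rcond=None)
    r2 = 1 - np.sum((u2 - X @ beta) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(resid) * r2

# Simulated data where the error variance grows with x:
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(size=n) * x   # conditional heteroskedasticity
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
bp = breusch_pagan(X, y - X @ b)
print(bp)  # well above the 5% chi-square(1) critical value of 3.84
```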
How can the bias created by conditional heteroskedasticity be corrected?
By computing robust standard errors
This adjustment helps ensure accurate statistical inference.
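One common robust-standard-error correction is White's heteroskedasticity-consistent (HC0) "sandwich" covariance estimator; a minimal numpy sketch on simulated data (illustrative only):

```python
import numpy as np

def white_robust_se(X, resid):
    """HC0 (White) heteroskedasticity-robust standard errors.
    X includes the constant column; resid are the OLS residuals."""
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (resid ** 2)[:, None])  # sum_i u_i^2 x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv            # sandwich estimator
    return np.sqrt(np.diag(cov))

# Simulated heteroskedastic data:
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 1 + 2 * x + rng.normal(size=n) * x
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
se = white_robust_se(X, y - X @ b)
print(se)  # robust SEs for the intercept and slope
```

The coefficient estimates themselves are unchanged; only the standard errors (and hence t-statistics) are adjusted.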
What is serial correlation (or autocorrelation) in regression?
When regression errors are correlated across observations
It is particularly problematic in time-series regressions.
What are the consequences of serial correlation?
Underestimated standard errors, inflated t-statistics, and (when a lagged dependent variable is used as a regressor) inconsistent coefficient estimates
This can lead to misleading conclusions about the relationships between variables.
What test is used to detect serial correlation?
The Breusch–Godfrey (BG) test
This test uses residuals from the regression to check for correlation with lagged residuals.
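A sketch of that auxiliary regression: residuals are regressed on the original regressors plus p lagged residuals, and (n − p)·R2 is compared to a chi-square with p df (illustrative implementation on simulated AR(1) errors, not library code):

```python
import numpy as np

def breusch_godfrey(X, resid, p=1):
    """BG statistic: regress residuals on X plus p lagged residuals;
    (n - p) * R^2 is approx. chi-square with p df under the null of
    no serial correlation."""
    n = len(resid)
    lags = np.column_stack([resid[p - i - 1 : n - i - 1] for i in range(p)])
    Z = np.column_stack([X[p:], lags])
    u = resid[p:]
    beta, *_ = np.linalg.lstsq(Z, u, rcond=None)
    r2 = 1 - np.sum((u - Z @ beta) ** 2) / np.sum((u - u.mean()) ** 2)
    return (n - p) * r2

# Simulated regression with AR(1) errors (rho = 0.7):
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
v = rng.normal(size=n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + v[t]
y = 1 + 2 * x + e
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
bg = breusch_godfrey(X, y - X @ b, p=1)
print(bg)  # well above the 5% chi-square(1) critical value of 3.84
```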
What does a variance inflation factor (VIF) measure?
Multicollinearity
It quantifies how much the variance of an estimated regression coefficient increases due to multicollinearity.
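A sketch of the computation, VIF_j = 1/(1 − R_j2), where R_j2 comes from regressing variable j on the remaining regressors (simulated data, illustrative only):

```python
import numpy as np

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing column j
    of X on the remaining columns (plus an intercept). X holds only the
    regressors, without a constant column."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r2 = 1 - np.sum((y - Z @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# Simulated regressors: x3 is nearly a linear combination of x1 and x2
rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)
print(vif(np.column_stack([x1, x2]), 0))      # near 1: no collinearity
print(vif(np.column_stack([x1, x2, x3]), 2))  # far above 10
```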
What does a VIF of 1 indicate?
No correlation between the variable and other regressors
This suggests that multicollinearity is not a concern for that variable.
What VIF value indicates serious multicollinearity requiring correction?
VIFj > 10
This level of multicollinearity can significantly affect the reliability of the regression model.