Quantitative Methods Flashcards

(56 cards)

1
Q

Overfitting

A

When a machine learning model learns the input and target data too precisely, making the system more likely to discover false relationships or unsubstantiated patterns that lead to prediction errors.

2
Q

Adjusted R2

A

A goodness-of-fit measure that adjusts the coefficient of determination, R2, for the number of independent variables in the model.
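A one-line sketch of the adjustment (standard formula; the numbers below are hypothetical):

```python
# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1), where n is the
# number of observations and k the number of independent variables.
def adjusted_r2(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding variables always raises R2, but adjusted R2 falls when a new
# variable adds too little explanatory power. Values here are made up.
print(adjusted_r2(0.60, n=30, k=3))   # about 0.554
```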

3
Q

Akaike’s Information Criterion (AIC)

A

A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model best suited for prediction.

4
Q

Analysis of Variance (ANOVA)

A

A table that presents the sums of squares, degrees of freedom, mean squares, and F-statistic for a regression model.

5
Q

Coefficient of Determination

A

A measure of how well data points fit a linear statistical model, expressed in percentage terms from 0% to 100%. Also known as R-squared (R2).

6
Q

General Linear F-Test

A

A test statistic used to assess the goodness of fit of an entire regression model; it tests all of the model's independent variables jointly.
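A minimal sketch of the statistic for comparing an unrestricted model against a restricted (nested) one; the SSE values below are hypothetical, q is the number of excluded variables, and k is the number of slope coefficients in the unrestricted model:

```python
def f_stat(sse_restricted: float, sse_unrestricted: float,
           q: int, n: int, k: int) -> float:
    # F = [(SSE_restricted - SSE_unrestricted) / q] / [SSE_unrestricted / (n - k - 1)]
    return ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k - 1))

# Hypothetical numbers: dropping q = 2 variables raises SSE from 40 to 50
# with n = 63 observations and k = 5 slopes in the unrestricted model.
print(f_stat(50.0, 40.0, q=2, n=63, k=5))   # 7.125
```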

7
Q

Joint Test of Hypotheses

A

A test of hypotheses that specify values for the coefficients of two or more independent variables simultaneously.

8
Q

Nested Models

A

Models in which one regression model has a subset of the independent variables of another regression model.

9
Q

Restricted Model

A

A regression model with a subset of the complete set of independent variables.

10
Q

Schwarz’s Bayesian Information Criterion (BIC or SBC)

A

A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model with the best goodness of fit.
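A hedged sketch comparing BIC with AIC using the common SSE-based forms, AIC = n·ln(SSE/n) + 2(k+1) and BIC = n·ln(SSE/n) + ln(n)·(k+1); the inputs are hypothetical:

```python
import math

def aic(n: int, k: int, sse: float) -> float:
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(n: int, k: int, sse: float) -> float:
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# BIC's ln(n) penalty exceeds AIC's 2 once n > e^2 (about 7.4), so BIC
# punishes extra variables harder and favors more parsimonious models.
print(aic(100, 4, 50.0), bic(100, 4, 50.0))
```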

11
Q

Unrestricted model

A

A regression model with the complete set of independent variables.

12
Q

What are the principles for proper regression model specification?

A

Economic reasoning behind variable choices, parsimony, good out-of-sample performance, appropriate model functional form, no violations of regression assumptions

These principles ensure the model is effective and reliable in making predictions.

13
Q

What are common failures in regression functional form?

A

Omitted variables, inappropriate form of variables, inappropriate variable scaling, inappropriate data pooling

These failures can lead to violations of regression assumptions, impacting model accuracy.

14
Q

What is heteroskedasticity in regression analysis?

A

When the variance of regression errors differs across observations

It can affect the validity of statistical inferences made from the model.

15
Q

What is unconditional heteroskedasticity?

A

When the error variance is not correlated with the independent variables

It does not create major problems for statistical inference.

16
Q

What is conditional heteroskedasticity?

A

When the error variance is correlated with the values of the independent variables

This type of heteroskedasticity can lead to inflated t-statistics and Type I errors.

17
Q

How is conditional heteroskedasticity detected?

A

Using the Breusch–Pagan (BP) test

This test assesses the correlation between error variance and independent variable values.
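A rough sketch of the mechanics on simulated data (the variable names and single-regressor setup are illustrative; the real test uses the model's actual residuals):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
resid = rng.normal(size=n)            # stand-in for the model's residuals

def r_squared(y, X):
    """R2 from an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Step 1: regress the squared residuals on the independent variable(s).
X = np.column_stack([np.ones(n), x])
r2_aux = r_squared(resid ** 2, X)
# Step 2: BP = n * R2, compared to a chi-square with k = 1 df (5% critical
# value about 3.84); a large BP signals conditional heteroskedasticity.
bp = n * r2_aux
print(bp)
```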

18
Q

How can the bias created by conditional heteroskedasticity be corrected?

A

By computing robust standard errors

This adjustment helps ensure accurate statistical inference.

19
Q

What is serial correlation (or autocorrelation) in regression?

A

When regression errors are correlated across observations

It is particularly problematic in time-series regressions.

20
Q

What are the consequences of serial correlation?

A

Underestimated standard errors, inflated t-statistics, and Type I errors; coefficient estimates also become inconsistent if an independent variable is a lagged value of the dependent variable

This can lead to misleading conclusions about the relationships between variables.

21
Q

What test is used to detect serial correlation?

A

The Breusch–Godfrey (BG) test

This test uses residuals from the regression to check for correlation with lagged residuals.
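A minimal sketch of the idea for one lag (p = 1) on simulated residuals; the full auxiliary regression also includes the original independent variables, omitted here for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
resid = rng.normal(size=n)            # stand-in for the model's residuals

def r_squared(y, X):
    """R2 from an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Regress e_t on e_{t-1}; BG = (n - p) * R2 is compared to a chi-square
# with p = 1 df (5% critical value about 3.84).
X = np.column_stack([np.ones(n - 1), resid[:-1]])
bg = (n - 1) * r_squared(resid[1:], X)
print(bg)
```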

22
Q

What does a variance inflation factor (VIF) measure?

A

Multicollinearity

It quantifies how much the variance of an estimated regression coefficient increases due to multicollinearity.
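A sketch of the computation on simulated data, where x2 is deliberately built to be nearly collinear with x1 (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly a linear function of x1

def vif(target, others):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    regressor j on the remaining regressors (intercept included)."""
    X = np.column_stack([np.ones(len(target))] + list(others))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    ss_res = np.sum((target - X @ beta) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 / (1 - r2)

print(vif(x2, [x1]))   # far above the VIF > 10 warning level
```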

23
Q

What does a VIF of 1 indicate?

A

No correlation between the variable and other regressors

This suggests that multicollinearity is not a concern for that variable.

24
Q

What VIF value indicates serious multicollinearity requiring correction?

A

VIFj > 10

This level of multicollinearity can significantly affect the reliability of the regression model.

25
Q

What are possible solutions to multicollinearity?

A

Dropping regression variables, using a different proxy, increasing the sample size

These strategies help mitigate the effects of multicollinearity on regression analysis.

26

27
Q

What should the model be grounded in?

A

Economic reasoning

Economic reasoning provides the rationale for the choice of variables in the model.

28
Q

What does it mean for a model to be parsimonious?

A

Each variable included in the regression should play an essential role.

29
Q

What is a key performance expectation for a model?

A

The model should perform well out of sample

Otherwise, the model may explain only the specific dataset on which it was trained, indicating overfitting.

30
Q

What should the functional form of the model be?

A

Appropriate

If a nonlinear relationship between the dependent and independent variables is expected, the model should incorporate the appropriate nonlinear terms.

31
Q

What must the model satisfy?

A

Regression assumptions

If heteroskedasticity, serial correlation, or multicollinearity is detected, it may be necessary to revise the regression variables and/or functional form.

32
Q

What is an omitted variable in regression?

A

One or more important variables are omitted from the regression.

May lead to heteroskedasticity or serial correlation.

33
Q

What is an inappropriate form of variables in regression?

A

Ignoring a nonlinear relationship between the dependent and independent variables.

May lead to heteroskedasticity.

34
Q

What is inappropriate variable scaling in regression?

A

One or more regression variables may need to be transformed before estimating the regression.

May lead to heteroskedasticity or multicollinearity.

35
Q

What is inappropriate data pooling in regression?

A

The regression model pools data from different samples that should not be pooled.

May lead to heteroskedasticity or serial correlation.
36

37
Q

What is the impact of serial correlation on coefficient estimates when a lagged independent variable (but not a lagged dependent variable) is included?

A

Invalid Coefficient Estimates: No

38
Q

What is the impact of serial correlation on standard error estimates when a lagged independent variable is included?

A

Invalid Standard Error Estimates: Yes

39
Q

What is the impact of serial correlation on coefficient estimates when an independent variable is a lagged value of the dependent variable?

A

Invalid Coefficient Estimates: Yes

40
Q

What is the impact of serial correlation on standard error estimates when an independent variable is a lagged value of the dependent variable?

A

Invalid Standard Error Estimates: Yes
41
Q

Breusch–Godfrey (BG) test

A

A test used to detect autocorrelated residuals up to a predesignated order of the lagged residuals.

42
Q

Breusch–Pagan (BP) test

A

A test for the presence of heteroskedasticity in a regression.

43
Q

Conditional Heteroskedasticity

A

A condition in which the variance of the residuals of a regression is correlated with the values of the independent variables.

44
Q

Durbin–Watson (DW) test

A

A test for the presence of first-order serial correlation.
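A sketch of the statistic on simulated residuals: DW = Σ(e_t − e_{t−1})² / Σe_t² ≈ 2(1 − r), where r is the first-order autocorrelation of the residuals.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared first differences of the residuals divided by
    the sum of squared residuals; near 2 means no first-order serial
    correlation, near 0 positive, near 4 negative."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
e = rng.normal(size=500)              # serially uncorrelated residuals
print(durbin_watson(e))               # close to 2
```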
45
Q

First-order serial correlation

A

The correlation of residuals with residuals adjacent in time.

46
Q

Heteroskedastic

A

Describes a regression in which the variance of the residuals differs across observations.

47
Q

Model Specification

A

The set of independent variables included in a model and the model's functional form.

48
Q

Multicollinearity

A

When two or more independent variables are highly correlated with one another or are approximately linearly related.

49
Q

Negative serial correlation

A

A situation in which residuals are negatively related to other residuals.

50
Q

Omitted variable bias

A

Bias resulting from the omission of an important independent variable from a regression model.

51
Q

Positive serial correlation

A

A situation in which residuals are positively related to other residuals.

52
Q

Robust standard errors

A

A method for correcting standard errors for conditional heteroskedasticity. Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.
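A minimal sketch of the White (HC0) sandwich computation on simulated heteroskedastic data; the variable names and the simulation are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=n)  # error variance grows with |x|

# OLS fit and residuals.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

# Sandwich: Var(b) = (X'X)^-1 [X' diag(e^2) X] (X'X)^-1  (HC0 form).
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * (e ** 2)[:, None])
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
print(robust_se)                      # [se(intercept), se(slope)]
```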
53
Q

Serial Correlation

A

A condition, found most often in time series, in which residuals are correlated across observations. Also known as autocorrelation.

54
Q

Serial-correlation consistent standard errors

A

A method for correcting standard errors for serial correlation. Also known as serial-correlation and heteroskedasticity adjusted standard errors, Newey–West standard errors, and robust standard errors.

55
Q

Unconditional heteroskedasticity

A

When the heteroskedasticity of the error variance is not correlated with the values of the regression's independent variables.

56
Q

Variance inflation factor (VIF)

A

A statistic that quantifies the degree of multicollinearity in a model.