Regression, event, panel, portfolio Flashcards
Regression analysis shows correlations, not causal relationships. Why?
Because the direction or nature of causality depends on a solid theory, not just statistical modeling.
Explain Exogeneity of covariates
The error term u is not a function of X.
- The covariates (independent variables) don’t contain any information that predicts the error term (u).
- This ensures that the model is correctly specified, and the independent variables only explain the dependent variable, not the errors.
- For every data point, the expected value of the error term, given the independent variables, is zero.
- This ensures that errors are purely random and not systematically related to the covariates.
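In symbols, a standard way to state this is the zero conditional mean assumption (notation assumed here, not taken from the cards):

```latex
E(u \mid x_1, x_2, \ldots, x_k) = 0
```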
What is endogeneity?
Endogeneity is when the error term u is related to the independent variables → biased and inconsistent estimates
What is homoskedasticity?
aka the constant variance assumption.
The error u has the same variance given any value of the explanatory variables: the spread of the errors does not depend on X.
Homoskedasticity ensures that OLS is efficient and that the usual standard errors are valid. If the variance changes across observations (heteroskedasticity), the coefficient estimates remain unbiased but become inefficient, and the usual standard errors and tests are unreliable.
Note: the statement that the errors are uncorrelated with one another (one error carries no information about another) is a separate assumption, usually called no autocorrelation.
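In symbols (standard notation, assumed here rather than taken from the card):

```latex
\operatorname{Var}(u \mid x_1, \ldots, x_k) = \sigma^2
```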
What can you say about the data generating process of the covariates and errors?
The data in X can be a combination of constant and random variables
- OLS relies on variance in the covariates to estimate the relationship between independent variables and the dependent variable.
- If a covariate doesn’t vary (e.g., all values are the same), OLS cannot estimate its effect because it has no explanatory power.
What does the exogeneity assumption say?
The error term u is unrelated to the independent variables X.
It ensures that the model captures the true relationship between X and Y without bias from omitted variables.
Which quantities are derived from the OLS estimates?
- standard errors
- the t-test
- goodness-of-fit (R-squared)
What are standard errors?
Standard errors tell us how much the model’s predictions and estimates (like the coefficients) might vary due to random noise or limited data.
What is residual variance?
Measures how far the actual data points are, on average, from the model’s predictions (it tells you how much error is left after fitting the regression line).
What is the residual standard error?
The average size of the errors in the model’s predictions.
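A minimal Python sketch of how both quantities are computed from OLS residuals (the function and its inputs are illustrative, not from the cards):

```python
import numpy as np

def residual_stats(residuals, k):
    """Residual variance and residual standard error for an OLS fit
    with k slope coefficients (plus an intercept)."""
    n = len(residuals)
    ssr = np.sum(residuals ** 2)        # sum of squared residuals
    sigma2_hat = ssr / (n - k - 1)      # residual variance (unbiased estimator)
    rse = np.sqrt(sigma2_hat)           # residual standard error
    return sigma2_hat, rse
```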
What is the t-test?
T-tests are used in regression to check if a regression coefficient B is significantly different from zero. It helps determine if an independent variable significantly contributes to the model.
The p-value should be below the chosen significance level (commonly 0.05) for a variable to be considered statistically significant in most cases.
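The test statistic has the standard form (assuming a model with k regressors and an intercept):

```latex
t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)}
```

Under H0: βj = 0 it is compared against a t distribution with n − k − 1 degrees of freedom.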
What is heteroskedasticity?
Occurs when the variance of the error terms u in a regression model is not constant. So the “errors” (mistakes) in your regression model don’t have a consistent spread (their variability changes across observations).
Heteroskedasticity doesn’t bias the regression coefficients but it makes standard errors and hypothesis testing unreliable.
How can heteroskedasticity be addressed?
- Robust Standard Errors: Adjusts the standard errors to account for heteroskedasticity.
- Weighted Least Squares (WLS): Reweights observations to stabilize variance (a sketch of both remedies follows below).
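A minimal statsmodels sketch of both remedies; the DataFrame `df` with columns y, x1, x2 and the `weights` vector of inverse error variances are assumed inputs, not from the cards:

```python
import statsmodels.api as sm

# Assumed: df is a pandas DataFrame with columns y, x1, x2; weights is a
# vector proportional to the inverse of each observation's error variance.
X = sm.add_constant(df[["x1", "x2"]])
y = df["y"]

# Remedy 1: OLS with heteroskedasticity-robust (HC1/White) standard errors.
robust_fit = sm.OLS(y, X).fit(cov_type="HC1")

# Remedy 2: Weighted Least Squares, reweighting observations to stabilize variance.
wls_fit = sm.WLS(y, X, weights=weights).fit()
```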
What are autocorrelated errors?
Autocorrelated errors occur when the errors (residuals) in a regression model are not independent but instead show a pattern or relationship over time. This violates one of the key assumptions in regression analysis, leading to unreliable results.
How can you solve auto-correlated errors? (2)
- Adjust your model to directly address the source of autocorrelation (e.g., include lagged terms, past values or leads)
- Use robust standard errors (like Newey-West) to correct the standard errors for the correlation in the residuals (a sketch of both remedies follows below).
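A minimal statsmodels sketch of both remedies; the time-ordered DataFrame `df` with columns y and x, and the lag length, are illustrative assumptions:

```python
import statsmodels.api as sm

# Assumed: df is a time-ordered pandas DataFrame with columns y and x.
# Remedy 1: model the dynamics directly by adding a lagged dependent variable.
df["y_lag1"] = df["y"].shift(1)
dyn_fit = sm.OLS(
    df["y"].iloc[1:], sm.add_constant(df[["x", "y_lag1"]].iloc[1:])
).fit()

# Remedy 2: keep the static model but use Newey-West (HAC) standard errors.
hac_fit = sm.OLS(df["y"], sm.add_constant(df["x"])).fit(
    cov_type="HAC", cov_kwds={"maxlags": 4}  # the maxlags choice is illustrative
)
```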
What is multicollinearity?
Multicollinearity doesn’t violate the assumption of “no perfect linear dependence” (as long as predictors aren’t perfectly collinear), but it still causes numerical issues in estimating coefficients.
Large standard errors due to multicollinearity make it hard to determine the true effect of each variable, leading to unstable regression results.
Multicollinearity can be measured with the VIF (Variance Inflation Factor). A high VIF indicates severe multicollinearity and inflated standard errors.
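A minimal statsmodels sketch of computing VIFs; the DataFrame `regressors` holding only the covariates is an assumed input:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Assumed: regressors is a pandas DataFrame containing only the covariates.
X = sm.add_constant(regressors)
vif = {col: variance_inflation_factor(X.values, i)
       for i, col in enumerate(X.columns) if col != "const"}
# Common rule of thumb (not from the cards): a VIF above roughly 10
# signals severe multicollinearity.
```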
How can you solve multicollinearity?
- increase the sample size N (this increases the total sample variation, SST, in the regressors, which shrinks the variance of the estimates)
- remove or combine highly correlated variables
What are irrelevant variables?
An over-specified model includes irrelevant variables. In what follows, the irrelevant variable is denoted z.
Including irrelevant variables (z) does not introduce bias in the coefficient estimates (β). However, it increases variance in the estimates due to sampling error, making the model less efficient.
Over-specifying the model adds unnecessary noise, which can affect the reliability and interpretability of the results.
What are omitted variables?
When the error term u is not purely random noise; it contains an omitted variable z, which creates bias.
Omitted variables create:
- Bias in coefficients: omitting a relevant variable z introduces bias in β^ because the effect of z is wrongly attributed to X. The bias is larger when z is strongly correlated with X or when γ (the effect of z on y) is large.
- In contrast to over-specified models (where coefficients remain unbiased but less efficient), under-specified models produce biased estimates.
- Practical impact: omitting relevant variables can severely distort conclusions from the model, leading to incorrect inferences about the relationship between X and y.
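In the simple case with one included regressor x and one omitted variable z, the standard omitted variable bias result (using the γ notation of these cards) is:

```latex
E(\hat{\beta}_1) = \beta_1 + \gamma \, \delta_1,
\qquad \delta_1 = \text{slope from regressing } z \text{ on } x
```

The bias disappears only if γ = 0 (z is irrelevant) or if z is uncorrelated with x.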
What is the difference between sampling error and omitted variables bias?
Sampling error diminishes as the sample size N increases. This is not the case for omitted variable bias: the bias is systematic and stems from the structure of the model itself. Because the omitted variable is correlated with X, X picks up the effect of the omitted variable, and no amount of additional data removes this bias.
How can you treat outliers?
- Transformation:
Apply mathematical transformations to reduce the influence of extreme values. Example: Use the natural logarithm to compress large values and spread smaller ones.
- Trimming:
Remove extreme values (e.g., top and bottom 5% of the dataset) from the analysis.
- Winsorizing:
Replace extreme values with the nearest non-outlier value. Example: Cap values at the 95th percentile or floor them at the 5th percentile (a sketch of all three treatments follows below).
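A minimal Python sketch of the three treatments; the simulated lognormal sample is purely illustrative:

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Illustrative skewed, outlier-prone data (not from the cards).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

log_x = np.log(x)                                  # transformation (requires x > 0)
lo, hi = np.percentile(x, [5, 95])
trimmed = x[(x >= lo) & (x <= hi)]                 # trimming: drop the extreme tails
winsorized = winsorize(x, limits=(0.05, 0.05))     # winsorizing: cap at 5th/95th pct
```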
What is the constant elasticity model?
The constant elasticity model is a type of regression model in which the relationship between the dependent variable and the independent variable(s) exhibits a constant percentage change (elasticity).
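The usual specification is the log-log form (standard, not stated on the card):

```latex
\log(y) = \beta_0 + \beta_1 \log(x) + u
```

Here β1 is the elasticity: a 1% change in x is associated with approximately a β1% change in y.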
What is ordinary least squares?
OLS chooses the coefficient estimates that minimize the sum of squared residuals (SSR).
Each slope estimate measures the partial effect of the corresponding independent variable on the dependent variable, holding all other independent variables fixed.
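In symbols (standard result, notation assumed), OLS solves

```latex
\min_{b_0,\ldots,b_k} \sum_{i=1}^{n} \bigl( y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik} \bigr)^2
\quad\Longrightarrow\quad \hat{\beta} = (X'X)^{-1} X'y
```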
What does no perfect collinearity mean?
In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables