Chapter 11 - Brooks Flashcards
(21 cards)
what is panel data
Panel data refers to a panel of data: the same cross-sectional entities observed repeatedly over time.
What is a panel of data
a panel of data is a dataset where the variables are both cross-sectional AND time series. Meaning, we consider different entities together in the same model, and look at how they evolve through time.
Across time and space is sort of the theme.
give the panel data setup
y_it = alpha + beta x_it + u_it
entities are given by the “i”.
time is given by the t.
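A minimal sketch of how such a panel is typically stored in practice (long format, one row per (i, t) pair); the firms, dates and numbers below are invented for illustration:
```python
import numpy as np
import pandas as pd

# hypothetical panel: 3 entities ("firms") observed over 4 quarters
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "entity": np.repeat(["firmA", "firmB", "firmC"], 4),             # the i index
    "time":   np.tile(["2020Q1", "2020Q2", "2020Q3", "2020Q4"], 3),  # the t index
})
df["x"] = rng.normal(size=len(df))                        # x_it
df["y"] = 1.0 + 0.5 * df["x"] + rng.normal(size=len(df))  # y_it = alpha + beta x_it + u_it
print(df)
```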
what is the simplest way to deal with panel data?
Create a pooled regression. This assumes that every entity follows the same relationship, so the intercept and slope are common to all entities.
Would be done using OLS.
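A minimal pooled-OLS sketch using statsmodels; the synthetic data generation is an assumption for illustration, not from the book:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# hypothetical synthetic panel in long format
rng = np.random.default_rng(0)
N, T = 4, 10
df = pd.DataFrame({"entity": np.repeat(np.arange(N), T),
                   "time": np.tile(np.arange(T), N)})
df["x"] = rng.normal(size=N * T)
df["y"] = 1.0 + 0.5 * df["x"] + rng.normal(size=N * T)

# pooled regression: stack all NT observations and run a single OLS,
# ignoring the entity and time structure entirely
X = sm.add_constant(df["x"])
pooled = sm.OLS(df["y"], X).fit()
print(pooled.params)  # one common alpha and beta for every entity and period
```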
what happens if we use a pooled regression on the panel of data?
We assume that the relationship between the dependent variable and the explanatory variables is fixed across time and entities. This is a very static model, and is basically not a time series model.
why not create a regression for each entity?
then we won't be able to leverage the common structure across entities
name one approach to leverage a panel of data
SUR
elaborate on SUR
Seemingly Unrelated Regression.
It is called SUR because the equations might seem unrelated, but a closer look shows that they are not independent: their error terms are correlated across equations.
The idea is to use GLS. Transform the model so that the error terms are no longer correlated. If the error terms were already uncorrelated, running GLS would be identical to running OLS.
There is an issue with SUR: we need to estimate the covariance matrix of the residuals. Since the stacked system has NT observations, that matrix is NTxNT. This grows very quickly.
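A sketch of a two-equation SUR, assuming the third-party linearmodels package is available; the data and equation labels are invented for illustration:
```python
import numpy as np
from linearmodels.system import SUR  # assumes linearmodels is installed

# hypothetical: one equation per entity, with a shock shared across entities
# so that the equations' error terms are correlated
rng = np.random.default_rng(0)
T = 50
x1, x2 = rng.normal(size=T), rng.normal(size=T)
common = rng.normal(size=T)  # common shock -> correlated errors
y1 = 1.0 + 0.5 * x1 + common + rng.normal(size=T)
y2 = 2.0 + 0.8 * x2 + common + rng.normal(size=T)

equations = {
    "entity1": {"dependent": y1, "exog": np.column_stack([np.ones(T), x1])},
    "entity2": {"dependent": y2, "exog": np.column_stack([np.ones(T), x2])},
}
res = SUR(equations).fit(method="gls")  # feasible GLS across the two equations
print(res)
```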
elaborate on GLS
A method used to perform regression estimation when there is non-zero correlation between the residuals.
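A tiny sketch of the special case mentioned in the SUR card: with an identity error covariance, GLS collapses to OLS. The sigma argument of statsmodels' GLS takes the assumed error covariance matrix; the data is invented:
```python
import numpy as np
import statsmodels.api as sm

# hypothetical data
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
X = sm.add_constant(x)

Sigma = np.eye(n)                      # assumed error covariance: uncorrelated errors
gls = sm.GLS(y, X, sigma=Sigma).fit()  # with Sigma = I this is exactly OLS
ols = sm.OLS(y, X).fit()
print(gls.params, ols.params)          # identical estimates
```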
elaborate on the classes of panel data estimator approaches that can be used in finance
1) Fixed effects
2) random effects
what kinds of panels can we distinguish?
Balanced vs unbalanced.
A balanced panel has the same number of time-series observations for each cross-sectional unit; an unbalanced panel is missing observations for some units.
elaborate on fixed effects model
we take the standard panel data equation, and then we decompose the error term.
u_it = mu_i + v_it
Essentially, what this does is to allow each entity to have a fixed component that is specific to the entity.
The equation looks like:
y_it = alpha + beta x_it + mu_i + v_it
alpha is the common intercept, and mu_i is entity specific.
When modeling, we can achieve this by treating mu_i as the coefficient of a variable that is 1 if and only if the entity “i” is considered. In other words, a binary dummy. We include a binary dummy for each entity.
NB: Because the entity dummies always sum to 1 (exactly one of them equals 1 for every observation), they are perfectly collinear with the intercept, so we need to remove the intercept term.
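A minimal LSDV (least squares dummy variables) sketch; the synthetic data, where entity i gets its own base level mu_i = i, is an assumption for illustration:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# hypothetical panel where entity i has base level mu_i = i
rng = np.random.default_rng(0)
N, T = 4, 10
df = pd.DataFrame({"entity": np.repeat(np.arange(N), T),
                   "time": np.tile(np.arange(T), N)})
df["x"] = rng.normal(size=N * T)
df["y"] = df["entity"] + 0.5 * df["x"] + rng.normal(size=N * T)

# one binary dummy per entity, and NO common intercept
# (keeping both would make the regressors perfectly collinear)
dummies = pd.get_dummies(df["entity"], prefix="ent", dtype=float)
X = pd.concat([df["x"], dummies], axis=1)
lsdv = sm.OLS(df["y"], X).fit()
print(lsdv.params)  # beta plus one estimated mu_i per entity
```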
broadly speaking, how can we test to see if the fixed effects are necessary?
Test the hypothesis that all the coefficients of the dummies are equal (an F-test on these restrictions).
If we cannot reject equality, we are better off with a pooled regression.
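Continuing the LSDV sketch from the previous card, this is one way such a test could look with statsmodels' f_test; the hypothesis string is built from the (hypothetical) dummy column names:
```python
# H0: all entity intercepts are equal, i.e. "ent_0 = ent_1, ent_0 = ent_2, ..."
cols = list(dummies.columns)
hypothesis = ", ".join(f"{cols[0]} = {c}" for c in cols[1:])
ftest = lsdv.f_test(hypothesis)
print(ftest)  # a small p-value rejects H0, so a pooled regression would be inadequate
```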
elaborate on the within transformation
We subtract the time-mean from each variable, dependent and independent, including the residual.
Then we run a regression on the demeaned variables only.
This forces the regression line, for every entity, to run through the origin. As a result, applying the within transformation removes the need to include separate fixed effects for the entities.
BUT: This also means that we lose some interpretation with regard to the base levels of the various entities.
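A minimal sketch of the within transformation using pandas groupby; the synthetic panel (entity i has base level mu_i = i) is an illustrative assumption:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# hypothetical panel where entity i has base level mu_i = i
rng = np.random.default_rng(0)
N, T = 4, 10
df = pd.DataFrame({"entity": np.repeat(np.arange(N), T),
                   "time": np.tile(np.arange(T), N)})
df["x"] = rng.normal(size=N * T)
df["y"] = df["entity"] + 0.5 * df["x"] + rng.normal(size=N * T)

# subtract each entity's time-mean from both y and x
demeaned = df[["y", "x"]] - df.groupby("entity")[["y", "x"]].transform("mean")

# regression on the demeaned variables, with no constant and no dummies:
# the fixed effects mu_i have been swept out by the demeaning
within = sm.OLS(demeaned["y"], demeaned[["x"]]).fit()
print(within.params)  # beta only; entity base levels are no longer identified
```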
elaborate on the between estimator
The between estimator will remove the time dimension by averaging each variable over all time. Thus, we only have cross-sectional data left.
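A sketch of the between estimator, reusing the synthetic df from the within-transformation sketch above:
```python
# average every variable over time: one observation left per entity
means = df.groupby("entity")[["y", "x"]].mean()

# a purely cross-sectional regression on the N entity means
between = sm.OLS(means["y"], sm.add_constant(means["x"])).fit()
print(between.params)
```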
what other than within and between can we use?
First difference.
This entails making the model about explaining changes rather than absolute levels.
Why is this useful?
Because when differencing, we remove all variables that do not change over time. Since the fixed effects don't change, they are removed and are no longer a consideration.
Thus, first difference is an option, as well as within and between, in order to remove the issue of dummies.
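A first-difference sketch, again reusing the synthetic df from the within-transformation sketch; differencing within each entity drops anything constant over time, including mu_i:
```python
# difference y and x within each entity; the first observation of each
# entity becomes NaN and is dropped
diffed = df.groupby("entity")[["y", "x"]].diff().dropna()

# regress changes on changes: the fixed effects have been differenced away
fd = sm.OLS(diffed["y"], diffed[["x"]]).fit()
print(fd.params)  # beta estimated from changes rather than levels
```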
if we subtract the mean from data, what do we call it?
Demeaned
elaborate on time-fixed effects models
The time-dimension counterpart of entity fixed effects.
The idea here is that each time period carries a certain common structure shared by all entities. We model this by including a time-varying intercept:
y_it = alpha + beta x_it + lambda_t + v_it
Having a time-varying intercept most likely creates an even bigger dummy problem than the entity fixed-effects model (one dummy per time period). Therefore, the within transformation is golden.
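A sketch of the mechanics, reusing the same synthetic df: instead of estimating T time dummies, subtract the cross-sectional mean at each date (the within transformation applied over the time dimension):
```python
# demean across entities at each point in time
demeaned_t = df[["y", "x"]] - df.groupby("time")[["y", "x"]].transform("mean")

# the common time component lambda_t is swept out, no time dummies needed
tfe = sm.OLS(demeaned_t["y"], demeaned_t[["x"]]).fit()
print(tfe.params)
```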
elaborate on the random effects model
like the fixed-effects model, the random effects approach has a different intercept for each entity, and it is fixed through time.
However, the entity-specific intercept is thought to come from the common intercept term alpha, plus some random variable epsilon_i that varies cross-sectionally, but remains constant through time.
epsilon_i measures the random deviation of the entity's intercept from the global intercept.
We get the model:
y_it = alpha + beta x_it + w_it, where w_it = eps_i + v_it
Now there are no dummy variables.
under this model, a requirement is that epsilon_i has expected value 0, has constant variance, and is independent of the residuals v_it.
We treat the epsilons as random variables.
Then, we perform quasi-demeaning, which is almost the same as demeaning over time (averaging over time), but quasi in the sense that we subtract only part of the mean. The part is given by a transformation parameter, theta = 1 - sqrt(sigma_v^2 / (T sigma_eps^2 + sigma_v^2)). This quasi-demeaning ensures that there is no correlation left in the error terms.
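A random-effects sketch, assuming the third-party linearmodels package (which performs the quasi-demeaning internally); the data generation, with a random entity effect eps_i, is invented for illustration:
```python
import numpy as np
import pandas as pd
from linearmodels.panel import RandomEffects  # assumes linearmodels is installed

# hypothetical panel with a random entity effect eps_i, constant over time
rng = np.random.default_rng(0)
N, T = 50, 10
eps_i = rng.normal(size=N)
df = pd.DataFrame({"entity": np.repeat(np.arange(N), T),
                   "time": np.tile(np.arange(T), N)})
df["x"] = rng.normal(size=N * T)
df["y"] = 1.0 + 0.5 * df["x"] + eps_i[df["entity"].to_numpy()] + rng.normal(size=N * T)

# linearmodels expects an (entity, time) MultiIndex
panel = df.set_index(["entity", "time"])
exog = panel[["x"]].assign(const=1.0)
re = RandomEffects(panel["y"], exog).fit()
print(re.params)
print(re.theta.head())  # the estimated quasi-demeaning fraction theta
```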
how to determine whether to use fixed effects or random effects?
It is generally said that random effects is preferred when the entities are randomly sampled from the population. Fixed effects is preferred when the sample effectively is the population, usually in the case where we actually have access to the entire population (like all stocks in an index, etc.).
The random effects model has a major drawback in that it is only valid when the composite error term w_it (= eps_i + v_it) is independent of the explanatory variables. This is a stricter requirement than for fixed effects, because it requires both v_it and eps_i to be independent of the explanatory variables, while fixed effects only requires v_it to be independent.
A version of the Hausman test can be used to test whether w_it is independent of the explanatory variables or not.
To see the idea of this, consider the case where we have only one explanatory variable. If it is correlated with eps_i, the estimate will be biased, because the model assigns variation that is actually driven by eps_i to the explanatory variable. This makes the explanatory variable not behave properly: it will try to explain more than it actually can.
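Below is a sketch of how the basic Hausman comparison could be computed, under the same assumptions as the random-effects sketch (linearmodels available, invented data): estimate both models and compare the coefficients they share, here just beta:
```python
import numpy as np
import pandas as pd
from scipy import stats
from linearmodels.panel import PanelOLS, RandomEffects  # assumes linearmodels

# same hypothetical data-generating process as the random-effects sketch
rng = np.random.default_rng(0)
N, T = 50, 10
eps_i = rng.normal(size=N)
df = pd.DataFrame({"entity": np.repeat(np.arange(N), T),
                   "time": np.tile(np.arange(T), N)})
df["x"] = rng.normal(size=N * T)
df["y"] = 1.0 + 0.5 * df["x"] + eps_i[df["entity"].to_numpy()] + rng.normal(size=N * T)
panel = df.set_index(["entity", "time"])

fe = PanelOLS(panel["y"], panel[["x"]], entity_effects=True).fit()
re = RandomEffects(panel["y"], panel[["x"]].assign(const=1.0)).fit()

# Hausman statistic: H = (b_FE - b_RE)' [V_FE - V_RE]^{-1} (b_FE - b_RE),
# asymptotically chi2 with k degrees of freedom (k = 1 shared coefficient here);
# note: V_FE - V_RE can fail to be positive in small samples
b_diff = fe.params["x"] - re.params["x"]
v_diff = fe.cov.loc["x", "x"] - re.cov.loc["x", "x"]
H = b_diff**2 / v_diff
p = stats.chi2.sf(H, df=1)
print(H, p)  # a small p-value rejects independence of w_it, favouring fixed effects
```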