Chapter 11 - Panel Data Flashcards

(41 cards)

1
Q

what is cross sectional data

A

Data that spans different entities or categories at a single point in time; or, more generally, data where time is not a variable.

2
Q

what is panel data

A

combination of cross sectional and time series data

3
Q

name some examples of cross sectional variables

A

market cap

PE ratio

age of the firm

In general: firm characteristics that do not change much over time.

4
Q

how do we refer to panel data

A

“A panel of data contains both time series and cross-sectional data.”

5
Q

what is a pooled regression

A

A pooled regression stacks all observations into a single dataset and fits one model to it. The consequence is that we assume the relationships are the same everywhere, because something like OLS treats every observation identically.

For instance, if the explanatory variables are PE ratio and SIZE, we assume that every firm has the same relationship between “return” and size/PE.
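As a minimal illustration (not from the book), here is a pooled OLS fit in Python on simulated data; the variable names (ret, pe, size) and numbers are hypothetical.

```python
# Pooled OLS sketch: stack every firm-year observation and fit one common model.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
firms, years = 10, 5
df = pd.DataFrame({
    "firm": np.repeat(np.arange(firms), years),
    "year": np.tile(np.arange(2019, 2019 + years), firms),
})
df["pe"] = rng.normal(15, 4, len(df))
df["size"] = rng.normal(8, 1, len(df))
# One common alpha and beta generate y for every firm-year (the pooled assumption).
df["ret"] = 0.02 + 0.001 * df["pe"] + 0.005 * df["size"] + rng.normal(0, 0.01, len(df))

X = sm.add_constant(df[["pe", "size"]])
pooled = sm.OLS(df["ret"], X).fit()   # ignores the panel structure entirely
print(pooled.params)
```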

6
Q

elaborate on what actually constitutes panel data

A

It is about the structure: the data contain both a time dimension and a cross-sectional dimension. In other words, the data have two dimensions, one being time and the other being the entity.

7
Q

what is cross sectional data

A

Cross-sectional data is data on many entities, such as firms or individuals, at a single point in time.

A good example is a snapshot of a population’s obesity. We draw a sample at some point in time and build models trying to explain, say, what percentage is obese. We could also try to predict obesity from the cross-sectional information.

However, this gives us no insight into the trend. We do not know whether obesity is increasing or decreasing, because all the data were drawn from a single point in time; that is what makes it cross-sectional.

Therefore, cross-sectional data contrasts with time series data.

A time series refers to a single entity and how it evolves through time. A time series would therefore answer questions like “how does the obesity of this person/entity evolve?”.

Panel data allows us to observe how entities evolve over time and to identify patterns that generalize across the population, controlling for individual-specific traits. This makes it possible to study both within-entity dynamics and between-entity relationships.

8
Q

give the simplest setup of a panel data model

A

y_{it} = alpha + beta x_{it} + u_{it}

So we have multiple entities, indexed by “i”, and records for these at different points in time, indexed by “t”.

alpha is a constant intercept, the same for all entities and time periods.

beta is a vector of slope coefficients, one for each explanatory variable.

Such a model assumes that a relationship that holds for one entity also holds for the entire population.
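As a small sketch (firm/year names are just illustrative), panel data in long format is often stored with a two-level index over entity i and time t:

```python
# The (i, t) structure of a panel: one row per entity-time observation.
import pandas as pd

panel = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B", "B"],          # i: cross-sectional dimension
    "year": [2020, 2021, 2022, 2020, 2021, 2022],    # t: time dimension
    "x":    [1.0, 1.2, 1.1, 0.8, 0.9, 1.0],
    "y":    [0.05, 0.07, 0.06, 0.02, 0.03, 0.04],
}).set_index(["firm", "year"])

print(panel)   # each row is one (i, t) pair: y_it and x_it
```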

9
Q

main limitation of the pooled regression (OLS) approach

A

It assumes that the relationships stay the same across time and across entities.

10
Q

why not model each entity as an independent time series?

A

Then we would not generalise anything. A key goal is to provide insights about the population as a whole.

11
Q

what is meant by the data in this chapter

A

“A panel of data” refers to data that combine the time series and cross-sectional dimensions.

12
Q

what two primary methods are we working with

A

The fixed effects method

The random effects method

13
Q

what is a balanced panel?

A

A balanced panel has the same number of time series observations for each cross-sectional unit.

14
Q

what is an unbalanced panel?

A

A different number of time series observations for the various cross-sectional entities.

15
Q

elaborate on SUR

A

SUR (seemingly unrelated regressions) models each entity separately, so each entity gets its own fitted equation. However, the coefficients are constant over time.

We assume the errors are correlated across entities.

We use GLS to transform the regressions based on the correlation between the errors. The outcome is a system in which all the errors are uncorrelated.

So we still have one regression per entity, and the regressions are still time-invariant. The key is that we have now accounted for the correlation between entities.
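The mechanics can be sketched with a small feasible-GLS example in Python (two simulated entities; a rough illustration of the idea, not the book's exact procedure, and all names are made up):

```python
# SUR via feasible GLS: per-entity OLS, estimate the contemporaneous error
# covariance, then re-weight the stacked system.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
T = 30
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])

# Errors are contemporaneously correlated across the two equations.
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=T)
y1 = X1 @ np.array([0.5, 1.0]) + errs[:, 0]
y2 = X2 @ np.array([0.2, 2.0]) + errs[:, 1]

# Step 1: equation-by-equation OLS to obtain residuals.
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
resid = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])

# Step 2: estimate the N x N contemporaneous error covariance
# (N x N rather than NT x NT because errors are assumed independent over time).
sigma = resid.T @ resid / T

# Step 3: GLS on the stacked system, with Cov(u) = Sigma kron I_T.
X = block_diag(X1, X2)
y = np.concatenate([y1, y2])
omega_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
beta_sur = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)
print(beta_sur)   # [alpha_1, beta_1, alpha_2, beta_2]
```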

16
Q

is SUR good?

A

If the assumed covariance matrix of the errors is correct, SUR produces efficient estimates. However, this is unlikely to hold exactly in practice.

17
Q

limitations of SUR

A

1) The number of time series observations must be at least as large as the number of cross-sectional units. This is one limitation.

2) The covariance matrix of the errors must be estimated. An entity with T observations has T errors, and we have N entities, so there are NT error terms in total. The covariance matrix is then NT x NT, which can be enormous.

18
Q

elaborate on the size of the covariance matrix used in SUR

A

The book and the professor state NT x NT because there are NT error terms in total.

However, SUR assumes that the errors are independent through time, so in that case an N x N contemporaneous covariance matrix is all we need.

If we do not make that assumption, we would have to estimate the full NT x NT covariance matrix for GLS to re-weight properly.

19
Q

briefly introduce “fixed effects method”

A

It allows the intercept to vary cross-sectionally, but not through time.

The slope estimates remain fixed through time and across cross-sections.

So, each entity gets its own intercept term.

20
Q

elaborate on the fixed effects method

A

we take the error term, u_{it}, and decompose it into two parts:

1) mu_i
2) v_{it}

mu_i is the new, entity-specific intercept.

v_{it} plays the role of the remaining error term; it keeps the usual interpretation of “encapsulating everything about y_{it} that is left unexplained”.

The model can then be estimated using a dummy variable approach, with one dummy for each entity. This is called LSDV (least squares dummy variables).
Such a model keeps the explanatory variables as before, but the entity-specific intercepts are now captured by the dummies.
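A minimal LSDV sketch on simulated data (the variable names pe, ret and the effect sizes are made up for illustration):

```python
# LSDV: one dummy per firm (entity-specific intercepts mu_i), common slope.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
firms, years = 5, 8
df = pd.DataFrame({"firm": np.repeat(np.arange(firms), years)})
df["pe"] = rng.normal(15, 4, len(df))
mu = np.array([0.00, 0.02, -0.01, 0.03, 0.01])          # true entity intercepts mu_i
df["ret"] = mu[df["firm"]] + 0.001 * df["pe"] + rng.normal(0, 0.01, len(df))

dummies = pd.get_dummies(df["firm"], prefix="firm", dtype=float)
X = pd.concat([df[["pe"]], dummies], axis=1)            # no common constant: one intercept per firm
lsdv = sm.OLS(df["ret"], X).fit()
print(lsdv.params)                                      # the slope on pe plus one mu_i per firm
```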

21
Q

elaborate on testing whether the fixed-effects approach is actually necessary for the panel data

A

Since the slope parameters are fixed through time and across entities, the intercepts are the only thing distinguishing it from regular pooled OLS.

As a result, we can treat the pooled OLS version as a restricted variant of the fixed-effects LSDV model. The restriction is that all intercepts are equal.

We can use a Chow-type test for this. If the restriction is not rejected, the intercepts are not significantly different from each other, and a pooled regression suffices.
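A hedged sketch of the test, continuing directly from the LSDV sketch above (it reuses that sketch's df and lsdv): the restricted model is pooled OLS with one common intercept, the unrestricted model is LSDV, and the statistic is F = [(RRSS - URSS)/(N - 1)] / [URSS/(NT - N - k)].

```python
# F-test for redundant fixed effects: are the N entity intercepts jointly equal?
import scipy.stats as st
import statsmodels.api as sm

pooled = sm.OLS(df["ret"], sm.add_constant(df[["pe"]])).fit()     # restricted: one intercept
rrss, urss = pooled.ssr, lsdv.ssr                                 # restricted / unrestricted RSS
n, k, nt = df["firm"].nunique(), 1, len(df)                       # entities, slopes, observations

f_stat = ((rrss - urss) / (n - 1)) / (urss / (nt - n - k))
p_value = st.f.sf(f_stat, n - 1, nt - n - k)
print(f_stat, p_value)   # small p-value: reject equal intercepts, so fixed effects are needed
```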

22
Q

how many parameters must be estimated with the LSDV method?

A

n+k

number of entities + number of regressors

23
Q

what can we do to avoid estimating so many dummy variable parameters?

A

Use the within transformation

24
Q

elaborate on the within transformation

A

Subtract each entity's time-mean from that entity's observations. We do this for every variable (dependent and explanatory).

The result is a new demeaned regression.

Why do we want to do this?

Doing this removes the need for intercept terms: once every demeaned variable has mean zero, each entity's regression line passes through the origin, so the intercepts are all zero.

So, now we have removed the dummy variables.
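A minimal sketch of the within transformation on simulated data (names are illustrative); the demeaned regression needs no intercepts and recovers the same slope as LSDV:

```python
# Within transformation: subtract each firm's time-mean from every variable.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
firms, years = 5, 8
df = pd.DataFrame({"firm": np.repeat(np.arange(firms), years)})
df["pe"] = rng.normal(15, 4, len(df))
mu = rng.normal(0, 0.02, firms)                                   # unobserved fixed effects
df["ret"] = mu[df["firm"]] + 0.001 * df["pe"] + rng.normal(0, 0.01, len(df))

demeaned = df[["ret", "pe"]] - df.groupby("firm")[["ret", "pe"]].transform("mean")
within = sm.OLS(demeaned["ret"], demeaned[["pe"]]).fit()          # no intercept, no dummies
print(within.params)   # the slope; the mu_i have been swept out rather than estimated
```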

25
Q

why would we not use the within transformation over LSDV in some cases?

A

If interpretability of the model is important, we may not want to transform the data. For instance, keeping the intercepts allows us to compare the different entities' baselines.

26
Q

elaborate on the between estimator

A

We average each entity's variables over time, so that the time dimension is removed completely. Then we run ordinary OLS on the entity averages. The result is a model that tries to explain variation between entities.
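
A small sketch (reusing a simulated panel df with columns firm, ret, pe, as in the within-transformation sketch above):

```python
# Between estimator: collapse the panel to one time-averaged row per firm, then OLS.
import statsmodels.api as sm

means = df.groupby("firm")[["ret", "pe"]].mean()          # one row per entity
between = sm.OLS(means["ret"], sm.add_constant(means[["pe"]])).fit()
print(between.params)   # explains variation between firms, not within them
```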

27
Q

elaborate on the first difference operator

A

Differencing: the model explains changes rather than levels. Any variable that does not change over time cancels out.
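
A small sketch (again reusing a simulated panel df with columns firm, ret, pe):

```python
# First differences: regress the change in ret on the change in pe, within each firm.
import statsmodels.api as sm

diffed = df.groupby("firm")[["ret", "pe"]].diff().dropna()   # Delta y_it, Delta x_it
fd = sm.OLS(diffed["ret"], diffed[["pe"]]).fit()             # time-invariant effects drop out
print(fd.params)
```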

28
Q

what do we actually mean by "fixed effects"?

A

Entity-specific intercepts that account for unobserved, time-invariant characteristics. These are characteristics that do not change over time for a given entity, and they could bias the estimates if left unaccounted for.

29
Q

does the between estimator remove fixed individual-specific effects?

A

No, because each entity still effectively has its own intercept term; averaging over time does not remove it.

30
Q

can the within estimator interpret fixed effects?

A

No, because the model now explains how deviations from each variable's mean affect the dependent variable. The interpretation is relative rather than absolute.

31
Q

there are two types of "fixed-effects models", elaborate

A

1) Allow the intercept to vary cross-sectionally, but not over time.
2) Allow the intercept to vary over time, but not cross-sectionally.

When we allow the intercept to vary cross-sectionally, we are essentially saying that each entity has a certain fixed level of "something" that matters for predicting the dependent variable. If we instead only allow the intercept to vary over time, we are assuming that time-specific dynamics shift the common level in each period.

32
Q

how do we set up a model for time fixed effects?

A

An LSDV approach with one dummy per time period; but since the number of dummies is T, we typically use the within transformation instead (demeaning cross-sectionally at each point in time).
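
A small sketch of the time-demeaning version (assuming a long-format panel df with columns firm, year, ret, pe, as in the pooled-regression sketch earlier):

```python
# Time fixed effects via within transformation: demean by the cross-sectional
# mean at each point in time (equivalent to including T time dummies).
import statsmodels.api as sm

demeaned_t = df[["ret", "pe"]] - df.groupby("year")[["ret", "pe"]].transform("mean")
tfe = sm.OLS(demeaned_t["ret"], demeaned_t[["pe"]]).fit()
print(tfe.params)
```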

33
Q

elaborate on combining time and cross-sectional fixed effects

A

It is possible, but we then have n + T + k parameters to estimate, and the corresponding within transformation is more involved.

34
Q

alternative to the fixed-effects model

A

The random effects model.

35
Q

other word for random-effects model

A

The error components model.

36
Q

what is the intercept and slope structure of the random-effects model

A

Similar to the regular fixed-effects model: there is an intercept per entity, and this intercept is constant over time. The slope structure is also the same, meaning that the relationship between the explanatory variables and the dependent variable is constant across entities and over time. We therefore assume that the impact of the variables on the dependent variable is the same for all entities and does not change over time.

37
Q

then what is the difference between fixed effects and random effects

A

The random effects model assumes that the intercept consists of two parts:
1) alpha, a global intercept that is the same for all entities
2) a random variable epsilon_i, which is entity specific; it varies cross-sectionally but not through time.

38
Q

elaborate on the random effect

A

It measures the entity's deviation from the global intercept. The model imposes strict assumptions on the random effects:
1) They have zero mean
2) They are independent of the individual error terms
3) They have constant variance
4) They are independent of the explanatory variables

39
Q

elaborate on the estimation of coefficients in the random effects model

A

Alpha and beta (the global intercept and the slopes) are estimated consistently but inefficiently by OLS, and there are other issues as well. Therefore, GLS is used.

40
Q

elaborate on how we use GLS for random effects

A

We could apply GLS directly, but that is impractical at this scale: inverting the full covariance matrix is costly. Instead, we exploit the fact that if we quasi-demean the data, using a specific formula for the shrinkage factor, we obtain a regression that is equivalent to the GLS regression. This is simply far more efficient to compute.
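
A minimal quasi-demeaning sketch on simulated data. For simplicity the variance components are taken as known here; in practice they are estimated first. Names and numbers are illustrative.

```python
# Random effects via quasi-demeaning: subtract theta times the entity mean,
# with theta = 1 - sigma_v / sqrt(T * sigma_eps^2 + sigma_v^2).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
firms, T = 50, 6
sigma_eps, sigma_v = 0.03, 0.01                     # sd of random effect eps_i and of error v_it
df = pd.DataFrame({"firm": np.repeat(np.arange(firms), T)})
df["pe"] = rng.normal(15, 4, len(df))
eps = rng.normal(0, sigma_eps, firms)               # entity-specific random effects
df["ret"] = 0.02 + 0.001 * df["pe"] + eps[df["firm"]] + rng.normal(0, sigma_v, len(df))

theta = 1 - sigma_v / np.sqrt(T * sigma_eps**2 + sigma_v**2)

entity_means = df.groupby("firm")[["ret", "pe"]].transform("mean")
star = df[["ret", "pe"]] - theta * entity_means     # quasi-demeaned data
star["const"] = 1 - theta                           # the intercept is quasi-demeaned too
re = sm.OLS(star["ret"], star[["const", "pe"]]).fit()
print(theta, re.params)                             # equivalent to the GLS random-effects fit
```
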
41