Chapter 11 - panel data Flashcards

(31 cards)

1
Q

what is cross sectional data

A

Data on many different entities (e.g. firms, countries, individuals) at the same point in time, or more generally data in which time is not a dimension.

2
Q

what is panel data

A

combination of cross sectional and time series data

3
Q

name some examples of cross sectional variables

A

market cap

PE ratio

age of the firm

Variables that describe how entities differ from one another at a given point in time, rather than how they evolve over time.

4
Q

how do we refer to panel data

A

“a panel of data contains both time series data and cross sectional data”

5
Q

what is a pooled regression

A

A pooled regression stacks all observations (every entity at every point in time) into one dataset and fits a single model to it, e.g. with OLS. As a result, we assume the relationships are the same for all entities and all time periods, because OLS estimates one set of coefficients for the whole sample.

For instance, if the explanatory variables are PE ratio and SIZE, we assume that every firm has the same relationship between “return” and size, and between “return” and PE.
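A minimal sketch of a pooled regression in Python with NumPy, on simulated data (the function name `pooled_ols`, the (N, T, k) array layout and the coefficient values are assumptions for illustration). All N x T observations are stacked into one dataset, and a single OLS fit yields one intercept and one slope per regressor, shared by every firm.

```python
import numpy as np

def pooled_ols(y, X):
    """Pooled OLS: stack every (entity, time) observation and fit one model.

    y has shape (N, T) and X has shape (N, T, k).
    Returns [alpha, beta_1, ..., beta_k], one coefficient set for all entities.
    """
    N, T, k = X.shape
    y_stacked = y.reshape(N * T)                 # NT observations in one column
    X_stacked = X.reshape(N * T, k)              # NT rows, k regressors
    design = np.column_stack([np.ones(N * T), X_stacked])  # one common intercept
    coef, *_ = np.linalg.lstsq(design, y_stacked, rcond=None)
    return coef

# Simulated panel where every firm really does share the same relationship
rng = np.random.default_rng(0)
N, T = 50, 20
X = rng.normal(size=(N, T, 2))                   # think of the columns as PE and SIZE
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.1, size=(N, T))
print(pooled_ols(y, X))                          # close to [1.0, 0.5, -0.3]
```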

6
Q

elaborate on what actually constitutes panel data

A

It is about the structure. Each variable has both a time dimension and a cross-sectional dimension, so the data we are looking at has two dimensions: one is time (t) and the other is the entity (i).

7
Q

what is cross sectional data

A

Cross sectional data is data across many entities, like firms or countries, AT a single point in time.

A good example is a snapshot of a population’s obesity. We’d draw a sample at some point in time, and make models trying to explain what percentage is obese etc. We could also try to predict obesity based on the cross sectional information.

However, this does not give us any insight into the trend. We do not know whether obesity is increasing or decreasing, because all the data were drawn at a single point in time, as is the case for cross sectional data.

Therefore, cross sectional data contrast with time series data.

Time series data actually refers to a single entity and how it evolves through time. Therefore, a time series would answer questions like “how does the obesity of this person/entity evolve?”.

Panel data allows us to observe how entities evolve over time and to identify patterns that generalize across the population, controlling for individual-specific traits. This makes it possible to study both within-entity dynamics and between-entity relationships.

8
Q

give the simplest setup of a panel data model

A

y_{it} = alpha + beta x_{it} + u_{it}

So, we have multiple entities, indexed by i, each observed at different points in time, indexed by t.

alpha is a single intercept, the same for all entities and all time periods.

beta is a vector of slope coefficients, one for each explanatory variable (so x_{it} is a vector of regressors too).

Such a model assumes that the relationship is the same for every entity and every time period: what holds for a single entity also holds for the entire population.
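Assuming a balanced panel with k explanatory variables and stacking the NT observations entity by entity, the same model can be written in matrix form. Here β′x_{it} is just the card's beta x_{it} written as an inner product, and ι_{NT} (a column of ones) is notation introduced for illustration.

```latex
y_{it} = \alpha + \beta' x_{it} + u_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T

\mathbf{y} = \alpha \, \iota_{NT} + X \beta + \mathbf{u}, \qquad
\mathbf{y}, \mathbf{u}, \iota_{NT} \in \mathbb{R}^{NT}, \quad X \in \mathbb{R}^{NT \times k}, \quad \beta \in \mathbb{R}^{k}
```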

9
Q

main limitation of the pooled OLS regression approach

A

It assumes that the relationships (intercept and slopes) remain the same across all time periods and all entities.
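A quick simulation in Python with NumPy of why this assumption matters. It assumes, purely for illustration, that each entity has its own intercept and that those intercepts are correlated with the regressor. Pooled OLS forces one common intercept, so the intercept differences leak into the slope estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta_true = 100, 10, 1.0

# Entity-specific intercepts correlated with the regressor:
# entities with a high intercept also tend to have high x values.
mu = rng.normal(scale=2.0, size=N)
x = mu[:, None] + rng.normal(size=(N, T))
y = mu[:, None] + beta_true * x + rng.normal(scale=0.5, size=(N, T))

# Pooled OLS: one common intercept and one slope for all NT observations
design = np.column_stack([np.ones(N * T), x.ravel()])
alpha_hat, beta_pooled = np.linalg.lstsq(design, y.ravel(), rcond=None)[0]
print(beta_pooled)   # well above the true slope of 1.0 (roughly 1.8 in this setup)
```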

10
Q

why not treat each entity as an independent time series?

A

Then we don't generalize anything: we get a separate model per entity, whereas a goal is to provide insights on the population as a whole.

11
Q

what is meant by the data in this chapter

A

“A panel of data” refers to data that combines time series and cross sectional dimensions.

12
Q

what two primary methods are we working with?

A

Fixed effects method

Random effects method

13
Q

what is a balanced panel?

A

A balanced panel has the same number of time series observations for each cross sectional unit (entity).

14
Q

what is an unbalanced panel?

A

An unbalanced panel has a different number of time series observations for some of the cross sectional entities.
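A small Python sketch for checking balance in a long-format panel, assuming the observations are given as (entity, time) pairs. `is_balanced` is a hypothetical helper, not a library function, and it only compares the number of periods per entity, matching the definition on these cards.

```python
from collections import Counter

def is_balanced(observations):
    """Return True if every entity has the same number of time periods.

    observations: iterable of (entity, time) pairs, one per row of the panel.
    """
    # Count distinct periods per entity (duplicates are dropped by the set)
    counts = Counter(entity for entity, _ in set(observations))
    return len(set(counts.values())) <= 1

balanced = [("A", 2020), ("A", 2021), ("B", 2020), ("B", 2021)]
unbalanced = [("A", 2020), ("A", 2021), ("B", 2021)]   # firm B is missing 2020
print(is_balanced(balanced))     # True
print(is_balanced(unbalanced))   # False
```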

15
Q

elaborate on SUR (seemingly unrelated regressions)

A

Models each entity separately, so that each entity gets its own fitted model. However, the coefficients are constant over time.

We allow the errors to be correlated across entities at the same point in time.

We use GLS to transform the regressions based on the correlation between the errors. The outcome is a transformed system in which all the errors are uncorrelated.

So we still have one regression per entity, and each regression's coefficients are still time-invariant. The key now is that the estimation takes the correlation between entities into account.
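A minimal two-step feasible GLS sketch of SUR in Python with NumPy. It assumes every entity has T observations and that the errors are correlated across entities but not over time; `sur_fgls` and the simulated data are hypothetical. Step 1 runs OLS entity by entity to estimate the N x N error covariance, step 2 stacks all equations and applies GLS.

```python
import numpy as np

def sur_fgls(ys, Xs):
    """Two-step feasible GLS for a SUR system (one equation per entity).

    ys: list of N arrays of shape (T,); Xs: list of N arrays of shape (T, k_i).
    Returns a list with one coefficient vector per entity.
    """
    N, T = len(ys), len(ys[0])
    # Step 1: OLS equation by equation, keep the residuals (T x N)
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for y, X in zip(ys, Xs)
    ])
    sigma = resid.T @ resid / T               # contemporaneous N x N covariance
    # Step 2: stack the system with a block-diagonal design matrix
    ks = [X.shape[1] for X in Xs]
    X_big = np.zeros((N * T, sum(ks)))
    col = 0
    for i, X in enumerate(Xs):
        X_big[i * T:(i + 1) * T, col:col + ks[i]] = X
        col += ks[i]
    y_big = np.concatenate(ys)
    # No correlation over time => Omega^{-1} = Sigma^{-1} kron I_T
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
    XtW = X_big.T @ omega_inv
    beta = np.linalg.solve(XtW @ X_big, XtW @ y_big)
    return np.split(beta, np.cumsum(ks)[:-1])

rng = np.random.default_rng(2)
T, N = 200, 3
true = [np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([-2.0, 0.3])]
cov = np.array([[1.0, 0.8, 0.5], [0.8, 1.0, 0.6], [0.5, 0.6, 1.0]])
errors = rng.multivariate_normal(np.zeros(N), cov, size=T)   # T x N, correlated across entities
Xs = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(N)]
ys = [Xs[i] @ true[i] + errors[:, i] for i in range(N)]
est = sur_fgls(ys, Xs)
print(est)   # one coefficient vector per entity, close to `true`
```

Building the full NT x NT matrix like this is only sensible for small N and T; it is written out here to mirror the reasoning on the card.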

16
Q

is SUR good?

A

If the true covariance matrix of the errors is known, GLS produces the best (most efficient) linear unbiased estimates. However, in practice the covariance matrix has to be estimated, so this ideal is unlikely to hold.

17
Q

limitations of SUR

A

1) The number of time series observations per entity (T) must be at least as large as the number of cross-sectional units (N). Otherwise the estimated N x N error covariance matrix is singular and cannot be inverted for GLS.

2) The covariance matrix of the errors must be estimated. An entity with T observations has T errors, and we have N entities, so there are NT error terms. A full covariance matrix would then be NT x NT, which can be insanely large.
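A quick NumPy illustration of limitation 1), using random numbers as a stand-in for the first-step residuals (an assumption; real OLS residuals would have even lower rank). The estimated covariance is built from T residual vectors, so its rank cannot exceed T, and with T < N it is singular. `sigma_rank` is a hypothetical helper.

```python
import numpy as np

def sigma_rank(T, N, seed=0):
    """Rank of the estimated N x N error covariance built from T residuals per entity."""
    rng = np.random.default_rng(seed)
    resid = rng.normal(size=(T, N))        # stand-in for first-step residuals (T x N)
    sigma_hat = resid.T @ resid / T        # N x N, but its rank is at most T
    return np.linalg.matrix_rank(sigma_hat)

N = 10
for T in (5, 50):
    r = sigma_rank(T, N)
    print(T, r, "invertible" if r == N else "singular, so GLS breaks down")
```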

18
Q

elaborate on the size of the covariance matrix using SUR

A

The book and the professor say NT x NT, because there are NT error terms.

However, SUR assumes that the errors are uncorrelated over time (they are only correlated across entities at the same point in time). The NT x NT matrix then has the structure Sigma ⊗ I_T (a Kronecker product), so we only need to estimate the N x N matrix Sigma.

However, if we do not assume this, we need to estimate the entire NT x NT covariance matrix so that GLS can reweight properly.
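A small NumPy check of this structure, with a made-up 3 x 3 Sigma. If the errors are stacked entity by entity and are uncorrelated over time, the NT x NT matrix is the Kronecker product Sigma ⊗ I_T, and its inverse only requires inverting the small N x N matrix.

```python
import numpy as np

N, T = 3, 4
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])     # hypothetical N x N contemporaneous covariance
# Errors stacked entity by entity, no correlation over time => Omega = Sigma kron I_T
omega = np.kron(sigma, np.eye(T))
print(omega.shape)                      # (12, 12), i.e. NT x NT
# The inverse GLS needs only requires inverting the N x N matrix
omega_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
print(np.allclose(omega_inv, np.linalg.inv(omega)))   # True
```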

19
Q

briefly introduce “fixed effects method”

A

allows the intercept to change cross-sectionally, but not through time.

Slope estimates remain fixed through time and across cross sections.

So, each entity gets its own intercept term.

20
Q

elaborate on the fixed effects method

A

we take the error term, u_{it}, and decompose it into two parts:

1) mu_i
2) v_{it}

mu_i is the entity-specific effect, which acts as a new intercept that differs between entities.

v_{it} is the remainder disturbance. It keeps the interpretation the old error term had: it captures everything about y_{it} that is not explained by the model.

The model can then be estimated with a dummy variable approach, with one dummy for each entity. This is called LSDV (least squares dummy variable).
Such a model keeps the regressors as always, but the intercepts are now captured by the dummies.
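A minimal LSDV sketch in Python with NumPy, on simulated data (the function `lsdv`, the (N, T, k) layout and the numbers are assumptions for illustration). The design matrix has one dummy column per entity and no common intercept, so each dummy coefficient is that entity's intercept.

```python
import numpy as np

def lsdv(y, x):
    """LSDV fixed effects: one dummy per entity plus the regressors, no common intercept.

    y: (N, T) array, x: (N, T, k) array. Returns (N intercepts, k slopes).
    """
    N, T, k = x.shape
    dummies = np.kron(np.eye(N), np.ones((T, 1)))   # NT x N, row (i, t) has a 1 in column i
    design = np.hstack([dummies, x.reshape(N * T, k)])
    coef = np.linalg.lstsq(design, y.reshape(N * T), rcond=None)[0]
    return coef[:N], coef[N:]

rng = np.random.default_rng(4)
N, T = 30, 8
mu = rng.normal(scale=2.0, size=N)                  # entity-specific intercepts
x = mu[:, None, None] + rng.normal(size=(N, T, 1))  # regressor correlated with mu
y = mu[:, None] + 1.5 * x[..., 0] + rng.normal(scale=0.5, size=(N, T))
intercepts, slopes = lsdv(y, x)
print(slopes)                                       # close to [1.5]
```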

21
Q

elaborate on testing the fixed-effect method for whether it is actually necessary in regards to panel data

A

Since the slope parameters are fixed through time and entities, the intercepts are the only thing that makes it different from a pooled OLS regression.

As a result, we can treat the pooled OLS version as a restricted variant of the fixed-effects LSDV approach. The restriction is that all intercepts must be equal.

We can use the Chow test (an F-test of this restriction) for this. If the null hypothesis is not rejected, the intercepts are not significantly different from each other, and a pooled regression suffices.
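A minimal NumPy sketch of this restriction test. It assumes the standard restricted-vs-unrestricted F statistic, F = ((RRSS - URSS)/(N - 1)) / (URSS/(NT - N - k)), where the restricted model is pooled OLS and the unrestricted one is LSDV; the helper names and simulated data are hypothetical. The statistic would be compared with an F(N - 1, NT - N - k) critical value.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of an OLS fit."""
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(resid @ resid)

def equal_intercepts_f(y, x):
    """F statistic for H0: all entity intercepts are equal (pooled OLS vs LSDV).

    y: (N, T), x: (N, T, k). Under H0 it follows F(N - 1, NT - N - k).
    """
    N, T, k = x.shape
    X = x.reshape(N * T, k)
    yy = y.reshape(N * T)
    restricted = np.hstack([np.ones((N * T, 1)), X])                    # pooled OLS
    unrestricted = np.hstack([np.kron(np.eye(N), np.ones((T, 1))), X])  # LSDV
    rrss, urss = rss(restricted, yy), rss(unrestricted, yy)
    df1, df2 = N - 1, N * T - N - k
    return ((rrss - urss) / df1) / (urss / df2), df1, df2

rng = np.random.default_rng(5)
N, T = 20, 10
x = rng.normal(size=(N, T, 1))
y_no_fe = 2.0 + 0.7 * x[..., 0] + rng.normal(size=(N, T))    # one common intercept
y_fe = y_no_fe + rng.normal(scale=3.0, size=N)[:, None]       # add entity intercepts
f_no_fe, df1, df2 = equal_intercepts_f(y_no_fe, x)
f_fe, _, _ = equal_intercepts_f(y_fe, x)
print(df1, df2, f_no_fe, f_fe)   # compare each F with an F(df1, df2) critical value
```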

21
Q

how many parameters must be estimated with the LSDV method?

A

N + k

number of entities (one dummy each) + number of regressors

22
Q

what can we do to avoid estimating so many dummy variable parameters?

A

Use the within transformation

23
Q

elaborate on the within transformation

A

For each entity, subtract its time-mean from the values of the variable (e.g. y_{it} - ybar_i). We do this for all variables, dependent and explanatory.

The result is a new demeaned regression.

Why do we want to do this?

Doing this removes the need for intercept terms. mu_i is constant over time, so subtracting the entity's time-mean wipes it out. Every demeaned variable has mean zero, so the regression line goes through the origin and the intercept is 0.

So, now we have removed the dummy variables.
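A minimal within-transformation sketch in Python with NumPy, on simulated data (the function `within_estimator` and the numbers are assumptions). It also fits LSDV on the same data to check that the demeaned regression gives the same slopes without estimating any dummies.

```python
import numpy as np

def within_estimator(y, x):
    """Fixed effects slopes via the within (demeaning) transformation.

    y: (N, T), x: (N, T, k). Subtract each entity's time-mean, then OLS without intercept.
    """
    N, T, k = x.shape
    y_dm = y - y.mean(axis=1, keepdims=True)      # y_it - ybar_i
    x_dm = x - x.mean(axis=1, keepdims=True)      # x_it - xbar_i, for every regressor
    return np.linalg.lstsq(x_dm.reshape(N * T, k), y_dm.reshape(N * T), rcond=None)[0]

rng = np.random.default_rng(6)
N, T = 40, 6
mu = rng.normal(scale=2.0, size=N)
x = mu[:, None, None] + rng.normal(size=(N, T, 2))
y = mu[:, None] + x @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=(N, T))

beta_within = within_estimator(y, x)
# Same slopes from LSDV (N dummies + 2 regressors), for comparison
design = np.hstack([np.kron(np.eye(N), np.ones((T, 1))), x.reshape(N * T, 2)])
beta_lsdv = np.linalg.lstsq(design, y.reshape(N * T), rcond=None)[0][N:]
print(np.allclose(beta_within, beta_lsdv))   # True: identical slope estimates
```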

24
Q

why would we not use the within transformation instead of LSDV in some cases?

A

If interpretability of the model is important, we may not want to transform it. For instance, having the estimated intercepts lets us compare the baselines of different entities.

25
Q

elaborate on the between estimator

A

We average each entity's variables over time, so that the time dimension is removed completely. Then we run regular OLS on the N entity averages. The result is a model that attempts to explain variation between entities.

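A minimal between-estimator sketch in Python with NumPy, on simulated data in which the entity differences are not tied to any fixed effect (an assumption for illustration; `between_estimator` is hypothetical). Only the N time averages enter the regression.

```python
import numpy as np

def between_estimator(y, x):
    """Between estimator: average over time, then OLS across the N entity means.

    y: (N, T), x: (N, T, k). Returns [alpha, beta_1, ..., beta_k].
    """
    y_bar = y.mean(axis=1)                          # ybar_i, shape (N,)
    x_bar = x.mean(axis=1)                          # xbar_i, shape (N, k)
    design = np.column_stack([np.ones(len(y_bar)), x_bar])
    return np.linalg.lstsq(design, y_bar, rcond=None)[0]

rng = np.random.default_rng(7)
N, T = 200, 5
# Persistent entity-level differences in x plus period-to-period noise
x = rng.normal(size=(N, 1, 1)) + rng.normal(size=(N, T, 1))
y = 0.5 + 2.0 * x[..., 0] + rng.normal(size=(N, T))
coef = between_estimator(y, x)
print(coef)   # close to [0.5, 2.0], estimated from only N = 200 averages
```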
26
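Q

elaborate on the first difference operator

A

Take first differences within each entity, y_{it} - y_{i,t-1} (and likewise for every regressor), so the model explains changes instead of absolute values. Any variable that does not change over time, including the fixed effect mu_i, cancels out.
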
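A minimal first-difference sketch in Python with NumPy, assuming no time effects (so the differenced regression has no intercept); the function name and simulated data are hypothetical. The entity effect mu_i is correlated with x, yet it drops out after differencing.

```python
import numpy as np

def first_difference_estimator(y, x):
    """First-difference slopes: regress dy_it on dx_it (no intercept).

    y: (N, T), x: (N, T, k). Differencing within each entity removes mu_i.
    """
    N, T, k = x.shape
    dy = np.diff(y, axis=1)                # y_it - y_i,t-1, shape (N, T-1)
    dx = np.diff(x, axis=1)                # x_it - x_i,t-1, shape (N, T-1, k)
    return np.linalg.lstsq(dx.reshape(N * (T - 1), k), dy.reshape(N * (T - 1)), rcond=None)[0]

rng = np.random.default_rng(8)
N, T = 100, 5
mu = rng.normal(scale=3.0, size=N)                  # time-invariant entity effect
x = mu[:, None, None] + rng.normal(size=(N, T, 1))  # correlated with mu
y = mu[:, None] + 0.8 * x[..., 0] + rng.normal(scale=0.5, size=(N, T))
beta_fd = first_difference_estimator(y, x)
print(beta_fd)   # close to [0.8]; mu_i has cancelled out
```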
27
Q

what do we actually mean by "fixed effects"?

A

Entity-specific intercepts that account for unobserved time-invariant characteristics. These are characteristics that don't change over time for each entity, and they could bias the estimates if left unaccounted for.

28
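Q

does the between estimator remove fixed individual-specific effects?

A

No. Averaging over time leaves mu_i untouched: ybar_i = alpha + beta xbar_i + mu_i + vbar_i, so the fixed effect ends up in the error term of the between regression. There is also no room for one intercept per entity, because the between regression has only one observation per entity.
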
29
Q

can the within estimator interpret fixed effects?

A

No, not directly. The demeaning removes mu_i, and the model now explains how deviations from an entity's own mean in the regressors affect deviations in the dependent variable. It is more relative than absolute.

30