Chapter 11 - panel data Flashcards by Martin Sund

what is cross sectional data

Data that spans different entities or categories at the same point in time, or data where time is not a variable at all.

what is panel data

combination of cross sectional and time series data

name some examples of cross sectional variables

market cap

PE ratio

age of the firm

In general, characteristics compared across firms that do not change much dynamically.

how do we refer to panel data

“a panel of data contains both time series data and cross sectional data”

what is a pooled regression

A pooled regression puts all observations into the same dataset and fits a single model to it. As a consequence, we assume that all relationships are the same across entities and over time, because an estimator like OLS treats the stacked data this way.

For instance, if the explanatory variables are PE ratio and SIZE, one assumes that every firm has the same relationship between “return” and size and PE.

elaborate on what actually constitutes panel data

It is about the structure. Each variable has both a time dimension and a cross-sectional dimension. In other words, the data has two dimensions, where one is time and the other is something else (the entity).

what is cross sectional data

Cross sectional data is data across many entities, like firms or individuals, AT a single point in time.

A good example is a snapshot of a population’s obesity. We’d draw a sample at some point in time, and build models trying to explain what percentage is obese etc. We could also try to predict obesity based on the cross sectional information.

However, this does not give us any insight into the trend. We do not know whether obesity is increasing or decreasing. This is because we drew all the data from a single point in time, as it is cross sectional data.

Therefore, cross sectional data contrasts with time series data.

A time series actually refers to a single entity and how it evolves through time. Therefore, a time series would give us answers to questions like “how does the obesity of this person/entity evolve”.

Panel data allows us to observe how entities evolve over time and to identify patterns that generalize across the population, controlling for individual-specific traits. This makes it possible to study both within-entity dynamics and between-entity relationships.

give the simplest setup of a panel data model

y_{it} = alpha + beta x_{it} + u_{it}

So, we have multiple entities in the panel, indexed by “i”, and we have records for these at different points in time, indexed by “t”.

alpha is a constant, and is the same for all entities and time periods.

beta is a vector of slope coefficients, one value for each explanatory variable.

Such a model assumes that a relationship that holds for a single entity also holds for the entire population.
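
To make the pooled setup concrete, here is a minimal sketch with a single explanatory variable. The toy panel and the function name pooled_ols are made up for illustration; the point is only that the entity and time labels are ignored and all (i, t) observations are stacked into one regression.

```python
# Hypothetical toy panel: (entity, period, x, y) for 2 entities over 3 periods.
# The numbers are invented so that y = 1 + 2 * x holds exactly.
panel = [
    ("A", 1, 1.0, 3.0), ("A", 2, 2.0, 5.0), ("A", 3, 3.0, 7.0),
    ("B", 1, 4.0, 9.0), ("B", 2, 5.0, 11.0), ("B", 3, 6.0, 13.0),
]


def pooled_ols(rows):
    """Fit y_it = alpha + beta * x_it + u_it on the stacked observations."""
    xs = [row[2] for row in rows]
    ys = [row[3] for row in rows]
    n_obs = len(rows)
    x_bar = sum(xs) / n_obs
    y_bar = sum(ys) / n_obs
    # Single-regressor OLS: slope = cov(x, y) / var(x)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


alpha_hat, beta_hat = pooled_ols(panel)
```

Note that the entity and period columns are never used in the fit, which is exactly the assumption of one common alpha and beta.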

main limitation of the pooled regression OLS approach

It assumes that the relationships remain the same through time and across entities.

why not use all as independent time series

Then we don't generalize anything. A goal is to provide insights on the population as a whole.

what is meant by the data in this chapter

“A panel of data” refers to data that combines time series and cross sectional dimensions.

what two primary methods are we working with

Fixed effects method

Random effects method

what is a balanced panel?

A balanced panel has the same number of time series observations for each cross sectional unit.

what is an unbalanced panel?

A panel with a different number of time series observations for the various cross sectional entities.
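
A quick way to check which case a dataset falls into is to count observations per entity. The sketch below uses the "same number of observations per unit" definition from these cards; the data and the name is_balanced are hypothetical.

```python
from collections import Counter


def is_balanced(rows):
    """Return True if every entity has the same number of time observations.

    rows is an iterable of (entity, period) pairs.
    """
    counts = Counter(entity for entity, _period in rows)
    return len(set(counts.values())) <= 1


# Invented examples: B is missing period 2 in the second panel.
balanced_panel = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
unbalanced_panel = [("A", 1), ("A", 2), ("B", 1)]

balanced_result = is_balanced(balanced_panel)
unbalanced_result = is_balanced(unbalanced_panel)
```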

elaborate on SUR

SUR (seemingly unrelated regressions) models each entity separately, so that each entity gets its own fitted model. However, the coefficients are constant over time.

We assume that the errors are correlated across individuals.

We use GLS to transform the regressions based on the correlation between errors. The outcome is a regression where all the errors are uncorrelated.

So we still have one regression per entity, and the regressions are still time-invariant. The key now is that we have accounted for the correlation between entities.

is SUR good?

If the true covariance matrix of the errors is known, SUR gives the correct, efficient result. However, this is unlikely in practice, where the matrix has to be estimated.

limitations of SUR

1) The number of time series observations per cross sectional unit (T) must be at least as large as the number of cross sectional units (N).

2) The covariance matrix of the errors must be estimated. An entity with T observations has T errors. We have N entities. Therefore, we have NT error terms, and the full covariance matrix is NT x NT. This can be extremely large.

elaborate on the size of the covariance matrix using SUR

The book and the professor state NT x NT because we have NT error terms.

However, SUR assumes that errors are independent through time. Under that assumption, we only need the N x N matrix of covariances between entities.

If we do not assume this, we need the entire NT x NT covariance matrix for GLS to re-weight properly.

briefly introduce “fixed effects method”

It allows the intercept to change cross-sectionally, but not through time.

Slope estimates remain fixed through time and across cross sections.

So, each entity gets its own intercept term.

elaborate on the fixed effects method

We take the error term, u_{it}, and decompose it into two parts:

1) mu_i
2) v_{it}

mu_i is the new intercept that is entity specific.

v_{it} is the remaining error term. It keeps the same interpretation as before: it captures everything about y_{it} that is left unexplained.

The model can then be estimated using a dummy variable approach, where we have a dummy for each entity. This is called LSDV (least squares dummy variable).
Such a model keeps the explanatory variables as before, but the intercepts are now represented by the dummies.

elaborate on testing the fixed-effect method for whether it is actually necessary in regards to panel data

Since the slope parameters are fixed through time and entities, the intercepts are the only thing making it different than regular OLS.

As a result, we can treat the regular OLS version as a restricted variant of the fixed-effects LSDV approach. The restriction is that all intercepts must be equal.

We can use a Chow-type F-test for this. If the null hypothesis of equal intercepts is not rejected, it means that the intercepts are not significantly different from each other, and a pooled regression suffices.
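
As a rough sketch of that test, the standard F statistic for linear restrictions can be computed from the residual sums of squares of the two regressions. This assumes a balanced panel with N entities, T periods and k slope parameters, so there are N - 1 restrictions and NT - N - k residual degrees of freedom. The function name and the example numbers are made up.

```python
def equal_intercepts_f_stat(rrss, urss, n_entities, n_periods, k):
    """F statistic for H0: all entity intercepts are equal.

    rrss: residual sum of squares from the pooled (restricted) regression
    urss: residual sum of squares from the LSDV (unrestricted) regression
    Assumes a balanced panel, so there are n_entities * n_periods observations.
    """
    n_restrictions = n_entities - 1
    df_resid = n_entities * n_periods - n_entities - k
    return ((rrss - urss) / n_restrictions) / (urss / df_resid)


# Invented residual sums of squares, purely for illustration.
f_stat = equal_intercepts_f_stat(rrss=120.0, urss=100.0,
                                 n_entities=5, n_periods=10, k=1)
```

The statistic is then compared with a critical value from the F(N - 1, NT - N - k) distribution.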

how many parameters must be estimated with the LSDV method?

n+k

number of entities + number of regressors

what can we do to avoid estimating so many dummy variable parameters?

Use the within transformation

elaborate on the within transformation

subtract the time-mean of each entity away from the values of the variable. We’d do this for all variables.

The result is a new demeaned regression.

Why do we want to do this?

Doing this removes the need for intercept terms. Since mu_i is constant over time, it equals its own time-mean and drops out in the subtraction. Every demeaned variable also has mean zero within each entity, so all the regression lines go through the origin. As a result, the intercept is 0.

So, now we have removed the dummy variables.
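
A minimal sketch of the within transformation with one explanatory variable is shown here. The toy data is invented so that y = mu_i + 2 * x with very different intercepts for the two entities; the function names are hypothetical.

```python
from collections import defaultdict


def entity_demean(rows):
    """Subtract each entity's time-mean from its x and y values.

    rows is a list of (entity, x, y) tuples.
    """
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))
    demeaned = []
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        demeaned.extend((x - x_bar, y - y_bar) for x, y in obs)
    return demeaned


def within_slope(rows):
    """OLS slope on the demeaned data; no intercept is needed."""
    demeaned = entity_demean(rows)
    sxy = sum(x * y for x, y in demeaned)
    sxx = sum(x * x for x, _ in demeaned)
    return sxy / sxx


# Invented data: mu_A = 1 and mu_B = 10, common slope of 2.
rows = [
    ("A", 1.0, 3.0), ("A", 2.0, 5.0), ("A", 3.0, 7.0),
    ("B", 1.0, 12.0), ("B", 2.0, 14.0), ("B", 3.0, 16.0),
]
beta_within = within_slope(rows)
```

The slope is recovered without estimating a single dummy, even though the two entities sit at very different levels.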

why would we not use within over LSDV in some cases?

If interpretability of the model is important, we do not want to transform it. For instance, having the intercepts allows us to compare different baselines.

elaborate on the between estimator

We average each variable over time for each entity, so that we completely remove the time dimension. Then we perform regular OLS on the averaged values. The result is a model that attempts to explain variation between entities.
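
A small sketch of this idea, again with one explanatory variable: collapse each entity to its time-averages, then run an ordinary regression with an intercept on those averages. The data and the name between_estimator are invented for illustration.

```python
from collections import defaultdict


def between_estimator(rows):
    """Regress entity time-averages of y on entity time-averages of x.

    rows is a list of (entity, x, y) tuples. Returns (alpha, beta).
    """
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))
    # One (x_bar_i, y_bar_i) point per entity: the time dimension is gone.
    means = [
        (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
        for obs in groups.values()
    ]
    n_entities = len(means)
    x_bar = sum(x for x, _ in means) / n_entities
    y_bar = sum(y for _, y in means) / n_entities
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in means)
    sxx = sum((x - x_bar) ** 2 for x, _ in means)
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta


# Invented data whose entity averages are (1, 2), (2, 5) and (3, 8).
rows = [
    ("A", 0.0, 1.0), ("A", 2.0, 3.0),
    ("B", 1.0, 4.0), ("B", 3.0, 6.0),
    ("C", 2.0, 7.0), ("C", 4.0, 9.0),
]
alpha_between, beta_between = between_estimator(rows)
```

In this toy data the slope within each entity is 1, while the slope across entity averages is 3, which shows that the between estimator only looks at differences between entities.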

elaborate on the first difference operator

We difference each variable within each entity, so the model explains changes instead of absolute values. Any variable that does not change over time cancels out, and so does the entity-specific intercept.
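
A minimal sketch of first differencing with one explanatory variable follows. The toy series are invented so that y = mu_i + 3 * x, and the function name is hypothetical. After differencing, mu_i is gone, so the regression is run without an intercept.

```python
def first_difference_slope(series_by_entity):
    """Slope from regressing changes in y on changes in x (no intercept).

    series_by_entity maps an entity to its time-ordered list of (x, y) pairs.
    """
    dx, dy = [], []
    for obs in series_by_entity.values():
        # Pair each observation with the next one for the same entity.
        for (x_prev, y_prev), (x_next, y_next) in zip(obs, obs[1:]):
            dx.append(x_next - x_prev)
            dy.append(y_next - y_prev)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Invented data: mu_A = 5 and mu_B = -1, common slope of 3.
series = {
    "A": [(1.0, 8.0), (2.0, 11.0), (4.0, 17.0)],
    "B": [(2.0, 5.0), (5.0, 14.0), (6.0, 17.0)],
}
beta_fd = first_difference_slope(series)
```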

what do we actually mean by "fixed effects"?

Entity specific intercepts that account for unobserved time-invariant characteristics. These are characteristics that don't change over time for each entity, and they could bias the estimates if left unaccounted for.

does the between estimator remove fixed individual specific effects?

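No. The fixed effect is constant over time, so averaging over time leaves it unchanged. It therefore stays in the averaged equation for each entity instead of dropping out.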

can the within estimator interpret fixed effects?

No, because the model is now explaining how "deviations from a variable's mean affect the dependent variable". It is more relative than absolute, so the entity-specific levels are no longer visible.

there are two types of "fixed-effects models", elaborate

1) Allow the intercept to change cross-sectionally, but not over time.
2) Allow the intercept to change over time, but not cross-sectionally.

When we allow the intercept to change cross-sectionally, we are essentially saying that an entity has a certain fixed level of "something" that is important for predicting the dependent variable.

If we instead only allow the intercept to change over time, we are assuming that certain time specific dynamics cause the level to shift for all entities at once.

how do we setup a model for time fixed-effects?

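Use the LSDV approach with one dummy per time period. Since the number of dummies is then T, we can instead use a within transformation, this time subtracting the cross-sectional mean at each point in time.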

elaborate on combining fixed effects time and cross-sectionally

it is possible, but we get n+t+k parameters, and the within transformation is more difficult.

alternative to the fixed-effects model

random effects

other word for random-effects model

error components model

what is the intercept and slope structure of the random-effects model

Similar to the regular fixed-effects model. There is an intercept per entity, and this intercept is constant over time. The slope structure is also the same, meaning that the relationship between the explanatory variables and the dependent variable is constant across entities and time. One therefore assumes that the impact the variables have on the dependent variable is the same for all entities and doesn't change over time.

then what is the difference between fixed effects and random effects

The random effects model assumes that the intercept consists of 2 parts:

1) alpha, which is the same for all entities
2) a random variable epsilon_i, which is entity specific. It varies cross-sectionally but not through time.

Fixed effects instead treats each entity's intercept as a separate parameter to estimate.

elaborate on the random effect

It measures the entity's deviation from the global intercept. This model requires strict assumptions regarding the random effect variables:

1) The random effects must have mean 0
2) They must be independent of the corresponding error terms
3) They have constant variance
4) They are independent of the explanatory variables

elaborate on the estimation of coefficients using the random effects model

alpha and beta (global intercept and slopes) are estimated consistently but inefficiently by OLS. There are other issues as well. Therefore, GLS is used.

elaborate on how we use GLS on random effects

We could use outright GLS, but this is impractical because of the size we're working with. Finding the inverse of the covariance matrix etc. is difficult. Instead, we leverage the following fact: if we quasi-demean the data using the specific formula for the shrinkage factor, we obtain a regression that is equivalent to the GLS regression. This is simply a lot more efficient to compute.
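
A sketch of the quasi-demeaning step is below, using the textbook shrinkage factor theta = 1 - sigma_v / sqrt(T * sigma_epsilon^2 + sigma_v^2), where sigma_v^2 is the variance of v_{it} and sigma_epsilon^2 is the variance of the random effect epsilon_i. The variances are simply assumed known here; in practice they would have to be estimated first. Function names and numbers are made up.

```python
import math


def shrinkage_factor(sigma_v2, sigma_eps2, n_periods):
    """theta = 1 - sigma_v / sqrt(T * sigma_eps^2 + sigma_v^2)."""
    return 1.0 - math.sqrt(sigma_v2) / math.sqrt(n_periods * sigma_eps2 + sigma_v2)


def quasi_demean(values, theta):
    """Subtract theta times the entity's time-mean from each observation."""
    mean = sum(values) / len(values)
    return [value - theta * mean for value in values]


# Invented variances for illustration: sqrt(3 * 1 + 1) = 2, so theta = 0.5.
theta = shrinkage_factor(sigma_v2=1.0, sigma_eps2=1.0, n_periods=3)

# One entity's observations of a variable over T = 3 periods (made up).
transformed = quasi_demean([2.0, 4.0, 6.0], theta)
```

With theta = 0 nothing is subtracted and we are back at pooled OLS. With theta = 1 the full mean is subtracted, which is the within transformation. Random effects sits in between.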
