Seminar 1 and 2 Flashcards by Alexandru Valentin Gabor

What are the types of data sets?

Cross section data: contains observations for multiple subjects at one point in time
Time-series data: contains observations for one subject at different times
Longitudinal data (panel data): is a combination of the previous two, containing observations for multiple subjects at different times.

How well did you know this?

Not at all

Perfectly

What are the types of variables?

There are 2 types of variables:

Quantitative
-discrete (finite numbers): 1,2,3 etc
-continous: can take any real value within an interval: 0-14; 14-31 etc
Qualitative
-nominal: nationality, gender etc
-ordinal: variables have a speciffic order (job rank for example)

How well did you know this?

Not at all

Perfectly

When working with files in Eviews:

How should you structure the date based on the data type?

If you work with cross sections: unstructured/undated

If you work with time series: dated

If you work with panel data: undated

How well did you know this?

Not at all

Perfectly

What is the coefficient of variation used for?

The coefficient of variation is used to determine if a distribution is heterogenous or homogenous

coeff = st dev/mean

If<30: homogenous (not represent)

If >30: heterogenous (representative)

How well did you know this?

Not at all

Perfectly

What is the median?

The median splits the distribution into two equal parts (it is the 50th percentile)

How well did you know this?

Not at all

Perfectly

What does skewness represent?

Skewness measures the symmetry of a distribution:

There are 3 types of skewness:

Positive skew (>0)
* tail goes to the right
* mode is to the left
* mean is to the right
Symmetrical (0)
Negative skew (<0)
* tail goes to the left
* mode is to the right
* mean is to the left

How well did you know this?

Not at all

Perfectly

What is the mode?

The mode shows the most frequent value in the distribution

How well did you know this?

Not at all

Perfectly

What does kurtosis represent?

Kurtosis measures the peakness/flatness of the distribution.

There are 3 types of kurtosis:

Platykurtic (low peakness) (<3)
Normal (mezokurtic) (=3)
Leptokurtic (high peakness) (>3)

How well did you know this?

Not at all

Perfectly

What are the characteristics of the normal distribution?

The bell curve

The bell curve has a skewness of 0 (symmetrical) and a kurtosis of 3

How well did you know this?

Not at all

Perfectly

How do you create a logaritmic variable in Eviews?

series (name of new variable)=log(old variable)

How well did you know this?

Not at all

Perfectly

How does the logaritmic function affect a distribution?

A logaritmic function smoothens the distribution
It makes the distribution look closer to the Gauss Laplace curve
It removes outliers

How well did you know this?

Not at all

Perfectly

What is the range of a graphical representation?

The range is defined as:

maximum - minimum

How well did you know this?

Not at all

Perfectly

What are the estimation strategies when running a regression model?

The speciffic to general approach
The general to speciffic approach
Keep it as general as possible

How well did you know this?

Not at all

Perfectly

How do we estimate a regression model when using the speciffic to general approach?

By using the ommited variable test we include variables that are statistically significant.

NULL: variable is not significant
ALTERNATIVE: variable is significant

How well did you know this?

Not at all

Perfectly

How do we estimate a regression model when using the general to speciffic approach?

By using the redundant variable test we exclude/drop redundant varaibles from the model.

NULL: variable is redundant
ALTERNATIVE: variable is not redundant (is significant)

How well did you know this?

Not at all

Perfectly

What are the selection critera to discriminate between models?

Study These Flashcards

Maximization criteria
- maximize R^2 (increases with more variables)
- maximize R^2 adjusted (adds a penalty to account for R^2 problem)
- maximixe F statistic (significance of model)
Minimization criteria
- minimize AIC (decreases with more variables)
- minimize SIC (adds a penalty to account for AIC problem)
- minimize HQIC

If R^2 is too high, the model has some critical issues. A good model has an adjusted R^2 between 0.3 - 0.6/0.65

When do we accept/reject the NULL hypothesis?

Study These Flashcards

If P>5%, we ACCEPT the NULL
If P<5%, we REJECT the NULL

What is F-statistic?

Study These Flashcards

F-statistic shows the overall significance

NULL: all betas are 0
ALTERNATIVE: at least one beta is different from 0 (there is significance)

What are residuals?

Study These Flashcards

Residuals are the difference between actual data (data that we have in our database), and the predicted values of a model (through OLS)

Residuals=actual-predicted

What are the kinds of residuals?

Study These Flashcards

Positive residuals: in this case the OLS regression underpredicts the dependent variable
Negative residuals: in this case the OLS regression overpredicts the dependent variable
Residuals = 0: in this case we have a perfect prediction (unlikely)

What is OLS?

Study These Flashcards

OLS is an estimation method, by which we try to estimate a linear trend, by minimising the distance between actual and predicted valuues.

Resulting in an OLS regression

What are the assumptions of the OLS model?

Study These Flashcards

The linearity of the model
Observations must be independent from each other (random data from population)
Residuals must be independent
Perfect or near multicollinearity should not exist
Homoskedasticity needs to be present in the model
Error terms should be approximately normally distributed

When all 6 assumptions are met the OLS estimators are considered BLUE (Best - Linear - Unbiased - Estimator)

How can we verify the 1st assumption:

The model should be linear

Study These Flashcards

We can check the appearance of the scatter plot
We can run the Ramsey-Reset test

NULL: model is linear
ALTERNATIVE: model is not linear (we need to change the functional form of the model)

This can be done by making a log function, or raising to the power of 2 etc

How can we verify the 2nd assumption:

The observations must be independent from each other

Study These Flashcards

We compare the individual sample with the common sample

How can we verify the 3rd assumption: Residuals should be independent

* Make a scatterplot of residual VS fitted values If the regression line is horizontal, there is no relationship

How can we verify the 4th assumption? Perfect or near multicollinearity should not exist

Before estimating the model: use correlation matrix After estimating the model: use variance inflation factors test ## Footnote If we have coefficients higher than 5, we have multicollinearity

How can we verify the 5th assumption? Homoskedasticity needs to be present in the model

* Make a scatter plot of residuals against fitted values (if pattern looks like a cone, variance is not constant) * Run the BP/White test (Heteroskedasticity test) ## Footnote NULL: residuals are homoskedastic ALTERNATIVE: residuals are heteroskedastic

What is homoskedasticity?

Homoskedasticity reffers to the situation in which we have a constant variance of the residuals (homoskedasticity may be accepted in the case of large datasets, with more than 30 observations)

What is heteroskedasticity?

Heteroskedasticity reffers to the situation in which the variance of residuals is not constant. This affects the standard errors and p-values of the model.

What is the correction for heteroskedasticity?

If residuals are heteroskedastic, we can apply the HAC correction.

How can we verify the 5th assumption? Erorr terms should be approximately normally distributed.

* Make a histogram of the residuals * Run the Jaque-Bera test ## Footnote NULL: residuals follow the normal distribution ALTERNATIVE: residuals do not follow the normal distribution

What are the particularities of a BLUE estimator?

* When we have a blue estimator we can say that the beta and alpha of our estimation is close to the beta and alpha of the population level. * If the first 4 assumptions are met, the model is unbiased * If all 6 assumptions are met, we can rely on F-statistics

Seminar 1 and 2 Flashcards

(32 cards)