Seminar 1 and 2 Flashcards
(32 cards)
What are the types of data sets?
- Cross section data: contains observations for multiple subjects at one point in time
- Time-series data: contains observations for one subject at different times
- Longitudinal data (panel data): a combination of the previous two, containing observations for multiple subjects at different times.
What are the types of variables?
There are 2 types of variables:
- Quantitative
  - discrete: takes countable values (1, 2, 3, etc.)
  - continuous: can take any real value within an interval (0-14, 14-31, etc.)
- Qualitative
  - nominal: no natural order (nationality, gender, etc.)
  - ordinal: the categories have a specific order (job rank, for example)
When working with files in EViews:
How should you structure the data based on the data type?
If you work with cross sections: unstructured/undated
If you work with time series: dated
If you work with panel data: undated
What is the coefficient of variation used for?
The coefficient of variation is used to determine whether a distribution is homogeneous or heterogeneous.
CV = standard deviation / mean (usually expressed as a percentage)
If CV < 30%: homogeneous (the mean is representative)
If CV > 30%: heterogeneous (the mean is not representative)
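(A minimal Python sketch of this rule, outside EViews; the wage values are made up for illustration:)

import numpy as np

wages = np.array([1200, 1500, 1800, 2100, 9000])   # hypothetical sample
cv = wages.std(ddof=1) / wages.mean() * 100        # coefficient of variation in %
print(cv)  # well above 30% here, so the distribution is heterogeneous and the mean is not representative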
What is the median?
The median splits the distribution into two equal parts (it is the 50th percentile)
What does skewness represent?
Skewness measures the symmetry of a distribution:
There are 3 types of skewness:
- Positive skew (>0)
* tail goes to the right
* mode is to the left
* mean is to the right
- Symmetrical (=0)
- Negative skew (<0)
* tail goes to the left
* mode is to the right
* mean is to the left
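(Skewness can also be checked outside EViews; a minimal scipy sketch with a simulated right-skewed sample:)

import numpy as np
from scscipy.stats import skew if False else __import__("scipy.stats", fromlist=["skew"]).skew  # placeholder removed below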
What is the mode?
The mode shows the most frequent value in the distribution
What does kurtosis represent?
Kurtosis measures the peakedness/flatness of the distribution.
There are 3 types of kurtosis:
- Platykurtic (low peakedness) (<3)
- Normal (mesokurtic) (=3)
- Leptokurtic (high peakedness) (>3)
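(A minimal scipy sketch; note that scipy reports excess kurtosis by default, so fisher=False is needed to match the "=3" convention used here:)

import numpy as np
from scipy.stats import kurtosis

x = np.random.normal(size=10000)
print(kurtosis(x, fisher=False))  # Pearson definition: close to 3 for a normal sample
# the default (fisher=True) would report excess kurtosis, i.e. kurtosis - 3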
What are the characteristics of the normal distribution?
It is the bell curve (Gauss-Laplace curve).
The bell curve has a skewness of 0 (symmetrical) and a kurtosis of 3
How do you create a logarithmic variable in EViews?
series (name of new variable) = log(old variable)
How does the logarithmic function affect a distribution?
- A logarithmic function smooths the distribution
- It makes the distribution look closer to the Gauss-Laplace (normal) curve
- It reduces the impact of outliers by compressing extreme values
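(A minimal Python sketch of the smoothing effect, using a simulated right-skewed series; outside EViews, np.log plays the role of the log() function above:)

import numpy as np
from scipy.stats import skew

x = np.random.lognormal(mean=0, sigma=1, size=1000)  # strongly right-skewed series
print(skew(x), skew(np.log(x)))  # skewness drops towards 0 after the log transform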
What is the range of a distribution?
The range is defined as:
maximum - minimum
What are the estimation strategies when running a regression model?
- The specific to general approach
- The general to specific approach
- Keep it as general as possible
How do we estimate a regression model when using the specific to general approach?
By using the omitted variables test, we include variables that are statistically significant.
NULL: variable is not significant
ALTERNATIVE: variable is significant
How do we estimate a regression model when using the general to specific approach?
By using the redundant variables test, we exclude/drop redundant variables from the model.
NULL: variable is redundant
ALTERNATIVE: variable is not redundant (is significant)
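(Both tests boil down to comparing a restricted model with a more general one. A minimal statsmodels sketch of that comparison, with made-up data and variable names; it is an analogue of the EViews tests, not the same menu:)

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 200))
y = 1 + 2 * x1 + 0.5 * x2 + rng.normal(size=200)

restricted = sm.OLS(y, sm.add_constant(x1)).fit()                      # model without x2
general = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()  # model with x2
# NULL: the extra variable is redundant (its coefficient is 0)
f_stat, p_value, df_diff = general.compare_f_test(restricted)
print(p_value)  # p < 5% -> reject the NULL -> keep x2 in the model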
What are the selection criteria to discriminate between models?
- Maximization criteria
  - maximize R^2 (increases with more variables)
  - maximize adjusted R^2 (adds a penalty to account for the R^2 problem)
  - maximize the F-statistic (significance of the model)
- Minimization criteria
  - minimize AIC (decreases with more variables)
  - minimize SIC (adds a penalty to account for the AIC problem)
  - minimize HQIC
If R^2 is too high, the model has some critical issues. A good model has an adjusted R^2 between 0.3 and 0.6/0.65.
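(For reference, EViews reports these criteria in the estimation output; a minimal statsmodels sketch of the analogous quantities, with simulated data:)

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=100)

res = sm.OLS(y, X).fit()
print(res.rsquared, res.rsquared_adj)  # maximization criteria
print(res.aic, res.bic)                # minimization criteria (bic corresponds to SIC/Schwarz)
print(res.fvalue, res.f_pvalue)        # overall F-statistic and its p-value
# HQIC is not reported directly for OLS here; it can be computed from the log-likelihood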
When do we accept/reject the NULL hypothesis?
- If the p-value > 5%, we ACCEPT (fail to reject) the NULL
- If the p-value < 5%, we REJECT the NULL
What is the F-statistic?
The F-statistic shows the overall significance of the model.
NULL: all betas are 0
ALTERNATIVE: at least one beta is different from 0 (there is significance)
What are residuals?
Residuals are the difference between the actual data (the values we have in our database) and the values predicted by the model (through OLS).
Residuals=actual-predicted
What are the kinds of residuals?
- Positive residuals: in this case the OLS regression underpredicts the dependent variable
- Negative residuals: in this case the OLS regression overpredicts the dependent variable
- Residuals = 0: in this case we have a perfect prediction (unlikely)
What is OLS?
OLS (ordinary least squares) is an estimation method by which we estimate a linear trend by minimising the sum of squared distances between actual and predicted values.
The result is an OLS regression.
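(A minimal numpy sketch of the idea: fit a least-squares line and compute the residuals; the numbers are made up:)

import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])     # hypothetical actual data

slope, intercept = np.polyfit(x, y, deg=1)  # least squares: minimises the sum of squared residuals
predicted = intercept + slope * x
residuals = y - predicted                   # residuals = actual - predicted
print(residuals)                            # positive -> underprediction, negative -> overprediction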
What are the assumptions of the OLS model?
- The linearity of the model
- Observations must be independent from each other (random data from population)
- Residuals must be independent
- Perfect or near multicollinearity should not exist
- Homoskedasticity needs to be present in the model
- Error terms should be approximately normally distributed
When all 6 assumptions are met, the OLS estimators are considered BLUE (Best Linear Unbiased Estimators).
How can we verify the 1st assumption:
The model should be linear
- We can check the appearance of the scatter plot
- We can run the Ramsey RESET test
NULL: model is linear
ALTERNATIVE: model is not linear (we need to change the functional form of the model)
This can be done by taking the log of a variable, raising it to the power of 2, etc.
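(Outside EViews, a RESET-style check is available in statsmodels; a minimal sketch with simulated linear data, though the exact options may differ by version:)

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=200)           # data generated by a linear model

res = sm.OLS(y, sm.add_constant(x)).fit()
# NULL: the model is linear (powers of the fitted values add no explanatory power)
print(linear_reset(res, power=2, use_f=True))  # large p-value -> do not reject linearity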
How can we verify the 2nd assumption:
The observations must be independent from each other
- We compare the individual sample with the common sample