Midterm Flashcards

1
Q

Right vs left skewed distribution

A

Right-skewed: mode < median < mean (the tail pulls the mean right)
Left-skewed: mean < median < mode (the tail pulls the mean left)

2
Q

What is the variance

A

The arithmetic average of the squared differences of the data values from the mean

3
Q

How to calculate standard error

A

Standard deviation divided by the square root of the sample size
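A minimal Python sketch of the variance, standard deviation, and standard error definitions from the cards above; the sample values are made up for illustration:

```python
import math

# Hypothetical sample values, purely for illustration
sample = [4.0, 7.0, 6.0, 5.0, 8.0, 6.0]
n = len(sample)
mean = sum(sample) / n

# Variance: arithmetic average of squared differences from the mean
# (population form, dividing by n; software often divides by n - 1 for samples)
variance = sum((x - mean) ** 2 for x in sample) / n
sd = math.sqrt(variance)

# Standard error: standard deviation divided by the square root of n
se = sd / math.sqrt(n)
```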

4
Q

What is standard deviation

A

Describes the spread of values in a continuous distribution - a sample or population
It is used as a descriptive statistic

5
Q

What is a standard error

A

Is used to measure the accuracy of a sample distribution in representing a population

6
Q

How do you calculate confidence bounds

A

Mean ± (t value × standard error)
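A short Python sketch of the formula, using made-up summary numbers; the t value 2.262 (95% confidence, 9 degrees of freedom) is read from a t table:

```python
import math

# Hypothetical sample summary
mean = 50.0
sd = 4.0
n = 10

se = sd / math.sqrt(n)

# t critical value for 95% confidence with n - 1 = 9 degrees of freedom
t_crit = 2.262

lower = mean - t_crit * se  # lower confidence bound
upper = mean + t_crit * se  # upper confidence bound
```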

7
Q

How to calculate standard error of the difference

A

Square root of (SE1 squared plus SE2 squared)
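The same formula as a one-line Python sketch, with hypothetical standard errors for the two group means:

```python
import math

# Hypothetical standard errors of two group means
se1, se2 = 1.5, 2.0

# Standard error of the difference between the two means
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
```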

8
Q

How to calculate a chi square

A

Sum of ((observed frequency minus expected frequency) squared) divided by expected frequency: χ² = Σ (O − E)² / E
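The chi square formula sketched in Python, with hypothetical observed and expected counts for three categories:

```python
# Hypothetical observed and expected frequencies for three categories
observed = [25, 15, 20]
expected = [20, 20, 20]

# Chi square: sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```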

9
Q

What does the chi square tell us about the null hypothesis

A

If the chi square statistic is larger than the critical value for the given degrees of freedom, the null hypothesis can be rejected

10
Q

Explain type I vs type II error

A

Type I error: rejecting the null hypothesis when you shouldn't (false positive)
Type II error: failing to reject the null hypothesis when you should (false negative)

11
Q

What is the central limit theorem

A

Establishes that means of repeated large samples are normally distributed even when underlying distribution of the data is not normal
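A small Python simulation illustrating the idea, using a made-up right-skewed (exponential) distribution:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Draw many samples from a strongly right-skewed distribution (exponential,
# mean 1) and record each sample's mean
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(50))
    for _ in range(2000)
]

# The sample means cluster around the true mean and are roughly bell-shaped,
# even though the underlying distribution is not normal
grand_mean = statistics.mean(sample_means)
```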

12
Q

What is a confidence interval?

A

A best guess of the population mean of a data set, paired with the confidence that it lies somewhere in the stated interval (95% is generally used)

13
Q

Pearson’s correlation coefficient

A

Measures the association between two continuous variables
r is scaled between -1 and 1
r = covariance of x and y divided by the product of their standard deviations
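A Python sketch of the coefficient from its definition, with hypothetical paired data chosen so y is an exact linear function of x (so r should come out to 1):

```python
import math

# Hypothetical paired observations (y = 2x exactly)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# r = covariance of x and y over the product of their standard deviations
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
r = cov / (sx * sy)
```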

14
Q

Correlation vs regressions

A

Correlation tells us how strongly associated two variables are
Regression can tell us on average how much a one unit increase in the independent variable changes the predicted value of the dependent variable

15
Q

What does the line of best fit do

A

Minimizes the sum of squared vertical (y) distances from each observation to the line
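A minimal Python sketch of the least squares slope and intercept that minimize the summed squared vertical distances (the data points are made up):

```python
# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 7.9]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# OLS slope and intercept: the unique line minimizing the sum of squared
# vertical (y) distances from each point to the line
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
a = my - b * mx
```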

16
Q

Why do we use y hat and how does it differ

A

Y hat means we are producing estimated y values: ŷ = a + bXi
For actual values of y we need the error term, so y = a + bXi + ei

17
Q

Standard error of the slope

A

Given by the root mean square error of the regression divided by the square root of Σ(xi − x̄)², i.e. the residual spread relative to the spread of x

18
Q

T ratio

A

t= (b - ßH0)/s.e.
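The t ratio as a Python sketch, with hypothetical regression output:

```python
# Hypothetical regression output
b = 0.8        # estimated slope
beta_h0 = 0.0  # value of ß under the null hypothesis (usually zero)
se_b = 0.25    # standard error of the slope

t = (b - beta_h0) / se_b
```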

19
Q

What are the five OLS assumptions

A

Linearity
Mean independence
Homoscedasticity
Uncorrelated disturbances
Normal disturbance

20
Q

Explain linearity

A

Linearity - the dependent variable is a linear function of the x's plus a population error term, e.g. y = a + ß1x1 + ß2x2 + e

Pertains to linearity in the parameters

21
Q

Explain mean independence

A

Zero conditional mean

The mean value of error does not depend on any of the x’s
Assume that E(e|x) = 0

Most important assumption because violations 1. can generate large bias in the estimates and 2. cannot be tested for without additional data

Common sources of violation:
Omitted variable bias
Endogeneity bias
Measurement error

22
Q

Explain homoscedasticity

A

The variance of the error cannot depend on the x's
The standard deviation squared (variance) is constant
You want homoskedasticity
In a test for heteroskedasticity, the p value has to be >0.05

Non-constant variance (heteroskedasticity)
Biases the standard errors

23
Q

Explain uncorrelated disturbances

A

The value of the error for any observation is uncorrelated with the value of the error for any other observation

Correlated errors can arise from connected observations, causal effects, or serial correlation

Correlated errors shrink standard errors: observations are assumed to be more independent than they are, creating a danger of Type I errors

24
Q

Explain normal disturbance

A

The disturbances, e, are distributed normally

Only the disturbances not the variables must be normally distributed
Normality is the least important assumption

25
How are the OLS assumptions related
Assumptions 1+2 give unbiased estimators
Assumptions 3+4 make OLS BLUE (best linear unbiased estimator): standard errors are at least as small as those produced by any other linear method
Assumption 5 implies that a t table or z table can be used to calculate p values
26
What does the dummy variable do
Helps with comparison of the means of y for different categories of x
27
Collider bias
Occurs when a treatment (independent) variable and outcome (dependent) variable, or factors causing these, each influence a common third variable, and that variable (the collider) is controlled for by design or analysis
A more general form of selection bias
28
Post treatment bias
While omitting relevant covariates can lead to omitted variable bias, including covariates that control for your causal mechanism can result in post treatment bias
29
What is multicollinearity and what are the consequences
A situation where two or more explanatory variables in a multiple regression are highly linearly related
Does not bias the coefficient estimates
Inflates standard errors of highly collinear variables
Induces unstable estimates
30
which OLS assumptions do time series tend to violate
Mean independence and the independence of errors
31
What is stationarity
A time series is weakly stationary if its mean and variance remain constant over time
32
What is a dummy variable
A variable that is coded 1 or 0
33
Regression outlier
An observation where the dependent value y is unusually extreme given its independent value x
34
In which direction is the ß1 biased
ß2 > 0 and corr(x1, x2) > 0 → positive bias
ß2 > 0 and corr(x1, x2) < 0 → negative bias
ß2 < 0 and corr(x1, x2) > 0 → negative bias
ß2 < 0 and corr(x1, x2) < 0 → positive bias
35
Interpret the different log linear relationships
Level-level (y = a + ßx): a one unit change in x leads to a ß unit change in y
Log-linear (log(y) = a + ßx): a one unit change in x leads to a 100*ß percent change in y
Linear-log (y = a + ßlog(x)): a one percent change in x leads to a ß/100 unit change in y
Log-log (log(y) = a + ßlog(x)): a one percent change in x leads to a ß percent change in y
36
How do we interpret squared terms in non linear regression
If b2 is negative and b3 (the coefficient on the squared term) is positive then y is convex (smiley)
If b2 is positive and b3 (the coefficient on the squared term) is negative then y is concave (frowny)
37
What does a time counter do
It draws out the trends in time series data
38
What is a unit root
How much of y is explained by the previous y
The y is almost exactly the same as its previous value
Also known as a random walk
39
Random walk
Same value today as yesterday with just a bit of randomness
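A tiny Python simulation of the idea, with a made-up Gaussian shock:

```python
import random

random.seed(1)  # fixed seed for reproducibility

# A random walk: each value equals the previous value plus a random shock
y = [0.0]
for _ in range(100):
    y.append(y[-1] + random.gauss(0, 1))
```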
40
Weakly dependent time series
A covariance stationary time series is weakly dependent if the correlation between xt and xt+h goes to zero sufficiently quickly as h increases
41
What are the two types of panel data
True panels: longitudinal data measuring the same units repeatedly over time
Pooled cross sections: random surveys in multiple years with a new random sample each time
42
pooled cross sections
Advantages: amenable to OLS with only minor complications; increased sample size increases accuracy of estimators and adds statistical power
Pitfalls: distributions may change in different years; panel heteroskedasticity
43
Fixed effects/within model
Subtract off the mean value of each group from each observation in the group
Equivalent to adding a dummy variable for each group
Super power: yields within estimation, in which only the variation within groups is used for the coefficients
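The within transformation sketched in Python, with a hypothetical two-group panel:

```python
from collections import defaultdict

# Hypothetical panel observations as (group, y) pairs
data = [("A", 3.0), ("A", 5.0), ("B", 10.0), ("B", 14.0)]

# Within transformation: subtract each group's mean from its observations
groups = defaultdict(list)
for g, y in data:
    groups[g].append(y)
group_means = {g: sum(v) / len(v) for g, v in groups.items()}
demeaned = [(g, y - group_means[g]) for g, y in data]
```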
44
What is persistence
Persistence in a time series refers to the continuity of an effect after the cause is removed
Often related to the notion of memory properties of time series
Has an effect on standard errors and can lead to false positives and negatives
If the effect of an infinitesimally small shock will influence future predictions of the time series for a very long time, you have a persistent time series process
45
How do you deal with persistence
Use lagged data and make sure to model the trends in your data
46
How do you interpret a marginal effects plot
The y axis is the marginal effect of x on y (dy/dx), and the x axis is the value of the conditioning variable
47
How do you calculate vif
VIF = 1/(1 − R²) = 1/tolerance, where tolerance = 1 − R² and R² comes from regressing that explanatory variable on the others
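A two-line Python sketch of the calculation, with a hypothetical R-squared value:

```python
# Hypothetical R-squared from regressing one explanatory variable on the
# other explanatory variables
r_squared = 0.75

tolerance = 1 - r_squared   # tolerance = 1 - R^2
vif = 1 / tolerance         # VIF = 1 / tolerance
```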