OLS Flashcards
What is the difference between causal effect and correlation?
A causal effect tells us that changes in one variable (say hot weather) lead to changes in another variable (say ice cream sales). Correlation is related but weaker: it shows two variables moving in a similar pattern (either positively or negatively), but it does not mean that one causes the other; there could be another factor influencing both (say a higher number of wasps seems to correlate with higher ice cream sales, but both are driven by hot weather).
What are Quasi-Experimental Methods?
Research designs that share characteristics with experimental designs but lack full randomization of participants into treatment and control groups. They often involve naturally occurring events that researchers leverage to study the effects of a treatment, and are used when true randomization is not feasible or ethical.
What is OLS?
Ordinary Least Squares: a method used to estimate the parameters of a linear model. The OLS method involves finding the values of the regression coefficients that minimize the sum of the squared residuals, where the residual is the difference between the observed and predicted values of the dependent variable:
$$\min_{\beta_0, \beta_1, \dots, \beta_k} \sum_{i=1}^{n} \left( Y_i - (\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik}) \right)^2$$
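A minimal sketch in Python (assuming numpy; the data, true coefficients, and variable names are invented for illustration) showing that the OLS coefficients are exactly the values that minimise the sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)   # true intercept 1, true slope 2

def ssr(a, b):
    residuals = y - (a + b * x)             # observed minus predicted values
    return np.sum(residuals ** 2)           # sum of squared residuals

# Closed-form OLS solution for the simple regression case.
b_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
a_hat = y.mean() - b_hat * x.mean()

print(ssr(a_hat, b_hat))          # the minimum achievable SSR
print(ssr(a_hat + 0.1, b_hat))    # perturbing either coefficient raises the SSR
print(ssr(a_hat, b_hat + 0.1))
```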
What is the line of best fit?
The line which best describes the relationship between $y_i$ and $x_i$. The line of 'best fit' is the one that gives the best approximation to all the data points. It does this by minimising the (squared) distance between the line and all of the data points.
In OLS, how do we define the predicted outcome?
$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$
What is the residual in OLS?
$\hat{u}_i = y_i - \hat{y}_i$
What is the best line in function notation?
$\hat{y} = \hat{\alpha} + \hat{\beta} x$
How do you visually represent OLS and line of best fit?
Draw a scatter plot of the data points with the fitted line running through the cloud of points; the vertical (squared) distances between the points and the line are what OLS minimises.
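A minimal plotting sketch (assuming numpy and matplotlib; the simulated data are invented for illustration), with one residual marked as a dashed vertical segment:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.8 * x + rng.normal(size=50)

b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # slope: cov(x, y) / var(x)
a = y.mean() - b * x.mean()                  # intercept: mean(y) - b * mean(x)

plt.scatter(x, y, label="data")
xs = np.sort(x)
plt.plot(xs, a + b * xs, color="red", label="line of best fit")
fitted0 = a + b * x[0]                       # mark one residual as a vertical distance
plt.vlines(x[0], min(fitted0, y[0]), max(fitted0, y[0]),
           linestyles="dashed", label="a residual")
plt.legend()
plt.show()
```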
Formally, what is the OLS estimate?
The values of $\alpha, \beta$ that minimise $SSR(\alpha, \beta)$: $(\hat{\alpha}, \hat{\beta}) = \arg\min_{\alpha, \beta} SSR(\alpha, \beta)$. Solving gives $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and $\hat{\beta} = \mathrm{cov}(x, y)/\mathrm{var}(x)$.
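A minimal check (assuming numpy; data simulated for illustration) that these closed-form formulas agree with numpy's own least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = 3.0 + 1.5 * x + rng.normal(size=1000)           # true alpha = 3.0, true beta = 1.5

beta_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # cov(x, y) / var(x)
alpha_hat = y.mean() - beta_hat * x.mean()          # mean(y) - beta_hat * mean(x)

slope, intercept = np.polyfit(x, y, 1)              # numpy's least-squares line
print(alpha_hat, beta_hat)                          # ~ (3.0, 1.5)
print(intercept, slope)                             # identical up to floating-point error
```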
What does $\hat{\alpha}$ capture?
The intercept: the predicted value of $y$ when $x = 0$.
What does $\hat{\beta}$ capture?
The slope: how $y$ changes when $x$ changes by one unit.
Why is the simple OLS model not favoured by researchers?
Usually, as social scientists, we want more than just the best linear approximation of one variable given another variable (or variables). We want to say something about the causal effect, and for that we need to specify a model.
What is the classic linear model?
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$, where $y, x_1, x_2, \dots, x_k$ and $u$ are random variables. $u$ is the residual or error term. $\alpha$ and the $\beta$s are referred to as parameters (or coefficients when we estimate the model) and are real numbers. The model writes the outcome of interest $y$ (e.g. wage in our earlier example) as a linear function of some explanatory variables $x$ (say age, gender, education, . . . ) plus a residual or error term $u$. This is the first assumption of the model: that the relationship is linear in the parameters.
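A minimal simulation of the classic linear model with two explanatory variables (assuming numpy; the true parameter values are invented for illustration), estimated by OLS via the design matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)                           # the unobserved error term
y = 0.5 + 1.2 * x1 - 0.7 * x2 + u                # alpha = 0.5, beta1 = 1.2, beta2 = -0.7

X = np.column_stack([np.ones(n), x1, x2])        # design matrix with intercept column
params, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates
print(params)                                    # ~ [0.5, 1.2, -0.7]
```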
What is the residual u of the classic linear model?
The residual $u$ can be thought of as standing for 'unobserved': everything that we think may affect $y$ but do not observe.
How are we able to determine a causal effect?
In order to give the model the causal interpretation we want, we need to be able to interpret $\beta_k$ as the marginal effect of $x_k$ on $y$ whilst keeping all of the other variables ($x_m$ for $m \neq k$) and the error term $u$ constant, i.e. our good old ceteris paribus condition, where 'all' includes the unobservables. In practice we cannot do this: since $u$ is unobserved, we cannot hold it constant.
Why do we need assumptions for OLS/classic linear model?
In practice we cannot hold all other terms constant: since $u$ is unobserved, we cannot hold it fixed. The model therefore requires us to make some assumptions about the unobserved $u$ given what we do observe: the $x$s.
What are the OLS/Gauss-Markov Assumptions?
A1: Linearity in parameters.
A2: No endogeneity: $E(u_i \mid x_i) = 0$ (mean independence of the error term).
A3: Homoskedasticity: the variance of the error term is constant: $\mathrm{var}(u_i \mid x_1, x_2, \dots, x_k) = \sigma^2$.
A4: Zero covariance between the error terms (independently distributed errors): $\mathrm{Cov}(u_i, u_j) = 0, \ \forall i \neq j$.
A5: The error has a normal distribution: $u_i \sim N(0, \sigma^2)$ (needed for statistical inference).
A6: No perfect multicollinearity of the explanatory variables. If one variable has an exact linear relationship with another, we cannot distinguish between the effects of each individual variable and so cannot estimate the coefficients.
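A minimal sketch (assuming numpy; the confounder setup is invented for illustration) of what goes wrong when A2 fails: an omitted variable that is correlated with $x$ ends up in the error term, so $E(u \mid x) \neq 0$ and the OLS slope is biased:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
z = rng.normal(size=n)                             # a confounder
x = z + rng.normal(size=n)                         # x is correlated with z
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)   # true effect of x on y is 2

# Regressing y on x alone pushes z into the error term, violating E(u|x) = 0.
b_short = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(b_short)    # ~ 3.5: biased upwards because cov(x, z) > 0

# Including z restores A2 and recovers the true coefficients.
X = np.column_stack([np.ones(n), x, z])
print(np.linalg.lstsq(X, y, rcond=None)[0])   # ~ [1.0, 2.0, 3.0]
```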
What is unbiasedness in an estimator?
Bias: the bias of $\hat{\mu}$ is given by $E(\hat{\mu}) - \mu$. The estimator is unbiased if $E(\hat{\mu}) - \mu = 0$, i.e. $E(\hat{\mu}) = \mu$. Unbiasedness means that if we compute $\hat{\mu}$ for many different random samples, then the average of the estimates over these samples will be the true population parameter $\mu$. For example, the sample mean $\bar{x}_n$ is an unbiased estimator for $\mu$ since $E(\bar{x}_n) = E(x) = \mu$.
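A minimal Monte Carlo sketch (assuming numpy; sample size and true slope invented for illustration): averaging the OLS slope estimate over many random samples recovers the true parameter, even when each sample is small:

```python
import numpy as np

rng = np.random.default_rng(5)
true_beta = 2.0
estimates = []
for _ in range(2000):                    # 2000 independent random samples
    x = rng.normal(size=50)              # a small sample each time
    y = 1.0 + true_beta * x + rng.normal(size=50)
    estimates.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

print(np.mean(estimates))    # ~ 2.0: right on average (unbiasedness)
```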
What is consistency in an estimator?
Consistency: $\hat{\mu}$ is a consistent estimator of $\mu$ if $\mathrm{plim}\, \hat{\mu}_n = \mu$, i.e. $\lim_{n \to \infty} \Pr(|\hat{\mu}_n - \mu| < \epsilon) = 1, \ \forall \epsilon > 0$. In words: $\hat{\mu}$ is a consistent estimator for $\mu$ if, for any $\epsilon > 0$, the probability that the distance between $\hat{\mu}_n$ and $\mu$ is less than $\epsilon$ tends to 1 as the sample size $n$ tends to infinity. That is, as the sample size increases the estimate converges to the population value. A consistent estimator delivers estimates such that, as the sample size increases, the distribution of the estimates is concentrated ever closer to the single point $\mu$, i.e. the variance of the distribution of the estimates produced by the estimator tends to zero.
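A minimal sketch (assuming numpy; sample sizes invented for illustration): the spread of the OLS slope estimates around the true value shrinks as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(6)
true_beta = 2.0
for n in [10, 100, 1000, 10000]:
    estimates = []
    for _ in range(500):                 # 500 samples at each sample size
        x = rng.normal(size=n)
        y = 1.0 + true_beta * x + rng.normal(size=n)
        estimates.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))
    print(n, np.std(estimates))          # the spread falls as n increases
```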
What is efficiency in an estimator?
Efficiency: let $\tilde{\mu}$ be another estimator of $\mu$ and assume that both $\hat{\mu}$ and $\tilde{\mu}$ are unbiased. $\hat{\mu}$ is more efficient than $\tilde{\mu}$ if $\mathrm{var}(\hat{\mu}) < \mathrm{var}(\tilde{\mu})$. Efficiency of an estimator is relative to other potential estimators. For a more efficient estimator, the estimates computed for different samples tend to be more tightly centred around their average than for a less efficient estimator. Using an efficient estimator means that it is less likely that we obtain a random sample which yields an estimate far from the corresponding population value.
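A minimal sketch (assuming numpy; not from the flashcards) comparing two unbiased estimators of a normal mean $\mu$: the sample mean and the sample median both centre on $\mu$, but the mean has the smaller variance and is therefore more efficient:

```python
import numpy as np

rng = np.random.default_rng(7)
mu = 5.0
means, medians = [], []
for _ in range(5000):                    # 5000 random samples of size 100
    sample = rng.normal(loc=mu, size=100)
    means.append(sample.mean())
    medians.append(np.median(sample))

print(np.mean(means), np.mean(medians))  # both ~ 5.0: both unbiased
print(np.var(means), np.var(medians))    # var(mean) < var(median): mean is more efficient
```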
What are finite and small sample properties?
Unbiasedness and efficiency; they apply to samples of any size.
What is an asymptotic or large sample property?
Consistency. It describes the behaviour of an estimator as the sample size $n$ tends to infinity, so it is only a guarantee for large samples (a common rule of thumb is $n > 30$ before asymptotic approximations become reasonable).
What does the Gauss-Markov theorem tell us?
If assumptions A1 to A4 hold, then the OLS estimator is BLUE (Best Linear Unbiased Estimator). OLS is an estimator and, as usual, its estimates vary from (repeated) sample to sample. The theorem says that: the expected values of the parameter estimates across these samples are equal to the true (population) regression parameters, i.e. OLS gets it right on average (unbiasedness); and OLS has the lowest variance among all linear unbiased estimators, so the probability that in a random sample you get an estimate close to the true parameter is highest when using OLS (efficiency). OLS is also consistent.
What is the estimated model?
$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ is an estimate of $E(y_i \mid x_i) = \alpha + \beta x_i$, which is the conditional expectation function (CEF) if the CEF is linear, and the best linear approximation to it if the CEF is non-linear.
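A minimal sketch (assuming numpy; the quadratic CEF is invented for illustration) of the 'best linear approximation' case: when $E(y \mid x) = x^2$ and $x$ is standard normal, the fitted line is flat at the mean of $y$, the closest line (in mean squared error) to the non-linear CEF:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=100_000)
y = x ** 2 + rng.normal(size=100_000)    # non-linear CEF: E(y|x) = x^2

b_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
a_hat = y.mean() - b_hat * x.mean()
print(a_hat, b_hat)   # ~ (1.0, 0.0): cov(x, x^2) = 0 for symmetric x, so the best line is flat
```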