L3 - Estimation of Regression Parameters Flashcards

1
Q

How can the bivariate linear regression model be written?

A

Y{i} = α + βX{i} + u{i},  i = 1, …, N

  • X is the independent or explanatory variable
  • Y is the dependent or explained variable
  • u is a random error or disturbance
  • α and β are parameters which characterise the relationship between Y and X. The parameters are not observable directly.
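
A minimal simulation of this model may make the pieces concrete (the parameter values and sample size below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Population parameters: unobservable in practice, chosen here for illustration
alpha, beta = 2.0, 0.5
N = 100

X = rng.uniform(0, 10, N)   # independent / explanatory variable
u = rng.normal(0, 1, N)     # random error or disturbance
Y = alpha + beta * X + u    # dependent / explained variable
```
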
2
Q

Why is regression analysis useful?

A
  • Regression analysis is the most important tool which economists use to quantify their models.
  • Economic theory provides explanations of linkages between variables of interest e.g. the relationship between consumption expenditures and disposable income.
  • However, theory rarely gives precise values for the size of the response of one variable to another. For this we must turn to econometrics and, in particular, to regression analysis.
  • The regression model provides a mechanism by which the response of one variable to another can be quantified and evaluated from a statistical perspective.
  • It therefore acts as one of the key items in the toolkit of the applied social scientist and the objective of this chapter is to discuss how it can be used sensibly in the investigation of economic relationships.
3
Q

What are two interpretations of the regression model?

A

1 - The X values are chosen by the investigator, e.g. by a process of experimentation.

  • In this case the X variable is not random and can be treated as being ‘fixed in repeated samples’.

2 - The X and Y variables are jointly distributed random variables with cov(X,Y) ≠ 0 (covariance).

  • This is more realistic for economic data but harder to deal with when deriving the distribution of estimators.
4
Q

Who first solved the problem of estimating the linear regression parameters?

A
  • Mayer’s (1750) solution: form linear combinations of the equations to reduce the number of equations to the number of unknown coefficients.
  • He would write out, for each observed pair of X and Y values, the corresponding algebraic equation in the unknowns α(hat) and β(hat), the estimates of the regression line.
  • He would then average groups of these equations to reduce their number to the number of coefficients and solve the resulting system (in this example, simultaneously).
  • These estimates are unbiased estimates of the population parameters.
  • However, there are an infinite number of linear combinations which are consistent with this procedure; one is sketched below.
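
A sketch of one such linear combination: split the sample into two halves ordered by X and average the equations within each half. This specific grouping is an illustrative assumption; any other grouping gives an equally valid linear combination.

```python
import numpy as np

def mayer_estimates(X, Y):
    """Average the equations Y_i = a + b*X_i within two halves of the
    sample, then solve the resulting 2x2 system for (a, b)."""
    order = np.argsort(X)
    lo, hi = order[: len(X) // 2], order[len(X) // 2:]
    # Each group's averaged equation: mean(Y_g) = a + b * mean(X_g)
    A = np.array([[1.0, X[lo].mean()],
                  [1.0, X[hi].mean()]])
    c = np.array([Y[lo].mean(), Y[hi].mean()])
    a_hat, b_hat = np.linalg.solve(A, c)
    return a_hat, b_hat
```
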
5
Q

What is the Method of Least Squares?

A
  • An estimator is a rule for calculating an estimate of an unknown value using observable data. Mayer’s method gives us a possible estimator but this is not unique.
  • An alternative method is to choose estimates of the parameters which minimise the residual sum of squares:

min{α(hat),β(hat)} RSS = Σ_i=1^N (Y{i} - α(hat) - β(hat)X{i})^2

  • This is the least-squares estimator or, as it is sometimes referred to, the ordinary least squares (OLS) estimator.
  • OLS provides a simple method for the generation of such estimates which, under certain assumptions, can be shown to have the desirable properties that the estimates are both unbiased and efficient (in the sense that they have the lowest possible variances in the class of unbiased estimators).
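
The definition can be applied directly with a general-purpose numerical minimiser. A sketch (the closed-form solution on a later card makes this unnecessary in practice, but it makes the definition of the estimator concrete):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 100)   # illustrative data

def rss(params, X, Y):
    a_hat, b_hat = params
    return np.sum((Y - a_hat - b_hat * X) ** 2)

# Choose (alpha_hat, beta_hat) to minimise the residual sum of squares
result = minimize(rss, x0=[0.0, 0.0], args=(X, Y))
alpha_hat, beta_hat = result.x
```
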
6
Q

Who introduced the Method of Least Squares?

A

This method was first introduced by Legendre in 1805. It improves on Mayer’s method because the variance of the parameter estimates is the lowest possible.

7
Q

What are the least-squares normal equations?

A
  • Minimising the residual sum of squares yields the following pair of equations, known as the least-squares normal equations:

α(hat)N + β(hat)Σ_i=1^N X{i} = Σ_i=1^N Y{i}

α(hat)Σ_i=1^N X{i} + β(hat)Σ_i=1^N X{i}^2 = Σ_i=1^N X{i}Y{i}

Solving these equations yields the least-squares estimates:

α(hat) = Y(bar) - β(hat)X(bar)

Substituting this into the second equation above gives:

β(hat) = (Σ_i=1^N (X{i}-X(bar))(Y{i}-Y(bar))) / (Σ_i=1^N (X{i}-X(bar))^2)

OR

β(hat) = cov(X,Y)/var(X)

8
Q

How can the OLS estimates be calculated?

A
  1. Calculate the slope coefficient as the ratio of the sample covariance of X and Y to the sample variance of X –> solve for β(hat)
  2. Calculate the intercept using the property that the regression line passes through the sample means of the data (X(bar) and Y(bar)) –> solve for α(hat), as in the sketch below
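
A sketch of the two steps, assuming X and Y are NumPy arrays:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 100)   # illustrative data

# Step 1: slope = sample covariance of X and Y / sample variance of X
beta_hat = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)

# Step 2: the fitted line passes through the point (X bar, Y bar)
alpha_hat = Y.mean() - beta_hat * X.mean()
```
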
9
Q

What is the difference between α/β and α(hat)/β(hat)?

A
  • α and β are population parameters.
  • α(hat) and β(hat) are estimators of the population parameters based on sample data.
  • The estimators are random variables because they are constructed from the random variables Y and (possibly) X.
  • The population parameters are not random variables. They are unknown/unobservable parameters which we must estimate using the data available.
10
Q

What do the different parts of the OLS estimators mean?

A
  • Mean of X{i}: X(bar) = Σ_i=1^N X{i} / N
  • Mean of Y{i}: Y(bar) = Σ_i=1^N Y{i} / N
  • Deviations of X from its mean: (X{i} - X(bar)) ∀i
  • Deviations of Y from its mean: (Y{i} - Y(bar)) ∀i
  • Squared deviations of X from its mean: (X{i} - X(bar))^2 ∀i
  • Squared deviations of Y from its mean: (Y{i} - Y(bar))^2 ∀i
11
Q

What is Maximum Likelihood?

A
  • The method of maximum likelihood is an alternative way to generate estimates of the unknown parameters. It begins by making an assumption about the distribution of the errors.

Y{i}=α + βX{i} + u{i}

u{i} ~ N(0, σ{u}^2),  E(u{i}u{j}) = 0 ∀ i ≠ j

  • The errors are assumed to be independent, identically distributed (iid) normal random variables. If the data collected are iid then the data are said to be a random sample.
  • In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model given observations, by finding the parameter values that maximise the likelihood of making the observations given the parameters.
  • e.g. if we had a set of data which is normally distributed, what values of μ and σ^2 are most likely responsible for creating the data points that we observed?
12
Q

What is the PDF for the errors in the Maximum Likelihood model?

A

f(u{i}) = (1/sqrt(2πσ{u}^2)) * exp(-(Y{i} - α - βX{i})^2 / (2σ{u}^2))
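
This is just the normal density evaluated at u{i} = Y{i} - α - βX{i}. A quick numerical cross-check against scipy's implementation (σ{u} = 1 and the simulated errors are arbitrary assumptions here):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
u = rng.normal(0, 1, 100)   # stand-in for Y_i - alpha - beta*X_i
sigma_u = 1.0               # assumed error standard deviation

manual = np.exp(-u**2 / (2 * sigma_u**2)) / np.sqrt(2 * np.pi * sigma_u**2)
assert np.allclose(manual, norm.pdf(u, loc=0, scale=sigma_u))
```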

13
Q

What is the likelihood function?

A

L(α,β,σ{u}^2) = Π_i=1^N (1/sqrt(2πσ{u}^2)) * exp(-(Y{i} - α - βX{i})^2 / (2σ{u}^2))

  • This is the joint probability (density) of the errors, the product of the individual PDFs.
  • Taking logarithms of this gives us the log-likelihood function.

14
Q

What is the log-likelihood function?

A

LL(α,β,σ{u}^2) = -(N/2)ln(2π) - (N/2)ln(σ{u}^2) - Σ_i=1^N (Y{i} - α - βX{i})^2 / (2σ{u}^2)
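
A sketch that maximises this function numerically (by minimising its negative) and recovers the same α(hat) and β(hat) as least squares; optimising over log σ{u}^2 keeps the variance positive:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 100)   # illustrative data

def neg_log_likelihood(params, X, Y):
    a, b, log_s2 = params
    s2 = np.exp(log_s2)
    resid = Y - a - b * X
    N = len(Y)
    return -(-0.5 * N * np.log(2 * np.pi)
             - 0.5 * N * np.log(s2)
             - np.sum(resid**2) / (2 * s2))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0], args=(X, Y))
a_ml, b_ml, sigma2_ml = res.x[0], res.x[1], np.exp(res.x[2])
# a_ml and b_ml coincide with the OLS estimates; sigma2_ml uses the 1/N formula
```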

15
Q

What does the method of maximum likelihood involve?

A

The method of maximum likelihood involves choosing estimates of the population parameters which maximise the log-likelihood function. The first-order conditions for a maximum are:

  • dLL/dα = (1/σ{u}^2) Σ_i=1^N (Y{i} - α - βX{i}) = 0
  • dLL/dβ = (1/σ{u}^2) Σ_i=1^N X{i}(Y{i} - α - βX{i}) = 0
  • dLL/dσ{u}^2 = -N/(2σ{u}^2) + (1/(2(σ{u}^2)^2)) Σ_i=1^N (Y{i} - α - βX{i})^2 = 0
16
Q

For the method of maximum likelihood what do the first two first order conditions yield?

A
  • These are identical to the least-squares normal equations. Therefore, for normally distributed errors, least squares and maximum likelihood give identical parameter estimates.

Σ_i=1^N Y{i} = α{ML}(hat)N + β{ML}(hat)Σ_i=1^N X{i}

Σ_i=1^N X{i}Y{i} = α{ML}(hat)Σ_i=1^N X{i} + β{ML}(hat)Σ_i=1^N X{i}^2

17
Q

For the method of maximum likelihood what does the third condition yield?

A

This is different from the formula normally used for the variance of a least squares regression because it does not adjust for the loss of degrees of freedom when estimating the other regression parameters.

  • The maximum likelihood estimator of the error variance will be biased in small samples. However, the bias will tend to zero as the sample size becomes large.

σ{u}^2(hat){ML} = (1/N) Σ_i=1^N (Y{i} - α{ML}(hat) - β{ML}(hat)X{i})^2
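
The two estimators of the error variance differ only in the divisor. A sketch comparing them, with the residuals taken from the closed-form OLS fit:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 100)   # illustrative data

beta_hat = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
alpha_hat = Y.mean() - beta_hat * X.mean()
u_hat = Y - alpha_hat - beta_hat * X        # regression residuals

sigma2_ml  = np.sum(u_hat**2) / len(Y)        # ML: biased in small samples
sigma2_ols = np.sum(u_hat**2) / (len(Y) - 2)  # adjusts for 2 estimated parameters
# N/(N-2) -> 1 as N grows, so the bias vanishes in large samples
```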

18
Q

What are degrees of freedom?

A

  • The degrees of freedom of an estimate is the number of independent pieces of information that went into calculating the estimate.
  • It is not quite the same as the number of items in the sample –> d.f. = n - 1 for one sample under the t-distribution.
  • Another way to look at degrees of freedom is that they are the number of values that are free to vary in a data set.

19
Q

When do you use OLS or ML?

A
  • When the number of observations is large enough you can use either, since they give the same parameter estimates.
  • When the number of observations is small, use OLS, as the ML estimator of the error variance is biased in small samples.
20
Q

How are the regression residuals defined in the OLS and ML methods?

A

The regression residuals are defined as the difference between the actual values and the fitted values from the regression model:

  • u{i}(hat) = Y{i} - α(hat) - β(hat)X{i}
  • These will be the same for both the OLS and ML estimates.
  • Note that these are not the same as the equation errors, which depend on the unknown population parameters rather than the regression parameter estimates.
  • Note also: Σ_i=1^N u{i}(hat) = 0 and Σ_i=1^N X{i}u{i}(hat) = 0. These are true by construction, as verified in the sketch below.
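
Both properties can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 100)   # illustrative data

beta_hat = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
alpha_hat = Y.mean() - beta_hat * X.mean()
u_hat = Y - alpha_hat - beta_hat * X

# Hold by construction of the least-squares estimates (up to rounding error)
assert np.isclose(np.sum(u_hat), 0.0, atol=1e-8)
assert np.isclose(np.sum(X * u_hat), 0.0, atol=1e-6)
```
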
21
Q

What is the difference between regression residuals and errors?

A
  • error –> the difference between the observed data and the population regression line, using the actual values of α and β
  • residual –> the difference between the observed data and the sample regression line, using the parameter estimates α(hat) and β(hat)
  • the residuals can be regarded as estimates of the errors
22
Q

What is standard error?

A
  • sometimes referred to as the standard error of the mean, it is the variability of the mean across different samples taken from a single population
  • this is the most basic version, but in general, if you compute a statistic (e.g. the median) for many samples and take the standard deviation of those values, you have found the standard error of that statistic
  • in short, the standard error is the standard deviation of a sample statistic across repeated samples from one population
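
A small simulation of the idea for the sample mean (the population parameters and number of samples are arbitrary choices); the empirical standard deviation of the sample means approaches σ/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw many samples of size n from one population; record each sample mean
n, n_samples = 50, 10_000
samples = rng.normal(loc=5.0, scale=2.0, size=(n_samples, n))
means = samples.mean(axis=1)

empirical_se = means.std(ddof=1)     # standard deviation of the sample means
theoretical_se = 2.0 / np.sqrt(n)    # sigma / sqrt(n) for the sample mean
```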