L7 - Goodness of Fit Flashcards

1
Q

What is Goodness of Fit?

A
  • The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between the observed values and the values expected under the model in question.
  • The extent to which a regression model can be said to ‘explain’ the variation in the data is the issue of goodness of fit.
2
Q

How do you calculate the sums of squared deviations?

A

TSS (total sum of squares) = Σ(Yᵢ − Ȳ)²

  • the sum of the squared deviations of the observations of Y from their sample mean

ESS (explained sum of squares) = Σ(Ŷᵢ − Ȳ)²

  • the sum of the squared deviations of the fitted values from the regression equation from the sample mean of the data

RSS (residual sum of squares) = Σ(Yᵢ − Ŷᵢ)²

  • the sum of the squared deviations of the observations of Y from the fitted values

The sum of squared deviations from the mean of a variable can be decomposed as follows:

Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)²

TSS = ESS + RSS

This decomposition can be used to define the R-squared, or coefficient of determination, for a regression equation:

R² = ESS/TSS = 1 − RSS/TSS
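The decomposition can be checked numerically. A minimal sketch in plain NumPy, using hypothetical data (the decomposition TSS = ESS + RSS holds exactly for OLS with an intercept):

```python
import numpy as np

# Hypothetical sample data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Bivariate OLS: beta_hat = Sxy / Sxx, alpha_hat = y_bar - beta_hat * x_bar.
x_bar, y_bar = x.mean(), y.mean()
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar
y_fit = alpha_hat + beta_hat * x

tss = np.sum((y - y_bar) ** 2)       # total sum of squares
ess = np.sum((y_fit - y_bar) ** 2)   # explained sum of squares
rss = np.sum((y - y_fit) ** 2)       # residual sum of squares

r_squared = ess / tss                # equivalently 1 - rss / tss
```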

3
Q

What are the Properties of R-squared?

A
  1. R-squared always lies in the range zero to one.
  2. If R-squared equals one then the regression is a perfect fit to the data (this almost always indicates that something is wrong, since the error term means the fit can never be perfect).
  3. If R-squared is equal to zero then the regression has no explanatory power.
  4. In multivariate regressions the R-squared will always increase when we add an extra variable (even if that variable is completely irrelevant), so a model with more variables is not necessarily a better model.
  5. It tells us how much of the variation we have explained using the regression, e.g. R-squared = 0.36 means 36% of the variance around the mean is being captured.
4
Q

What is the Market Model?

A
  • Let Pk be the share price for company k and Pm be an index for the market as a whole. The returns from holding equity in company k and the overall market portfolio can be measured as:

Rtk = Δln(Ptk) and Rtm = Δln(Ptm)

(the first difference of their logarithms)

We can model the relationship between these as:

Rtk = α + βRtm + ut

If we estimate this model, then our estimate of β will indicate how the share price moves with the market, and the R-squared for the regression will indicate how much of the variance of the share price is due to market movements. Here Rtm is the return on the market and Rtk is the return on the security.
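A minimal sketch of estimating the market model by OLS, using short hypothetical price series (the names p_k and p_m are illustrative):

```python
import numpy as np

# Hypothetical price series for a stock (p_k) and a market index (p_m).
p_k = np.array([100.0, 101.5, 100.8, 102.9, 104.0, 103.2])
p_m = np.array([1000.0, 1010.0, 1005.0, 1020.0, 1030.0, 1025.0])

# Returns as first differences of the logarithms.
r_k = np.diff(np.log(p_k))
r_m = np.diff(np.log(p_m))

# OLS estimates for R_k = alpha + beta * R_m + u.
beta_hat = (np.sum((r_m - r_m.mean()) * (r_k - r_k.mean()))
            / np.sum((r_m - r_m.mean()) ** 2))
alpha_hat = r_k.mean() - beta_hat * r_m.mean()

# R-squared: share of the stock's return variance due to market movements.
fitted = alpha_hat + beta_hat * r_m
r_squared = (np.sum((fitted - r_k.mean()) ** 2)
             / np.sum((r_k - r_k.mean()) ** 2))
```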

5
Q

What is the Capital Asset Pricing Model (CAPM)?

A
  • The Capital Asset Pricing Model (CAPM) describes the relationship between systematic risk and expected return for assets, particularly stocks. CAPM is widely used throughout finance for pricing risky securities and for generating expected returns for assets given the risk of those assets and the cost of capital.
6
Q

What is the role of β in the CAPM?

A
  • The relevant measure of a stock’s (or company’s) risk
  • Measures volatility, i.e. how much the price of a particular stock jumps up and down compared with how much the entire stock market jumps up and down

If a share price moves:

–Exactly in line with the market, then the stock’s beta is 1.

7
Q

Is a low value of R² bad?

A
  • Not necessarily: if there is no time trend in the data, a regression that captures even a modest proportion of the variation, e.g. 40%, may still be a good fit
  • But with a time trend you should expect an R-squared of 90% and above
8
Q

What are unconditional and conditional variances of Y?

A

The unconditional and conditional variances of Y are defined as follows:

unconditional (the value of X adds nothing to the model):

E(Yᵢ − E(Yᵢ))² = E(Yᵢ − μY)² = σY²

conditional (the explanatory variable X adds to the model):

E(Yᵢ − E(Yᵢ|Xᵢ))² = E(Yᵢ − α − βXᵢ)² = σu²

These are unknown population parameters. However, unbiased estimators for them are given below:

σ̂Y² = (1/(N−1)) Σᵢ₌₁ᴺ (Yᵢ − Ȳ)² = (1/(N−1)) · TSS

E(σ̂Y²) = σY²

σ̂u² = (1/(N−2)) Σᵢ₌₁ᴺ (Yᵢ − α̂ − β̂Xᵢ)² = (1/(N−2)) · RSS

E(σ̂u²) = σu²
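A short sketch of the two estimators on hypothetical data; the first assertion checks σ̂Y² against NumPy's own sample variance (ddof=1 gives the N−1 divisor):

```python
import numpy as np

# Hypothetical data for a bivariate regression of y on x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.1, 5.9, 8.2, 9.7, 12.1])
n = len(y)

# OLS fit and residuals.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - alpha_hat - beta_hat * x

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum(resid ** 2)

sigma_y2_hat = tss / (n - 1)   # unconditional variance estimator, TSS/(N-1)
sigma_u2_hat = rss / (n - 2)   # conditional variance estimator, RSS/(N-2)
```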

9
Q

How do we test if an equation has explanatory power?

A

Suppose we wish to test:

H0: σu² = σY²

H1: σu² < σY²

i.e. the null is that there is no difference between the conditional and unconditional variance of Y (the explanatory variable X has no effect on the model). Under the null hypothesis we can show that:

F = ((TSS − RSS)/1) / (RSS/(N − 2)) = ((TSS − RSS)/RSS) · (N − 2) ~ F(1, N−2)

  • we divide (TSS − RSS) by 1 here because we have only 1 explanatory variable; in general the first degrees-of-freedom parameter of F, and the divisor of (TSS − RSS), equal the number of explanatory variables
  • N is the number of observations

This is the F-statistic for a bivariate regression equation. We can compare the test statistic with a critical value from the F tables and reject the null if it exceeds this value.
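A sketch of the test on hypothetical data; the 5% critical value comes from SciPy's F distribution (`scipy.stats.f.ppf`):

```python
import numpy as np
from scipy.stats import f

# Hypothetical bivariate regression; test whether x has explanatory power.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.2, 2.9, 4.1, 4.8, 6.2, 6.9, 8.1, 8.8])
n = len(y)

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - alpha_hat - beta_hat * x

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum(resid ** 2)

# F-statistic with 1 and n - 2 degrees of freedom.
f_stat = ((tss - rss) / 1) / (rss / (n - 2))

# 5% critical value from the F(1, n - 2) distribution; reject H0 if exceeded.
crit = f.ppf(0.95, 1, n - 2)
reject_null = f_stat > crit
```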

10
Q

What is the relationship between the F-statistic and R-squared?

A

F = (ESS/RSS) · (N − 2) = ((ESS/TSS)/(RSS/TSS)) · (N − 2)

= (R²/(1 − R²)) · (N − 2)

In multivariate models we can think of the F-test as a test of the joint hypothesis that all the slope coefficients are equal to zero, against the alternative that at least one of them is different from zero:

Yᵢ = α + β₁Xᵢ₁ + β₂Xᵢ₂ + uᵢ

H0: β₁ = β₂ = 0

H1: β₁ and/or β₂ ≠ 0
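The bivariate identity F = (R²/(1 − R²)) · (N − 2) can be verified numerically; a minimal sketch with hypothetical data:

```python
import numpy as np

# Check F = (ESS/RSS) * (N - 2) = (R^2 / (1 - R^2)) * (N - 2) on toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 3.2, 4.1, 6.3, 7.2, 8.8])
n = len(y)

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
fitted = alpha_hat + beta_hat * x

ess = np.sum((fitted - y.mean()) ** 2)
rss = np.sum((y - fitted) ** 2)
r2 = ess / (ess + rss)  # ESS / TSS, via the decomposition TSS = ESS + RSS

f_from_ss = (ess / rss) * (n - 2)           # F from the sums of squares
f_from_r2 = (r2 / (1 - r2)) * (n - 2)       # F from R-squared
```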

11
Q

How does the F-test relate to the t-ratio?

A

For a bivariate regression equation, there is also a relationship between the F-test and the t-ratio for the slope coefficient:

F = (ESS/RSS) · (N − 2) = (β̂/SE(β̂))² = (tβ̂)²

This relationship only holds for bivariate regression equations. Things become more complicated when we move to multivariate regressions.
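The bivariate identity F = t² can be checked directly; a sketch with hypothetical data, using SE(β̂) = σ̂u / √Sxx for the slope's standard error:

```python
import numpy as np

# Check that F equals the squared t-ratio of the slope in a bivariate regression.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.0, 2.8, 4.2, 4.9, 6.1, 7.2, 7.8])
n = len(y)

sxx = np.sum((x - x.mean()) ** 2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - alpha_hat - beta_hat * x
rss = np.sum(resid ** 2)
ess = np.sum((alpha_hat + beta_hat * x - y.mean()) ** 2)

sigma_u_hat = np.sqrt(rss / (n - 2))
se_beta = sigma_u_hat / np.sqrt(sxx)  # standard error of the slope
t_ratio = beta_hat / se_beta

f_stat = (ess / rss) * (n - 2)
```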

12
Q

What is the relationship between the R-squared and the standard error of the regression?

A

σ̂u = √(σ̂u²) = √(RSS/(N − 2))

RSS = (1 − R²) · TSS

∴ σ̂u = √((1 − R²) · TSS/(N − 2))

= √((1 − R²) · (TSS/(N − 2)) · ((N − 1)/(N − 1)))

As TSS/(N − 1) = σ̂Y²:

σ̂u = √((1 − R²) · σ̂Y² · (N − 1)/(N − 2))

= σ̂Y · √((1 − R²) · (N − 1)/(N − 2))

A similar relationship will hold for the multivariate case but we will need to adjust for the loss of degrees of freedom when we introduce extra regressors.
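The bivariate relationship above can be verified numerically; a minimal sketch with hypothetical data:

```python
import numpy as np

# Verify sigma_u_hat = sigma_Y_hat * sqrt((1 - R^2) * (N - 1) / (N - 2)).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.1, 5.2, 5.9, 7.8, 8.4])
n = len(y)

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - alpha_hat - beta_hat * x

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum(resid ** 2)
r2 = 1 - rss / tss

sigma_u_direct = np.sqrt(rss / (n - 2))       # standard error of the regression
sigma_y_hat = np.sqrt(tss / (n - 1))          # sample standard deviation of y
sigma_u_via_r2 = sigma_y_hat * np.sqrt((1 - r2) * (n - 1) / (n - 2))
```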
