Linear regression Flashcards

SSR, MSE, R Squared

1
Q

How do we quantify the quality of a model and its predictions?

A

By calculating the Sum of Squared Residuals (SSR).

2
Q

How do you calculate the sum of squared residuals?

A
  1. First calculate the residuals by finding the differences between observed and predicted values.
  2. Then square the residuals and sum up the squared residuals.
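The two steps above can be sketched in Python. The data values and the function name `sum_of_squared_residuals` are hypothetical, chosen only for illustration.

```python
def sum_of_squared_residuals(observed, predicted):
    """Return the SSR for paired observed and predicted values."""
    # Step 1: residual = observed - predicted, for each data point.
    residuals = [o - p for o, p in zip(observed, predicted)]
    # Step 2: square each residual and add the squares up.
    return sum(r ** 2 for r in residuals)


# Made-up example values (not taken from the flashcards).
observed = [3.0, 5.0, 8.0]
predicted = [2.0, 7.0, 5.0]

ssr = sum_of_squared_residuals(observed, predicted)
print(ssr)  # residuals are 1, -2, 3, so SSR = 1 + 4 + 9 = 14.0
```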
3
Q

Sum of squared residuals formula

A

SSR = Sigma((observed - predicted) ** 2), summed over all data points

4
Q

What kinds of models can we apply SSR to?

A

All kinds of models, whether a straight line or a curve.

5
Q

How do we calculate the residuals: as the vertical or the perpendicular distance to the model?

A

By calculating the vertical distance

6
Q

The perpendicular distance to the model is also called the

A

Shortest distance.

7
Q

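Why do we use the vertical distance instead of the shortest distance?

A

Because the perpendicular (shortest) distance compares the observed value with the model's prediction at a different x-value. The vertical distance compares the observed and predicted y-values at the same x-value.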
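A small sketch of the difference, assuming a straight-line model y = m*x + b and made-up numbers: the vertical distance is measured at the data point's own x-value, while the perpendicular distance meets the line at a different x-value. The function names are hypothetical.

```python
import math


def vertical_residual(x0, y0, m, b):
    """Observed y minus the line's prediction at the same x."""
    return y0 - (m * x0 + b)


def perpendicular_foot(x0, y0, m, b):
    """Point on the line y = m*x + b that is closest to (x0, y0)."""
    x_foot = (x0 + m * (y0 - b)) / (1 + m ** 2)
    return x_foot, m * x_foot + b


# Hypothetical model y = x and a single observed point (2, 4).
m, b = 1.0, 0.0
x0, y0 = 2.0, 4.0

residual = vertical_residual(x0, y0, m, b)
x_foot, y_foot = perpendicular_foot(x0, y0, m, b)
perp = math.hypot(x0 - x_foot, y0 - y_foot)

print(residual)        # 2.0, measured at x = 2
print(x_foot, y_foot)  # 3.0 3.0, so the closest point sits at x = 3, not x = 2
print(perp)            # about 1.414
```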

8
Q

What is the problem of SSR?

A

SSR is hard to interpret because it depends on how much data we have. For example, suppose SSR is 14 with three data points and 22 with five data points. That does not imply that the second fit is worse than the first. Every extra data point adds a squared residual that cannot be negative, so the larger SSR may only tell us that the model with more data has more residuals to sum.

9
Q

Should SSR be low or high?

A

The smaller the value of SSR, the better the model fits the data. If SSR is zero, the model fits the data perfectly.

10
Q

To compare two models that were fit to datasets of different sizes, we calculate the

A

Mean Squared Error (MSE)

11
Q

Formula for Mean squared error

A

MSE = SSR / n, where n is the number of observations
MSE = Sigma((observed - predicted) ** 2) / n
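A minimal sketch of why dividing by n helps. The residuals below are hypothetical values chosen so that the SSRs come out to 14 and 22, the numbers used in the earlier SSR card; the cards themselves do not give the underlying data.

```python
def ssr(residuals):
    """Sum of squared residuals."""
    return sum(r ** 2 for r in residuals)


def mse(residuals):
    """Mean squared error: SSR divided by the number of observations."""
    return ssr(residuals) / len(residuals)


# Hypothetical residuals for a 3-point and a 5-point dataset.
small = [1, -2, 3]
large = [1, -2, 3, 2, -2]

print(ssr(small), ssr(large))  # 14 22
# The larger dataset has the larger SSR but the smaller MSE.
print(round(mse(small), 2), round(mse(large), 2))  # 4.67 4.4
```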

12
Q

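What does MSE calculate intuitively?

A

The average of the squared residuals. Unlike SSR, which increases as we add more data, MSE does not grow just because the dataset is larger, so it is preferred over SSR for comparing models.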

13
Q

Why are MSEs difficult to interpret?

A

Because the value depends on the scale (units) of the data. For example, a model fit to data in millimeters has MSE 4.7, while the same data expressed in meters gives MSE 0.0000047, since every squared residual shrinks by a factor of 1,000,000.
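The effect of units can be checked with a short sketch. The measurements are made up; the only assumption is that converting millimeters to meters divides every value by 1000, which divides every squared residual by 1,000,000.

```python
def mse(observed, predicted):
    """Mean squared error for paired observed and predicted values."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


# Hypothetical measurements and predictions, in millimeters.
observed_mm = [12.0, 15.0, 19.0]
predicted_mm = [10.0, 16.0, 22.0]

# The same values expressed in meters.
observed_m = [v / 1000 for v in observed_mm]
predicted_m = [v / 1000 for v in predicted_mm]

mse_mm = mse(observed_mm, predicted_mm)
mse_m = mse(observed_m, predicted_m)

print(mse_mm)          # about 4.67
print(mse_m)           # about 0.00000467
print(mse_mm / mse_m)  # about 1,000,000: same fit, very different MSE
```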

14
Q

How do we overcome the disadvantage of MSE?

A

By using R squared.

15
Q

How does R squared overcome the issue with MSE?

A

R squared does not depend on the size of the dataset or on the scale of the data.

16
Q

How is R squared calculated?

A
R squared compares the SSR (or MSE) around the mean of the y-axis values with the SSR (or MSE) around the model we are interested in. It gives the percentage by which the predictions improved when we used the model instead of just the mean.
17
Q

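What is the range of R squared values?

A

0 to 1, when the model fits at least as well as the mean. A model that fits worse than the mean gives a negative value.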

18
Q

When R squared is close to 1, it means

A

The model fits the data much better than the mean of the y-axis values does.

19
Q

R squared formula

A

R squared = (SSR(mean) - SSR(fitted_line)) / SSR(mean)
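A sketch of this formula in Python, with the numerator grouped in parentheses before dividing. The data and the model y = 2x are hypothetical; the model is simply assumed, not obtained by least squares.

```python
def ssr(observed, predicted):
    """Sum of squared residuals."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """(SSR(mean) - SSR(model)) / SSR(mean)."""
    mean_y = sum(observed) / len(observed)
    ssr_mean = ssr(observed, [mean_y] * len(observed))
    ssr_model = ssr(observed, predicted)
    return (ssr_mean - ssr_model) / ssr_mean


# Made-up data and predictions from the assumed model y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.5, 5.5, 8.0]
predicted = [2.0 * xi for xi in x]

r2 = r_squared(y, predicted)
print(round(r2, 3))  # SSR(mean) = 18.5, SSR(model) = 0.5, so R squared is about 0.973
```

Predicting the mean for every point gives R squared = 0, and predictions that match the data exactly give R squared = 1.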

20
Q

SSR(mean) - SSR(fitted_line): what does it mean?

A

It tells us how much the residuals around the mean shrank when we used the fitted line. Dividing by SSR(mean) expresses that reduction as a percentage.

21
Q

R squared = 1 means

A

Fitted line fits data perfectly

22
Q

R squared = 0 means

A

SSR(mean) = SSR(fitted_line): the fitted line is no better and no worse than the mean.

23
Q

SSR(fitted_line) = 0 means

A

Fitted line fits data perfectly

24
Q

In what scenarios do we have low confidence in an R squared result?

A

A small amount of data can produce a high (close to 1) R squared. Any time we see a trend in a small dataset, it is difficult to be confident that a high R squared value is not due to random chance.
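A simulation sketch of this point, under stated assumptions: y is pure Gaussian noise with no real relationship to x, the line is fit by ordinary least squares, and "high" means R squared above 0.8. The helper names, the threshold, the trial count, and the seed are all arbitrary choices for illustration.

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for a straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """(SSR(mean) - SSR(fitted_line)) / SSR(mean) for a least-squares line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ssr_mean = sum((yi - mean_y) ** 2 for yi in y)
    ssr_fit = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return (ssr_mean - ssr_fit) / ssr_mean


def share_of_high_r2(n_points, trials=2000, threshold=0.8, seed=0):
    """Fraction of pure-noise datasets whose R squared exceeds the threshold."""
    rng = random.Random(seed)
    x = list(range(n_points))
    high = 0
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n_points)]  # noise unrelated to x
        if r_squared(x, y) > threshold:
            high += 1
    return high / trials


# With only two points the line passes through both, so R squared is
# (approximately, up to rounding) 1 no matter what the data are.
two_point_r2 = r_squared([0, 1], [0.3, -1.2])

small_share = share_of_high_r2(3)   # tiny datasets: high R squared is common
large_share = share_of_high_r2(30)  # larger datasets: it is very rare
print(two_point_r2, small_share, large_share)
```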

25
Q

When can we have high confidence in an R squared result?

A

When there is a large amount of data.

26
Q

Is intuition the only way to have confidence in R squared results?

A

No. Intuition about having a large amount of data is not enough, so statisticians developed p-values.

27
Q

R squared formula using MSE

A

R squared = (MSE(mean) - MSE(fitted_line)) / MSE(mean)
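Because both MSEs divide by the same n, the n cancels and the MSE form gives the same value as the SSR form. A short check with made-up numbers (the observed and predicted values are hypothetical):

```python
# Hypothetical observed values and model predictions.
observed = [2.0, 4.5, 5.5, 8.0]
predicted = [2.0, 4.0, 6.0, 8.0]

n = len(observed)
mean_y = sum(observed) / n

# Sums of squared residuals around the mean and around the model.
ssr_mean = sum((o - mean_y) ** 2 for o in observed)
ssr_fit = sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Mean squared errors: the same sums divided by n.
mse_mean = ssr_mean / n
mse_fit = ssr_fit / n

r2_from_ssr = (ssr_mean - ssr_fit) / ssr_mean
r2_from_mse = (mse_mean - mse_fit) / mse_mean

print(round(r2_from_ssr, 3), round(r2_from_mse, 3))  # 0.973 0.973
```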

28
Q

Does R squared always compare the mean to a straight fitted line?

A

No. The most common way to calculate R squared is to compare the mean to a fitted line, but it can be calculated for other pairs of models too, for example to compare a square wave to a sine wave.
