Linear regression Flashcards

2
Q

Linear regression

A

Predicts a response variable (Y) from values of an explanatory variable (X) by fitting a straight-line model to the data.

(Different from correlation: correlation treats both variables equally and measures the strength of association, whereas regression measures how steeply the response variable changes, on average, with changes in the explanatory variable.)

Overall:
- Estimate the slope and intercept using the least-squares equations
- Calculate the error mean square (S²) from the residuals
- Calculate SEs and CIs for the slope and intercept using S² and t values
- Use t tests to see whether the slope or intercept is significantly different from 0
- Use an ANOVA table to see whether a significant amount of variation is explained by the model.

3
Q

Fitting the line

A

y = ax + b OR y = mx + c (+ error)

Use the least-squares equations to estimate the slope and intercept: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope·x̄.

The fitted equation can be used to predict Ŷ values from X values (the aim is to minimise the squared error in these predictions).

Reliable predictions can only be made within the observed range of X – cannot extrapolate (the relationship may not be linear outside that range; we don't know).
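
The least-squares fit can be sketched in a few lines of Python (a minimal illustration in pure Python; the names `fit_line` and `predict` are made up for this example, not from any library):

```python
def fit_line(x, y):
    """Return (slope, intercept) minimising the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (x̄, ȳ)
    return slope, intercept


def predict(slope, intercept, x_new, x_min, x_max):
    """Predict ŷ, refusing to extrapolate outside the observed X range."""
    if not (x_min <= x_new <= x_max):
        raise ValueError("x outside observed range: extrapolation is unreliable")
    return slope * x_new + intercept
```

For example, x = [1, 2, 3, 4] and y = [2, 4, 6, 8] give slope 2 and intercept 0, and `predict` raises an error for any x outside [1, 4], mirroring the no-extrapolation rule above.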

4
Q

Assumptions

A
  • At each value of X there is a population of possible Y values whose mean lies on the true regression line – this is the assumption that the relationship is linear
  • At each value of X, the distribution of possible Y values is normal
  • The variance of the Y values is the same at all values of X
  • At each value of X, the Y measurements represent a random sample from the population of possible Y values
5
Q

Residuals

A

Measure the vertical deviation of each observed Y value from the least-squares regression line.

Residual = observed value − predicted value = Y − Ŷ
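
Residuals follow directly from the fitted line; a small pure-Python sketch (the function name is illustrative):

```python
def residuals(x, y, slope, intercept):
    """Residual = observed Y minus predicted Ŷ (vertical deviation from the line)."""
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
```

For a least-squares fit the residuals sum to (numerically) zero, which is a quick sanity check that the slope and intercept were computed correctly.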

6
Q

Testing model significance

A

Error sum of squares (SSE/ESS): the sum of squared differences between observed and predicted values; can also be calculated as SST − SSR.

Error mean square (MSE/S²): SSE / (n − 2)

t-test approach:

Estimating the SEs of the slope and intercept requires S².

Calculate CIs for the intercept and slope using the SE and the t value from the t distribution (with n − 2 df).

Do the CIs overlap 0?

Use t tests to see whether the slope and intercept are significantly different from 0.
-> Tends to be significant if the estimate is more than about twice its SE (i.e. |t| > 2).
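
The SE, t statistic, and CI for the slope can be sketched as follows (pure Python; `t_crit` is the table value for the t distribution with n − 2 df, looked up rather than computed, and the function name is illustrative):

```python
import math


def slope_inference(x, y, slope, intercept, t_crit):
    """SE of the slope, t statistic for H0: slope = 0, and a CI.

    t_crit is the critical t value for n - 2 degrees of freedom,
    taken from a t table (not computed here).
    """
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    # S^2 = SSE / (n - 2): the error mean square
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se = math.sqrt(s2 / sxx)   # SE of the slope
    t = slope / se             # tends to be significant if |t| > ~2
    ci = (slope - t_crit * se, slope + t_crit * se)
    return se, t, ci
```

If the CI overlaps 0 (equivalently, |t| is small for n − 2 df), the slope is not significantly different from 0.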

ANOVA approach:
- Split the total variation into variation explained by the regression line and error variation.
- Create an ANOVA table (Source: SS: df: MS):

Regression variation: SSR: 1: MSR
Error variation: SSE (= SST − SSR): n − 2: S²
Total variation: SST: n − 1:

Compare F = MSR / S² against the F distribution with (1, n − 2) df to test whether the model explains a significant amount of variation.
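
The ANOVA table can be built mechanically from the sums of squares (a pure-Python sketch; the dict keys are illustrative):

```python
def anova_table(x, y, slope, intercept):
    """Partition total variation (SST) into regression (SSR) and error (SSE)."""
    n = len(x)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)          # total variation, n - 1 df
    sse = sum((yi - (slope * xi + intercept)) ** 2
              for xi, yi in zip(x, y))                 # error variation, n - 2 df
    ssr = sst - sse                                    # regression variation, 1 df
    msr = ssr / 1                                      # regression mean square
    s2 = sse / (n - 2)                                 # error mean square, S^2
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "S2": s2, "F": msr / s2}
```

In simple linear regression the F statistic equals the square of the slope's t statistic, so the t-test and ANOVA approaches agree on the slope's significance.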
