Simple Linear Regression Flashcards

1
Q

what is correlation?

A
  • looking at how two variables are related to each other
    -> we aren't making predictions from one to the other
    -> relationship is symmetrical
2
Q

what is regression?

A
  • trying to predict one variable from another using the model
    -> predict the criterion variable from the predictor variable
    -> relationship is asymmetrical
    -> assuming one (the predictor) precedes the other (outcome)
3
Q

what is the whole idea of a regression?

A

predict an outcome (dependent / criterion variable) from a predictor (independent) variable

4
Q

what's an example of a regression question?

A

How can you predict university success from school results?
* Tariff score and Honours Classification

5
Q

How do we make predictions in regression?

A

Y = b0 + b1X

6
Q

b0

A

intercept
-> where our line crosses the y-axis - it's constant

7
Q

b1

A

'gradient/slope'

8
Q

how does the 'slope' work?

A

the gradient of the line that has been fitted to the data
* for every unit X goes up
* Y goes up (or down) in line with the gradient

i.e. for every unit of X that goes up, Y goes up 0.5 of a unit [that's the perfect prediction]

9
Q

If b0 = 0, b1 = 0.5, and X = 2, what is Y?

A

Y = 0 + 0.5(2)
Y = 1

10
Q

If b0 = 3.75 and b1 (slope) = .469, and an individual scores 7 on their maths test, what is Y?

A

Y = 3.75 + .469(7)
Y = 7.03
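
A minimal Python sketch of this calculation (the function name 'predict' is just for illustration; the b0 and b1 values are the ones from this card):

    # Y-hat = b0 + b1 * X
    def predict(x, b0=3.75, b1=0.469):
        return b0 + b1 * x

    print(predict(7))  # 7.033, i.e. 7.03 to two decimal places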

11
Q

what is the issue with predicting Y, though?

A

the fit of our line is not perfect, so we're interested in being able to quantify the gap

12
Q

b0 = 11.35 and b1 = -0.722. What is the equation?

A

Y = 11.35 - 0.722(X)

13
Q

what is the regression outcome?

A

the statistics we look at to assess how good our predictor is at predicting our outcome variable

14
Q

What is the technique for making decisions about the data?

A

the aim is to ensure the line of best fit produces small residuals
* not always a good fit, but it's the best fit -> we can measure how good the fit is and estimate how good our regression is (how good is our equation at predicting the outcome, knowing the predictor)
* and whether it's significant

There are two outcomes

15
Q

What are these two outcomes?

A

R^2: how good the model / regression is (predicting) [trying to test the null hypothesis that r = 0]
F ratio: is it significant or not [trying to say there is no predictive relationship / variation]

16
Q

what are the questions we are asking ourselves?

A
  • The general question we are asking is: how good is our model at predicting the actual data (Y, the dependent measure, the criterion variable)?
  • The technical question is: how much of the variance in the Y data set can we predict/account for using our model?
  • The outcome of the analysis is the proportion of the variation in the data set that we can predict using our model
17
Q

what can we use to calculate this proportion?

A
  • model
  • data the model produces (the predicted Y score)
  • the actual data (observed/actual Y scores)
18
Q

There are different types of variation - what is the unexplained variation called?

A

the residual -> differences between the observed and predicted Y scores
* actual Y score minus the predicted Y score using the equation and X value
* squared to stop them cancelling each other out when summed
* the gap between the actual and the predicted

19
Q

the gap between the actual score and the predicted score - what does this tell us?

A

The weaker the prediction, the greater the residual variance
* the bigger the gap between the actual scores and the scores that our model predicts

20
Q

if the gap is small?

A

youā€™ve got a good prediction

21
Q

if the gap is large?

A

you don't have a good prediction

22
Q

what is this variation not predicted by?

A

the model/equation/regression

23
Q

what does the residual tell us?

A

the difference between the score predicted by the equation and the score we actually have

24
Q

How do we calculate the SSresidual?

A

Y = score for each participant
Ŷ = score for each participant calculated by the equation (predicted Y)
Ŷ - Y = score for each participant calculated by the equation minus score for each participant
(Ŷ - Y)^2 = score for each participant calculated by the equation minus score for each participant, squared

The equation: SSres = ∑(Ŷ - Y)^2
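
A minimal Python sketch of this sum. The observed scores are toy data, and the predicted scores come from the least-squares line Y = 0.5 + 1.4X fitted to X = [1, 2, 3, 4] (all made up for illustration):

    # SSres = sum of (Y-hat - Y)^2 across participants
    observed  = [2, 3, 5, 6]          # actual Y scores
    predicted = [1.9, 3.3, 4.7, 6.1]  # Y-hat from the equation

    ss_res = sum((y_hat - y) ** 2 for y, y_hat in zip(observed, predicted))
    print(round(ss_res, 3))  # 0.2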

25
Q

what is the total variance?

A
  • the total variance of Y scores in the data set
    -> all the variation that there is to explain
26
Q

how to calculate SS total?

A

SStotal = ∑(Y - M)^2
* each data point minus the mean of all Y data points, squared, then summed
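
Continuing the toy data from the SSres sketch, a minimal Python version:

    # SStotal = sum of (Y - M)^2, where M is the mean of all Y scores
    observed = [2, 3, 5, 6]
    mean_y = sum(observed) / len(observed)               # M = 4.0
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    print(ss_total)  # 10.0 - all the variation there is to explain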

27
Q

So, how do we figure out the variance?

A

Sum of Squares of the Residual (SS residual): an estimate of the amount of variation that is not predicted by our regression in our sample (gap between the actual and the predicted)

Total Sum of Squares (SS total): an estimate of all the variation in the sample

28
Q

What do we need to find out?

A

an estimate of how much of the variation is actually predicted by our model

29
Q

How do we find an estimate of how much of the variation is actually predicted by our model?

A

Take the SSresidual away from the SStotal -> the sum of squares of the model (SSmodel = SStotal - SSres)
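
With the toy numbers from the previous sketches, the subtraction looks like this:

    # SSmodel = SStotal - SSres
    ss_total, ss_res = 10.0, 0.2   # from the earlier sketches
    ss_model = ss_total - ss_res
    print(ss_model)  # 9.8 - the variation the regression explains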

30
Q

what is SSm (Sum of Squares of the Model) / SSreg (Sum of Squares of the Regression)?

A

an estimate of the amount of variance explained by the regression or the model

31
Q

How can we calculate SSreg directly?

A

take the mean of the actual Y scores away from each predicted Y score, square, and sum: SSreg = ∑(Ŷ - M)^2
-> gives you the variance explained by the regression equation or model
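
A short check, on the same toy data, that the direct calculation agrees with the subtraction method:

    # SSreg computed directly: sum of (Y-hat - M)^2
    predicted = [1.9, 3.3, 4.7, 6.1]
    mean_y = 4.0  # mean of the observed Y scores
    ss_reg = sum((y_hat - mean_y) ** 2 for y_hat in predicted)
    print(round(ss_reg, 3))  # 9.8, matching SStotal - SSres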

32
Q

what is SStotal?

A

an estimate of all the variance in the data set

33
Q

what is SSm or SSreg?

A

estimate of the variance accounted for by the model/regression (gives us an idea of the variation explained)

34
Q

What is SSreg/m affected by?

A

sample size and amount of total variation in the sample

  • you can't compare it across different studies and samples, as different sample sizes etc. produce different estimates -> yet comparing is very useful if we want to generalise results

instead we need a standardised measure of the total proportion of the variation explained by the regression

35
Q

what is the standardised measure of the total proportion of the variation explained by the regression?

A

R^2

36
Q

R^2

A

Proportion of the variance predicted by the regression equation
* SSreg divided by SStotal
* Between 0 and 1 -> the larger the better
* can be expressed as a percentage i.e. 80% of the variance is explained by the model/regression
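
With the toy numbers from the earlier sketches:

    # R^2 = SSreg / SStotal
    r_squared = 9.8 / 10.0
    print(r_squared)  # 0.98 -> 98% of the variance is explained by the model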

37
Q

SStotal

A

an estimate of all the variance in the data set

38
Q

SSres

A

A measure of the amount of variance not explained by our regression

39
Q

SSreg or SSm

A

-> an estimate of the variance accounted for by the model / regression
Take SSres from SStotal and that leaves us with the amount of variance explained by our equation

40
Q

R^2

A

Standardise this by dividing SSreg by SStotal
-> what proportion of the total variation is explained by the regression/model

41
Q

what is the F ratio?

A

ratio between variance that is predicted and the variance that is not predicted (error)
* a way to see whether a significant amount of the variance is explained

If the F ratio is high -> the effect is strong; there is lots of variance explained relative to the variance that is not explained (we should get a significant result)

42
Q

How do we calculate the F ratio?

A

using the 'mean square errors'
* each SS divided by its degrees of freedom

43
Q

what is the F ratio?

A

the ratio between the mean square errors
-> MS = SS / df

The degrees of freedom for the regression model is simply the number of predictors

MSreg/m = SSreg/m divided by the number of predictors in the model/regression ('k')

44
Q

how many predictors are there in the linear regression?

A

1

45
Q

MSres is SSres divided by N minus the number of parameters in our model. What are these parameters?

A
  • the intercept and the predictors
  • there is always one intercept and "k" predictors, so we divide SSres by N - k - 1
46
Q

how many degrees of freedom is the F ratio reported with?

A

two -> one for each of the mean square errors (df of MSreg/m, df of MSres)

47
Q

What does it mean if the F value (found in the F table) is large and the p value is significant?

A

it's predicting a significant amount of the variance -> a lot of variance too

48
Q

what does the p-value mean?

A

tells us that the result is significant
-> allows us to make decisions about the null hypothesis

49
Q

If p < 0.05

A

can reject the null hypothesis

50
Q

in regression, the null hypothesis means

A

the variance explained by the model is 0

51
Q

in t-tests, the null hypothesis means

A

there is no difference between the two means (or that the data comes from the same population)

52
Q

F

A

The ratio of the Mean square model (or ā€˜regressionā€™) error to the mean square residual error.

53
Q

Big F ->

A

little p values

54
Q

where is the R-squared?

A

in the model summary; sometimes we report the adjusted R squared next to it

55
Q

what are the assumptions of a simple regression?

A
  • variable type: the outcome must be continuous (the predictor can be continuous or discrete)
  • non-zero variance: predictors must not have zero variance
  • independence: all values of the outcome should come from a different person or item
  • linearity: the relationship we model is, in reality, linear (plotting X against Y is still important, to see if there's a relationship)
  • homoscedasticity: for each value of the predictors, the variance of the error term should be constant
    AND independence of errors: plot ZRESID (y-axis) against ZPRED (x-axis) - see the sketch after this list
  • normally-distributed errors: the residuals must be normally distributed (if they don't form a normal distribution then we have some problems with the data)
    -> do a normal probability plot or 'save' the residuals and then compute all the usual tests for normality
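
A minimal sketch of the ZRESID-vs-ZPRED check in Python, assuming matplotlib and scipy are available (the toy data are from the earlier sketches; with so few points this only illustrates the plot, not a usable diagnostic):

    # Plot standardised residuals against standardised predicted values
    import matplotlib.pyplot as plt
    from scipy.stats import zscore

    observed  = [2, 3, 5, 6]
    predicted = [1.9, 3.3, 4.7, 6.1]
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

    plt.scatter(zscore(predicted), zscore(residuals))
    plt.axhline(0)
    plt.xlabel("ZPRED (standardised predicted values)")
    plt.ylabel("ZRESID (standardised residuals)")
    plt.show()  # an even, patternless spread around 0 is what we want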
56
Q

How to calculate F

A

1. MSreg/m = SSreg/m divided by 'k' (the number of predictors)
2. MSres = SSres divided by (N - k - 1)
3. F = MSreg/m divided by MSres
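
Putting these steps together in Python for the toy data from the earlier sketches (scipy's F distribution supplies the p-value here; with real data you would read it off the output):

    # F = MSreg / MSres, with df = (k, N - k - 1)
    from scipy.stats import f as f_dist

    ss_reg, ss_res = 9.8, 0.2   # from the earlier SS sketches
    n, k = 4, 1                 # 4 participants, 1 predictor (simple regression)

    ms_reg = ss_reg / k                   # 9.8
    ms_res = ss_res / (n - k - 1)         # 0.1
    f_ratio = ms_reg / ms_res             # 98.0
    p = f_dist.sf(f_ratio, k, n - k - 1)  # roughly .01
    print(f_ratio, p)

Reported with both degrees of freedom, this would read F(1, 2) = 98.0, p < .05.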

57
Q

Regression

A

way of predicting an outcome

58
Q

SStotal

A

total sum of squares of the differences between data points and the mean of y (all the variance there is to explain/account for)

59
Q

SSres

A

total sum of squares of the differences between the data points and the line of best fit (variation that is not explained by the model)

(an estimate of the variance that is not accounted for by the model/regression)

60
Q

SSmodel/regression

A

difference between SStotal and SSres
-> variation explained by the model

61
Q

R^2

A

SSmodel/regression / SStotal
-> proportion of variance explained by the model