lecture 9 - linear regression Flashcards

(24 cards)

1
Q

what is the aim of linear regression?

A

create a linear model that minimises the sum of squared residuals (errors) (SSE)

linear regression is always a comparison to the situation when the independent variable did not exist
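
As a rough illustration of this card, the sketch below fits a line by least squares and compares its SSE with the SSE of a mean-only prediction. The data values and the function names `fit_line` and `sse` are invented for the example and are not from the lecture.

```python
# Hypothetical data (invented for illustration): x = mock exam, y = final exam.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(x, y):
    """Return (b0, b1), the intercept and slope that minimise the SSE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sse(x, y, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

b0, b1 = fit_line(x, y)
mean_y = sum(y) / len(y)

sse_model = sse(x, y, b0, b1)           # error left over with the predictor
sse_mean_only = sse(x, y, mean_y, 0.0)  # error when we only predict the mean

print(b0, b1, sse_model, sse_mean_only)
```

For this toy data the fitted line leaves a smaller SSE (about 2.4) than the mean-only prediction (6.0), which is the comparison the card describes.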

2
Q

what does R refer to in SPSS?

A

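the multiple correlation coefficient: the correlation between the observed and the predicted values of the outcome (with a single predictor, the absolute value of the correlation between x and y)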

3
Q

what does R square mean in SPSS?

A

coefficient of determination (i.e. the proportion of variance in the outcome that is explained by the model)
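
A minimal sketch of how R square can be worked out by hand, assuming the usual definition R² = 1 - SSE/SST. The data are invented for illustration; with one predictor, R² should equal the squared correlation between x and y.

```python
# Hypothetical data (invented for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares (SST)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual sum of squares (SSE) around the fitted line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / syy       # proportion of variance explained
r = sxy / (sxx * syy) ** 0.5    # Pearson correlation between x and y

print(r, r_squared)
```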

4
Q

what does the adjusted R square mean in SPSS?

A

R square adjusted to take the sample size and the number of predictor variables into account
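
A small sketch of the commonly used adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. The function name and the example numbers are assumptions made for illustration.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R square for n observations and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Same R square, same sample size, more predictors: the adjusted value drops.
one_predictor = adjusted_r_squared(0.6, n=5, k=1)
two_predictors = adjusted_r_squared(0.6, n=5, k=2)

print(one_predictor, two_predictors)
```

The adjustment penalises extra predictors, so adding a variable only raises adjusted R square if it improves the fit enough.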

5
Q

what do we need to predict the next outcome data point based on the independent variable?

A

the linear regression equation (the fitted intercept and slope)

6
Q

what is the slope-intercept form of a line and what does each letter mean?

A

y = mx + b

where x is the independent variable and y is the dependent variable
m = slope of the line (rise over run)
b = y-intercept (where the line crosses the y axis)

7
Q

what is the linear regression equation?

A

𝑦 = 𝛽_0 + 𝛽_1 𝑥_1 + 𝜀
where 𝛽_0 is the intercept (b in y = mx + b),
𝛽_1 is the coefficient of x (the slope, m),
and 𝜀 is the error (residual) term

8
Q

what is the univariable linear regression equation?

A

𝑦 = 𝛽_0 + 𝛽_1 𝑥_1 + 𝜀

9
Q

what is the multivariable linear regression equation?

A

𝑦 = 𝛽_0 + 𝛽_1 𝑥_1 + 𝛽_2 𝑥_2 + … + 𝛽_𝑘 𝑥_𝑘 + 𝜀
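
A minimal sketch of how a prediction follows from the multivariable equation: the intercept plus each coefficient times its predictor value (the error term is not part of the prediction). The function name `predict` and all numbers are hypothetical.

```python
def predict(intercept, coefficients, values):
    """Predicted y = b0 + b1*x1 + b2*x2 + ... + bk*xk."""
    return intercept + sum(b * v for b, v in zip(coefficients, values))

# Hypothetical coefficients: b0 = 10, b1 = 0.5, b2 = 2.0.
# Hypothetical predictor values: x1 = 40, x2 = 3.
predicted = predict(10.0, [0.5, 2.0], [40.0, 3.0])

print(predicted)  # 10 + 0.5*40 + 2.0*3 = 36.0
```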

10
Q

where does 𝛽_1 come from in the table?

A

the unstandardised coefficient (B) in the predictor's row, e.g. the Mock exam result row

11
Q

what does the coefficient of x mean (𝛽_1)?

A

a one-unit increase in x is associated with a change of 𝛽_1 in y
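
To make the interpretation concrete, the sketch below predicts y at two x values one unit apart and checks that the predictions differ by the slope. The coefficient values are hypothetical, not taken from any SPSS table.

```python
# Hypothetical fitted coefficients (invented for illustration).
b0 = 2.2  # constant (intercept)
b1 = 0.6  # coefficient of x (slope)

def predict(x):
    """Predicted outcome for a given value of the independent variable."""
    return b0 + b1 * x

# Raising x by one unit changes the prediction by b1.
difference = predict(4.0) - predict(3.0)

print(predict(3.0), predict(4.0), difference)
```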

12
Q

where does the constant (𝛽_0) come from in the table?

A

the unstandardised coefficient (B) in the (Constant) row

13
Q

what is linear regression a comparison to?

A

to the situation when the independent variable did not exist, and we used the mean as prediction

14
Q

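how do you work out the F-statistic?

A

F = mean square of the regression (MSR) / mean square of the error (MSE)

where each mean square is its sum of squares (SSR or SSE) divided by its degrees of freedom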
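
A rough worked example of the F calculation, assuming the usual degrees of freedom (k for the regression, n - k - 1 for the error, with k predictors and n observations). The data are invented for illustration.

```python
# Hypothetical data (invented for illustration), one predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
k = 1  # number of predictors
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

sst = sum((yi - mean_y) ** 2 for yi in y)                       # total
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # error
ssr = sst - sse                                                 # regression

msr = ssr / k            # mean square of the regression
mse = sse / (n - k - 1)  # mean square of the error
f_statistic = msr / mse

print(f_statistic)
```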

15
Q

what does the p value tell us?

A

the probability that an F-statistic at least this large would occur if the null hypothesis were true; e.g. p = 0.001 means there is a 0.1% chance

16
Q

what could we conclude from a 0.001 p value?

A

our model results in a significantly better prediction of final exam results compared to only using the mean value of final exams.

In short, the linear model overall predicts final exam results significantly

17
Q

how is the t value worked out?

A

t = b / standard error of b

if a predictor is having a significant impact on our ability to predict the outcome, then its b should be different from 0.
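
A minimal sketch of the t calculation for the slope in a one-predictor model, assuming the standard error of b is sqrt(MSE / Sxx). The data are invented for illustration.

```python
# Hypothetical data (invented for illustration), one predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Mean square error with n - 2 degrees of freedom (one predictor).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

se_b1 = (mse / sxx) ** 0.5  # standard error of the slope
t_value = b1 / se_b1

print(se_b1, t_value)
```

With a single predictor, the square of this t value equals the model's F-statistic, so the two tests agree.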

18
Q

what is the probability of this t value (or larger) occurring?

A

The probability of this t value (or larger) occurring if the value of b in the population were zero is less than 0.001.

In other words, b is significantly different from 0.
So Mock exam result is a significant predictor of final exam results.

19
Q

what is correlation (correlation vs regression)?

A
  • bounded measurement (between -1 and +1) that can be interpreted independently of the scale of the two variables
  • the closer the correlation is to +/-1, the closer the two variables are to a perfect linear relationship
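
As an illustration of these two points, the sketch below computes the Pearson correlation by hand. The function name `pearson_r` and the data are invented for the example.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]

r_noisy = pearson_r(x, [2.0, 4.0, 5.0, 4.0, 5.0])          # imperfect relationship
r_perfect_up = pearson_r(x, [2 * xi + 1 for xi in x])      # exact rising line
r_perfect_down = pearson_r(x, [10 - 3 * xi for xi in x])   # exact falling line

print(r_noisy, r_perfect_up, r_perfect_down)
```

The exact lines give correlations of +1 and -1 whatever their slopes, while the noisy data give a value strictly between them.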
20
Q

what is regression (correlation vs regression)?

A

The regression line gives you a useful quantity: the estimated value of y for a given value of x.
Specifically, 𝛽_1 tells you the change in the expected value of y corresponding to a one-unit increase in x.
This information cannot be deduced from the correlation coefficient alone.
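
One way to see why the slope carries information that the correlation does not: changing the units of y changes the slope but leaves the correlation untouched. The sketch below uses invented data and a hypothetical helper `slope_and_r`.

```python
def slope_and_r(x, y):
    """Return (least-squares slope, Pearson correlation) for x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

# Hypothetical data (invented for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_rescaled = [100 * yi for yi in y]  # same outcome in units 100 times smaller

slope, r = slope_and_r(x, y)
slope_rescaled, r_rescaled = slope_and_r(x, y_rescaled)

print(slope, r)
print(slope_rescaled, r_rescaled)
```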

21
Q

what are the assumptions of linear regression?

A
  • linearity - is the relationship between x and y actually linear?
  • independence - observations/measurements must be independent of each other
  • normality of residuals - the residuals are normally distributed
  • mean of zero of residuals - the residuals have a mean of 0 across all values
  • no major outliers - standardised residuals between +/-2
  • constant variance: homoscedasticity - the residuals must have a constant variance across the range of each x variable
  • if the residual variance changes across the range of x (heteroscedasticity), the constant variance assumption of general linear regression is violated
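
Two of the checks above can be sketched numerically: the mean of the residuals and the +/-2 limit on standardised residuals. This assumes a standardised residual is the residual divided by the square root of the MSE; the data are invented for illustration, and the remaining assumptions are usually judged from plots instead.

```python
# Hypothetical data (invented for illustration), one predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual = observed value minus predicted value.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / n

# Standardise by the residual standard deviation, sqrt(MSE).
mse = sum(e ** 2 for e in residuals) / (n - 2)
standardised = [e / mse ** 0.5 for e in residuals]
no_major_outliers = all(abs(z) <= 2 for z in standardised)

print(mean_residual, standardised, no_major_outliers)
```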
22
Q

which constant variance situation is ideal?

A

unbiased and homoscedastic

23
Q

what is multicollinearity?

A

moderate to high correlations (r above +0.70 or below -0.70) among the independent variables = multicollinearity
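
A rough screening sketch based on this definition: compute the correlation for every pair of independent variables and flag pairs beyond +/-0.70. The variable names and values are hypothetical.

```python
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical independent variables (invented for illustration).
predictors = {
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mock_score": [2.0, 4.0, 5.0, 4.0, 5.0],
    "hours_slept": [7.0, 6.0, 8.0, 5.0, 7.0],
}

# Flag every pair of predictors whose correlation exceeds 0.70 in size.
flagged = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(pearson_r(predictors[a], predictors[b])) > 0.70
]

print(flagged)
```

Here only the hours_studied and mock_score pair is flagged, since its correlation is about 0.77.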

24
Q

what is singularity?

A

when there is a perfect linear relationship between variables, or in terms of correlation, when r = +/-1.00. This is an extreme form of multicollinearity.
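
A small sketch of what singularity looks like numerically, using two invented predictors where one is an exact linear function of the other. The correlation is 1 and the determinant of their sums-of-squares matrix is 0, so separate coefficients for the two cannot be estimated.

```python
# Hypothetical predictors: x2 is an exact linear function of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2 * v + 1 for v in x1]

n = len(x1)
mean_1 = sum(x1) / n
mean_2 = sum(x2) / n
s11 = sum((a - mean_1) ** 2 for a in x1)
s22 = sum((b - mean_2) ** 2 for b in x2)
s12 = sum((a - mean_1) * (b - mean_2) for a, b in zip(x1, x2))

r = s12 / (s11 * s22) ** 0.5

# Determinant of the 2x2 matrix [[s11, s12], [s12, s22]].
determinant = s11 * s22 - s12 ** 2

print(r, determinant)
```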