lecture 9 - linear regression Flashcards
(24 cards)
what is the aim of linear regression?
create a linear model that minimises the sum of squared residuals (errors), i.e. the SSE
linear regression is always compared against the situation where the independent variable did not exist (predicting every case with the mean of y)
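A minimal sketch of this idea in plain Python, assuming made-up data (the x and y values are illustrative, not from the lecture). The least-squares line comes from the closed-form slope b1 = Sxy/Sxx, and its SSE is compared with the SSE of the mean-only baseline (the total sum of squares, SST).

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the line y_hat = b0 + b1*x that minimises the SSE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept: the line passes through (mean_x, mean_y)
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared residuals for the fitted line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def sst(y):
    """SSE of the baseline model that predicts mean(y) for every case."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


# Illustrative data only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_ols(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SSE (line) = {sse(x, y, b0, b1):.2f}, SSE (mean only) = {sst(y):.2f}")
```

With these numbers the fitted line is y_hat = 2.2 + 0.6x, with an SSE of 2.4 against 6.0 for the mean-only model.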
what does R refer to in SPSS?
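the multiple correlation coefficient: the correlation between the observed and predicted values of the outcome (with a single predictor this equals the absolute value of Pearson's r)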
what does R square mean in SPSS?
coefficient of determination: the proportion of variance in the outcome explained by the model (between 0 and 1)
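A small sketch of how R square and R relate, assuming an ordinary least-squares fit with a constant (the data and the already-fitted line y_hat = 2.2 + 0.6x are illustrative). R square is 1 - SSE/SST, and R is its non-negative square root.

```python
import math


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE/SST, the share of variance in y explained."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Illustrative data, with the least-squares line y_hat = 2.2 + 0.6x already fitted to it
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

r2 = r_squared(y, y_hat)
r = math.sqrt(r2)   # R in the model summary: non-negative square root of R square
print(round(r2, 3), round(r, 3))   # prints 0.6 0.775
```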
what does the adjusted R square mean in SPSS?
R square adjusted for the sample size and the number of predictors; it penalises adding predictors that do not improve the model
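A sketch using the standard adjusted R square formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), with n cases and k predictors. The R square value and the (n, k) pairs are illustrative only.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n cases and k predictors (the constant is not counted in k)."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same R^2 = 0.6 under different (illustrative) sample sizes and predictor counts
for n, k in [(5, 1), (50, 1), (50, 5)]:
    print(n, k, round(adjusted_r_squared(0.6, n, k), 3))
```

The adjustment pulls R square down most when the sample is small or there are many predictors.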
what do we need to predict the next outcome data point based on the independent variable?
need the equation of linear regression
what is the slope-intercept form of a line and what does each letter mean?
y = mx+b
where x is the independent (input) variable
m = slope of the line (rise over run)
b = y-intercept (where the line crosses the y-axis)
what is the linear regression equation?
$y = \beta_0 + \beta_1 x_1 + \varepsilon$
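where $\beta_0$ is the intercept (b in y = mx + b)
and $\beta_1$ is the coefficient of $x_1$, i.e. the slope (m), while $\varepsilon$ is the error (residual) term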
what is the univariable linear regression equation?
$y = \beta_0 + \beta_1 x_1 + \varepsilon$
what is the multivariable linear regression equation?
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
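A minimal sketch of using the equation to predict an outcome. The function name `predict` and the coefficient values are hypothetical, standing in for numbers you would read off a Coefficients table. The error term is left out because its expected value is 0.

```python
def predict(b0, coefs, xs):
    """Predicted y = b0 + b1*x1 + b2*x2 + ... + bk*xk."""
    if len(coefs) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(coefs, xs))


# Hypothetical coefficients: constant 10, b1 = 0.5 (for x1), b2 = 2.0 (for x2)
print(predict(10.0, [0.5, 2.0], [60, 3]))   # prints 46.0
print(predict(10.0, [0.5], [60]))           # one predictor (univariable case): prints 40.0
```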
where does $\beta_1$ come from in the table?
the unstandardised coefficient (B) in the predictor's row of the SPSS Coefficients table, e.g. the mock exam result row
what does the coefficient of x ($\beta_1$) mean?
a one-unit increase in x is associated with a change of $\beta_1$ units in the predicted value of y
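A tiny illustration, assuming hypothetical values for the constant and B: because the line is straight, moving x up by one unit changes the prediction by exactly $\beta_1$, wherever on the line you start.

```python
def predict(b0, b1, x):
    """Univariable prediction y_hat = b0 + b1*x."""
    return b0 + b1 * x


# Hypothetical values: constant 12.0, unstandardised B = 0.75
b0, b1 = 12.0, 0.75
print(predict(b0, b1, 41) - predict(b0, b1, 40))   # prints 0.75
print(predict(b0, b1, 71) - predict(b0, b1, 70))   # prints 0.75: same change anywhere on the line
```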
where does the constant ($\beta_0$) come from in the table?
the unstandardised coefficient (B) in the (Constant) row of the Coefficients table
what is linear regression a comparison to?
to the situation where the independent variable did not exist and we used the mean of the outcome as the prediction
how do you work out the F-score?
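F = mean square of the regression / mean square of the residuals, where MS regression = SSR/k and MS residual = SSE/(n - k - 1), for n cases and k predictors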
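A sketch of the F calculation from the two sums of squares, using illustrative values (SSR = 3.6, SSE = 2.4, n = 5, k = 1). The p-value would then come from the F distribution with (k, n - k - 1) degrees of freedom, which libraries such as SciPy provide (`scipy.stats.f.sf`); that step is not shown here.

```python
def f_statistic(ss_regression, ss_residual, n, k):
    """F = MS_regression / MS_residual for n cases and k predictors."""
    ms_regression = ss_regression / k           # df_regression = k
    ms_residual = ss_residual / (n - k - 1)     # df_residual = n - k - 1
    return ms_regression / ms_residual


# Illustrative sums of squares: SSR = 3.6, SSE = 2.4, n = 5 cases, k = 1 predictor
F = f_statistic(3.6, 2.4, n=5, k=1)
print(round(F, 2))   # prints 4.5
```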
what does the p value tell us?
the probability of obtaining an F-statistic at least this large if the null hypothesis (the model predicts no better than the mean) were true; e.g. p = 0.001 means a 0.1% chance
what could we conclude from a 0.001 p value?
our model results in a significantly better prediction of final exam results than using only the mean final exam result.
In short, the linear model as a whole is a significant predictor of final exam results.
how is the t value worked out?
t = B / standard error of B
if a predictor has a significant impact on our ability to predict the outcome, its b should be significantly different from 0.
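A sketch of the t value for a single predictor, assuming illustrative data. The standard error of the slope here uses the usual simple-regression formula SE(b1) = sqrt(MS_residual / Sxx).

```python
import math


def slope_t(x, y):
    """Fit y = b0 + b1*x and return (b1, SE of b1, t = b1 / SE)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_residual = sse / (n - 2)             # df = n - k - 1 with k = 1 predictor
    se_b1 = math.sqrt(ms_residual / sxx)    # standard error of the slope
    return b1, se_b1, b1 / se_b1


# Illustrative data only
b1, se_b1, t = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"B = {b1:.2f}, SE = {se_b1:.3f}, t = {t:.3f}")
```

With a single predictor, t squared equals the model's F, so the t-test on B and the overall F-test agree.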
what is the probability of this t value (or larger) occurring?
The probability of this t value (or larger) occurring if the value of b in the population were zero is less than 0.001.
In other words, b is significantly different from 0.
So mock exam result is a significant predictor of final exam results.
what is correlation (correlation vs regression)?
- bounded measure (between -1 and +1) that can be interpreted independently of the scales of the two variables
- the closer the correlation is to +/-1, the closer the two variables are to a perfect linear relationship
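A sketch of Pearson's r in plain Python, with illustrative data, showing that it stays between -1 and +1, hits the bounds for a perfect line, and does not change when one variable is rescaled.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by both spreads, so -1 <= r <= 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 5, 4, 5]))                     # about 0.775: fairly strong positive
print(pearson_r(x, [3 * a + 2 for a in x]))              # 1.0: perfect positive line
print(pearson_r(x, [-a for a in x]))                     # -1.0: perfect negative line
print(pearson_r([100 * a for a in x], [2, 4, 5, 4, 5]))  # unchanged by rescaling x
```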
what is regression (correlation vs regression)?
The regression line gives a useful quantity: the estimated value of y for a given value of x.
Specifically, the slope $\beta_1$ tells you the change in the expected value of y corresponding to a one-unit increase in x.
This information cannot be deduced from the correlation coefficient alone.
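A small sketch of why the slope cannot be read off r alone, assuming illustrative data: expressing x in different units (here multiplied by 10) changes the slope but leaves the correlation untouched, because the slope also depends on the spreads of x and y.

```python
import math


def slope_and_r(x, y):
    """Return the least-squares slope b1 and Pearson r for the same data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [10 * a for a in x]          # same variable, different units

b1, r = slope_and_r(x, y)
b1_rescaled, r_rescaled = slope_and_r(x_rescaled, y)
print(b1, b1_rescaled)    # slope changes with the units of x
print(r, r_rescaled)      # correlation does not
```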
what are the assumptions of linear regression?
- linearity - is the relationship between x and y actually linear?
- independence - observations/measurements must be independent of each other
- normality of residuals - residuals are normally distributed
- zero mean of residuals - residuals have a mean of 0 across all values of x
- no major outliers - standardised residuals between +/-2
- constant variance (homoscedasticity) - the residuals must have a constant variance across the range of each x variable
- heteroscedasticity (residual spread changes across x) means the constant variance assumption of linear regression is violated
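A sketch of two of these checks on illustrative data, assuming a standardised residual means residual / sqrt(MS residual) (the standard error of the estimate); check your software's own definition before relying on it. Note that with a constant in the model, least-squares residuals always average to 0 overall, so the zero-mean assumption is really about each range of x.

```python
import math


def residual_checks(x, y, limit=2.0):
    """Fit y = b0 + b1*x by least squares, then return the mean residual and the
    indices of cases whose standardised residual lies outside +/- limit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    root_mse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # SE of the estimate
    standardised = [e / root_mse for e in residuals]
    outliers = [i for i, z in enumerate(standardised) if abs(z) > limit]
    return sum(residuals) / n, outliers


# Illustrative data: y = 2x + 1 exactly, except case 5 (x = 6), which is pushed up by 12
x = list(range(1, 11))
y = [2 * a + 1 for a in x]
y[5] += 12
mean_resid, outliers = residual_checks(x, y)
print(f"mean residual = {mean_resid:.6f}, outlier indices = {outliers}")
```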
which constant variance situation is ideal?
unbiased and homoscedastic
what is multicollinearity?
moderate to high correlations (r beyond +/-0.70) among the independent variables = multicollinearity
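A sketch of screening predictors pairwise against the 0.70 rule of thumb. The predictor names and values are hypothetical. Pairwise correlations are only a first check; they can miss collinearity involving three or more predictors.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.70):
    """Return (name1, name2, r) for every predictor pair with |r| above threshold."""
    flagged = []
    for (n1, v1), (n2, v2) in combinations(predictors.items(), 2):
        r = pearson_r(v1, v2)
        if abs(r) > threshold:
            flagged.append((n1, n2, r))
    return flagged


predictors = {   # hypothetical predictor columns
    "mock_exam": [55, 60, 62, 70, 75, 80],
    "coursework": [50, 58, 65, 68, 77, 79],
    "hours_sleep": [7, 6, 8, 5, 7, 6],
}
for name1, name2, r in collinear_pairs(predictors):
    print(f"{name1} / {name2}: r = {r:.2f}")
```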
what is singularity?
when there is a perfect linear relationship between variables, or in terms of correlation, when r = +/-1.00. This is an extreme form of multicollinearity
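A sketch of spotting singularity, assuming hypothetical predictors where the same mock mark was entered twice on different scales. Such a pair has |r| = 1, so the two columns carry no separate information and the regression cannot estimate separate coefficients for both (the 2x2 correlation matrix of the pair has determinant 1 - r^2 = 0).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_singular_pair(x, y, tol=1e-9):
    """True if x and y are perfectly linearly related (|r| = 1, up to rounding)."""
    return abs(abs(pearson_r(x, y)) - 1.0) < tol


# Hypothetical predictors
mock_percent = [55, 60, 62, 70, 75, 80]
mock_out_of_50 = [p / 2 for p in mock_percent]   # exact linear function of mock_percent
hours_sleep = [7, 6, 8, 5, 7, 6]

print(is_singular_pair(mock_percent, mock_out_of_50))  # True
print(is_singular_pair(mock_percent, hours_sleep))     # False
```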