Flashcards in Regression Deck (14)
How does regression take correlation a step further?
Regression takes correlation a step further by using information about the association between two variables to predict a dependent variable as a function of an independent variable
What is the simplest mathematical model used in regression?
Linear function or 'line of best fit'
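Written out, the straight-line model looks like this in common textbook notation. The symbols a, b and Y-hat are assumed here; the deck itself does not name them:

```latex
\hat{Y} = a + bX
% \hat{Y}: predicted value of the outcome (dependent) variable
% a: intercept, the predicted Y when X = 0
% b: slope, the change in \hat{Y} for a one-unit increase in X
```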
What is simple regression? Multiple regression?
Simple: one predictor variable
Multiple: two or more predictor variables
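A minimal sketch of multiple regression with two predictors. It assumes ordinary least squares solved through the normal equations in plain Python; the function name `fit_multiple` and the toy data are hypothetical. The data are built so that Y = 1 + 2*X1 + 3*X2 exactly, so the fit should recover those three coefficients:

```python
def fit_multiple(rows, y):
    """Return [intercept, b1, b2, ...] minimising the sum of squared residuals."""
    # Design matrix: a leading 1 for the intercept, then the predictor values.
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting reduces A to diagonal form.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [v - factor * p for v, p in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: each row holds (X1, X2); Y = 1 + 2*X1 + 3*X2.
rows = [[0, 1], [1, 0], [2, 2], [3, 1], [4, 3]]
y = [4, 3, 11, 10, 18]
print(fit_multiple(rows, y))  # approximately [1.0, 2.0, 3.0]
```

With one row entry per observation, the same function also covers simple regression (a single predictor per row).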
What is the regression line and how is it determined?
The linear function that best fits the data (best describes the relationship between the predictor and outcome variables)
It is determined by using the method of least squares
What does the least squares procedure do?
minimizes the sum of the squared deviations (residuals) around the line of best fit
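A minimal sketch of the least squares procedure for simple regression, using the standard closed-form solution b = sum((X - mean X)(Y - mean Y)) / sum((X - mean X)^2) and a = mean Y - b * mean X. The function names and the toy data are hypothetical:

```python
def least_squares_line(x, y):
    """Return (a, b) for the line Y-hat = a + b*X that minimises squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept: the line passes through (mean X, mean Y)
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances between observed Y and the line a + b*X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(a, b, sum_squared_residuals(x, y, a, b))
```

For these data the fit gives a of about 2.2 and b = 0.6, with a sum of squared residuals of about 2.4; nudging the slope or intercept away from those values only makes that sum larger.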
What is the 'residual'?
the difference between an observed outcome value and the value predicted for it by the regression line (its vertical distance from the line)
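A small sketch of computing residuals as observed Y minus predicted Y. The line used, a = 2.2 and b = 0.6, is the least-squares line for the hypothetical toy data shown; the function name is also hypothetical:

```python
def residuals(x, y, a, b):
    """Residual for each point: observed Y minus the predicted value a + b*X."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y, a=2.2, b=0.6)
print([round(r, 2) for r in res])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Positive residuals are points above the line and negative ones are below it. For a least-squares line with an intercept, the residuals sum to zero (up to rounding).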
What is the predicted value?
the predicted value is the model's estimate of the mean of all outcome values for a given predictor value
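A sketch illustrating the predicted value as a mean. The hypothetical data below have two outcome values at each predictor value, and their group means (2, 4, 6) happen to lie exactly on a line. In that case the least-squares prediction at each X equals the observed group mean; with real data it is only an estimate of that mean:

```python
from statistics import mean


def compare_to_group_means(x, y):
    """Return (X level, predicted Y, mean of observed Y at that level) for each X."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    rows = []
    for level in sorted(set(x)):
        group = [yi for xi, yi in zip(x, y) if xi == level]
        rows.append((level, a + b * level, mean(group)))
    return rows


for level, predicted, group_mean in compare_to_group_means([1, 1, 2, 2, 3, 3], [1, 3, 3, 5, 5, 7]):
    print(level, predicted, group_mean)
```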
What is heteroscedasticity?
The variability of the outcome scores around the predicted value (i.e. of the residuals) is NOT consistent across different values of the independent variable
What is homoscedasticity?
The variability of the outcome scores around the predicted value (i.e. of the residuals) is consistent across different values of the independent variable
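A sketch of one informal way to see the difference: group the residuals by predictor value and compare their spread. Both hypothetical data sets below have the same least-squares line, Y-hat = 2X (a = 0, b = 2), so only the spread around the line differs. The function name is hypothetical, and this is an eyeball check rather than a formal test:

```python
from collections import defaultdict
from statistics import pstdev


def residual_spread_by_x(x, y, a, b):
    """Population SD of the residuals at each distinct predictor value."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi - (a + b * xi))
    return {xi: pstdev(res) for xi, res in sorted(groups.items())}


x = [1, 1, 2, 2, 3, 3]
homoscedastic = [1, 3, 3, 5, 5, 7]               # always +/-1 around the line
heteroscedastic = [1.5, 2.5, 2.5, 5.5, 3, 9]     # spread grows with X
print(residual_spread_by_x(x, homoscedastic, a=0, b=2))    # same spread at every X
print(residual_spread_by_x(x, heteroscedastic, a=0, b=2))  # spread widens as X increases
```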
What are the assumptions of regression?
(1) variables are normally distributed (more precisely, the residuals around the regression line)
(2) best fitting function is linear
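(3) homoscedasticity: the variability of the residuals is consistent across values of the predictor
(4) variables are measured on an interval or ratio scale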
What are the standard error of estimate and R squared, and what is the difference between them?
Both are measures of how well a model predicts the observed data
Standard error: absolute measure of the typical distance that the data points fall from the regression line
-> used to calculate confidence intervals
R squared: relative measure of the percentage of the variance in the dependent variable that the model explains
e.g. this model explains 70% of the variance in variable Y
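A sketch computing both measures for a simple regression on hypothetical toy data. It assumes the common sample formula SEE = sqrt(SS_residual / (n - 2)), since two parameters are estimated; some texts divide by n instead. R squared is computed as 1 - SS_residual / SS_total:

```python
from math import sqrt


def fit_quality(x, y):
    """Return (standard error of estimate, R squared) for a simple least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - my) ** 2 for yi in y)
    see = sqrt(ss_residual / (n - 2))      # absolute: in the units of Y
    r_squared = 1 - ss_residual / ss_total  # relative: share of Y variance explained
    return see, r_squared


see, r2 = fit_quality([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(see, r2)
```

For these data the standard error of estimate is sqrt(0.8), about 0.89 units of Y, and R squared is 0.6, i.e. this toy model explains about 60% of the variance in Y.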
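What does 'regression to the mean' mean?
Regression tends to predict values closer to the mean and away from the extremes: in z-score form the predicted outcome is r times the predictor's z-score, so unless the correlation is perfect (|r| = 1), an extreme predictor score yields a less extreme prediction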
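A sketch checking that relationship on hypothetical toy data: it converts the raw least-squares prediction into a z-score and compares it with r times the predictor's z-score. The function name is hypothetical; population SDs are used, but any consistent SD convention gives the same result because it cancels out:

```python
from math import sqrt


def regression_toward_mean(x, y, x_new):
    """Return (r, z-score of x_new, z-score of the predicted Y at x_new)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    y_hat = my + (sxy / sxx) * (x_new - mx)   # least-squares prediction
    z_x = (x_new - mx) / sqrt(sxx / n)
    z_y_hat = (y_hat - my) / sqrt(syy / n)
    return r, z_x, z_y_hat


r, z_x, z_y_hat = regression_toward_mean([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x_new=5)
print(r, z_x, z_y_hat)  # z_y_hat equals r * z_x and is closer to 0 than z_x
```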
What happens if you restrict the range of the variables in a regression?
the correlation coefficient will usually underestimate the true degree of relationship
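A deterministic sketch of range restriction on hypothetical data: Y follows X with a small alternating error. It compares Pearson's r over the full range X = 1 to 10 with r when only cases with X between 4 and 7 are kept. This is one constructed example, not a general proof:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


x = list(range(1, 11))
y = [xi + (1 if xi % 2 else -1) for xi in x]   # Y = X plus an alternating +1/-1 error
restricted = [(xi, yi) for xi, yi in zip(x, y) if 4 <= xi <= 7]
rx, ry = zip(*restricted)
print(pearson_r(x, y), pearson_r(rx, ry))
```

Here r is about 0.94 over the full range but about 0.87 in the restricted range. The error is the same size in both cases, but it makes up a larger share of the remaining variation once the spread of X is cut down.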