Chapter 8/9 Flashcards
(13 cards)
prediction
- when two variables are correlated, we can make a prediction about one variable based on the other. The higher the correlation, the more accurate the prediction
- Prediction will not be exact (unless the correlation is perfect) -> it's an estimate of the mean of Y for people with that value of X
if correlation is 0, what should you predict?
the mean
line of best fit/regression line
- used to predict the score on one variable from the score on another -> can be found using a regression equation
- More accurate at predicting than simply averaging the Y scores of everyone who had the same score on X
- More resistant to sampling variation
- Can only be used for linear relationships
- Should minimize the sum of squares of discrepancies between the actual value of Y and the predicted values
- Assumes the variables are linearly related and each normally distributed -> a bivariate normal distribution
- always passes through the centroid (the point of intersection of the X mean and the Y mean), but its slope depends on the value of r
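The slope/intercept relationship above can be checked numerically. A minimal sketch with made-up data (the X and Y values are illustrative, not from the cards), showing b = r(sY/sX), a = meanY − b·meanX, and that the line passes through the centroid:

```python
import numpy as np

# Hypothetical example data (illustrative only)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 3.7, 4.2, 5.3])

# Least-squares slope and intercept: b = r * (sY / sX), a = meanY - b * meanX
r = np.corrcoef(X, Y)[0, 1]
b = r * Y.std(ddof=0) / X.std(ddof=0)
a = Y.mean() - b * X.mean()

# The line always passes through the centroid (mean of X, mean of Y)
centroid_pred = a + b * X.mean()
print(centroid_pred, Y.mean())  # these match
```

Note that when r = 0 the slope b is 0, so the line is flat at the mean of Y, which is why the best prediction under zero correlation is the mean.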
resistant line
- line of best fit calculated using the median instead of the mean
- Deals with outliers, but cannot be manipulated/analyzed any further
error of estimate (residuals)
- difference between predicted value of Y and the actual value
- The mean of the residuals about the regression line will be zero
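The zero-mean property of residuals can be seen directly. A small sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data; residuals about the least-squares line average to zero
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.8, 3.1, 2.9, 4.5, 5.0])

b, a = np.polyfit(X, Y, 1)      # slope, intercept of the regression line
residuals = Y - (a + b * X)     # error of estimate for each point
print(residuals.mean())         # ~0 (up to floating-point error)
```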
standard error of estimate
- standard deviation of residuals
- Residuals normally distributed around regression line -> most will be near it
- Can be used to set limits around a predicted score within which a person’s score is likely to fall
- When correlation is linear and residuals are normally distributed:
- Approximately 68% of the observed Y-scores will deviate less than one sYX from their predicted score
- Approximately 95% of the observed Y-scores will deviate less than two sYX from their predicted score
- The standard deviation of the residuals will be less than sY except when r = 0
- When correlation is perfect, the standard error of estimate is zero
- When there is no correlation, the standard error of estimate is equal to the standard deviation (Sy)
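The relationships on this card follow from the identity sYX = sY·sqrt(1 − r²): sYX = 0 when r = ±1, sYX = sY when r = 0, and sYX < sY otherwise. A sketch with hypothetical data, using the descriptive (divide-by-N) definitions (some texts divide by N − 2 instead):

```python
import numpy as np

# Hypothetical data illustrating sYX = sY * sqrt(1 - r^2)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.0, 2.5, 3.9, 4.1, 5.6, 5.9])

r = np.corrcoef(X, Y)[0, 1]
b, a = np.polyfit(X, Y, 1)
residuals = Y - (a + b * X)

# Standard error of estimate = SD of the residuals (N divisor)
s_yx = np.sqrt((residuals ** 2).mean())
s_y = Y.std(ddof=0)
print(s_yx, s_y * np.sqrt(1 - r ** 2))  # these match; s_yx < s_y since r != 0
```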
homoscedasticity
assumption that the variability of Y values around the predicted Y values is the same for all values of X (the standard deviation of the residuals is about the same for all values of X)
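A rough way to eyeball this assumption: compare the residual spread in the lower and upper halves of X. A sketch with simulated data built to have a constant error spread (all values here are hypothetical):

```python
import numpy as np

# Simulated data with constant error spread around a linear trend
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200)
Y = 2.0 + 0.5 * X + rng.normal(0, 1.0, size=X.size)

b, a = np.polyfit(X, Y, 1)
res = Y - (a + b * X)

# Similar spread in both halves of X -> consistent with homoscedasticity
low, high = res[X < 5].std(), res[X >= 5].std()
print(low, high)
```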
does prediction prove causation?
NO
one interpretation of r
amount of increase in Y that on average accompanies a unit increase in X (when both measures are expressed in standard score form)
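This interpretation can be verified: regress the standardized Y scores on the standardized X scores and the slope comes out equal to r. A sketch with hypothetical data:

```python
import numpy as np

# In standard-score form, the regression slope equals r
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.5, 3.2, 2.8, 4.9, 4.6])

zX = (X - X.mean()) / X.std(ddof=0)
zY = (Y - Y.mean()) / Y.std(ddof=0)
r = np.corrcoef(X, Y)[0, 1]

slope_z = np.polyfit(zX, zY, 1)[0]
print(slope_z, r)  # these match
```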
regression towards the mean
- when the correlation between two variables isn't perfect, the more extreme the score on one variable, the more likely it is to be paired with a less extreme score on the other variable
- The more extreme the value, the greater the regression
- The higher the value of r, the less the regression
- More regression goes with more variation around the predicted scores
- Predicted scores regress toward the mean, but the distributions of X and Y themselves keep the same variability and central tendency
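In standard-score form the predicted score is zY = r·zX, which makes the regression effect visible: with |r| < 1, an extreme zX predicts a less extreme zY. A minimal sketch with a hypothetical r:

```python
# Predicted standard score: zY_hat = r * zX
r = 0.6          # hypothetical correlation
zX = 2.0         # a fairly extreme score on X (2 SDs above the mean)
zY_hat = r * zX  # predicted score on Y
print(zY_hat)    # 1.2 -> less extreme than 2.0, i.e. regression toward the mean
```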
coefficient of determination
- (r^2)
- gives the proportion of Y variance associated with differences in X
- Ex. If r = 0.5, r^2 = .25 -> 25% of variance in Y is associated with differences in X
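The "proportion of variance" reading of r² can be demonstrated: the variance of the predicted Y scores divided by the total variance of Y equals r². A sketch with hypothetical data:

```python
import numpy as np

# r^2 as the proportion of Y variance associated with differences in X
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.2, 2.8, 4.1, 3.9, 5.5])

r = np.corrcoef(X, Y)[0, 1]
b, a = np.polyfit(X, Y, 1)
pred = a + b * X

# Variance of predictions over total variance of Y
explained = pred.var(ddof=0) / Y.var(ddof=0)
print(explained, r ** 2)  # these match
```

The leftover proportion, 1 − r², is the coefficient of non-determination on the next card.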
coefficient of non-determination
proportion of variance on Y that is not predictable from X
regression and pre-test posttest gains
- When subjects are selected because they deviate markedly from the mean on some measure (e.g., diagnosed with a learning disability), regression toward the mean will occur on subsequent dependent measures unless there's a perfect correlation
- If the group's mean stays the same on the 2nd test (does not regress), the program is actually effective -> it overcame regression toward the mean