AP Stat Ch 3 and 12.2 Flashcards
Explanatory variable
Attempts to explain or influence changes in a response variable.
Independent variable. X axis
Response variable
Measures an outcome of a study.
Dependent variable
Which is explanatory variable:
- Scuba diving: depth and visibility
- World population vs. year
- Amount of rain vs. crop growth
- Height vs. GPA
- Depth
- Year
- Amount of rain
- Neither (no association expected)
Scatter plot
The most effective way to display the relation between two quantitative variables measured on the same individuals.
Tips for drawing scatterplot by hand
- Plot explanatory variable on x axis
- Label both axes
- Scale the axes with uniform intervals
- Make plot large enough to see details
Four major features in interpreting scatter plots
Direction
Form
Scatter
Outliers
Direction
A pattern from the upper left to the lower right is said to have a negative direction. A pattern from lower left to upper right has a positive direction.
Form
Approx linear, curved, exponential…
Scatter
Strength of relationship.
Strong to weak on a scale
Positive vs negative association
Positive when above-average values of one variable tend to accompany above-average values of the other. Slope is positive.
Negative when above-average values of one variable tend to accompany below-average values of the other. Slope is negative.
Correlation
The correlation, r, is a common measure used to numerically assess the association between two quantitative variables. Measures the direction and strength of a linear relationship. On a scale of -1 to 1.
Indicates direction by its sign and strength by how far r moves away from 0.
Obtained from stat menu, Calc, 8.
Don’t need to calculate by hand, but it is the sum of the products of the standardized scores (z-scores) of x and y, divided by n-1.
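To see what the calculator is doing, here is a minimal Python sketch of the z-score formula for r: standardize each x and each y, multiply the pairs, add the products, and divide by n-1. The function name and the depth/visibility numbers are made up for illustration; they are not from the course.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z-score of x) * (z-score of y), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data: deeper dives, lower visibility.
depth = [1, 2, 3, 4, 5]
visibility = [10, 8, 7, 4, 1]

r = correlation(depth, visibility)
print(round(r, 3))  # negative and close to -1: strong negative linear relationship
```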
What happens as r gets closer to 0
The linear relationship gets weaker.
It gets stronger as r moves further from zero (toward -1 or 1).
Properties of r
- No units
- Doesn’t depend on which variable is x and which is y, since the product of the z-scores of x times y is the same as y times x.
- Correlation requires both variables to be quantitative
- -1 <= r <= 1
When r is greater than zero, relationship is positive.
When r is less than zero, relationship is negative.
- r only equals 1 or -1 when the data are perfectly linear.
- Value of r is a measure of the strength of a linear relationship only. Measures how closely the data fall along a straight line. An r value near zero doesn’t indicate no relationship, but rather, no linear relationship.
- Not resistant.
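A few of the properties above can be checked numerically. The sketch below uses invented data and a hypothetical helper `corr`: it swaps x and y, changes the units of x, and then adds one outlier to show that r is not resistant.

```python
from statistics import mean, stdev

def corr(xs, ys):
    """Sample correlation r (same value as the z-score formula)."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Invented data with a clear positive trend.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

r_xy = corr(x, y)
r_yx = corr(y, x)                            # swap the roles of x and y
r_rescaled = corr([v * 2.54 for v in x], y)  # change units of x (e.g. in to cm)
r_outlier = corr(x + [20], y + [0])          # add a single outlier at (20, 0)

print(r_xy, r_yx, r_rescaled)  # all three agree (up to rounding)
print(r_outlier)               # one point flips the sign: r is not resistant
```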
Don’t confuse correlation with causation
Just because number of students taking stat has increased and murders are down, doesn’t mean that one causes the other
Regression line
A line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Stat Calc 8
LSRL: least-squares regression line
Y hat = a + bx
LSRL form
Y hat = a + bx
A is the y intercept
B is the slope
Y hat is used as a prediction of the model
When interpreting the slope, always mention the model: according to the model, as x increases by one unit, y is predicted to change by b units (increase if b is positive, decrease if b is negative).
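The calculator finds a and b for you. As a rough sketch of what happens behind the scenes, the standard least-squares formulas (not stated on these cards) are b = sum of (x - x bar)(y - y bar) divided by sum of (x - x bar)^2, and a = y bar - b * x bar. The rain and crop-growth numbers below are invented, and `lsrl` is a hypothetical helper name.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y intercept: the line passes through (x bar, y bar)
    return a, b

# Invented data: amount of rain (explanatory) vs. crop growth (response).
rain = [10, 20, 30, 40, 50]
growth = [12, 19, 33, 38, 53]

a, b = lsrl(rain, growth)
y_hat_at_35 = a + b * 35  # x = 35 is inside the data range, so no extrapolation

print(f"y hat = {a:.2f} + {b:.2f}x")
print(f"predicted growth at 35: {y_hat_at_35:.2f}")
```

Slope interpretation for this made-up data: according to the model, for each additional unit of rain, crop growth is predicted to increase by about 1.01 units.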
Extrapolation
Predicting from the regression line outside the domain of the data. DO NOT EXTRAPOLATE EVER
Residuals
The difference between an observed value of the response variable and the value predicted by the regression line. The vertical distance from the point to the line.
Y minus y hat
What does it mean when residual is pos/neg?
When pos, y is greater than y hat. So value above prediction, above LSRL
When neg, y is less than y hat. So value below prediction, below LSRL.
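A short sketch of residual = y minus y hat, assuming a line y hat = 0.7 + 1.01x has already been fitted to the invented points below (both the line and the points are illustrative assumptions):

```python
# Assumed LSRL for this made-up data: y hat = 0.7 + 1.01x
a, b = 0.7, 1.01
points = [(10, 12), (20, 19), (30, 33), (40, 38), (50, 53)]

def residual(x, y):
    """Observed y minus predicted y hat."""
    return y - (a + b * x)

residuals = [residual(x, y) for x, y in points]

# Positive residual: point is above the line. Negative: below the line.
labels = ["above" if res > 0 else "below" if res < 0 else "on the line"
          for res in residuals]

for (x, y), res, label in zip(points, residuals, labels):
    print(f"x={x}, y={y}, residual={res:+.1f} ({label} LSRL)")
```

For a least-squares line the residuals add up to zero (up to rounding), which is a quick check that the line was fitted correctly.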
Important questions to consider with LSRL
- Is linear model really appropriate, or would curved model be better
- Are there any unusual aspects of the data set?
- If we make predictions, how accurate?
Residual plot
Scatterplot of the regression residuals against the explanatory variable. If there is a pattern, then a linear model is not the best model.
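To see what a "pattern" looks like, the sketch below fits a straight line to deliberately curved, invented data (y = x squared) and lists the residuals in order of x. It prints the values instead of drawing the plot; the least-squares formulas used are the standard ones, not something stated on these cards.

```python
from statistics import mean

# Invented curved data: y = x squared.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

# Fit the least-squares line y hat = a + b*x.
x_bar, y_bar = mean(xs), mean(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# Residual = observed y minus predicted y hat, listed in order of x.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    print(f"x={x}: residual={res:+.1f}")
```

The residuals come out positive at both ends and negative in the middle. That U shape is a pattern, so a curved model would fit better than a line here. With no pattern, the residuals would be scattered randomly above and below zero.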
If an observation has a positive residual, then…
Y minus y hat is positive. So y is above the expected value. So y is above the line. The prediction is too low.
If an observation has a negative residual, then…
Y minus y hat is negative, so y hat is larger. So y is below the line. The prediction is too high.
Only way to tell if a linear model is the best choice…
RESIDUAL PLOT PATTERN!