Chapter 6- Correlation and Regression Flashcards

Flashcards in Chapter 6- Correlation and Regression Deck (28):

What is a univariate distribution?

It is a frequency distribution where you're only working with one variable
Ex. height, time, test scores, errors


What is a bivariate distribution?

A frequency distribution where the scores on the two variables are paired
Bivariate distributions are required in order to use correlation and regression techniques
They also may show positive/negative/zero correlation


What is considered to be a positive correlation and give an example?

It's when high measurements on one variable tend to go with high measurements on the other variable,
and low measurements on one variable tend to go with low measurements on the other variable.
Example- tall fathers tend to have sons who grow up to be tall men, and
short fathers tend to have sons who grow up to be short men


What is considered to be perfect correlation?

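When the coefficient of correlation, r, is equal to 1.00
When there is perfect correlation, all points fall exactly on the regression line
The requirement for perfect correlation is that each pair of scores has the same z score on both variables. Equal differences between the paired scores (Y = X + a constant) is one way to get this, but it isn't required: Y = 2X is also perfectly correlated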


What is a regression line also referred to as?

The line of best fit


What is a scatterplot?

It is a graph of the scores of a bivariate frequency distribution


What do you see graphically in a scatterplot with negative correlation?

Increases in one variable are accompanied by decreases in the other variable (they have an inverse relationship).
The line of regression goes from the upper left corner of the graph to the lower right corner; it has a negative slope


Is there such a thing as a perfect negative correlation?

Yes, when r = -1.00


What is r and what does it tell us?

r is the correlation coefficient
It gives us the direction and degree of the linear relationship between two variables. Neither variable is treated as more important than the other (the r for X with Y is the same as the r for Y with X).
We just want to find how strongly the two variables are related and in which direction.
r ranges from -1.00 to +1.00. You'd rather have an r closer to 1.00 or -1.00, because a strong r gives clearer evidence of a relationship than a small r near 0


Examples of negative correlation

Daily rain and daily sunshine, grouchiness and friendships, highway driving speed and gas mileage


What does having zero correlation mean?

That the high and low scores on the two variables are not associated in any predictable manner
There is no linear relationship between the two variables
When r = 0.00, the regression line is just a horizontal straight line (at the mean of Y)


What are the two computational formulas used to find r?

The blanched formula and the raw-score formula
We're going to be using the blanched formula, because it uses means and standard deviations to find r (but honestly,
I really like just using the raw-score formula)
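A minimal sketch of both formulas run on the same made-up pairs, to show they land on the same r. It assumes the blanched formula is r = (ΣXY/N - X̄Ȳ) / (S_X S_Y) with S_X and S_Y computed using N in the denominator, and the raw-score formula is the usual all-sums version; in both, N is the number of pairs. The function names are hypothetical.

```python
import math
import statistics

def r_blanched(x, y):
    """Blanched formula: r = (sum(XY)/N - mean(X)*mean(Y)) / (S_X * S_Y).

    Assumes S_X and S_Y are standard deviations computed with N (not N - 1).
    """
    n = len(x)  # N = number of PAIRS
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return (sum_xy / n - mean_x * mean_y) / (statistics.pstdev(x) * statistics.pstdev(y))

def r_raw_score(x, y):
    """Raw-score formula: r = (N*sum(XY) - sum(X)*sum(Y)) /
    sqrt((N*sum(X^2) - sum(X)^2) * (N*sum(Y^2) - sum(Y)^2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired scores (5 pairs, so N = 5).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_blanched(x, y), 4), round(r_raw_score(x, y), 4))  # 0.7746 0.7746
```

Swapping the arguments (`r_blanched(y, x)`) gives the same value, since r doesn't care which variable is called X.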


What is N when concerned with the computational formulas?

N isn't the total number of scores or data points. It is the number of PAIRS of scores.


Is r used for a sample or a population?

r is a sample statistic. The population parameter is symbolized by the Greek letter rho, ρ (the weird p; in physics it's
used for density)


What is considered small, medium, and large for the correlation coefficient?

Small- 0.10
Medium- 0.30
Large- 0.50
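A small sketch of applying these benchmarks. Strength is judged on the absolute value of r, since the sign only gives direction. The card lists just the three cutoffs, so treating a value between two benchmarks as the lower label, and anything under 0.10 as "smaller than small", is an assumption; `describe_strength` is a hypothetical helper.

```python
def describe_strength(r):
    """Label |r| using the small/medium/large benchmarks (0.10, 0.30, 0.50).

    The sign only gives direction, so strength is judged on the absolute value.
    The label for values under 0.10 is illustrative, not standard terminology.
    """
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "smaller than small"

for r in (0.05, -0.12, 0.35, -0.62):
    print(r, describe_strength(r))
```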


What is the coefficient of determination?

It is r squared (r²).
It is an estimate of common variance: the proportion of variance that the two variables share.


How can you help determine r in a scatterplot?

Draw a circle or envelope around the points in the scatterplot. It makes the strength of the relationship easier to see.


How does an envelope around a scatterplot indicate low/high correlation?

The thinner the envelope, the larger the correlation, because the points are closer to the regression line.
The fatter the envelope, the smaller the correlation.


What does the coefficient of determination tell you and how does it differ from r?

r tells you the direction and degree of the relationship, but it is not itself a proportion of variance.
Square it to get r², which is the proportion of common variance. Let's say you get r² = 0.42.
Subtract this from 1, and you get 0.58. This is the independent variance: it means that 58% of the variance
is due to other factors.
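The arithmetic from this card as a tiny sketch. The r of 0.65 is a made-up value picked so that r² comes out near 0.42; `variance_split` is a hypothetical helper.

```python
def variance_split(r):
    """Return (common variance, independent variance) as proportions: r**2 and 1 - r**2."""
    common = r ** 2
    return common, 1 - common

# Hypothetical r of 0.65, chosen so that r squared is about 0.42.
common, independent = variance_split(0.65)
print(f"common: {common:.2f}, independent: {independent:.2f}")  # common: 0.42, independent: 0.58
```

Because r is squared, r = -0.65 splits the variance exactly the same way.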


What is independent variance?

It's 1.0 minus r squared (1 - r²). It is the variance that is due to other factors: variance in one test that is not
associated with variance in the other test


With a Venn diagram, how can we interpret where the common and independent variances lie?

Common variance is in the middle, where the circles overlap: factors that influence both variables.
Independent variance is on the outside, on each variable's own side: the factors that ONLY influence the variable
that they are associated with


How can the correlation coefficient be used to test reliability?

We are able to test the reliability of the experiment's measuring devices specifically:
tests, questionnaires, and other testing instruments.
If the devices are reliable, they will produce consistent scores that are not subject to chance fluctuations.
What I mean by this is that if you give the same test twice to the same people and correlate the first scores with the second scores,
a reliable test gives a high r (people keep roughly the same scores from one testing to the next).


What is considered a reliable r value for social science measurements?

An r of 0.8 or higher indicates reliability


What is the golden rule concerning correlation/causation?

Correlation does NOT equal causation.
There are so many other factors that could be responsible for the relationship.
Example- researchers concluded smoking killed people. And although this turned out to be true, the correlation alone couldn't show it, because
the people who smoke also tended to be heavily stressed, and that stress (which caused them to smoke in the first place) could have
been a factor too. It is important to have well-formed control groups to rule out these confounding variables.


What does r look like with a curved regression line?

If a curved regression line fits the data better than a straight line, r will be low, not reflecting the true
relationship between the two variables.
That's why it may be important to look at a scatterplot to determine if the Pearson r value is appropriate for
the data set at hand.


What is a truncated range?

When the range of scores in the sample is smaller than the range in the population. A truncated range usually makes r smaller than it would be over the full range.


What is a linear regression line and what can you use it to do?

Linear regression is a technique that uses the data to produce an equation for a straight line.
You can use it to make predictions.


What is the regression equation?

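It is the slope-intercept equation of a straight line, but written with a and b: Ŷ = a + bX, where b is the slope and a is the Y-intercept (the predicted Y when X = 0).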