UNIT 2 - REGRESSION Flashcards

1
Q

How do you describe a scatterplot?

A

DIRECTION

FORM

STRENGTH

and STRANGE

2
Q

describe a scatterplot’s strength?

A

give the r value (if straight),

or say…

“tightly packed… loosely packed”

3
Q

how do you describe direction?

A

positive or negative

4
Q

how do you describe form of a scatterplot?

A

straight or curved?

5
Q

What is wrong with “for each additional hour studied, a person’s test score will go up by five points?”

A

This implies CAUSATION, but this is just a correlation. You should say “on average, students with an additional hour of study time tend to score five points higher.” These are all different students, and we don’t know if the studying CAUSED the higher scores. We can only show causation with an EXPERIMENT. This is an observational study. If students were randomly assigned hours of study time, then you could discuss causation because that would be an experiment.

6
Q

Difference between association and correlation?

A

association is talking about a relationship.

If you see a pattern in the scatterplot, there is an association.

Correlation is an actual calculated number, r, between two quantitative variables.

7
Q

Why is it called the “least squares regression line?”

the LSRL?

A

Because, after you find the mean-mean point, you fix the line so that it minimizes the sum of the squared vertical distances from each point to that line.

It minimizes the squared residuals, the least squares….
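To see the "least squares" idea in action, here is a minimal sketch. The data are made up for illustration, and the helper names `lsrl` and `sum_sq_resid` are hypothetical. It fits the line with the usual formulas and then checks that nudging the slope or intercept only makes the sum of squared residuals bigger.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # forces the line through (x-bar, y-bar)
    return intercept, slope

def sum_sq_resid(xs, ys, intercept, slope):
    """Add up the squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 71, 80, 103, 114]

a, b = lsrl(hours, scores)
best = sum_sq_resid(hours, scores, a, b)

# Nudge the intercept or slope a little in each direction and recompute
worse = [sum_sq_resid(hours, scores, a + da, b + db)
         for da, db in [(1, 0), (-1, 0), (0, 0.5), (0, -0.5)]]

print(a, b, best, worse)
```

Every value in `worse` should come out larger than `best`, which is what "least squares" means.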

8
Q

How do you find outliers in regression?

A

they don’t follow the “flow”

(pinky trick: cover it with your pinky… then uncover… does it follow the flow?)

9
Q

What is homoscedasticity?

A

equal scatter along the regression line

10
Q

What values can r be?

A

from -1 to +1

(r near 0 is WEAK)

11
Q

What is the line that you plot?

A

IT IS A MODEL!

It is the LSRL and it is the model we are talking about

12
Q

what is a linear model?

A

It is an equation you can use or a line of a graph,

but it is just a model that says what kind of happens,

and can be used to ESTIMATE WHAT MIGHT HAPPEN

13
Q

What does r² tell us?

(r-squared)

A

It tells us the percent of variability in y that is explained by the model with x.

14
Q

If study time vs test score equation is

predicted score = 40 + 15 (study time).

How would you interpret the slope?

A

The model finds that on average, for each additional hour of study time a student has, they tend to score about 15 points more.

15
Q

If study time vs test score equation is

predicted score = 40 + 15 (study time).

How would you interpret y intercept?

A

The model predicts that a person who studies 0 hours would score around 40 points.

16
Q

If a linear association between study time and test score has an

r² = 0.85,

How do you interpret this?

(r² is a.k.a. the coefficient of determination)

A

85% of the variability in test score can be explained by study time with the model.
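The "percent of variability explained" reading can be checked by hand. This is a rough sketch on made-up data (not the data behind the 0.85 above): it compares the variation left in the residuals with the total variation in y, and confirms that the explained fraction matches r squared.

```python
from math import sqrt
from statistics import mean

# Made-up data: hours studied (x) and test scores (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 71, 80, 103, 114]

x_bar, y_bar = mean(hours), mean(scores)
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in scores)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / sqrt(sxx * syy)

# Variation left over in the residuals after using the model
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hours, scores))
explained = 1 - sse / syy  # fraction of the variation in y the model accounts for

print(round(r ** 2, 4), round(explained, 4))  # the two numbers agree
```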

17
Q

What if a scatterplot goes straight across horizontally?

A

NO ASSOCIATION.

That would be like height and IQ, they are independent so each height has about the same IQ.

18
Q

What is the “coefficient of determination?”

A

A fancy name for r²

19
Q

Does r² tell direction?

A

NO

r² is never negative, so you can’t use it to see if the relationship is negative.

20
Q

Can there be a correlation between grade and music preference?

A

No, music preference is categorical.

There may be an association, however.

21
Q

Does the regression line (LSRL) go through a lot of points?

A

No, usually it goes through NONE!

It just goes through the center of the cloud of points.

22
Q

If r= -0.9 is there a strong, negative linear relationship?

A

Maybe not.

CHECK THE SCATTER. One outlier or typo can make the r value look STRONG.
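Here is a small sketch of how one stray point can fake a strong r. The numbers are invented, and `correlation` is a hand-rolled helper (an assumption for this example, not a calculator command). The first five points have almost no linear pattern; adding one far-off point pushes r close to -1.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r for two quantitative lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up cloud with no real pattern
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]

r_cloud = correlation(xs, ys)

# Same cloud plus one outlier (or typo) far to the right and far below
r_with_outlier = correlation(xs + [30], ys + [-20])

print(round(r_cloud, 2), round(r_with_outlier, 2))
```

The second r looks "strong and negative" even though five of the six points show no such thing, which is why the scatterplot has to be checked.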

23
Q

what is the LSRL

A

the “least squares regression line”

that line you plot

OR

That equation

24
Q

What does r tell us?

(r is a.k.a the correlation coefficient)

A

The direction (+/-) and how strong a LINEAR relationship is between two QUANTITATIVE variables… (when linear)

25
Q

What is the “correlation coefficient?”

A

The r value

26
Q

which is response?

A

y variable,

the Vertical axis..

It “responds” to the x

27
Q

Lurking variable: Why are there more ice cream sales on days that there are more surfing accidents? Is the ice cream putting surfers at risk? are people buying ice cream because they got hurt?

A

The WEATHER is the lurking variable.

When it is a nice day, there are more surfers and more ice cream is sold.

So, the WEATHER causes both to go up and down together.

28
Q

Give example of incorrectly using the word “correlation”

A

“there is a correlation between gender and video game playing”

This person should say “association.”

You can’t say correlation because gender is categorical.

29
Q

What’s wrong: Age and height have a correlation of 2.7

A

WRONG.

Correlation must be between -1 and 1.

30
Q

If r² = 0.99, is there a strong positive association?

A

It could be negative. r² does not tell us the direction because a squared number is never negative.

31
Q

What is tricky about a lot of scatterplots you see on the AP test?

A

They are often the residuals plots, and not the actual data. Watch out!

Read the diagrams carefully.

32
Q

What should we look for in resid plot?

A

Curve or pattern- if you see this you need a better model.

Also, it should have equalish scatter from left to right

It should look RANDOM

33
Q

What if the scatterplot is curved?

A

Either straighten the scatter and fit a line,

or keep it and fit a curve

Try quadreg, cubicreg, lnreg, logreg and check the graph and the r.

34
Q

What is extrapolation?

A

Making predictions outside of the x values you have.

35
Q

does correlation mean causation?

A

NO WAY DUDE

Just because variables go up and down together doesn’t mean one causes the other.

36
Q

What’s up with extrapolation? Is it OK?

A

Not ideal. Sometimes it’s all you can do, but state CAUTION.

37
Q

If something is associated is it correlated?

A

Not necessarily.

It can be associated and have a zero correlation

( parabolic scatterplot)

or categorical variables.
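The parabola case is easy to verify. In this sketch (made-up points, hand-rolled `correlation` helper as an assumption), y is completely determined by x, so there is clearly an association, yet r comes out to zero because the pattern is not linear.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r for two quantitative lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# A perfectly parabolic scatterplot: y = x squared, symmetric around x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)  # zero linear correlation despite a perfect curved relationship
```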

38
Q

Will residual plots always show outliers?

(will outliers always have large residuals?)

A

Usually, but not always. Some points have so much leverage, they pull the line up to it…

39
Q

How can you check if the scatterplot is “straight enough?” for a linear model?

A

Residuals plot fool!

check the resids

40
Q

Give example of correlation without causation and explain the lurking variable.

A

Ski accidents are higher on days with more hot chocolate sales; therefore, hot chocolate must cause ski accidents. (Lurking variable: the number of people on the mountain.) What is happening is that on days when the mountain is crowded, there are more hot chocolate sales and more ski accidents. So the population on the mountain is causing both to rise and fall together.

41
Q

How do you make a residuals plot? (find RESID?)

A

stat>plot make a scatterplot, but instead of L1 vs L2, change L2 by putting cursor on it and going to 2nd>lists down to RESID.

You can plot L1 vs RESID

or you can plot L2 vs RESID

42
Q

What are some strong r values and some weak r values

A

Strong r values are close to 1 or -1, like -0.83 or 0.94. Weak r values are close to zero like 0.10 or -0.06

43
Q

What point is on every regression line?

A

the mean-mean point. (x bar, y bar).

This point is generally not one of the points on the scatterplot.

Usually none of the scatterplot points are on the regression line.
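A quick check of the mean-mean fact, sketched on made-up data: plugging x-bar into the fitted line gives back y-bar, even though (x-bar, y-bar) is not one of the plotted points here.

```python
from statistics import mean

# Made-up data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 71, 80, 103, 114]

x_bar, y_bar = mean(hours), mean(scores)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# The line's prediction at x-bar
prediction_at_x_bar = intercept + slope * x_bar

# Is the mean-mean point one of the actual data points?
is_data_point = (x_bar, y_bar) in list(zip(hours, scores))

print(prediction_at_x_bar, y_bar, is_data_point)
```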

44
Q

Which is explanatory variable?

A

the x

the horizontal axis.

it “explains” what happens to y

45
Q

What do we want to see in a residuals plot in order to continue with the current model?

A

random scatter. No pattern.

if there is a pattern, then find a new model or proceed with caution.

46
Q

What is a residual?

A

Vertical distance to the LSRL (to the model)

ACTUAL-PREDICTED,

A-P, like this class AP (get it?)

y - yhat

Take y data found and from that, subtract the y you get from plugging the x into the model (equation).
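As a worked example of ACTUAL minus PREDICTED, this sketch reuses the study-time model from the earlier cards (predicted score = 40 + 15 × study time). The two students and their scores are hypothetical.

```python
def predicted_score(hours):
    """Model from the deck: predicted score = 40 + 15 (study time)."""
    return 40 + 15 * hours

def residual(actual, hours):
    """Residual = actual - predicted (A - P)."""
    return actual - predicted_score(hours)

# Hypothetical students who both studied 3 hours (model predicts 85)
above = residual(90, 3)  # scored above the line: positive residual
below = residual(80, 3)  # scored below the line: negative residual

print(above, below)  # 5 -5
```

A positive residual means the model underpredicted; a negative one means it overpredicted.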

47
Q

If something is correlated is it associated?

A

Yes.

If it is correlated then it must be associated.

However, if it is associated, it may not be correlated.

48
Q

is r sensitive to outliers?

A

yes. A single outlier can make it seem like there is a relationship ( if way out in x direction), or even seem like there is no relationship.

49
Q

what is leverage?

A

Far right or left from the middle.

leverage just means it is far away from x-bar

Some leverage points are not influential if they go along with the flow of the scatter.

50
Q

Interpret residual:

Points below the line

(negative residual)

A

“the model overpredicted”

or

“Actual value was below the expected (or predicted)”

51
Q

Interpret residual:

Points above the line

(positive residual)

A

“the model underpredicted” or “actual performance was above the expected performance”

52
Q

If r= 0.8.

An x value that is 2 standard deviations above the mean in the x direction will have a predicted y value that is _______

A

1.6 standard deviations above the mean in the Y direction

53
Q

Does high r squared mean a good model?

A

CHECK SCATTER FIRST..

Make sure model “FITS” the data.

You should check your scatterplot and residuals plot to make sure model is appropriate and no outliers present… then it means something

So YES, but after you check the resids.

54
Q

How do you interpret slope?

A

For an increase of 1 [unit of x] there is an (increase/decrease) of [SLOPE] [units of y].

You can write “SLOPE UNITS Y/ ONE UNITS X” to help

55
Q

How do you interpret slope EQUATION?

rSy/Sx

A

for each increase of 1 st dev in x direction,

you go r st dev in y direction.

2st dev in x, you go 2r st. dev in y.

3st dev in x, you go 3r st. dev in y.
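The "go r standard deviations in y" rule falls straight out of the slope formula. A rough sketch, assuming made-up summary statistics (the means and standard deviations below are invented): build the line from r, Sx, and Sy, predict at a point 2 SDs above x-bar, and convert the prediction back to SDs of y.

```python
r = 0.8

# Hypothetical summary statistics
x_bar, s_x = 3.0, 1.5
y_bar, s_y = 84.0, 12.0

slope = r * s_y / s_x              # slope = r * Sy / Sx
intercept = y_bar - slope * x_bar  # line goes through (x-bar, y-bar)

x = x_bar + 2 * s_x                # an x value 2 SDs above the mean
y_hat = intercept + slope * x
z_y = (y_hat - y_bar) / s_y        # how many SDs of y above y-bar

print(round(z_y, 4))  # 2 * r
```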

56
Q

what does influential mean?

A

It impacts the SLOPE.

Influential points usually have leverage.

It means that the point, when added to or removed from the data, will influence the SLOPE.

Generally these are outliers in the x direction. Far left or right.

57
Q

if you switch x and y does r change?

A

NO. The strength stays the same.

58
Q

Can you predict an X by using a Y?

A

NOT WITH THE SAME EQUATION!

BE CAREFUL!! Don’t just solve for x…

You have to change the entire equation and start from scratch.

(run LinReg L2, L1)
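To see why solving the original equation for x gives the wrong prediction, here is a sketch on made-up data with a hypothetical `lsrl` helper. It fits y on x, then x on y, and compares the two ways of predicting x from y = 5. It also shows that the new slope is not the reciprocal of the old one.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (intercept, slope) for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

a_yx, b_yx = lsrl(xs, ys)  # predicts y from x
a_xy, b_xy = lsrl(ys, xs)  # predicts x from y (lists swapped)

y_obs = 5
x_from_solving = (y_obs - a_yx) / b_yx  # solving the y-equation for x
x_from_refit = a_xy + b_xy * y_obs      # using the refit equation

print(b_yx, b_xy, 1 / b_yx)
print(x_from_solving, x_from_refit)
```

The two predictions disagree unless the points fall exactly on a line (r = 1 or r = -1).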

59
Q

Interpret r squared

A

r squared % of variability in y can be explained by the model with x. The rest is in residuals…

60
Q

If there is a crazy outlier, what can you do?

A

Run the analysis with and without the outlier and write about both.

61
Q

how do you interpret y intercept?

A

The model predicts that if there were no [x stuff] this is how much [y stuff] you’d have

62
Q

First step in interpreting slope

A

Write “slope units y over 1 unit x” and look at it.

Then say “for each unit of x there is a change of “slope” units of y”

63
Q

How do you get equation from computer output?

Dependent variable: age

Variable    Coeff
Constant    7.2
Height      3.5

A

For this case:

predicted age= 7.2 + 3.5 (height)

Under “coeff” go down and left

64
Q

If you switch x and y will slope change?

A

YES (but not just reciprocal)

65
Q

Height and weight have an r value of 0.7. You would expect a person whose height is 2 st. dev. above the mean height to have a weight that is only ___ st. dev. above the mean weight.

A

only 1.4 S.D above the mean for weight.

(for each SD in the x direction you change r SD in the y direction)

66
Q

Computer output:

What does “constant” mean?

A

It is the y intercept

67
Q

Computer Output:

What is “S”

A

The average, or typical residual..

Standard deviation of the residuals

typical distance from actual value to the model’s prediction.

About how far off your prediction is likely to be.
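One way to compute S by hand, sketched on made-up data. This assumes the usual simple-regression formula, where the sum of squared residuals is divided by n - 2 before taking the square root.

```python
from math import sqrt
from statistics import mean

# Made-up data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 71, 80, 103, 114]

x_bar, y_bar = mean(hours), mean(scores)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = actual - predicted for each point
residuals = [y - (intercept + slope * x) for x, y in zip(hours, scores)]

# S: standard deviation of the residuals (n - 2 in the denominator)
s = sqrt(sum(e ** 2 for e in residuals) / (len(residuals) - 2))

print(round(s, 2))  # typical size of a prediction error, in points
```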

68
Q

How can you straighten data?

A

Do stuff to the y (square it, root it, log it, etc) and recheck the plot. Remember to put the transformation into your equation.

Example

Sqrt y = 4.33 - 2.03 x
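When the equation is written for a transformed y, the prediction has to be un-transformed at the end. A minimal sketch using the example equation above; the function names are hypothetical, and the x value of 1 is just an illustration.

```python
def predict_sqrt_y(x):
    """The model as fitted: sqrt(y) = 4.33 - 2.03 x."""
    return 4.33 - 2.03 * x

def predict_y(x):
    """Undo the square root by squaring the model's output."""
    return predict_sqrt_y(x) ** 2

print(round(predict_sqrt_y(1), 2))  # prediction for sqrt(y)
print(round(predict_y(1), 2))       # prediction for y itself
```

Forgetting the last step and reporting the square-root value as y is the common slip here.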

69
Q

If you add to, multiply, or divide the x’s or y’s (shift/scale), does r change?

A

No. The strength remains the same. (If you log or square the values, r will change, but just adding or multiplying won’t change the strength. Multiplying by a negative number flips the sign of r.)
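A quick sketch of the shift/scale fact on made-up data, with a hand-rolled `correlation` helper (an assumption for this example). It rescales x, flips the sign of y, and squares y, then compares each r with the original.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r for two quantitative lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = correlation(xs, ys)
r_scaled = correlation([60 * x + 10 for x in xs], ys)  # multiply and add: same r
r_flipped = correlation(xs, [-y for y in ys])          # negative multiplier: sign flips
r_squared_y = correlation(xs, [y ** 2 for y in ys])    # squaring: r changes

print(r, r_scaled, r_flipped, round(r_squared_y, 4))
```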

70
Q

What other regressions does your calculator do?

A

Quadreg, cubicreg, lnreg, etc.

just be careful when substituting while writing the equation given.

71
Q

How do you get equation from computer output?

Dependent variable: doc

Variable    Coeff
Constant    0.005
Genet       -0.233

A

predicted doc = 0.005 - 0.233 (genet)