W2 Correlations and Predictions Flashcards

1
Q

What does no correlation look like?

A

Points scattered in a shapeless cloud with no visible trend: knowing X tells you nothing about Y.
2
Q

What does positive correlation look like?

A

Points trending upward from left to right: as X increases, Y tends to increase.
3
Q

What does negative correlation look like?

A

Points trending downward from left to right: as X increases, Y tends to decrease.
4
Q

What does covariance tell us?

A

Covariance is a measure of how much two random variables vary together: the magnitude of their relationship and its direction (+ or -).

5
Q

What is the formula for covariance?

A

Cov(X, Y) = σxy = SUM[i=1..n] (x[i] - μx)(y[i] - μy) / n
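In Python, this can be sketched directly from the formula (a minimal sketch with made-up data; the helper name `covariance` is my own):

```python
# Population covariance: the average product of deviations from the means.
def covariance(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Y rises with X here, so the covariance comes out positive.
print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # 2.5
```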

6
Q

What are the problems with covariance? σXY

A

The variables are centred, but not scaled. If Cov(X,Y) = 3.9 and Cov(Z,Q) = 5.2, we know both pairs are positively correlated, but we don’t know which pair has the stronger correlation, because the variables could be on different scales/units.

7
Q

How to scale covariance with Z scores?

A

Standardize each score: z = (x - μ) / σ (divide the centred score by the standard deviation). Standardized scores are called z-scores. The correlation is then the average product of the paired z-scores:

ρxy = SUM[i=1..n] Zx[i] Zy[i] / n
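As a sketch, the z-score route looks like this in Python (made-up data; population standard deviation, to match the divide-by-n formula; the function name is my own):

```python
# Correlation as the average product of z-scores.
def pearson_from_z(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / n

# A perfect positive linear relationship gives a value close to 1.
print(pearson_from_z([1, 2, 3, 4], [2, 4, 6, 8]))
```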

8
Q

How to scale covariance from raw scores? ρ

A

Start with covariance. Replace (x - μx)(y - μy) with ((x - μx) / σx)((y - μy) / σy). Simplify and it becomes:

ρxy = σxy / Sqrt(σx^2 σy^2) = σxy / (σx σy)
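A quick sketch of the raw-score route (made-up data; the function name is my own). Rescaling one variable leaves the result unchanged, which is exactly the problem raw covariance had:

```python
# ρ from raw scores: covariance divided by the product of standard deviations.
def pearson_from_raw(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / (var_x * var_y) ** 0.5  # σxy / sqrt(σx² σy²)

# Multiplying Y by 100 does not change the correlation.
print(pearson_from_raw([1, 2, 3, 5], [2, 3, 5, 9]))
print(pearson_from_raw([1, 2, 3, 5], [200, 300, 500, 900]))
```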

9
Q

How do we determine if correlation means causation?

A

Run an experiment: explicitly manipulate the independent variables, one at a time, while holding everything else constant.

10
Q

What is the linear regression formula?

A

Y[hat] = b[0] + b[1]X
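For example (hypothetical coefficients, chosen just to show the mechanics):

```python
# Y[hat] = b0 + b1·X: plug an X into the fitted line to get a prediction.
b0, b1 = 1.0, 2.0  # hypothetical intercept and slope

def predict(x):
    return b0 + b1 * x

print(predict(3))   # 7.0
print(predict(10))  # 21.0
```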

11
Q

What’s the difference between Y and Y[hat]?

A

Y is the actual observed value plotted on the graph; Y[hat] is the value the regression line predicts.

12
Q

What are Residuals?

A

The vertical deviations from each data point (dot) to the regression line: residual[i] = Y[i] - Y[hat][i].

13
Q

What is the formula for SSresidual/SSerror?

A

SSerror = SUM[i=1..n] (Y[i] - Y[hat][i])^2
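A direct translation (made-up observed and predicted values; the helper name is my own):

```python
# SSerror: sum of squared vertical gaps between observed Y and predicted Y[hat].
def ss_error(ys, y_hats):
    return sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))

ys = [3, 5, 4, 6]
y_hats = [2.5, 5.0, 4.5, 6.0]  # hypothetical predictions from some line
print(ss_error(ys, y_hats))    # 0.25 + 0 + 0.25 + 0 = 0.5
```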

14
Q

How do we calculate INTERCEPT & SLOPE from SSerror?

A

Start with the SSerror formula, then substitute b[0] + b[1]X[i] in place of Y[hat][i]:

SUM[i=1..n] (Y[i] - Y[hat][i])^2 = SUM[i=1..n] (Y[i] - (b[0] + b[1]X[i]))^2

Minimize (take the derivative with respect to each coefficient and set it to zero), then rearrange to make b[1] or b[0] the subject:

b[1] = SUM[i=1..n] (X[i] - X[mean])(Y[i] - Y[mean]) / SUM[i=1..n] (X[i] - X[mean])^2

b[0] = Y[mean] - b[1]X[mean]
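The closed-form least-squares result can be checked in a few lines (made-up data lying exactly on Y = 1 + 2X; the function name is my own):

```python
# Least-squares fit: b1 = Σ(x-x̄)(y-ȳ) / Σ(x-x̄)², then b0 = ȳ - b1·x̄.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

# Data lying exactly on Y = 1 + 2X recovers those coefficients.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```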

15
Q

What are the key assumptions of linear regression?

A

Linear relationship (a straight line, not a curve)

Homoscedasticity (equal spread of residuals across X, not a cone)

Normality of residuals (the residuals are roughly normally distributed around the line, with the two extremes cancelling out)

16
Q

What does Heteroscedasticity look like vs Homoscedasticity?

A

Homoscedastic residuals have roughly the same spread around the line at every value of X; heteroscedastic residuals fan out into a cone, with spread that grows or shrinks as X changes.

17
Q

What is confidence?

A

A shaded band around the regression line. It shows how sure you are that the values fall within the shaded part; usually a 95% confidence interval is used.

18
Q

What is overfitting?

A

When the line fits the training data too strictly and bends to capture noise rather than the underlying trend, so it predicts new data poorly.

19
Q

What is multiple regression?

A

When you use several predictor variables (X[1], X[2], …) to predict one Y.

20
Q

What is the formula for multiple regression?

A

Y[hat] = b[0] + b[1]X[1] + b[2]X[2] + ... + b[n]X[n]
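One way to fit the coefficients is gradient descent on the squared error — a minimal sketch with made-up data and two predictors; in practice you would solve the normal equations or use a statistics library:

```python
# Fit Y[hat] = b0 + b1·X1 + b2·X2 by gradient descent on the mean squared error.
def fit_multiple(rows, ys, steps=20000, lr=0.01):
    b = [0.0, 0.0, 0.0]          # b0, b1, b2
    n = len(ys)
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(rows, ys):
            err = (b[0] + b[1] * x1 + b[2] * x2) - y
            grads[0] += err
            grads[1] += err * x1
            grads[2] += err * x2
        b = [bj - lr * 2 * g / n for bj, g in zip(b, grads)]
    return b

rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]  # exact plane, no noise
print(fit_multiple(rows, ys))  # approximately [1.0, 2.0, 3.0]
```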

21
Q

What is ρxy? How do you get there?

A

A centred, scaled measure of correlation.

Start with the covariance. To get the z-score of each x, divide (X[i] - μx) by the standard deviation σx (and likewise for y).

Then simplify: ρxy = σxy / (σx σy).

22
Q

How can we calculate intercept and slope using the SSerror?

A

Minimize the SSerror: b[1] = SUM[i=1..n] (X[i] - X[mean])(Y[i] - Y[mean]) / SUM[i=1..n] (X[i] - X[mean])^2, then b[0] = Y[mean] - b[1]X[mean].
23
Q

What is the formula for the SSerror? (Or SSresidual)

A

It’s the sum of all squared residuals: SSerror = SUM[i=1..n] (Y[i] - Y[hat][i])^2