1.2 The Correlation Between Two Variables Flashcards

1
Q

Correlation

A

Measures the strength and direction of the linear relationship between two variables.

To compute it, we need to find the covariance first.

2
Q

covariance formula

A

sXY = (Sum of all (Xi - Xmean)(Yi - Ymean))/(n-1)
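As a quick check of the formula, here is a minimal sketch in plain Python. The function name `sample_covariance` and the toy data are assumptions made for illustration only; they are not part of the card.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (Xi - Xmean)(Yi - Ymean), divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum the products of paired deviations from the means.
    cross = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Toy data (hypothetical): Y is exactly 2 * X.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(sample_covariance(x, y))  # 5.0
```

The divisor is n - 1 rather than n because this is the sample covariance, not the population one.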

3
Q

correlation formula

A

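rXY = sXY / (sX * sY)

where sXY is the sample covariance of X and Y, and sX and sY are the sample standard deviations of X and Y.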
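The sample correlation is the sample covariance divided by the product of the two sample standard deviations. A minimal sketch in plain Python follows; the function name `sample_correlation` and the toy data are assumptions for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Toy data (hypothetical): a perfectly increasing and a perfectly decreasing line.
x = [1, 2, 3, 4, 5]
print(sample_correlation(x, [2, 4, 6, 8, 10]))   # very close to +1
print(sample_correlation(x, [10, 8, 6, 4, 2]))   # very close to -1
```

Dividing by sX * sY removes the units of the covariance, which is why the result can be compared across different pairs of variables.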

4
Q

the 4 important properties of correlation

A
  1. The correlation coefficient is bounded by -1 and +1.

-1 <= rXY <= 1

  2. A correlation of 0 (i.e., rXY = 0) indicates that there is no linear relationship between the two variables.
  3. A positive correlation coefficient (i.e., rXY > 0) indicates a positive linear relationship between the variables.

–> In other words, an increase in X is associated with an increase in Y.

–> When rXY = 1, the variables have a perfect positive linear relationship.

  4. A negative correlation coefficient (i.e., rXY < 0) indicates a negative linear relationship between the variables.

–> In other words, an increase in X is associated with a decrease in Y.

–> When rXY = −1, the variables have a perfect negative (inverse) linear relationship.

5
Q

if there is no linear pattern, is it appropriate to use the correlation coefficient to test any relationship between variables?

A

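No. The correlation coefficient only captures linear association, so it can be close to 0 even when the variables are strongly related in a nonlinear way.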
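A small sketch of why: below, Y is completely determined by X (Y = X squared), yet the correlation is zero because the relationship is not linear. The helper name `sample_correlation` and the data are assumptions for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical data: X is symmetric around 0 and Y = X**2 (a parabola).
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r = sample_correlation(x, y)
print(r)  # zero: the positive and negative deviation products cancel out
```

A scatter plot of the data is usually the quickest way to check whether a linear pattern exists before relying on the coefficient.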

6
Q

Limitations of Correlation Analysis

A
  1. The correlation coefficient is not a reliable measure when the variables have a nonlinear relationship.
  2. The correlation coefficient is very sensitive to outliers.

–> Analysts need to justify the inclusion of the outliers in the data or handle them through trimming or winsorization.

  3. Correlation does not imply causation.
  4. Conclusions about causal relationships, even if apparently supported by the data, may not be valid.

–> The observed correlation may be spurious.

  5. Correlation may not produce a full picture of the data.

–> Different pairs of datasets may have the same correlation but different underlying relationships.
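To illustrate the outlier sensitivity listed above, here is a minimal sketch in which a single extreme observation flips the sign of the correlation. The helper name `sample_correlation` and the data are made up for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical data: five points on a perfectly decreasing line.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
r_without = sample_correlation(x, y)

# Add one extreme observation far from the rest of the data.
r_with = sample_correlation(x + [20], y + [20])

print(round(r_without, 2))  # close to -1
print(round(r_with, 2))     # strongly positive, driven by the single outlier
```

Whether such a point should be kept, trimmed, or winsorized is a judgment the analyst has to justify; the code only shows how much it can matter.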
