Bivariate Data Flashcards

1
Q

What are properties of bivariate data?

A

Two variables are recorded for each observation
Both variables could be random
Usually displayed on a scatter graph
The relationship could be linear (modelled by a line of best fit, LoBF)

2
Q

What is the PMCC?

A

The Product Moment Correlation Coefficient (PMCC) measures the strength and direction of the linear correlation between two variables

3
Q

What is the formula for covariance?

A

$\text{Covariance} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}$
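As a minimal sketch of the formula on this card, the covariance can be computed directly from the deviations about each mean. The function name `covariance` and the data are made up for illustration, and the divisor is n, as on the card.

```python
def covariance(xs, ys):
    """Covariance as on the card: sum of products of deviations, divided by n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Illustrative data only (not from the flashcards).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
cov = covariance(xs, ys)
print(cov)  # 1.2
```

A positive result like this one says that y tends to increase as x increases.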

4
Q

What is covariance?

A

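Covariance indicates whether two variables are positively or negatively related: it is positive when they tend to increase together and negative when one tends to decrease as the other increases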

5
Q

What is the formula for PMCC?

A

PMCC = Covariance / (Standard deviation of x × Standard deviation of y)

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
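The following is a small sketch of the PMCC calculation as Sxy divided by the square root of Sxx times Syy. The function name `pmcc` and the data are assumptions chosen for illustration.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative data only: Sxy = 6, Sxx = 10, Syy = 6.
r = pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))  # 0.7746
```

Dividing by the spread of both variables removes the units, which is why r always lies between -1 and 1.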

6
Q

What do the values of PMCC mean?

A

PMCC = -1 (Perfect Negative Correlation)
PMCC = 0 (No Linear Correlation)
PMCC = 1 (Perfect Positive Correlation)

7
Q

What values of PMCC are considered strong correlation?

A

r < -0.7
r > 0.7

8
Q

What is the formula for Sxy ?

A

$S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$

9
Q

What is the formula for Sxx ?

A

$S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$
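The shortcut forms of Sxy and Sxx use only running totals, which is convenient when only the summary sums are given. This sketch, with hypothetical function names and made-up data, checks that they agree with the sums of deviations about the means.

```python
def s_xy_shortcut(xs, ys):
    """Sxy = sum(x*y) - sum(x)*sum(y)/n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def s_xx_shortcut(xs):
    """Sxx = sum(x^2) - (sum(x))^2/n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Definitional forms, for comparison with the shortcuts.
s_xy_definition = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx_definition = sum((x - x_bar) ** 2 for x in xs)

print(s_xy_shortcut(xs, ys), s_xy_definition)  # 6.0 6.0
print(s_xx_shortcut(xs), s_xx_definition)      # 10.0 10.0
```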

10
Q

How is the population correlation coefficient estimated?

A

The population correlation coefficient ‘ρ’ is usually unknown, so the correlation coefficient ‘r’ calculated from a (often small) sample is used as an estimate

11
Q

What are the steps of a hypothesis test for ‘ρ’ (the population correlation coefficient) using ‘r’?

A

Write down the null and alternative hypotheses
Write down the significance level
Calculate the PMCC (r)
Find the critical value from the table
Compare ‘r’ to the critical value and state the conclusion
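The steps above can be sketched as a short script. This is only an illustration: the data are made up, the test is assumed to be one-tailed for positive correlation, and the critical value is assumed to have been looked up in a PMCC table for n = 10 at the 5% level, so check it against your own table.

```python
import math


def pmcc(xs, ys):
    """Sample product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Step 1: H0: rho = 0, H1: rho > 0 (one-tailed, assumed for this example).
# Step 2: significance level.
SIGNIFICANCE_LEVEL = 0.05

# Step 3: calculate r from the sample (made-up data, n = 10).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r = pmcc(xs, ys)

# Step 4: critical value from a table (assumed: n = 10, 5%, one-tailed).
CRITICAL_VALUE = 0.5494

# Step 5: compare r with the critical value and state the conclusion.
reject_h0 = r > CRITICAL_VALUE
print(f"r = {r:.4f}, reject H0: {reject_h0}")  # r = 0.9394, reject H0: True
```

Here r exceeds the critical value, so H0 is rejected and there is evidence of positive correlation at the 5% level.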

12
Q

What is the formula for Spearman’s Rank Correlation Coefficient?

A

$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
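A minimal sketch of Spearman's rank correlation coefficient, using the standard formula 1 - 6 × Sum(d²) / (n(n² - 1)), where d is the difference between the two ranks of each item. The helper names and the scores are made up, and the sketch assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation coefficient for untied data."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Two judges scoring the same five entries (illustrative data only).
judge_a = [8.5, 7.0, 9.1, 6.2, 7.8]
judge_b = [7.9, 7.2, 8.8, 6.0, 8.1]
rs = spearman(judge_a, judge_b)
print(rs)  # 0.9
```

Only two entries swap places between the judges, so the rankings agree closely and the coefficient is near 1.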

13
Q

What do the values of SRCC mean?

A

rs = 1 is perfect agreement
rs = -1 is perfect disagreement
rs = 0 is no agreement or disagreement

14
Q

What does perfect agreement mean?

A

The rankings are identical in the two data sets
The graph of one ranking against the other is the line y = x

15
Q

What does perfect disagreement mean?

A

The rankings are exactly reversed in the two data sets
The graph of one ranking against the other is a straight line with gradient -1

16
Q

What is association?

A

Association is any relationship between two variables

17
Q

What is correlation?

A

Correlation is a subset of association where the relationship is linear and both variables are random

18
Q

Why is SRCC different to PMCC?

A

In SRCC, association replaces correlation because a linear relationship cannot be assumed
rs = ±1 means perfect agreement/disagreement of the rankings, which does not imply a perfect linear relationship

19
Q

Why are hypothesis tests used on PMCC?

A

A hypothesis test checks whether the sample value ‘r’ gives statistically significant evidence that the population correlation coefficient ‘ρ’ is not zero

20
Q

What is a constant in all hypothesis tests of PMCC?

A

The null hypothesis states that ρ = 0 (no correlation in the population)

21
Q

What is the least squares regression line of y on x?

A

This is the regression line for which the sum of the squares of the vertical distances from each point to the line is as small as possible

22
Q

What is the formula for the least squares regression line of y on x?

A

$y = a + bx$

Where:
$b = \frac{S_{xy}}{S_{xx}}$
$a = \bar{y} - b\bar{x}$
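As a small sketch, the y on x line can be fitted with gradient b = Sxy / Sxx and intercept a = ybar - b × xbar. The function name `regression_y_on_x` and the data are made up for illustration.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Illustrative data only: Sxy = 6, Sxx = 10, x_bar = 3, y_bar = 4.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 2.20 + 0.60x
```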

23
Q

Why might a LOBF not be appropriate in some situations?

A

If the relationship is non-linear
It may not model values outside the range of the data (extrapolation)

24
Q

What is the least squares regression line of x on y?

A

The line for which the sum of the squares of the horizontal distances from each point to the line is as small as possible

25
Q

What is the formula for the least squares regression line of x on y?

A

$x = a + by$

Where:
$b = \frac{S_{xy}}{S_{yy}}$
$a = \bar{x} - b\bar{y}$

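The x on y line uses Syy in place of Sxx, so it is generally a different line from y on x. A minimal sketch with a hypothetical function name and made-up data:

```python
def regression_x_on_y(xs, ys):
    """Return (a, b) for the least squares line x = a + b*y."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b = s_xy / s_yy          # gradient with respect to y
    a = x_bar - b * y_bar    # the line passes through (x_bar, y_bar)
    return a, b


# Illustrative data only: Sxy = 6, Syy = 6, x_bar = 3, y_bar = 4.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_x_on_y(xs, ys)
print(f"x = {a:.2f} + {b:.2f}y")  # x = -1.00 + 1.00y
```

For the same data the y on x line is y = 2.2 + 0.6x, which rearranges to roughly x = -3.67 + 1.67y, so the two lines coincide only when the correlation is perfect.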
26
Q

Why is only one regression line used in real life?

A

Only one regression line is appropriate when one of the variables is controlled (not random)
Where ‘x’ is controlled, the line is y on x

27
Q

What is the residual?

A

The residual of a data point measures the “error” between the observed value and the value predicted by the regression line

28
Q

What is the formula for the residual for y on x?

A

$r_i = y_i - a - b x_i$

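A short sketch of the residuals for a y on x line: each residual is the observed y minus the value a + bx given by the line. The data and names are made up for illustration, and the line is fitted with b = Sxy / Sxx and a = ybar - b × xbar.

```python
# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Fit the least squares line of y on x.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual of each point: observed y minus the y given by the line.
residuals = [y - a - b * x for x, y in zip(xs, ys)]
print([round(res, 2) for res in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Points above the line have positive residuals and points below have negative ones; for a least squares line the residuals sum to zero.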
29
Q

What is the formula for the residual for x on y?

A

$r_i = x_i - a - b y_i$

30
Q

What is the residual for y on x?

A

For y on x, the residual is the vertical distance between the point and the regression line

31
Q

What is the residual for x on y?

A

For x on y, the residual is the horizontal distance between the point and the regression line

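32
Q

What is the coefficient of determination?

A

The coefficient of determination measures the proportion of the variation in y that is explained by the regression line
$R^2 = 1 - \frac{\sum r_i^2}{S_{yy}}$
The sum of squared residuals, $\sum r_i^2$, measures how close the points are to the regression line
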
33
Q

What does the value of the coefficient of determination mean?

A

The closer to 0, the worse the model
The closer to 1, the better the model

34
Q

What does the coefficient of determination also equal?

A

Coefficient of determination = PMCC squared ($r^2$)

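For a least squares line of y on x, the coefficient of determination equals the square of the PMCC. The sketch below checks this numerically on made-up data by computing 1 - Sum(residual²) / Syy and r² separately; the variable names are chosen for illustration.

```python
# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Least squares line of y on x and its sum of squared residuals.
b = s_xy / s_xx
a = y_bar - b * x_bar
sum_sq_residuals = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Coefficient of determination from the residuals, and PMCC squared.
r_squared_from_residuals = 1 - sum_sq_residuals / s_yy
pmcc_squared = s_xy ** 2 / (s_xx * s_yy)

print(round(r_squared_from_residuals, 4), round(pmcc_squared, 4))  # 0.6 0.6
```

Here 60% of the variation in y is explained by the line, matching a PMCC of about 0.7746.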
35
Q

How can a graph show a set of data is appropriate for PMCC?

A

If the points form an approximately elliptical cloud on the scatter graph, this suggests a bivariate normal distribution, so PMCC is appropriate

36
Q

What is the conclusion if the 'r' value is greater than the critical value in PMCC and SRCC?

A

If 'r' > critical value (for negative values, compare the size of 'r' with the critical value):
Reject H0
There is evidence of correlation (PMCC) or association (SRCC)