Correlation and Linear Regression Flashcards

1
Q

Which correlation and regression methods were discussed in the lecture?

A
  • Pearson’s correlation/Spearman rank correlation
  • Linear regression
2
Q

When do you use Pearson’s correlation and Spearman rank correlation?

A
  • Pearson = normally distributed variables
  • Spearman = non-normally distributed variables
3
Q

What does correlation quantify?

A

Quantifies the strength and direction of association between two numerical variables

4
Q

Should correlations be interpreted with caution?

A

Yes, correlations should be interpreted with great care because they do not necessarily indicate causation.

5
Q

What are some possible reasons for correlations between variables?

A

Correlations between variables can result from:
- a causal relationship
- shared dependency on some third unmeasured variable
- coincidence

6
Q

Why are correlated time series unreliable indicators of causal relationships?

A

Correlated time series are unreliable indicators of causal relationships because over time a variable can only follow four possible trajectories (steady state, increase, decrease, or fluctuation), and there are bound to be many coincidences.

7
Q

What is Pearson’s product-moment correlation (r)?

A

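Pearson’s product-moment correlation (r) is a statistic that measures the strength and direction of the linear relationship between two continuous numerical variables. It ranges from -1 (perfect negative correlation) through 0 (no linear correlation) to +1 (perfect positive correlation).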
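
A minimal Python sketch of how r can be computed by hand, assuming two equal-length lists of numbers. The helper name pearson_r and the data are made up for illustration and do not come from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of the deviations from each mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises in a perfect straight line with x
y_down = [10, 8, 6, 4, 2]  # falls in a perfect straight line with x

print(pearson_r(x, y_up))    # → 1.0
print(pearson_r(x, y_down))  # → -1.0
```

The two data sets sit at the extremes of the range; real data usually give a value somewhere in between.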

8
Q

What rules must be followed when using Pearson’s product-moment correlation?

A
  • the first action should be to draw a scatterplot
  • both variables must be continuous & normally distributed (check for normality)
  • if these assumptions are not met, a Spearman’s rank-order correlation (non-parametric correlation) should be used
9
Q

What is H0 in Pearson’s product-moment correlation (r)?

A

H0 in Pearson’s product-moment correlation (r) states that the two variables are not correlated.

10
Q

What is H1 in Pearson’s product-moment correlation (r)?

A

H1 in Pearson’s product-moment correlation (r) states that the two variables are correlated.

11
Q

What are the two ways to calculate Pearson’s product-moment correlation (r) in Excel?

A

The two ways to calculate Pearson’s product-moment correlation (r) in Excel are:
- through the Analysis Toolpak (“Correlation”)
- by using the worksheet function (“=CORREL”).

12
Q

What statement should be reported when a Pearson correlation test is carried out?

A

Reject H0: “There was a significant correlation between ‘variable 1’ and ‘variable 2’ (r = ___ , df = __, p < 0.05).”

Fail to reject H0: “There was no significant correlation between ‘variable 1’ and ‘variable 2’ (r = ___ , df = __, p > 0.05).”

NOTE: the reported statement itself should not mention rejecting or accepting H0; write only the sentence in quotation marks.

13
Q

What are two ways to run a Pearson correlation in RStudio?

A

> cor()       # returns the correlation coefficient only
> cor.test()  # also reports the test statistic, df and p-value
(slide 7)

14
Q

What does the squared correlation coefficient (r^2) tell us about how much of the variation in one variable is explained by the other?

A

r^2 indicates the proportion of variance in the dependent variable that can be explained by the independent variable.
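
As a quick worked illustration, the arithmetic can be sketched in Python. The function name variation_explained and the r values are hypothetical, chosen only to show how quickly r^2 shrinks as r gets smaller.

```python
def variation_explained(r):
    """Proportion of variation explained, given a correlation coefficient r."""
    return r ** 2

# Hypothetical correlation coefficients, chosen only for illustration
for r in (0.1, 0.5, 0.9):
    share = variation_explained(r)
    print(f"r = {r}: r^2 = {share:.2f} ({share:.0%} of the variation)")
```

A correlation of r = 0.5, for example, corresponds to only a quarter of the variation being explained.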

15
Q

What caution should be taken when interpreting a significant correlation coefficient with a big sample size?

A

With a big sample size, even a weak correlation can be highly significant while explaining only a very small percentage of the variation (a low r^2). Therefore, it is important to evaluate the practical significance of the relationship between the variables, not just the p-value.
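
A rough Python sketch of this effect. It assumes the standard t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with df = n - 2, which is not stated on the card. The values of r and n are made up, and 1.96 is used only as an approximate two-sided 5% critical value for large df.

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: no correlation, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.1  # hypothetical weak correlation

for n in (10, 1000):  # a small and a large hypothetical sample
    t = t_statistic(r, n)
    print(f"n = {n}: t = {t:.2f}, df = {n - 2}, r^2 = {r ** 2:.2f}")

# Approximate large-sample critical value (an assumption, see lead-in)
APPROX_CRITICAL_T = 1.96
print(t_statistic(r, 1000) > APPROX_CRITICAL_T)  # the large sample exceeds it
```

The same r explains about 1% of the variation in both cases; only the sample size changes whether the test comes out significant.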

16
Q

What method can be used to model and explain the relationship between two variables once a significant correlation is found?

A

Regression analysis, such as linear regression, can be used to model and explain the relationship between two variables once a significant correlation is found.

17
Q

What is Spearman’s rank-order correlation (rs)?

A

Spearman’s rank-order correlation (rs) is a non-parametric statistical measure that describes the strength and direction of the monotonic relationship between two variables when the data is ordinal or not normally distributed.

18
Q

How is the rank-order correlation coefficient calculated in Spearman’s method?

A

The rank-order correlation coefficient is based on comparing the rank order of the two variables. It ranks the data in each variable and then calculates the Pearson correlation coefficient on the ranks.
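
The two steps above (rank each variable, then compute Pearson's r on the ranks) can be sketched in Python. The helper names and data are illustrative assumptions; giving tied values their average rank is one common convention and is not specified on the card.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rank-order correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: y always increases with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(spearman_rs(x, y))  # → 1.0, the rank orders agree perfectly
print(pearson_r(x, y))    # slightly below 1, the relationship is not linear
```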

19
Q

How can the RANK function be used in Excel to calculate the rank-order correlation coefficient?

A

The RANK function can be used in Excel to rank the data in each variable. Then, the Analysis Toolpak (“Correlation”) or the worksheet function “=CORREL” can be applied to the ranks to calculate the rank-order correlation coefficient.

20
Q

What statement should be reported when a Spearman’s rank-order correlation test is carried out?

A

Reject H0: “There was a significant correlation between ‘variable 1’ and ‘variable 2’ (rs = ___ , df = __, p < 0.05).”

Fail to reject H0: “There was no significant correlation between ‘variable 1’ and ‘variable 2’ (rs = ___ , df = __, p > 0.05).”

21
Q

What are two ways to run a Spearman’s rank-order correlation in RStudio?

A

> cor(comp.dat$Female, comp.dat$Male, method = "spearman")

> cor.test(comp.dat$Female, comp.dat$Male, method = "spearman")
(slide 10)

22
Q

What is causal inference?

A

The process of drawing conclusions about the causal relationship between two variables, based on the observed data.

23
Q

What is linear regression?

A

Linear regression is a simple way of modeling cause and effect: it uses a straight line to describe how a response variable changes with a predictor variable.

24
Q

What is the cause in linear regression?

A

The cause in linear regression is the independent or predictor variable.

25
Q

What is the effect in linear regression?

A

The effect in linear regression is the dependent or response variable.

26
Q

When is linear regression not appropriate?

A

Linear regression is not appropriate when it is unclear which of the two variables is the cause.

27
Q

Why do we use linear regression?

A
  • to determine the form and strength of a relationship between two variables
  • to predict a value of y from a given value of x
28
Q

What is the equation for a line in linear regression?

A

The equation for a line in linear regression is y = a + bx

29
Q

What can we estimate in linear regression?

A

We can estimate both a (intercept) and b (slope)

30
Q

To which estimates can we fit confidence limits in linear regression?

A

We can fit confidence limits to both a (intercept) and b (slope)

31
Q

What can we test in linear regression?

A

We can test whether each of a (intercept) and b (slope) is significantly different from zero

32
Q

What is the method used for fitting the best line in linear regression?

A

The method used for fitting the best line in linear regression is Least Squares

33
Q

What is the rationale behind using Least Squares in linear regression?

A

To minimise the sum of the squared differences between the observed data points and the predicted values
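
A minimal Python sketch of a Least Squares fit for the line y = a + bx, using the usual closed-form estimates for the slope and intercept. The helper names and the data are made up for illustration. The loop at the end checks, for this data set only, that nudging a or b away from the fitted values never lowers the sum of squared differences.

```python
def least_squares_fit(x, y):
    """Return (a, b) for the line y = a + b*x fitted by Least Squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

def sum_of_squares(a, b, x, y):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_fit(x, y)
best = sum_of_squares(a, b, x, y)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")

# Any small change to a or b should give a larger sum of squares
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_of_squares(a + da, b + db, x, y) >= best

# Predict y for a new x value (6 is an arbitrary choice)
print(f"predicted y at x = 6: {a + b * 6:.2f}")
```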

34
Q

Does a line fitted by eye often look the same as a line fitted by Least Squares in linear regression?

A

No, a line fitted by eye often looks rather different from a line fitted by Least Squares in linear regression.

35
Q

What are residuals in linear regression?

A

Residuals in linear regression are the differences between the observed values and those predicted by the line.
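
A short Python sketch of computing residuals as observed minus predicted values. The helper fit_line and the data are hypothetical and serve only to illustrate the definition.

```python
def fit_line(x, y):
    """Return (a, b) for the Least Squares line y = a + b*x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]
# Residual = observed value minus the value predicted by the line
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x = {xi}: observed = {yi:.2f}, predicted = {pi:.2f}, residual = {ri:+.2f}")
```

Points above the line have positive residuals and points below it have negative ones; for a Least Squares line with an intercept they sum to approximately zero.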

36
Q

How can we evaluate the fit of a linear regression model?

A

We can evaluate the fit of a linear regression model by examining the distribution of the residuals.

37
Q

What can the distribution of residuals tell us about the fit of a linear regression model?

A

The distribution of residuals can often tell us whether a linear model is the best fit for our data.
(slide 10)
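
A Python sketch of how residuals can reveal that a straight line is not the best model. The data are made up to follow a curve (y = x^2), and fit_line is an illustrative helper, not something from the lecture.

```python
def fit_line(x, y):
    """Return (a, b) for the Least Squares line y = a + b*x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Made-up curved data: y = x^2
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(residuals)  # → [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than a random scatter around zero, suggests that a curve would describe the data better than a straight line.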

38
Q

Why is it important to evaluate the fit of a linear regression model?

A
  • ensure that the model is appropriate for the data
  • identify any potential issues or limitations with the model