Lecture 3 - Correlations and Regression Flashcards

1
Q

Define variability and covariability.

A

Variability is how much a given variable varies from observation to observation.

Covariability refers to how much two variables vary together.

2
Q

Correlation does not necessarily imply causation.

True or false?

A

True.

3
Q

How to determine the degrees of freedom when doing a correlation?

A

df = n - 2

4
Q

Why is sample size so important when doing correlation analysis?

A

Because a small sample size could easily yield a correlation that is not actually reflective of the population.

5
Q

What is the difference between correlation and regression?

A

Correlation refers to whether there is an association between variables and regression is the predictive model of that association.

6
Q

With regard to regression, what is the least squares solution?

A

The least squares solution is how the line of best fit (regression model) is determined.

7
Q

What is homoscedasticity?

A

Equal variance across observations.

8
Q

What is the difference between correlation and regression?

A

Correlation refers to the association between two variables.
Regression refers to predicting one variable from another (if in fact we can predict one variable from another).

9
Q

What is Regression?

A

With regression, we are not asking whether there is a relationship between two variables; we are asking what the best linear relationship is to describe that association. This is often expressed as y = a + bx.

10
Q

With regard to the typical regression equation used in psychology, y = a + bx + e, what do the different terms refer to?

A

y - the outcome variable
x - predictor variable
a - intercept parameter (sometimes called the constant)
b - slope parameter
e - error or residual term
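
As an illustration only (made-up numbers, not from the lecture), a short Python sketch showing how each of these terms maps onto a fitted regression:

import numpy as np
from scipy import stats

# Hypothetical data: x = predictor, y = outcome (made-up numbers)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

fit = stats.linregress(x, y)
a = fit.intercept        # a: intercept (the constant)
b = fit.slope            # b: slope
y_hat = a + b * x        # predicted values
e = y - y_hat            # e: error/residual for each observation

print(a, b, e)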

11
Q

With regard to regression in psychology, what assumptions are made about the error term included in a regression model, i.e. the “e” in y = a + bx + e?

A

The error or residual term is assumed to be:
1. Independent
2. Normally distributed - with a mean of zero
3. Homoscedastic - equal error variance for predicted values of Y

12
Q

With regard to Regression, what is the Least Squares Solution?

A

The Least Squares solution is how the “line of best fit” is determined from observed data. The regression line is chosen so that, when the differences (errors) between each predicted value and the observed value are squared and summed, the line has the smallest possible sum of squares - hence the name “least squares solution”.
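
A rough Python sketch of this idea (hypothetical data): the least squares estimates are computed directly, and any other line gives a larger sum of squared errors:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical outcome

# Least squares estimates: slope b = SP / SSx, intercept a = mean(y) - b * mean(x)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

def sse(intercept, slope):
    # Sum of squared errors between observed and predicted values
    return np.sum((y - (intercept + slope * x)) ** 2)

# The least squares line gives a smaller sum of squares than any other line
print(sse(a, b), sse(a, b + 0.1), sse(a + 0.5, b))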

13
Q

How is a regression line determined from a set of data?

A

The Least Squares Solution is how a regression line is determined/arrived at.

14
Q

Does the slope in a regression model reflect the correlation between the two variables?

A

No. The slope does not predict or reflect correlation.
You could have a steep slope and a poor correlation, or a shallow slope and a strong correlation. The slope is simply a result of the least squares solution used to determine the line of best fit.
The direction of the slope, however, does reflect the correlation: a downward slope reflects a negative correlation, whereas an upward slope reflects a positive correlation.

15
Q

In Regression, what does “error” refer to?

A

Error refers to the difference between the predicted outcome and the observed outcome.

16
Q

Should a regression always report the variance explained? And if so, what does this mean?

A

Yes.
Variance explained refers to the idea that the regression proposed for a set of data should tell us how much of the variance in the outcome variable is explained by its relationship with the predictor. Variance explained is reported as the R-squared value, which can be obtained from JASP output.
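
An illustrative Python sketch (made-up data, not JASP output) showing that R-squared is both the squared correlation and the proportion of the total sum of squares explained by the regression:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor
y = np.array([2.0, 4.5, 5.5, 8.0, 9.0])   # hypothetical outcome

fit = stats.linregress(x, y)
y_hat = fit.intercept + fit.slope * x

# Variance explained two ways: squared correlation, and 1 - SSerror/SStotal
r_squared = fit.rvalue ** 2
r_squared_alt = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(r_squared, r_squared_alt)   # the two values agree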

17
Q

What is the ANOVA (F test)?

A

The F test tells us whether the variance explained is significantly different from zero and therefore whether the variance explained (r-squared) given with regression indicates a significant relationship between two variables. Another way of saying this is that the F test tells us whether a variable is a significant predictor for the outcome variable.
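
As a hedged illustration (the R-squared and n values are invented), the F statistic for a simple regression can be worked out from R-squared and the degrees of freedom:

from scipy import stats

# Hypothetical values: R-squared from a simple regression with n observations
r_squared = 0.40
n = 30

df1, df2 = 1, n - 2               # one predictor, so df = (1, n - 2)
F = (r_squared / df1) / ((1 - r_squared) / df2)
p = stats.f.sf(F, df1, df2)       # p-value for "variance explained = 0"
print(F, p)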

18
Q

What does the F test for regression tell us?

A

The F test answers the question: does the regression line help to explain the variance in Y? The null hypothesis is that R-squared = 0.

19
Q

Do the r-squared value and the F (ANOVA) test tell us the strength of a relationship/correlation?

A

No. The r-squared value and whether it is significant or not does NOT tell us the strength or direction of a relationship.

20
Q

What are the regression coefficients?

A

If we think about regression as y (outcome variable) = a + bx(predictor variable), then the regression coefficients are a and b (as they were taught to us in the lecture), where b is the size and the direction of the slope and a is the y intercept.

21
Q

When measuring the correlation between two variables using regression, is it always necessary to determine whether the y-intercept is significant?

A

No. The y-intercept refers to the value of y when x = 0. Depending on the variables, this may not always be a meaningful measure. Examples of this are response times or intelligence - in these cases the predictor variable can never be 0.

22
Q

What does Pearson’s r measure?

A

Pearson’s correlation coefficient (r) measures the linear association between two continuous variables. It compares how much two variables vary together (covariability) with how much they vary separately (variability).
The equation for Pearson’s correlation coefficient r is: r = covariability of X and Y / variability of X and Y separately
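
An illustrative Python sketch (made-up data) computing r from the sum of products and the sums of squares, and checking it against scipy:

import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])   # hypothetical variable X
y = np.array([1.0, 3.0, 4.0, 6.0, 7.0])   # hypothetical variable Y

sp = np.sum((x - x.mean()) * (y - y.mean()))    # covariability (sum of products)
ssx = np.sum((x - x.mean()) ** 2)               # variability of X
ssy = np.sum((y - y.mean()) ** 2)               # variability of Y

r = sp / np.sqrt(ssx * ssy)
print(r, stats.pearsonr(x, y)[0])   # the same value both ways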

23
Q

When looking at whether there is a linear correlation between two continuous variables, we can use Pearson’s Correlation Coefficient (r). How does sample size affect whether we will see a significant value of r?

A

With Pearson’s r, we need a large sample size (or a very strong correlation) in order to see a significant r value, as the significance of r depends on the degrees of freedom (where degrees of freedom in this case is n - 2).
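
A small illustrative sketch (assumed values) of how the same correlation can be non-significant with a small n but significant with a large n:

import numpy as np
from scipy import stats

def p_value_for_r(r, n):
    # Two-tailed p-value for a correlation r observed in a sample of size n
    df = n - 2
    t = r * np.sqrt(df) / np.sqrt(1 - r ** 2)
    return 2 * stats.t.sf(abs(t), df)

# The same r can be non-significant with a small sample but significant with a large one
print(p_value_for_r(0.30, n=15))    # roughly p = 0.28
print(p_value_for_r(0.30, n=100))   # roughly p = 0.002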

24
Q

What are the degrees of freedom for Pearson’s Correlation Coefficient?

A

Degrees of freedom for Pearson’s correlation coefficient are n - 2.

25
Q

What is Spearman’s Correlation?

A

Spearman’s correlation is used to measure whether there is a correlation between variables where the relationship is one-directional (monotonic) but not linear. In order to use Spearman’s correlation the data need to be made ordinal/ranked - doing this turns a monotonic, non-linear relationship into a linear one.

26
Q

What is the benefit of using Spearman’s correlation coefficient?

A

Sometimes we have data that suggest a correlation between variables, but the relationship does not appear to be linear, i.e. it may be a non-linear relationship. Using Pearson’s r we may find little or no relationship between the variables even though a non-linear relationship exists. Spearman’s correlation allows us to determine whether there may be a relationship between our variables that is non-linear. In order to do this the data need to be made ordinal/ranked. The process from there is then similar to Pearson’s r, because once the data are ranked, if there is a (monotonic) relationship, it will be linear.
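
An illustrative Python sketch (hypothetical, monotonic but non-linear data) showing that Spearman’s correlation is just Pearson’s r computed on the ranked data:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # hypothetical predictor
y = np.array([1.0, 8.0, 27.0, 64.0, 125.0])    # monotonic but clearly non-linear

# Ranking both variables turns a monotonic relationship into a linear one,
# so Spearman's rho is Pearson's r computed on the ranks.
rho_from_ranks = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))[0]
rho_scipy = stats.spearmanr(x, y)[0]
print(rho_from_ranks, rho_scipy)   # both are 1.0 here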

27
Q

What is Cronbach’s alpha?

A

Cronbach’s alpha is a measure of the consistency, or reliability, of a measure. Specifically, it is a measure of internal consistency and scale reliability. Cronbach’s alpha is calculated by taking the average covariance and dividing it by the average total variance. Therefore, in order to have a high alpha there needs to be a high level of covariance; in other words, we need the covariance to explain the majority of the variance seen.
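
An illustrative Python sketch (invented item scores) using one common computational form of Cronbach’s alpha, based on item variances and the variance of the total scale score:

import numpy as np

# Hypothetical scores: rows = respondents, columns = items on a scale
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
], dtype=float)

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)        # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)    # variance of the total scale score

# One common computational form of Cronbach's alpha
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(alpha)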

28
Q

Are correlation and regression the same?

A

No. Correlation tells us whether there is an association between two continuous variables.
Regression refers to the linear relationship between two variables that best describes the association such that predictions about one variable can be made based on the value of the other variable. This can be expressed as y (outcome variable) = a (intercept) + b(slope) x (predictor variable), i.e. y = a + bx

29
Q

What does the regression coefficient b (slope) tell us about the association/correlation between two variables?

A

b or the slope tells us:
1. whether there is a relationship between X and Y
2. Whether the relationship is positive or negative
3. Estimate/prediction of expected change in Y when X increases by 1

30
Q

What kind of statistical test is used to determine whether the regression coefficient b (the slope) is significant or not?

A

A t-test is used and JASP/SPSS can determine whether it is significant or not.
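
An illustrative Python sketch (made-up data); scipy’s linregress reports the p-value from this t-test of the slope:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical predictor
y = np.array([2.2, 2.8, 4.1, 4.5, 6.0, 6.4])   # hypothetical outcome

fit = stats.linregress(x, y)
# fit.pvalue comes from a t-test of the slope against zero, analogous to the
# coefficients table that JASP/SPSS produce.
print(fit.slope, fit.pvalue)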

31
Q

When doing a regression analysis there are assumptions that need to be met. What are the assumptions about errors that are made when doing a regression analysis?

A

1. The errors are independent (this has more to do with how data are measured and collected).
2. Errors are assumed to have a normal distribution.
3. Errors are assumed to show homoscedasticity.
NB: errors are the differences between the observed and predicted scores.

32
Q

One of the assumptions made about the errors/residuals when doing a regression between two continuous variables is that the errors/residuals will show homoscedasticity. What does this mean?

A

If errors are homoscedastic this means that the error or residual for each observed value is random, i.e. there is no pattern or relationship in the errors. If the errors were plotted against the predicted values, the points would form a flat, patternless band across the graph, i.e. the predicted value cannot predict the size of the error.
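
A crude illustrative check in Python (made-up data); in practice this is usually judged from a residuals-versus-predicted plot:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # hypothetical predictor
y = np.array([2.1, 2.9, 4.2, 4.4, 6.1, 5.8, 8.2, 7.9])   # hypothetical outcome

fit = stats.linregress(x, y)
predicted = fit.intercept + fit.slope * x
residuals = y - predicted

# If the residuals are homoscedastic, their size should be unrelated to the
# predicted values; a correlation near zero here is consistent with that.
print(stats.pearsonr(predicted, np.abs(residuals))[0])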

33
Q

What is the opposite of homoscedasticity?

A

Heteroscedasticity - with regard to errors this would refer to a pattern in the errors.

34
Q

Is the standardized slope coefficient the same as Pearson’s r?

A

Yes.
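
An illustrative Python sketch (made-up data) confirming this for simple regression: standardizing both variables and refitting gives a slope equal to Pearson’s r:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical predictor
y = np.array([1.5, 3.1, 3.0, 4.8, 5.2, 6.9])   # hypothetical outcome

# Standardize both variables (z-scores), then fit the regression on the z-scores
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

standardized_slope = stats.linregress(zx, zy).slope
r = stats.pearsonr(x, y)[0]
print(standardized_slope, r)   # identical for a simple (one-predictor) regression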

35
Q

What is r-squared?

A

R-squared is Pearson’s correlation coefficient (r) squared and reflects the amount of variability in the outcome variable that is due to its relationship with the predictor variable. So if the correlation between weight and height is 0.8 then r-squared would be 0.8 x 0.8, which is 0.64. This means that 64% of the variation in weight is due to its relationship with height.

36
Q

Is correlation a form of BIVARIATE ANALYSIS?

A

Yes.
Correlation is a measure between two variables.

37
Q

If the relationship between two variables is non-linear, will a linear correlation analysis be helpful?

A

No.
A correlation analysis may not indicate that there is much of a relationship.

38
Q

What does the Pearson’s correlation coefficient r measure?

A

Pearson’s correlation coefficient is used to measure the linear association between two CONTINUOUS variables.

39
Q

Pearson’s correlation coefficient compares how much two continuous variables vary together to how much they vary separately.

A

Yes.

40
Q

Will there be a larger Sum of Squares when there is more variability in a variable?

A

Yes.

41
Q

What is the SUM OF PRODUCTS?

A

Sum of products is a measure of covariability.
SP = sum of (X observed - X mean)(Y observed - Y mean)

SS measures the variability of a given variable.
SS = sum of (X observed - X mean) squared

42
Q

For two variables X and Y, if X = Y then does SP = SS?

A

Yes.

43
Q

Does Pearson’s r = covariability of X and Y/variability of X and Y separately?

A

Yes.

44
Q

How to identify whether a score should be considered an extreme score?

A

An extreme score tends to be followed by a less extreme score.

45
Q

For correlation, what is the null hypothesis?

A

That the correlation between two variables in the population will be zero.

46
Q

What is the degrees of freedom for correlation?

A

df = n-2

47
Q

If correlation refers to the association between two variables, what does regression refer to?

A

Regression refers to the description of that association and can be used as a predictive measure.
In other words it is the equation for the line of best fit of the association between two variables.

Y = a + bX + (error)
a = intercept
b = slope

48
Q

For the error in regression models, what are its qualities?

A

Normal distribution.
Homoscedastic.
Independent.

49
Q

How is the line of best fit (regression model) determined?

A

Through the LEAST SQUARES SOLUTION.

50
Q

Is the slope of a regression line the same as the correlation?

A

No.

51
Q

In regression we always need to report the variance explained (R-squared).

A

R-squared, or variance explained, is the proportion of variation in the outcome variable explained by variation in the predictor variable.
If all variation in one is explained by the other then R-squared would = 1.
If R-squared = 0.23, then 23% of the variation in the outcome variable is explained by the variation in the predictor variable.

52
Q

Does the ANOVA F test tell us whether the variance explained is significant or not and therefore whether the regression model is of use or not?

A

Yes.