correlation and regression Flashcards

1
Q

what may be related to each other? - give an example

A
  • two datasets may be related
    e.g., height, weight
2
Q

when can you see the relationship of the datasets?

A
  • when you look at them on a graph
3
Q

what were the first statistics invented for?

A
  • for analysing co-relationships
4
Q

when is there probably a mistake in data?

A
  • if your data shows a perfect straight line
  • if one or more data points lie a long way from all the others
5
Q

when might data be worth checking for mistakes?

A
  • if there’s no relationship at all between things you really expect to be related
6
Q

what is the definition of correlation?

A
  • a measure of how closely two variables follow a straight-line relationship
  • the best-fit line itself is found by minimising the difference between the data and the line
7
Q

what does a correlation report about a relationship?

A
  • strength and direction of a relationship
8
Q

what is a residual?

A
  • difference between an observed value and a predicted value in regression analysis
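
As a minimal sketch of this definition, assuming made-up data and a hypothetical fitted line (the slope and intercept below are invented for illustration, not taken from these cards), each residual is the observed y minus the y the line predicts:

```python
# Hypothetical data and line, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # observed values
m, c = 2.0, 0.0             # assumed slope and intercept of a fitted line

def residuals(xs, ys, m, c):
    """Return observed minus predicted y for each data point."""
    return [y - (m * x + c) for x, y in zip(xs, ys)]

res = residuals(xs, ys, m, c)
print([round(r, 2) for r in res])  # [0.1, -0.1, 0.2, -0.2]
```

A positive residual means the point sits above the line; a negative one means it sits below.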
9
Q

what is a zero correlation?

A
  • no relationship between the variables
  • the data points form a cluster with no upward or downward trend
10
Q

what is a positive correlation?

A
  • relationship between two variables that tend to move in the same direction
11
Q

what is a negative correlation?

A
  • two individual variables generally move in opposite directions
12
Q

what would you do to get the line of best fit?

A
  • you could try adjusting the line manually, but that wouldn’t give the best fit
  • you need to use maths instead
13
Q

what equation allows you to work out the line of best fit?

A

$$r = \frac{S_{xy}}{S_x S_y}$$

14
Q

what does Sxy stand for?

A
  • how much x and y change together
15
Q

what is Sx · Sy?

A
  • how much x and y change separately
16
Q

what is the equation to work out r?

A

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
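
The formula for r can be sketched in plain Python as follows. This is an illustrative implementation only: the function name `pearson_r` and the height/weight numbers are assumptions made up for the example. The numerator plays the role of Sxy (how much x and y change together) and the denominator the role of Sx · Sy (how much they change separately).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y over their separate variation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how much x and y change together.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much x and y change separately.
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return s_xy / (s_x * s_y)  # undefined if either variable is constant

heights = [150, 160, 170, 180, 190]   # made-up example data
weights = [52, 58, 69, 75, 86]
r = pearson_r(heights, weights)
print(round(r, 3))
```

Here r comes out close to +1, which matches a strong positive relationship between the two made-up variables.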

17
Q

what two aspects does an r value tell you?

A
  • direction
  • strength
18
Q

what value is r when the correlation is positive?

A
  • r is above zero
    0 < r ≤ 1
19
Q

what value is r when the correlation is negative?

A
  • r is below zero
    -1 ≤ r < 0
20
Q

what is the value of r when the correlation is strong?

A
  • if r is close to +1 or -1
    r → ±1
21
Q

what is the r value when the correlation is weak?

A
  • r is close to zero
    r → 0
22
Q

when are r values especially useful?

A
  • useful for values in the middle, e.g., -0.4 to 0.4
23
Q

what does the r-squared value tell you?

A
  • how much of the variance is explained by your correlation
24
Q

what is the r-squared value when correlation explains a lot of variance?

A
  • if r² is close to one
    r² → 1
25
what is the r-squared value when correlation explains only a little variance?
- if r² is close to zero (r² → 0)
26
what other name is r-squared given?
- coefficient of determination
27
what is 1 - r²?
- the amount of variance not explained
- random noise
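
As a rough sketch of what "variance explained" means, r² for a single predictor can be computed as one minus the ratio of leftover (residual) variation to total variation around the mean. The function name `r_squared` and the data are hypothetical, chosen only for illustration.

```python
def r_squared(xs, ys):
    """Share of the variance in y explained by the least-squares line on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx            # least-squares slope
    c = mean_y - m * mean_x    # least-squares intercept
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total

heights = [150, 160, 170, 180, 190]   # made-up example data
weights = [52, 58, 69, 75, 86]
r2 = r_squared(heights, weights)
unexplained = 1 - r2                  # the part left over as noise
print(f"explained: {r2:.3f}, unexplained: {unexplained:.3f}")
```

The two numbers always add up to 1, so a high r² leaves only a small share of the variance unexplained.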
28
what is regression?
- gives you the strengths, directions and equations of relationships
29
what is the regression equation?
y = mx + c
30
what is m in the equation?
- slope
31
what is c in the equation?
- intercept
32
what happens when x = 0?
- y = c (the intercept)
33
what happens to y when x increases by 1?
- y changes by the slope, m (it increases if m is positive and decreases if m is negative)
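
A minimal sketch of how slope and intercept behave, assuming the usual least-squares formulas and made-up data (the helper names `fit_line` and `predict` are hypothetical):

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    return m, c

def predict(x, m, c):
    """Predicted y for a given x on the fitted line."""
    return m * x + c

xs = [0, 1, 2, 3, 4]                 # made-up example data
ys = [1.0, 3.1, 4.9, 7.2, 8.8]
m, c = fit_line(xs, ys)

at_zero = predict(0, m, c)                   # equals the intercept c
step = predict(3, m, c) - predict(2, m, c)   # equals the slope m
print(round(m, 2), round(c, 2))              # 1.97 1.06
```

Setting x = 0 returns the intercept, and moving x up by 1 changes the prediction by exactly the slope.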
34
what do both correlation and regression involve?
- both involve linear relationships between one or more input (predictor) variables and a single output (outcome) variable
35
what data can both correlation and regression deal with?
- categorical, ordinal, and non-linear predictors
36
what relationship does correlation describe?
- single relationship
37
what relationship does regression describe?
- multiple relationships
38
what is the difference between X and Y in correlation compared to regression?
- in correlation, X and Y are interchangeable, whereas in regression they are not
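
To illustrate the difference, here is a small sketch with made-up data and hypothetical helper names: swapping the variables leaves r unchanged, but the regression slope for predicting y from x is not the same as the slope for predicting x from y.

```python
from math import sqrt

def sums(xs, ys):
    """Return the cross-product sum and the two sums of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy, s_xx, s_yy

def pearson_r(xs, ys):
    s_xy, s_xx, s_yy = sums(xs, ys)
    return s_xy / sqrt(s_xx * s_yy)

def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    s_xy, s_xx, _ = sums(xs, ys)
    return s_xy / s_xx

heights = [150, 160, 170, 180, 190]   # made-up example data
weights = [52, 58, 69, 75, 86]

r_hw = pearson_r(heights, weights)
r_wh = pearson_r(weights, heights)            # same value as r_hw
slope_w_on_h = slope(heights, weights)        # weight predicted from height
slope_h_on_w = slope(weights, heights)        # height predicted from weight
print(r_hw == r_wh, slope_w_on_h, slope_h_on_w)
```

The two slopes differ, and neither is simply the reciprocal of the other, which is why the choice of predictor and outcome matters in regression.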
39
do correlation and regression allow prediction?
- correlation doesn't allow prediction
- regression allows prediction
40
what symbols are used for correlations?
- r and r²
41
what symbols are used for regression?
- R
- R²
- F
- t
- SE
- B1 ... Bn (one coefficient per predictor)
42
what does jamovi allow us to explore? what do you use?
- multiple relationships in one go
- use a correlation matrix
43
what does correlation matrix include? what do we calculate?
- includes all the information we need, but we must calculate df ourselves
44
how do you calculate df?
df = n - 2
45
how do you report correlations?
r([df]) = [Pearson's r], p = [p-value]
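
A small sketch of filling in this template, using df = n - 2 from the previous card. The function name `report_correlation`, the rounding choices and the example numbers are assumptions for illustration, not values from any real analysis.

```python
def report_correlation(r, p, n):
    """Format a correlation result as r(df) = value, p = value."""
    df = n - 2  # degrees of freedom for a correlation between two variables
    return f"r({df}) = {r:.2f}, p = {p:.3f}"

# Hypothetical result: r = 0.45, p = 0.012 from n = 30 participants.
line = report_correlation(0.45, 0.012, 30)
print(line)  # r(28) = 0.45, p = 0.012
```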
46
how is the overall regression reported?
- R² = [R² value]
47
what is the model fit of regression?
F([df1], [df2]) = [F-value], p = [p-value]
48
what is multiple linear regression?
- single outcome variable (y) but multiple predictor variables (x1, x2)
49
what do you find in multiple linear regression?
- find the best-fitting surface
50
where are residuals in multiple linear regression?
- residuals are the distances from the data points to the surface
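
As an illustrative sketch with two predictors, the best-fitting surface is a plane y = c + b1*x1 + b2*x2. The version below solves the least-squares normal equations in plain Python; the helper names (`solve`, `fit_plane`) and the data are made up, and statistics packages use more robust numerical methods than this.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_plane(x1s, x2s, ys):
    """Least-squares fit of y = c + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, x1, x2] for x1, x2 in zip(x1s, x2s)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)

# Made-up data lying roughly on y = 1 + 2*x1 + 3*x2.
x1s = [0, 1, 2, 3, 4]
x2s = [1, 0, 2, 1, 3]
ys = [4.1, 2.9, 11.0, 10.1, 17.9]

c, b1, b2 = fit_plane(x1s, x2s, ys)
# Residuals: vertical distance of each point from the fitted plane.
resids = [y - (c + b1 * x1 + b2 * x2) for x1, x2, y in zip(x1s, x2s, ys)]
print(round(c, 2), round(b1, 2), round(b2, 2))
```

With an intercept in the model, the residuals of a least-squares fit sum to (numerically) zero, which is a quick sanity check on the result.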
51
what can the predictors be in multiple linear regression?
- predictors can be almost anything: continuous, ordinal or discrete; normally distributed or not; linear or non-linear
52
what is multiple linear regression said to be?
- flexible e.g., ChatGPT, fMRI, COVID, elections
53
what does each predictor result in?
- an estimate, a standard error, a t-score and a p-value
54
what is the problem with correlation and regression?
- extrapolation: predictions outside the range of the observed data may not hold
55
what do non-linear relationships cause?
- problems: a straight line fits them poorly, so r and the regression line can be misleading
56
what are the solutions to the problems?
- look at the data
- check for mistakes
- perhaps transform the data: quadratic, cubic, logarithmic
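
A brief sketch of the transformation idea, using made-up data that doubles at each step (an assumption chosen so that a logarithmic transform fits perfectly): the raw relationship is curved, so r understates it, while the log-transformed data fall on a straight line.

```python
from math import log, sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from sums of deviations around the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 8, 16, 32, 64]            # made-up exponential growth

r_raw = pearson_r(xs, ys)                       # curved relationship
r_log = pearson_r(xs, [log(y) for y in ys])     # straight line after log
print(round(r_raw, 3), round(r_log, 3))
```

Which transformation suits real data depends on the shape seen in the plot, so looking at the data comes first.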
57
does correlation equal causation?
- no