Week Eleven Flashcards

1
Q

bivariate cases

A

one predictor and one criterion variable.

- The question becomes: do they vary together? If so, can we predict one variable from the other?

2
Q

pearson’s correlation coefficient (r)

A
  • A statistic ranging between -1 and +1 that indicates via its sign and magnitude the direction and strength of a linear relationship between an X and Y variable
    • A negative correlation indicates a relationship where increases in one variable are associated with decreases in the other and vice versa
    • A positive correlation indicates a relationship where increases in one variable are associated with increases in another and decreases with decreases
    • Designed to test for a linear relationship; it is based on the idea of drawing an imaginary straight line through the data points
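As a concrete illustration, Pearson's r can be computed directly from deviation scores. This is a minimal Python sketch; the x and y values are made up purely for illustration:

```python
# Pearson's r from deviation scores, on made-up illustrative data.
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# r = sum of cross-products of deviations / sqrt(SSx * SSy)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
r = sp / sqrt(ss_x * ss_y)
print(round(r, 3))  # a value between -1 and +1; positive here
```

The sign of r gives the direction of the relationship and its magnitude gives the strength, exactly as the card describes.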
3
Q

Z scores

A
  • Z scores tell us whether a score (X or Y) is above (+Z) or below (-Z) a mean (M)
  • by multiplying Z values on X and Y for each person we get crossproducts
  • positive correlation = mostly positive crossproducts
  • negative correlation = mostly negative crossproducts
  • no correlation = equal number of + and - crossproducts
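The crossproduct idea above can be sketched in Python. If Z scores are computed with population SDs (dividing by N), the mean of the Z-score crossproducts is exactly Pearson's r. The data are made up for illustration:

```python
# r as the mean of Z-score crossproducts, on made-up data.
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

def z_scores(values):
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / n)  # population SD
    return [(v - m) / sd for v in values]

zx, zy = z_scores(x), z_scores(y)
crossproducts = [a * b for a, b in zip(zx, zy)]
# Mostly positive crossproducts -> positive correlation
r = sum(crossproducts) / n
print(round(r, 3))
```

For this data most crossproducts are positive (or zero), so r comes out positive, matching the rule on the card.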
4
Q

correlation strength

A
  • 0.9 and above is a very strong relationship.
  • Correlations usually fall in the 0.3–0.5 range.
  • 0.3 is about average; 0.5 is quite strong.
5
Q

correlation tells us

A
  • The direction of the relationship
  • The strength of the relationship

6
Q

regression tells

A
  • Regression helps us actually plot the line that the correlation metaphorically draws through the data
  • We can then use that line to predict scores on our DV
7
Q

line of best fit

A
  • The correlation co-efficient is a measure of how close to the line the data falls.
  • To draw the line of best fit we need two pieces of information which we can calculate based on our X and Y scores
    • The slope
    • The Y-axis intercept
  • Pearson’s r and the slope are very connected.
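These two pieces of information can be computed directly from the X and Y scores: the slope is the sum of cross-products divided by SSx, and the intercept then follows from the two means. A sketch with made-up data:

```python
# Slope and Y-intercept of the line of best fit, on made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# slope b = SP / SSx ; intercept a = mean_y - b * mean_x
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
slope = sp / ss_x                    # 0.6 for this data
intercept = mean_y - slope * mean_x  # 2.2 for this data
print(slope, intercept)
```

Note that the slope's numerator (SP) is the same quantity that appears in the numerator of Pearson's r, which is why the two are so closely connected.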
8
Q

slope: regression co-efficient

A
  • The slope is an indication of the gradient of the line OR its “steepness”
  • In mathematical terms it is how many units of the Y variable (your DV) you increase for every unit increase in your X variable (your IV or predictor)
9
Q

Y intercept

A
  • The intercept is the point at which the line of best fit crosses (or “intercepts” if you will) the Y-axis
  • It is the predicted value of Y when X is zero
  • It can be positive or negative
  • In our case it will be negative (because it crosses the Y-axis in the negative area below the zero point)
10
Q

residuals

A
  • Residual variance is the variance that cannot be explained.
  • Residual scores indicate how far the actual (raw) score is from the predicted score.
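Residual scores are simply the gaps between actual and predicted Y values. A sketch with made-up data, using the least-squares slope (0.6) and intercept (2.2) for that particular data:

```python
# Residuals = actual Y minus predicted Y, on made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = 0.6, 2.2  # least-squares values for this data

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
# Residuals from the least-squares line always sum to (about) zero
print([round(e, 1) for e in residuals])  # -> [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Positive residuals are points above the line, negative residuals are points below it.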

11
Q

line of best fit name

A
  • Our line of best fit is called this because it is the one that overall will have the minimal distances from all the data points.
  • The regression line is often known as the least squares regression line.
  • This property refers to the fact that were we to draw the regression line anywhere else on our scatterplot and work out the sum of the squared residuals, we would always get a larger number than we do for our line.
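This least-squares property is easy to verify numerically: compute the sum of squared residuals for the least-squares line and for any other line, and the least-squares line always gives the smaller number. A sketch with made-up data (0.6 and 2.2 are the least-squares slope and intercept for this particular data):

```python
# Demonstrating the "least squares" property on made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sum_sq_residuals(slope, intercept):
    return sum((yi - (intercept + slope * xi)) ** 2
               for xi, yi in zip(x, y))

best = sum_sq_residuals(0.6, 2.2)   # the least-squares line
other = sum_sq_residuals(0.5, 2.5)  # any other line through the cloud
print(best < other)  # the least-squares line wins
```

Trying any other slope/intercept pair in `other` will give the same result, which is exactly what the card's property claims.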
12
Q

Sum of squared residuals

A
  • A kind of average of the residual scores (their standard deviation) represents the typical distance of our data points from the regression line
  • It also represents the average amount we would be in error if we used the regression line to predict a person’s score
  • We call this the Standard Error of the Estimate or SEE for short
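In sample-based regression the SEE is usually computed as the square root of SSerror divided by N − 2 (the residual df). A sketch with made-up data:

```python
# Standard Error of the Estimate (SEE), on made-up data.
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
slope, intercept = 0.6, 2.2  # least-squares values for this data

predicted = [intercept + slope * xi for xi in x]
ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
see = sqrt(ss_error / (n - 2))  # typical prediction error in Y units
print(round(see, 3))
```

A smaller SEE means the data points hug the regression line more tightly, so predictions from the line are more accurate.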
13
Q

ss (variability)

A
  • Just as with ANOVA data, it is possible to partition the variability in Y-scores into two components:
    • Variability due to error (SSerror).
    • Variability due to regression (SSreg).
  • The total variability (SStotal) will be SSY; that is, the sum of the squared deviations between each Y-score and the mean of Y.
  • The error variability (SSerror) will be the sum of squared deviations between each Y-score and the predicted Y-score.
  • The regression variability (SSreg) will be the sum of squared deviations between each predicted Y-value and the mean of Y.
  • From these SSs we can calculate MSs using 1 and N − 2 df.
  • We can then calculate an F-statistic as MSreg / MSerror.
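The partition above can be sketched numerically. A useful check is that SSreg + SSerror equals SStotal (and r² = SSreg / SStotal). Made-up data, with the least-squares slope and intercept for that data:

```python
# Partitioning SS and computing F, on made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_y = sum(y) / n
slope, intercept = 0.6, 2.2  # least-squares values for this data
predicted = [intercept + slope * xi for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)                  # SSY
ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # SSerror
ss_reg = sum((pi - mean_y) ** 2 for pi in predicted)            # SSreg

ms_reg = ss_reg / 1            # regression df = 1
ms_error = ss_error / (n - 2)  # error df = N - 2
f_stat = ms_reg / ms_error
print(round(f_stat, 2))  # compare against the critical F(1, N-2)
```

For this data SSreg (3.6) plus SSerror (2.4) recovers SStotal (6.0), confirming the partition.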
14
Q

power in regression

A
  • Power (the ability to obtain a significant result when an effect is real) operates as a function of both effect size and sample size
    • It is possible to have a large effect size with a small sample and not obtain a significant result
    • It is possible to have a small effect size with a large sample and obtain a significant result
    • Always temper your interpretation of significance with an examination of the obtained effect size (i.e., is it a meaningful size) and the sample size (i.e., is the sample size so small that it is preventing a decent effect size from being significant, or so large that it is making a negligible effect significant)
15
Q

significance level

A

The significance level is the probability of obtaining the observed F (Fobserved) by chance alone, i.e., when the null hypothesis is true.

16
Q

assumptions for correlations and regression

A
  • Normality
    • Regression and correlation assume that both the X and Y variables have relatively normal distributions.
    • However, this is not a deal-breaker; what matters more is that the residuals are distributed evenly around the line of best fit.
  • Linearity
    • Pearson’s correlation and linear regression both assume that the relationship they are testing is linear.
17
Q

outliers

A
  • Impact of outliers/extreme scores
    • Regression equations and correlations are highly influenced by outliers
    • An outlier is a data point that lies away from the rest of the pack of data
    • If an outlier is in a position consistent with the pattern of the rest of the data it is okay, but if it is inconsistent it will unduly influence the positioning of the regression line
    • This is akin to how the mean is affected by outliers
  • NB: other issues, such as restriction of range, are also problems
18
Q

causation

A

Correlation does not imply Causation:
- A reminder that just because two variables are correlated, it does not automatically follow that one has caused the other.
- Movement on one variable is merely associated with movement on the other.
- You cannot infer the direction of the relationship (though sometimes a certain direction makes more sense) or causality, nor rule out the influence of other variables in producing the correlation between the two variables.