Prediction Flashcards

(23 cards)

1
Q

prediction

A

using scores on one variable (X) to predict scores on another variable (Y), based on the known correlation between them, rXY

  • a method for predicting Y from X, using information about the relationship between X and Y
2
Q

Note on Prediction

A

First, you need a sample in which both pieces of information (X and Y) are available and the correlation between them has been established

Then, use this information in a new sample where only X scores are available, in order to predict the Y scores

3
Q

Features of Prediction

A
  • often a temporal asymmetry: X is measured on an earlier occasion than Y
  • Y is referred to as a dependent or criterion variable
  • X is referred to as an independent or predictor variable
4
Q

Simple VS Multiple Regression

A
  • simple regression: one X and one Y
  • multiple regression: more than one X

this course focuses on simple regression

5
Q

Regression Equation

A

also called a prediction equation

Y' = a + bX (like the straight-line equation y = a + bx from high school)

  • the equation describes a straight line that best fits the data points in two-dimensional (X-Y) space
6
Q

Mapping Correlation to Prediction

A

Y' (or Ŷ) = predicted score on Y
(with no information from X, the best prediction is simply Y' = My)

Step 1: Convert X to a z-score: Zx = (X - Mx) / Sx

Step 2: Apply the prediction equation in z-scores: Zy' = rZx

Step 3: Convert Zy' back to the predicted raw score Y': Y' = My + Zy'Sy
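A minimal sketch of these three steps in Python (the x/y data and the new X value are made up for illustration; statistics.correlation needs Python 3.10+):

import statistics as stats

# made-up sample where both X and Y are known
x = [2, 4, 5, 7, 9]
y = [10, 14, 13, 18, 20]

mx, my = stats.mean(x), stats.mean(y)
sx, sy = stats.stdev(x), stats.stdev(y)   # sample standard deviations
r = stats.correlation(x, y)               # r_XY

def predict_y(new_x):
    zx = (new_x - mx) / sx    # Step 1: convert X to a z-score
    zy_pred = r * zx          # Step 2: Zy' = r * Zx
    return my + zy_pred * sy  # Step 3: back to a raw predicted score Y'

print(predict_y(6))           # predicted Y for a hypothetical new case with X = 6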

7
Q

If r = +1.0, then Zy' = Zx (a thought experiment)

A

we would typically never get r = +1.0 in practice

  • the predicted z-score on Y would exactly equal the observed z-score on X, so every prediction would be perfect
  • this would require a perfect relationship between the two variables
8
Q

If r is between 0 and 1,

A

X provides some information to help us predict Y, but the relationship is not perfect (other factors and noise are involved)

9
Q

Note 2 on Prediction

A

The amount by which our prediction (Zy') differs from 0 (i.e. from the mean of Y) will depend on the strength of r

10
Q

The optimal prediction is given by:

A

Zy’ = rZx (prediction equation in terms of z-scores)

the two extremes are r = 0 and r = 1

  • typical predictions of Y based on X lie somewhere in between these extremes
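A tiny illustration of how Zy' = rZx behaves at those extremes and in between (the Zx value and the r values below are made up):

zx = 1.5                      # a hypothetical case 1.5 SDs above the mean on X
for r in (0.0, 0.4, 0.8, 1.0):
    print(r, r * zx)          # r = 0 predicts the mean (Zy' = 0); r = 1 predicts Zy' = Zx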
11
Q

The prediction equation on a graph

A
  • in z-score form, the regression line passes through the origin and has a slope of r
  • prediction equation describes the line of best fit through the scatter plot of Zy against Zx
12
Q

Building predictions from raw scores

A

Y' = a + bX

where b = r(Sy/Sx) (the slope)

and a = My - bMx (the intercept)
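A short sketch showing how these raw-score formulas could be computed, using the same style of made-up data as above:

import statistics as stats

x = [2, 4, 5, 7, 9]                      # made-up predictor scores
y = [10, 14, 13, 18, 20]                 # made-up criterion scores

r = stats.correlation(x, y)
b = r * stats.stdev(y) / stats.stdev(x)  # slope:     b = r * Sy / Sx
a = stats.mean(y) - b * stats.mean(x)    # intercept: a = My - b * Mx

y_pred = [a + b * xi for xi in x]        # Y' = a + bX for every case
print(a, b)
print(y_pred)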

13
Q

Jamovi for Linear Regression

A

in the jamovi output, “Estimate” refers to the raw-score regression: the intercept row gives a, and the row for the X variable gives b

  • jamovi will also produce standardised estimates, which correspond to the z-score equation, if you select standardised estimates under model coefficients
14
Q

r^2

A

r^2 = proportion of variance accounted for

it tells us how well we are making predictions, and whether this model is worth keeping

(Y - My) = (Y' - My) + (Y - Y')

Deviation = prediction + error (residual)

i.e. the deviation of a Y score from the mean = the deviation of the predicted score from the mean + the difference between the actual and predicted score

15
Q

r^2 in equations

A

(Y - My) = (Y' - My) + (Y - Y')

Σ(Y - My)^2 = Σ(Y' - My)^2 + Σ(Y - Y')^2

SS(total) = SS(regression) + SS(residual)

16
Q

Finding scores due to prediction (r^2)

A

r^2 = Σ(Y' - My)^2 / Σ(Y - My)^2

= SS(regression) / SS(total)

the proportion of variability in Y scores associated with changes in X (i.e. due to prediction)

17
Q

Finding scores not due to prediction

A

1 - r^2 = Σ(Y - Y')^2 / Σ(Y - My)^2

= SS(residual) / SS(total)

the proportion of variability in Y scores not associated with changes in X (i.e. not due to prediction)

the regression line is defined so that SS(residual) is minimised (based on the least squares criterion)

no other straight line will generate a smaller SS(residual) than Y' = a + bX
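A quick numerical check of this decomposition (made-up data again); it also confirms that SS(regression)/SS(total) reproduces r^2:

import statistics as stats

x = [2, 4, 5, 7, 9]
y = [10, 14, 13, 18, 20]

r = stats.correlation(x, y)
b = r * stats.stdev(y) / stats.stdev(x)
a = stats.mean(y) - b * stats.mean(x)
my = stats.mean(y)
y_pred = [a + b * xi for xi in x]

ss_total      = sum((yi - my) ** 2 for yi in y)
ss_regression = sum((yp - my) ** 2 for yp in y_pred)
ss_residual   = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))

print(ss_total, ss_regression + ss_residual)  # the two should match
print(r ** 2, ss_regression / ss_total)       # r^2 = SS(regression) / SS(total)
print(1 - r ** 2, ss_residual / ss_total)     # 1 - r^2 = SS(residual) / SS(total)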

18
Q

Error in prediction when r is large

A
  • y values will cluster around y’ values
  • a larger proportion of Sy^2 is accounted for by prediction
19
Q

Error in prediction when r is small

A
  • y values will vary more widely around y’ values
  • smaller proportion of Sy^2 is accounted for by prediction
20
Q

Note on r^2

A

r and r^2 convey information about how well we can predict scores on Y for the sample as a whole

  • what about how well we are predicting individual subjects? to answer that, first look at the assumptions being made when carrying out a prediction analysis
21
Q

Assumptions for Linear Regression

A

in the population, X and Y form a bivariate normal distribution

  • both variables come from normal distributions

For each X there is a normal distribution of Y scores

  • Y' is what we expect Y to be on average, given that value of X
  • we won't predict Y exactly every time, but the Y scores for that X are normally distributed, and values closest to Y' are the most likely
  • Y' is the mean of that distribution

Linearity

  • X and Y are linearly related

Homoscedasticity

  • the variance of the distribution of Y scores is the same for each X score
  • each of these Y distributions should have the same standard deviation
22
Q

When assumptions of linear regression are met:

A

can use prediction to estimate:

  • percentage of cases that are a certain distance from their predicted value
  • probability of a score being a certain distance from its predicted value
23
Q

Standard Error of Estimate

A

the SD of the distribution of observed Y scores around their corresponding predicted scores

  • measures predictive error
  • under the assumption of bivariate normality, Syx is the standard deviation of the normal distribution of Y scores for any value of X

equation:

Syx = √[ Σ(Y - Y')^2 / (n - 2) ]

an alternative form (when sample sizes are large):

Syx = Sy √[ (n - 1)(1 - r^2) / (n - 2) ]
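A sketch of how Syx could be computed both ways, and how (as the previous card describes) a normal distribution could then be used to estimate the proportion of cases within a given distance of their predicted value; all numbers are illustrative:

import math
import statistics as stats

x = [2, 4, 5, 7, 9]
y = [10, 14, 13, 18, 20]
n = len(x)

r = stats.correlation(x, y)
b = r * stats.stdev(y) / stats.stdev(x)
a = stats.mean(y) - b * stats.mean(x)
y_pred = [a + b * xi for xi in x]

ss_residual = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
syx = math.sqrt(ss_residual / (n - 2))                                # definition form
syx_alt = stats.stdev(y) * math.sqrt((n - 1) * (1 - r**2) / (n - 2))  # alternative form

# assuming bivariate normality, the proportion of cases expected to fall
# within +/- 1 Syx of their predicted value:
within_one_syx = stats.NormalDist().cdf(1) - stats.NormalDist().cdf(-1)

print(syx, syx_alt, within_one_syx)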