## True score

- A person's average score on a test over an infinite number of repeated administrations (random error would average out over an infinite number of tests). It describes the theoretical performance on a test.

- In CTT, 'T' incorporates systematic error.

## Systematic error

- Error that affects the individual the same way each time he or she takes the test (e.g., reading ability, test-wiseness).

- In CTT, systematic error is incorporated in the true score 'T'

- Affects validity of scores, NOT reliability

## Unsystematic error

- Random error that affects individuals differently each time a test is taken (e.g., noise, anxiety).

- In CTT, 'E' refers to unsystematic error.

## Classical Test Theory (CTT)

X = T + E

where X is observed score, T is true score and E is error (unsystematic)
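A minimal simulation sketch of X = T + E, using only Python's standard library. The means, standard deviations, sample size, and the function name `simulate_observed_scores` are illustrative assumptions, not values from these notes. It shows that when E is random and unrelated to T, observed score variance is approximately true score variance plus error variance.

```python
import random
import statistics


def simulate_observed_scores(n_examinees=5000, true_mean=50.0, true_sd=10.0,
                             error_sd=5.0, seed=0):
    """Draw hypothetical true scores and random errors, then form X = T + E."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_examinees)]
    # Unsystematic error: mean zero and generated independently of T.
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


true_scores, errors, observed = simulate_observed_scores()

var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_x = statistics.pvariance(observed)

# Share of observed score variance that is true score variance.
reliability = var_t / var_x

print(f"var(T) = {var_t:.1f}, var(E) = {var_e:.1f}, var(X) = {var_x:.1f}")
print(f"var(T) / var(X) = {reliability:.2f}")
```

With the assumed standard deviations (10 for T, 5 for E), the ratio should land near 100 / 125 = 0.80, up to sampling noise.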

## Index of Reliability

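- The correlation between observed scores (X) and true scores (T). Its square is the proportion of observed score variance that is true score variance (the reliability coefficient).

- Quantifies the closeness of the relationship between X and T for a set of examinees.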
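The relationship between X and T can be written out in variance terms. This is a short derivation sketch, assuming T and E are uncorrelated; the σ and ρ symbols are notation introduced here, and R_xx is the reliability coefficient.

```latex
\begin{aligned}
\operatorname{Cov}(X, T) &= \operatorname{Cov}(T + E,\, T) = \sigma_T^2
  && \text{(since } \operatorname{Cov}(T, E) = 0\text{)} \\
\rho_{XT} &= \frac{\operatorname{Cov}(X, T)}{\sigma_X \sigma_T} = \frac{\sigma_T}{\sigma_X} \\
\rho_{XT}^2 &= \frac{\sigma_T^2}{\sigma_X^2} = R_{xx}
\end{aligned}
```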

## Reliability Estimates

1) alternate forms reliability

2) test-retest reliability

3) split-half reliability

4) Cronbach's coefficient alpha

5) Kuder-Richardson formula 20 (KR-20)
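A rough sketch of the last two estimates, written from the standard textbook formulas rather than from anything specific in these notes. The function names and the small response matrix are made up for illustration. Population variances are used throughout, so that for right/wrong (0/1) items alpha and KR-20 coincide.

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha. item_scores: one row per examinee, one column per item."""
    k = len(item_scores[0])
    item_vars = [statistics.pvariance([row[i] for row in item_scores])
                 for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items, using p * q as each item's variance."""
    k = len(item_scores[0])
    n = len(item_scores)
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n  # proportion answering correctly
        sum_pq += p * (1 - p)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical data: 6 examinees x 4 items scored right (1) or wrong (0).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(responses)
kr = kr20(responses)
print(f"alpha = {alpha:.3f}, KR-20 = {kr:.3f}")
```

KR-20 is the special case of alpha for dichotomous items, which is why the two values match on this data.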

## Sources of error

1) content sampling error: error due to the items selected / item heterogeneity

2) time sampling error: error due to daily fluctuations that affect test performance

3) scorer error: error due to scorer variability in test-retest

## Variables that affect reliability

1) test length: longer tests increase reliability

2) group heterogeneity: greater diversity increases reliability

3) item difficulty: items of medium difficulty increase reliability
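The effect of test length is commonly illustrated with the Spearman-Brown prophecy formula. That formula is not named in these notes and is brought in here as an assumption, and it holds only when the added items are comparable to the existing ones. A small sketch with a made-up starting reliability of 0.70:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by `length_factor`.

    Assumes the added (or removed) items are comparable to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


original = 0.70                          # hypothetical reliability of the current test
doubled = spearman_brown(original, 2)    # twice as many items
halved = spearman_brown(original, 0.5)   # half as many items

print(f"original = {original:.2f}, doubled = {doubled:.2f}, halved = {halved:.2f}")
```

Doubling the test raises the predicted reliability and halving it lowers it, consistent with point 1 above.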

## observed score

### True score + error (X = T + E); result obtained from a single sampling

## reliability

• The extent to which test scores remain consistent over repeated administrations of the same or parallel test

• The degree to which test scores are free from measurement error

• Increases with 1) greater test length, 2) greater group heterogeneity, and 3) item difficulty closer to medium.

## Reliability coefficient

• The proportion of observed score variance that is true score variance (Rxx)

## standard error of measurement

• average size of error scores

• helps to interpret accuracy of test scores

• $\text{SEM} = s_o \sqrt{1 - R_{xx}}$, where $s_o$ = standard deviation of observed scores and $R_{xx}$ = reliability

• if Rxx = 1, sem = 0
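A small sketch of the SEM formula above. The function names, the example numbers (observed SD of 15, reliability of 0.91), and the ±1.96 SEM band are illustrative assumptions. The band is one common way to use the SEM when interpreting a score, not something these notes prescribe.

```python
import math


def standard_error_of_measurement(observed_sd, reliability):
    """SEM = s_o * sqrt(1 - Rxx)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return observed_sd * math.sqrt(1.0 - reliability)


def score_band(observed_score, sem, z=1.96):
    """Band of +/- z SEMs around an observed score (z = 1.96 is an assumed choice)."""
    return observed_score - z * sem, observed_score + z * sem


sem = standard_error_of_measurement(observed_sd=15.0, reliability=0.91)
low, high = score_band(observed_score=100.0, sem=sem)

print(f"SEM = {sem:.2f}")
print(f"band around 100: {low:.1f} to {high:.1f}")

# A perfectly reliable test has no measurement error.
perfect = standard_error_of_measurement(observed_sd=15.0, reliability=1.0)
```

Higher reliability shrinks the SEM and so narrows the band around any observed score.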

## tau equivalence

tests are “parallel” if:

a) the tests measure the same psychological construct – the true scores on one test are equal to the true scores on the other test

b) the tests have the same level of error variance

if the items meet tau equivalence, then alpha, KR-20, and the split-half reliability will all give identical and accurate estimates of reliability

## essential tau equivalence

this is less strict than tau equivalence

theoretically, this means that the true scores for two tests (or two versions of a test) are the same

this is estimated, practically speaking, by seeing if the observed scores on the two tests (or test versions) have the same (or nearly the same) mean

the requirement for equal error variances (as seen for tau equivalence) is not made

if the items only meet essential tau equivalence, then alpha and KR-20 will give identical and accurate estimates, but the split-half reliability estimate will not

## what happens to alpha, KR-20 and split-half reliability if neither tau nor essential tau equivalence is met?

if the items meet neither tau equivalence nor essential tau equivalence, then alpha and KR-20 will underestimate the reliability (although it is not known by how much or how little) and the split-half reliability estimate will be inaccurate

## split-half reliability

### reliability as an estimate of internal consistency: divide the test into two halves (odd vs. even numbered items, or items 1-10 vs. 11-20) and then correlate performance on the two halves.
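A sketch of the odd/even split described above, with a hand-written Pearson correlation so it needs nothing beyond the standard library. The response data and function names are hypothetical. The final Spearman-Brown step-up is a common follow-on that is not mentioned in this card and is included as an assumption, since the raw correlation describes only half-length tests.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def split_half(item_scores):
    """Correlate totals on odd-numbered items with totals on even-numbered items."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical data: 6 examinees x 4 items scored right (1) or wrong (0).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

r_halves = split_half(responses)
# Optional step-up to full test length (Spearman-Brown, assumed here).
r_full = (2 * r_halves) / (1 + r_halves)

print(f"half-test correlation = {r_halves:.3f}, stepped up = {r_full:.3f}")
```

A different split (for example first half vs. second half) can give a different value on the same data.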

## alternate forms reliability

### an estimate of reliability as equivalence
