Week 2 - Score Normalisation and Reliability Flashcards

(17 cards)

1
Q

‘What’ and ‘Why’ of psychological measurement

A

What - quantifying behaviour, attitudes and feelings to make inferences about constructs (unobservable attributes)
Why - assessment must be objective, so tests use standardisation to avoid bias

2
Q

Score normalisation

A

Raw score - not meaningful without a comparison point, e.g. criterion-referenced (pass mark) or norm-referenced (relative to others)
Derived score - raw score transformed to show someone’s position relative to the normative sample (percentiles and z-scores)

3
Q

Percentiles

A

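Percentage of people in the normative sample who fall below a particular raw score
(number of scores below / total number of scores) x 100
p50 (median), p25 (first quartile), p75 (third quartile)
Advantages - easy comparison, easy to understand, universal
Limitation - inequality of units (equal percentile differences do not correspond to equal raw score differences, because scores cluster near the middle of the distribution)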
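
A minimal Python sketch of the percentile formula above. The function name `percentile_rank` and the normative sample are made up for illustration:

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of scores in the normative sample that fall below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return below / len(norm_scores) * 100

# Hypothetical normative sample of 10 raw scores
norm_scores = [12, 15, 15, 18, 20, 21, 23, 25, 27, 30]

print(percentile_rank(21, norm_scores))  # 5 of 10 scores are below 21 -> 50.0
```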

4
Q

Standard scores (z)

A

Measure of how extreme a score is relative to the normative sample (in SD units)
z = (X - M)/SD
Universally applicable - can be calculated for anything if M and SD are known
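
A short Python sketch of the z formula. The two tests and their M and SD values are hypothetical, chosen only to show that scores from different scales become comparable once converted to z:

```python
def z_score(x, mean, sd):
    """Distance of score x from the normative mean, in SD units."""
    return (x - mean) / sd

# Hypothetical test A: M = 50, SD = 10
print(z_score(65, 50, 10))    # 1.5

# Hypothetical test B: M = 100, SD = 15
print(z_score(115, 100, 15))  # 1.0
```

Here the score of 65 on test A is more extreme than the score of 115 on test B, even though the raw number is smaller.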

5
Q

Transformations of z-scores

A

T-score (0-100) = (z x 10) + 50
Sten score (1-10) = (z x 2) + 5.5
Deviation IQ (25-175) = (z x 15) + 100
Stanine (1-9) = normal distribution divided into nine categories, each with a fixed percentage of scores
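
A hedged Python sketch of these transformations. Deviation IQ here uses the conventional mean of 100 and SD of 15. The stanine cut-offs assume the commonly quoted category percentages (4, 7, 12, 17, 20, 17, 12, 7, 4), which this card does not list, and how boundary values are assigned is an arbitrary choice here. Function names are illustrative:

```python
from bisect import bisect_right

def t_score(z):
    return z * 10 + 50

def sten(z):
    # In practice stens are usually rounded to a whole number from 1 to 10
    return z * 2 + 5.5

def deviation_iq(z):
    return z * 15 + 100

# Cumulative percentages at the top of stanines 1..8
# (assumed category percentages: 4, 7, 12, 17, 20, 17, 12, 7, 4)
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile):
    """Map a percentile (0-100) to a stanine (1-9)."""
    return bisect_right(STANINE_CUTS, percentile) + 1

z = 1.5
print(t_score(z))       # 65.0
print(sten(z))          # 8.5
print(deviation_iq(z))  # 122.5
print(stanine(93))      # z = 1.5 is roughly the 93rd percentile -> 8
```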

6
Q

Normalised standard scores

A

Standard scores from different tests are only comparable if the score distributions have a similar shape
A distribution can be normalised to force comparability
Raw score > percentile > normal curve frequency table > normalised z-score
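
The chain above can be sketched in Python, with `statistics.NormalDist().inv_cdf` standing in for the normal curve table lookup. This is an illustrative sketch, not the course's procedure: the midpoint handling of ties is an assumption made so that the proportion never reaches exactly 0 or 1, and the skewed sample is made up.

```python
from statistics import NormalDist

def normalised_z(raw_score, norm_scores):
    """Raw score -> percentile (as a proportion) -> z from the normal curve.

    Assumes raw_score occurs in norm_scores; ties are counted as half
    below (midpoint convention), which keeps the proportion inside (0, 1).
    """
    n = len(norm_scores)
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    proportion = (below + 0.5 * equal) / n
    return NormalDist().inv_cdf(proportion)

# Hypothetical, strongly skewed normative sample
norm_scores = [1, 2, 3, 10, 50]

for raw in norm_scores:
    print(raw, round(normalised_z(raw, norm_scores), 2))
```

The middle score (3) maps to a normalised z of about 0 regardless of how far the top scores stretch, because only rank position carries through the percentile step.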

7
Q

Specificity of Norms

A

Norms (M, SD) are specific to the population they are derived from
Problems - most norms come from WEIRD (Western, Educated, Industrialised, Rich, Democratic) samples; lack of Australian norms

8
Q

Test reliability

A

A good test is reliable (reproducible) and valid (measuring what is intended)
Reliability - rxx (correlation between scores on two administrations of test)
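
A minimal Python sketch of rxx as a Pearson correlation between two administrations. The scores are invented and `pearson_r` is a hand-rolled helper written for this example:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people on two administrations
time1 = [10, 12, 15, 18, 20]
time2 = [11, 11, 16, 17, 21]

rxx = pearson_r(time1, time2)
print(round(rxx, 2))
```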

9
Q

True score theory

A

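People have a ‘true score’, but we can never measure it directly because every measurement contains error
If we administered the same test to one person an infinite number of times (with no practice or memory effects), the mean of the resulting distribution of observed scores would be the ‘true score’ and its SD would be the standard error of measurement (SEm)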
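
A small simulation can illustrate the idea, using a large number of administrations in place of an infinite one. The true score of 100, the SEm of 5 and the assumption of normally distributed error are all hypothetical choices for this sketch:

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SCORE = 100  # hypothetical true score
SEM = 5           # hypothetical standard error of measurement

# Each administration: observed score = true score + random error
observed = [TRUE_SCORE + random.gauss(0, SEM) for _ in range(10_000)]

print(round(mean(observed), 1))   # close to TRUE_SCORE
print(round(stdev(observed), 1))  # close to SEM
```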

10
Q

Reliability and true scores

A

Individual - observed score (x) = true score (t) + error (e)
Sample - observed score variance (s²x) = true score variance (s²t) + error variance (s²e)
Reliability (rxx) = s²t / s²x
Proportion of error variance = s²e / s²x = 1 - rxx
Thus, reliability is the proportion of observed score variance that is due to true score variance
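
A worked Python sketch of the variance decomposition. The true scores and errors are made up, and are deliberately constructed so that error is uncorrelated with true score, which the decomposition assumes:

```python
from statistics import pvariance

# Hypothetical true scores and errors for four people
# (errors chosen to be uncorrelated with the true scores)
true_scores = [90, 90, 110, 110]
errors = [-5, 5, -5, 5]
observed = [t + e for t, e in zip(true_scores, errors)]  # x = t + e

t_var = pvariance(true_scores)  # true score variance: 100
e_var = pvariance(errors)       # error variance: 25
x_var = pvariance(observed)     # observed variance: 125 = 100 + 25

rxx = t_var / x_var
print(rxx)                 # 0.8
print(round(1 - rxx, 2))   # proportion of error variance: 0.2
```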

11
Q

Test-retest

A

Exact same test administered to the same people on two occasions
Error variance - changes in conditions and test-takers between occasions (environmental threats, time-related factors, order effects)

12
Q

Alternate forms

A

Two versions of test administered to same people (immediate or delayed)
Error variance - differences in content, time sampling (delay)

13
Q

Internal consistency

A

Split-half reliability
- One test administered but split into two halves
- Error variance - content sampling
- Problems - deciding on split, timed tests, halving test already reduces reliability
Cronbach’s Alpha and Kuder-Richardson
- One test administered, every item compared with every other item; the coefficient reflects the average inter-item relationship and the number of items (equivalent to the mean of all possible split-half coefficients)
- Error variance - content sampling, heterogeneity of behaviour domain
- Cronbach’s Alpha - for items on a scale with more than 2 options
- Kuder-Richardson - for dichotomous items
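
A hedged Python sketch of Cronbach’s alpha, using the standard variance form alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). That formula is not stated on this card, and the response data are invented. Applied to items scored 0/1, the same calculation gives KR-20.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per person, one column per item."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to per-item columns
    sum_item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

# Hypothetical responses: 5 people x 3 items on a 1-5 scale
responses = [
    [3, 4, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 4, 5],
    [2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```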

14
Q

Inter-rater reliability

A

Two raters independently score the same individuals on a test that requires subjective judgement
Error variance - differences between raters

15
Q

Additive sources of error variance

A

Error variance estimates from different reliability coefficients can be added together to estimate total error variance (as long as each coefficient captures a different source of error)
E.g. delayed alternate forms (time + content sampling) + inter-rater (rater differences)
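
A small arithmetic sketch in Python. The two reliability coefficients are hypothetical, and each is assumed to capture a different, non-overlapping source of error:

```python
def total_error_variance(reliabilities):
    """Sum the error proportions (1 - r) from coefficients that each
    capture a different source of error."""
    return sum(1 - r for r in reliabilities)

delayed_alternate_forms = 0.80  # hypothetical: time + content sampling error
inter_rater = 0.92              # hypothetical: rater differences

error = total_error_variance([delayed_alternate_forms, inter_rater])
print(round(error, 2))      # 0.20 + 0.08 -> 0.28
print(round(1 - error, 2))  # estimated true score variance -> 0.72
```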

16
Q

Sample impact on reliability

A

rxx is affected by the amount of individual difference (score variability) in the sample
rxx decreases as the sample becomes more homogeneous (true scores are similar, so a larger share of the observed variation is error)
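
A brief Python illustration, assuming error variance stays the same while true score variance shrinks in a more homogeneous sample. All variance values are hypothetical:

```python
def reliability(true_var, error_var):
    """rxx = true score variance / observed score variance."""
    return true_var / (true_var + error_var)

ERROR_VAR = 20  # hypothetical error variance, held constant

print(reliability(80, ERROR_VAR))  # heterogeneous sample -> 0.8
print(reliability(20, ERROR_VAR))  # homogeneous sample   -> 0.5
```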

17
Q

How high does rxx need to be?

A

Should be > .8

Nunnally’s heuristic:
- 0.5 for test development
- 0.7 for test in research
- 0.9 for individual assessment