Week 4 - Test Construction and Item Response Theory Flashcards

(14 cards)

1
Q

What is a psychological test?

A

Standardised measure of a sample of behaviour (questionnaire, instrument, survey)
To measure ability/achievement or personality

2
Q

Substantive and structural validity

A

Substantive (construct) validity - conceptualisation (precise description), create item pool (iterative)
Structural validity - item selection and psychometric evaluation
- Selection strategies - empirical/criterion-based, internal consistency, item response theory

3
Q

Substantive validity

A

Items must be simple and appropriate for target population
Avoid - colloquialisms, jargon, items everyone/no one will endorse
Response format - dichotomous, Likert-rating, visual scale/slider

4
Q

Structural validity

A

Empirical/criterion-based - items selected that discriminate between individuals (pure empirical problematic, use rational-empirical)
Internal consistency - Cronbach’s alpha, item-total score correlation (both assume unidimensional construct, may give redundant questions)
Factor analysis - shows whether things are uni/multi-dimensional (grouping of items into factors)
Item response theory
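The two internal consistency statistics named above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names and the `responses` data are hypothetical, and sample variances (n - 1 denominator) are assumed throughout.

```python
from statistics import mean, variance

def cronbach_alpha(scores):
    """scores: one list of item scores per respondent (all the same length)."""
    k = len(scores[0])
    item_columns = list(zip(*scores))  # regroup scores by item
    sum_item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def corrected_item_total(scores):
    """Correlate each item with the total of the remaining items."""
    totals = [sum(row) for row in scores]
    result = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [t - i for t, i in zip(totals, item)]  # total excluding this item
        result.append(pearson(item, rest))
    return result

# Hypothetical data: 5 respondents answering 4 Likert items (1-5).
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 3))
print([round(r, 3) for r in corrected_item_total(responses)])
```

A very high alpha can itself signal the redundancy problem noted above, since near-duplicate items correlate strongly with one another.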

5
Q

Classical test theory

A

Estimating true score and confidence intervals
Criticisms - sample-dependent estimates, assumes item similarity, assumes equal spacing between Likert response options, tests tend to be long, adding/removing items requires recalculating reliability, SEM is a single estimate for the whole scale
However, scores are simple to obtain
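A minimal sketch of the usual classical test theory arithmetic, assuming the standard formulas SEM = SD × √(1 − reliability) and the regression-based estimate of the true score. The function names and the example numbers (M = 100, SD = 15, reliability = .91) are illustrative assumptions, and centring the interval on the observed score is only one of the conventions in use.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM: one value applied to every score on the scale."""
    return sd * math.sqrt(1.0 - reliability)

def estimated_true_score(observed, mean, reliability):
    """Regress the observed score towards the group mean."""
    return mean + reliability * (observed - mean)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Approximate 95% band around the observed score (z = 1.96)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical scale: M = 100, SD = 15, reliability = .91, observed score 120.
sem = standard_error_of_measurement(15, 0.91)
true_score = estimated_true_score(120, 100, 0.91)
low, high = confidence_interval(120, 15, 0.91)
print(round(sem, 2), round(true_score, 1), round(low, 2), round(high, 2))
```

Because the SEM depends only on the scale's SD and reliability, the band has the same width for every test-taker, which is the "single value for whole scale" criticism.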

6
Q

Item response theory

A

Aim to produce sample invariant estimates
Relationship between individual items and ‘latent dimension’ θ
Challenges - assumes unidimensional construct, assumes responses don’t influence each other

7
Q

Item characteristic curve intuition

A

Assume θ is unidimensional
Questions placed on continuum based on difficulty
Individuals placed on the same continuum according to ability (the point where they have a 50% chance of answering correctly)

8
Q

Item characteristic curve

A

Relationship between probability of a correct response and the underlying dimension (cumulative normal distribution)
Method of determining characteristics of individual items in terms of difficulty and discrimination
Helpful as it shows the cumulative chance of a correct response as ability (θ) increases
The point on the latent dimension at which the probability of a correct response is 50% gives the estimate of the item’s difficulty (the point of discrimination)
Slope shows how quickly the probability of a correct response rises around that point (how discriminating the question is); position and slope are determined by the M and SD of the cumulative normal
Use computers to create
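A minimal sketch of an item characteristic curve as a cumulative normal, using only the standard library. The function name and the example M and SD values are assumptions for illustration; M plays the role of item difficulty and a smaller SD gives a steeper, more discriminating curve.

```python
import math

def normal_ogive_icc(theta, m, sd):
    """Probability of a correct response at ability theta.

    m  : location of the curve (the 50% point, i.e. item difficulty)
    sd : spread of the curve (smaller sd = steeper slope)
    """
    z = (theta - m) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Two hypothetical items with the same difficulty but different slopes.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    steep = normal_ogive_icc(theta, m=0.0, sd=0.5)
    shallow = normal_ogive_icc(theta, m=0.0, sd=2.0)
    print(theta, round(steep, 3), round(shallow, 3))
```

Both items pass through 50% at θ = 0, but the steep item separates test-takers just below and just above that point far more sharply.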

9
Q

ICC - measure of difficulty

A

Horizontal position = item difficulty (left is easier, right is harder)
Slope = item discriminability (steeper is more discrimination)

10
Q

ICC - measure of pseudo-guessing

A

Essentially a ‘raised floor’
When guessing, a test-taker is unlikely to score below chance, so the curve’s lower asymptote starts at the chance level (e.g. 0.25 for a four-option multiple-choice item)
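A minimal sketch of the raised floor, assuming the common logistic form of the three-parameter model. The parameter names `a` (slope), `b` (difficulty) and `c` (floor) and the example values are assumptions, not taken from the cards.

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response with a guessing floor c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical four-option multiple-choice item: chance level = 0.25.
for theta in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(theta, round(icc_3pl(theta, a=1.5, b=0.0, c=0.25), 3))
```

With a floor, the probability at θ = b is c + (1 − c)/2 rather than 50%, so the 50% rule of thumb for reading difficulty off the curve applies only when c = 0.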

11
Q

1, 2 and 3 parameter models

A

1 parameter (Rasch) - assumes items differ only in horizontal position (simple but often fits data poorly)
2 parameter - items differ in horizontal position and slope
3 parameter - adds a ‘floor’ (pseudo-guessing) parameter (more widely used)
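For reference, the three models can be written in their common logistic form. The symbols are assumptions for illustration: b_i is the horizontal position (difficulty) of item i, a_i its slope (discrimination) and c_i its floor (pseudo-guessing); some texts include an extra scaling constant in the exponent.

```latex
\begin{align}
\text{1PL (Rasch):}\quad P_i(\theta) &= \frac{1}{1 + e^{-(\theta - b_i)}} \\
\text{2PL:}\quad P_i(\theta) &= \frac{1}{1 + e^{-a_i(\theta - b_i)}} \\
\text{3PL:}\quad P_i(\theta) &= c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
\end{align}
```

Each model nests the previous one: setting c_i = 0 in the 3PL gives the 2PL, and fixing a_i = 1 in the 2PL gives the 1PL.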

12
Q

IRT and Likert scales

A

IRT does not assume equal spacing between response options (instead it shows the likelihood of a response falling in each category at a given θ)
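One way to see this is a graded response model, sketched below. The cards do not name a specific polytomous model, so the choice of model, the function names and the threshold values are all assumptions. The thresholds are deliberately unevenly spaced to show that nothing forces equal distances between categories.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def graded_response_probs(theta, a, thresholds):
    """Probability of each response category at ability theta.

    thresholds: ascending boundaries between adjacent categories
                (K - 1 thresholds for K categories).
    """
    # P(response >= category k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical 5-point Likert item with unequal category boundaries.
thresholds = [-2.0, -0.5, 0.0, 2.5]
for theta in (-3.0, 0.0, 3.0):
    probs = graded_response_probs(theta, a=1.5, thresholds=thresholds)
    print(theta, [round(p, 3) for p in probs])
```

The output is a probability for each category at each θ, rather than a score that treats "agree" and "strongly agree" as one equal step apart.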

13
Q

Adaptive testing

A

With IRT parameters, can choose items that provide best estimate of a person’s θ
Two-stage procedure - short initial test covering all difficulties, followed by an appropriate second test chosen on the basis of that score
Computerised adaptive testing - difficulty changes with each question (right = next one harder, wrong = next one easier), start with a question of high discriminability and average difficulty, maximise the information gained from each successive question, finish at a set number of questions, time limit or SE
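The computerised procedure can be sketched as a loop. This is a simplified illustration under stated assumptions, not how any particular testing system works: items follow a two-parameter logistic model, θ is re-estimated as a posterior mean over a grid with a standard normal prior, and the item bank, function names and stopping values are all hypothetical.

```python
import math

GRID = [x / 10.0 for x in range(-40, 41)]  # candidate theta values, -4 to 4

def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2PL item information: highest when theta is near b and a is large."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next_item(theta, bank, used):
    """Choose the unused item that is most informative at the current theta."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

def update_theta(responses, bank, grid=GRID):
    """Posterior mean and SD of theta given (item index, correct) pairs."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # standard normal prior (unnormalised)
        for i, correct in responses:
            p = p_correct(t, *bank[i])
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    est = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - est) ** 2 * w for t, w in zip(grid, weights)) / total
    return est, math.sqrt(var)

def run_cat(bank, answer_fn, max_items=10, se_target=0.4):
    """Administer items until the item limit or the SE target is reached."""
    theta, se = 0.0, float("inf")
    responses, used = [], set()
    while len(responses) < min(max_items, len(bank)) and se > se_target:
        i = pick_next_item(theta, bank, used)
        used.add(i)
        responses.append((i, answer_fn(i)))
        theta, se = update_theta(responses, bank)
    return theta, se, [i for i, _ in responses]

# Hypothetical bank of (slope a, difficulty b) pairs.
bank = [(1.2, -2.0), (1.5, -1.0), (2.0, 0.0), (1.0, 0.5), (1.8, 1.0), (1.4, 2.0)]
# Hypothetical test-taker who answers correctly whenever difficulty is below 1.0.
theta, se, order = run_cat(bank, answer_fn=lambda i: bank[i][1] < 1.0)
print(round(theta, 2), round(se, 2), order)
```

Starting from θ = 0, the most informative item is the one with the highest slope and average difficulty, which matches the starting rule above; a correct answer raises the θ estimate, so the next most informative item is a harder one.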

14
Q

Test selection

A

Can use ‘difficulty’ statistics to place the test-takers, or the questions themselves, on the θ continuum
