Week 5 Flashcards
Item Analysis & Test Revision (35 cards)
What are the two approaches to test construction?
Classical Test Theory
Item Response Theory / Latent Trait Theory / Item Characteristic Curve Theory
What are the 3 factors that are significant in item construction according to Classical Test Theory?
Distractor Effectiveness
Item Difficulty Index
Item Discriminability
How do you calculate the effectiveness of a distractor?
E = (N - n)/(2c)
E = Effectiveness threshold (a distractor is effective if more than E test takers choose it)
N = Total no. of test takers
n = Total no. of test takers who chose the correct answer
c = Number of choices
Calculate how effective each distractor is in this example:
100 students completed an MCQ test comprising items with 4 response options.
The correct answer was C, and 45 students chose it.
The remaining 55 students were distributed…
34 selected A
17 selected B
4 selected D
E = (100 - 45)/(2 × 4) = 6.88. Since 34 and 17 > 6.88 while 4 < 6.88, A and B were effective distractors and D was not.
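The worked example above can be sketched in code. The threshold E = (N - n)/(2c) comes straight from the earlier card; the response counts are the ones given in the example.

```python
# Distractor effectiveness check: a distractor attracting more than
# E = (N - n)/(2c) test takers is considered effective.

def distractor_threshold(N, n, c):
    """N = total test takers, n = no. who answered correctly, c = no. of choices."""
    return (N - n) / (2 * c)

E = distractor_threshold(100, 45, 4)          # 55 / 8 = 6.875
counts = {"A": 34, "B": 17, "D": 4}           # distractor response counts
effective = {opt: count > E for opt, count in counts.items()}
print(round(E, 2), effective)
# 6.88 {'A': True, 'B': True, 'D': False}
```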
What is the Item Difficulty Index?
The item difficulty index (p) provides a measure of the proportion of people answering the item correctly.
What does the value of the item difficulty index mean, and what is the rule of thumb regarding it?
The value of an item difficulty index can range from 0 (too difficult, no one got it right) to 1 (too easy, everyone got it right).
Rule of Thumb:
Item difficulty between .30 and .70.
How do you calculate the optimal difficulty of an item?
OptDiff = (1 + g)/2
Where g = chance of getting it right through luck (e.g. g = .25 for a 4-option MCQ)
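As a quick sketch of the formula on this card, with g set to the guessing probability for a few common item formats:

```python
# Optimal item difficulty: OptDiff = (1 + g)/2,
# where g is the probability of guessing correctly by chance.

def optimal_difficulty(g):
    return (1 + g) / 2

print(optimal_difficulty(0.25))  # 4-option MCQ -> 0.625
print(optimal_difficulty(0.5))   # true/false item -> 0.75
print(optimal_difficulty(0.0))   # free-response (no guessing) -> 0.5
```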
What is the Item Difficulty Formula?
Pi = Nr / Nt
Where:
Nr = no. of ppl who got the item correct
Nt = no. of ppl who took the test
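The difficulty formula above is a one-liner; reusing the earlier example (45 of 100 students answered correctly):

```python
def item_difficulty(n_correct, n_total):
    """p = Nr / Nt: proportion of test takers answering the item correctly."""
    return n_correct / n_total

p = item_difficulty(45, 100)
print(p)                    # 0.45
print(0.30 <= p <= 0.70)    # within the rule-of-thumb range -> True
```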
What does Item Discriminability refer to?
Capacity of the item to discriminate between the people with high scores and low scores on the overall test
I.e., Do ppl who do well on this item also do well in the overall test?
What are the 2 methods to measure item discriminability?
Extreme Group Method
Point Biserial Correlation Method
What is the Item Discrimination Index as calculated by the extreme group method?
The Item Discrimination Index (d) is the difference between the number of upper scorers (U) and lower scorers (L) answering the item correctly, expressed as a ratio of the number of cases in either the upper group or the lower group (n).
d = (U-L)/n
In general, we want items with high discrimination index scores.
The higher the discrimination index the greater the number of upper scores answering the item correctly relative to lower scores.
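A minimal sketch of the extreme group calculation, using hypothetical counts (the group size of 27 assumes the common practice of comparing the top and bottom scorers in equal-sized groups; the specific numbers are illustrative, not from the cards):

```python
def discrimination_index(upper_correct, lower_correct, n):
    """d = (U - L)/n, with equal-sized upper and lower groups of n cases."""
    return (upper_correct - lower_correct) / n

# Hypothetical item: 24 of 27 upper scorers vs 9 of 27 lower scorers correct
d = discrimination_index(24, 9, 27)
print(round(d, 2))  # 0.56 -> upper scorers do much better on this item
```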
How do you calculate the Item Discrimination Index via Point Biserial Correlation?
rpb = ((Xr - Xt) / SDt) × sqrt(p/q)
Where
rpb = correlation between the item and total score
Xr = Mean total score of all people who answered the item correctly
Xt = Mean total score of all people
SDt = Standard Deviation of the group
p = proportion of people who answered the item correctly
q = 1 - p
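The point-biserial formula above can be sketched directly; the six test takers are hypothetical data for illustration. The population standard deviation is used here to match "SDt = standard deviation of the group" on the card.

```python
import math

def point_biserial(scores, correct):
    """rpb = ((Xr - Xt) / SDt) * sqrt(p/q).
    scores  = total test score per person
    correct = 1 if the person answered this item correctly, else 0
    """
    n = len(scores)
    p = sum(correct) / n                      # proportion answering correctly
    q = 1 - p
    mean_t = sum(scores) / n                  # Xt: mean of everyone
    mean_r = sum(s for s, c in zip(scores, correct) if c) / sum(correct)  # Xr
    sd_t = math.sqrt(sum((s - mean_t) ** 2 for s in scores) / n)          # SDt
    return (mean_r - mean_t) / sd_t * math.sqrt(p / q)

# Hypothetical data: the item is mostly answered correctly by high scorers
scores  = [90, 85, 80, 60, 55, 50]
correct = [1, 1, 1, 0, 0, 1]
print(round(point_biserial(scores, correct), 2))  # 0.57 -> "very good" item
```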
What is the rule of thumb for the rpb value from Point Biserial Correlation?
Close to 1 = Ideal
Between .40 and .70 = Very good
Between .30 and .39 = Good
Between .20 and .29 = Fair
<.20 = Poor; revise item
<0 = Remove item (means those who do poorly on the test overall tend to get this item correct, and those who do well tend to get it wrong)
What is done after enough high-quality items have been prepared for a test?
The revised item pool is then administered under standardised conditions to a second appropriate sample of test-takers.
What is the second test tryout referred to as in test revision?
Cross-Validation
What is the name of the phenomenon that occurs when the item data from the second sample do not look as good as they did in the first sample?
Validity Shrinkage
What is the item response theory?
A family of mathematical models used to design, build, deliver, analyse, and score tests/assessments.
Why is IRT less popular than CTT?
More complex (although it does address limitations of the CTT approach)
Requires specially designed software
What are Latent Traits/Constructs/Variables
Psychological constructs that cannot be observed directly and can only be measured indirectly through one’s behaviour
How do we assess latent traits (the true score)?
We look at correlations between items measuring the same construct
Invoke the latent trait/construct as the cause of these correlations
Infer how strongly each item correlates with the latent trait/construct
What are the limitations of CTT?
Single reliability value for the entire test and all participants
E.g., Cronbach’s Alpha
Scores are item dependent
Item statistics are dependent on the sample used in development
- If an item is only good at discriminating between high performers, it may appear useless if the sample is composed only of low performers.
Bias towards average difficulty in test construction
CTT operates at the test level, not the item level
What is the item characteristic curve (ICC)?
A probabilistic curve showing that as an individual's trait level increases, the probability of endorsing/correctly answering an item also increases.
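The ICC is commonly drawn using a logistic model; the sketch below uses the two-parameter logistic (2PL) form with hypothetical item parameters a (discrimination) and b (difficulty). The specific model and values are illustrative, not from the cards.

```python
import math

def icc(theta, a=1.5, b=0.0):
    """P(correct | theta) = 1 / (1 + exp(-a * (theta - b))).
    theta = latent trait level, a = discrimination, b = difficulty."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Probability of a correct answer rises with trait level theta
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta), 2))
# At theta = b the probability is exactly 0.5
```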
What is the Item Response Function?
A Mathematical function that relates the latent trait/construct to the probability of endorsing an item
What is the Item Information Function?
An indication of quality, i.e. an item's ability to differentiate among test-takers