Week 5 Flashcards
Item Analysis & Test Revision (35 cards)
What are the two approaches to test construction?
Classical Test Theory
Item Response Theory / Latent Trait Theory / Item Characteristic Curve Theory
What are the 3 factors that are significant in item construction according to Classical Test Theory?
Distractor Effectiveness
Item Difficulty Index
Item Discriminability
How do you calculate the effectiveness of a distractor?
E = (N - n)/(2c)
E = Effectiveness threshold (a distractor is effective if more than E test takers choose it)
N = Total no. of test takers
n = Total no. of test takers who chose the correct answer
c = Number of choices
Calculate how effective each distractor is in this example:
100 students completed an MCQ test comprising items with 4 response options.
The correct answer was C, and 45 students chose it.
The remaining 55 students were distributed…
34 selected A
17 selected B
4 selected D
E = (100 - 45)/(2 × 4) = 6.88. Since 34 and 17 > 6.88 while 4 < 6.88, A and B were effective distractors and D was not.
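The worked example above can be sketched in code. The threshold E = (N - n)/(2c) comes straight from the earlier card; the response counts are the ones given in the example.

```python
# Distractor effectiveness check: a distractor attracting more than
# E = (N - n)/(2c) test takers is considered effective.

def distractor_threshold(N, n, c):
    """N = total test takers, n = no. who answered correctly, c = no. of choices."""
    return (N - n) / (2 * c)

E = distractor_threshold(100, 45, 4)          # 55 / 8 = 6.875
counts = {"A": 34, "B": 17, "D": 4}           # distractor response counts
effective = {opt: count > E for opt, count in counts.items()}
print(round(E, 2), effective)
# 6.88 {'A': True, 'B': True, 'D': False}
```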
What is the Item Difficulty Index?
The item difficulty index (p) provides a measure of the proportion of people answering the item correctly.
What does the value of the item difficulty index mean, and what is the rule of thumb regarding it?
The value of an item difficulty index can range from 0 (too difficult, no one got it right) to 1 (too easy, everyone got it right).
Rule of Thumb:
Item difficulty between .30 and .70.
How do you calculate the optimal difficulty of an item?
OptDiff = (1 + g)/2
Where g = chance of getting it right through luck (e.g. g = .25 for a 4-option MCQ)
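As a quick sketch of the formula on this card, with g set to the guessing probability for a few common item formats:

```python
# Optimal item difficulty: OptDiff = (1 + g)/2,
# where g is the probability of guessing correctly by chance.

def optimal_difficulty(g):
    return (1 + g) / 2

print(optimal_difficulty(0.25))  # 4-option MCQ -> 0.625
print(optimal_difficulty(0.5))   # true/false item -> 0.75
print(optimal_difficulty(0.0))   # free-response (no guessing) -> 0.5
```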
What is the Item Difficulty Formula?
Pi = Nr / Nt
Where:
Nr = no. of ppl who got the item correct
Nt = no. of ppl who took the test
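The difficulty formula above is a one-liner; reusing the earlier example (45 of 100 students answered correctly):

```python
def item_difficulty(n_correct, n_total):
    """p = Nr / Nt: proportion of test takers answering the item correctly."""
    return n_correct / n_total

p = item_difficulty(45, 100)
print(p)                    # 0.45
print(0.30 <= p <= 0.70)    # within the rule-of-thumb range -> True
```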
What does Item Discriminability refer to?
Capacity of the item to discriminate between the people with high scores and low scores on the overall test
I.e., Do ppl who do well on this item also do well in the overall test?
What are the 2 methods to measure item discriminability?
Extreme Group Method
Point Biserial Correlation Method
What is the Item Discrimination Index as calculated by the extreme group method?
The Item Discrimination Index (d) is the difference between the number of upper scorers (U) and lower scorers (L) answering the item correctly, expressed as a ratio of the number of cases in either the upper group or the lower group (n).
d = (U-L)/n
In general, we want items with high discrimination index scores.
The higher the discrimination index the greater the number of upper scores answering the item correctly relative to lower scores.
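A minimal sketch of the extreme group calculation, using hypothetical counts (the group size of 27 assumes the common practice of comparing the top and bottom scorers in equal-sized groups; the specific numbers are illustrative, not from the cards):

```python
def discrimination_index(upper_correct, lower_correct, n):
    """d = (U - L)/n, with equal-sized upper and lower groups of n cases."""
    return (upper_correct - lower_correct) / n

# Hypothetical item: 24 of 27 upper scorers vs 9 of 27 lower scorers correct
d = discrimination_index(24, 9, 27)
print(round(d, 2))  # 0.56 -> upper scorers do much better on this item
```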
How do you calculate the Item Discrimination Index via Point Biserial Correlation?
rpb = ((Xr - Xt) / SDt) × sqrt(p/q)
Where
rpb = correlation between the item and total score
Xr = Mean total score of all people who answered the item correctly
Xt = Mean total score of all people
SDt = Standard Deviation of the group
p = proportion of people who answered the item correctly
q = 1 - p
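The point-biserial formula above can be sketched directly; the six test takers are hypothetical data for illustration. The population standard deviation is used here to match "SDt = standard deviation of the group" on the card.

```python
import math

def point_biserial(scores, correct):
    """rpb = ((Xr - Xt) / SDt) * sqrt(p/q).
    scores  = total test score per person
    correct = 1 if the person answered this item correctly, else 0
    """
    n = len(scores)
    p = sum(correct) / n                      # proportion answering correctly
    q = 1 - p
    mean_t = sum(scores) / n                  # Xt: mean of everyone
    mean_r = sum(s for s, c in zip(scores, correct) if c) / sum(correct)  # Xr
    sd_t = math.sqrt(sum((s - mean_t) ** 2 for s in scores) / n)          # SDt
    return (mean_r - mean_t) / sd_t * math.sqrt(p / q)

# Hypothetical data: the item is mostly answered correctly by high scorers
scores  = [90, 85, 80, 60, 55, 50]
correct = [1, 1, 1, 0, 0, 1]
print(round(point_biserial(scores, correct), 2))  # 0.57 -> "very good" item
```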
What is the rule of thumb for the rpb value from Point Biserial Correlation?
Close to 1 = Ideal
Between .40 and .70 = Very good
Between .30 and .39 = Good
Between .20 and .29 = Fair
<.20 = Poor; revise item
<0 = Remove item (means those who do poorly on the test overall tend to get this item correct, and those who do well tend to get it wrong)
What is done after enough high-quality items have been prepared for a test?
The revised item pool is then administered under standardised conditions to a second appropriate sample of test-takers.
What is the second test tryout referred to as in test revision?
Cross-Validation
What is the name of the phenomenon that occurs when the item data from the second sample do not look as good as they did in the first sample?
Validity Shrinkage
What is the item response theory?
A family of mathematical models used to design, build, deliver, analyse, and score tests/assessments.
Why is IRT less popular than CTT?
More complex (although it does address limitations of the CTT approach)
Requires specially designed software
What are Latent Traits/Constructs/Variables
Psychological constructs that cannot be observed directly and can only be measured indirectly through one’s behaviour
How do we assess latent traits (the true score)?
We look at correlations between items measuring the same construct
Invoke the latent trait/construct as the cause of these correlations
Infer how strongly each item correlates with the latent trait/construct
What are the limitations of CTT?
Single reliability value for the entire test and all participants
E.g., Cronbach’s Alpha
Scores are item dependent
Item statistics are dependent on the sample used in development
- If an item is only good at discriminating between high performers, it may appear useless if the sample is composed only of low performers.
Bias towards average difficulty in test construction
CTT operates at the test level, not the item level
What is the item characteristic curve (ICC)?
A probabilistic curve showing that as an individual's trait level increases, the probability of endorsing/correctly answering an item also increases.
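The ICC is commonly drawn using a logistic model; the sketch below uses the two-parameter logistic (2PL) form with hypothetical item parameters a (discrimination) and b (difficulty). The specific model and values are illustrative, not from the cards.

```python
import math

def icc(theta, a=1.5, b=0.0):
    """P(correct | theta) = 1 / (1 + exp(-a * (theta - b))).
    theta = latent trait level, a = discrimination, b = difficulty."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Probability of a correct answer rises with trait level theta
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta), 2))
# At theta = b the probability is exactly 0.5
```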
What is the Item Response Function?
A Mathematical function that relates the latent trait/construct to the probability of endorsing an item
What is the Item Information Function?
An indication of quality, i.e. an item's ability to differentiate among test-takers