Lecture 4: Modern Test Theory Flashcards
From classical test theory to modern test theory
What are the advantages of classical test theory?
- Allows for calculating reliability
- Intuitive and easy to apply
- It’s in SPSS and it’s easy to do in Excel
- No large sample sizes/many items needed
What are the disadvantages of classical test theory?
- Focus on the test, not on the items
- Test properties depend on the population (e.g. reliability and difficulty of a test should be generalisable to different populations)
- Person properties depend on the test (i.e. sum score is higher if the test is easy and lower if the test is difficult)
Modern test theory addresses these disadvantages
What is Modern Test Theory? What is the assumption when calculating this theory?
Specify a measurement model in which we mathematically link the item scores to the construct (= latent variable/latent trait/factor)
- The idea of reflective measurement - the construct affects the item scores
Assumption:
- Unidimensionality = you only measure 1 construct
Book uses trait level, Dylan uses latent variable, but it’s the same thing
What is Item Response Theory?
Specific form of modern test theory where there is a specific mathematical link between the latent variable and the item
- Individual’s response to a particular test item is influenced by qualities of the individual (trait level) and by qualities of the item (difficulty level)
What does each variable mean on the graph demonstrating item response theory?
Picture 1
X-axis = the latent variable (trait level → level of the relevant psychological construct)
- Each subject has a position on the latent variable
- 0 on the x axis is the average of the latent variable (a person at 0 has a 50% chance of answering an item of average difficulty correctly)
Y-axis = probability of a correct response (ranges from 0 to 1; 1 means a correct response is certain)
P(Xis = 1|…) → the probability that a correct response will be made by a particular individual when answering a particular item
What is the name of the function? Why is it helpful?
The graph is a Logistic (s-shaped) function
- Runs from 0 to 1 - exactly what we need, because we are modelling the probability of a correct response, which lets us account for measurement error
- This is done through the item characteristic curve
What is an item characteristic curve?
A graphical display linking respondents’ trait levels to the probability of correctly answering an item
- There is a curve like this for every item
How does the position of the curve change with different difficulty?
All the way to the left
- very easy item: even a person who is below average on the latent variable has a probability of answering correctly that is very close to 1
All the way to the right
- very difficult item: even a person who is above average on the latent variable has a probability close to 0
The position of the curve on the latent (x) axis depends on how difficult the item is
What is the Rasch model? What is its function formula and what do the variables mean?
Picture 2
Function formula:
𝑃(𝑋𝑖𝑠 = 1|𝜃𝑠, 𝛽𝑖) =(𝑒^(𝜃𝑠−𝛽𝑖))/(1 + 𝑒^(𝜃𝑠−𝛽𝑖))
- P(𝑋𝑖𝑠 = 1|𝜃𝑠, 𝛽𝑖) → the probability that subject s will respond correctly to item i
- The probability of a correct response only depends on the latent variable and item difficulty
What do the variables in Rasch’s model function mean?
- 𝑋𝑖𝑠 = 1 → ‘correct’ (1) response (X) to the item (i) by a subject (s)
- 𝛽 → the difficulty of an item, can be any positive or negative number (or zero)
- The larger the 𝛽 value, the more difficult the item is
- 𝜃 → latent variable (tells us how well a certain subject scores on the variable)
- e ≈ 2.72 (the base of the natural logarithm)
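A minimal Python sketch of this formula (the θ and β values below are hypothetical, chosen only to show how difficulty shifts the probability):

```python
import math

def rasch_probability(theta, beta):
    """P(X_is = 1 | theta_s, beta_i) = e^(theta - beta) / (1 + e^(theta - beta))."""
    return math.exp(theta - beta) / (1 + math.exp(theta - beta))

# A person of average ability (theta = 0) answering items of varying difficulty:
for beta in (-2, 0, 2):
    print(f"beta = {beta:+}: P(correct) = {rasch_probability(0, beta):.2f}")
# beta = -2 (easy item)    -> P ≈ 0.88
# beta =  0 (average item) -> P = 0.50
# beta = +2 (hard item)    -> P ≈ 0.12
```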
What is the Two Parameter logistic model (2PL)? How does it differ from Rasch Model?
Similar to the Rasch model (s-shaped function, 𝛽 parameter)
Formula: 𝑃(𝑋𝑖𝑠 = 1|𝜃𝑠, 𝛽𝑖, 𝛼𝑖) = (𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖)))/(1 + 𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖)))
The probability of a respondent answering an item correctly is conditional on the respondent’s trait level (latent variable), the item difficulty and the item’s discrimination
Now we have the α𝑖 parameter = item discrimination
What is an item discrimination?
The steepness of the ICC indicates the item’s ability to discriminate between individuals with different trait levels
- It indicates the relevance of the item to the latent variable being measured by the test
How can the item discrimination value be interpreted? Is it mostly positive or negative?
- Mostly a positive number, but it can be negative for a contra-indicative item
- The larger the number, the better the item can detect differences = strong consistency between an item and the underlying latent variable
↪ The steeper the curve (larger α𝑖), the more the two subjects’ probabilities of a correct response differ, even though they might be very close on the latent variable (picture 3)
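A short sketch (hypothetical α, β, and θ values) of this point: the steeper the curve, the bigger the difference in probability between two subjects who sit close together on the latent variable:

```python
import math

def p_2pl(theta, beta, alpha):
    """2PL: P(X_is = 1 | theta_s, beta_i, alpha_i)."""
    return math.exp(alpha * (theta - beta)) / (1 + math.exp(alpha * (theta - beta)))

# Two subjects close together on the latent variable (theta = -0.3 and +0.3),
# answering an item of average difficulty (beta = 0):
for alpha in (0.5, 2.5):
    p_low, p_high = p_2pl(-0.3, 0, alpha), p_2pl(0.3, 0, alpha)
    print(f"alpha = {alpha}: P = {p_low:.2f} vs {p_high:.2f} "
          f"(difference {p_high - p_low:.2f})")
# alpha = 0.5: P = 0.46 vs 0.54 (difference 0.07)
# alpha = 2.5: P = 0.32 vs 0.68 (difference 0.36)
```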
What is the Three Parameter logistic model?
Picture 4
𝑐𝑖 = guessing value → lower-bound probability of a correct answer purely on the basis of chance
𝑃(𝑋𝑖𝑠 = 1|𝜃𝑠, 𝛽𝑖, 𝛼𝑖, 𝑐𝑖) = 𝑐𝑖 + (1 − 𝑐𝑖) * (𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖)))/(1 + 𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖)))
So the curve doesn’t start at 0 because people are assumed to guess on the more difficult items
What does the guessing value depend on?
Depends on the number of response options available (4 options → guessing will produce a correct answer 25% of the time so c = 0.25)
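A sketch of the 3PL using the 4-option example above (c = 0.25); the α, β, and θ values are hypothetical:

```python
import math

def p_3pl(theta, beta, alpha, c):
    """3PL: the guessing parameter c lifts the lower bound of the ICC above 0."""
    p = math.exp(alpha * (theta - beta)) / (1 + math.exp(alpha * (theta - beta)))
    return c + (1 - c) * p

# A hard 4-option multiple-choice item (beta = 1, alpha = 1.5, c = 0.25):
print(round(p_3pl(-3, 1, 1.5, 0.25), 2))  # low trait level  -> ≈ 0.25 (guessing floor)
print(round(p_3pl(3, 1, 1.5, 0.25), 2))   # high trait level -> ≈ 0.96
```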
What is the Graded Response model (GRM)?
Picture 5
A model for Likert-scale items (the previous models are for binary items)
What is the function formula of GRM and how does it differ from 2PL?
- Separate item characteristic curve for each response option → 𝑃(𝑋𝑖𝑠 ≥ 𝑗|𝜃𝑠, 𝛽𝑖𝑗, 𝛼𝑖) = (𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖𝑗)))/(1 + 𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖𝑗)))
- The function formula is the same as for 2PL but now the item difficulty (𝛽) is specific to each response option
What does the graph of GRM show? Use Dom as an example
Picture 5
- The characteristic curve of each response option is positioned on the latent variable based on its difficulty
- The probability of each response option for each person (on the latent variable axis) is shown in this model
- the smiley face (let’s call him Dom) has a probability of ~ 0 of choosing option 2, ~ 0.2 of choosing option 5, and ~ 0.5 of choosing option 4…
How many difficulty parameters are there in GRM?
There are m-1 difficulty parameters (𝛽𝑖𝑗) for each item (m = number of response options)
- each difficulty value represents the trait level required to move from one response option to the next ‘‘higher’’ one on the scale → each comparison is treated as dichotomous, even though the Likert scale itself is polytomous
- That’s why the formula says Xis is j or higher: each curve compares responding in category j or above with responding below it
How do we use the GRM formula and the difficulty parameters to conclude what is a person’s most likely response?
Long flashcard - don’t remember, just read it and try to understand
- We are given difficulty parameters 𝛽𝑖𝑗 for each distinction (if there are 5 options on the scale, we will have 4 distinctions → m-1)
- Let’s assume a person has an average level of extroversion (𝜃 = 0) and 𝛼𝑖 = 2.32
- With this data we can calculate the four probabilities of each response option (picture 11)
- The probabilities become smaller as the response options become more extremely positive (more difficult)
- With these four values we can estimate the probability that a person will choose a specific response option when responding to this item. We do this by computing the difference between two adjacent ‘‘range’’ probabilities
- j refers to one response option and j-1 refers to the immediately prior option (picture 12)
- Thus we see that the person is most likely to respond ‘‘neutral’’, since this option has the largest probability (0.53)
People with low trait levels will be relatively likely to respond with the lower response options
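A sketch of this procedure in Python. The α = 2.32 and θ = 0 come from the flashcard; the four β values are made up here (the real ones are in picture 11), so the resulting numbers differ, but the steps are the same: compute the cumulative ‘‘j or higher’’ probabilities, then take differences between adjacent ones.

```python
import math

def p_at_least(theta, beta_j, alpha):
    """GRM cumulative curve: P(X_is >= j | theta_s, beta_ij, alpha_i)."""
    return math.exp(alpha * (theta - beta_j)) / (1 + math.exp(alpha * (theta - beta_j)))

alpha = 2.32                     # discrimination from the flashcard
betas = [-2.0, -0.8, 0.6, 1.9]   # hypothetical difficulties for the 4 boundaries (5 options)
theta = 0                        # average trait level

# Cumulative probabilities P(X >= 2), P(X >= 3), P(X >= 4), P(X >= 5)
cum = [p_at_least(theta, b, alpha) for b in betas]

# Category probabilities = differences between adjacent cumulative probabilities,
# with P(X >= 1) = 1 and P(X >= 6) = 0 as end points.
bounds = [1.0] + cum + [0.0]
for option in range(1, 6):
    print(f"P(option {option}) = {bounds[option - 1] - bounds[option]:.2f}")
# The middle ("neutral") option gets the largest probability for theta = 0.
```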
What is the model fit?
Whether the actual responses to a set of items are well represented by a given measurement model that the test user has chosen (e.g. 1PL or 2PL)
- Once obtaining evidence of ‘‘good fit’’, test users might proceed to examine item parameters
- If the model turns out to be a poor fit, the test user should be cautious about interpreting the info
What is the scale analysis as an application of the item response theory?
You look at the items on the questionnaire and try grouping them together based on how strongly each item relates to the latent variable - is the item about the latent variable?
- You then run an analysis in a computer programme which calculates the difficulty level and the discrimination value of each item
- The higher the discrimination value, the better the item is, since it distinguishes more sharply between positions on the latent variable and adds important info to the results
What is the item information?
Tells us the amount of info we have for a given latent variable position
What is beneficial about the IRT approach when it comes to item information?
IRT approach allows for the possibility that a test might be better at reflecting differences at some trait levels rather than at other trait levels
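The flashcards don’t give a formula, but in standard IRT the item information for a 2PL item is I(θ) = α𝑖² · P(θ) · (1 − P(θ)), which peaks where θ equals the item’s difficulty. A sketch with hypothetical parameter values:

```python
import math

def p_2pl(theta, beta, alpha):
    return math.exp(alpha * (theta - beta)) / (1 + math.exp(alpha * (theta - beta)))

def item_information(theta, beta, alpha):
    """2PL item information: alpha^2 * P * (1 - P); largest when theta == beta."""
    p = p_2pl(theta, beta, alpha)
    return alpha ** 2 * p * (1 - p)

# An item with beta = 1 is most informative about subjects around theta = 1
# and says little about subjects far from that trait level.
for theta in (-2, 0, 1, 2):
    print(f"theta = {theta:+}: information = {item_information(theta, 1, 1.5):.2f}")
# theta = -2: 0.02   theta = 0: 0.34   theta = +1: 0.56   theta = +2: 0.34
```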