Chapter 8 Flashcards
(38 cards)
What's an ideal item for a norm-referenced test?
What about a criterion-referenced test?
Norm-referenced: high scorers on the test should get the item correct, while low scorers should get it wrong.
Criterion-referenced: that pattern doesn't matter. An ideal item is judged by how well it assesses mastery.
Scaling definition
Process of setting rules for assigning numbers in measurement.
Stanine scale
Raw scores are transformed into scores ranging from 1 to 9.
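A minimal sketch of one way to get stanines from raw scores. It assumes the common z-score shortcut (stanine = 2z + 5, rounded and clipped to 1 through 9), which the card itself does not specify. The function name `to_stanines` is made up for illustration.

```python
import math
from statistics import mean, pstdev

def to_stanines(raw_scores):
    """Convert raw scores to stanines (1 to 9) via the z-score shortcut.

    Assumption: stanine = round(2 * z + 5), clipped to the 1..9 range.
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population standard deviation
    if sd == 0:
        # Everyone has the same score, so everyone sits at the middle stanine.
        return [5 for _ in raw_scores]
    stanines = []
    for x in raw_scores:
        z = (x - m) / sd
        s = math.floor(2 * z + 5 + 0.5)  # round half up
        stanines.append(max(1, min(9, s)))  # clip to 1..9
    return stanines

print(to_stanines([10, 20, 30, 40, 50]))
```

Percentile-band tables are another common way to assign stanines. Either way, the result is a single-digit score from 1 to 9.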
Rating scale
Records judgements of oneself, others, experiences, or objects
Summative scale
Final test score is the sum of the ratings across all items.
Method of paired comparisons
Testtakers are shown pairs of options and asked to choose one option from each pair.
Comparative Scaling
Sort or rank options relative to one another based on a judgement (e.g., rank cards).
Categorical scaling
Sort objects into categories (e.g., sorting cards into “never justified,” “sometimes justified,” and “always justified”).
Guttman scale
Items range from weaker to stronger expressions of what's being measured.
Respondents who agree with the stronger statements will also agree with the milder ones.
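The Guttman pattern can be checked mechanically. This is a small sketch, assuming responses are recorded as True (agree) or False (disagree) and listed in order from the mildest item to the strongest. The function name `is_guttman_consistent` is hypothetical.

```python
def is_guttman_consistent(responses):
    """Return True if no agreement appears after a disagreement.

    `responses` is ordered from the mildest statement to the strongest.
    A consistent pattern looks like: agree, agree, ..., disagree, disagree.
    """
    seen_disagree = False
    for agrees in responses:
        if not agrees:
            seen_disagree = True
        elif seen_disagree:
            # Agreed with a stronger item after rejecting a milder one.
            return False
    return True

print(is_guttman_consistent([True, True, False, False]))  # True
print(is_guttman_consistent([True, False, True, False]))  # False
```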
Direct vs Indirect estimation
Direct (like equal-appearing intervals): no need to transform responses to another scale.
Indirect: responses have to be transformed to another scale.
Selected-response vs. Constructed-response formats
Two item formats. Selected-response: choose an answer from a set of options. Constructed-response: generate your own answer.
3 types of selected-response item formats
Multiple-choice, matching, and true/false
What are the names of the two columns in a matching item?
Premises and responses
Completion item
Fill in the blank item
Computerized adaptive testing
What are the advantages of CAT?
The items presented are chosen based on the testtaker's performance on previous items.
CAT reduces the number of items needed and reduces measurement error (each by around 50%).
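To make "items are based on performance on previous items" concrete, here is a deliberately simplified sketch. It assumes a simple up/down staircase over difficulty levels 1 to 10: go one level harder after a correct answer and one level easier after an incorrect one. Operational CAT systems typically use more elaborate statistical item selection. The names `next_difficulty` and `administer` are made up for this example.

```python
def next_difficulty(level, was_correct, lowest=1, highest=10):
    """Pick the next item's difficulty level from the last response.

    Assumption: a one-step staircase, clamped to [lowest, highest].
    """
    step = 1 if was_correct else -1
    return max(lowest, min(highest, level + step))

def administer(start_level, outcomes):
    """Return the sequence of difficulty levels presented.

    `outcomes` is a list of True/False results, one per answered item.
    """
    levels = [start_level]
    for was_correct in outcomes:
        levels.append(next_difficulty(levels[-1], was_correct))
    return levels

print(administer(5, [True, True, False]))  # [5, 6, 7, 6]
```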
Floor vs. Ceiling effects
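Floor: the assessment tool is bad at distinguishing testtakers at the low end of what's measured (the items are all too hard).
Ceiling: the assessment tool is bad at distinguishing testtakers at the high end of what's measured (the items are all too easy).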
Item Branching
Ability to customize content and order on the basis of previous responses.
Class Scoring or Category Scoring
Respondents are placed in a class or category with other respondents based on their responses.
Ipsative scoring
What conclusions can be drawn?
Compares a testtaker's score on one scale with their score on another scale within the same test.
Only appropriate for intraindividual comparisons, not interindividual ones.
What makes a good test item?
It can discriminate among testtakers.
If the high scorers on the test all get a particular item wrong, that's a bad sign. So is the opposite (the low scorers all getting that item right).
Item analysis
Statistical procedures to analyze and identify good items for a test.
4 possible analyses for test items
Item difficulty, item reliability, item validity, and item discrimination
How do you calculate an item's difficulty index?
item-difficulty index or
item-endorsement index
It's just a proportion: number of testtakers who got the item correct / total number of testtakers.
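The proportion can be sketched in a few lines. This assumes each testtaker's result on the item is stored as True (correct) or False (incorrect). The function name `item_difficulty` is made up for illustration.

```python
def item_difficulty(responses):
    """Proportion of testtakers who answered the item correctly.

    `responses` holds one True/False value per testtaker.
    """
    if not responses:
        raise ValueError("need at least one response")
    correct = sum(1 for r in responses if r)
    return correct / len(responses)

# 3 of 4 testtakers got the item right.
print(item_difficulty([True, True, True, False]))  # 0.75
```

Note that a higher value means an easier item, since more people got it right.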
What’s the ideal item difficulty? What should the range be?
Range: about .3 to .8. Ideal = .5 for discrimination.
Adjusting for guessing: optimal difficulty = (probability of getting it right by chance + 1) / 2
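A quick worked example of the chance-adjusted formula. It assumes that, for a selected-response item with k options, the chance of guessing correctly is 1/k. The function name `optimal_difficulty` is hypothetical.

```python
def optimal_difficulty(chance):
    """Midpoint between the chance success rate and 1.0."""
    return (chance + 1) / 2

# True/false item: chance = 1/2
print(optimal_difficulty(1 / 2))  # 0.75
# Four-option multiple-choice item: chance = 1/4
print(optimal_difficulty(1 / 4))  # 0.625
```

So the more likely a correct guess is, the higher the optimal difficulty value (the easier the item should be).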