Chapter 8 Flashcards

(38 cards)

1
Q

What’s an ideal item on a norm-referenced test?

What about on a criterion-referenced test?

A

Top scorers should get it correct, while low scorers should get it wrong.

The above doesn’t matter for criterion-referenced tests, where an ideal item is judged by how well it assesses mastery of the criterion.

2
Q

Scaling definition

A

The process of setting rules for assigning numbers in measurement.

3
Q

Stanine scale

A

Raw scores are transformed into standard scores ranging from 1 to 9 (mean of 5, standard deviation of approximately 2).

4
Q

Rating scale

A

Records judgements about oneself, others, experiences, or objects.

5
Q

Summative scale

A

The final test score is the sum of the scores on all items.

6
Q

Method of paired comparisons

A

Testtakers are presented with pairs of stimuli and asked to choose one option from each pair.

7
Q

Comparative Scaling

A

Stimuli are sorted in comparison with one another on the basis of judgements (e.g., ranking a set of cards).

8
Q

Categorical scaling

A

Sort objects into categories (e.g., sorting cards into “never justified,” “sometimes justified,” and “always justified”).

9
Q

Guttman scale

A

Statements range from weaker to stronger expressions of the attitude being measured.

Testtakers who agree with the stronger statements will also agree with the milder ones.

10
Q

Direct vs Indirect estimation

A

With direct estimation, there is no need to transform responses into another scale.

Indirect estimation (like the method of equal-appearing intervals) requires transforming responses into another scale.

11
Q

Selected-response vs. Constructed-response formats

A

Both are item formats. With selected-response items, the testtaker chooses from among presented options; with constructed-response items, the testtaker generates his or her own answer.

12
Q

3 types of selected-response item formats

A

Multiple-choice, matching, and true/false.

13
Q

What are the names of the two columns in a matching item?

A

Premises and responses

14
Q

Completion item

A

A fill-in-the-blank item.

15
Q

Computerized adaptive testing

What are the advantages of CAT?

A

Items presented are selected on the basis of the testtaker’s performance on previous items.

CAT reduces the number of items needed and reduces measurement error (both by around 50%).
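
As a rough illustration of the adaptive idea, here is a minimal Python sketch. The step rule below is a hypothetical simplification; operational CATs select items using item response theory.

def run_cat(items, answered_correctly, n_items=10):
    # items: item IDs sorted from easiest to hardest
    # answered_correctly: administers an item, returns True/False
    idx = len(items) // 2              # start at medium difficulty
    step = max(1, len(items) // 4)
    administered = []
    for _ in range(n_items):
        administered.append(items[idx])
        if answered_correctly(items[idx]):
            idx = min(len(items) - 1, idx + step)   # harder item next
        else:
            idx = max(0, idx - step)                # easier item next
        step = max(1, step // 2)       # narrow in as evidence accumulates
    return administered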

16
Q

Floor vs. Ceiling effects

A

Floor: the assessment tool cannot distinguish among testtakers at the low end of what’s measured (all items are too hard).

Ceiling: the tool cannot distinguish at the high end (all items are too easy).

17
Q

Item Branching

A

The ability to customize the content and order of test items on the basis of responses to previous items.

18
Q

Class Scoring or Category Scoring

A

Testtakers are placed in a class or category with other testtakers based on their responses.

19
Q

Ipsative scoring

what conclusions can be drawn?

A

A testtaker’s score on one scale is compared with his or her score on another scale within the same test.

Conclusions can be drawn only about intraindividual comparisons, not interindividual ones.

20
Q

What makes a good test item?

A

One that discriminates among testtakers.

If most high scorers get a particular item wrong, that’s a bad sign; the same goes for the opposite pattern (most low scorers getting it right).

21
Q

Item analysis

A

Statistical procedures to analyze and identify good items for a test.

22
Q

4 possible analyses for test items

A

Item difficulty, item reliability, item validity, and item discrimination.

23
Q

How do you calculate the index of an item’s difficulty?

(the item-difficulty index; on personality tests, the

item-endorsement index)

A

A proportion: the number of testtakers who answered the item correctly divided by the total number of testtakers.
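
A minimal Python sketch with hypothetical numbers:

# Item-difficulty index: proportion of testtakers who answered the item correctly.
def item_difficulty(n_correct, n_total):
    return n_correct / n_total

item_difficulty(50, 100)  # 0.5: half of 100 testtakers got the item right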

24
Q

What’s the ideal item difficulty? What should the range be?

A

The range should be about .3 to .8; the ideal is .5, for maximum discrimination.

For selected-response items, the optimal difficulty adjusts for guessing: (proportion correct by chance + 1) / 2
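
Worked example: a true/false item can be answered correctly by chance half the time, so its optimal difficulty is (.5 + 1) / 2 = .75; for a four-option multiple-choice item it is (.25 + 1) / 2 = .625.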

25
Q

Item-reliability index

A

The item’s standard deviation multiplied by the correlation between the item score and the total test score. An indication of the test’s internal consistency.
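
A minimal sketch of the computation in Python (assumes NumPy; the function name is mine):

import numpy as np

def item_reliability_index(item_scores, total_scores):
    s = np.std(item_scores)                           # item standard deviation
    r = np.corrcoef(item_scores, total_scores)[0, 1]  # item-total correlation
    return s * r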

26
Q

Item-validity index

A

The degree to which an item measures what the test purports to measure. Calculated as the item’s standard deviation multiplied by the correlation between the item score and the criterion score.

27
Q

Item-discrimination index

A

Symbolized by a lowercase "d". The difference between the proportion of high scorers (upper 25-33%) answering an item correctly and the proportion of low scorers (lower 25-33%) answering it correctly.

A negative d is a bad sign: it means low scorers answer the item correctly more often than high scorers do.

28
Q

Item-characteristic curve

A

A graphic representation of item difficulty and discrimination: the probability of a correct response (y-axis) plotted against ability (x-axis).
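
The card doesn’t fix a model, but one common choice (an assumption here) is the two-parameter logistic ICC from item response theory, where b shifts the curve along the ability axis (difficulty) and a sets its steepness (discrimination):

import math

def icc(theta, a=1.0, b=0.0):
    # probability of a correct response at ability level theta
    return 1 / (1 + math.exp(-a * (theta - b)))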

29
Q

What are biased test items?

A

Items that undermine fairness by favoring one group of testtakers over another.

30
Q

How do ICCs help identify bias?

A

If testtakers from different groups with the same total scores produce different ICCs for an item, the item may be biased.

31
Q

How should item analysts deal with speed tests? What’s the problem with analyzing speed tests?

A

Items near the end may be answered hurriedly or not reached at all, so standard analyses give misleading results for them. For item-analysis purposes, administer the test with generous time limits.

32
Q

"think aloud" test administration

A

The testtaker verbalizes his or her thought process aloud while taking the test. A qualitative research method, used to check whether testtakers are using the intended line of thought.

33
Q

Sensitivity review

A

An examination of the fairness of a test, looking for stereotypes, offensive language, etc.

34
Q

Cross-validation

A

Revalidation of a test on a new sample of testtakers.

35
Q

Validity shrinkage

A

The decrease in item validities that occurs when the final version of a test is cross-validated on a new sample.

36
Q

Co-validation / Co-norming

A

Co-validation: validating two or more tests using the same sample of testtakers.

Co-norming: creating new norms or revising existing norms for two or more tests using the same sample.

37
Q

Anchor Protocol

A

A test protocol scored by a highly authoritative scorer, used as a model for scoring and for resolving discrepancies between scorers.