Study Guide: Test Construction Flashcards

(39 cards)

1
Q

6 Steps of Test Construction

A
  • Define Test’s Purpose
  • Preliminary Design Issues
  • Item Preparation
  • Item Analysis
  • Standardization and Ancillary Research
  • Preparation of Final Materials and Publication
2
Q

Test Purpose

A
  • What will be measured?
  • Who is the target audience, and does the construct match the group?

3
Q

Preliminary Design Issues definition

A

Anything that could introduce error. The design must strike a balance between efficiency and accuracy, and must meet the needed breadth and depth of the construct.

4
Q

Examples of Preliminary Design Issues

A
  • Mode of administration
  • Length: longer tests are more reliable (up to ~15 min)
  • Item format (T/F, multiple choice, essay)
  • # of scores (e.g., depression is multi-faceted and may require several scales)
  • Training
  • Background research
5
Q

What is the most important Preliminary Design Issue?

A

Background research!

6
Q

4 parts of Item Preparation

A
  • Stimulus
  • Response
  • Conditions governing responses
  • Scoring procedures
7
Q

Stimulus

A

The question itself is the stimulus.

*We are trying to provoke a specific response, correlated with the construct, that is driven ONLY by the stimulus

8
Q

Response

A

The behavior you are looking for that is correlated with the construct

9
Q

Conditions governing responses

A

What are your rules? Is there a time limit? Are they able to ask questions?

10
Q

Scoring procedures

A

Formula or rubric used to compute final scores

*Make sure each facet is represented and appropriately weighted
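A weighted composite score could be sketched like this (the facet names, weights, and subscores are hypothetical, not from the cards):

```python
# Hypothetical scoring rubric: each facet of the construct gets a weight,
# and the final score is the weighted sum of the facet subscores.
weights = {"mood": 0.5, "sleep": 0.25, "appetite": 0.25}  # must sum to 1
subscores = {"mood": 20, "sleep": 10, "appetite": 14}

final = sum(weights[f] * subscores[f] for f in weights)
print(final)  # 16.0
```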

11
Q

Types of Test Items

A
  • Selected-Response Items
  • Constructed-Response Items

12
Q

Selected-Response Items

A

Items for which all possible responses are known in advance

*T/F, multiple choice, Likert scale, etc.

13
Q

Constructed-Response Items

A

Responses are unknown/more nebulous

*Essays, oral responses, performance assessment

14
Q

Benefits of Selected-Response Items

A
  • One clear answer
  • Scoring reliability and efficiency

15
Q

Benefits of Constructed-Response Items

A
  • No single agreed-upon answer is required
  • Behavior (Bx) can give further context
  • Goes deeper into the construct
16
Q

Item Analysis

A
  • Item Tryout
  • Statistical Analysis
  • Item Selection
17
Q

Item Tryout

A

aka Pilot Test

  • Get subjects similar to the target population; they cannot be the same people used in the actual survey
  • Write 2-3x the items you think you will need
18
Q

Statistical Analysis

A
  • Difficulty
  • Discrimination
  • Distractor Analysis
19
Q

Item Difficulty

A

% of subjects taking the test who answered correctly

20
Q

Difficulty formula

A

p = # people correct / total
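A minimal Python sketch of this formula (the response data are hypothetical):

```python
# Item difficulty: p = # people correct / total.
# `responses` is hypothetical data: True = answered this item correctly.
responses = [True, True, False, True, False, True, True, False]

p = sum(responses) / len(responses)
print(round(p, 3))  # 0.625
```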

21
Q

What shows good variability for Difficulty?

A

Mid-range p values (around .50); extreme p values restrict variability
22
Q

Difficulty considerations

A
  • Behavioral measure
  • Characteristic of the item and the sample
  • Extreme p values restrict variability
  • More comparative than a ‘cut-off’
23
Q

Why is Difficulty a behavioral measure?

A

It taps into individual differences in holding the construct

24
Q

Item Discrimination

A
  • Assumes that a single item and the test measure the same thing; compares each item to the other items within the test
  • Looks at how well any single item discerns who does/does not have a trait
  • You want a high rate!
25
Q

2 Indices of Discrimination

A
  • Index D
  • Discrimination Coefficients
26
Q

Index D(iscrimination) formula

A
  • Score and rank all test takers
  • Take the top and bottom 27%

D = (# correct upper - # correct lower) / # people in larger group
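The steps above can be sketched in Python (all names and data are hypothetical):

```python
# Index D sketch: `scores` are total test scores, `item_correct` marks who
# got the single item being analyzed right (1) or wrong (0).
def index_d(scores, item_correct):
    n = len(scores)
    k = max(1, round(n * 0.27))                 # top/bottom 27% group size
    order = sorted(range(n), key=lambda i: scores[i])
    lower, upper = order[:k], order[-k:]
    correct_upper = sum(item_correct[i] for i in upper)
    correct_lower = sum(item_correct[i] for i in lower)
    # both groups have k people here, so k is also the larger-group size
    return (correct_upper - correct_lower) / k

scores       = [10, 14, 9, 18, 12, 7, 16, 11, 15, 8]
item_correct = [ 0,  1, 0,  1,  1, 0,  1, 0,  1, 0]
print(index_d(scores, item_correct))  # 1.0 -- high scorers all got it right
```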
27
Q

Why do we generally focus on just the high/low 27%?

A

Look up
28
Q

Ranges of D(iscrimination)

A
  • .40 and up = good
  • .30 to .39 = okay
  • .20 to .29 = marginal
  • .19 and below = poor
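These cut-offs could be wrapped in a small helper (a sketch mirroring the ranges on this card; the function name is hypothetical):

```python
# Map an Index D value onto the quality labels from this card.
def rate_discrimination(d):
    if d >= 0.40:
        return "good"
    if d >= 0.30:
        return "okay"
    if d >= 0.20:
        return "marginal"
    return "poor"

print(rate_discrimination(0.35))  # okay
print(rate_discrimination(0.15))  # poor
```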
29
Q

Distractor Guidelines

A
  • Plausible
  • Parallel in structure and grammar
  • Keep everything short
  • Mutually exclusive
  • Alternate placement
  • Limit 'all of the above'-type options
30
Q

D values for a distractor

A
  • You want low, preferably negative (this means more people in the low group chose it)
  • Zero: it might not be an equally plausible answer
  • Be cautious of large D values as well
31
Q

Why do we want consistency between distractors?

A

Moving away from randomness helps determine the true measure of the construct.
32
Q

Standardization and Ancillary Research

A
  • Norming
  • Reliability Studies
  • Equating Programs
33
Q

Test Norming

A

Two steps:
  • Define target population
  • Select sample
34
Q

Sampling Methods

A
  • Probability
  • Non-Probability
35
Q

Probability Sampling Methods

A

Every member of the population has a known, non-zero chance of being selected.
  • Random
  • Systematic
  • Stratified
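The three methods might look like this in Python (the population, strata, and sample sizes are hypothetical):

```python
import random

random.seed(0)
# Hypothetical norming population: 12 people across 3 grade strata.
population = [{"id": i, "grade": i % 3} for i in range(12)]

# Random: every member has an equal, known chance of selection.
simple = random.sample(population, 4)

# Systematic: every k-th member after a random starting point.
k = len(population) // 4
start = random.randrange(k)
systematic = population[start::k][:4]

# Stratified: sample separately within each stratum (here, grade).
stratified = []
for g in range(3):
    stratum = [p for p in population if p["grade"] == g]
    stratified += random.sample(stratum, 2)

print(len(simple), len(systematic), len(stratified))  # 4 4 6
```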
36
Q

Non-Probability Sampling Methods

A

Not every member of the population has an equal chance of being selected, and some have a zero chance of being selected.
  • Convenience
  • Judgment
  • Quota
  • Snowball
37
Q

What is more important: the original conceptualization or the technical/statistical work?

A

The original concept!
38
Q

What should you be thinking about, even at the original design stage?

A

Final Score Reports!
39
Q

Does the norming group need to be large?

A

Not if it is properly selected!