Chapter 6 Flashcards

1
Q

Major Steps in Test Development

A
  • Define purpose
  • Preliminary design issues
  • Item preparation
  • Item analysis
  • Standardization/research
  • Final materials and publication
2
Q

Statement of Purpose

A
  • Should be brief (about one sentence)

- Focuses on the trait being measured and how the scores will be interpreted

3
Q

Mode of Administration

A
  • Group versus individual administration is an issue to consider

- Individual administration lets the examiner observe processes while the person is taking the test

4
Q

Length of Exam

A
  • A short test limits reliability

- A longer test is more reliable
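The usual way to quantify this trade-off is the Spearman-Brown prophecy formula (not named on the card, but standard psychometrics). A minimal Python sketch with hypothetical numbers:

    def spearman_brown(r, k):
        # Predicted reliability when test length changes by a factor of k,
        # given the current reliability r.
        return (k * r) / (1 + (k - 1) * r)

    print(round(spearman_brown(0.70, 2), 2))  # doubling the test: 0.70 -> 0.82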

5
Q

Item Format

A
  • The type of item is a design issue to consider

- Some formats introduce subjectivity into scoring (e.g., essay responses)

6
Q

Number of Scores

A
  • An issue to consider when designing the test
  • More scores give a test wider application
  • Improves marketability
7
Q

Administrator Training

A
  • How much training administrators will need is a consideration when designing the exam
8
Q

Background Research

A
  • Literature search on the trait and on existing measures

- Discussions with practitioners if the test will be used by clinicians

9
Q

Item Preparation

A
  • Stimulus/item stem (question, apparatus, etc.)
  • Response format (M-C, T-F)
  • Scoring procedures (correct/incorrect, partial credit, etc.)
  • Conditions governing response (time limit, probing of responses, etc.)
10
Q

Selected Response Items (Fixed Response)

A
  • T/F, M/C, Likert, etc.
  • Objectively scored
  • Assigning of points
  • Keep content correct and simple, and don’t make the keyed answer too obvious
11
Q

Constructed Response Items (Free Response)

A

Items the examinee must produce rather than select, e.g., fill-in-the-blank, short-answer, and essay items

12
Q

Inter-Rater Reliability

A

Scoring constructed responses requires judgment, so a certain degree of agreement between raters is crucial to ensure items are evaluated in the same or a similar way
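A minimal sketch of the simplest agreement index, exact percent agreement, using hypothetical 1-5 ratings from two raters (chance-corrected indices such as Cohen's kappa are also common):

    rater_a = [4, 3, 5, 2, 4, 3]   # hypothetical essay scores from rater A
    rater_b = [4, 3, 4, 2, 4, 3]   # the same essays scored by rater B
    agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    print(round(agreement, 2))  # 0.83 -- raters agreed exactly on 5 of 6 essays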

13
Q

Holistic

A
  • Scoring scheme in which the rater makes a single, overall judgment about quality

- Based on an overall impression of the paper

14
Q

Analytic

A
  • Scoring scheme in which the response is rated on several separate dimensions
  • e.g., grammar, organization, vocabulary, and other criteria
15
Q

Point System

A

Scoring scheme in which specific points must be included in the answer for full credit

16
Q

Automated Scoring

A
  • Scoring by computer programs that simulate human judgment
  • A development of the most recent (“and now”) period of testing history
  • Also used for scoring essays
17
Q

Suggestions for Writing Constructed Response Items

A
  • Clear directions
  • Avoid optional items (e.g., choose 3 of 5 essays)
  • Be specific about scoring procedure when preparing questions
  • Score anonymously
  • Use sufficient number of items to maximize reliability and validity
18
Q

Three Selected-Response Advantages

A
  • Scoring reliability
  • Scoring efficiency
  • Temporal efficiency
19
Q

Two Constructed Response Advantages

A
  • Easier observation of behavior and processes

- Allows exploring unusual areas, such as material that multiple-choice items cannot address

20
Q

Item Analysis

A

Involves the statistical analysis of data obtained from an item tryout

21
Q

Three Phases of Item Analysis

A
  • Item tryout
  • Statistical analysis
  • Item selection
22
Q

Informal Item Tryout

A

5-10 people similar to those for whom the test is intended comment on the items and directions

23
Q

Formal Item Tryout

A

Administration of test items to samples of examinees who are representative of the target population for the test

24
Q

Independent Study

A
  • Conducting a study exclusively for the purpose of item analysis
  • Most common practice
  • A formal item tryout; participants are often paid
25
Q

Attachment

A
  • Including tryout items in the regular administration of an existing test
  • SATs and GREs for example
26
Q

Item Difficulty (p)

A
  • Percent of examinees answering the item correctly
  • A p-value of .95 is very easy (95% got it right)
  • A p-value of .15 is very difficult
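A minimal sketch of the computation, using hypothetical 0/1 (incorrect/correct) item data:

    responses = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # 10 hypothetical examinees
    p = sum(responses) / len(responses)
    print(p)  # 0.8 -- a fairly easy item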
27
Q

Item Discrimination (D)

A
  • An item’s ability to differentiate statistically, in a desired way, between groups of examinees
  • D = difference in percent correct between the high-scoring and low-scoring groups
  • Maximum possible differentiation occurs at 50% difficulty (p = .50)
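A minimal sketch of D for one item, splitting hypothetical examinees into high and low groups by total test score:

    # Each pair: (total test score, correct on this item?)
    examinees = sorted([(38, 1), (35, 1), (30, 1), (22, 0), (20, 1), (15, 0)],
                       reverse=True)
    half = len(examinees) // 2
    high, low = examinees[:half], examinees[half:]
    p_high = sum(c for _, c in high) / len(high)  # 1.00
    p_low = sum(c for _, c in low) / len(low)     # 0.33
    print(round(p_high - p_low, 2))               # D = 0.67

Here the split is top half versus bottom half; the next card covers the standard 27% split.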
28
Q

Truman Kelley

A
  • Showed that, statistically, the best way to form groups for D is the top and bottom 27% of scorers
  • This has become the “industry standard” for splits
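A minimal sketch of forming Kelley's 27% groups from hypothetical total scores (D is then the difference in percent correct between the two groups):

    scores = sorted([55, 48, 47, 45, 44, 40, 39, 35, 30, 28, 25, 20],
                    reverse=True)
    n = round(0.27 * len(scores))  # 27% of 12 examinees -> 3 per group
    high_group = scores[:n]        # top 27%:    [55, 48, 47]
    low_group = scores[-n:]        # bottom 27%: [28, 25, 20]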
29
Q

Factor Analysis

A
  • Used to select items that will yield relatively independent/meaningful scores
  • Applications commonly include attitude scales and personality/interest evaluations
  • Basic approach: inter-correlations among the items are factor analyzed and underlying dimensions (factors) are identified
30
Q

High Loading Item

A
  • Loads 0.3 or higher on a factor in the factor analysis
  • Such items are selected for inclusion in the final test
  • A loading is the correlation between an item and a factor
  • An item with high cross-loadings (loading strongly on more than one factor) is NOT a good item
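A minimal sketch of factor-analytic item selection, assuming a hypothetical examinee-by-item response matrix and scikit-learn's FactorAnalysis:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Placeholder data: rows = 200 examinees, columns = 10 items.
    responses = np.random.default_rng(0).normal(size=(200, 10))
    fa = FactorAnalysis(n_components=2).fit(responses)
    loadings = fa.components_.T  # rows = items, columns = factors

    for item, row in enumerate(np.abs(loadings)):
        primary, cross = np.sort(row)[-1], np.sort(row)[-2]
        # Keep items loading >= .30 on one factor without high cross-loadings.
        keep = primary >= 0.30 and cross < 0.30
        print(item, row.round(2), "keep" if keep else "drop")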
31
Q

Five Guidelines for Item Selection

A
  • Number of items
  • Content considerations
  • High discrimination indices
  • Relationship between p-value and D
  • Average difficulty level
32
Q

Increased Number of Items

A

Increased reliability

33
Q

Starting with Easier Test Items

A

Increases motivation

34
Q

High Discrimination Indices

A

0.3 to 0.5

35
Q

Maximum possible D-value

A
  • Occurs when the p-value is at its midpoint

- Maximum D (1.0) when p = 0.5

36
Q

Mean Score

A

The expected mean raw score equals the sum of the item p-values
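A one-line check with hypothetical p-values: a 4-item test with item difficulties .9, .8, .7, and .6 has an expected mean raw score of 3.0:

    print(sum([0.9, 0.8, 0.7, 0.6]))  # 3.0 -- expected mean score out of 4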

37
Q

To get an easy test…

A

Use items with high p-values (closer to 1)

38
Q

To get a difficult test…

A

Use items with low p-values (closer to 0)

39
Q

Discrimination Index

A

Difference in percent correct between high-scoring and low-scoring groups

40
Q

Standardization

A
  • Used to develop the norms for the test
  • Should be the exact version that is published (changing items throws off established norms)
  • Representativeness is key
41
Q

Equating Programs

A
  • Might be conducted at the same time as the standardization program
  • Alternate forms
  • Revised editions
  • Different levels (such as K-12)
42
Q

Final Test Forms

A
  • Test booklets
  • Technical manuals (psychometrics, how norms obtained, etc.)
  • Administration and scoring manuals (how to score, etc.)
  • Score reports and services
  • Supplementary materials
43
Q

Continuing Research on Published Tests

A
  • Updating norms

- Checking the applicability of the test to various other populations

44
Q

Test Fairness

A

A test measures a trait with equivalent validity in different groups

45
Q

Test Bias

A
  • A test does not measure the trait in the same way across different groups
  • Simple difference in average performance does not constitute bias
  • A difference in averages indicates bias only when it does NOT correspond to a real difference in the underlying trait
  • Group averages should differ if the groups really do differ in the trait we are trying to measure