Chapter 6 Flashcards

(45 cards)

1
Q

Major Steps in Test Development

A
  • Define purpose
  • Preliminary design issues
  • Item preparation
  • Item analysis
  • Standardization/research
  • Final materials and publication
2
Q

Statement of Purpose

A
  • Usually just one sentence
  • Focuses on the trait being measured and how the scores will be interpreted

3
Q

Mode of Administration

A
  • Group versus individual administration is an issue to consider
  • Individual administration lets the examiner observe the processes the person uses while taking the test

4
Q

Length of Exam

A
  • A short test limits reliability
  • A longer test is generally more reliable

5
Q

Item Format

A
  • The type of question is a design issue to consider
  • Some formats introduce subjectivity into scoring (essay responses, etc.)

6
Q

Number of Scores

A
  • An issue to consider when designing the test
  • More scores give the test wider application
  • Improves marketability
7
Q

Administrator Training

A
  • How much training administrators will need is considered when designing the exam
8
Q

Background Research

A
  • A literature search on the construct and existing measures
  • Discussions with practitioners if the test will be used by clinicians

9
Q

Item Preparation

A
  • Stimulus/item stem (question, apparatus, etc.)
  • Response format (M-C, T-F)
  • Scoring procedures (correct/incorrect, partial credit, etc.)
  • Conditions governing response (time limit, probing of responses, etc.)
10
Q

Selected Response Items (Fixed Response)

A
  • T/F, M/C, Likert, etc.
  • Objectively scored
  • Assigning of points
  • Keep the content correct and simple, and don’t make the keyed answer too obvious
11
Q

Constructed Response Items (Free Response)

A

Items the examinee must generate rather than select, e.g., fill-in-the-blank and essay items

12
Q

Inter-Rater Reliability

A

Scoring requires judgment, so a certain degree of agreement between raters is crucial; it ensures items are evaluated in the same (or a similar) way
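
A quick way to quantify agreement is simple percent agreement between two raters; Cohen's kappa, which corrects for chance agreement, is a common refinement. A minimal sketch, with hypothetical ratings made up for illustration:

```python
# Percent agreement between two raters scoring the same responses.
# The ratings below are hypothetical, for illustration only.
rater_a = [3, 4, 2, 5, 3, 4, 1, 3]
rater_b = [3, 4, 3, 5, 3, 4, 2, 3]

agreements = sum(a == b for a, b in zip(rater_a, rater_b))
print(f"Percent agreement: {agreements / len(rater_a):.2f}")  # 0.75
```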

13
Q

Holistic

A
  • Scoring scheme in which a single judgment is made about overall quality
  • Based on the overall impression the paper gives

14
Q

Analytic

A
  • Scoring scheme in which the response is rated on several separate dimensions
  • Grammar, organization, vocabulary, and other criteria, for example
15
Q

Point System

A

Scoring scheme in which certain key points must be included in the answer to earn full credit

16
Q

Automated Scoring

A
  • Scoring by computer programs that simulate human judgment
  • A development of the “And Now” (most recent) period of testing history
  • Now used even for scoring essays
17
Q

Suggestions for Writing Constructed Response Items

A
  • Clear directions
  • Avoid optional items (e.g., choose 3 of 5 essays to answer)
  • Be specific about scoring procedure when preparing questions
  • Score anonymously
  • Use sufficient number of items to maximize reliability and validity
18
Q

Three Selected-Response Advantages

A
  • Scoring reliability
  • Scoring efficiency
  • Temporal efficiency
19
Q

Two Constructed Response Advantages

A
  • Easier observation of behavior and processes
  • Can explore unusual areas, covering material that multiple-choice items cannot address

20
Q

Item Analysis

A

Involves the statistical analysis of data obtained from an item tryout

21
Q

Three Phases of Item Analysis

A
  • Item tryout
  • Statistical analysis
  • Item selection
22
Q

Informal Item Tryout

A

5–10 people, similar to those for whom the test is intended, comment on the items and directions

23
Q

Formal Item Tryout

A

Administration of test items to samples of examinees who are representative of the target population for the test

24
Q

Independent Study

A
  • Conducting a study exclusively for the purpose of item analysis
  • Most common practice
  • A formal practice for item tryout; subjects are often paid
25
Q

Attachment

A
  • Including tryout items in the regular administration of an existing test
  • The SATs and GREs, for example
26
Q

Item Difficulty (p)

A
  • Percent of examinees answering the item correctly
  • A p-value of .95 is very easy (95% got it right)
  • A p-value of .15 is very difficult
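
A minimal sketch of computing p, assuming items scored 0/1 in a small made-up response matrix; each item's p-value is just its column mean:

```python
import numpy as np

# Hypothetical 0/1 responses: rows = examinees, columns = items.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
])

# p per item = proportion of examinees answering correctly.
p_values = responses.mean(axis=0)
print(p_values)  # [0.8 0.6 0.2 0.8] -> item 3 is hard, items 1 and 4 are easy
```
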
27
Q

Item Discrimination (D)

A
  • An item's ability to differentiate statistically, in a desired way, between groups of examinees
  • D = the difference in percent correct between the high-scoring and low-scoring groups
  • 50% maximum possible scoring differentiation
28
Q

Truman Kelley

A
  • Showed that, statistically, the best way to form the groups for D is the top 27% and bottom 27% of scorers
  • Has become the "industry standard" for splits
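
Putting the last two cards together, here is a sketch of computing D with Kelley's 27% split; the 0/1 data below are randomly generated, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 0/1 scores: 100 examinees x 10 items.
responses = (rng.random((100, 10)) < 0.6).astype(int)

totals = responses.sum(axis=1)   # total score per examinee
order = np.argsort(totals)
k = round(0.27 * len(totals))    # Kelley's 27% rule

low = responses[order[:k]]       # bottom 27% by total score
high = responses[order[-k:]]     # top 27% by total score

# D per item = percent correct in high group - percent correct in low group.
D = high.mean(axis=0) - low.mean(axis=0)
print(np.round(D, 2))
```

Note that D can reach its maximum of 1.0 only when p = 0.5 (see card 35).
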
29
Q

Factor Analysis

A
  • Used to select items that will yield relatively independent, meaningful scores
  • Applications commonly include attitude scales and personality/interest inventories
  • Basic approach: inter-correlations among the items are factor analyzed and underlying dimensions (factors) are identified
30
Q

High Loading Item

A
  • A loading is the correlation between an item and a factor
  • Items loading 0.3 or higher in the factor analysis are selected for inclusion in the final test
  • An item with high cross-loadings (loading on more than one factor) is NOT a good item
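
A sketch of this selection rule using scikit-learn's FactorAnalysis on simulated data; the two-factor structure, sample sizes, and 0.3 threshold are assumptions for illustration, and real work would typically also rotate the solution:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
# Simulate 200 examinees on 8 items driven by two latent factors.
latent = rng.normal(size=(200, 2))
weights = np.zeros((2, 8))
weights[0, :4] = 1.0                         # items 0-3 follow factor 1
weights[1, 4:] = 1.0                         # items 4-7 follow factor 2
data = latent @ weights + rng.normal(scale=0.5, size=(200, 8))
data = (data - data.mean(0)) / data.std(0)   # standardize so loadings ~ correlations

loadings = FactorAnalysis(n_components=2).fit(data).components_.T

for i, row in enumerate(loadings):
    high = np.abs(row) >= 0.3                # "high loading" threshold from the card
    if high.sum() == 1:
        print(f"item {i}: keep (factor {np.argmax(np.abs(row))})")
    elif high.sum() > 1:
        print(f"item {i}: drop (cross-loads)")
    else:
        print(f"item {i}: drop (no high loading)")
```
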
31
Q

Five Guidelines for Item Selection

A
  • Number of items
  • Content considerations
  • High discrimination indices
  • Relationship between p-value and D
  • Average difficulty level
32
Q

Increased Number of Items

A

Increased reliability
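
The card doesn't name it, but the standard way to quantify this relationship is the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k. A minimal sketch:

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability when test length is multiplied by k."""
    return k * reliability / (1 + (k - 1) * reliability)

# Doubling a test whose reliability is .70:
print(round(spearman_brown(0.70, 2), 2))  # 0.82
```
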
33
Q

Starting with Easier Test Items

A

Increases motivation

34
Q

High Discrimination Indices

A

0.3 to 0.5
35
Q

Maximum Possible D-Value

A
  • Occurs when the p-value is at its midpoint
  • Maximum D (1.0) when p = 0.5: only then can the entire high group answer correctly while the entire low group answers incorrectly
36
Q

Mean Score

A

The mean total score on a test equals the sum of the item p-values
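
This identity holds because the mean total score is the grand total of correct answers divided by the number of examinees, which is the same as summing the item (column) means. A quick check on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
responses = (rng.random((50, 12)) < 0.6).astype(int)  # hypothetical 0/1 data

mean_total = responses.sum(axis=1).mean()  # mean of examinees' total scores
sum_of_p = responses.mean(axis=0).sum()    # sum of the item p-values
print(mean_total, sum_of_p)                # equal (up to floating-point rounding)
```
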
37
Q

To get an easy test...

A

Use items with high p-values (closer to 1)

38
Q

To get a difficult test...

A

Use items with low p-values (closer to 0)
39
Q

Discrimination Index

A

Difference in percent correct between high-scoring and low-scoring groups
40
Q

Standardization

A
  • Used to develop the norms for the test
  • Should use the exact version that will be published (changing items afterward throws off the established norms)
  • Representativeness of the norm sample is key
41
Q

Equating Programs

A
  • Might be conducted at the same time as the standardization program
  • Alternate forms
  • Revised editions
  • Different levels (such as K-12)
42
Q

Final Test Forms

A
  • Test booklets
  • Technical manuals (psychometrics, how the norms were obtained, etc.)
  • Administration and scoring manuals (how to score, etc.)
  • Score reports and services
  • Supplementary materials
43
Q

Continuing Research on Published Tests

A
  • Updating norms
  • Applicability of the test to various other populations
44
Q

Test Fairness

A

A test measures a trait with equivalent validity in different groups
45
Q

Test Bias

A
  • A test does not measure the trait in the same way across different groups
  • A simple difference in average performance does not by itself constitute bias
  • Bias exists when a difference in averages does NOT correspond to a real difference in the underlying trait
  • Group averages should differ if the groups really do differ in the trait we are trying to measure