Chapter 6 Flashcards

1
Q

Major Steps in Test Development

A
  • Define purpose
  • Preliminary design issues
  • Item preparation
  • Item analysis
  • Standardization/research
  • Final materials and publication
2
Q

Statement of Purpose

A
  • Should be brief (about one sentence)

- Focuses on the trait being measured and how the scores will be interpreted

3
Q

Mode of Administration

A
  • Group versus individual administration is an issue to consider

- Individual administration lets the examiner observe processes while the person is taking the test

4
Q

Length of Exam

A
  • A short test limits reliability

- A longer test is more reliable
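The usual way to quantify this trade-off is the Spearman-Brown prophecy formula (not named on the card, but standard psychometrics). A minimal Python sketch with hypothetical numbers:

    def spearman_brown(r, k):
        # Predicted reliability when test length changes by a factor of k,
        # given the current reliability r.
        return (k * r) / (1 + (k - 1) * r)

    print(round(spearman_brown(0.70, 2), 2))  # doubling the test: 0.70 -> 0.82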

5
Q

Item Format

A
  • The type of item is a design issue to consider

- Some formats introduce subjectivity into scoring (e.g., essay responses)

6
Q

Number of Scores

A
  • An issue to consider when designing the test
  • More scores give a test wider application
  • Improves marketability
7
Q

Administrator Training

A
  • How much training administrators will need is a consideration when designing the exam
8
Q

Background Research

A
  • Literature search on the trait and on existing measures

- Discussions with practitioners if the test will be used by clinicians

9
Q

Item Preparation

A
  • Stimulus/item stem (question, apparatus, etc.)
  • Response format (M-C, T-F)
  • Scoring procedures (correct/incorrect, partial credit, etc.)
  • Conditions governing response (time limit, probing of responses, etc.)
10
Q

Selected Response Items (Fixed Response)

A
  • T/F, M/C, Likert, etc.
  • Objectively scored
  • Assigning of points
  • Keep content correct and simple, and don’t make the keyed answer too obvious
11
Q

Constructed Response Items (Free Response)

A

Items the examinee must produce rather than select, e.g., fill-in-the-blank, short-answer, and essay items

12
Q

Inter-Rater Reliability

A

Scoring constructed responses requires judgment, so a certain degree of agreement between raters is crucial to ensure items are evaluated in the same or a similar way
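A minimal sketch of the simplest agreement index, exact percent agreement, using hypothetical 1-5 ratings from two raters (chance-corrected indices such as Cohen's kappa are also common):

    rater_a = [4, 3, 5, 2, 4, 3]   # hypothetical essay scores from rater A
    rater_b = [4, 3, 4, 2, 4, 3]   # the same essays scored by rater B
    agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    print(round(agreement, 2))  # 0.83 -- raters agreed exactly on 5 of 6 essays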

13
Q

Holistic

A
  • Scoring scheme in which the rater makes a single, overall judgment about quality

- Based on an overall impression of the paper

14
Q

Analytic

A
  • Scoring scheme in which the response is rated on several separate dimensions
  • e.g., grammar, organization, vocabulary, and other criteria
15
Q

Point System

A

Scoring scheme in which specific points must be included in the answer for full credit

16
Q

Automated Scoring

A
  • Scoring by computer programs that simulate human judgment
  • A development of the most recent (“and now”) period of testing history
  • Also used for scoring essays
17
Q

Suggestions for Writing Constructed Response Items

A
  • Clear directions
  • Avoid optional items (e.g., choose 3 of 5 essays)
  • Be specific about scoring procedure when preparing questions
  • Score anonymously
  • Use sufficient number of items to maximize reliability and validity
18
Q

Three Selected-Response Advantages

A
  • Scoring reliability
  • Scoring efficiency
  • Temporal efficiency
19
Q

Two Constructed Response Advantages

A
  • Easier observation of behavior and processes

- Allows exploring unusual areas, such as material that multiple-choice items cannot address

20
Q

Item Analysis

A

Involves the statistical analysis of data obtained from an item tryout

21
Q

Three Phases of Item Analysis

A
  • Item tryout
  • Statistical analysis
  • Item selection
22
Q

Informal Item Tryout

A

5-10 people similar to those for whom the test is intended comment on the items and directions

23
Q

Formal Item Tryout

A

Administration of test items to samples of examinees who are representative of the target population for the test

24
Q

Independent Study

A
  • Conducting a study exclusively for the purpose of item analysis
  • Most common practice
  • A formal item tryout; participants are often paid
25
Q

Attachment

A
  • Including tryout items in the regular administration of an existing test
  • SATs and GREs for example
26
Q

Item Difficulty (p)

A
  • Percent of examinees answering the item correctly
  • A p-value of .95 is very easy (95% got it right)
  • A p-value of .15 is very difficult
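A minimal sketch of the computation, using hypothetical 0/1 (incorrect/correct) item data:

    responses = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # 10 hypothetical examinees
    p = sum(responses) / len(responses)
    print(p)  # 0.8 -- a fairly easy item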
27
Q

Item Discrimination (D)

A
  • An item’s ability to differentiate statistically, in a desired way, between groups of examinees
  • D = difference in percent correct between the high-scoring and low-scoring groups
  • Maximum possible differentiation occurs at 50% difficulty (p = .50)
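A minimal sketch of D for one item, splitting hypothetical examinees into high and low groups by total test score:

    # Each pair: (total test score, correct on this item?)
    examinees = sorted([(38, 1), (35, 1), (30, 1), (22, 0), (20, 1), (15, 0)],
                       reverse=True)
    half = len(examinees) // 2
    high, low = examinees[:half], examinees[half:]
    p_high = sum(c for _, c in high) / len(high)  # 1.00
    p_low = sum(c for _, c in low) / len(low)     # 0.33
    print(round(p_high - p_low, 2))               # D = 0.67

Here the split is top half versus bottom half; the next card covers the standard 27% split.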
28
Q

Truman Kelley

A
  • Showed that, statistically, the best way to form groups for D is the top and bottom 27% of scorers
  • This has become the “industry standard” for splits
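A minimal sketch of forming Kelley's 27% groups from hypothetical total scores (D is then the difference in percent correct between the two groups):

    scores = sorted([55, 48, 47, 45, 44, 40, 39, 35, 30, 28, 25, 20],
                    reverse=True)
    n = round(0.27 * len(scores))  # 27% of 12 examinees -> 3 per group
    high_group = scores[:n]        # top 27%:    [55, 48, 47]
    low_group = scores[-n:]        # bottom 27%: [28, 25, 20]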
29
Q

Factor Analysis

A
  • Used to select items that will yield relatively independent/meaningful scores
  • Applications commonly include attitude scales and personality/interest evaluations
  • Basic approach: inter-correlations among the items are factor analyzed and underlying dimensions (factors) are identified
30
Q

High Loading Item

A
  • Loads 0.3 or higher on a factor in the factor analysis
  • Such items are selected for inclusion in the final test
  • A loading is the correlation between an item and a factor
  • An item with high cross-loadings (loading strongly on more than one factor) is NOT a good item
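A minimal sketch of factor-analytic item selection, assuming a hypothetical examinee-by-item response matrix and scikit-learn's FactorAnalysis:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Placeholder data: rows = 200 examinees, columns = 10 items.
    responses = np.random.default_rng(0).normal(size=(200, 10))
    fa = FactorAnalysis(n_components=2).fit(responses)
    loadings = fa.components_.T  # rows = items, columns = factors

    for item, row in enumerate(np.abs(loadings)):
        primary, cross = np.sort(row)[-1], np.sort(row)[-2]
        # Keep items loading >= .30 on one factor without high cross-loadings.
        keep = primary >= 0.30 and cross < 0.30
        print(item, row.round(2), "keep" if keep else "drop")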
31
Q

Five Guidelines for Item Selection

A
  • Number of items
  • Content considerations
  • High discrimination indices
  • Relationship between p-value and D
  • Average difficulty level
32
Q

Increased Number of Items

A

Increased reliability

33
Q

Starting with Easier Test Items

A

Increases motivation

34
Q

High Discrimination Indices

A

0.3 to 0.5

35
Q

Maximum possible D-value

A
  • Occurs when the p-value is at its midpoint

- Maximum D (1.0) when p = 0.5

36
Q

Mean Score

A

The expected mean raw score equals the sum of the item p-values
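A one-line check with hypothetical p-values: a 4-item test with item difficulties .9, .8, .7, and .6 has an expected mean raw score of 3.0:

    print(sum([0.9, 0.8, 0.7, 0.6]))  # 3.0 -- expected mean score out of 4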

37
Q

To get an easy test…

A

Use items with high p-values (closer to 1)

38
Q

To get a difficult test…

A

Use items with low p-values (closer to 0)

39
Q

Discrimination Index

A

Difference in percent correct between high-scoring and low-scoring groups

40
Q

Standardization

A
  • Used to develop the norms for the test
  • Should be the exact version that is published (changing items throws off established norms)
  • Representativeness is key
41
Q

Equating Programs

A
  • Might be conducted at the same time as the standardization program
  • Alternate forms
  • Revised editions
  • Different levels (such as K-12)
42
Q

Final Test Forms

A
  • Test booklets
  • Technical manuals (psychometrics, how norms obtained, etc.)
  • Administration and scoring manuals (how to score, etc.)
  • Score reports and services
  • Supplementary materials
43
Q

Continuing Research on Published Tests

A
  • Updating norms

- Checking the applicability of the test to various other populations

44
Q

Test Fairness

A

A test measures a trait with equivalent validity in different groups

45
Q

Test Bias

A
  • A test does not measure the trait in the same way across different groups
  • Simple difference in average performance does not constitute bias
  • A difference in averages indicates bias only when it does NOT correspond to a real difference in the underlying trait
  • Group averages should differ if the groups really do differ in the trait we are trying to measure