Chapter 8 Test Development Flashcards

Question

Multiple choice format.

Answer 1

Three elements: (1) a stem; (2) a correct alternative or option; (3) several incorrect alternatives or options, called distractors or foils.

Answer 2

In a matching item, the test taker is presented with two columns: premises on the left and responses on the right. The test taker's task is to determine which response is best associated with which premise. p.246

Answer 3

A binary-choice item is a multiple-choice item that contains only two possible responses, e.g. true-false, agree-disagree, yes-no, fact-opinion, right-wrong.

Answer 4

Completion item; short-answer item; essay item.

Answer 5

Advantages: the ability to store items in an item bank (a large collection of test questions) and the ability to individualize testing through item branching.

Answer 6

CAT (computerized adaptive testing) refers to an interactive, computer-administered test-taking process wherein items presented to the test taker are based in part on the test taker's performance on previous items. p.248

Answer 7

A floor effect refers to the diminished utility of an assessment tool for distinguishing test takers at the low end of the ability, trait, or other attribute being measured, i.e. the test is too difficult. Solution: add some less difficult items.

Answer 8

A ceiling effect refers to the diminished utility of an assessment tool for distinguishing test takers at the high end of the ability, trait, or other attribute being measured, i.e. the test is too easy. Solution: add some more difficult items.
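
A minimal Python sketch of how one might screen a set of total scores for floor and ceiling effects, assuming hypothetical scores on a 0-20 test. The function names and the 15% cutoff are illustrative assumptions, not criteria given in the chapter.

```python
def floor_ceiling_rates(scores, min_score, max_score):
    """Proportions of test takers at the minimum and maximum possible scores."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score) / n
    at_ceiling = sum(1 for s in scores if s == max_score) / n
    return at_floor, at_ceiling


def flag_effects(scores, min_score, max_score, threshold=0.15):
    """Flag a possible floor or ceiling effect.

    ASSUMPTION: the 15% threshold is a hypothetical rule of thumb
    chosen for illustration only.
    """
    floor, ceiling = floor_ceiling_rates(scores, min_score, max_score)
    return {"floor": floor >= threshold, "ceiling": ceiling >= threshold}


# Hypothetical scores on a 0-20 test that appears too easy.
scores = [20, 20, 19, 20, 18, 20, 17, 20, 16, 20]
print(flag_effects(scores, 0, 20))  # {'floor': False, 'ceiling': True}
```

Six of the ten hypothetical test takers obtain the maximum score, so the test cannot distinguish among them; adding more difficult items is the remedy described above.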

Answer 9

Item branching is the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items, e.g. patterns of item presentation based on consecutive correct responses. p. 252
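
A minimal Python sketch of item branching, assuming a hypothetical rule: items are grouped into difficulty levels 1 to 5, testing starts at level 3, two consecutive correct responses move the test taker up one level, and an incorrect response moves them down one level. The levels, starting point, and rule are illustrative assumptions, not the procedure of any particular test.

```python
def administer(responses, start_level=3, min_level=1, max_level=5):
    """Return the difficulty level at which each item was presented.

    `responses` is a sequence of booleans (True = correct), in order.
    HYPOTHETICAL branching rule, for illustration only.
    """
    level, streak, levels = start_level, 0, []
    for correct in responses:
        levels.append(level)  # level of the item just answered
        if correct:
            streak += 1
            if streak == 2:  # two consecutive correct answers
                level = min(max_level, level + 1)
                streak = 0  # start a new run at the harder level
        else:
            level = max(min_level, level - 1)  # branch to an easier item
            streak = 0
    return levels


print(administer([True, True, True, False, True, True]))
# [3, 3, 4, 4, 3, 3]
```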

Answer 10

In class (or category) scoring, test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is similar.
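
A minimal Python sketch of class scoring, assuming a hypothetical checklist in which each response matching the keyed direction earns one point of credit and a test taker whose credit reaches a cutoff is placed in the category. The item names, keys, and cutoff are invented for illustration.

```python
def classify(responses, keyed_items, cutoff):
    """Class (category) scoring sketch.

    responses:   item -> the test taker's answer (True/False)
    keyed_items: item -> the answer that earns credit
    Returns (credit earned, whether the test taker is placed in the category).
    HYPOTHETICAL items, keys, and cutoff.
    """
    credit = sum(1 for item, keyed in keyed_items.items()
                 if responses.get(item) == keyed)
    return credit, credit >= cutoff


keyed = {"q1": True, "q2": True, "q3": False, "q4": True}
answers = {"q1": True, "q2": False, "q3": False, "q4": True}
print(classify(answers, keyed, cutoff=3))  # (3, True)
```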

Answer 11

Ipsative scoring: a scoring model that compares a test taker's score on one scale within a test to the same test taker's score on another scale within that same test. p. 253.
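
A minimal Python sketch of the ipsative idea: each scale score is compared with the same test taker's other scale scores rather than with norms from other test takers. Centering on the person's own mean and ranking the scales is one simple, assumed way to express this; the scale names and scores are hypothetical.

```python
def ipsative_profile(scale_scores):
    """Compare one test taker's scales with each other (within-person).

    Returns each scale's deviation from this person's own mean across
    scales, and the scales ranked from highest to lowest score.
    No other test takers' data are involved.
    """
    mean = sum(scale_scores.values()) / len(scale_scores)
    deviations = {scale: score - mean for scale, score in scale_scores.items()}
    ranking = sorted(scale_scores, key=scale_scores.get, reverse=True)
    return deviations, ranking


# Hypothetical scale names and raw scores for one test taker.
person = {"achievement": 18, "affiliation": 12, "autonomy": 15}
deviations, ranking = ipsative_profile(person)
print(deviations)  # {'achievement': 3.0, 'affiliation': -3.0, 'autonomy': 0.0}
print(ranking)     # ['achievement', 'autonomy', 'affiliation']
```

Such a profile supports statements like "this person's need for achievement is stronger than their need for affiliation", but not comparisons between different test takers.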

Answer 12

A biased item is one that favours one particular group of examinees in relation to another when differences in group ability are controlled.

Answer 13

Item-characteristic curves can be used to identify biased items. Specific items are identified as biased in a statistical sense if they exhibit differential item functioning (DIF), exemplified by different shapes of item-characteristic curves for different groups of equal ability.
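
The chapter does not specify a model for item-characteristic curves. As an assumption, the Python sketch below uses the two-parameter logistic (2PL) model from item response theory, with hypothetical item parameters estimated separately for two groups, and measures how far apart the curves are across ability levels. This only illustrates the idea; operational DIF analyses rely on formal statistical procedures.

```python
import math


def icc(theta, a, b):
    """2PL item-characteristic curve: probability of a correct response
    at ability theta, given discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def max_icc_gap(params_group1, params_group2, thetas):
    """Largest difference between two groups' curves over a grid of
    ability levels. A sizable gap at equal ability suggests DIF."""
    return max(abs(icc(t, *params_group1) - icc(t, *params_group2))
               for t in thetas)


thetas = [x / 10 for x in range(-30, 31)]  # ability from -3.0 to 3.0

# HYPOTHETICAL (a, b) parameters for the same item in two groups.
no_dif = max_icc_gap((1.2, 0.0), (1.2, 0.0), thetas)
dif = max_icc_gap((1.2, 0.0), (1.2, 0.8), thetas)
print(no_dif, round(dif, 2))
```

In the second case the item is harder for the second group at every ability level, which is the kind of curve difference that marks an item as statistically biased.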

Answer 14

Qualitative item analysis is a general term for various nonstatistical procedures designed to explore how individual test items work. It compares individual test items to each other and to the test as a whole. Qualitative methods include interviews and group discussions.

Answer 15

Think-aloud test administration: a cognitive assessment approach in which respondents verbalize thoughts as they occur. p.266 table

Answer 16

E.g. a sensitivity review: a study of test items, conducted during the test development process, in which items are examined for fairness to all prospective test takers and for the presence of offensive language, stereotypes, etc.

Answer 17

Some items from the original pool will be eliminated and others will be rewritten. Look at items that are too difficult, too easy, biased, etc.

Answer 18

Cross-validation refers to the revalidation of a test on a sample of test takers other than those on whom test performance was originally found to be a predictor of some criterion.

Answer 19

Validity shrinkage is the decrease in item validities that occurs after cross-validation of findings. Such shrinkage is expected and integral to the test development process.

Answer 20

Co-validation is a test validation process conducted on two or more tests using the same sample of test takers.

Answer 21

When used in conjunction with the creation of norms or the revision of existing norms, co-validation may also be referred to as co-norming. A current trend among test publishers who publish more than one test designed for use with the same population is to co-validate and/or co-norm tests, because doing so is economical.

Answer 22

An anchor protocol is a mechanism for ensuring consistency in scoring: a test protocol scored by an authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies.

Answer 23

Scoring drift is a discrepancy between the scoring in an anchor protocol and the scoring of another protocol. Once protocols are scored, the data from them must be entered into a database.
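
A minimal Python sketch of checking a scorer's protocol against an anchor protocol for scoring drift, assuming item-level scores stored in dictionaries. The item names, scores, and tolerance are hypothetical.

```python
def scoring_drift(anchor_scores, rater_scores, tolerance=0):
    """Items where a scorer's score departs from the anchor protocol's
    score by more than `tolerance`, with the signed discrepancy
    (scorer minus anchor). HYPOTHETICAL data layout."""
    return {item: rater_scores[item] - anchor_scores[item]
            for item in anchor_scores
            if abs(rater_scores[item] - anchor_scores[item]) > tolerance}


anchor = {"item1": 2, "item2": 1, "item3": 0}
rater = {"item1": 2, "item2": 2, "item3": 0}
print(scoring_drift(anchor, rater))  # {'item2': 1}
```

Flagged discrepancies would then be resolved against the anchor protocol before the data are entered into the database.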

Answer 24

Each of the items assembled as part of an item bank has undergone rigorous qualitative and quantitative evaluation. Many items come from existing instruments. New items may be written. All items constitute the item pool. p.274

Answer 25

- Likert scales (e.g. 1 = strongly disagree to 7 = strongly agree)
- Binary-choice scales (e.g. true/false, like/dislike)
- Forced-choice scales (e.g. "I am happy most of the time" OR "I am sad most of the time")
- Semantic differential scales (e.g. strong ... weak)

Answer 26

To create an item pool. Two general item format options: 1. selected response items 2. constructed response items

Answer 27

- Item difficulty index
- Item discrimination index
- Item validity index
- Item reliability index

Answer 28

The item difficulty index (p) is calculated as the proportion of test takers who answered the item correctly. The p value ranges from 0 to 1; the larger the p value, the easier the item. Each item has a corresponding p value, e.g. p1 is read "item difficulty index for item 1".
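
A minimal Python sketch of computing p for each item from a hypothetical matrix of scored responses (1 = correct, 0 = incorrect; rows are test takers, columns are items). The data are invented for illustration.

```python
def item_difficulty(item_responses):
    """Item difficulty index p: proportion of test takers answering correctly."""
    return sum(item_responses) / len(item_responses)


# Hypothetical scored responses: rows = test takers, columns = items 1-3.
matrix = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]

# zip(*matrix) turns rows into columns, one column per item.
p_values = [item_difficulty(list(column)) for column in zip(*matrix)]
print(p_values)  # [1.0, 0.5, 0.0]
```

Here item 1 (p = 1) and item 3 (p = 0) tell us nothing about differences between these test takers.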

Answer 29

The average item difficulty for a test is calculated as the average of all the p values for the test items. The optimal average item difficulty is approximately 0.5, with individual items ranging in difficulty from about 0.3 (somewhat difficult) to 0.8 (somewhat easy). The effect of guessing must be taken into account.
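
A minimal Python sketch of averaging item p values and listing items that fall outside the 0.3 to 0.8 range given above. The p values are hypothetical, and the sketch ignores the adjustment for guessing.

```python
def average_difficulty(p_values):
    """Average item difficulty: the mean of the items' p values."""
    return sum(p_values) / len(p_values)


def outside_range(p_values, low=0.3, high=0.8):
    """Item numbers (1-based) whose p falls outside [low, high]."""
    return [i for i, p in enumerate(p_values, start=1)
            if not (low <= p <= high)]


p_values = [0.35, 0.5, 0.62, 0.9, 0.45]  # hypothetical items 1-5
print(round(average_difficulty(p_values), 3))  # 0.564
print(outside_range(p_values))  # [4]
```

Item 4 is easier than the recommended range and would be a candidate for revision.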

Answer 30

Items that everyone answers correctly (p = 1) or that no one answers correctly (p = 0) DO NOT DISCRIMINATE between test takers.

Answer 31

The item discrimination index indicates the degree to which an item differentiates correctly among test takers on the behaviour the test is designed to measure. I.e. an item is good if most of the high scorers on the test overall answer the item correctly and most of the low scorers on the test answer the item incorrectly.
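
One common way to compute an item discrimination index is the upper-lower group method, d = (U - L) / n, where U and L are the numbers of test takers in the upper and lower scoring groups who answered the item correctly and n is the number in each group. The Python sketch below assumes that method with groups formed from the top and bottom 27% of total scores (a frequently used convention); the scores are hypothetical.

```python
def discrimination_index(total_scores, item_scores, fraction=0.27):
    """Upper-lower group item discrimination index d = (U - L) / n.

    total_scores: each test taker's total test score
    item_scores:  1/0 for the same test takers on the item analyzed
    The top and bottom `fraction` of test takers by total score form
    the upper and lower groups, each of size n.
    """
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    U = sum(item_scores[i] for i in upper)
    L = sum(item_scores[i] for i in lower)
    return (U - L) / n


# Hypothetical data for 10 test takers.
totals = [10, 12, 15, 18, 20, 22, 25, 28, 30, 33]
item = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
print(round(discrimination_index(totals, item), 2))  # 0.67
```

A positive d means high scorers answer the item correctly more often than low scorers; a d near zero or negative marks an item that does not discriminate in the intended direction.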

Answer 32

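Optimal item difficulty adjusted for guessing = (1 + g) / 2, where g is the probability of answering the item correctly by chance. E.g. for a four-option multiple-choice item, g = .25, so (1 + .25) / 2 = .625 ≈ .63 is optimal.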
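
A minimal Python sketch of the guessing-adjusted optimal difficulty, assuming a guesser is equally likely to pick any option, so that g = 1 / (number of options).

```python
def optimal_difficulty(num_options):
    """Optimal item difficulty adjusted for guessing: (1 + g) / 2,
    with g = 1 / num_options (ASSUMPTION: all options equally likely
    to be chosen by a test taker who guesses)."""
    g = 1 / num_options
    return (1 + g) / 2


print(optimal_difficulty(4))  # 0.625 (four-option multiple choice)
print(optimal_difficulty(2))  # 0.75 (binary-choice, e.g. true-false)
```

The optimum sits midway between 1.0 and the chance level, so items with fewer options, where guessing is easier, have a higher optimal p.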
