Test Development Flashcards by Nika Otaza

It is the product of the thoughtful and sound application of established principles of test construction

Test development

1st step of Test development

Test conceptualization (what, how, who, when, should?)

Preliminary research surrounding the creation of a prototype of the test

Pilot study/research

2nd step of test development

Test construction

Process of setting rules for assigning numbers in measurement

scaling

Credited for being at the forefront of efforts to develop methodologically sound scaling methods

L. L. Thurstone

Type of scale that consists of a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker

Rating scale

A scale where the final score is obtained by summing the ratings across all items (e.g. Likert Scale)

Summative scale
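
As a minimal sketch of the summing step described on this card, the function below adds up one respondent's ratings. The 1-to-5 range and the sample responses are illustrative assumptions, not values from the deck.

```python
def summative_score(ratings, scale_min=1, scale_max=5):
    """Return the total score for one respondent on a summative scale.

    scale_min and scale_max are assumed bounds for a 5-point Likert-type item.
    """
    for rating in ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside {scale_min}-{scale_max}")
    return sum(ratings)


# Hypothetical responses to a four-item scale.
print(summative_score([4, 5, 2, 3]))  # 14
```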

A scale where test takers are presented with pairs of stimuli which they are asked to compare

Method of paired comparison

Entails sorting tasks and judgments of a stimulus in comparison with every other stimulus on the scale (e.g. sort items from most justifiable to least justifiable)

Comparative scaling (ordinal)

Stimuli placed into one of two or more alternative categories that differ quantitatively with respect to some continuum

categorical scaling

Respondents who agree with stronger statements of the attitude will also agree with the milder statements

Guttman scale (ordinal)
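
The Guttman property on this card can be checked mechanically. The sketch below assumes the statements are already ordered from mildest to strongest and that each response is a simple agree/disagree; the function name is hypothetical.

```python
def is_guttman_consistent(responses):
    """Check one respondent's answers against the Guttman pattern.

    responses: booleans ordered from the mildest to the strongest statement.
    A consistent pattern never agrees with a stronger statement after
    disagreeing with a milder one.
    """
    seen_disagreement = False
    for agreed in responses:
        if agreed and seen_disagreement:
            return False
        if not agreed:
            seen_disagreement = True
    return True


print(is_guttman_consistent([True, True, False, False]))  # True
print(is_guttman_consistent([True, False, True, False]))  # False
```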

Item analysis procedure and approach to test development that involves a graphic mapping of a testtaker’s responses

Scalogram analysis

Scaling method used to obtain data that are presumed to be interval in nature

Equal-appearing intervals (Thurstone)

Reservoir or well from which items will or will not be drawn for the final version of the test

Item pool

Parts of a multiple-choice item format question

stem (sentence)
correct option
distractors/foils

Also called a short-answer item

Completion item

Limitations of essay items

Focus on a limited area; subjectivity in scoring

Relatively large and easily accessible collection of test questions

item bank

Interactive, computer-administered test-taking process wherein the items presented to the testtaker are based in part on the testtaker’s performance on previous items

Computerized-adaptive testing (CAT)

Ability of the computer to tailor the content and order of the presentation of test items on the basis of responses to previous items

Item branching
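
A toy sketch of item branching, assuming the items are stored in a list sorted from easiest to hardest: a correct response moves one step harder, an incorrect response one step easier. Operational adaptive tests typically choose items with more elaborate statistical rules; the function name and the one-step rule are assumptions made only for illustration.

```python
def next_item_index(current_index, answered_correctly, n_items):
    """Pick the next item from a list sorted easiest to hardest.

    Moves one step harder after a correct response and one step easier
    after an incorrect one, staying within the bounds of the list.
    """
    step = 1 if answered_correctly else -1
    return min(max(current_index + step, 0), n_items - 1)


# Hypothetical five-item list, starting at the middle item (index 2).
print(next_item_index(2, True, 5))   # 3
print(next_item_index(2, False, 5))  # 1
```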

Most commonly used scoring model

Cumulative scoring

A type of scoring used by some diagnostic systems wherein individuals must exhibit a certain number of symptoms to qualify for a specific diagnosis

Class/categorical scoring
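
Class scoring as described on this card reduces to a threshold check on a symptom count. The sketch below uses a made-up checklist and a made-up threshold; neither comes from any actual diagnostic system.

```python
def meets_criteria(symptoms_present, required_count):
    """Return True when enough symptoms are present to place the person in the class.

    symptoms_present: booleans, one per symptom on the checklist.
    required_count: hypothetical minimum number of symptoms for the class.
    """
    observed = sum(1 for present in symptoms_present if present)
    return observed >= required_count


# Hypothetical checklist of six symptoms with a threshold of four.
print(meets_criteria([True, True, False, True, True, False], 4))   # True
print(meets_criteria([True, False, False, True, False, False], 4))  # False
```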

Compares a testtaker’s score on one scale within a test to that testtaker’s score on another scale within the same test

Ipsative scoring

3rd step in test development

test tryout

4th step in test development

Item analysis

Items that spur motivation and positive testtaking attitude and lessen anxiety

Giveaway items

Percentage of people who said yes to, agreed with, or otherwise endorsed the item, as opposed to the percentage who passed the item

Item endorsement index

Range of the optimal item difficulty

0.3 to 0.8 (values closer to 0.8 indicate easier items)

Formula for OID

(chance performance + 1.00) / 2, i.e. the midpoint between the chance success proportion and a perfect 1.00

OID for true-false item

0.75 (chance=0.5)

OID for multiple choice item 4 options

0.63 (chance=0.25)

OID for multiple choice item 5 options

0.60 (chance=0.2)
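
The three OID values above are consistent with taking the midpoint between chance performance and 1.00. A minimal sketch, assuming chance equals 1 divided by the number of response options (the function name is hypothetical):

```python
def optimal_item_difficulty(n_options):
    """Midpoint between chance success and a perfect proportion of 1.00.

    Assumes every option is equally likely under pure guessing,
    so chance = 1 / n_options.
    """
    chance = 1 / n_options
    return (chance + 1.0) / 2


print(optimal_item_difficulty(2))  # true-false item: 0.75
print(optimal_item_difficulty(4))  # four options: 0.625
print(optimal_item_difficulty(5))  # five options
```

The four-option result of 0.625 is the value the card reports as 0.63 after rounding.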

Equal to the product of the item-score standard deviation and the correlation between the item score and the total test score

Item reliability index

Item Analysis Technique for Questions with right/wrong answers

Item difficulty
Item discrimination
Distractor analysis

Item Analysis Techniques for either right/wrong answers or self-report scales

Item reliability index
Cronbach's alpha

Equal to the product of the item-score standard deviation and the correlation between the item score and the criterion score

Item validity index
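
Both the item reliability index and the item validity index are a standard deviation multiplied by a correlation; they differ only in what the item is correlated with (total test score versus criterion score). The sketch below assumes the population standard deviation and a Pearson correlation, and every score in it is invented for illustration.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / (var_x * var_y) ** 0.5


def item_index(item_scores, other_scores):
    """Item-score SD times the item's correlation with another set of scores.

    Pass total test scores for a reliability index,
    or criterion scores for a validity index.
    """
    return pstdev(item_scores) * pearson_r(item_scores, other_scores)


# Invented data for six testtakers.
item = [1, 0, 1, 1, 0, 1]                   # 1 = correct, 0 = incorrect
total = [18, 9, 15, 20, 11, 17]             # total test scores
criterion = [3.2, 2.1, 2.9, 3.8, 2.5, 3.0]  # external criterion scores

print("item reliability index:", item_index(item, total))
print("item validity index:", item_index(item, criterion))
```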

How adequately an item separates or discriminates between high scorers and low scorers on the entire test

Item discrimination index

What are the key properties of the Item-discrimination index?

Symbolised by d
Compares performance on a particular item by the high-ability group and the low-ability group (i.e. the top 27% and the bottom 27%)
Items that discriminate well will have a high positive score (to a maximum of 1)
A negative d value is a red flag, as it means low scorers are doing better on that item than high scorers
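
A minimal sketch of one common way to compute d: rank testtakers by total score, take the top and bottom 27%, and divide the difference in the number answering the item correctly by the group size. The data, the function name, and the rounding rule for the group size are illustrative assumptions.

```python
def discrimination_index(records, fraction=0.27):
    """Compute d for one item.

    records: list of (total_score, item_correct) pairs, item_correct being 0 or 1.
    fraction: share of testtakers placed in each extreme group.
    """
    ranked = sorted(records, key=lambda record: record[0], reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper_correct = sum(correct for _, correct in ranked[:group_size])
    lower_correct = sum(correct for _, correct in ranked[-group_size:])
    return (upper_correct - lower_correct) / group_size


# Invented data: ten testtakers, so each extreme group holds three people.
records = [(95, 1), (90, 1), (88, 1), (80, 1), (75, 0),
           (70, 1), (60, 0), (55, 0), (50, 0), (40, 0)]
print(discrimination_index(records))  # 1.0
```

Here every testtaker in the upper group passed the item and none in the lower group did, so d reaches its maximum of 1.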

The quality of each alternative within a multiple-choice item can be readily assessed with reference to the comparative performance of upper and lower scorers

Analysis of item alternatives (the test developer can get an idea of the effectiveness of a distractor by means of a simple eyeball test)

Graphic representation of item difficulty and item discrimination

Item characteristic curve (the steeper the slope, the greater the item discrimination)
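
To illustrate why a steeper slope means greater discrimination, the sketch below assumes a two-parameter logistic curve, a common but not the only model for an item characteristic curve. The parameter names a (slope) and b (difficulty) and all values are illustrative assumptions, not taken from the deck.

```python
import math


def icc(theta, a, b):
    """Probability of a correct response at ability level theta.

    a: slope (discrimination) parameter; b: difficulty (location) parameter.
    Assumes a two-parameter logistic model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Compare how much the probability rises between theta = -1 and theta = +1.
for slope in (0.5, 2.0):
    rise = icc(1.0, slope, 0.0) - icc(-1.0, slope, 0.0)
    print(f"a = {slope}: rise in P(correct) = {rise:.3f}")
```

The item with the larger slope separates testtakers just below and just above its difficulty level much more sharply.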

Test developer addresses the problem of guessing by including in the test manual...

- explicit instructions regarding this point for the examiner to convey to the examinees (e.g. instruct examinees to answer only if certain)
- specific instructions for scoring and interpreting omitted items

Can be used to identify biased items

item characteristic curves

Different shapes of item-characteristic curves for different groups when 2 groups do not differ in total test score

Differential item functioning

Relies primarily on verbal rather than mathematical procedures to explore how individual test items work

Qualitative item analysis (through group discussion, interviews)

Approach to cognitive assessment that entails having respondents verbalize thoughts as they occur

Think-aloud test administration (one-on-one basis)

Conducted during the test development process in which items are examined for fairness to all prospective testtakers and for the presence of offensive language, stereotypes or situations

Sensitivity review

last step in test development

test revision

Test revision in the life cycle of an existing test

APA suggests that an existing test be kept in its present form as long as it remains useful, but that it should be revised when significant changes in the domain represented, or new conditions of test use and interpretation, make the test inappropriate for its intended use

Revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion

Cross-validation (key step in test development)

Decrease in item validities that inevitably occurs after cross-validation of findings

Validity shrinkage (is expected and integral to test development process)

Test validation conducted on two or more tests using the same sample of testtakers

Co-validation (also referred to as co-norming)

Examiners undergo training in test administration using the test manual

Quality assurance

A test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies; ensures consistency in scoring

anchor protocol

A discrepancy between scoring in an anchor protocol and the scoring of another protocol

scoring drift

Evaluate how well an individual item is working to measure different levels of the underlying construct

IRT information curves

Item functions differently in one group of testtakers as compared to another group of testtakers known to have the same level of the underlying trait (e.g. groups defined by culture, gender, or age)

Differential item functioning (DIF)

Test developers scrutinize group-by-group item response curves looking for DIF items

DIF analysis

Items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership

DIF items

An advantage of the response format of the test

Great breadth (cover many topics)

Test Development Flashcards

(60 cards)