W3 - Chapter 8 - Test Development - DN Flashcards Preview

Flashcards in W3 - Chapter 8 - Test Development - DN Deck (58):
1

anchor protocol

  • a test answer sheet
  • developed by a test publisher  
  • used to check the accuracy of examiners' scoring

p.280

2

biased test item

  • an item that favours one group in relation to another
  • when differences in group ability are controlled

p.271

3

binary-choice item

  • a multiple-choice item
  • contains only two possible responses (e.g., true-false)

p.254

4

categorical scaling

  • system of scaling
  • stimuli placed in one of two or more alternative categories that differ quantitatively with respect to some continuum

p.249

5

categorical scoring

  • a method of evaluation
  • where test responses earn credit toward placement in a particular class/category
  • sometimes testtakers must meet a set number of responses corresponding to a particular criterion to be placed in a specific category
  • also called class scoring
  • contrast with cumulative scoring & ipsative scoring 

p.260

6

ceiling effect

  • diminished utility of a tool of assessment in distinguishing testtakers at the high end of the ability, trait, or other attribute being measured

p.259, 307

7

class scoring

  • a method of evaluation
  • where test responses earn credit toward placement in a particular class/category
  • sometimes testtakers must meet a set number of responses corresponding to a particular criterion to be placed in a specific category
  • contrast with cumulative scoring & ipsative scoring

p.260

8

comparative scaling

  • in test development
  • a method of developing ordinal scales
  • through the use of a sorting task 
  • entails judging a stimulus in comparison with every other stimulus used on the test

p.249

9

completion item

  • requires an examinee to provide a word or phrase that completes a sentence

p.254

10

computerized adaptive testing (CAT)

  • an interactive, computer-administered testtaking process
  • items are presented to the testtaker based in part on the testtaker's performance on previous items

p.15, 255-256

11

co-norming

  • the test norming process conducted on two or more tests
  • using the same sample of testtakers
  • when used to validate all of the tests being normed, this process may also be referred to as co-validation

p.138n4, 278

12

constructed-response format

  • a form of test item requiring a testtaker to construct or create a response  
  • as opposed to simply selecting a response
  • contrast with selected-response format

p.252

13

co-validation

  • when co-norming is used to validate all of the tests being normed
  • this process may also be referred to as co-validation

p.278

14

cross-validation

  • a revalidation on a sample of testtakers
  • other than the testtakers on whom test performance was originally found to be a valid predictor of some criterion  

p.278

15

essay item

  • a test item that requires a testtaker to write a composition
  • typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation

p.255

16

expert panel

  • in the test development process
  • a group of people knowledgeable about the subject matter being tested and/or the population for whom the test is being designed
  • they can provide input to improve the test's content, fairness, etc.

p.274-275

17

floor effect

  • a phenomenon arising from the diminished utility of a tool of assessment in distinguishing testtakers at the low end of the ability, trait, or other attribute being measured

p.256-259

18

giveaway item

  • a test item, usually near the beginning of a test of ability or achievement
  • designed to be relatively easy
  • usually for the purpose of building the testtaker's confidence or reducing test-related anxiety

p.263n4

19

What three criteria must be met when correcting for the impact of guessing?

  1. must recognize that guesses are not normally totally random   
  2. must deal with the problem of omitted items
  3. must allow for the fact that some testtakers are luckier guessers than others

p.269-271

20

Guttman scale

  • a scale whose items range sequentially from weaker to stronger expressions of the attitude or belief being measured
  • constructed so that endorsement of a stronger statement implies agreement with all of the milder statements that precede it
  • named after its developer, Louis Guttman

p.249

21

ipsative scoring

  • approach to scoring & interpretation
  • responses & presumed strength of measured trait are interpreted relative to the measured strength of other traits for that testtaker
  • contrast with class scoring & cumulative scoring

p.260

22

item analysis

  • general term used to describe various procedures
  • usually statistical, designed to explore how individual items work compared to others in the test & in the context of the whole test
    • e.g., to explore the level of difficulty of individual items on an achievement test 
    • e.g., to explore the reliability of a personality test
  • contrast with qualitative item analysis 

p.262-275

23

item bank

  • a collection of questions to be used in the construction of a test

p.255, 257-259, 282-284

24

item branching

  • in computerised adaptive testing (CAT)
  • the individualised presentation of test items drawn from an item bank based on the testtaker's previous responses

p.260
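
As a rough illustration of the branching idea on the card above, here is a minimal Python sketch in which the next item drawn from a small bank depends on whether the previous response was correct; the bank contents and the one-step rule are illustrative assumptions, not the textbook's algorithm.

    # Illustrative item bank ordered from easiest to hardest (difficulties are made up).
    item_bank = [
        {"id": 1, "difficulty": 0.20},
        {"id": 2, "difficulty": 0.35},
        {"id": 3, "difficulty": 0.50},
        {"id": 4, "difficulty": 0.65},
        {"id": 5, "difficulty": 0.80},
    ]

    def next_item_index(current_index, last_answer_correct):
        # Move to a harder item after a correct response, an easier one after a miss.
        step = 1 if last_answer_correct else -1
        return max(0, min(len(item_bank) - 1, current_index + step))

    idx = 2                                               # start mid-bank
    idx = next_item_index(idx, last_answer_correct=True)
    print(item_bank[idx])                                 # -> {'id': 4, 'difficulty': 0.65}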

25

item-characteristic curve (ICC)

  • graphic representation of the probabilistic relationship between a person's level of the trait (ability or characteristic) being measured and the probability of responding to an item in a predicted way
  • also known as a category response curve or an item trace line

p.177, 268, 281

26

item-difficulty index

  • items cannot be too easy or too hard if they are to differentiate among testtakers' knowledge of the subject matter
  • a statistic obtained by calculating the proportion of the total number of testtakers who answered an item correctly
    • p is used to denote item difficulty
    • a numeric subscript refers to the item number (e.g., p1 for item 1)
  • can range from 0 to 1
    • the larger the item-difficulty index, the easier the item
    • (i.e., the higher the p, the easier the item, because p represents the proportion of people passing the item)

p.263-264
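
A minimal Python sketch of the proportion-correct calculation on the card above; the responses are illustrative, coded 1 for correct and 0 for incorrect.

    responses_item1 = [1, 1, 0, 1, 0, 1, 1, 1]        # 8 testtakers' answers to item 1

    p1 = sum(responses_item1) / len(responses_item1)  # item-difficulty index for item 1
    print(p1)                                         # 0.75 -> a fairly easy item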

27

item-discrimination index

  • a measure of how well an item discriminates between high scorers and low scorers on the test as a whole
  • symbolised by d

p.264-268
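
One common way of computing d (assumed here, since the card itself gives no formula) compares equal-sized upper- and lower-scoring groups on a single item; the counts below are illustrative.

    upper_correct = 9    # U: upper-group testtakers who answered the item correctly
    lower_correct = 3    # L: lower-group testtakers who answered the item correctly
    n = 10               # number of testtakers in each group

    d = (upper_correct - lower_correct) / n
    print(d)             # 0.6 -> the item discriminates in the expected direction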

28

item-endorsement index

  • the name given to the item-difficulty index (used in achievement testing) when it is used in other contexts (e.g., personality testing)

p.263

29

item fairness

  • a reference to the degree of bias, if any, in a test item

p.271-272

30

item format

  • a reference to the form, plan, structure, arrangement, or layout of individual test items
  • including whether the test items require testtakers to select or create a response

p.252-255

31

item pool

  • the reservoir or well from which items will or will not be drawn for the final version of the test
  • the collection of items to be further evaluated for possible selection for use in an item bank

p.251

32

item-reliability index

  • provides an indication of the internal consistency of a test
  • the higher the index, the greater the internal consistency
  • index is equal to
    • the product of the item-score standard deviation (s) and
    • the correlation (r) between the item score and the total test score

p.264
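
A minimal Python sketch of the product described on the card above (item-score standard deviation times the item-total correlation); the scores are illustrative, and statistics.correlation requires Python 3.10+.

    import statistics

    item_scores  = [1, 0, 1, 1, 0, 1, 0, 1]                 # item scored 1/0 per testtaker
    total_scores = [38, 22, 35, 40, 25, 33, 28, 37]         # total test score per testtaker

    s = statistics.pstdev(item_scores)                      # item-score standard deviation
    r = statistics.correlation(item_scores, total_scores)   # item-total correlation
    item_reliability_index = s * r
    print(round(item_reliability_index, 3))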

33

item-validity index

  • a statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure
  • important when a test developer's goal is to maximise the criterion-related validity of a test
  • the higher the item-validity index, the greater the test's criterion-related validity
  • to calculate we must first know
    • the item-score standard deviation (symbolised as s1, s2, s3 etc.)
    • and the correlation between the item score and the criterion score 
  • then we use the item-difficulty index p1 in the following formula
    • s1 = √(p1(1 − p1))
  • the correlation between the score on item 1 and the score on a criterion measure (r1c) is multiplied by item 1's item-score standard deviation (s1)
    • the product is an index of an item's validity (s1 × r1c)

p.264
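
A minimal Python sketch of the calculation on the card above, with illustrative values for p1 and r1c.

    import math

    p1  = 0.7    # item-difficulty index for item 1
    r1c = 0.4    # correlation between item 1 score and the criterion score

    s1 = math.sqrt(p1 * (1 - p1))          # item-score standard deviation
    item_validity_index = s1 * r1c
    print(round(item_validity_index, 3))   # ~0.183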

34

Likert scale

  • summative rating scale with 5 alternative responses
    • ranging on a continuum from e.g., "strongly agree" to "strongly disagree"

p.247

35

matching item

  • the testtaker is presented with two columns
  • premises on the left & responses on the right
  • task is to determine which response is best matched to which premise
    • young testtakers may be asked to draw a line
    • others are typically asked to write a letter or number as a response

p.253

36

method of paired comparisons

  • a scaling method
  • the testtaker is presented with pairs of stimuli (e.g., photos)
  • and asked to select one of them according to a rule
    • (e.g., "select the one that is more appealing")

p.248

37

multiple-choice format

  • one of the three types of selected-response item formats
  • three elements
    1. a stem
    2. a correct alternative or option
    3. several incorrect alternatives (referred to as distractors or foils)

p.252

38

pilot work

  • also referred to as pilot study & pilot research
  • preliminary research surrounding the creation of a prototype test
  • general objective is to determine how best to
    • gauge
    • assess, or
    • evaluate the targeted construct(s)

p.243-244

39

qualitative item analysis

  • non-statistical procedures designed to explore how individual test items work
  • both compared to other items in the test & in the context of the whole test
  • unlike statistical measures, they involve exploration of the issues by verbal means
    • (e.g., interviews & group discussions with testtakers & other relevant parties)

p.272-275

40

qualitative methods

  • techniques of data generation & analysis
  • rely primarily on verbal rather than mathematical or statistical procedures

p.272

41

rating scale

  • a system of ordered numerical or verbal descriptors
  • used to make judgements about the presence, absence, or magnitude of a particular trait, attitude, emotion, or other variable

p.205, 247, 371

42

scaling

  • 1) in test construction
    • the process of setting rules for assigning numbers in measurement
  • 2) the process by which a measuring device
    • is designed and calibrated &
    • the way numbers (or other indices) are assigned to different amounts of a trait, attribute, or characteristic being measured

p.244-251

43

scalogram analysis

  • an item-analysis procedure
  • entails graphic mapping of a testtaker's responses

p.250

44

scoring drift

  • a discrepancy between the scoring in an anchor protocol and the scoring of another protocol

p.280

45

selected-response format

  • a form of test item
  • requiring testtakers to select a response
    • (e.g., true/false, multiple choice, and matching items)
    • as opposed to creating one
  • contrast with constructed-response format

p.252

46

sensitivity review

  • a study of test items
  • usually during test development
  • items are examined for fairness to all prospective testtakers
    • for the presence of offensive language, stereotypes, or situations

p.274

47

short-answer item

  • may also be referred to as a completion item
  • a word, term, sentence or a paragraph may qualify
    • anything beyond this is an essay item

p.254

48

summative scale

  • an index derived from the summing of selected scores on a test or sub-test

p.247

49

test conceptualization

  • an early stage of the test development process
  • when an idea for a particular test or test revision is conceived

p.240, 241-244

50

test construction

  • a stage in the process of test development
  • entails writing test items (or rewriting/revising existing items)
  • as well as formatting items, setting scoring rules, and otherwise designing and building a test

p.240

51

test development

  • an umbrella term for all that goes into the process of creating a test

p.240-284

52

test revision

  • action taken to modify a test's content or format
  • for the purpose of improving the test's effectiveness as a tool of measurement

p.240

53

test tryout

  • a stage in the process of test development that entails administering a preliminary version of a test to a representative sample of testtakers
  • under conditions that simulate the conditions under which the final version of the test will be administered

p.240, 261-262

54

"think aloud" test administration

  • a method of qualitative item analysis
  • examinees verbalize their thoughts as they take the test
  • useful in understanding how
    • individual items function in a test
    • testtakers interpret or misinterpret the meaning of the individual items

p.274

55

true-false item

  • a binary-choice item
    • i.e., contains only two possible responses
  • requires testtaker to indicate whether a statement is or is not a fact

p.254

56

validity shrinkage

  • the decrease in item validities that inevitably occurs after cross-validation

p.278

57

What is the optimal item difficulty?

  • usually the midpoint between 1.00 and the probability of answering correctly by guessing
    • which is called the chance success proportion
    • e.g., a true-false item (chance success proportion = .50): (.50 + 1.00) / 2 = .75
    • e.g., a five-option multiple-choice item (chance success proportion = .20): (.20 + 1.00) / 2 = .60

p.263
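
A minimal Python sketch of the midpoint rule on the card above; the chance success proportion is taken as 1 divided by the number of response options.

    def optimal_difficulty(num_options):
        chance = 1 / num_options                 # chance success proportion
        return (chance + 1.00) / 2               # midpoint between chance and 1.00

    print(optimal_difficulty(2))   # true-false item: (.50 + 1.00) / 2 = 0.75
    print(optimal_difficulty(5))   # five-option multiple choice: (.20 + 1.00) / 2 = 0.60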

58

How can you create a visual representation of the best items on a test

(i.e., if the objective is to maximise criterion-related validity)?

  • this can be achieved by plotting each item's
    • item-validity index and
    • item-reliability index

p.265 

Fig 8-5
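
A minimal Python sketch (using matplotlib) of the kind of scatter plot the card describes, with each item plotted by its item-validity and item-reliability indices; all values are illustrative, not taken from Figure 8-5.

    import matplotlib.pyplot as plt

    validity_index    = [0.10, 0.18, 0.05, 0.22, 0.15]   # illustrative per-item values
    reliability_index = [0.30, 0.42, 0.12, 0.38, 0.25]

    fig, ax = plt.subplots()
    ax.scatter(validity_index, reliability_index)
    for i, (x, y) in enumerate(zip(validity_index, reliability_index), start=1):
        ax.annotate(f"item {i}", (x, y))
    ax.set_xlabel("item-validity index")
    ax.set_ylabel("item-reliability index")
    plt.show()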