Chapter 7 - 8 Flashcards

1
Q

usefulness or practical value of testing to improve efficiency

A

Utility

2
Q

used to refer to the usefulness or practical value of a training program or
intervention

A

Utility

3
Q

Factors that affect a test’s utility

A
  1. Psychometric Soundness
  2. Cost
  3. Benefits
4
Q

Gives us the practical value of both the scores (reliability
and validity)

A

Psychometric Soundness

5
Q

They tell us whether decisions are cost-effective

A

Psychometric Soundness

6
Q

A test must be valid to be useful, but a valid test is not always a useful test, especially if testtakers do not follow test directions

A

True

7
Q

It refers to disadvantages, losses or expenses in both economic and noneconomic terms

A

Cost

8
Q

It refers to profits, gains or advantages

A

Benefit

9
Q

It is a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment

A

Utility Analysis

10
Q

Provides an indication of the likelihood that a testtaker will score within some interval of scores on a criterion measure – an interval may be categorized as “passing”, “acceptable” or “failing”

A

Expectancy Table/Chart

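The expectancy idea can be sketched in a few lines of code. This is a minimal illustration — the function name, score bands, and data are invented for the example, not taken from any published table:

```python
def expectancy_table(scores, outcomes, bands):
    """For each half-open score band [low, high), compute the proportion of
    testtakers in that band who succeeded on the criterion (outcome == 1)."""
    table = {}
    for low, high in bands:
        in_band = [o for s, o in zip(scores, outcomes) if low <= s < high]
        table[(low, high)] = sum(in_band) / len(in_band) if in_band else None
    return table
```

Each resulting proportion is read as "the likelihood that someone scoring in this interval will pass on the criterion."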
11
Q

estimate of the percentage of employees hired by a particular test who will be successful at their jobs

A

Taylor-Russell Tables

12
Q

used for obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures

A

Naylor-Shine Tables

13
Q

A formula used to calculate the dollar amount of a utility gain resulting from the
use of a particular selection instrument under specified conditions

A

Brogden-Cronbach-Gleser Formula

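One common textbook form of the formula is ΔU = N · T · r_xy · SD_y · z̄_m minus the cost of testing all applicants. A hedged sketch, with all parameter names and the example figures invented for illustration:

```python
def bcg_utility_gain(n_selected, tenure_years, validity, sd_y,
                     mean_z_selected, cost_per_test, n_tested):
    """Brogden-Cronbach-Gleser utility estimate (one common form):
    gain = N * T * r_xy * SD_y * mean standardized score of those selected,
    minus the cost of testing every applicant."""
    benefit = n_selected * tenure_years * validity * sd_y * mean_z_selected
    return benefit - n_tested * cost_per_test
```

For example, hiring 10 people for 2 years with a validity of .40, SD_y of $10,000, mean selected z of 1.0, and a $25 test given to 200 applicants yields a gain of $75,000.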
14
Q

an estimate of the benefit (monetary/otherwise) of using a particular
test or selection method

A

Utility gain

15
Q

a body of methods used to quantitatively evaluate selection procedures,
diagnostic classifications, therapeutic interventions or other assessment or
intervention-related procedures in terms of how optimal they are (most typically
from a cost-benefit perspective)

A

Decision Theory

16
Q

a correct classification

A

hit

17
Q

a qualified driver is hired; an unqualified driver is not hired

A

It is a hit

18
Q

an incorrect classification; a mistake

A

miss

19
Q

a qualified driver is not hired; an unqualified driver is hired

A

It is a miss

20
Q

the proportion of people that an assessment tool accurately identified
as possessing a particular variable

A

hit rate

21
Q

the proportion of qualified drivers with a passing score who actually
gain permanent employee status; the proportion of unqualified drivers with a
failing score who did not gain permanent status

A

This is a hit rate

22
Q

the proportion of people that an assessment tool inaccurately identified
as possessing a particular variable

A

miss rate

23
Q

the proportion of drivers who were inaccurately predicted to be qualified;
the proportion of drivers who were inaccurately predicted to be unqualified

A

this is a miss rate

24
Q

falsely indicates that the testtaker possesses a particular variable; example: a driver who is hired is not qualified

A

false positive

25
Q

falsely indicates that the testtaker does not possess a particular variable; the assessment tool says to not hire but driver would have been rated as qualified

A

false negative

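The hit/miss terminology above maps directly onto a confusion matrix. A small sketch (the function name and driver counts are invented for the example):

```python
def classification_rates(tp, tn, fp, fn):
    """Hit = correct classification (true positive + true negative);
    miss = incorrect classification (false positive + false negative).
    All rates are expressed as proportions of everyone classified."""
    total = tp + tn + fp + fn
    return {
        "hit_rate": (tp + tn) / total,
        "miss_rate": (fp + fn) / total,
        "false_positive_share": fp / total,
        "false_negative_share": fn / total,
    }
```

So if 70 qualified drivers are hired, 20 unqualified drivers are rejected, 6 unqualified drivers are hired, and 4 qualified drivers are rejected, the hit rate is .90 and the miss rate .10.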
26
Q

Some practical considerations

A

The Pool of Job Applicants
The Complexity of the Job
The Cut Score in Use

27
Q

a (usually numerical) reference point derived as a result of a judgment and used to divide a set of data into two or more classifications, with some action to be taken
or some inference to be made on the basis of these classifications

A

Cut Score/Cutoff Score

28
Q

dictate what sort of information will be required as well as the
specific methods to be used

A

objective of utility analysis

29
Q

Used to measure costs vs. benefits

A

Expectancy Data

30
Q
  • Based on norm-related considerations rather
    than on the relationship of test scores to a
    criterion
  • Also called norm-referenced cut score
  • Ex.) top 10% of test scores get A’s
  • normative
A

Relative cut score

31
Q
  • set with reference to a judgment concerning a minimum level of proficiency required to be included in a particular classification.
  • Also called absolute cut score
  • criterion
A

Fixed cut score

32
Q

using two or more cut scores with reference to one predictor for the purpose of categorizing
testtakers

A

Multiple cut scores

33
Q

Ex.) having cut scores that mark an A, B, C, etc.,
all measuring the same predictor

A

Multiple cut scores

34
Q

the achievement of a particular cut score on one test is necessary in order to
advance to the next stage of evaluation in the selection process

A

Multiple-stage or Multi Hurdle

35
Q

written application → group interview → personal interview

A

Multiple-stage or Multi Hurdle

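The multiple-hurdle process above can be sketched as a loop that stops at the first failed cut score. This is a minimal illustration; the function name and scores are invented:

```python
def hurdles_cleared(stage_scores, stage_cuts):
    """Multiple-stage (multiple-hurdle) selection: the candidate must meet the
    cut score at each stage, in order, to advance to the next stage.
    Returns the number of hurdles cleared before the first failure."""
    for cleared, (score, cut) in enumerate(zip(stage_scores, stage_cuts)):
        if score < cut:
            return cleared
    return len(stage_cuts)  # cleared every hurdle
```

A candidate who fails the second stage never reaches the third, no matter how strong their later-stage scores would have been — the key contrast with the compensatory model.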
36
Q

assumption is made that high scores on one attribute can compensate for low scores on another attribute

A

Compensatory model of selection

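The compensatory model is typically implemented as a weighted sum, so a strong attribute can offset a weak one. A sketch with invented weights:

```python
def compensatory_score(scores, weights):
    """Compensatory selection: a weighted sum of attribute scores, so a high
    score on one attribute can compensate for a low score on another."""
    return sum(w * s for w, s in zip(weights, scores))
```

With equal weights of 0.5, a candidate scoring (90, 50) and a candidate scoring (70, 70) earn the same composite — exactly the compensation the model assumes.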
37
Q

Who devised Angoff method?

A

William Angoff

39
Q

a way to set fixed cut scores that entails averaging the judgments of experts; must have high inter-rater reliability

A

Angoff Method

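In the Angoff method, each expert judges, for every item, the probability that a minimally competent testtaker would answer it correctly; the judgments are averaged and summed into a cut score. A minimal sketch (data and names invented for the example):

```python
def angoff_cut_score(ratings):
    """ratings[e][i] = expert e's judged probability that a minimally competent
    testtaker answers item i correctly.  The cut score is the sum, over items,
    of the probabilities averaged across experts."""
    n_experts = len(ratings)
    n_items = len(ratings[0])
    item_means = [sum(expert[i] for expert in ratings) / n_experts
                  for i in range(n_items)]
    return sum(item_means)
```

In practice the method is only trusted when the experts' ratings show high inter-rater reliability, as the card notes.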
40
Q

a system of collecting data on a predictor of interest from groups known to
possess (and not to possess) a trait, attribute or ability of interest

A

Known Groups Method/Method of Contrasting Groups

42
Q

a cut score is set on the test that best discriminates the high performers from the low
performers

A

Known Groups Method/Method of Contrasting Groups

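One simple way to set such a cut score — a sketch, not the only convention — is to place it midway between the two group means; in practice the cut is often placed where the two groups' score distributions intersect. Function name and scores are invented:

```python
def contrasting_groups_cut(known_group_scores, contrast_group_scores):
    """Set a cut score midway between the mean of the group known to possess
    the trait and the mean of the group known not to possess it.
    A simplification: real applications examine the full distributions."""
    known_mean = sum(known_group_scores) / len(known_group_scores)
    contrast_mean = sum(contrast_group_scores) / len(contrast_group_scores)
    return (known_mean + contrast_mean) / 2
```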
43
Q

In order to “pass” the test, the testtaker must answer items that are at or above some minimum level of difficulty, which is determined by experts and serves as the cut score

A

Item Response Theory (IRT)-Based Methods

44
Q
  • Based on testtaker’s performance across all items on a test
  • Some portion of test items must be correct
A

IRT Based Method

45
Q

a technique for identifying cut scores based on the number of positions to be
filled

A

Method of Predictive Yield

46
Q

a family of statistical techniques used to shed light on the relationship between certain variables and two or more naturally occurring groups

A

Discriminant Analysis

47
Q

determining the difficulty level reflected by the cut score

A

Item mapping method

48
Q

test items are listed, one per page, in ascending level of
difficulty. An expert places a bookmark to mark the divide that separates testtakers who have acquired the minimal knowledge, skills, or abilities from those who have not. Problems include the training of experts, possible floor and ceiling effects, and the optimal length of item booklets

A

Bookmark-method

49
Q

Steps in Test Development

A
  1. TEST CONCEPTUALIZATION
  2. TEST CONSTRUCTION
  3. TEST TRYOUT
  4. ITEM ANALYSIS
  5. TEST REVISION
50
Q

Conception of idea by the test developer

A

Test Conceptualization

51
Q

An emerging social phenomenon or pattern of behavior might serve
as the stimulus for the development of a new test.

A

Test Conceptualization

52
Q

An item that high scorers on the test answer correctly and low scorers answer incorrectly

A

Norm-referenced conceptualization

53
Q

The conceptualization focuses on the construct that needs to be mastered

A

Criterion-referenced conceptualization

54
Q

high scorers on the test get a particular item right whereas low scorers on the test get that same item wrong.

A

Criterion-referenced conceptualization

55
Q

prototype of the test; necessary for research purposes but not required for
teacher-made tests

A

Pilot work

56
Q

To know whether some items should be included in the final form of the instrument

A

Pilot work

57
Q

the test developer typically attempts to determine how
best to measure a targeted construct

A

Pilot work

58
Q

process of setting rules for assigning numbers in
measurement.

A

Scaling

59
Q

credited for being at the forefront of efforts to develop methodologically sound scaling methods

A

L. L. Thurstone

60
Q

Stanine scale

A

Raw scores converted to a scale ranging from 1 to 9

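Assuming the usual convention that stanines have a mean of 5 and a standard deviation of 2, a z score can be converted like this (the function name is invented for the example):

```python
def to_stanine(z):
    """Convert a z score to a stanine: mean 5, SD 2, clipped to the 1-9 range."""
    return max(1, min(9, round(2 * z + 5)))
```

So an average performance (z = 0) maps to stanine 5, and extreme scores are clipped at 1 and 9.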
61
Q

measuring one construct

A

Unidimensional Scale

62
Q

measuring more than one construct

A

Multidimensional Scale

63
Q

entails judgments of a stimulus in comparison with every other stimulus on the scale (best to worst)

A

Comparative Scaling

64
Q

stimuli are placed into one of two or more alternative categories that differ quantitatively with
respect to some continuum (section 1, section 2, section 3)

A

Categorical Scaling

65
Q

A grouping of words,
statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker

A

Rating Scale

66
Q

when final score is obtained by summing the ratings across all the items

A

Summative Scale

67
Q

a type of summative rating scale wherein each item presents the testtaker with five alternative responses, usually on an agree-disagree or approve-disapprove continuum. It is ordinal in nature

A

Likert Scale

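Summative (Likert) scoring is just the sum of the item ratings, with reverse-keyed items flipped so a higher total always means more of the trait. A sketch — the parameter names and the reverse-keying convention shown are assumptions for the example:

```python
def likert_total(responses, reverse_keyed=(), n_points=5):
    """Summative scale scoring: sum the ratings across all items.
    Reverse-keyed items (by index) are flipped: r -> n_points + 1 - r."""
    total = 0
    for i, r in enumerate(responses):
        total += (n_points + 1 - r) if i in reverse_keyed else r
    return total
```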
68
Q

scaling method whereby one of a pair of stimuli (such
as photos) is selected according to a rule (such as –
“select the one that is more appealing”)

A

Paired Comparison

69
Q

presented with two stimuli and asked to compare

A

Paired comparison

70
Q

judging of a stimulus in comparison with every
other stimulus on the scale

A

Comparative Scaling

71
Q

testtaker places stimuli into a category; those categories differ quantitatively on a spectrum

A

Categorical Scaling

72
Q

items range from sequentially weaker to stronger expressions of attitude, belief, or feeling. A
testtaker who agrees with the stronger statement is assumed to also agree with the milder statements

A

Guttman Scale/Scalogram Analysis

73
Q

a scale wherein items range
sequentially from weaker to stronger expressions of the
attitude or belief being measured

A

Guttman Scale/Scalogram Analysis

74
Q

Developer of Guttman Scale/Scalogram Analysis

A

Louis Guttman

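A perfect Guttman scale implies a telltale response pattern: with items ordered weakest to strongest, every endorsement of a stronger statement must be accompanied by endorsement of all milder ones (1s followed by 0s). A sketch of that consistency check, with names invented for the example:

```python
def is_guttman_consistent(responses):
    """responses: 0/1 endorsements for items ordered weakest -> strongest.
    Consistent iff endorsing any statement implies endorsing every milder one,
    i.e. the pattern is some run of 1s followed by a run of 0s."""
    # For each adjacent pair, the stronger item may be endorsed
    # only if the milder one is too.
    return all(milder or not stronger
               for stronger, milder in zip(responses[1:], responses))
```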
75
Q

a direct estimation method, because there is no need to transform the
testtaker’s responses to another scale. It is presumed to be interval in nature

A

Thurstone’s Equal Appearing Intervals Method

76
Q

When devising a standardized test using a multiple-choice format, it is usually advisable that the first draft contains approximately ______ the number of items that the final version of the test will contain

A

twice

77
Q

What to consider in writing items

A
  • range of content that the items should cover
  • which item format should be employed
  • how many items should be written in total and for each content area covered
78
Q

reservoir from which items will or will not be drawn for the final version of the test

A

Item pool

79
Q

Item pool should be about _____ the number of questions as final will have

A

double

80
Q

variables such as the form, plan, structure, arrangement and layout of individual test items

A

Item format

81
Q

the collection of items to be further evaluated for possible selection for use in an item bank

A

Item pool

82
Q

testtaker selects a response from a set of alternative responses

A

Selected-Response Format

83
Q

What type of item format is multiple choice, true-false, and matching

A

Selected-Response Format

84
Q

testtaker supplies or creates
the correct answer

A

Constructed-Response Format

85
Q

Item format that includes completion item, short answer and essay

A

constructed-response format

86
Q

a relatively large and easily accessible collection of test questions

A

item bank

87
Q

interactive, computer-administered testtaking process wherein items presented to the testtaker are based in part on testtaker’s performance on
previous items.

A

Computerized Adaptive Testing (CAT)

88
Q

the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured

A

floor effect

89
Q

diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, attribute being measured

A

ceiling effect

90
Q

ability of computer to tailor the content and order of presentation of test items on the basis of responses to previous items

A

item branching

91
Q

testtakers earn cumulative credit with regard to a particular construct

A

cumulative scoring

92
Q

testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way

A

class/category scoring

93
Q

comparing a testtaker’s score on one scale within a test to
another scale within that same test

A

ipsative scoring

94
Q

John’s need for achievement is higher than his need for affiliation

A

ipsative scoring

95
Q

offers two alternatives for each item

A

dichotomous format

96
Q

resembles the dichotomous format except that each item has more than two alternatives

A

polytomous format

97
Q

incorrect choices in multiple choice

A

distractors

98
Q

describes the chances that a low-ability test taker will obtain each score

A

guessing threshold

99
Q

uses more choices than Likert; 10-point rating scale

A

category format

100
Q

respondent is given a 100-millimeter line and asked to place a mark between two well-defined endpoints. It is often used to measure self-rated health

A

Visual analogue scale

101
Q

subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself

A

adjective checklist

102
Q

Obtained by calculating the proportion of the total number of testtakers who answered the item correctly (p)

A

Item-Difficulty Index

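The item-difficulty index is a one-liner: the proportion of testtakers getting the item right. A sketch (function name invented for the example):

```python
def item_difficulty(item_responses):
    """p = proportion of testtakers answering the item correctly (1 = correct,
    0 = incorrect).  Higher p means an easier item."""
    return sum(item_responses) / len(item_responses)
```

If 3 of 4 testtakers answer correctly, p = .75 — an easy item.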
103
Q

Higher p indicates

A

easier items

104
Q

Difficulty can be replaced with _________________in non-achievement tests

A

endorsement

105
Q
  • Indication of the internal consistency of a test
  • Equal to the product of the item-score standard deviation (s) and the
    correlation (r)
  • Factor analysis and inter-item consistency
A

Item-Reliability Index

106
Q

Statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure. It requires: item-score standard deviation, the correlation between the
item score and criterion score

A

Item-Validity Index

107
Q

means that a greater proportion of high scorers than low scorers answered the item correctly

A

higher d

108
Q

means low-scoring examinees are more likely to answer the item correctly than high-scoring examinees

A

negative d

109
Q

compares performance on a particular item with performance in the upper and lower regions of a distribution of continuous test scores

A

Item-Discrimination Index

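The discrimination index d is the proportion correct in the upper-scoring group minus the proportion correct in the lower-scoring group, which also shows why a negative d signals a problem item. A sketch with invented names and data:

```python
def item_discrimination(upper_correct, lower_correct):
    """d = p(upper) - p(lower): proportion of the high-scoring group answering
    the item correctly minus the proportion of the low-scoring group doing so.
    Negative d means low scorers outperform high scorers on the item."""
    p_upper = sum(upper_correct) / len(upper_correct)
    p_lower = sum(lower_correct) / len(lower_correct)
    return p_upper - p_lower
```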
110
Q

Graphic representation of item difficulty and discrimination

A

Item-Characteristic Curves

111
Q

techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures

A

Qualitative method

112
Q

various nonstatistical procedures designed to explore how individual test items work

A

Qualitative item analysis

113
Q
  • approach to cognitive assessment that entails respondents vocalizing thoughts as they occur
  • used to shed light on the testtaker’s thought processes during the administration of a test
A

“Think aloud” test administration

114
Q

study of test items in which they are examined for fairness to all prospective testtakers as well as for the presence of offensive language, stereotypes, or
situations

A

Sensitivity Review

115
Q

Find the correlation between performance on the item and performance on the total test

A

The Point Biserial Method

116
Q

Correlation between a dichotomous variable and a continuous variable

A

point biserial correlation

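One common computational form of the point-biserial correlation is r_pb = (M1 − M0)/s_x · √(pq), where M1 and M0 are the mean total scores of those who passed and failed the item, s_x is the standard deviation of total scores, and p is the proportion passing. A sketch (using the population SD; names and data invented for the example):

```python
import math

def point_biserial(item, totals):
    """Correlation between a dichotomous item (0/1) and continuous total scores:
    r_pb = (M1 - M0) / s_x * sqrt(p * q), with s_x the population SD of totals."""
    n = len(totals)
    mean = sum(totals) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)
    ones = [t for x, t in zip(item, totals) if x == 1]
    zeros = [t for x, t in zip(item, totals) if x == 0]
    p = len(ones) / n
    m1 = sum(ones) / len(ones)
    m0 = sum(zeros) / len(zeros)
    return (m1 - m0) / sd * math.sqrt(p * (1 - p))
```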
117
Q

revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion

A

Cross-validation

118
Q

decrease in item validities that inevitably occurs after cross-validation of findings

A

Validity Shrinkage

119
Q

test validation process conducted on two or
more tests using the same sample of testtakers

A

Co-validation

120
Q

when co-validation is used in conjunction with the creation of norms or the revision of existing norms

A

Co-norming

121
Q

test protocol scored by a
highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies

A

anchor protocol

122
Q

a discrepancy between scoring in an anchor protocol and the scoring of another protocol

A

scoring drift

123
Q

phenomenon wherein an item functions differently in one group of testtakers as compared to another group of testtakers known to have the same level of the underlying trait

A

Differential item functioning (DIF)

124
Q

(level of difficulty) optimal average item difficulty (whole test)

A

0.5

125
Q

(level of difficulty) average item difficulty on individual items

A

0.3 to 0.8

126
Q

(level of difficulty) true or false

A

0.75

127
Q

(level of difficulty) multiple choice (4 choices)

A

0.625