Test Development Flashcards

(63 cards)

1
Q

what is an umbrella term for all that goes into the process of creating a test?

A

test development

2
Q

what are the 5 stages of developing a test?

A
  1. Test Conceptualization
  2. Test Construction
  3. Test Tryout
  4. Item Analysis
  5. Test Revision
3
Q

this stage of test development involves statistical procedures employed to assist in making judgments about which items are good as they are, which items need to be revised, and which items should be discarded

A

item analysis

4
Q

this stage of test development entails writing test items (or rewriting or revising existing items), as well as formatting items, setting scoring rules, and otherwise designing and building a test

A

Test Construction

5
Q

in this type of test, items should measure whether test-takers meet specific criteria, regardless of their position relative to others.

Success is defined by meeting set criteria, not by ranking.

A

criterion-referenced tests

6
Q

this stage of test development refers to action taken to modify a test’s content or format for the purpose of improving the test’s effectiveness as a tool of measurement

A

test revision

7
Q

in this type of test, items are deemed “good” if high scorers answer correctly and low scorers answer incorrectly.

A

norm-referenced tests

8
Q

this term refers to the preliminary research and testing around the creation of a test prototype

A

pilot work

9
Q

what is the purpose of pilot work/studies?

A

pilot studies help evaluate the potential test items to determine their suitability for the final version of the test

9
Q

this term refers to the process of setting rules for assigning numbers or indices to measure different amounts of a trait, attribute, or characteristic.

A

scaling

10
Q

what are the different types of scaling methods?

A
  1. Rating Scale
  2. Likert Scale
  3. Method of Paired Comparisons
  4. Comparative Scaling
  5. Categorical Scale
  6. Guttman Scale
  7. Thurstone Scale
10
Q

this type of scale is a summative scale where test-takers rate the strength of a trait, attitude, or emotion

A

rating scale

11
Q

this type of scale is commonly used in psychology to measure attitudes; it provides options on a continuum by asking respondents to rate their agreement with a statement

A

likert scale

12
Q

what does a unidimensional rating mean?

A

the scale only measures one underlying dimension

13
Q

what does a multidimensional rating mean?

A

the scale measures multiple dimensions

14
Q

this type of scale produces ordinal data by comparing stimuli

it presents two items at a time, asking respondents to choose one based on a specific criterion

A

method of paired comparisons/paired comparison scale

15
Q

this type of scale involves judgement of stimuli in relation to others on the scale

rating an item relative to a benchmark or another item on the scale

A

comparative scaling

15
Q

this term refers to the collection of potential test items that will be refined for the final test

A

Item Pool

16
Q

this type of item-format requires choosing an answer from given options
(e.g., multiple-choice, true-false)

A

selected-response format

16
Q

what are the two types of item formats?

A
  1. Selected-Response Format
  2. Constructed-Response Format
17
Q

this type of item includes a “stem,” a correct option, and distractors.

A

multiple-choice items

18
Q

this type of item offers two possible responses, such as true/false

A

binary-choice item

18
Q

this type of item involves matching premises with correct responses

A

matching items

19
Q

this term refers to interactive testing where item selection depends on previous answers, reducing floor and ceiling effects

A

computerized adaptive testing (CAT)

20
this type of item format requires creating or supplying an answer
constructed-response format
21
this term refers to a large, accessible database of questions for computerized tests
item bank
22
this effect limits distinguishing low-ability test-takers
floor effect
23
this term refers to tailoring item content and order based on previous responses.
item branching
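As a rough illustration of item branching, the sketch below moves one step up a difficulty-ordered bank after a correct answer and one step down after an incorrect one. The up/down rule, the function names, and the difficulty labels are all assumptions made for this example; operational adaptive tests typically use more elaborate selection rules.

```python
def next_item_index(current, was_correct, bank_size):
    """Step to a harder item after a correct answer, an easier one otherwise.

    The index is clamped so it never leaves the bank (hypothetical rule).
    """
    step = 1 if was_correct else -1
    return min(max(current + step, 0), bank_size - 1)


def run_branching(bank, responses, start):
    """Return the difficulty levels administered for a sequence of responses.

    bank is ordered from easiest to hardest; responses is a list of booleans.
    """
    idx = start
    administered = []
    for was_correct in responses:
        administered.append(bank[idx])
        idx = next_item_index(idx, was_correct, len(bank))
    return administered


# Hypothetical bank: labels stand in for items at each difficulty level.
bank = ["very easy", "easy", "medium", "hard", "very hard"]
path = run_branching(bank, [True, True, False, True], start=2)
print(path)
```

Because the index is clamped at both ends, a test-taker who keeps answering correctly stays at the hardest level rather than running off the end of the bank.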
23
this effect limits distinguishing high-ability test-takers.
ceiling effect
23
in this scoring model, higher scores indicate higher ability or a greater presence of the trait being measured.
cumulative model
24
what does discriminative ability refer to?
it refers to how effectively a test item differentiates between high and low scorers; on a good item, high scorers are likely to answer correctly or as expected
24
this tool of item analysis measures the proportion of test-takers who answered an item correctly, denoted as p (e.g., p1 for item 1). A higher p indicates an easier item
item difficulty index
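The item difficulty index described above can be computed directly as a proportion. This is a minimal sketch; the function name and the response data are made up for illustration.

```python
def item_difficulty(responses):
    """Proportion of test-takers who answered the item correctly (p)."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(1 for r in responses if r) / len(responses)


# Hypothetical data: 1 = correct, 0 = incorrect, one entry per test-taker.
item_1 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
p1 = item_difficulty(item_1)
print(f"p1 = {p1:.2f}")  # 7 of 10 correct, so p1 = 0.70
```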
24
this type of scoring compares scores within different scales of the same test, focusing on internal comparisons rather than across individuals
ipsative scoring
25
this type of scoring involves responses that assign test-takers to specific classes or categories
class scoring (category scoring)
25
what is a needed characteristic of a good item?
Discriminative Ability
25
what are the tools for item analysis?
  1. Item Difficulty Index
  2. Item Reliability Index
  3. Item Validity Index
  4. Item Discrimination Index
26
this tool of item analysis indicates the internal consistency of a test. It is calculated as the product of the item-score standard deviation (s) and the correlation (r) between the item score and the total test score.
item reliability index
27
this tool of item analysis reflects how well an item measures what it is intended to measure, determined by the item-score standard deviation and the correlation between the item score and the criterion score
item validity index
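The item reliability and item validity indices above share one form: the item-score standard deviation multiplied by a correlation. The sketch below assumes a Pearson correlation and a population standard deviation, neither of which the cards specify, and uses made-up scores.

```python
from statistics import pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def item_index(item_scores, other_scores):
    """s * r: item-score standard deviation times the item's correlation.

    Pass total test scores for the reliability index, or criterion
    scores for the validity index.
    """
    return pstdev(item_scores) * pearson_r(item_scores, other_scores)


# Hypothetical data for six test-takers.
item_scores = [1, 1, 0, 1, 0, 1]         # 1 = correct, 0 = incorrect
total_scores = [9, 8, 4, 7, 3, 6]        # total test scores
criterion_scores = [70, 65, 50, 60, 40, 45]  # external criterion measure

reliability_index = item_index(item_scores, total_scores)
validity_index = item_index(item_scores, criterion_scores)
print(round(reliability_index, 3), round(validity_index, 3))
```

For an item scored 0 or 1, the standard deviation cannot exceed 0.5, so under these assumptions either index is bounded by 0.5 for dichotomous items.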
28
this tool of item analysis measures how well an item differentiates between high and low scorers on the test, denoted by d.
item discrimination index
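The cards do not give a formula for d. One common textbook convention, assumed here, compares the upper and lower 27% of scorers: d = (U - L) / n, where U and L are the numbers of upper-group and lower-group test-takers who answered the item correctly and n is the size of each group. The function name and data are hypothetical.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """d = (U - L) / n using the top and bottom `fraction` of total scorers."""
    n_takers = len(total_scores)
    if n_takers != len(item_correct):
        raise ValueError("item_correct and total_scores must align")
    group_size = max(1, round(n_takers * fraction))
    # Indices of test-takers ordered from lowest to highest total score.
    order = sorted(range(n_takers), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    u = sum(item_correct[i] for i in upper)
    l = sum(item_correct[i] for i in lower)
    return (u - l) / group_size


# Hypothetical data: ten test-takers, 1 = correct on this item.
totals = [95, 88, 82, 75, 70, 64, 58, 51, 43, 30]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
d = discrimination_index(item, totals)
print(round(d, 2))

# An item that only low scorers get right yields a negative d.
reversed_item = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
d_negative = discrimination_index(reversed_item, totals)
print(d_negative)
```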
29
this term refers to whether a test item is biased against certain groups when controlling for group ability
item fairness
29
this term refers to the graphic representations that illustrate item difficulty and discrimination. A steeper slope indicates greater discrimination ability of the item
item-characteristic curves
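To see why a steeper slope means greater discrimination, the sketch below draws on a two-parameter logistic curve. That model is an assumption chosen for illustration (the cards do not name one), with a as the slope and b as the difficulty location.

```python
import math


def icc(theta, a, b):
    """Probability of a correct response at ability level theta.

    a: discrimination (slope), b: difficulty (location). Assumed 2PL form.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rise(a, b=0.0, half_width=0.5):
    """Change in probability across a small ability band centered on b."""
    return icc(b + half_width, a, b) - icc(b - half_width, a, b)


flat_rise = rise(a=0.5)   # shallow curve
steep_rise = rise(a=2.0)  # steep curve
print(round(flat_rise, 3), round(steep_rise, 3))
```

The steeper curve separates test-takers just below and just above the item's difficulty more sharply, which is what the larger rise reflects.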
30
what is the threshold for item discrimination?
greater than 0.19
31
what is the threshold for item difficulty?
0.20-0.80
31
what is the threshold for item reliability?
greater than 0.75
32
what are the two ways of defining a construct in test development?
  1. a priori assumption
  2. pilot work (qualitative preliminary work)
32
this stage of test development refers to the process of defining the construct to be measured and setting its parameters. This process also includes preliminary decisions about who, what, when, where, and why aspects of the test
test conceptualization
33
this scaling method involves rating an item relative to a benchmark or another item on the scale. Example: Evaluating a patient's anxiety level by comparing it to their typical anxiety level before treatment, using ratings such as "much lower," "slightly lower," "the same," "slightly higher," or "much higher."
comparative scale
33
this scaling method presents two items at a time, asking respondents to choose one based on a specific criterion. Example: Assessing preferences for different stress-reduction techniques by presenting pairs such as "meditation vs. exercise" and asking individuals to choose which they find more effective.
paired comparison scale
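One simple way to turn paired-comparison choices into ordinal data is to count how often each option is chosen and rank by that count. The tallying rule, the alphabetical tie-break, and the data below are assumptions for illustration, not a prescribed scoring procedure.

```python
from collections import Counter


def rank_by_wins(choices):
    """Rank options by how many times each was chosen in its pairs.

    choices: list of (option_a, option_b, chosen) tuples.
    Ties are broken alphabetically (an arbitrary rule for this sketch).
    """
    wins = Counter()
    for a, b, chosen in choices:
        if chosen not in (a, b):
            raise ValueError(f"{chosen!r} is not one of the paired options")
        wins[a] += 0  # make sure options with no wins still appear
        wins[b] += 0
        wins[chosen] += 1
    return sorted(wins, key=lambda option: (-wins[option], option))


# Hypothetical responses from one person.
choices = [
    ("meditation", "exercise", "exercise"),
    ("meditation", "journaling", "meditation"),
    ("exercise", "journaling", "exercise"),
]
ranking = rank_by_wins(choices)
print(ranking)
```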
34
this scaling method uses equal-appearing intervals to measure attitudes. Experts assign values to statements, and respondents select the statements they agree with. The average value of these statements becomes the respondent's score. Example: Measuring attitudes toward psychotherapy where experts rate a set of statements. Respondents then select statements that align with their views, and their attitude score is the average scale value of the statements they endorse.
Thurstone Scale
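The scoring rule described above, averaging the expert-assigned values of the endorsed statements, can be sketched as follows. The statement labels and scale values are invented for the example.

```python
from statistics import mean


def thurstone_score(scale_values, endorsed):
    """Average scale value of the statements a respondent endorsed."""
    if not endorsed:
        raise ValueError("respondent endorsed no statements")
    return mean(scale_values[statement] for statement in endorsed)


# Hypothetical expert-assigned scale values for four statements.
scale_values = {"s1": 2.0, "s2": 5.5, "s3": 8.5, "s4": 10.0}

score = thurstone_score(scale_values, ["s2", "s3"])
print(score)  # (5.5 + 8.5) / 2 = 7.0
```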
34
this scaling method measures the extent to which individuals possess a particular attitude or characteristic. The items are arranged in a cumulative order, such that agreeing with a higher-level item implies agreement with all lower-level items. Example: Measuring social tolerance with items ranging from "willing to live in the same country" to "willing to marry" someone from a different racial group.
Guttman Scale
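The cumulative property described above can be checked mechanically: once a respondent disagrees with an item, they should not agree with any higher-level item. The sketch below assumes responses are already ordered from the lowest-level to the highest-level item; the function name and patterns are illustrative.

```python
def is_guttman_pattern(responses):
    """True if no agreement follows a disagreement.

    responses: booleans ordered from lowest- to highest-level item.
    """
    seen_disagreement = False
    for agreed in responses:
        if not agreed:
            seen_disagreement = True
        elif seen_disagreement:
            return False
    return True


consistent = [True, True, True, False, False]    # agrees up to item 3
inconsistent = [True, False, True, False, False]  # skips item 2
print(is_guttman_pattern(consistent), is_guttman_pattern(inconsistent))
```

For a consistent pattern, the number of agreements alone identifies which items were endorsed.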
34
what scaling methods are the most flexible?
  1. Rating Scale
  2. Likert Scale
35
this scaling method uses categories or labels to classify responses. It is nominal, meaning the categories have no inherent order or numerical value.
categorical scale
36
what scaling methods are usually associated with behaviors?
  1. Paired Comparison
  2. Comparative Scale
  3. Categorical
36
what scaling methods are associated with attitudes?
  1. Guttman Scale
  2. Thurstone Scale
37
what are the designations for the item difficulty index?
% correct
  0-20: Very Difficult
  21-60: Difficult
  61-90: Moderately Difficult
  91-100: Easy
38
what are the levels/designations for the item discrimination index?
  <0.19 = poor item; should be eliminated or revised
  0.20-0.29 = marginal item; needs some revision
  0.30-0.39 = reasonably good item, but possibly subject to improvement
  >0.40 = very good item
  >0.50 = ideal
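The designations above can be turned into a small lookup. The bands as written leave the exact boundary values (such as 0.19 to 0.20, or exactly 0.40) unassigned, so this sketch assumes each band's lower bound is inclusive; that choice and the function name are illustrative only.

```python
def classify_discrimination(d):
    """Map a discrimination index to the card's designation.

    Assumption: each band's lower bound is inclusive, and anything
    below 0.20 (including negative values) counts as poor.
    """
    if d >= 0.50:
        return "ideal"
    if d >= 0.40:
        return "very good"
    if d >= 0.30:
        return "reasonably good"
    if d >= 0.20:
        return "marginal"
    return "poor"


for value in (0.67, 0.45, 0.33, 0.25, -0.10):
    print(value, classify_discrimination(value))
```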
38
what is an unacceptable range for item difficulty index?
<0.20 or >0.80
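Applied to a set of items, the range above gives a quick screen for items that are too hard or too easy. The sketch treats 0.20 and 0.80 themselves as acceptable, and the item names and p values are hypothetical.

```python
def difficulty_is_acceptable(p, low=0.20, high=0.80):
    """True when the item difficulty index falls within [low, high]."""
    return low <= p <= high


# Hypothetical difficulty indices (proportion correct) per item.
p_values = {"item_1": 0.95, "item_2": 0.55, "item_3": 0.10}

flagged = [name for name, p in p_values.items()
           if not difficulty_is_acceptable(p)]
print(flagged)  # item_1 is too easy, item_3 too hard
```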
39
what are the designations for the item reliability and validity index?
  Threshold: >0.75 (r) and 0.75 (v)
  Unacceptable: <0.20
  Marginal: 0.21-0.40
  Reasonable: 0.41-0.74
  Ideal: >0.74
40
this term refers to the revalidation of a test on a sample other than those on whom it was first found to be a predictor [validity shrinkage]
cross-validation
40
this term refers to a test protocol scored by a highly authoritative scorer designed as a model for scoring and resolving discrepancies
anchor protocol
40
this term refers to the validation process conducted on two or more tests using the same sample of test-takers.
co-validation
41
what does a negative item discrimination index mean?
the lower-scoring group (LG) did better than the upper-scoring group (UG) on that item
41
this refers to a neurological disorder marked by involuntary episodes of laughing or crying, often without an appropriate trigger.
Pseudobulbar Affect (PBA)