CHAPTER 8: TEST DEVELOPMENT Flashcards
(73 cards)
It is an umbrella term for all that goes into the process of creating a test.
Test Development
It is the stage of test development in which the idea for a test is conceived.
Test Conceptualization
A stage in the test development process that entails writing test items (or re-writing or revising existing items), formatting items, setting scoring rules, and otherwise designing and building a test.
Test Construction
It takes place once a preliminary form of the test has been developed. That form is administered to a representative sample of test takers under conditions that simulate those under which the final version of the test will be administered.
Test Tryout
A process where statistical procedures are employed to assist in making judgments about which items are good as they are, which items need to be revised, and which items should be discarded. The analysis of the test’s items may include analyses of item reliability, item validity, and item discrimination. Depending on the type of test, item-difficulty level may be analyzed as well.
Item Analysis
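As a rough illustration of two of the statistics named on this card, the sketch below computes an item-difficulty index (proportion answering correctly) and a simple upper-lower discrimination index. The function names, the 0/1 response coding, and the 27% group fraction are assumptions chosen for illustration; they are not specified by the card.

```python
def item_difficulty(item_responses):
    """Proportion of test takers who answered the item correctly (1 = correct, 0 = incorrect)."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(item_responses, total_scores, fraction=0.27):
    """Upper-lower discrimination index: p(correct) in the top group minus p(correct) in the bottom group.

    `fraction` (assumed 27% here) sets the size of each extreme group.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank test takers by total test score, lowest first.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_responses[i] for i in upper) / k
    p_lower = sum(item_responses[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: one item's responses and each person's total score.
item = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
totals = [9, 8, 7, 3, 2, 6, 1, 4, 10, 5]
difficulty = item_difficulty(item)
discrimination = item_discrimination(item, totals)
```

A discrimination value near 1 suggests high scorers pass the item and low scorers fail it; a value near 0 or below suggests the item may need revision or removal.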
It refers to action taken to modify a test’s content or format for the purpose of improving the test’s effectiveness as a tool of measurement.
Test Revision
The development of test items differs based on whether a test is norm-referenced or criterion-referenced. In norm-referenced tests, good items are those that effectively differentiate between high and low performers, aiming to rank individuals relative to each other. In contrast, criterion-referenced tests are designed to determine whether an individual has mastered specific knowledge or skills, regardless of how others perform. Item development in criterion-referenced tests focuses on clearly measuring mastery of defined criteria, often used in contexts like licensing exams or educational assessments, where competence is the goal, not comparison.
Norm-Referenced vs. Criterion-Referenced Tests: Item Development Issues
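The contrast on this card can be sketched in terms of how a single score is read under each approach. This is only an illustration: `percentile_rank`, `has_mastered`, the norm group, and the cut score of 80 are all hypothetical and do not come from the card.

```python
def percentile_rank(score, norm_group):
    """Norm-referenced reading: percentage of the norm group scoring below this score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


def has_mastered(score, cut_score):
    """Criterion-referenced reading: compare the score with a fixed standard, ignoring other test takers."""
    return score >= cut_score


# Hypothetical values for illustration only.
norm_group = [60, 70, 80, 90]
rank = percentile_rank(75, norm_group)   # standing relative to others
mastered = has_mastered(75, cut_score=80)  # standing relative to the criterion
```

The same raw score of 75 places the test taker at the middle of this norm group, yet falls short of the assumed mastery criterion.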
Also known as a pilot study or pilot research, it refers to the preliminary research surrounding the creation of a prototype of the test.
Pilot Work
It may be defined as the process of setting rules for assigning numbers in measurement. It is the process by which a measuring device is designed and calibrated and by which numbers (or other indices)—scale values—are assigned to different amounts of the trait, attribute, or characteristic being measured.
Scaling
He significantly advanced the field of psychological measurement by introducing methodologically rigorous scaling methods. He was one of the first to adapt psychophysical techniques to assess psychological constructs like attitudes and values.
L. L. Thurstone
It is a procedure for obtaining a measure of item difficulty across samples of test takers who vary in ability.
Absolute scaling
Categorizes data without any order (e.g., gender, diagnosis type).
Nominal scale
Ranks data in order, but intervals between ranks are not equal (e.g., class ranking).
Ordinal scale
Equal intervals between points, but no true zero (e.g., IQ scores).
Interval scale
Equal intervals with an absolute zero point (e.g., reaction time, weight).
Ratio scale
Measures performance in relation to age (e.g., developmental milestones).
Age-based scale
Measures performance in relation to educational grade level (e.g., reading level).
Grade-based scale
Raw scores are converted into a nine-point scale (1 to 9) with a mean of 5 and a standard deviation of approximately 2.
Stanine scale
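One way to picture the conversion is the minimal sketch below, which assumes the raw scores are roughly normally distributed and uses the linear approximation stanine = 2z + 5, rounded to the nearest whole number and clipped to the 1 to 9 range. The function name `to_stanines` and the use of the sample's own mean and standard deviation are assumptions for illustration; operational tests typically assign stanines from percentile bands in a norm group.

```python
import math
from statistics import mean, pstdev


def to_stanines(raw_scores):
    """Convert raw scores to stanines via z-scores (assumes an approximately normal distribution)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        # No spread in the scores: everyone sits at the middle stanine.
        return [5 for _ in raw_scores]
    stanines = []
    for x in raw_scores:
        z = (x - m) / sd
        # Scale to mean 5, SD 2; floor(v + 0.5) rounds halves upward.
        s = math.floor(2 * z + 5.5)
        stanines.append(min(9, max(1, s)))
    return stanines


# Hypothetical raw scores for illustration only.
stanines = to_stanines([50, 60, 70, 80, 90])
```

Each stanine other than 1 and 9 spans half a standard deviation, which is why the z-score is multiplied by 2 before rounding.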
Measures a single trait or construct (e.g., depression).
Unidimensional scale
Measures multiple traits or constructs (e.g., Big Five personality traits).
Multidimensional scale
Requires respondents to compare items or choose between options (e.g., forced-choice format).
Comparative scale
Assigns responses to distinct, labeled categories (e.g., yes/no, agree/disagree).
Categorical scale
It can be defined as a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the test taker.
Rating Scale
It was developed to be “a practical means of assessing what people believe, the strength of their convictions, as well as individual differences in moral tolerance”, and it contains 30 items.
Morally Debatable Behaviors Scale–Revised (MDBS-R)