Chapter 6: Test Construction Flashcards
(35 cards)
psychological test
set of items that allows measurement of some attribute
Items
- the various forms that the contents of a psychological test can take
o problems where you must find correct answers (attribute = ability of the person)
o questions about the way the individual typically behaves, feels or thinks (attribute = personality characteristic)
o other types of items will be appropriate for other types of attributes: an expression of a sentiment where the attribute is an attitude the person holds, or a statement of preference where the attribute is an interest
empirical approach
a way of constructing psychological tests that relies on collecting and evaluating data about how each item in a pool of items discriminates between groups of respondents who are thought to show or not show the attribute the test is to measure
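A minimal sketch of the empirical idea, assuming binary responses (1 = endorsed, 0 = not endorsed) and two criterion groups. The data, the function name `endorsement_gap` and the 0.3 cut-off are illustrative assumptions, not part of the source.

```python
def endorsement_gap(group_with, group_without):
    """Per-item difference in endorsement rates between two criterion groups.

    Each argument is a list of respondents; each respondent is a list of
    0/1 item responses (a hypothetical data layout).
    """
    n_items = len(group_with[0])
    gaps = []
    for i in range(n_items):
        rate_with = sum(r[i] for r in group_with) / len(group_with)
        rate_without = sum(r[i] for r in group_without) / len(group_without)
        gaps.append(rate_with - rate_without)
    return gaps


# Made-up responses to a pool of three items.
shows_attribute = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0]]
lacks_attribute = [[0, 1, 0], [0, 1, 1], [1, 0, 0], [0, 1, 0]]

gaps = endorsement_gap(shows_attribute, lacks_attribute)
# Keep items whose endorsement rates differ by at least an arbitrary 0.3.
kept = [i for i, g in enumerate(gaps) if abs(g) >= 0.3]
print(gaps, kept)
```

Here only the first item separates the two groups, so it is the only one retained. A real study would use far larger samples and a significance test or cross-validation rather than a fixed cut-off.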
rational-empirical approach
- relies on both reasoning from what is known about the psychological construct to be measured in the test
o THEN collecting and evaluating data about how the test and the items that comprise it actually behave when administered to a sample of respondents
the processes of developing a test
1) test conceptualisation
2) test construction
3) test tryout
4) item analysis
5) test revision
test conceptualisation
test specification
a written statement of the attribute or construct that the test constructor is seeking to measure and the conditions under which it will be used
test conceptualisation
literature search
Literature search
- establish whether or not a satisfactory test of the attribute exists.
- A literature search, beginning with the latest Mental Measurements Yearbook, is required to establish what tests of the attribute in question have been published and what their properties are.
Test construction
Choice of a measurement model
- Having decided on the attribute and the theory about it, and having determined that no suitable test is currently available, the test constructor must decide on the type of measurement required and the model to be used to attain it.
- type of measurement: the scales of measurement proposed by Stevens: nominal, ordinal, interval and ratio
*Measurement: the assignment of numbers to objects according to a set of rules for the purpose of quantifying an attribute
interval scale
a scale that orders objects in such a way that the distances on the scale represent distances between objects
- e.g. temperature in degrees Celsius or Fahrenheit (height has a true zero, so it is a ratio scale)
ratio scale
a scale that has the property of equal intervals and also a true zero, that is, a point at which the quantity is said not to exist
- Length and mass as we commonly measure them are ratio scales.
- Minutes in a commute, weight and height are all ratio scales.
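The practical difference between interval and ratio scales can be seen with temperature. The sketch below is an illustration, not from the source: it uses Celsius as an interval scale and Kelvin as the matching ratio scale. Differences agree across the two, but a statement like "twice as hot" only makes sense where there is a true zero.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return c + 273.15


# Differences are preserved: a 10-degree gap is a 10-kelvin gap.
difference_celsius = 20.0 - 10.0
difference_kelvin = celsius_to_kelvin(20.0) - celsius_to_kelvin(10.0)

# Ratios are not: 20 C looks like "twice" 10 C, but in kelvin the ratio
# is only slightly above 1, because 0 C is not a true zero.
ratio_celsius = 20.0 / 10.0
ratio_kelvin = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print(difference_celsius, round(difference_kelvin, 2))
print(ratio_celsius, round(ratio_kelvin, 3))
```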
model of measurement:
the formal statement of observations of objects mapped to numbers that represent relationships among the objects
- Mathematical models in psychological testing represent the relationship between the attribute to be measured and the response of individuals to the items
- This is done for a single item and can be represented by a trace line or an item characteristic curve.
trace line:
a graph of the probability of the response to an item as a function of the strength of or position on a latent trait
item characteristic curve
trace line in item response theory
classical test theory*
classical test theory
- also known as ‘weak’ true score theory
- the set of ideas, expressed mathematically and statistically, that grew out of attempts in the first half of the twentieth century to measure psychological variables
- its central idea is that a score on a psychological test comprises both a true score component and an error score component
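The true-plus-error idea is usually written as X = T + E (observed score = true score + error). The sketch below simulates it under stated assumptions: the true score of 50, the error spread of 5 and the normally distributed error are all hypothetical choices made for illustration. It shows that when error is random with a mean of zero, the average of many observed scores settles near the true score.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SCORE = 50.0  # hypothetical true score T for one person
ERROR_SD = 5.0     # hypothetical spread of the random error E


def observed_score(true_score, error_sd):
    """One observed score X = T + E, with E drawn from a zero-mean normal."""
    return true_score + random.gauss(0.0, error_sd)


# Imagine the same person could be tested many times without practice effects.
scores = [observed_score(TRUE_SCORE, ERROR_SD) for _ in range(10_000)]
mean_observed = statistics.mean(scores)
print(round(mean_observed, 2))
```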
Item response theory
- a family of theories that seek to specify the functional relationship between responses to a psychological test item and the strength of the underlying latent trait;
o some attempt to estimate only one parameter of the function: where the ICC is positioned along the horizontal axis. These are referred to as 1PL (one-parameter logistic) models.
o 2PL (two-parameter logistic) models attempt to estimate both position along the horizontal axis and rate of rise of the function.
o A 3PL model attempts to estimate both position and rate of rise parameters, as well as how far up the vertical axis the ICC begins.
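The three models can be sketched with one logistic function. The symbol names are the conventional ones and are not given in the source: `b` for position (item difficulty), `a` for rate of rise (discrimination) and `c` for how far up the vertical axis the ICC begins (often called the guessing parameter). Treat this as an illustration of the shape of the curve, not a fitting procedure.

```python
import math


def icc(theta, b, a=1.0, c=0.0):
    """Probability of a keyed response at trait level theta.

    b: position of the curve along the trait axis
    a: rate of rise of the curve
    c: lower asymptote, where the curve begins on the vertical axis
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# 1PL: only b varies between items (a fixed at 1, c fixed at 0).
p_1pl = icc(theta=0.0, b=0.0)
# 2PL: b and a both vary.
p_2pl = icc(theta=1.0, b=0.0, a=2.0)
# 3PL: b, a and c all vary.
p_3pl = icc(theta=0.0, b=0.0, a=1.0, c=0.25)

print(p_1pl, round(p_2pl, 3), p_3pl)
```

When trait level equals item position (theta = b), the 1PL and 2PL curves pass through 0.5, while the 3PL curve passes through the midpoint between c and 1.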
The Rasch Model
an example of a 1PL IRT model
- The Undergraduate Medical and Health Sciences Admission Test (UMAT) used to assist with the selection of students into the medicine, dentistry and health science degree programs at undergraduate level at a number of Australian universities was based on the Rasch model.
Benefits of IRT over CTT
- IRT offers a better level of measurement (genuine interval measurement) and makes it possible to determine whether this is achieved or only claimed.
- When test items are known to fit a model, rather than administering the entire test to all testees, a few items can be used to identify the position of a respondent on the underlying trait, and this assessment can then be refined by choosing items appropriate to that trait position.
- IRT makes possible a more searching examination of differential validity.
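The idea of refining an assessment by choosing items appropriate to the respondent's trait position can be sketched as below. Everything here is hypothetical: the item bank, the responses and the function name `pick_next_item`. The step-halving update is a crude stand-in for a proper IRT trait estimate and is used only to keep the example short.

```python
def pick_next_item(theta_estimate, difficulties, used):
    """Index of the unused item whose difficulty is closest to the estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(difficulties[i] - theta_estimate))


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical calibrated item bank
answers = [True, True, False]               # made-up responses, in order given

theta = 0.0   # start at the middle of the trait scale
step = 1.0    # size of the adjustment after each response
used = set()
administered = []

for correct in answers:
    item = pick_next_item(theta, difficulties, used)
    used.add(item)
    administered.append(item)
    # Move the estimate up after a correct answer, down after an error,
    # then shrink the step so the estimate settles.
    theta += step if correct else -step
    step /= 2

print(administered, theta)
```

Only three of the five items are administered, and each one is chosen to sit near the current estimate of the respondent's position.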
Test construction: item development
plan for item writing
- a plan of the number and type of items that are required for a test, as indicated in the test specification
- A blueprint/plan for item writing stipulating the number of items, the types of items and the areas the items are to be drawn from requiring adequate attribute specification.
ITEM WRITING GUIDELINES
- Use straightforward language that is appropriate for the reading level of the population
o between 5th and 7th Grade reading levels (10- to 13-year-olds)
- Avoid double-barrelled items
o e.g. I support civil rights because discrimination is a crime against God.
- Avoid slang and colloquial expressions that may quickly become obsolete
- Consider if using positively and negatively worded items is a good idea
o Pro: guards against response sets
o Con: reversal may be confusing for test takers
- Write items that the majority of respondents can respond to appropriately
o e.g. ‘I get tired after soccer’ vs. ‘I get tired after exercise’
- Ask about sensitive issues using straightforward and nonjudgemental language
- Choose the item response carefully (discussed soon)
- Be sure that the phrasing is consistent with the response options!
Likert Scale
provides the test-taker with 5 or 7 possible responses along a continuum
- Easy to construct and used very extensively in psychology because it yields ordinal-level data, which approximates interval-level data well enough for data analytic purposes
LIKERT PROS
o Degree of trait can be measured
o Lots of information
o Easy to use and administer
o Works best with strong (but not extreme) statements
LIKERT CONS
o Number of response options needs to be considered
o Odd vs even number of responses
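When a Likert scale mixes positively and negatively worded items, the negatively worded ones are normally reverse-scored before totalling. A minimal sketch, assuming responses coded 1 to 5; the function name `score_likert` and the example data are hypothetical.

```python
def score_likert(responses, reverse_keyed, n_points=5):
    """Total score for one respondent on a Likert scale.

    responses: list of ratings coded 1..n_points
    reverse_keyed: set of item positions (0-based) that are negatively worded
    """
    total = 0
    for i, rating in enumerate(responses):
        if not 1 <= rating <= n_points:
            raise ValueError(f"rating {rating} is outside 1..{n_points}")
        # Reverse-keyed items are flipped: on a 5-point scale 1<->5, 2<->4.
        total += (n_points + 1 - rating) if i in reverse_keyed else rating
    return total


responses = [4, 2, 5, 1]   # made-up ratings on four items
reverse_keyed = {1, 3}     # the second and fourth items are negatively worded
total = score_likert(responses, reverse_keyed)
print(total)
```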
Binary choice scale
true/false • yes/no
BINARY PROS
o Easy to construct
o Easy to score
o Quick to administer
o Large number of questions
BINARY CONS
o Allows guessing (T/F)
o Only suits content where a dichotomous response can be made
o Content not as rich
Paired comparisons
- Test-taker has to choose one of two options (e.g., a statement, object, picture) on the basis of some rule
- The value (e.g., 1 or 0) of each option in each paired comparison is determined by judges prior to test administration
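Scoring paired comparisons then reduces to adding up the judged values of the options the test-taker chose. A small sketch with made-up pairs, values and choices; the labels "A" and "B" are placeholders for the two options in each pair.

```python
# Each pair maps an option label to the value judges assigned it beforehand.
pairs = [
    {"A": 1, "B": 0},
    {"A": 0, "B": 1},
    {"A": 1, "B": 0},
]

# The option the test-taker picked in each pair, in order.
choices = ["A", "A", "B"]

# The score is the sum of the judged values of the chosen options.
score = sum(pair[choice] for pair, choice in zip(pairs, choices))
print(score)
```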
Comparative scaling
- Sorting or ranking stimuli (e.g., statements, objects, photographs, etc.) according to some rule (e.g., best to worst, most justifiable to least justifiable, etc.)
Editing
- the items chosen are now edited by the test developer against the principles specified and a question order established.
- Items of a particular type or content area are usually kept together in a test (the most difficult items on cognitive tests are included towards the end to maintain motivation early in the test).
- The answer key specifies which answer to a question signifies the construct in question.
- Comprehensibility of the item pool and correctness of the answer key are checked by an expert panel.