Chapter 6: Test Construction Flashcards

1
Q

psychological test

A

set of items that allows measurement of some attribute

2
Q

Items

A
  • the various forms that the contents of a psychological test can take

o problems where you must find correct answers (attribute = an ability of the person)

o questions about the way the individual typically behaves, feels or thinks (attribute = a personality characteristic)

o Other types of items will be appropriate for other types of attributes: an expression of a sentiment where an attitude the person holds is the attribute, or a statement of preference where the attribute is an interest

3
Q

empirical approach

A

a way of constructing psychological tests that relies on collecting and evaluating data about how each of the items from a pool of items discriminates between groups of respondents who are thought to show or not show the attribute the test is to measure

4
Q

rational-empirical approach

A
  • relies both on reasoning from what is known about the psychological construct to be measured in the test

o AND THEN on collecting and evaluating data about how the test and the items that comprise it actually behave when administered to a sample of respondents

5
Q

the processes of developing a test

A

1) test conceptualisation
2) test construction
3) test tryout
4) item analysis
5) test revision

6
Q

test conceptualisation

test specification

A

a written statement of the attribute or construct that the test constructor is seeking to measure and the conditions under which it will be used

7
Q

test conceptualisation

literature search

A

  • establish whether or not a satisfactory test of the attribute exists.
  • A literature search, beginning with the latest Mental Measurements Yearbook, is required to establish what tests of the attribute in question have been published and what their properties are.
8
Q

Test construction

Choice of a measurement model

A
  • Having decided on the attribute and the theory about it, and having determined that no suitable test is currently available, the constructor must choose a measurement model
  • type of measurement: the scales of measurement proposed by Stevens: nominal, ordinal, interval and ratio
    o i.e. the type of measurement required and the model to be used to attain it

*Measurement: the assignment of numbers to objects according to a set of rules for the purpose of quantifying an attribute

9
Q

interval scale

A

a scale that orders objects in such a way that the distances on the scale represent distances between the objects

  • temperature in degrees Celsius, calendar years etc. (equal intervals but no true zero)
10
Q

ratio scale

A

a scale that has the property of equal intervals but also a true zero—that is, a point at which the quantity is said not to exist

  • Length and mass as we commonly measure them are ratio scales.
  • Minutes in a commute, weight and height are all measured on ratio scales.
11
Q

model of measurement:

A

the formal statement of observations of objects mapped to numbers that represent relationships among the objects

  • Mathematical models in psychological testing represent the relationship between the attribute to be measured and the response of individuals to the items
  • This is done for a single item and can be represented by a trace line or an item characteristic curve.
12
Q

trace line:

A

a graph of the probability of the response to an item as a function of the strength of or position on a latent trait

13
Q

item characteristic curve

A

trace line in item response theory

14
Q

classical test theory

A

  • also known as ‘weak’ true score theory
  • the set of ideas, expressed mathematically and statistically, that grew out of attempts in the first half of the twentieth century to measure psychological variables
  • central idea: a score on a psychological test comprises both true and error score components
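
The true-plus-error decomposition can be illustrated with a small simulation (a sketch, not from the chapter; the means and standard deviations 100, 15 and 5 are arbitrary illustrative choices):

```python
import random

random.seed(1)

# Classical test theory: an observed score X is the sum of a true score T
# and a random error E that is uncorrelated with T (X = T + E).
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

def var(xs):
    """Sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Reliability = true-score variance / observed-score variance;
# here it should come out near 15**2 / (15**2 + 5**2) = 0.9.
reliability = var(true_scores) / var(observed)
print(round(reliability, 2))
```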
15
Q

Item response theory

A
  • a family of theories that seek to specify the functional relationship between responses to a psychological test item and the strength of the underlying latent trait;

o some attempt to estimate only one parameter of the function: where the ICC is positioned along the horizontal axis → referred to as 1PL (one-parameter logistic) models

o 2PL models attempt to estimate both position along the x-axis and rate of rise of the function

o a 3PL model attempts to estimate both position and rate-of-rise parameters, as well as how far up the vertical axis the ICC begins
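
The three parameters can be made concrete with a short sketch (illustrative only; the function below is the standard 3PL logistic, with the conventional parameter names a, b and c):

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Item characteristic curve under a 3PL model.
    theta: latent trait level; b: difficulty (position on the x-axis);
    a: discrimination (rate of rise); c: lower asymptote (how far up
    the vertical axis the ICC begins).
    With a=1 and c=0 this reduces to a 1PL (Rasch-type) curve;
    with c=0 only, a 2PL curve."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta == b the curve sits halfway between the lower asymptote c and 1.
print(icc_3pl(0.0, a=1.2, b=0.0, c=0.2))  # approximately 0.6
```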

16
Q

The Rasch Model

A

an example of a 1PL IRT model

- The Undergraduate Medicine and Health Sciences Admission Test (UMAT), used to assist with the selection of students into medicine, dentistry and health science degree programs at undergraduate level at a number of Australian universities, was based on the Rasch model.

17
Q

Benefits of IRT over CTT

A
  • IRT offers a better level of measurement (genuine interval measurement) and allows one to determine whether this is achieved or only claimed.
  • When test items are known to fit a model, rather than administering the entire test to all testees, a few items can be used to identify the position of a respondent on the underlying trait, and this assessment can then be refined by choosing items appropriate to that trait position.
  • IRT makes possible a more searching examination of differential validity.
18
Q

Test construction: item development

plan for item writing

A
  • a plan of the number and type of items that are required for a test, as indicated in the test specification
  • A blueprint/plan for item writing stipulates the number of items, the types of items and the areas the items are to be drawn from → requiring adequate attribute specification
19
Q

ITEM WRITING GUIDELINES

A
  • Use straightforward language that is appropriate for the reading level of the population
    o between 5th and 7th grade reading levels (10- to 13-year-olds)
  • Avoid double-barrelled items
    o e.g. ‘I support civil rights because discrimination is a crime against God.’
  • Avoid slang and colloquial expressions that may quickly become obsolete
  • Consider whether using both positively and negatively worded items is a good idea
    o Positive: guards against response sets
    o Negative: reversal may be confusing for test takers
  • Write items that the majority of respondents can respond to appropriately
    o e.g. ‘I get tired after soccer’ vs. ‘I get tired after exercise’
  • Ask about sensitive issues using straightforward and nonjudgemental language
  • Choose the item response format carefully (discussed soon)
  • Be sure that the phrasing is consistent with the response options!
20
Q

Likert Scale

A

provides the test-taker with 5 or 7 possible responses along a continuum

  • Easy to construct and used very extensively in psychology because it yields ordinal-level data, which approximates interval-level data well enough for data analytic purposes
Likert pros
o Degree of trait can be measured
o Lots of information
o Easy to use and administer
o Works best with strong (but not extreme) statements

Likert cons
o Number of response options needs to be considered
o Odd vs even number of responses

21
Q

Binary choice scale

A

true/false • yes/no

Binary pros
o Easy to construct
o Easy to score
o Quick to administer
o Large number of questions

Binary cons
o Allows guessing (T/F)
o Only suits content where a dichotomous response can be made
o Content not as rich
22
Q

Paired comparisons

A
  • Test-taker has to choose one of two options (e.g., a statement, object, picture) on the basis of some rule
  • The value (e.g., 1 or 0) of each option in each paired comparison is determined by judges prior to test administration
23
Q

Comparative scaling

A

-Sorting or ranking stimuli (e.g., statements, objects, photographs, etc.) according to some rule (e.g., best to worst, most justifiable to least justifiable, etc.)

24
Q

Editing

A
  • the items chosen are now edited by the test developer against the principles specified, and a question order is established
  • Items of a particular type or content area are usually kept together in a test (the most difficult items on cognitive tests are included towards the end, to maintain motivation early in the test)
  • the answer key specifies which answer to a question signifies the construct in question
  • Comprehensibility of the item pool and correctness of the answer key → checked by an expert panel
25
Q

Pilot testing

A
  • pilot testing with a sample from the population
  • interest lies in the reactions of the sample to each of the questions; this can be explored using focus groups, where difficulties with the items—ranging from wording to cultural appropriateness—can be identified.
26
Q

Test development:

A

Tryout

Item Analysis

27
Q

Test development

TEST TRYOUT

A
  • Administer the test to a representative sample
  • The recommended sample size varies - usually 100+
  • Use standardised instructions
  • The test ‘tryout’ data is then used to narrow down the number of items
28
Q

Test Development

Item Analysis

A
  • the process of studying the behaviour of items when administered to a group of respondents, usually with a view to the selection of some of the items to form a psychological test
  • The items will have been reviewed and edited prior to administration using local experts and a pilot (i.e. a small-scale sample) study with members of the intended population.

Qualitative
- a larger sample would be asked to comment on the items (readability, comprehensibility, clarity and apparent strangeness)

Quantitative

  • how the items ‘behave’ when people are asked to complete them
  • depends on the measurement model, but typically the focus is on item difficulty and item discrimination
  • reliability and validity
  • Dimensionality (i.e. factor analysis) → whether all the items within the scale are measuring the same underlying construct or latent variable, or whether there is more than one construct
29
Q

item validity

A

the correlation between an item and score on an external criterion being used to validate the test.

30
Q

Assessing reliability and validity

A

Two techniques are used for the CTT measurement model:

o exploratory factor analysis (EFA) – if only one construct has been targeted in the test, then EFA should show one strong factor
o if more than one construct is being examined in the test, then more than one factor should emerge, with notable factor loadings for items as specified in the construct specification and test plan

o Cronbach’s alpha would be calculated and evaluated in the light of guidelines for test use

Using a number of representative samples allows one to check the replicability of the findings and provides increased confidence that the decisions being made about the test are sound.
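
Cronbach's alpha can be computed directly from its standard formula (a minimal sketch; the data at the bottom are invented):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha from a respondents-by-items matrix of scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented data: three items that rise in lockstep across four respondents,
# so the items are perfectly inter-correlated and alpha is (near) 1.
scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(cronbach_alpha(scores))
```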

31
Q

Factor analysis

A

recommended in all scale development processes (psychological)

  • New scale development usually starts with exploratory factor analysis (EFA) to identify a manageable number of factors to extract
  • Confirmatory factor analysis (CFA) used when number of factors is known.

HELPS:

  • Determine the number of underlying latent variables or constructs
  • condense information (find out what the underlying factors actually are)
  • Define the content or meaning of the factors
  • identify items that are performing better or worse
  • Items that do not fit into any factor, or those that fit into more than one, can be considered for elimination

Number of factors to extract
o Eigenvalues (> 1): factors with eigenvalues greater than one are usually considered for retention
o Scree plot

Rotation
- Helps interpret the data
- Oblique: assumes factors are correlated
- Orthogonal: assumes factors are uncorrelated
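
The eigenvalue > 1 rule can be sketched on a toy correlation matrix (invented data; assumes NumPy is available):

```python
import numpy as np

# Toy correlation matrix for four items: items 1-2 and items 3-4 form two
# strongly correlated pairs, suggesting two underlying factors.
R = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]   # largest first
n_factors = int(np.sum(eigenvalues > 1))    # Kaiser eigenvalue > 1 rule
print(eigenvalues, n_factors)               # two factors retained
```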

32
Q

reliability of scales

A

Are the items ‘homogeneous’?
o Correlation between the score for the test item and the scale score (item–scale correlations)
o Inter-relatedness of the test items (Cronbach’s alpha)

Can help to identify items to be discarded
o ‘Outlier’ items
o Items that are incongruous with the test

33
Q

TEST REVISION

A
Test has been:
- Conceptualised
- Constructed
- Tried out
- Analysed

Revision:
o A stage in new test development
o A stage in modifying an existing test

Checks on the revised test:

  • Internal consistency (it needs to stay the same across the new and old versions of the test)
  • Factor analysis
  • Cross-validation
  • Collection of additional criterion-related validity data
  • Does the test predict the criterion in the new sample as well as it did in the old?
34
Q

Cross Validation

A
  • Is the test applicable to this population?
  • Realistic test manual information

Validity shrinkage:
  • Validity is often lower the second time around
  • Inevitable
  • Generally a slight difference
  • Eliminates chance results
  • Near enough is good enough!
35
Q

Norming the test

A
  • Norming the test in a representative population – general population vs specific population (i.e. a cultural group, patients diagnosed with disorder X) will depend on the use of the test – often stratified by age and gender

o Explicit norms are prepared in such a way that these variables are identified in the tables that are prepared.

  • Creating a test manual/instructions
  • Publication