Chapter 6: Test Construction Flashcards

1
Q

psychological test

A

set of items that allows measurement of some attribute

2
Q

Items

A
  • the various forms that the contents of a psychological test can take

o problems where you must find correct answers (attribute = an ability of the person)

o questions about the way the individual typically behaves, feels or thinks (attribute = a personality characteristic)

o Other types of items will be appropriate for other types of attributes: an expression of a sentiment where an attitude the person holds is the attribute, or a statement of preference where the attribute is an interest

3
Q

empirical approach

A

a way of constructing psychological tests that relies on collecting and evaluating data about how each of the items from a pool of items discriminates between groups of respondents who are thought to show or not show the attribute the test is to measure

4
Q

rational-empirical approach

A
  • relies both on reasoning from what is known about the psychological construct to be measured in the test

o AND THEN on collecting and evaluating data about how the test and the items that comprise it actually behave when administered to a sample of respondents

5
Q

the processes of developing a test

A

1) test conceptualisation
2) test construction
3) test tryout
4) item analysis
5) test revision

6
Q

test conceptualisation

test specification

A

a written statement of the attribute or construct that the test constructor is seeking to measure and the conditions under which it will be used

7
Q

test conceptualisation

literature search

A

  • establish whether or not a satisfactory test of the attribute exists.
  • A literature search, beginning with the latest Mental Measurements Yearbook, is required to establish what tests of the attribute in question have been published and what their properties are.
8
Q

Test construction

Choice of a measurement model

A
  • Having decided on the attribute and the theory about it, and having determined that no suitable test is currently available, the constructor must choose a measurement model
  • type of measurement: the scales of measurement proposed by Stevens: nominal, ordinal, interval and ratio
    o i.e. the type of measurement required and the model to be used to attain it

*Measurement: the assignment of numbers to objects according to a set of rules for the purpose of quantifying an attribute

9
Q

interval scale

A

a scale that orders objects in such a way that the distances on the scale represent distances between the objects

  • temperature in degrees Celsius, calendar years etc. (equal intervals but no true zero)
10
Q

ratio scale

A

a scale that has the property of equal intervals but also a true zero—that is, a point at which the quantity is said not to exist

  • Length and mass as we commonly measure them are ratio scales.
  • Minutes in a commute, weight and height are all measured on ratio scales.
11
Q

model of measurement:

A

the formal statement of observations of objects mapped to numbers that represent relationships among the objects

  • Mathematical models in psychological testing represent the relationship between the attribute to be measured and the response of individuals to the items
  • This is done for a single item and can be represented by a trace line or an item characteristic curve.
12
Q

trace line:

A

a graph of the probability of the response to an item as a function of the strength of or position on a latent trait

13
Q

item characteristic curve

A

trace line in item response theory

14
Q

classical test theory

A

  • also known as ‘weak’ true score theory
  • the set of ideas, expressed mathematically and statistically, that grew out of attempts in the first half of the twentieth century to measure psychological variables
  • central idea: a score on a psychological test comprises both true and error score components
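
The true-plus-error decomposition can be illustrated with a small simulation (a sketch, not from the chapter; the means and standard deviations 100, 15 and 5 are arbitrary illustrative choices):

```python
import random

random.seed(1)

# Classical test theory: an observed score X is the sum of a true score T
# and a random error E that is uncorrelated with T (X = T + E).
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

def var(xs):
    """Sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Reliability = true-score variance / observed-score variance;
# here it should come out near 15**2 / (15**2 + 5**2) = 0.9.
reliability = var(true_scores) / var(observed)
print(round(reliability, 2))
```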
15
Q

Item response theory

A
  • a family of theories that seek to specify the functional relationship between responses to a psychological test item and the strength of the underlying latent trait;

o some attempt to estimate only one parameter of the function: where the ICC is positioned along the horizontal axis → referred to as 1PL (one-parameter logistic) models

o 2PL models attempt to estimate both position along the x-axis and rate of rise of the function

o a 3PL model attempts to estimate both position and rate-of-rise parameters, as well as how far up the vertical axis the ICC begins
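
The three parameters can be made concrete with a short sketch (illustrative only; the function below is the standard 3PL logistic, with the conventional parameter names a, b and c):

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Item characteristic curve under a 3PL model.
    theta: latent trait level; b: difficulty (position on the x-axis);
    a: discrimination (rate of rise); c: lower asymptote (how far up
    the vertical axis the ICC begins).
    With a=1 and c=0 this reduces to a 1PL (Rasch-type) curve;
    with c=0 only, a 2PL curve."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta == b the curve sits halfway between the lower asymptote c and 1.
print(icc_3pl(0.0, a=1.2, b=0.0, c=0.2))  # approximately 0.6
```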

16
Q

The Rasch Model

A

an example of a 1PL IRT model

- The Undergraduate Medicine and Health Sciences Admission Test (UMAT), used to assist with the selection of students into medicine, dentistry and health science degree programs at undergraduate level at a number of Australian universities, was based on the Rasch model.

17
Q

Benefits of IRT over CTT

A
  • IRT offers a better level of measurement (genuine interval measurement) and allows one to determine whether this is achieved or only claimed.
  • When test items are known to fit a model, rather than administering the entire test to all testees, a few items can be used to identify the position of a respondent on the underlying trait, and this assessment can then be refined by choosing items appropriate to that trait position.
  • IRT makes possible a more searching examination of differential validity.
18
Q

Test construction: item development

plan for item writing

A
  • a plan of the number and type of items that are required for a test, as indicated in the test specification
  • A blueprint/plan for item writing stipulates the number of items, the types of items and the areas the items are to be drawn from → requiring adequate attribute specification
19
Q

ITEM WRITING GUIDELINES

A
  • Use straightforward language that is appropriate for the reading level of the population
    o between 5th and 7th grade reading levels (10- to 13-year-olds)
  • Avoid double-barrelled items
    o e.g. ‘I support civil rights because discrimination is a crime against God.’
  • Avoid slang and colloquial expressions that may quickly become obsolete
  • Consider whether using both positively and negatively worded items is a good idea
    o Positive: guards against response sets
    o Negative: reversal may be confusing for test takers
  • Write items that the majority of respondents can respond to appropriately
    o e.g. ‘I get tired after soccer’ vs. ‘I get tired after exercise’
  • Ask about sensitive issues using straightforward and nonjudgemental language
  • Choose the item response format carefully (discussed soon)
  • Be sure that the phrasing is consistent with the response options!
20
Q

Likert Scale

A

provides the test-taker with 5 or 7 possible responses along a continuum

  • Easy to construct and used very extensively in psychology because it yields ordinal-level data, which approximates interval-level data well enough for data analytic purposes
Likert pros
o Degree of trait can be measured
o Lots of information
o Easy to use and administer
o Works best with strong (but not extreme) statements

Likert cons
o Number of response options needs to be considered
o Odd vs even number of responses

21
Q

Binary choice scale

A

true/false • yes/no

Binary pros
o Easy to construct
o Easy to score
o Quick to administer
o Large number of questions

Binary cons
o Allows guessing (T/F)
o Only suits content where a dichotomous response can be made
o Content not as rich
22
Q

Paired comparisons

A
  • Test-taker has to choose one of two options (e.g., a statement, object, picture) on the basis of some rule
  • The value (e.g., 1 or 0) of each option in each paired comparison is determined by judges prior to test administration
23
Q

Comparative scaling

A

-Sorting or ranking stimuli (e.g., statements, objects, photographs, etc.) according to some rule (e.g., best to worst, most justifiable to least justifiable, etc.)

24
Q

Editing

A
  • the items chosen are now edited by the test developer against the principles specified, and a question order is established
  • Items of a particular type or content area are usually kept together in a test (the most difficult items on cognitive tests are included towards the end, to maintain motivation early in the test)
  • the answer key specifies which answer to a question signifies the construct in question
  • Comprehensibility of the item pool and correctness of the answer key → checked by an expert panel
25
Q

Pilot testing

A
  • pilot testing with a sample from the population
  • interest lies in the reactions of the sample to each of the questions; this can be explored using focus groups, where difficulties with the items—ranging from wording to cultural appropriateness—can be identified.
26
Q

Test development:

A

Tryout

Item Analysis

27
Q

Test development

TEST TRYOUT

A
  • Administer the test to a representative sample
  • The recommended sample size varies - usually 100+
  • Use standardised instructions
  • The test ‘tryout’ data is then used to narrow down the number of items
28
Q

Test Development

Item Analysis

A
  • the process of studying the behaviour of items when administered to a group of respondents, usually with a view to the selection of some of the items to form a psychological test
  • The items will have been reviewed and edited prior to administration using local experts and a pilot (i.e. a small-scale sample) study with members of the intended population.

Qualitative
- a larger sample would be asked to comment on the items (readability, comprehensibility, clarity and apparent strangeness)

Quantitative

  • how the items ‘behave’ when people are asked to complete them
  • depends on the measurement model, but typically the focus is on item difficulty and item discrimination
  • reliability and validity
  • Dimensionality (i.e. factor analysis) → whether all the items within the scale are measuring the same underlying construct or latent variable, or whether there is more than one construct
29
Q

item validity

A

the correlation between an item and score on an external criterion being used to validate the test.

30
Q

Assessing reliability and validity

A

Two techniques are used for the CTT measurement model:

o exploratory factor analysis (EFA) – if only one construct has been targeted in the test, then EFA should show one strong factor
o if more than one construct is being examined in the test, then more than one factor should emerge, with notable factor loadings for items as specified in the construct specification and test plan

o Cronbach’s alpha would be calculated and evaluated in the light of guidelines for test use

Using a number of representative samples allows one to check the replicability of the findings and provides increased confidence that the decisions being made about the test are sound.
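
Cronbach's alpha can be computed directly from its standard formula (a minimal sketch; the data at the bottom are invented):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha from a respondents-by-items matrix of scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented data: three items that rise in lockstep across four respondents,
# so the items are perfectly inter-correlated and alpha is (near) 1.
scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(cronbach_alpha(scores))
```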

31
Q

Factor analysis

A

recommended in all scale development processes (psychological)

  • New scale development usually starts with exploratory factor analysis (EFA) to identify a manageable number of factors to extract
  • Confirmatory factor analysis (CFA) used when number of factors is known.

HELPS:

  • Determine the number of underlying latent variables or constructs
  • condense information (find out what the underlying factors actually are)
  • Define the content or meaning of the factors
  • identify items that are performing better or worse
  • Items that do not fit into any factor, or those that fit into more than one, can be considered for elimination

Number of factors to extract
o Eigenvalues (> 1): factors with eigenvalues greater than one are usually considered for retention
o Scree plot

Rotation
- Helps interpret the data
- Oblique: assumes factors are correlated
- Orthogonal: assumes factors are uncorrelated
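
The eigenvalue > 1 rule can be sketched on a toy correlation matrix (invented data; assumes NumPy is available):

```python
import numpy as np

# Toy correlation matrix for four items: items 1-2 and items 3-4 form two
# strongly correlated pairs, suggesting two underlying factors.
R = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]   # largest first
n_factors = int(np.sum(eigenvalues > 1))    # Kaiser eigenvalue > 1 rule
print(eigenvalues, n_factors)               # two factors retained
```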

32
Q

reliability of scales

A

Are the items ‘homogeneous’?
o Correlation between the score for the test item and the scale score (item–scale correlations)
o Inter-relatedness of the test items (Cronbach’s alpha)

Can help to identify items to be discarded
o ‘Outlier’ items
o Items that are incongruous with the test

33
Q

TEST REVISION

A
Test has been:
- Conceptualised
- Constructed
- Tried out
- Analysed

Revision:
o A stage in new test development
o A stage in modifying an existing test

Checks on the revised test:

  • Internal consistency (it needs to stay the same across the new and old versions of the test)
  • Factor analysis
  • Cross-validation
  • Collection of additional criterion-related validity data
  • Does the test predict the criterion in the new sample as well as it did in the old?
34
Q

Cross Validation

A
  • Is the test applicable to this population?
  • Realistic test manual information

Validity shrinkage:
  • Validity is often lower the second time around
  • Inevitable
  • Generally a slight difference
  • Eliminates chance results
  • Near enough is good enough!
35
Q

Norming the test

A
  • Norming the test in a representative population – general population vs specific population (i.e. a cultural group, patients diagnosed with disorder X) will depend on the use of the test – often stratified by age and gender

o Explicit norms are prepared in such a way that these variables are identified in the tables that are prepared.

  • Creating a test manual/instructions
  • Publication