WEEK 10 - Testing Flashcards
What are the steps in test/questionnaire construction?
- Defining the test
- Selecting a scaling method
- Constructing the items
- Testing the items
- Revising the test
- Publishing the test
How is a test defined?
- Test/questionnaire
- Item
- Measure
- Has a test already been developed?
What is an item in relation to a test?
- Generic word for various forms of content in a psychological test or questionnaire
- Measurement of attribute
- Carefully selected
In defining a test, how do you establish what you are seeking to measure?
- Develop clear idea or specification of the attribute
- Existing theory as a guide
- Write a document containing specifications for the development of items that includes:
- Clear definition of attribute
- Outcome of a literature review
- If more than one attribute is to be measured, a separate specification is needed for each
In defining a test, it is costly and time consuming to develop a new test/questionnaire. Where would you go to find existing mental tests?
Mental Measurements Yearbook:
- Commercial product released every 5 years
- Contains info about each test’s purpose, publisher, pricing, population and scoring
- Includes only commercially available tests and those in English
What is the Kaufman and Kaufman model of the test definition process?
- Measure attribute/construct from a strong theoretical and research basis
- Must have capacity to distinguish between different attributes
- Yield scores that are translatable to an intervention
- Include novel tasks or questions
- Be easy to administer and objective to score
- Be sensitive to the diverse needs of the groups being assessed
What are the types of data?
Categorical
- Gender
- Age band/group
- Political party
Numerical
- Discrete
- No. of children
- Assignment mark
- Coffees in one day
- Continuous
- Weight
- Voltage
- Length
What is nominal measurement?
- Categorical: the group a person is placed in
- A number is assigned based on the group the person belongs to, but the numerical value is meaningless
What is ordinal measurement?
- Still categorical, but in ranking order
- Tells order but not the distance between each point
What is interval measurement?
- Where continuous data is obtained
- e.g. temperature
- Equal distance between points
- No true zero
- People can provide responses according to an ordered response option scale
- Also referred to as a Likert-type scale
What is a ratio measurement?
- Continuous
- Has a true zero starting point
- Differences between points are meaningful
- Ratio scales are rare in psychological measurement
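The difference between interval and ratio measurement can be illustrated with temperature. The sketch below assumes Celsius as the interval scale (no true zero) and Kelvin as a ratio scale (true zero); the two example temperatures are made up for the illustration.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

# Two hypothetical temperatures on the Celsius (interval) scale
a_c, b_c = 10.0, 20.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on both scales: the gap is the same size
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios are only meaningful where zero means "none of the attribute"
ratio_c = b_c / a_c  # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = b_k / a_k  # close to 1, the physically meaningful ratio

print(diff_c, diff_k)
print(ratio_c, ratio_k)
```

The equal differences show why interval data support addition and subtraction, while the mismatch in ratios shows why statements like "twice as much" need a ratio scale.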
What is included when constructing the items of a test?
- Item format
* Related to scaling method of choice
* Dozens of choices available
- Types of formats
* MCQ
* T/F
* Forced-choice
* Likert
What are some limitations with MCQs?
- Difficult to construct items
- Provides cues for correct response (does not assess free recall)
What are some limitations of true/false questions?
- Answers may reflect social desirability more than personality traits
- Not much variability
What are the strengths for forced-choice methodology?
- Often used in personality tests
- Overcomes the problems of t/f questions in social desirability
What are the problems with forced-choice methodology?
- People don’t always fit into either category
What are the strengths of using Likert-type scales?
- One of the most used
- Can better account for individual differences
- Good for assessing attitudes and perceptions
- Reduces desirability bias
What are some problems with Likert-type scales?
- Is it consistently measuring the construct in question?
- Are all of the items appropriate and contributing to the overall interpretation of the test?
- Assumes the strength or intensity of an attitude is linear
- People don’t always fit into a specified option
- Social desirability can still occur
How may you test the items in a questionnaire to make sure they are reliable?
- Conduct a pilot study to ensure the items are clear and easily understood
- Administer questionnaire to large participant sample
- Analyse the data in statistical software
- Investigate psychometric properties for individual items
* Item characteristics
* Create statistically sound subscales
* Throw out non-performing items
- Determine reliability and validity for the subscales/overall test
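One common way to carry out the item-level checks above is to compute Cronbach's alpha (an internal-consistency reliability estimate) and corrected item-total correlations. The sketch below is a minimal illustration under stated assumptions, not the procedure these notes prescribe: the function names and the `responses` data (6 respondents, 4 items on a 1-5 scale) are hypothetical and made up for the example.

```python
from statistics import mean, variance

def cronbach_alpha(scores):
    """scores: one list of item scores per respondent."""
    k = len(scores[0])                      # number of items
    item_vars = [variance(col) for col in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def corrected_item_total(scores):
    """Correlate each item with the sum of the remaining items."""
    k = len(scores[0])
    result = []
    for i in range(k):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        result.append(pearson(item, rest))
    return result

# Hypothetical pilot data: rows are respondents, columns are items.
# The fourth item was written to behave differently from the other three.
responses = [
    [4, 5, 4, 2],
    [2, 1, 2, 5],
    [5, 5, 4, 3],
    [3, 3, 3, 1],
    [1, 2, 1, 4],
    [4, 4, 5, 3],
]

print("alpha, all items:", round(cronbach_alpha(responses), 3))
print("item-total r:", [round(r, 2) for r in corrected_item_total(responses)])

# Drop the fourth item and recompute reliability for the shorter scale
trimmed = [row[:3] for row in responses]
print("alpha, item 4 dropped:", round(cronbach_alpha(trimmed), 3))
```

An item with a low or negative item-total correlation is a candidate for removal, and alpha can then be recomputed for the shortened scale. A real analysis would use a much larger sample than this toy data set.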
How is a test revised?
- Using the newly developed test/questionnaire, collect data in a new sample
- Repeat previous steps
- Make necessary refinements
- Cross-validate: does the test perform just as well in a new sample?
- Obtain feedback from examinees or participants
What is involved with publishing a test?
- Produce testing materials
- Develop a technical and user’s manual that includes:
* Background info
* Development history
* Administration instructions
* Reliability
* Validity
* Normative info
- Publish a scientific paper
What are the 3 main concepts of testing?
- Standardisation and norms
- Validity
- Reliability
What is standardisation and norming?
The process of administering a test to a representative sample for the purpose of establishing norms is referred to as standardising a test.
What are standardisation groups?
- Once we have an individual’s score, we want to know where that score fits in comparison with the individual’s peers
- Large groups of people are tested and their scores are used to work out test norms
- We can use the mean and SD of this to work out where an individual sits in comparison to others
- Depending on the purpose of the test, the standardisation group might be quite specific or general
- Norms also might change over time (e.g. the Flynn effect)
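Using the mean and SD of a standardisation group to locate an individual can be sketched as a z-score and a percentile rank. This is a minimal illustration: the norm values (mean 100, SD 15) and the function names are assumptions chosen for the example, and the percentile calculation assumes scores in the standardisation group are roughly normally distributed.

```python
from statistics import NormalDist

def z_score(raw, norm_mean, norm_sd):
    """Number of SDs the raw score sits above or below the norm mean."""
    return (raw - norm_mean) / norm_sd

def percentile_rank(raw, norm_mean, norm_sd):
    """Percentage of the norm group expected to score at or below raw.

    Assumes an approximately normal distribution of scores.
    """
    return 100 * NormalDist(norm_mean, norm_sd).cdf(raw)

# Hypothetical norms and a hypothetical individual's raw score
norm_mean, norm_sd = 100, 15
raw = 115

print(z_score(raw, norm_mean, norm_sd))                    # 1.0
print(round(percentile_rank(raw, norm_mean, norm_sd), 1))  # 84.1
```

A score one SD above the mean sits at roughly the 84th percentile of the norm group. If the norms shift over time, the same raw score maps to a different z-score, which is why norms need periodic updating.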