Week 3 Reliability and Validity Flashcards
(40 cards)
Define reliability
The degree to which a test or measurement tool produces consistent results when measuring the same thing.
e.g. a scale that shows the same weight each time the same object is weighed is considered reliable.
Define validity
The extent to which a test measures the construct it is intended to measure. e.g. a scale measures weight and nothing else; an IQ test measures intelligence.
Why are reliability and validity important?
- diagnosis
- assessment of ability
- treatment decisions and monitoring outcomes
- research
True or false, tests can be reliable without being valid.
TRUE
tests can consistently produce the same results but not accurately measure what you want them to.
True or false, tests can be valid without being reliable.
FALSE
Tests cannot be valid without being reliable.
Describe classical test theory (Charles Spearman)
States that test scores are the result of:
- factors which contribute to consistency - stable under examination (“True Scores”)
- factors which contribute to inconsistency - characteristics of test taker, or situation that are not related to the characteristic being tested (errors of measurement/ confounders)
What is the formula for test theory?
X = T + e
X = obtained score, T = true score, e = error of measurement
e.g. anxiety score on test = true anxiety + error
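A minimal sketch of X = T + e, assuming the error term is random noise drawn from a normal distribution with mean 0. The true scores, the error spread and the function name `simulate_scores` are hypothetical and chosen only for illustration.

```python
import random


def simulate_scores(true_scores, error_sd, seed=0):
    """Return obtained scores X = T + e for each true score T.

    Assumption: e is random error drawn from a normal distribution
    with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [t + rng.gauss(0, error_sd) for t in true_scores]


# Hypothetical true anxiety scores for five test takers
true_scores = [10, 14, 18, 22, 26]
obtained = simulate_scores(true_scores, error_sd=2.0)

# The error for each person is whatever separates X from T
errors = [x - t for x, t in zip(obtained, true_scores)]

for t, e, x in zip(true_scores, errors, obtained):
    print(f"T = {t:5.1f}   e = {e:+5.2f}   X = {x:5.2f}")
```

With `error_sd=0.0` the obtained scores equal the true scores, which is the ideal of a perfectly reliable test.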
List the different sources of error
Item selection
Test administration
Test scoring
Systematic measurement error
Describe the following source of error: item selection
sample of items chosen may not be reflective of every individual’s true score
Describe the following source of error: test administration
general environmental conditions (e.g. temperature, lighting, noise) and the state or mood of the test taker
Describe the following source of error: test scoring
Subjectively scored tests (e.g. projective tests and essay exams), where the score depends on the scorer's judgement
Describe the following source of error: systematic measurement error
test may consistently tap into something other than the attribute being tested
e.g. a test of introversion may actually measure aspects of social anxiety without the test developer realising it
Explain domain sampling theory
Central concept in classical test theory.
With domain sampling, tests are constructed by randomly selecting a specified number of items from a homogeneous, infinitely large pool (the domain).
A sample of items is reliable to the extent that the score it produces correlates highly with the true score, i.e. the score that would be obtained on the whole domain.
Are longer tests more reliable?
Technically, yes: according to domain sampling theory, a longer test includes more items from the “universe” of possible items, and so samples more aspects of the domain being tested.
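One way to put a number on this is the Spearman-Brown prophecy formula, a standard result in classical test theory. The sketch below assumes the added items come from the same domain and are of similar quality to the existing ones; the starting reliability of 0.60 is a hypothetical value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made length_factor times as long.

    Spearman-Brown prophecy formula: r_new = n * r / (1 + (n - 1) * r)
    """
    n = length_factor
    return (n * reliability) / (1 + (n - 1) * reliability)


r_original = 0.60  # hypothetical reliability of the original test

# Predicted reliability at 1x, 2x and 3x the original length
for n in (1, 2, 3):
    print(n, round(spearman_brown(r_original, n), 3))
```

The predicted reliability rises with each increase in length, but the gain shrinks each time, so doubling a test never doubles its reliability.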
What are two elements of reliability that are observed/tested?
Stability over time - extent to which test remains stable when it is administered on more than one occasion
Internal consistency - extent to which a psychological test is homogeneous (items measure the same thing) or heterogeneous
Describe the test-retest (stability) measure of evaluating reliability.
same test administered to same group twice at two different time points.
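Test-retest reliability is usually reported as the correlation between the two administrations. A minimal sketch, assuming a Pearson correlation is used; the scores and the helper `pearson_r` are hypothetical and written out by hand so nothing beyond the standard library is needed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical IQ scores for the same six people at two time points
time1 = [98, 105, 110, 120, 92, 101]
time2 = [100, 103, 112, 118, 95, 99]

r = pearson_r(time1, time2)
print(f"test-retest correlation: {r:.2f}")
```

A correlation close to 1 means people kept roughly the same rank order across the two occasions, which is what a stable test of a stable construct should show.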
What are considerations/ limitations for the test-retest measure?
- look for a strong correlation between the 2 sets of scores
- consider time lapse between test administrations
- practice effects, maturation, treatment effects and changes in setting can all impact scores
Test-retest is an appropriate measure for I___ and E___
It is inappropriate for S___ A__ and W__ of a b_
Intelligence and Extraversion (stable over time)
State anxiety and weight of a baby
Describe the parallel or alternative forms measure of evaluating reliability.
two forms of the same test developed; different items selected according to the same rules. e.g. alternative exam for PSY3041
please select one of the two options:
- same
- different
parallel forms have the ___ distribution of scores (means and variances equal)
same
please select one of the two options:
- same
- different
alternate forms have ___ distribution of scores
different
means and variance may not be equal
What are the similarities between parallel and alternate forms of reliability?
- both are matched for content and difficulty
- stable construct required
- two tests administered to the same group (looking for strong correlations between the versions)
- influenced by changes between testing times e.g. fatigue
- additional source of error: item sampling (slightly different items on each form)
Describe the split half method of evaluating reliability.
test is divided into two halves, which are compared (random split, odd-even system or top vs bottom).
rationale: if scores on 2 half tests from a single administration are highly correlated, scores on 2 whole tests from separate administrations should also be highly correlated
- estimates of reliability will be smaller because each half has a smaller number of items
What is the purpose of the Spearman-Brown formula/ correction?
As the reliability based on the split half is smaller due to a smaller number of items, the Spearman-Brown formula is applied to estimate what the reliability would be if each half were the same length as the full test.
- tests internal consistency
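The split-half method and the Spearman-Brown correction can be sketched together. This is a minimal illustration, assuming an odd-even split and a Pearson correlation between the two half scores; the response data and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate).

    item_scores: one row per test taker, one item score per column.
    """
    # Odd-even split: items 1, 3, 5... versus items 2, 4, 6...
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown correction for doubling the length: 2r / (1 + r)
    r_full = (2 * r_half) / (1 + r_half)
    return r_half, r_full


# Hypothetical right/wrong (1/0) answers: 5 test takers, 6 items
responses = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
]

r_half, r_full = split_half_reliability(responses)
print(f"half-test correlation: {r_half:.2f}")
print(f"corrected estimate:    {r_full:.2f}")
```

When the half-test correlation is positive, the corrected estimate is always the larger of the two, reflecting that the full test has twice as many items as each half.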