Week 3 - Reliability and Validity Flashcards
(15 cards)
Person’s true score
IQ test (M = 100, SD = 15)
Test with rxx = 0, raw score = 50
Best estimate = 100 (mean)
Reliability is zero, so the raw score carries no information about the true score
Person’s true score
IQ test (M = 100, SD = 15)
Test with rxx = 1, raw score = 122
Best estimate = 122 (raw score)
Reliability is perfect, so the raw score is the true score
Person’s true score
IQ test (M = 100, SD = 15)
Test with rxx = 0.5, raw score = 80
Best estimate = 90 (midway between mean and raw)
Uses the true score estimation formula: 100 + 0.5 × (80 - 100) = 90
Estimated true score
Estimated true score = mean + rxx × (raw score - mean)
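As a quick check of the formula, here is a minimal Python sketch. The function name `estimated_true_score` is made up for illustration; the three calls reuse the IQ examples from the earlier cards.

```python
def estimated_true_score(raw_score, mean, rxx):
    """Pull a raw score toward the mean in proportion to (1 - reliability)."""
    if not 0.0 <= rxx <= 1.0:
        raise ValueError("rxx must be between 0 and 1")
    return mean + rxx * (raw_score - mean)


# IQ test with M = 100
print(estimated_true_score(50, 100, 0.0))   # 100.0 (rxx = 0: fall back on the mean)
print(estimated_true_score(122, 100, 1.0))  # 122.0 (rxx = 1: keep the raw score)
print(estimated_true_score(80, 100, 0.5))   # 90.0  (rxx = 0.5: halfway between)
```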
Calculating 95% CIs
Estimated true score ± (1.96 × SEm)
SEm (standard error of measurement) = SD × √(1 - rxx); it is the SD of the distribution of measurement error
We can be 95% confident that the true score lies in this range
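A rough Python sketch of the interval as this card describes it, applied to the rxx = 0.5, raw score = 80 example from the earlier card. The function names `sem` and `true_score_ci` are hypothetical.

```python
import math


def sem(sd, rxx):
    """Standard error of measurement: SD x sqrt(1 - rxx)."""
    return sd * math.sqrt(1.0 - rxx)


def true_score_ci(raw_score, mean, sd, rxx, z=1.96):
    """Interval centred on the estimated true score, as on the card."""
    estimate = mean + rxx * (raw_score - mean)
    margin = z * sem(sd, rxx)
    return estimate - margin, estimate + margin


low, high = true_score_ci(raw_score=80, mean=100, sd=15, rxx=0.5)
print(round(low, 1), round(high, 1))  # 69.2 110.8
```

With reliability this low the interval spans more than 40 IQ points, which shows how little a single score pins down the true score.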
Standard error of difference
Used to judge whether the difference between two test scores is statistically significant rather than measurement error
SEdiff = √(SEm1² + SEm2²)
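A small Python sketch of the calculation. Two things here are assumptions rather than part of the card: the two reliabilities (0.9 and 0.8) are invented for the example, and the 1.96 × SEdiff cutoff is the usual 95% convention. Both tests are assumed to be on the same scale (SD = 15).

```python
import math


def sem(sd, rxx):
    """Standard error of measurement: SD x sqrt(1 - rxx)."""
    return sd * math.sqrt(1.0 - rxx)


def se_difference(sem1, sem2):
    """Standard error of the difference between two scores."""
    return math.sqrt(sem1 ** 2 + sem2 ** 2)


# Hypothetical tests on the same scale (SD = 15)
se_diff = se_difference(sem(15, 0.9), sem(15, 0.8))
threshold = 1.96 * se_diff  # assumed 95% cutoff

print(round(se_diff, 2))    # 8.22
print(round(threshold, 1))  # 16.1
```

Under these assumptions, two scores would need to differ by about 16 points before the difference is treated as more than measurement error.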
Types of validity
Face validity - at face value
Content validity - covers a representative sample of domain
Criterion-related validity - scores predict scores on another accepted measure (concurrent or predictive)
Construct validity - test scores reflect individual differences in the construct (convergent or discriminant, also called divergent)
Criterion-related validity
Compare results to criterion variables (gold standard measurements)
Concurrent - test and criterion in the present
Predictive - criterion collected after test
Validity coefficients are generally lower than reliability coefficients (check for statistical significance; values are often 0.2-0.5)
Standard error of estimate
Analogous to SEm
Expected error in prediction of criterion score given test score
SEest = SDy × √(1 - rxy²)
If rxy = 1, then SEest = 0
If rxy = 0, then SEest = SDy
SEest can be used to set up a CI around the predicted criterion score (useful in job selection, for example)
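A minimal Python sketch of SEest and the interval around a predicted criterion score. The function names, the criterion SD of 10 and the predicted score of 60 are hypothetical; the prediction itself is assumed to come from elsewhere, such as a regression equation.

```python
import math


def se_estimate(sd_y, rxy):
    """Standard error of estimate: SDy x sqrt(1 - rxy^2)."""
    return sd_y * math.sqrt(1.0 - rxy ** 2)


def prediction_ci(predicted_y, sd_y, rxy, z=1.96):
    """Interval around a predicted criterion score."""
    margin = z * se_estimate(sd_y, rxy)
    return predicted_y - margin, predicted_y + margin


print(se_estimate(10, 1.0))  # 0.0  (perfect validity: no prediction error)
print(se_estimate(10, 0.0))  # 10.0 (no validity: error equals SDy)

low, high = prediction_ci(predicted_y=60, sd_y=10, rxy=0.5)
print(round(low, 1), round(high, 1))  # 43.0 77.0
```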
Validity and decision theory
Even with a low rxy, a test is worth using if the benefit of better prediction outweighs the cost of testing (which is often the case)
Factors that influence rxy
Small sample size
Restriction of range
Non-linear relationship between test and criterion (use non-parametric methods or transform the data)
Criterion problems (who decides the gold standard? criterion contamination, criterion changing over time)
Restriction of range (selectivity problems)
Restricting range of values reduces r (concern for predictive validity)
E.g. current employees have already been selected at a particular level, and hiring only applicants above a cutoff also restricts the range
Solutions - test and employ everyone (not possible), test and employ random sample (unrealistic), statistical correction (best)
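The card does not say which statistical correction is meant. One common choice, assumed here, is Thorndike's Case 2 formula for direct restriction on the test score: r_corrected = r × k / √(1 - r² + r² × k²), where k is the unrestricted SD divided by the restricted SD. The function name and the numbers in this Python sketch are hypothetical.

```python
import math


def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction (assumed; the card names no formula)."""
    k = sd_unrestricted / sd_restricted
    r2 = r_restricted ** 2
    return (r_restricted * k) / math.sqrt(1.0 - r2 + r2 * k ** 2)


# Hypothetical: r = 0.30 among hired staff, whose test SD (5) is half
# the SD in the full applicant pool (10)
print(round(correct_range_restriction(0.30, 10, 5), 2))  # 0.53
```

When the two SDs are equal, k = 1 and the function returns the observed r unchanged.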
Content validity
Systematic examination to determine whether test content covers representative sample of behaviour domain being tested
Not empirically established
Built into test from outset
Construct validity
Degree to which test scores reflect individual differences in the construct being measured
Requires gradual accumulation from various sources
Testing - association between test and other constructs/behaviours
Types - convergent (correlates with measures it should be related to), discriminant (does not correlate with measures it should be unrelated to)
Relationship between reliability and validity
Reliability (precision), validity (accuracy)
If a test isn’t reliable, it can’t be valid