Reliability & Validity Flashcards
(39 cards)
Reliability
- Are the results consistent?
- Provides an estimate of the proportion of unsystematic error; the degree of unsystematic error must be known to determine reliability
Validity
- Does it measure what it says it measures?
- Overall eval of evidence and degree of trustworthiness
- Determine if enough support exists to use the test in a certain way
Classical Test Theory
- Observed score = T + E
- T is the true score if the test is completely free from error
- E is the error
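A simulated sketch of the model (all numbers hypothetical): reliability falls out as the share of observed-score variance that is true-score variance, var(T)/var(X).

```python
# Minimal sketch of the classical test model X = T + E, assuming
# normally distributed true scores and random (unsystematic) error.
import random

random.seed(0)
true_scores = [random.gauss(100, 15) for _ in range(1000)]  # T
errors      = [random.gauss(0, 5)    for _ in range(1000)]  # E, mean 0
observed    = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = proportion of observed-score variance that is
# true-score variance: var(T) / var(X).
print(variance(true_scores) / variance(observed))  # ~ 15**2 / (15**2 + 5**2) = 0.9
```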
Unsystematic Error
- Random errors: mood, health, fatigue
- Administration differences
- Scoring differences
- Random guessing
Systematic Error
Constant errors that occur every time the test is taken, e.g., a typo in a test item
Reliability Related to Validity
- High validity can occur if high reliability exists
- High validity cannot occur if low reliability
- High reliability does not suggest high validity
Correlation Related to Reliability
- Correlation: Statistical technique used to examine consistency
- Reliability is often based on consistency between two sets of scores
Positive Correlation
As one increases, so does the other
Negative Correlation
As one increases, the other decreases
Correlation Coefficient (Pearson Product-Moment)
- Correlation coefficient: numerical indicator of the relationship between two sets of data
- PPM correlation coefficient - most common
- Ranges from -1 to +1; the closer |r| is to 1, the stronger the relationship
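A from-scratch sketch of the Pearson product-moment coefficient; the two score lists are hypothetical.

```python
# Pearson product-moment correlation, computed by hand.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx  = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy  = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

scores_a = [12, 15, 9, 20, 17, 11]
scores_b = [14, 16, 10, 19, 18, 12]
print(pearson_r(scores_a, scores_b))  # near +1: strong positive relationship
```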
Test-Retest
- Give same test twice to same group
- Correlation between first and second administration (typically 2-6 weeks apart)
- Possible influences: a shorter gap inflates the correlation; changes in administration; interventions between testings; practice effects
- Ex: skills-based test
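As a sketch, the test-retest coefficient is simply the correlation between the two administrations; the scores below are hypothetical, and statistics.correlation (Python 3.10+) stands in for the hand-rolled version above.

```python
# Hypothetical scores from the same group tested twice, 2-6 weeks apart.
from statistics import correlation  # Python 3.10+

time_1 = [34, 28, 41, 25, 37, 30]
time_2 = [32, 29, 40, 27, 36, 31]
print(correlation(time_1, time_2))  # high r suggests stable (reliable) scores
```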
Alternate Forms
- Very difficult to construct
- Correlation of scores from two equivalent forms of a test
- Measures stability (over time) and equivalence (construct similarity)
- Uses samples of different items from the same domain
Internal Consistency
- One administration
- One form of instrument
- Divides instrument and correlates the scores from the different portions
Split-Half Reliability
- Given once then split in half to determine reliability
- Need to divide instrument into equivalent halves, like even and odd
- Problem: dividing instrument in half makes number of items smaller —> smaller correlation
- A front-half/back-half split doesn’t work if the test increases in difficulty, and no split quickly fixes the smaller-correlation problem
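A minimal split-half sketch using an even/odd split of hypothetical 0/1 item scores; the Spearman-Brown prophecy formula (not named in the card, but the standard correction) adjusts the half-test correlation for the full test length.

```python
# Split-half reliability with an odd/even item split, then the
# Spearman-Brown correction for the halved test length.
from statistics import correlation  # Python 3.10+

items = [  # rows = examinees, columns = items (1 = correct); hypothetical
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 1, 0],
]

odd_half  = [sum(row[0::2]) for row in items]  # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6, 8

r_half = correlation(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown prophecy formula
print(r_half, r_full)                 # corrected estimate is larger
```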
Kuder-Richardson
- KR-20: heterogeneous items
- KR-21: homogeneous items - single construct (cannot be used if items are not from the same domain or differ in difficulty)
- Lower reliability coefficient than split-half
- Purpose: Estimate the average of all split-half reliabilities from all ways of splitting the instrument
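A minimal KR-20 sketch for dichotomous (0/1) items, following the standard formula KR-20 = (k/(k−1))(1 − Σpq/σ²); the item matrix is hypothetical.

```python
# KR-20 for dichotomously scored items.
def kr20(items):
    n = len(items)       # examinees
    k = len(items[0])    # items
    totals = [sum(row) for row in items]
    mean_t = sum(totals) / n
    var_t  = sum((t - mean_t) ** 2 for t in totals) / n  # total-score variance
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in items) / n             # proportion correct
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)

items = [  # hypothetical 0/1 responses
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
]
print(kr20(items))
```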
Coefficient Alpha
- Used for non-dichotomous scoring
- Ex: Likert scales
- Cronbach’s alpha
- Takes into account variance of each item
- Conservative estimate of reliability
- Most common
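A minimal Cronbach's alpha sketch for Likert-type data, using the standard formula α = (k/(k−1))(1 − Σσ²_item/σ²_total); the responses are hypothetical.

```python
# Cronbach's alpha for non-dichotomous (e.g., Likert) items.
def cronbach_alpha(items):
    k = len(items[0])  # number of items
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = sum(var([row[j] for row in items]) for j in range(k))
    total_var = var([sum(row) for row in items])
    return (k / (k - 1)) * (1 - item_vars / total_var)

likert = [  # rows = respondents, columns = 1-5 Likert items; hypothetical
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
]
print(cronbach_alpha(likert))
```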
Standard Error of Measurement (SEM)
- Provides estimate of range of scores if someone were to take instrument repeatedly
- Based on idea that if someone takes test multiple times, scores would fall into a normal distribution
SEM v. SD
- SD is spread of scores between students
- SEM is spread of scores for one student
- Computed from the same quantities: SEM = SD × √(1 − r), where r is the reliability coefficient
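A minimal SEM sketch using that standard formula; all numbers are hypothetical.

```python
# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
import math

sd          = 15    # spread of scores across students
reliability = 0.91  # e.g., a coefficient alpha for the same instrument
sem = sd * math.sqrt(1 - reliability)

observed = 104
# About 68% of repeated testings would fall within +/- 1 SEM.
print(f"SEM = {sem:.1f}; band: {observed - sem:.1f} to {observed + sem:.1f}")
```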
Content-Related Validity
- Test items measure the objectives they are supposed to measure
- Focus on how content was determined
- May be based on test creator’s own analysis of topic or expert analysis
- How well do test items reflect the domain of material being tested
Criterion-Related Validity
- Test scores related to specific criterion/variable
- Sources of criterion scores: academic achievement, level of education, performance in specialized training, job performance, psychiatric diagnosis, ratings by supervisors, correlations with previously available tests
Concurrent Validity (Criterion-Related)
- Scores on test and criterion measure are collected at same point
- Ex: achievement, certification
- Coefficients are typically higher than for predictive validity
- Require reliable and bias-free measures
Predictive Validity (Criterion-Related)
- Test is administered first and scores on criterion measure are collected at a later time
- Ex: SAT, college GPA
- Require reliable and bias-free measures
Construct Validity
- What do scores on this test mean or signify?
- Construct: Grouping of variables that make up observed behavior patterns
- Ex: Self-efficacy, personality
- Measured by correlation of 2 scores or factor analysis
- Often seen in psych tests
Convergent v. Discriminant (Construct Validity)
- Convergent: Positive correlation with other tests measuring the same/similar construct
- Discriminant: Low correlation with tests measuring different, unrelated constructs
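A toy illustration with hypothetical scales: convergent evidence shows up as a high correlation with a measure of the same construct, discriminant evidence as a near-zero correlation with an unrelated one.

```python
# Convergent vs. discriminant evidence via simple correlations.
from statistics import correlation  # Python 3.10+

self_efficacy   = [30, 22, 35, 18, 27, 31]  # hypothetical target test
similar_scale   = [28, 24, 34, 17, 26, 30]  # same construct
unrelated_scale = [15, 11, 14, 18, 12, 16]  # different construct

print(correlation(self_efficacy, similar_scale))    # high  -> convergent
print(correlation(self_efficacy, unrelated_scale))  # ~zero -> discriminant
```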