Ch. 6 - Validity Flashcards
(43 cards)
validity
judgment of how well a test measures what it purports to measure in a particular context; judgment based on evidence about the appropriateness of inferences drawn from test scores
inference
logical result or deduction
a valid test has been shown to be valid for
a particular use with a particular population of testtakers at a particular time
no test is ____ valid
universally valid for all times, all uses, and with all populations
a test is valid within ____
“reasonable boundaries” of a contemplated usage
validation
the process of gathering and evaluating evidence about validity
validation studies can be done with ____
a group of testtakers, to provide insights regarding a particular group of testtakers as compared to a norming sample (local validation)
what are the three categories of validity?
content, criterion-related, and construct (umbrella)
content validity
scrutinizing the test’s content
criterion-related validity
relating scores obtained on the test to other test scores or other measures
construct validity
umbrella validity; all others fall under it. a comprehensive analysis.
analysis of how test scores relate to other measures and how scores can be understood within some theoretical framework (e.g. your hypothesis about how high and low scorers differ)
face validity
not one of the three C’s
what a test appears to measure or how relevant the test items look to the testtaker
why does face validity matter?
testtakers may not put forth good effort; parents may complain about their kids taking a non-face-valid test; lawsuits may be filed
content validity
judgment of how adequately a test samples behavior representative of the whole universe of behavior that the test was designed to sample.
e.g. an assertiveness test assesses behavior on the job, in social situations, etc.
e.g. an exam samples all chapters
how can we judge content validity?
get a panel of judges or experts - if more than half indicate that an item is essential, that item has some content validity. the more panelists who agree, the greater the content validity
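The "more than half" rule can be put into numbers. The card does not name a statistic, so treat this as an assumption: the sketch below uses Lawshe's content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating the item essential and N is the panel size. The panel of 10 judges is hypothetical.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2).

    Negative when fewer than half the panel says "essential",
    0 at exactly half, and 1 when every panelist agrees.
    """
    if n_panelists <= 0:
        raise ValueError("panel needs at least one rater")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 judges rating one item.
for n_essential in (3, 5, 8, 10):
    cvr = content_validity_ratio(n_essential, 10)
    print(f"{n_essential}/10 say essential -> CVR = {cvr:+.2f}")
```

A positive CVR matches the card's "more than half" condition, and the value climbs toward 1 as agreement grows.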
what’s a problem with establishing content validity?
we frequently don’t know all of the items in the theoretical domain of possible items
criterion-related validity
a judgment of how adequately a test score can be used to infer an individual’s standing on a criterion being measured (3 types: concurrent, predictive, incremental)
concurrent validity
a judgment of how adequately a test score can be used to infer an individual’s present standing on a criterion (ex: diagnosing someone from a test when you already know they have the condition, perhaps from a different validated test. the new test might be an easier way to reach the diagnosis)
predictive validity
measures of the relationship between the test scores and a criterion measure obtained at a future time (ex: using GRE scores to predict graduate course passing)
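One common way to express that relationship is a correlation between test scores and the later criterion (often called a validity coefficient). The card does not specify a statistic, so using Pearson's r here is an assumption, and the scores and GPAs below are made-up illustration data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: admission test scores now, graduate GPA a year later.
test_scores = [150, 155, 160, 165, 170]
later_gpa = [3.0, 3.2, 3.1, 3.6, 3.8]

validity_coefficient = pearson_r(test_scores, later_gpa)
print(f"predictive validity coefficient: {validity_coefficient:.2f}")
```

The closer the coefficient is to 1 (or -1), the better the test score predicts standing on the future criterion.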
criterion
standard against which a test score is measured; can be almost anything (behavior, diagnosis)
a good criterion is
relevant (pertinent to the matter at hand); valid (if test X serves as the criterion for validating test Y, then we need evidence that X is valid); uncontaminated (not based, even in part, on the predictor measure; if X is used to predict Y, and Y is in part based on X, then the criterion Y is contaminated)
what are three types of criterion-related validity?
concurrent validity
predictive validity
incremental validity
base rate
extent to which a particular trait, behavior, etc. exists in the population (proportion)
hit rate
proportion of people that a test accurately identifies as having a specific trait
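Both rates are simple proportions. The sketch below reads the hit rate card literally, as the share of everyone tested who has the trait and is flagged by the test; that denominator is an assumption, since other sources define hit rate differently. The ten records are hypothetical.

```python
def base_rate(has_trait):
    """Proportion of the group that actually has the trait."""
    return sum(has_trait) / len(has_trait)


def hit_rate(has_trait, test_positive):
    """Proportion of the whole group that the test correctly flags
    as having the trait (literal reading of the card; an assumption)."""
    if len(has_trait) != len(test_positive):
        raise ValueError("need one test result per person")
    hits = sum(1 for trait, flagged in zip(has_trait, test_positive)
               if trait and flagged)
    return hits / len(has_trait)


# Hypothetical group of 10 people: 4 have the trait.
has_trait = [True, True, True, True, False, False, False, False, False, False]
# The test flags 3 of those 4, plus 1 person who lacks the trait.
test_positive = [True, True, True, False, True, False, False, False, False, False]

print(f"base rate: {base_rate(has_trait):.2f}")
print(f"hit rate:  {hit_rate(has_trait, test_positive):.2f}")
```

Under this reading the hit rate can never exceed the base rate, because only people who truly have the trait can count as hits.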