Test Psychometrics Overview Flashcards by Ellen Day

Validity

the extent to which a test measures what it set out to measure

Types of Validity

Face
Content
Criterion
Construct

Face validity

Does the test appear to be measuring something meaningful? This is a judgment about surface appearance, not a statistical index.

Three MAIN types of validity

Content
Criterion
Construct

Content validity

What do experts believe is being measured? Does the content fit the construct? This is the least quantitative form of validity.
Relies on expert judgment, which can be checked with inter-rater agreement (e.g., Cohen's kappa).
Especially important for intelligence and achievement tests.
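
The kappa check mentioned above can be sketched as follows. This is a minimal illustration, assuming two experts each rate the same items as "relevant" or "irrelevant" to the construct. The function name and the ratings are hypothetical, not taken from the card.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is perfect
    return (observed - expected) / (1 - expected)

# Hypothetical expert ratings of six test items
expert_1 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "relevant"]
expert_2 = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant", "relevant"]
kappa = cohens_kappa(expert_1, expert_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for agreement expected by chance, so it comes out lower than the simple proportion of matching ratings.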

Criterion validity

Does the measure appropriately predict the outcomes that it should? 3 kinds:

(a) Concurrent
(b) Predictive
(c) Known groups

Types of criterion validity

concurrent
predictive
known groups

Construct validity

The extent to which the construct being suggested is actually being measured; in a broad sense it includes all other forms of validity. 3 types:

(a) Convergent (and divergent)
(b) Discriminant
(c) Internal structure

Concurrent validity

Does the test correlate with other measures given at the same time (i.e., does it predict concurrent performance)?

Predictive validity

Does the test predict future performance? (e.g., GRE scores used to predict later performance in graduate school)

known groups validity

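Uses groups with expected, different outcomes (e.g., giving intelligence tests to individuals with intellectual disability, formerly termed mental retardation (MR), and to gifted individuals). A valid test should separate the groups.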
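
One way to quantify how well a test separates known groups is a standardized mean difference (Cohen's d). The card does not name a statistic, so this is an assumed choice for illustration only, and the scores below are made up.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_1, group_2):
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_var)

# Hypothetical test scores for two groups expected to differ
higher_expected_group = [128, 132, 135, 130, 140]
lower_expected_group = [68, 65, 70, 62, 66]

d = cohens_d(higher_expected_group, lower_expected_group)
print(f"Cohen's d = {d:.2f}")  # a large positive d means the groups are clearly separated
```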

convergent validity

Is the target test related to other tests that it should theoretically be related to? Expect a positive correlation.

Example: a measure of depression should be positively correlated with other depression measures.

divergent validity

Does the test relate NEGATIVELY to tests of constructs it should theoretically oppose? Expect a negative correlation (e.g., a happiness measure should be negatively correlated with a depression measure).

discriminant validity

The test's relation to a theoretically unrelated construct. The two should be uncorrelated.
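
The convergent, divergent, and discriminant patterns all come down to the sign and size of a correlation. The sketch below uses Pearson's r on made-up scores for six people. The variable names and data are hypothetical, and shoe size stands in for a theoretically unrelated construct.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same six people
new_depression = [10, 14, 18, 22, 26, 30]    # the test being validated
other_depression = [12, 15, 17, 24, 25, 31]  # established depression measure
happiness = [40, 37, 33, 28, 27, 20]         # theoretically opposite construct
shoe_size = [8, 10, 9, 9, 10, 8]             # theoretically unrelated construct

r_convergent = pearson_r(new_depression, other_depression)  # expected: strongly positive
r_divergent = pearson_r(new_depression, happiness)          # expected: strongly negative
r_discriminant = pearson_r(new_depression, shoe_size)       # expected: near zero
print(r_convergent, r_divergent, r_discriminant)
```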

internal structure validity

(aka factor validity) Looks at the factors within the construct. Most tests have poor internal structure because they are not theory driven. Only three intelligence tests have good internal structure (according to Dr. MacDonald):
Stanford-Binet (CHC)
Woodcock-Johnson (CHC)
KABC-II (5/4 factor of CHC)

incremental validity

Whether the measure increases the predictive ability of an existing method of assessment. In other words, incremental validity asks whether the new test adds information beyond what could be obtained with simpler, already existing methods. Example: some have argued that the Rorschach has poor incremental validity because other, more easily administered personality tests gather the same data in a less tedious way.
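
Incremental validity is often expressed as the gain in variance explained (R squared) when the new test is added to an existing predictor. The sketch below uses the standard two-predictor formula for R squared based on pairwise correlations. This framing, the function names, and the data are assumptions for illustration and are not part of the card.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def incremental_r2(criterion, existing, new):
    """Return (R2 with existing test only, R2 with both tests, gain in R2)."""
    r_y1 = pearson_r(criterion, existing)
    r_y2 = pearson_r(criterion, new)
    r_12 = pearson_r(existing, new)
    r2_existing = r_y1 ** 2
    # Two-predictor multiple R squared computed from the pairwise correlations
    r2_both = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return r2_existing, r2_both, r2_both - r2_existing

# Hypothetical scores: an existing test, a new test, and the outcome to predict
existing_test = [1, 2, 3, 4, 5, 6]
new_test = [2, 1, 4, 3, 6, 5]
criterion = [2, 5, 4, 8, 7, 12]

r2_existing, r2_both, delta_r2 = incremental_r2(criterion, existing_test, new_test)
print(f"R2 existing = {r2_existing:.3f}, R2 both = {r2_both:.3f}, gain = {delta_r2:.3f}")
```

A gain close to zero would suggest the new test adds little beyond the existing one.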

ecological validity

Whether the measure appropriately simulates real-world phenomena. This should not be confused with external validity, which refers to the generalizability of findings to the real world. In other words, an ecologically valid measure should capture the feel of the corresponding real-world scenario. Example: mock juries may produce externally valid findings. However, most mock juries do not include actual court proceedings and instead rely on court transcripts of a trial. Thus, mock juries could be said to have poor ecological validity.

reliability

the consistency of a measure

components of reliability

CASTI

Cronbach’s alpha (a statistical measure of internal consistency)
Alternate forms (e.g., Blue and Green forms of the WRAT-5)
Split-half (splitting the test in half and correlating the two halves with each other)
Test-retest (temporal stability of scores)
Inter-rater
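
Two of these components have simple closed-form computations. The sketch below shows Cronbach's alpha from item-level responses and the Spearman-Brown correction, which is commonly applied to a split-half correlation to estimate full-length reliability. The correction is not mentioned on the card and is included as an assumption, and the response data are made up.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one list of k item scores per respondent."""
    k = len(responses[0])
    items = list(zip(*responses))  # transpose to one tuple of scores per item
    sum_item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

def spearman_brown(half_test_r):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * half_test_r / (1 + half_test_r)

# Hypothetical responses: five people, four items each
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
print(f"split-half r of .80 corrected = {spearman_brown(0.80):.3f}")
```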

standard error of measurement

The amount of error expected in an individual's observed score. We want this to be low, which goes with high reliability (r), so that the observed score is close to the person's true score and accurately reflects the characteristic(s) in question.
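
The link between this error term and reliability can be made concrete with the classical test theory formula SEM = SD * sqrt(1 - r). The formula is standard but is not stated on the card, and the SD, reliability, and observed score below are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)

# Hypothetical test with SD = 15 and reliability = .91
sem = standard_error_of_measurement(15, 0.91)

# Approximate 95% band around a hypothetical observed score of 100
observed_score = 100
lower = observed_score - 1.96 * sem
upper = observed_score + 1.96 * sem
print(f"SEM = {sem:.2f}, 95% band = {lower:.1f} to {upper:.1f}")
```

As reliability approaches 1, the SEM approaches 0 and the band around the observed score narrows.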

norms

To interpret test data accurately and ascertain a person’s position relative to a standardized sample, we need a normative reference group; otherwise a raw score has no meaning.
The raw score is converted into a derived score (a relative measure), which gives the person’s relative standing in the sample.
Cultural/ethnic normative groups are needed, which can be obtained through stratified random sampling.
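
The raw-to-derived conversion can be sketched as a z-score, a standard score, and a percentile. This assumes the normative sample is approximately normal and uses a standard-score scale with mean 100 and SD 15. The norm-group mean and SD and the raw score are hypothetical.

```python
from statistics import NormalDist

# Hypothetical normative sample statistics for the raw-score scale
norm_mean = 50.0
norm_sd = 10.0

raw_score = 60.0

# z-score: distance from the norm-group mean in SD units
z = (raw_score - norm_mean) / norm_sd

# Derived score on an assumed scale with mean 100 and SD 15
standard_score = 100 + 15 * z

# Percentile rank, assuming normally distributed scores in the norm group
percentile = NormalDist().cdf(z) * 100

print(f"z = {z:.2f}, standard score = {standard_score:.0f}, percentile = {percentile:.1f}")
```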

Objective test validity

Objective tests have high face validity: their intent is easy to discern, so participants can fake their responses. They require the person to be introspective and to answer truthfully, and they often produce false positives. In addition, a person's defensiveness may prevent accurate responding.

projective test validity

Projective tests are better predictors of long-term behavioral patterns. Self-report measures work best when test items and criterion behaviors are assessed at or near the same time and are matched for specificity; the longer the time interval, the less predictive the test will be. Objective measures are best at predicting short-term behavior patterns. It is best to use a combination of objective and projective measures.

Reliability Rules of thumb (cut offs)

.90 or above for decision-making tasks
.80 or above for clinical and psychoeducational tasks (moderate)
.70-.79: subtests are relatively reliable
.60-.69: subtests are marginally reliable
below .60: unreliable
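
The cutoffs above can be written as a small lookup. This is only a convenience sketch: the function name and label wording are hypothetical, and the thresholds are the rules of thumb from this card, not fixed standards.

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the rule-of-thumb label from this card."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    if r >= 0.90:
        return "adequate for decision making"
    if r >= 0.80:
        return "adequate for clinical and psychoeducational tasks"
    if r >= 0.70:
        return "relatively reliable (subtests)"
    if r >= 0.60:
        return "marginally reliable (subtests)"
    return "unreliable"

for coefficient in (0.93, 0.85, 0.72, 0.65, 0.40):
    print(coefficient, "->", interpret_reliability(coefficient))
```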

Validity rules of thumb

A validity coefficient of .50-.70 is considered acceptable.

Test Psychometrics Overview Flashcards

(25 cards)