Applied Scale Construction Flashcards
(13 cards)
What is construct validity
a measurement tool should measure what it is supposed to measure
Why is construct validity important
it is unscientific to presume validity; the case must be proven
face validity can deceive
proof is provided through a standard set of rigorous steps
How to establish construct validity
step 1) check whether the preconditions for validity are met (basic requirements that must hold before validity can even be considered)
discrimination = differences between groups/individuals are clear
reliability = scores are consistent (across items, over time, and between raters)
structure = the items form the intended dimensional structure (checked with factor analysis)
step 2) validity is shown by the patterns of links to external constructs
defining construct validity statistically
observed score = what someone scores on a questionnaire
true score = what they would score if questionnaire were completely valid
the hypothetical correlation between them is a measure of validity
to the extent that these don't link perfectly, measurement is imperfect
perfect link = 1
no link = 0
a number between 0 and 1 expresses the correlation between observed and true score
two main sources of invalidity
systematic error = a potentially knowable bias that shifts scores consistently in one direction - these tend to be smaller
random error = random influences that affect scores unpredictably
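The two error sources and the validity coefficient can be illustrated with a minimal simulation (hypothetical numbers throughout: the trait distribution, bias size, and noise level are all made up for illustration). Note that a constant systematic bias shifts every observed score but does not change the observed-true correlation; random error is what drags the correlation below 1.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 1000
true = rng.normal(50, 10, n)       # hypothetical true scores
bias = 3.0                         # systematic error: consistent shift in one direction
noise = rng.normal(0, 5, n)        # random error: unpredictable influences
observed = true + bias + noise

# validity coefficient: correlation between observed and true score (0 to 1)
r = np.corrcoef(observed, true)[0, 1]
print(round(r, 2))
```

With these settings the correlation comes out high but clearly below 1, reflecting the random error; doubling the noise SD would lower it further, while doubling the bias would leave it unchanged.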
standard steps to achieve construct validity
1) item design
2) item analysis
3) reliability analysis
4) factor analysis - for structure
5) scale validation - scale is all items together, validation process itself involving convergent and discriminant validation
systematic bias in response to questionnaires
bias towards “acquiescence” (agreeing with items more than disagreeing)
step 1
1) item design - an item is one question in a questionnaire
for qualitative info, it must be quantitatively coded later
bad examples from Oxford Capacity
scaling (for closed-ended items): even intervals on a scale, typically between 5 and 7 points; should "neutral" be included?
use equal numbers of forward- and reverse-scored items, both because acquiescence bias makes people score higher on forward-scored items and lower on reverse-scored ones, AND because the mix shows which respondents answer mindlessly
step 2
2) item analysis - items must discriminate between respondents (no questions that everyone answers yes or everyone answers no to)
SD is a direct measure of dispersion, so higher is better
item scores must clump in a bundle around a central average (mean in the middle)
skew is a direct measure of asymmetry in the score distribution (positive skew = most scores are low with a tail of high scores; negative skew = most scores are high with a tail of low scores)
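A minimal sketch of item analysis, using made-up Likert responses (6 hypothetical respondents, 3 hypothetical items). Item 2 is deliberately written so that everyone gives the same answer: its SD is zero, so it fails the discrimination requirement.

```python
import numpy as np

# hypothetical responses: 6 respondents x 3 items on a 1-5 Likert scale
items = np.array([
    [4, 5, 2],
    [2, 5, 3],
    [5, 5, 1],
    [1, 5, 4],
    [3, 5, 2],
    [4, 5, 3],
])

means = items.mean(axis=0)
sds = items.std(axis=0, ddof=1)   # dispersion: SD of 0 means no discrimination

for i in range(items.shape[1]):
    if sds[i] == 0:
        print(f"item {i + 1}: everyone answered {items[0, i]} - fails to discriminate")
        continue
    centred = items[:, i] - means[i]
    skew = (centred ** 3).mean() / items[:, i].std() ** 3   # asymmetry of the distribution
    print(f"item {i + 1}: mean={means[i]:.2f} sd={sds[i]:.2f} skew={skew:.2f}")
```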
step 3
reliability analysis
consistency between items (internal consistency), over time (test-retest reliability) and between scorers (inter-rater reliability)
the more scores on items are related to one another, the higher internal consistency
items whose correlation is artificially inflated = bloated specifics
we want items to be a little different
items should be related (related = on the same team, distinct = not doing same thing)
frame in terms of correlations (should intercorrelate well but not perfectly and must maintain content validity ie cover whole aspect)
items shouldn’t duplicate another item’s job (bloated specifics)
internal consistency assessed with alpha (0 to 1)
an item's average correlation with the other items should be more than 0.20
an item's correlation with the total scale score should be more than 0.30
scale-level index = Cronbach's alpha (coefficient α), an average of all inter-item correlations - α > .60 is minimal, α > .80 is okay, and α > .90 is good
mcdonald’s omega is sometimes preferred over cronbach’s alpha
split-half reliability = split the scale into two halves, score each half, and compute the correlation between the two halves
longer scales tend to have higher internal consistency in terms of alpha
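Alpha is usually computed from variances rather than averaged correlations directly, using the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total score). A minimal sketch on simulated data (all numbers hypothetical: each item is a shared trait plus item-specific noise):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical 5-item scale: each item = shared trait + item-specific noise
n, k = 200, 5
trait = rng.normal(0, 1, n)
items = trait[:, None] + rng.normal(0, 1, (n, k))

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_vars.sum() / total_var)
print(round(alpha, 2))
```

With these settings the average inter-item correlation is about .5, so alpha lands around the "okay" region; adding more items with the same inter-item correlation would push alpha higher, which is why longer scales tend to have higher alpha.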
importance of scale reliability
precondition for validity - a scale must be consistent/reliable to be valid, BUT a reliable scale can still be invalid
step 4
factor analysis
a factor is a single underlying dimension
higher reliability suggests a single factor but doesn’t guarantee it
checks how many dimensions underlie the items
EXPLORATORY FACTOR ANALYSIS = looks at how all the items correlate together, then identifies clumps of correlations among them based on shared variance - it reduces the variance across many variables to fewer distinct clusters of shared variance
three stages:
factor extraction - reduces items to underlying factors - uses principal axis factoring (PAF) or maximum likelihood (ML), which analyse shared variance and are suited to finding underlying factors - NOT principal components analysis (PCA), which analyses all variance and is suited to collapsing variables
use the scree plot to infer the number of factors: a retained factor's eigenvalue must exceed 1
the scree plot also shows a gap between the "main" factors and the "rubble" (the remaining factors)
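The eigenvalues behind a scree plot can be sketched directly from the inter-item correlation matrix. This is a simplified illustration of the eigenvalue > 1 rule and the "main factors vs rubble" gap, not a full PAF/ML extraction; the data are hypothetical, built so that 6 items are driven by 2 underlying factors.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical data: 6 items, the first 3 driven by factor 1, the last 3 by factor 2
n = 500
f1, f2 = rng.normal(size=(2, n))
noise = rng.normal(0, 0.6, (n, 6))
items = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

# eigenvalues of the inter-item correlation matrix (what a scree plot displays)
eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
n_factors = int((eigvals > 1).sum())   # retain factors with eigenvalue > 1
print(np.round(eigvals, 2), n_factors)
```

The two "main" eigenvalues sit well above 1 and the remaining four form the low "rubble", so both the eigenvalue rule and the gap point to two factors.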
factor rotation
factor interpretation
factor rotation - makes the factor solution easier to interpret
orthogonal rotation (e.g. Varimax) assumes the factors are independent (which is unlikely) but makes the solution more interpretable
oblique rotation (e.g. Direct Oblimin) allows the factors to be correlated (which is likely) but the solution is less interpretable
factor interpretation - rotation aids interpretability, i.e. moves the data towards an arrangement called "simple structure", but what we want is a scheme for looking at the items and seeing which factors the items are linked to, and vice versa
line up the items and see how each correlates with all the factors extracted in the analysis
these correlations are "loadings": an item "loads" on a factor, varying from -1 to +1; a good loading is greater than 0.35 or less than -0.35
simple structure = items associated with factors are LOYAL to only one factor
OTHER THAN EXPLORATORY FA, THERE IS ALSO:
confirmatory factor analysis = specifies hypothesised links between variables and tests how well the predicted structure fits the obtained data
AND item response theory (IRT) = models responding as a function of the person (trait level) and the item (its many properties); it takes account of differences in item difficulty and relates them, via a model, to individual differences in ability
step 5
scale validation
associative = the scale correlates with what it should (i.e. convergent validity: different measures of the same construct converge/intercorrelate)
dissociative = the scale doesn't correlate with what it shouldn't (i.e. divergent validity: measures of the same type, but of different constructs, diverge/don't intercorrelate)
ultimate goal = establish the construct validity of the scale; if different methods of measuring the construct agree, that is strong evidence of validity:
MultiTrait-MultiMethod (MTMM) matrix examines convergent and discriminant/divergent validity
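The convergent/discriminant logic can be sketched with simulated data (everything here is hypothetical: the constructs, the noise levels, and the sample size are invented for illustration). Two different methods of measuring the same trait should correlate highly; a measure of an unrelated construct should not.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical validation: two methods measuring the same construct
# (self-report and peer-report of a trait) plus a measure of a different construct
n = 300
trait = rng.normal(size=n)
self_report = trait + rng.normal(0, 0.5, n)
peer_report = trait + rng.normal(0, 0.7, n)
other_construct = rng.normal(size=n)            # unrelated construct

convergent = np.corrcoef(self_report, peer_report)[0, 1]   # should be high
discriminant = np.corrcoef(self_report, other_construct)[0, 1]  # should be near zero
print(round(convergent, 2), round(discriminant, 2))
```

A full MTMM matrix extends this to every trait-method combination, but the pattern checked is the same: same-trait/different-method correlations high, different-trait correlations low.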
three types of validity, all forms of criterion validity:
concurrent = predicts something at roughly the same time
predictive = predicts something in future ie IQ predicts upcoming exam performance
retrodictive = predicts something in the past ie past exam performance
three more forms of validity (not relevant here):
external = generalisability
internal = IV causes DV
statistical = assumptions and procedures, making the right inferences from data and design
blirtatiousness = the tendency to "blurt" stuff out / respond quickly and effusively to others without much thought - amplifies impressions of personality traits like extraversion; measured with the "BLIRT" scale
BLIRT scale:
can be associative (positive/negative)
can be dissociative (null)
blirtatiousness predicts a number of other outcomes (background variables or in-the-lab variables)