Final Flashcards
Psychometrics
How can we measure constructs like depression, anxiety, loneliness, etc.?
Psychology is the science of thoughts, feelings, and behaviors
Problem– thoughts and feelings are not directly observable
Solution to this is psychometrics
A test serves as a proxy (an indirect indicator) for what we cannot directly observe
psychometrics measurement
scoring individuals on characteristics that can’t be easily observed
Measure development– Writing items, scoring procedures
Measure evaluation– Determining whether measure is reliable and valid
Measurement is not a linear process; it is typically ongoing
Items and scales will always be revised over and over again to get the most accurate results possible
psychometrics examples
cognitive ability via tasks/tests requiring cognition
Knowledge of t-tests and ANOVA via exam 2 scores
Conscientiousness via your answer to 10 questions
Stress via your salivary cortisol levels
Looking at biological measures
Goals of psychometrics–
classify/group people into categories
nominal/ordinal variables
Ex– questions about educational attainment, attachment style
Essentially just grouping people
Goals of psychometrics–
interval/ratio variables
Ex– questions about extent of conscientiousness, severity of depression symptoms
Measure reliability
consistency/precision of scores across
Time– want to see similarities between time one and time two
Items– responding to items in a consistent manner
Raters– don’t want two raters to observe two things and come to two completely different conclusions
Measure validity
accuracy of scores
Are the scores measuring what they are supposed to measure?
“Construct validity”– are you measuring what you are intending to measure?
Reliability versus validity
they do relate, but the relationship is imperfect
Dots on target examples
Dots close to target– accurate
Dots close together– consistent/precise
Can an unreliable measure be valid– no, has to have consistency for it to be valid
Can an invalid measure be reliable– yes, something can be reliable but consistently wrong
Test-retest reliability
are scores similar when measured at different time points
The formal term for consistency across time
Always relevant for trait-like constructs
Personality
Intelligence
Test-retest reliability– Less relevant for state-like constructs– things that vary from day to day
Stress
Positive affect (emotion)
Negative affect
test-retest reliability– method
Relate time one scores to time two scores
Want the scores to be highly consistent with one another
Usually use correlation
Looking for an effect size of .70 or higher, generally
Test-retest w/ paired-samples t-test
a bit controversial
Want to see a non-significant result (no change)
Controversial to look for a null result because a non-significant difference may just reflect a Type II error (a real change going undetected)
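The test-retest method above (relate time-one scores to time-two scores, look for r ≥ .70) can be sketched in plain Python; the scores below are made-up illustration data, and in practice you would typically use a stats package such as SciPy's `pearsonr`:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical scores for six participants at two time points
time1 = [10, 12, 9, 15, 11, 14]
time2 = [11, 12, 10, 14, 10, 15]

r = pearson_r(time1, time2)
print(round(r, 2))  # an r at or above .70 suggests good test-retest reliability
```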
Consistency across items– internal consistency reliability
Most psychological scales contain multiple items that, together, create a score
Internal consistency reliability– do items in a scale positively relate to one another
Items don’t need to be answered in exactly the same way, but responses should be close and show a pattern
Measurement error– differences in responses across the items
Attempt to correct this is aggregating the scores (adding them up) to cancel the small amounts of error
internal consistency reliability– method one (split-half correlation)
Split-half correlation– e.g., people answer 10 different questions; create two random halves and compute a sum score of the 5 items in each set
Then look at the relationship between set 1 and set 2 with the expectation they will be related
If the correlation isn’t at least .70, you may have to start over
Problem with the split-half correlation– the random halves you pick might just happen to be different
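The split-half procedure can be sketched like this; the item responses are invented for illustration, and the even/odd split is just one of many possible random halves:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Each row = one respondent's answers to a 10-item scale (1-5 Likert)
responses = [
    [4, 5, 4, 4, 5, 4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2, 3, 2, 2, 3, 2],
    [5, 5, 5, 4, 5, 5, 4, 5, 5, 5],
    [1, 2, 1, 1, 2, 1, 2, 1, 1, 1],
    [3, 3, 4, 3, 3, 3, 4, 3, 3, 4],
]

# One possible split: odd-numbered items vs. even-numbered items
half_a = [sum(row[0::2]) for row in responses]  # sum of items 1, 3, 5, 7, 9
half_b = [sum(row[1::2]) for row in responses]  # sum of items 2, 4, 6, 8, 10

split_half_r = pearson_r(half_a, half_b)  # want this to be .70 or higher
```

In practice the split-half correlation is often adjusted upward with the Spearman-Brown formula, since each half contains only half the items of the full scale.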
internal consistency reliability– method two (Cronbach’s alpha– variance framework)
Dividing the covariance (relationships) among all possible pairs of items by the total variance across all items
More stable than the split-half correlation
Essentially taking the average of all possible split halves
Increase covariation across items = higher alpha
Increase item variance = lower alpha
Typically ranges between 0 and 1
Want alpha of .7 or higher
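Alpha can be computed from the item variances and the variance of the total score; a minimal sketch with made-up response data (a stats package would normally do this for you):

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha for rows = respondents x item scores."""
    k = len(rows[0])                       # number of items
    items = list(zip(*rows))               # transpose: one tuple per item
    item_var_sum = sum(statistics.variance(item) for item in items)
    total_var = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Five respondents answering a 4-item scale (invented data)
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [1, 2, 1, 1],
    [3, 3, 3, 4],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # want .70 or higher
```

Note how the formula matches the cards above: more covariation among items inflates the total-score variance relative to the summed item variances, which pushes alpha up.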
Interrater reliability
how consistent are two separate investigators’ scores for the same group of participants
Two investigators observe the same behavior in the same person at the same time point and score it
Very important– not about two separate experiments
Simple calculation– percentage of agreement between scores
Most relevant for behavioral measures, but sometimes can be relevant for surveys too
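The simple percent-agreement calculation looks like this (the ratings are hypothetical):

```python
# Two raters independently code the same five behavior observations
rater_a = ["aggressive", "calm", "calm", "aggressive", "calm"]
rater_b = ["aggressive", "calm", "aggressive", "aggressive", "calm"]

matches = sum(a == b for a, b in zip(rater_a, rater_b))
agreement = matches / len(rater_a)
print(agreement)  # 0.8 -> the raters agreed on 4 of 5 observations
```

More rigorous indices such as Cohen’s kappa also correct for agreement expected by chance.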
Face validity
does this look like its measuring what it is supposed to measure
Arguably weakest type
Ex– strong face validity to measure pandemic worry
Rate agreement from 1-5– “I am worried about the COVID-19 pandemic”
Use low face validity to measure social desirability
Ex– “i like to gossip at times”
Gives you an idea of how much participant is willing to lie
face validity problems
Completely subjective
A good measure can still have low face validity
Make decisions based on the questions you ask
Content validity
how well does the operational definition match the conceptual definition of the construct?
making sure each part of the construct you want to study is measured in some way
Looking at the match between your measure and the actual content itself
Still somewhat subjective
Not a formal test
Sometimes good measure will have low content validity
content validity example
stress
Conceptually, includes both psychological and physiological responses
In theory, good measures should include both
Criterion validity
are scores on the measure related to measures of other constructs (criteria) that they theoretically should be related to?
Arguably the strongest
Looking for relationships through hypothesis tests
Concurrent criterion validity– criteria measured at the same time
Predictive criterion validity– criteria measured in the future
Ex– stress scale
Today’s negative affect (concurrent)
Tomorrow’s negative affect (predictive)
Convergent validity
new measure is correlated with other established measures of the same construct
When developing a new stress measure
Have to find its relationship with
The perceived stress scale– an established self-report stress scale
Different from criterion because convergent is looking at measures of the same construct
However, with both you are trying to show that they relate in some way
Discriminant validity
are scores on the measure not related to the measures of distinct constructs
Sort of controversial form of validity
Hard to find constructs that don’t relate to things like stress/depression, etc
Don’t want to find a relationship
Failing to reject the null
Don’t want a negative relationship either because its still a relationship
discriminant validity example
developing a new measure of stress
Find its relationship with
Social desirability scores
Demographic characteristics