Psychological Measurement Flashcards
(37 cards)
Reliability
Reliability refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).
Validity
Validity is the extent to which the scores from a measure represent the variable they are intended to represent.
What is measurement?
Measurement is the assignment of scores to individuals so that the scores represent some characteristic of the individuals.
Psychological measurement is often referred to as
psychometrics
What are psychological constructs?
We cannot accurately assess people’s level of intelligence by looking at them, and we certainly cannot put their self-esteem on a bathroom scale. These kinds of variables are called constructs (pronounced CON-structs) and include personality traits (e.g., extraversion), emotional states (e.g., fear), attitudes (e.g., toward taxes), and abilities (e.g., athleticism).
The conceptual definition of a psychological construct…
The conceptual definition of a psychological construct describes the behaviors and internal processes that make up that construct, along with how it relates to other variables.
An operational definition…
An operational definition is a definition of a variable in terms of precisely how it is to be measured.
Behavioural measures
Behavioral measures are those in which some other aspect of participants’ behavior is observed and recorded. This is an extremely broad category that includes the observation of people’s behavior both in highly structured laboratory tasks and in more natural settings.
Physiological measures
Physiological measures are those that involve recording any of a wide variety of physiological processes, including heart rate and blood pressure, galvanic skin response, hormone levels, and electrical activity and blood flow in the brain.
Converging operations
When psychologists use multiple operational definitions of the same construct—either within a study or across studies—they are using converging operations.
Levels of measurement
Levels of measurement (which the psychologist S. S. Stevens called “scales of measurement”) correspond to four types of information that can be communicated by a set of scores, and they determine the statistical procedures that can be used with that information.
Nominal level of measurement
The nominal level of measurement is used for categorical variables and involves assigning scores that are category labels. Category labels communicate whether any two individuals are the same or different in terms of the variable being measured. e.g. marital status
Ordinal level of measurement
The ordinal level of measurement involves assigning scores so that they represent the rank order of the individuals. Ranks communicate not only whether any two individuals are the same or different in terms of the variable being measured but also whether one individual is higher or lower on that variable. e.g. a researcher measuring consumer satisfaction might ask participants to rate their feelings as ‘very dissatisfied’, ‘somewhat dissatisfied’, or ‘satisfied’.
The interval level of measurement
The interval level of measurement involves assigning scores using numerical scales in which intervals have the same interpretation throughout. e.g. the Celsius temperature scale.
The ratio level of measurement
The ratio level of measurement involves assigning scores in such a way that there is a true zero point that represents the complete absence of the quantity. Height measured in meters and weight measured in kilograms are good examples.
Test-retest reliability
Test-retest reliability is the extent to which scores on a measure are consistent across time. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today.
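In practice, test-retest reliability is usually quantified as the correlation between scores from the two testing occasions. A minimal sketch using NumPy and made-up scores (the data here are hypothetical, not from the card):

```python
import numpy as np

# Hypothetical intelligence scores for the same five people, one week apart
time1 = np.array([102.0, 95.0, 118.0, 88.0, 110.0])
time2 = np.array([100.0, 97.0, 120.0, 85.0, 108.0])

# Test-retest reliability: Pearson correlation between the two occasions
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 3))
```

A value close to +1 indicates that people's rank ordering is stable from one occasion to the next, which is what a good measure of a stable construct should show.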
Internal consistency
Another kind of reliability is internal consistency, which is the consistency of people’s responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people’s scores on those items should be correlated with each other.
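A standard statistic for internal consistency (beyond what the card names) is Cronbach's alpha, computed from the item variances and the variance of the total scores. A sketch with hypothetical questionnaire data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha. items: rows = respondents, columns = questionnaire items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point ratings from six respondents on four items
ratings = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
print(round(cronbach_alpha(ratings), 2))
```

When the items are highly intercorrelated, the total-score variance is large relative to the sum of the item variances, and alpha approaches 1.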
Split-half correlation
Like test-retest reliability, internal consistency can only be assessed by collecting and analyzing data. One approach is to look at a split-half correlation. This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items. Then a score is computed for each set of items, and the relationship between the two sets of scores is examined.
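The split-half procedure described above can be sketched in a few lines of NumPy; the responses below are hypothetical, and the split shown is the even-/odd-numbered one mentioned in the card:

```python
import numpy as np

# Hypothetical responses: six respondents x eight items (1-5 scale)
responses = np.array([
    [4, 5, 4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3, 2, 2],
    [3, 4, 3, 3, 4, 3, 3, 4],
    [5, 5, 5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3, 3, 4, 3],
])

# Split the items into odd- and even-numbered sets, score each half,
# then correlate the two half-scores across respondents
odd_scores = responses[:, 0::2].sum(axis=1)
even_scores = responses[:, 1::2].sum(axis=1)
r = np.corrcoef(odd_scores, even_scores)[0, 1]
print(round(r, 3))
```

A high correlation between the two half-scores suggests the items are measuring the same underlying construct.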
Inter-rater reliability
Inter-rater reliability is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students’ social skills, you could make video recordings of them as they interacted with another student whom they were meeting for the first time. Then you could have two or more observers watch the videos and rate each student’s level of social skills.
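For categorical judgments, one common inter-rater index (not named in the card) is Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. A sketch with hypothetical ratings of ten students:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] / n * counts_b[c] / n for c in counts_a)
    return (observed - expected) / (1 - expected)

# Hypothetical categorizations of ten students' social skills by two observers
a = ["high", "med", "low", "high", "med", "med", "low", "high", "med", "low"]
b = ["high", "med", "low", "med", "med", "med", "low", "high", "high", "low"]
print(round(cohens_kappa(a, b), 2))
```

Kappa is 1 for perfect agreement and near 0 when the raters agree no more often than chance would predict.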
Face validity
Face validity is the extent to which a measurement method appears “on its face” to measure the construct of interest.
Content validity
Content validity is the extent to which a measure “covers” the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then his measure of test anxiety should include items about both nervous feelings and negative thoughts.
Criterion validity
Criterion validity is the extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with. For example, people’s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam.
Concurrent validity
When the criterion is measured at the same time as the construct, criterion validity is referred to as concurrent validity.
Predictive validity
When the criterion is measured at some point in the future (after the construct has been measured), it is referred to as predictive validity (because scores on the measure have “predicted” a future outcome).