Week 4 - PSYCHOLOGICAL MEASUREMENT Flashcards
(36 cards)
Describe - The Rosenberg Self-Esteem Scale
(Rosenberg, 1989)[2] is one of the most common measures of self-esteem
Participants respond to each of the 10 items that follow with a rating on a 4-point scale: Strongly Agree, Agree, Disagree, Strongly Disagree
Describe ‘measurement’ in psychometrics and give an example…
Measurement is the assignment of scores to individuals so that the scores represent some characteristic of the individuals.
PSYCHOMETRICS EXAMPLE
Imagine a cognitive psychologist who wants to measure a person’s working memory capacity: their ability to hold in mind and think about several pieces of information all at the same time. (% & @ # ^)
To do this, she might use a backward digit span task, in which she reads a list of two digits to the person and asks them to repeat them in reverse order. She then repeats this several times, increasing the length of the list by one digit each time, until the person makes an error.
The length of the longest list for which the person responds correctly is the score and
REPRESENTS their working memory capacity.
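A minimal sketch of how the backward digit span task could be scored, assuming each trial is stored as a pair of lists (digits presented, digits repeated back). The function name and the sample session are made up for illustration.

```python
def backward_digit_span_score(trials):
    """Return the length of the longest list repeated correctly in reverse.

    `trials` is a list of (presented_digits, response_digits) pairs,
    ordered from the shortest list to the longest.
    """
    score = 0
    for presented, response in trials:
        if response != presented[::-1]:
            break  # the task stops at the first error
        score = len(presented)
    return score


# Hypothetical session: lists start at two digits and grow by one each time.
session = [
    ([3, 8], [8, 3]),
    ([5, 1, 9], [9, 1, 5]),
    ([2, 7, 4, 6], [6, 4, 7, 2]),
    ([9, 3, 5, 1, 8], [8, 1, 3, 5, 9]),  # error: 3 and 5 are swapped
]
span = backward_digit_span_score(session)
print(span)  # 4
```

The score is a single number produced by a systematic rule, which is what makes it a measurement.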
(EXAMPLE: Beck Depression Inventory, which is a 21-item self-report questionnaire in which the person rates the extent to which they have felt sad, lost energy, and experienced other symptoms of depression over the past 2 weeks.
SUM = represents the person’s current level of depression.)
Requires - SYSTEMATIC procedure for assigning scores to individuals or objects…
(1 = # // 2 = * // 3 = $)
SO… those scores represent the **characteristic of interest**.
Describe psychological constructs and give examples
Psychological Constructs
Variables that aren’t easy to quantify
These kinds of variables are called constructs (pronounced CON-structs)
EXAMPLE
#Personality traits (e.g., extraversion)
:) emotional states (e.g., fear)
» attitudes (e.g., toward taxes)
(*) abilities (e.g., athleticism).
Describe and give example - Psychological Constructs
Psychological constructs :
- cannot be observed directly
- often represent tendencies to think, feel, or act in certain ways
- often involve internal processes
EXAMPLE
FEAR - activates central and peripheral nervous system structures, AND certain kinds of thoughts, feelings, and behaviors…
NOT OBVIOUS TO AN OBSERVER
IMPORTANT NOTE -
Neither extraversion nor fear “reduces to” any particular thought, feeling, act, or physiological structure or process.
INSTEAD each is a kind of summary of a
COMPLEX SET of behaviors and internal processes.
Describe and give example - Conceptual definition
The conceptual definition of a psychological construct describes…
the behaviors and internal processes that MAKE UP that construct, along with HOW IT RELATES to other variables.
EXAMPLE:
A conceptual definition of NEUROTICISM (another one of the Big Five) = people’s tendency to experience negative emotions such as anxiety, anger, and sadness across a variety of situations.
This definition might ALSO INCLUDE that it has a strong genetic component, remains fairly stable over time, and is positively correlated with the tendency to experience pain and other physical symptoms.
(EG. The Big Five is a set of five broad dimensions that capture much of the variation in human personality. Each of the Big Five can even be defined in terms of six more specific constructs called “facets” (Costa & McCrae, 1992))
Why use a conceptual definition instead of using the dictionary?
Many scientific constructs do not have counterparts in everyday language (e.g., working memory capacity).
Researchers are in the business of developing definitions that are more detailed and precise, and that more accurately describe the way the world is, than the informal definitions in the dictionary.
As we will see, they do this by
1. PROPOSING conceptual definitions
2. Testing them empirically
3. Revising them as necessary
Sometimes they throw them out altogether.
This is why the RESEARCH LITERATURE often includes different conceptual definitions of the same construct.
In some cases, an older conceptual definition has been replaced by a newer one that fits and works better.
In others, researchers are still in the process of deciding which of various conceptual definitions is the best.
Describe operational definition (and 3 measure categories)
An operational definition is a
definition of a variable in terms of precisely how it is to be measured.
These measures generally fall into one of three broad categories.
Self-report measures are those in which PARTICIPANTS REPORT on their own thoughts, feelings, and actions, as with the Rosenberg Self-Esteem Scale (Rosenberg, 1965)[2].
Behavioral measures are those in which some OTHER aspect of participants’ behavior is
OBSERVED & RECORDED.
EXAMPLE
Lab - measuring working memory capacity using the backward digit span task.
Natural Setting - Physical aggression from researcher Albert Bandura and his colleagues (Bandura, Ross, & Ross, 1961)[3].
They let each of several children play for 20 minutes in a room that contained a clown-shaped punching bag called a Bobo doll. They filmed each child and counted the number of acts of physical aggression the child committed. These included hitting the doll with a mallet, punching it, and kicking it. Their operational definition, then, was the number of these specifically defined acts that the child committed during the 20-minute period.
Physiological measures are those that involve recording any of a wide variety of physiological processes, EG.
<3 Heart rate
~~~ BLOOD pressure
/// electrical activity ///
For ANY VARIABLE OR CONSTRUCT, there will be multiple operational definitions - Give an example
For ANY VARIABLE OR CONSTRUCT, there will be multiple operational definitions.
EXAMPLE - STRESS
Conceptual definition = stress is an adaptive response to a perceived danger or threat that involves physiological, cognitive, affective, and behavioral components.
Operational Definitions
**The Social Readjustment Rating Scale** (Holmes & Rahe, 1967)[4] is a self-report questionnaire on which people identify stressful events that they have experienced in the past year and which assigns points for each one depending on its severity.
For example, a man who has been divorced (73 points), changed jobs (36 points), and had a change in sleeping habits (16 points) in the past year would have a total score of 125.
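As a rough sketch, the scoring rule in this example is a lookup and a sum. The three point values come from the example above; the dictionary keys and the function name are made up, and the real scale lists many more events.

```python
# Point values from the worked example; the real scale has many more events.
EVENT_POINTS = {
    "divorce": 73,
    "changed_jobs": 36,
    "change_in_sleeping_habits": 16,
}


def srrs_score(events_experienced):
    """Sum the points for every event the person reports for the past year."""
    return sum(EVENT_POINTS[event] for event in events_experienced)


total = srrs_score(["divorce", "changed_jobs", "change_in_sleeping_habits"])
print(total)  # 125
```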
The Hassles and Uplifts Scale (DeLongis, Coyne, Dakof, Folkman, & Lazarus, 1982)[5] is similar but focuses on everyday stressors like misplacing things and being concerned about one’s weight.
The Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983) [6] is another self-report measure that focuses on people’s feelings of stress (e.g., “How often have you felt nervous and stressed?”).
Researchers have also operationally defined stress in terms of several physiological variables including blood pressure and levels of the stress hormone cortisol.
Describe converging operations
When psychologists use multiple operational definitions of the same construct—either within a study or across studies—they are using converging operations.
The idea is that the VARIOUS operational definitions are “converging” or coming together on the same construct.
- When scores based on several different operational definitions are closely related to each other
- and produce similar patterns of results,
- this constitutes good evidence that the construct is being measured effectively
- and that it is useful.
EXAMPLE - The various measures of stress are all correlated with each other and have all been shown to be correlated with other variables such as **immune system functioning** (also measured in a variety of ways) (Segerstrom & Miller, 2004)[7].
Name the four levels of measurement and describe their genesis
Levels of Measurement
The psychologist S. S. Stevens suggested that scores can be assigned to individuals in a way that communicates more or less quantitative information about the variable of interest (Stevens, 1946)[8]. (He proposes four levels)
The nominal level
The ordinal level
The interval level
The ratio level
Describe and give examples - The nominal level
The nominal level of measurement is used for categorical variables and involves
ASSIGNING SCORES that are **category labels**.
CATEGORY LABELS communicate whether any two individuals are the same or different in terms of the variable being measured.
EXAMPLE
Asking about **marital status or ethnicity**
NO implied order (one is not higher than the other)
Responses are merely categorized.
Nominal scales thus embody the LOWEST level of measurement
Describe and give examples - The ordinal level
ORDINAL = ORDER (Rank order)
Ordinal level of measurement involves ASSIGNING SCORES so that they represent the rank order of the individuals.
Ranks communicate whether one individual is higher or lower than another on that variable.
EXAMPLE -
APP REVIEW Questions - “very dissatisfied,” “somewhat dissatisfied,” “somewhat satisfied,” or “very satisfied.” The items in this scale are ordered, ranging from least to most satisfied.
Describe ordinal level limitations
EXAMPLE
The DIFFERENCE between two levels of an ordinal scale cannot be assumed to be the same as the difference between two other levels
In our satisfaction scale…
The difference between the responses “very dissatisfied” and “somewhat dissatisfied” is probably not equivalent to the difference between “somewhat dissatisfied” and “somewhat satisfied.”
Nothing in our measurement procedure allows us to determine whether the two differences reflect the same difference in psychological satisfaction.
Statisticians express this point by saying that the differences between adjacent scale values do not necessarily represent equal intervals on the underlying scale giving rise to the measurements.
(In our case, the underlying scale is the true feeling of satisfaction, which we are trying to measure.)
Describe interval level of measurement
The interval level of measurement involves assigning scores using **numerical scales** in which intervals have the same interpretation throughout.
EXAMPLE
Fahrenheit or Celsius temperature scales.
The difference between 30 degrees and 40 degrees represents the same temperature difference as the difference between 80 degrees and 90 degrees.
This is because each 10-degree interval has the same physical meaning (in terms of the kinetic energy of molecules).
Describe the limitations of interval scales
Interval scales are not perfect, however.
In particular, they do not have a true zero point even if one of the scaled values happens to carry the name “zero.”
EXAMPLE
Measuring IQ - Someone may get a ‘0’ score but it doesn’t indicate the complete absence of intellect.
The Fahrenheit scale illustrates the issue. Zero degrees Fahrenheit does not represent the complete absence of temperature (the absence of any molecular kinetic energy).
In reality, the label “zero” is applied to that temperature for quite accidental reasons connected to the history of temperature measurement.
Since an interval scale has no true zero point, it does not make sense to compute ratios of temperatures.
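A small sketch of why ratios are not meaningful on an interval scale: converting the same temperatures from Fahrenheit to Celsius keeps equal differences equal, but changes the ratio. The variable names are illustrative only.

```python
import math


def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9


# Equal intervals survive a change of units: both gaps are 10 degrees F,
# and both convert to the same Celsius gap.
gap_low = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
gap_high = fahrenheit_to_celsius(90) - fahrenheit_to_celsius(80)
same_gap = math.isclose(gap_low, gap_high)

# Ratios do not survive: 80 F looks like "twice" 40 F, but the same two
# temperatures expressed in Celsius give a ratio of about 6.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

print(same_gap, ratio_f, round(ratio_c, 2))
```

If "twice as hot" were a real property of the temperatures, it would not depend on which scale was used.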
Describe the Ratio Scale
Ratio level of measurement involves assigning scores in such a way that there is a true zero point that represents the COMPLETE ABSENCE of the quantity.
EXAMPLE - Height measured in meters and weight measured in kilograms are good examples.
RATIO SCALE
- provides a name or category for each object
- objects are ordered
- the same difference at two places on the scale has the same meaning.
- the same ratio at two places on the scale also carries the same meaning (see Table 4.1).
EXAMPLE
$$$ Amount of money you have in your pocket right now (25 cents, 50 cents, etc.).
Money is measured on a ratio scale because, in addition to having the properties of an interval scale, it has a true zero point: if you have zero money, this actually implies the absence of money.
Give 2 reasons why Stevens’s levels of measurement are important
Stevens’s levels of measurement are important for at least two reasons.
- they **emphasize the generality of the concept of measurement.** Although people do not normally think of categorizing or ranking individuals as measurement, in fact they are, as long as they are done so that they represent some characteristic of the individuals.
- the levels of measurement can serve as a rough guide to the statistical procedures that can be used with the data and the conclusions that can be drawn from them.
Nominal-level measurement - can use mode.
Ordinal-level measurement - median or mode
Interval and ratio-level measurement are typically considered the most desirable because they permit any indicator of central tendency to be computed (i.e., mean, median, or mode).
Also, ratio-level measurement is the only level that allows meaningful statements about ratios of scores.
Once again, one cannot say that someone with an IQ of 140 is twice as intelligent as someone with an IQ of 70 because IQ is measured at the interval level, but one can say that someone with six siblings has twice as many as someone with three because number of siblings is measured at the ratio level.
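The rough guide to central tendency above can be sketched as a lookup table, using only Python's standard `statistics` module. The table, the function name, and all sample data are assumptions made for illustration.

```python
from statistics import mean, median, mode

# Which indicators of central tendency each level supports (per the card above).
PERMISSIBLE = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}
FUNCS = {"mode": mode, "median": median, "mean": mean}


def central_tendency(scores, level):
    """Compute only the indicators that are meaningful at this level."""
    return {name: FUNCS[name](scores) for name in PERMISSIBLE[level]}


# Made-up data for three of the levels.
marital_status = ["single", "married", "married", "divorced"]  # nominal
satisfaction = [1, 2, 2, 3, 4]  # ordinal codes: 1 = very dissatisfied ... 4 = very satisfied
siblings = [0, 1, 2, 2, 5]  # ratio

print(central_tendency(marital_status, "nominal"))
print(central_tendency(satisfaction, "ordinal"))
print(central_tendency(siblings, "ratio"))
```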
Identify the two dimensions of evaluating measurement method
Psychologists DO NOT simply assume that their measures work.
Instead, they **collect data** to demonstrate that they work. If their research does not demonstrate that a measure works, they stop using it.
When evaluating a measurement method, psychologists consider two general dimensions:
Reliability and Validity
Define reliability
Reliability refers to the consistency of a measure.
Psychologists consider three types of consistency:
- Over time (test-retest reliability),
- Across items (internal consistency)
- Across different researchers (inter-rater reliability).
Describe Test-Retest Reliability
When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time.
Test-retest reliability is the extent to which this is actually the case.
EXAMPLE
IQ - intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week.
Assessing test-retest reliability requires using the measure on a group of people at ONE TIME,
Using it again on the same group of people at a LATER TIME, and then looking at the test-retest correlation between the two sets of scores.
This is typically done by graphing the data in a SCATTERPLOT and computing the **correlation coefficient**. Figure 4.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart.
The correlation coefficient for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.
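A minimal sketch of a test-retest check, assuming the correlation coefficient meant here is Pearson's r. The scores are made up and are not the data behind Figure 4.2.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


# Hypothetical scores for the same six people, one week apart.
time1 = [22, 25, 18, 30, 27, 15]
time2 = [23, 24, 17, 29, 28, 16]

r = pearson_r(time1, time2)
good_reliability = r >= 0.80  # the rule of thumb from the card above
print(round(r, 2), good_reliability)
```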
Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions.
But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.
DESCRIBE Internal Consistency
Internal Consistency = the consistency of people’s responses across the items on a multiple-item measure
In general, all the items on such measures are supposed to REFLECT the same underlying construct, so people’s scores on those items SHOULD BE CORRELATED with each other.
EXAMPLE
On the Rosenberg Self-Esteem Scale
People who AGREE that they are a person of worth should tend to AGREE that they have a number of good qualities.
No correlation = Not the same underlying construct
EXAMPLE - people might make a series of bets in a simulated game of roulette as a measure of their level of RISK SEEKING.
This measure would be internally consistent to the extent that individual participants’ bets were consistently high or low across trials.
DESCRIBE a way to APPROACH Internal Consistency
Internal consistency can only be assessed by collecting and analyzing data.
One approach is to look at a split-half correlation.
This involves splitting the items into TWO sets,
Then a score is computed for each set of items, and the RELATIONSHIP between the two sets of scores is EXAMINED.
A split-half correlation of +.80 or greater
==== GOOD internal consistency.
Perhaps the most common measure of internal consistency used by researchers in psychology is a statistic called Cronbach’s α (the Greek letter alpha).
Conceptually, α is the mean of all possible split-half correlations for a set of items.
For example, there are 252 ways to split a set of 10 items into two sets of five. Cronbach’s α would be the mean of the 252 split-half correlations. Note that this is not how α is actually computed, but it is a correct way of interpreting the meaning of this statistic. Again, a value of +.80 or greater is generally taken to indicate good internal consistency.
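A sketch of both statistics on made-up data (rows are people, columns are items). The split used is odd-numbered versus even-numbered items, which is just one possible split. Cronbach’s α is computed with the standard variance-based formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), not by averaging split-half correlations.

```python
from math import comb, sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


def split_half_r(responses):
    """Correlate each person's odd-item total with their even-item total."""
    odd_totals = [sum(row[0::2]) for row in responses]
    even_totals = [sum(row[1::2]) for row in responses]
    return pearson_r(odd_totals, even_totals)


def cronbach_alpha(responses):
    """Standard variance-based formula for Cronbach's alpha."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 5 people x 4 items on a 1-4 scale.
responses = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 1, 2, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
]

print(comb(10, 5))  # 252, the number of five-item halves quoted above
print(round(split_half_r(responses), 2))
print(round(cronbach_alpha(responses), 2))
```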
Describe Interrater Reliability
Interrater Reliability
Many behavioral measures involve significant judgment on the part of an OBSERVER OR RATER.
Inter-rater reliability is the extent to which DIFFERENT observers are consistent in their judgments.
EXAMPLE
- Measuring university students’ social skills
- You could make video recordings of them as they interacted with another student whom they are meeting for the first time.
- Then you could have two or more observers watch the videos and rate each student’s level of social skills.
- To the extent that each participant does, in fact, have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other.
Inter-rater reliability would also have been measured in Bandura’s Bobo doll study.
In this case, the observers’ ratings of HOW MANY ACTS OF AGGRESSION a particular child committed while playing with the Bobo doll should have been highly positively correlated.
Interrater reliability is often assessed using Cronbach’s α when the judgments are quantitative
OR
an analogous statistic called Cohen’s κ (the Greek letter kappa) when they are categorical.
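A minimal sketch of Cohen’s κ for two raters making categorical judgments, using the usual definition κ = (observed agreement − chance agreement) / (1 − chance agreement). The ratings are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical judgments of ten video clips: "A" = aggressive, "N" = not.
rater_1 = ["A", "A", "N", "N", "A", "N", "A", "N", "N", "A"]
rater_2 = ["A", "A", "N", "N", "A", "N", "N", "N", "N", "A"]

kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))  # 0.8
```

Here the raters agree on 9 of 10 clips, but half of that agreement would be expected by chance, so κ is lower than the raw 90% agreement.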
Describe Validity
Validity is the EXTENT to which the scores from a measure represent the variable they are intended to.
Good RELIABILITY = good test-retest reliability (high positive correlation) and internal consistency (a split-half correlation of +.80 or greater)
HOWEVER a measure can be extremely reliable but have no validity whatsoever.
EXAMPLE
ABSURD: Index finger length REFLECTS Self-esteem
Imagine someone who believes that people’s index finger length reflects their self-esteem. The measure would have STRONG test-retest reliability BUT NO VALIDITY…
THREE MEASURES OF VALIDITY:
Face validity
Content validity
Criterion validity.