Flashcards in Reliability and Validity Deck (39):
What is reliability?
Reliability refers to the consistency/repeatability of the results of a measurement. How reliable a measure needs to be is relative and depends on the situation.
Types of reliability
• Observers: Inter-Observer reliability
• Observations: Internal (Split-half) reliability
• Occasions: Test-retest reliability
What is inter-observer reliability?
Inter-observer reliability is the degree to which observers agree upon an observation or judgement.
• Can be frequency or categorical judgement
How is inter-observer reliability tested?
Measured by looking at the shared relationship between observers' ratings (i.e. correlation or agreement). The statistic depends on the type of data: e.g. Cohen’s kappa for discrete (categorical) judgements, Pearson’s correlation coefficient (r) for continuous ones.
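Worked example (a minimal sketch, not part of the original deck): Cohen's kappa for two observers making categorical judgements. The function name, the labels, and the ratings are hypothetical and chosen only for illustration. Kappa compares the observed agreement with the agreement expected by chance from each observer's label frequencies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on how often each rater uses each label.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes from two observers judging the same 10 behaviours.
observer_1 = ["agg", "agg", "calm", "calm", "agg", "calm", "calm", "agg", "calm", "calm"]
observer_2 = ["agg", "calm", "calm", "calm", "agg", "calm", "agg", "agg", "calm", "calm"]
kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 2))
```

Here the observers agree on 8 of 10 items (0.80), chance agreement is 0.52, so kappa is (0.80 - 0.52) / (1 - 0.52), roughly 0.58.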
What is internal reliability?
Internal reliability is the degree to which specific items/observations in a multiple item measure behave the same way. I.e. are they measuring the same thing?
How is internal reliability tested?
Tested by dividing the test into 2 halves, then looking at the correlation between them. If the test has high internal reliability, an individual's performance on the first half should correlate with their performance on the second half.
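Worked example (a sketch under stated assumptions, not part of the original deck): a split-half check that sums each participant's scores on the first and second half of the items and correlates the two totals with Pearson's r. The helper names and the response data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_correlation(responses):
    """responses: one list of item scores per participant (same length each)."""
    half = len(responses[0]) // 2
    first_half_totals = [sum(person[:half]) for person in responses]
    second_half_totals = [sum(person[half:]) for person in responses]
    return pearson_r(first_half_totals, second_half_totals)


# Hypothetical data: 5 participants answering a 6-item scale (scores 1 to 5).
responses = [
    [5, 4, 5, 4, 5, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 4, 3],
    [4, 5, 4, 5, 4, 4],
    [1, 2, 1, 1, 2, 2],
]
r = split_half_correlation(responses)
print(round(r, 3))
```

A value of r close to 1 suggests the two halves are measuring the same thing; splitting by first vs. second half follows the card above, though other splits (e.g. odd vs. even items) are also possible.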
What is test-retest reliability?
Test-retest reliability is the extent to which scores on a test/measure remain stable over time.
What is validity?
Validity refers to how well a measure or an operationalised variable corresponds to what it is supposed to measure/represent.
Types of validity
• Internal validity
• External validity
• Population validity
• Ecological validity
• Construct validity
• Content validity
• Criterion validity
What is internal validity?
• How convincing is the evidence for causality in a study/series of experiments?
• i.e. how strong is the inference that the independent variable and the dependent variable are causally related?
J.S. Mill: 3 requirements to establish causality
• Covariation
• Temporal sequence
• Eliminating confounds (rival explanations/hypotheses)
• Third-variable problem
Another threat to internal validity. Most things have multiple causes.
What is external validity?
How well does a causal relationship hold across different people, settings, treatment variables, and measurements?
Two types: ecological and population validity.
What is a concern for population validity?
• Can cross-cultural inferences be made from Western, Educated, Industrialized, Rich, Democratic (WEIRD) samples?
• Differences appear in tasks ranging from motivation and reasoning to even visual perception
• Müller-Lyer illusion: Americans vs. the San people of the Kalahari
What is ecological validity?
How well do the results of laboratory experiments generalise to real-life settings?
• E.g. aggression studies in the lab vs. in real life
• Bandura (1961, 1963): Bobo doll experiment
What is construct validity?
How well do your operationalized variables (independent and/or dependent) represent the hypothetical or abstract variables of interest?
What is content validity?
Degree to which the items or tasks adequately sample the construct being measured
• i.e. how well does a measure/task represent all the facets of a construct?
What is criterion validity?
To what extent can a procedure be used to infer or predict some criterion (outcome)?
Two types: concurrent and predictive.
What is concurrent validity?
A type of criterion validity: to what extent can a procedure be used to infer some criterion measured at the same time?
What is predictive validity?
A type of criterion validity: to what extent can a procedure be used to predict some future criterion?
When do person confounds occur?
Person confounds occur when a variable seems to cause something because people who are high or low on this variable also happen to be high or low on some individual difference variable (e.g. a demographic characteristic) that is associated with the outcome variable of interest.
When do operational confounds occur?
Operational confounds occur when a measure designed to assess a specific construct such as depression, memory, or foot size inadvertently measures something else as well.
What is a confound?
A threat to internal validity that undermines a causal explanation.
What is an artifact?
A threat to external validity: a by-product of the testing procedure or sample that biases all results.
Unlike confounds, artifacts stay constant and are present in all groups.
What are some artifacts?
Hawthorne effect, history effect, and selection bias/non-response bias
What is the Hawthorne effect?
The mere act of measuring something changes its nature
- a form of participant reaction bias
Advantages of surveys?
Quick and efficient
Obtain public opinion almost immediately
Easy to use
Limitations of surveys?
Haphazard samples (availability and volunteer bias)
Question wording effects
Advantages of naturalistic observation?
Allows the study of issues not amenable to experimentation
Useful in the initial stages of investigation
Limitations of naturalistic observation?
Low internal validity
Hawthorne effect - threat to external validity
Advantages of participant observation?
It can be used in situations that otherwise might be closed to scientific investigation
Limitations of participant observation?
The dual role of the researcher (participant and observer) increases the chance that the observer loses objectivity and allows personal biases to enter into the description.
Advantages of longitudinal studies?
• Genuine changes and stability of some
• Major points of change observed
• Temporal sequence
• Minimise age-cohort effects
Disadvantages of longitudinal studies?
• Time consuming and expensive
• Participant attrition – threat to validity
Advantages of cross-sectional studies?
• Relatively inexpensive and less time consuming
• Low attrition rate
Disadvantages of cross-sectional studies?
• Cannot observe changes in individuals
• Insensitive to abrupt changes
• Age-Cohort effects
What are age-cohort effects in cross-sectional studies?
Cohort differences are confounded with age differences.
For example, when computer skills are measured in different age groups, the groups differ not only in age but also in their exposure to computers while growing up.
What is experimental mortality also known as? What's an example of a situation in which it is a problem?
Experimental mortality is also known as heterogeneous attrition.
For example, in a study on antipsychotics and schizophrenia, the experimental group receives antipsychotics, which produce side effects and also reduce potentially pleasurable symptoms of schizophrenia. Hence, more participants drop out of the experimental group than the control group.
This becomes a threat to internal validity.
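Worked example (a minimal sketch, not part of the original deck): one simple way to spot heterogeneous attrition is to compare dropout rates between groups. The function name and all counts below are hypothetical and do not come from any real study.

```python
def attrition_rate(enrolled, completed):
    """Proportion of enrolled participants who dropped out before the end."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("need enrolled > 0 and 0 <= completed <= enrolled")
    return (enrolled - completed) / enrolled


# Hypothetical counts: (enrolled, completed) for each group.
groups = {
    "antipsychotic": (50, 35),
    "control": (50, 46),
}
rates = {name: attrition_rate(*counts) for name, counts in groups.items()}
difference = rates["antipsychotic"] - rates["control"]
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} dropped out")
```

With these made-up numbers, 30% of the experimental group drops out against 8% of the control group, so the participants who remain in each group may no longer be comparable.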