Reliability, validity and threats to validity Flashcards
(33 cards)
Reliability
A measure of consistency.
How replicable is it?
If a psychological test has a high degree of reliability, it will produce similar results when it’s replicated.
Internal reliability
The extent to which something is consistent within itself.
For example:
All the items on an IQ test should be measuring the same thing.
External reliability
The extent to which a measure is consistent from one occasion to another.
For example:
A questionnaire to assess the personality of an individual should return similar results over any number of occasions.
Split-half method
This involves splitting a participant’s test answers in half (e.g., comparing answers to odd-numbered questions with answers to even-numbered questions) and seeing whether the individual got the same or similar scores on the two halves.
If so, internal reliability is high; if not, it’s low and individual questions would need to be redesigned to ensure all questions were consistently testing the aims of the study.
Compare the two scores using a correlation coefficient.
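The split-half comparison above can be sketched in code. This is a minimal illustration, not a standard procedure: it assumes several participants have answered the same test so that a correlation can be calculated, and the data and the function names `pearson_r` and `split_half_scores` are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_half_scores(item_scores):
    """Return (odd-question total, even-question total) for one participant."""
    odd_total = sum(item_scores[0::2])   # questions 1, 3, 5, ...
    even_total = sum(item_scores[1::2])  # questions 2, 4, 6, ...
    return odd_total, even_total

# Hypothetical data: each row is one participant's marks on six questions.
responses = [
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
]

halves = [split_half_scores(row) for row in responses]
odd_totals = [odd for odd, _ in halves]
even_totals = [even for _, even in halves]

# A coefficient close to +1 suggests the two halves measure the same thing.
r = pearson_r(odd_totals, even_totals)
print(f"split-half correlation: {r:.2f}")
```

A high positive coefficient here would point to good internal reliability; a low one would suggest some questions need redesigning.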
Test-retest reliability
A method of assessing the reliability of a questionnaire or psychological test by assessing the same person on two separate occasions.
There must be sufficient time between the test and retest so that the participant cannot simply recall their earlier responses from memory, but not so long that their attitudes have changed.
The two sets of scores are correlated to check that they are similar; if the correlation is significant and positive, the reliability of the measure is considered good.
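As a rough sketch of the test-retest check, the scores from the two occasions can be correlated as below. The scores are invented, `retest_correlation` is a hypothetical helper, and the +.80 cut-off is an assumption borrowed from the questionnaire card in this set; testing the correlation for significance is not shown.

```python
from math import sqrt

def retest_correlation(first, second):
    """Pearson r between scores from the first and second sittings."""
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    return cov / sqrt(ss1 * ss2)

# Hypothetical questionnaire totals for the same five people on two occasions.
first_sitting = [12, 18, 25, 30, 22]
second_sitting = [14, 17, 27, 29, 21]

r = retest_correlation(first_sitting, second_sitting)
acceptable = r > 0.80  # assumed threshold, not part of the definition above
print(f"test-retest correlation: {r:.2f}, acceptable: {acceptable}")
```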
Inter-observer / inter-rater reliability
An issue relevant to observational research.
Observers may have differing perspectives which may mean results are unreliable and show subjectivity bias.
A small-scale trial run (pilot study) of the observation is used to check that observers are applying the behavioural categories in the same way.
All observers watch the same event but record their data independently.
(Total number of agreements) / (total number of observations) should be greater than +.80.
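The agreement calculation above is simple arithmetic and can be sketched as follows. The behavioural categories and the two observers' records are hypothetical, and `agreement_ratio` is an illustrative name.

```python
def agreement_ratio(observer_a, observer_b):
    """Proportion of observations on which two observers recorded the same category."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must record the same number of observations")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return agreements / len(observer_a)

# Hypothetical records: the category each observer noted for the same ten intervals.
observer_a = ["play", "play", "fight", "idle", "play",
              "idle", "fight", "play", "idle", "play"]
observer_b = ["play", "play", "fight", "idle", "idle",
              "idle", "fight", "play", "idle", "play"]

ratio = agreement_ratio(observer_a, observer_b)
print(f"agreement: {ratio:.2f}, exceeds .80: {ratio > 0.80}")
```

In this made-up example the observers disagree on one interval out of ten, so the ratio is above the +.80 threshold.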
Improving reliability - questionnaires
Correlation of data must exceed +.80, otherwise some items will need to be deselected or rewritten.
Complex or ambiguous questions may be misinterpreted, so they may need to be simplified or changed from open to closed questions.
Improving reliability - interviews
Either use the same interviewer each time or provide comprehensive training to ensure consistency between interviewers, and avoid ambiguous or leading questions.
These problems may be avoided if the interview is structured.
Improving reliability - lab experiments
The researcher must have strict control over the conditions. Here reliability is about how precisely the method is replicated rather than the reliability of the findings.
Findings would not be reliable if participants were tested under slightly different conditions each time they were tested.
Improving reliability - observations
Behavioural categories must be operationalised - measurable and self-evident.
Categories should not overlap and not be ambiguous and all possible behaviours should be covered by the checklist.
Validity
The extent to which the observed effect is genuine.
Does it measure what it sets out to measure and can it be generalised to other settings?
Internal validity
Refers to whether the effects observed are due to the manipulation of the IV and not some other factor.
A major threat to internal validity is demand characteristics.
Internal validity - example
Some researchers question the internal validity of Milgram’s obedience study. They claim that participants were merely playing along: they knew there was a good chance the shocks were not real, so they were responding to the demands of the situation, which were reinforced by the authority figure repeating prompts to continue.
External validity
Relates to factors outside of the investigation: whether the findings can be generalised beyond it.
External validity - example
Generalising to other situations, populations of people and other eras.
Ecological validity
Generalising findings from one setting to others.
In particular, from a laboratory setting into “everyday life”.
If the task used to measure the dependent variable in an experiment is not representative of everyday life,
it lacks or has “low mundane realism”, and it is this that lowers ecological validity.
Face validity
A very basic form of validity in which validity is determined by seeing if the scale or measure appears to measure what it’s supposed to measure or by passing it to an expert.
Concurrent validity
The scores from a particular test are very close to, or match, those from another recognised and well-established test.
Concurrent validity - example
Scores on a new IQ test may be compared with scores on a well-established test to check concurrent validity. Close agreement between the two sets of data would indicate the new test has a high level of validity. The correlation between the two sets of scores must be at least +.80.
Content validity
Aims to demonstrate that the content of a test represents the area of interest.
Construct validity
The extent to which performance on the test measures an identified underlying construct.
Improving validity - experimental research
Use of a control group.
Standardisation of procedures to minimise impact of participant reactivity and investigator effects.
The use of single-blind (participants not made aware of the aims of the study) and double-blind procedures (a third party conducts the study without knowing the main purpose).
Improving validity - questionnaires
Incorporation of a lie scale within the questions to control for the effects of social desirability bias.
Validity further enhanced by ensuring all data remains anonymous.
Improving validity - observations
Ecological validity is higher when there is minimal intervention from the observer.
Covert observation is likely to have higher validity because the observed behaviour is more likely to be natural and authentic.
Behavioural categories that are too broad, ambiguous or overlapping are likely to have a negative impact on the validity of the data collected.