general procedures psychologists use for gathering and interpreting data
Define theory as it relates to research methods.
organized, testable explanation of phenomena
Other researchers must be able to replicate the results of an experiment to validate its conclusions.
What is replication?
obtaining similar results to a previous study using the same methods
What is hindsight bias?
the tendency to believe, after something has occurred, that it was predictable all along
What is a controlled experiment?
researchers systematically manipulate a variable and observe the response in a laboratory
prediction of how two or more factors are related
How do researchers specifically define what variables mean?
Researchers use operational definitions to precisely describe variables in relation to their study. For example, "effectiveness of studying" can be operationally defined with a test score.
What is the difference between an independent variable and a dependent variable in an experiment?
The factor being manipulated is the independent variable. The factor being measured is the dependent variable.
If we test the hypothesis that students who use Brainscape to study, rather than simple flash cards, will learn more (as measured by higher test scores), then what is the independent variable? What is the dependent variable?
independent: method of studying (Brainscape versus regular flashcards)
dependent: amount learned, as measured by their test scores
Define population as it relates to research methods.
all the individuals to which the study applies
Define sample as it relates to research methods.
subgroup of a population that constitutes participants of a study
What type of sample should be used in research?
Larger sample sizes are ideal because they tend to be more representative of the population.
The amount of difference between the sample and population is called __________.
Define random selection as it relates to research methods.
every individual from a population has an equal chance of being chosen for the sample
Which individuals are in the experimental group?
subjects who receive the treatment or manipulation of the independent variable
Which individuals are in the control group?
subjects who do not receive any treatment or manipulation
Subjects who receive the treatment are part of the __________, while those who do not receive the treatment belong to the __________.
experimental group; control group
What type of experimental design uses experimental and control groups?
A between-subjects design uses an experimental group and a control group to compare the effect of the independent variable.
Which process is used to try to ensure there are no preexisting differences between the control group and the experimental group?
Random assignment is used to assign the sample participants into groups (e.g., experimental drug or placebo).
Random assignment means neither the experimenter nor the participants decide which group the participants will be in, and each participant has an equal chance of being assigned to a given study group (e.g., treatment vs. placebo).
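A minimal sketch of random assignment, assuming 20 hypothetical participant IDs and two illustrative group labels (none of these names come from the deck). Shuffling first and then dealing participants out in turn means neither the experimenter nor the participants choose the groups:

```python
import random

def randomly_assign(participants, groups=("treatment", "placebo"), seed=None):
    """Shuffle the participants, then deal them into the groups in turn."""
    rng = random.Random(seed)  # a seed is used here only to make the demo repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, person in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(person)
    return assignment

participants = [f"P{n:02d}" for n in range(1, 21)]  # hypothetical participant IDs
assignment = randomly_assign(participants, seed=42)
for group, members in assignment.items():
    print(group, len(members), members)
```

Every participant lands in exactly one group, and the groups end up the same size when the sample divides evenly.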
any difference between the experimental group and the control group, besides the effect of the independent variable
a.k.a. third variable
makes the phenomenon at hand even more difficult to study because of complex interaction effects
List four types of confounding variables.
lack of counterbalancing
Define experimenter bias as it relates to confounding variables.
Experimenter bias occurs when a researcher's expectations or preferences about the results of the study influence the experiment.
Define demand characteristics as they relate to confounding variables.
clues the participants discover about the intention of the study that alter their responses
Define placebo effect as it relates to confounding variables.
responding to an inactive drug with a change in behavior because the subject believes it contains the active ingredient
What is the Hawthorne effect?
individuals who are being experimented on behave differently than in their everyday life
What type of experimental design uses each participant as his/her own control?
A within-subjects design exposes each participant to the treatment and compares their pre-test and post-test results. This design can also compare the results of two different treatments administered.
What is a single-blind procedure?
research design in which the subjects are unaware if they are in the control or experimental group
What is a double-blind procedure?
research design in which neither the experimenter nor the subjects are aware who is in the control or experimental group
Single-blind procedures aim to eliminate the effects of __________, while double-blind procedures use a third-party researcher to eliminate the effects of __________.
demand characteristics; experimenter bias
How are quasi-experiments different from controlled experiments?
Random assignment is not possible in quasi-experiments.
What types of research are considered quasi-experiments?
Differences in behavior between:
males and females
various age groups
students in different classes
establishes a relationship between two variables
does not determine cause and effect
used to make predictions and generate future research
List three methods of data collection
- naturalistic observation
Which two conditions must be met for an experiment to be considered a true experiment?
- the researcher manipulates the independent variable
- all participants are randomly assigned to the experimental and control condition
So, for instance, a study that compares how men versus women do on a given task would not be a true experiment because it is not possible to randomly assign people to a group (gender). (This example would be a quasi-experiment.)
Define naturalistic observation as it relates to correlational research.
Naturalistic observation consists of field observation of naturally occurring behavior, such as the way students behave in the classroom. There is no manipulation of variables.
What are surveys and why are they not always accurate?
type of correlational research
questionnaires and interviews given to a large group of people about their thoughts or behavior
individuals aim to be politically correct and socially accepted, leading them to give false answers
Define tests as they relate to correlational research.
research method that measures individual traits at a specific time and place
__________ studies start by looking at an effect and then attempt to determine the cause.
Ex post facto
What is the difference between the reliability and validity of a test?
•Reliable – consistent
When administered properly, does a test give similar results when used on different occasions?
•Valid – useful, meaningful
Does it measure what it claims to measure?
In order to be valid, a measure must be reliable. However, a measure can be reliable without being valid. For instance, imagine a scale that always reads 212 pounds, no matter what the weight is of the person who stands on it. That scale would be a reliable measure, but not a valid measure.
What is a case study?
detailed examination of one person or a small group
beneficial for understanding rare and complex phenomena in clinical research
not always representative of the larger population
- determine cause and effect relationship between variables
- control over confounding variables
- limited real-world generalizability
- easy to administer surveys or tests
- minimal time needed
- substantial real-world generalizability
- no control over confounding variables
- skewed or biased results
- establishes a relationship, not causation
analysis of numerical data regarding representative samples
1) __________ data include measurements, such as scores on the Wisconsin Card Sorting Task (behavioral example) or scores on the Magical Ideation Scale (self report example), that can be readily expressed using numbers.
2) __________ data, such as clinical interviews, can be very descriptive and rich, but are challenging and ambiguous to interpret.
What are the four scales of measurement?
Data that are categorical: Numbers have no meaning except for convenience as labels.
Hair Color (possibly coded red = 1; grey = 2; black = 3; brown = 4; blond = 5...)
Political Party (possibly coded Democrat = 1; Republican = 2; Independent/Other =3)
Gender (Male = 1; Female = 2; Prefer not to reply = 3).
numbers are used as ranks
The runner who wins the race is scored as 1, the runner who comes in second is scored as 2, the third is scored as 3, and so on.
numbers that have a meaningful difference between them
Temperature: The difference between 10°F and 20°F is the same as between 30°F and 40°F.
numbers that have a meaningful ratio between them on a scale with a real zero point
Weight and height: If you weigh zero pounds, you have no weight. 100 pounds is twice as heavy as 50 pounds.
Would temperature in Celsius or Fahrenheit be measured on an interval scale or a ratio scale?
Interval scale. If the temperature is 0°F, there is not "no temperature." There is not a meaningful ratio between values. 100°F is not twice as hot as 50°F.
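A quick arithmetic check of this point, as a sketch. Kelvin is brought in here only for comparison because it has a true zero; the helper name is illustrative:

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale with a true zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15

naive_ratio = 100 / 50  # ratio of the Fahrenheit numbers themselves
kelvin_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)

print(f"Ratio of the Fahrenheit numbers: {naive_ratio:.2f}")
print(f"Ratio on the Kelvin scale:       {kelvin_ratio:.2f}")
```

The Fahrenheit numbers have a ratio of 2, but on a scale with a real zero point the same two temperatures differ by only about 10%.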
What are descriptive statistics?
numbers that summarize a set of research data from a sample
an orderly arrangement of scores indicating the frequency of each score
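As a small illustration, assuming a made-up set of quiz scores, a frequency count can be built and printed in score order like this:

```python
from collections import Counter

quiz_scores = [3, 5, 5, 4, 3, 5, 2, 4, 5, 3]  # made-up scores for illustration
frequency = Counter(quiz_scores)

# List each score in order with how often it occurred.
for score in sorted(frequency):
    print(f"score {score}: {'#' * frequency[score]} ({frequency[score]})")
```

The row of `#` marks is a rough text version of the bars a histogram would draw.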
What is the difference between a histogram and a frequency polygon?
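A histogram is a bar graph of the frequency of each score, and a frequency polygon is a line graph of the same frequencies (it looks like a bell curve only when the scores are roughly normally distributed).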
Measures of central tendency describe the most typical scores for a set of research data.
most frequently occurring score in the data set
the middle score when the data is ordered by size
arithmetic average of the scores in the data set
If two scores appear most frequently, the distribution is __________, and if there are three or more appearing most frequently, it is __________.
Which measure of central tendency is the most representative? The least representative?
mean is usually most representative, unless there are extreme outliers that pull the mean in a particular direction
median is less sensitive to outliers, but is a weak statistic
mode is the least representative
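A short sketch of the three measures using Python's standard `statistics` module. The scores are invented, and the single extreme value is added only to show how an outlier pulls the mean but barely moves the median:

```python
import statistics

scores = [70, 72, 75, 75, 78, 80, 82]  # invented test scores
with_outlier = scores + [300]          # the same scores plus one extreme value

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(
        label,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )
```

With the outlier included, the mean jumps from 76 to 104 while the median only moves from 75 to 76.5 and the mode stays at 75.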
a bell-shaped, symmetrical curve that represents data about many characteristics, including the distribution of many human characteristics
In a normal distribution, approximately two thirds of the population will be within plus or minus one standard deviation of the norm (mean). Approximately 95% of the population will be within plus or minus two standard deviations of the mean. Over 99% of the population will fall within plus or minus three standard deviations of the mean.
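These proportions can be checked with the standard library's `NormalDist` (Python 3.8 or later). This is only a sketch; the function name `share_within` is illustrative:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

The results are roughly 68%, 95%, and 99.7%, matching the "two thirds, 95%, over 99%" rule of thumb.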
When most of the scores are compacted on one side of the bell curve, the distribution is said to be __________.
Positively skewed distributions include a lot of small values and negatively skewed distributions include a lot of large values.
measures of variability
Measures of variability describe the dispersion of scores for a set of research data.
- standard deviation
difference between the largest score and the smallest score
What do variance and standard deviation measure?
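how far scores typically fall from the mean of the data set: variance is the average squared difference between each score and the mean, and standard deviation is the square root of the variance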
Taller, narrow curves have less variance than short, wider curves.
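A minimal sketch comparing two invented data sets with the same mean but different spread. It uses the population formulas from the `statistics` module (`pvariance`, `pstdev`); a sample estimate would use `variance` and `stdev` instead:

```python
import statistics

narrow = [48, 49, 50, 51, 52]  # scores bunched tightly around 50
wide = [30, 40, 50, 60, 70]    # same mean, much more spread out

for label, data in (("narrow", narrow), ("wide", wide)):
    data_range = max(data) - min(data)     # largest score minus smallest score
    variance = statistics.pvariance(data)  # average squared distance from the mean
    std_dev = statistics.pstdev(data)      # square root of the variance
    print(f"{label}: range={data_range}, variance={variance}, SD={std_dev:.2f}")
```

The tightly bunched set has the smaller range, variance, and standard deviation, which is what a taller, narrower curve reflects.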
What is a z score (a.k.a. standard score)?
allows for comparison between different scales
subtract mean from each score and divide by standard deviation
mean has a z score of zero
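A sketch of the calculation. The two tests and their means and standard deviations are hypothetical numbers chosen only to show how z scores put different scales on a common footing:

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical tests on different scales.
test_a = z_score(650, mean=500, sd=100)
test_b = z_score(27, mean=21, sd=5)

print(f"Test A z = {test_a}, Test B z = {test_b}")
print("z of a score equal to the mean:", z_score(500, mean=500, sd=100))
```

Here the Test A score is further above its mean (1.5 SD) than the Test B score is above its own (1.2 SD), even though the raw numbers are not comparable.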
percentage of scores at or below a particular score between 1 and 99
If you are in the 70th percentile, 70% of the scores are the same as or below yours.
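One simple way to compute this, sketched with ten made-up scores. The "at or below" definition follows the card above; other textbooks use slightly different conventions:

```python
def percentile_rank(score, all_scores):
    """Percentage of scores that are at or below the given score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)

class_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # made-up scores
print(percentile_rank(85, class_scores))
```

Seven of the ten scores are at or below 85, so that score sits at the 70th percentile.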
Pearson correlation coefficient
statistical linear measure of the relationship between two sets of data
varies from -1 to +1
helps to make predictions about variables
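A from-scratch sketch of the Pearson coefficient, with invented data sets chosen to land at +1, -1, and 0. The function name and variables are illustrative:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
rises_with_hours = [2 * h + 50 for h in hours]   # perfectly linear, increasing
falls_with_hours = [100 - 3 * h for h in hours]  # perfectly linear, decreasing
v_shaped = [2, 1, 0, 1, 2]                       # related to hours, but not linearly

print(pearson_r(hours, rises_with_hours))
print(pearson_r(hours, falls_with_hours))
print(pearson_r(hours, v_shaped))
```

The V-shaped data give r = 0 even though the two variables are clearly related, a reminder that r measures only linear relationships.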
perfect positive correlation
r = +1
direct relationship: as one variable increases or decreases, the other does the same
no correlation
r = 0
perfect negative correlation
r = -1
inverse relationship: as one variable increases or decreases, the other does the opposite
What type of graph plots single points to show the strength and direction of correlations?
What is the term for the line on a scatterplot that follows the trend of the points?
line of best fit or regression line
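A sketch of how such a line can be computed with ordinary least squares, the usual method for a regression line. The study-hours data are invented for illustration:

```python
def best_fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x  # the line passes through the point of means
    return slope, intercept

study_hours = [1, 2, 3, 4, 5]        # invented data
test_scores = [52, 55, 61, 64, 68]
slope, intercept = best_fit_line(study_hours, test_scores)
print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
```

The slope says how much the predicted score changes for each additional hour of study in this made-up data set.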
What is the difference between a null and an alternative hypothesis?
Null hypotheses state that a treatment had no effect, while alternative hypotheses state the treatment did have an effect in the experiment.
What is the difference between a Type I and Type II error?
Type I errors, or false positives, occur if the researcher rejects a true null hypothesis. Type II errors, or false negatives, occur if the researcher fails to reject a false null hypothesis.
What is a p value?
The p value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true (loosely, how likely it is that the findings are the result of chance). It is used to decide whether a finding is statistically significant. The lower the p value, the less likely it is that the findings are due to chance.
In order for a finding to be considered statistically significant, the p value must be less than or equal to .05; in other words, a 5% or less likelihood that the finding is due to chance.
When is a finding statistically significant?
In psychology, a finding is conventionally considered statistically significant if the probability that it is due to chance is no more than 1 in 20, i.e., the p value is less than or equal to the alpha level of 0.05.
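One way to make "due to chance" concrete is an exact permutation test, sketched below. This is a technique swapped in for illustration, not one the cards name, and the scores are invented. It asks: if group labels were handed out at random, how often would the difference in means be at least as large as the one observed?

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    total = sum(pooled)
    as_extreme = 0
    splits = 0
    # Try every possible way of relabeling n_a of the pooled scores as "group A".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = sum_a / n_a - (total - sum_a) / n_b
        splits += 1
        if difference >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return as_extreme / splits

brainscape = [88, 92, 85, 91]  # invented test scores
flashcards = [78, 82, 80, 84]
p_value = permutation_p_value(brainscape, flashcards)
print(f"p = {p_value:.4f}", "significant" if p_value <= 0.05 else "not significant")
```

Only 1 of the 70 possible relabelings produces a difference this large, so p is about .014, which is below the .05 cutoff.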
What method statistically combines the results of several research studies to reach a conclusion?
Why did the American Psychological Association (APA) implement ethical guidelines?
Guidelines were set in place in the late 20th century to stress responsibility and morality in research and clinical practice
Dangerous and inhumane experiments such as Harlow's rhesus monkeys, Zimbardo's prison role-playing, and Milgram's shock test led to the implementation of rules
What are the purposes of an Institutional Review Board (IRB)?
- approve research being conducted at their particular institution
- require participants give informed consent after hearing the risks and procedures
- require debriefing of participants afterward with results of research
- require humane and ethical treatment of animal and human subjects
__________ psychology is practical and designed for real world application, while __________ psychology is focused on research of fundamental principles and theories.
Who founded the first psychology research lab?
_______ was one of the first psychologists to demonstrate that one could study psychological processes using experimental psychology.
Describe the work of Oswald Kulpe.
Kulpe was one of the earliest experimental psychologists. He performed numerous experiments to support his idea of "imageless thought," countering Titchener's work by trying to show that some thoughts have no images to be analyzed.
Who was the first psychologist to introduce mental testing to the United States?
James McKeen Cattell
Who created the first intelligence test and what was its initial purpose?
The first intelligence test was created by Binet and Simon in 1905 for the purpose of ranking the intelligence of French children in order to identify children with intellectual disabilities.
______ was a term developed by William Stern, which describes the ratio of someone's mental age to his/her chronological age.
Intelligence quotient (IQ)
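A one-line sketch of the ratio. Multiplying by 100 is the later convention that gives the familiar IQ number; the ages are made up:

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

print(ratio_iq(mental_age=10, chronological_age=8))  # performs like a typical 10-year-old at age 8
print(ratio_iq(mental_age=8, chronological_age=8))   # mental age matches chronological age
```

A child whose mental age equals his/her chronological age scores 100 under this formula.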
Who authored the Stanford-Binet Intelligence test?
If I were to test a population of people taking care to sample a proportionate amount to the actual composition of the group, which kind of sampling would I be using?
stratified random sampling
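A minimal sketch of stratified random sampling, assuming a hypothetical school of 600 first-years and 400 seniors. Each stratum contributes to the sample in proportion to its share of the population:

```python
import random

def stratified_sample(population, stratum_of, sample_size, seed=None):
    """Randomly sample from each stratum in proportion to its size."""
    rng = random.Random(seed)  # a seed is used here only to make the demo repeatable
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from sample_size in general.
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: (class year, student id) pairs.
population = [("first-year", i) for i in range(600)] + [("senior", i) for i in range(400)]
sample = stratified_sample(population, stratum_of=lambda p: p[0], sample_size=50, seed=1)
print(sum(1 for p in sample if p[0] == "first-year"), "first-years,",
      sum(1 for p in sample if p[0] == "senior"), "seniors")
```

The 60/40 split in the population is reproduced as 30 first-years and 20 seniors in the sample of 50.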
If I know something may be a confounding factor, and I create pairs of participants based on similar levels of this factor to eliminate its effect, this is called _____?
This is an experimental technique in which we make sure both the experimental and control group will experience both levels of the independent variable, just at different times.
Mary designed an experiment in which participants were not randomly assigned to groups, so the control and experimental groups were not equivalent. What kind of group design is this?
nonequivalent group design
If the results of my experiment are applicable to the entire population, my experiment is said to have __________ .
If I make inferences from a data set that go beyond the actual data points, this would be _________.
An _______ is an extremely large or extremely small number that affects the measure of central tendency such that it is no longer accurately representative of the sample.
What are the properties of a normal distribution?
A normal distribution is represented by a normal curve. The scores will exist such that about 68% of the scores are within 1 standard deviation of the mean and about 95% of the scores will fall within 2 standard deviations of the mean.
Similar to a Z-score, a T-score sets up a curve such that the mean is always 50 and each standard deviation is 10. You simply convert each number to the T-score value for easy comparison and analysis.
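The conversion described above can be sketched as follows. The raw score, mean, and standard deviation are hypothetical:

```python
def t_score(score, mean, sd):
    """Rescale a raw score so the mean becomes 50 and each standard deviation is 10."""
    z = (score - mean) / sd
    return 50 + 10 * z

print(t_score(650, mean=500, sd=100))  # 1.5 SD above the mean
print(t_score(500, mean=500, sd=100))  # exactly at the mean
```

A score 1.5 standard deviations above the mean becomes a T-score of 65, and a score at the mean becomes 50.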
What is the difference between a positive correlation and a negative correlation?
A positive correlation is one in which if one value increases, the other value will increase. A negative correlation is one in which if one value decreases, the other value increases.
What does a scatterplot look like?
The ________ is the line one draws on the scatterplot to best represent the relationship between the two values.
line of best fit
Factor analysis uses multiple sets of correlations to see which variable correlations cluster together to create a factor or group of variables which are presumed to be measuring the same value, based on their high rates of correlation.
Describe the difference between the null hypothesis and the research hypothesis.
The null hypothesis states that there is no relationship between the two values tested. The research hypothesis states that there is a statistically significant relationship between the two values in our experiment.
The _____ is the level of certainty we wish to have that there is an actual relationship between the two values in an experiment.
This is usually set at a 1 in 20 chance or an alpha level of 0.05.
Sandy rejected the null hypothesis and believed there was a relationship between phone numbers and math ability, when in reality, it was proved that there was not a relationship. What kind of statistical error did Sandy commit?
type I error
Bobby decided to accept the null hypothesis and concluded there was no relationship between IQ and a healthy diet, even though there was statistical evidence of a relationship. What kind of error did he commit?
type II error
The probability of making a type II error is measured by the ________ .
Which statistical test should I use if I am trying to compare three different groups or more?
analysis of variance (ANOVA)
If I only have two groups to compare, which statistical test should I use?
Chi-square tests are used for data that is _______ rather than numerical.
What is the most common way to perform a meta-analysis?
gather as many sources about the topic as possible, examine for multiple themes, publish the results of the meta-analysis for the larger community
A test in which one's score is compared to that of all of the other test-takers, such as "Brian's score is in the 66th percentile."
__________, rather than norm-referenced testing, determines how much information the test-taker knows about a certain subject, such as a history final.
What are three things a test must have to be reliable?
Split-half reliability, alternate-form, and test-retest method are three ways of establishing ________.
a test's reliability
how much a test measures what it claims to measure
What would be the best way to test content validity?
Examining the actual content of the test to make sure that it accurately and completely meets all of the facets of the construct that are being tested.
What does the face validity of the test show?
Whether the questions on the test appear, on their face, to be about the subject of the test; this is the least objective form of validity.
What would be one way to determine the criterion validity of the SAT?
determine whether high scores on the SAT predict high GPAs in college
how well the test addresses what you were trying to measure
Name two kinds of construct validity.
What is the difference between aptitude and achievement tests?
Someone's score on an aptitude test predicts future ability with training and growth, someone's score on an achievement test shows how much s/he knows right now.
What would a personality inventory be likely to contain?
statements about personality
questions that assess likes and dislikes
The ________ is an intelligence test specially designed for children.
Wechsler Intelligence Scale for Children (WISC)
What are some special features of the Minnesota Multiphasic Personality Inventory?
It has 10 clinical scales, plus validity scales that detect carelessness, faking, and distortion.
empirical criterion-keying approach
This is a process for creating a test in which the developers start with thousands of candidate questions and keep those whose answers empirically differentiate between groups, such as sick and healthy people.
Which test is the California Psychological Inventory the most like and why?
The CPI is most like the MMPI, but is especially intended for test takers ages 13 to young adult.
What is a projective test?
a test with ambiguous stimuli that has a subjective scoring system because there are limitless responses that the patient can give to the presented stimuli.
Projective tests are highly controversial. Critics point out research demonstrating projective tests' lack of reliability and validity. Yet projective tests remain in use in clinical settings and are used in legal and clinical decision making.
The Rorschach Ink Blot Test is a widely used projective test. Why is using the Rorschach Ink Blot Test a problematic practice?
Projective tests are highly controversial. Unfortunately, projective tests, such as the Rorschach, have been and continue to be used in making legal determinations (e.g., custody) despite evidence that such tests lack validity for assessing mental health (e.g., the Rorschach overpathologizes, frequently misidentifying people as having mental illness when they do not).
For an in-depth discussion of the problems with using the Rorschach Ink Blot Test to assess mental health, please read: http://www.csicop.org/si/show/rorschach_inkblot_test_fortune_tellers_and_cold_reading/
To view the ink blot images, please see: https://en.wikipedia.org/wiki/Rorschach_test
The ________ is a projective test in which the patient is given a series of pictures of scenes involving different people and is instructed to tell a spontaneous story about each scene.
Thematic Apperception Test (TAT)
The TAT was developed at Harvard in the 1930s by Murray and Morgan. Murray and Morgan used ambiguous images selected from magazines. Participants construct stories based on individually presented images. The test was developed to assess personality.
In addition to personality, the TAT has been (and continues to be) used to assess personal growth and mental health. However, the TAT, like other projective tests, lacks both reliability and validity. Including the TAT in a test battery can, in some circumstances, introduce enough error that it reduces the battery's overall reliability and validity.
Which projective test was especially designed for children?
Rotter Incomplete Sentences Blank
forty sentence stems that the test-taker fills out with whatever comes to mind
What are some advantages of using projective tests?
What are some disadvantages of using projective tests?
–Good for breaking the ice
–Some skilled clinicians may be able to use them to get information not captured in other types of tests. (maybe)
–Validity evidence is scarce; psychologists cannot be sure about what responses mean.
–Expensive and time-consuming
–Other less expensive tests work as well or better.
What is the theme of the Strong-Campbell Interest Inventory?
It is a career placement test based around the test-taker's interests.
What were Holland's six types of interests and occupational themes?
What did Arthur Jensen propose?
That racial differences in IQ are genetically related.
Important critique: Jensen did not adequately address other factors, including the lack of culture-fair tests, epigenetic effects, and the impact of socioeconomic status (SES) on educational opportunities and achievement. In addition, critics of Jensen's perspective note that he ignored research that was inconsistent with his hypotheses and Jensen misunderstood the nuances of heritability, resulting in Jensen making deeply flawed conclusions.
What are four factors that can undermine data quality?
•Low precision of measurement
•The state of the participant
•The state of the experimenter
•Variation in the environment
What is an a priori hypothesis?
An a priori hypothesis is a prediction about a relationship between variables (and the direction of that relationship) that is made prior to collecting data.
Findings based on an a priori hypothesis are considered stronger/more persuasive than findings based on a post hoc (after the fact) analysis. This is because a finding based on an a priori hypothesis is less likely to be the result of chance!
What are some strategies to help improve the quality of data you collect?
•Be careful!
•Use a standardized procedure or protocol
•Measure something that is important and engages participants
•When using multiple measures, be aware of order effects (Does doing A before asking B influence the answers for B?)
•Note anything unusual about the data collection. For instance, if a fire alarm goes off during data collection, or if the participant reports being in an unusual mood or unwell, make a note of it. Similarly, if you were collecting data on mood states the day after 9/11/2001, your data would likely have been impacted by participants' reactions to current events.
Name three things that can introduce error into our research:
Culture, Biases, and Situation strongly influence our Observations, Responses, and Behaviors!!
Here is a helpful way of thinking about this issue: “…the assumptions you end up making as you try to bridge the imaginative gap are, of course, your own, and the most misleading assumptions are the ones you don't even know you're making.”
Douglas Adams & Mark Carwardine, "On Meeting a Gorilla." from Last Chance to See (writing about when they went to see gorillas in the wild)
Try, as much as you are able, to be aware of the effects of these on you!
What is the primary aim of statistics?
To rule out randomness or chance as an explanation.
Human brains have evolved to detect patterns. A by-product of being very good at pattern detection is that human beings are prone to sometimes perceive patterns, even when there are no patterns.
What is measurement error?
Measurement error is a threat to research validity; it is the cumulative effect of extraneous variables.
Measurement error often is referred to as noise in the data.
Measurement error also is referred to as error variance.
What are four different types of data frequently used in psychological research?
Self-Report: the participant's perceptions of himself or herself (e.g., data collected from surveys or interviews)
Life Outcomes: real-life verifiable facts (e.g., criminal record/history of incarceration)
Behavioral Observations: observing a person's behavior (e.g., how a participant performs on a task, such as a Stroop test or an IQ test)
Informant: asking someone who knows the person to share their perceptions (e.g., asking a parent to describe his or her child's strengths and interests)
Shows vs No Shows (and others who refuse to participate)
In voluntary research, typically some potential participants refuse to participate. Other potential participants agree to participate then do not do so (no-shows).
Why is this a problem for voluntary research?
No-shows do not provide data, so they are not represented in the data and subsequent findings.
As a group, non-participants/no-shows probably differ meaningfully from participants. There may be relevant, important personality or demographic differences between these groups.
Thus, no-shows are a threat to study validity and the generalizability of findings.
(This is not an issue in animal research; lab mice do not have the option of deciding not to participate!)
What are “WEIRD” countries; why is this an issue?
Western, Educated, Industrialized, Rich, and Democratic.
Most psychological research is conducted in WEIRD countries (such as the U.S., Canada, and the U.K.), so findings from such research may or may not generalize to other, non-WEIRD populations.
What is the law of large numbers?
(Unless there is significant sampling error,) the larger the sample size, the more closely the sample results tend to match the population, and so the more reliable the findings!
What is a Type I error?
What is a Type II error?
Type I error: saying that your results are significant when they are not. ("It works!", when, alas, in actuality the new treatment does not help). False positive
Type II error: saying your results are not significant when they actually are. ("It's worthless!", when, alas, in actuality the new treatment could really help). False negative
Psychological research tends to focus on working to avoid making Type I errors, although both are harmful!
What is a response set or response bias?
Why are response sets a problem for researchers?
A response set is the tendency for a participant to have a pattern in how she or he responds to questionnaire items or interview questions, and this pattern or tendency occurs independently of the content of the items. Response sets are a problem because they introduce systematic bias/error into the data set.
What are examples? Some participants tend to say yes to researchers conducting an interview (an acquiescence bias), even when the answer is unknown, ambiguous, or even no. Other participants tend to give extreme answers. In some instances, cultural differences can lead to response sets.
What is an effect size?
An effect size is a measure of strength; it conveys the strength of the relationship/finding.
Effect sizes can be small, moderate, or large.
One commonly used and good measure of effect size is Cohen's d
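A sketch of Cohen's d for two independent groups, using the pooled standard deviation. The scores are invented, and the function name is illustrative:

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference between group means, expressed in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

treatment = [88, 92, 85, 91]  # invented scores
control = [78, 82, 80, 84]
d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

By Cohen's rough benchmarks (about 0.2 small, 0.5 moderate, 0.8 large), the d of roughly 2.77 in this toy example would count as a very large effect.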
What does it mean to have multiple outcome measures?
Why is it important, when possible, to design studies so that they have multiple outcome measures?
It means having more than one way to measure a dependent variable!
As long as all of the measures are valid, using multiple measures substantially improves your ability to detect effects/differences.
If you want to test an intervention to treat postpartum depression, then you could use multiple measures, such as the BDI, a rating from a family member, and a structured clinical interview. If there is any problem collecting or interpreting a measure, having multiple outcome measures reduces the problem's impact. E.g., what if you used only the rating from family members, and it turned out that not all of the participants have a relative close enough to them to provide a valid rating?
What is a p value?
What is an effect size?
Whereas a p value conveys how likely it is that a finding is due to chance (and thus whether the finding is likely to be real), an effect size conveys how big or strong the difference between the groups is.
What are some arguments against using deception in psychological experiments?
–Informed consent for deception is not possible.
–When does the deception stop?
–Harms the credibility of psychology
Why use deception in some psychological research?
What safeguards are there for participants?
Sometimes researchers use deception while collecting data. Usually deception is reserved for when being straightforward could meaningfully bias/change the data.
Use of deception must be pre-approved by the IRB. The potential harm must not outweigh the anticipated benefits, and participants must be debriefed afterwards.
What is a standard deviation?
The standard deviation is a measure of how closely the data in a sample or population cluster around the mean.
The standard deviation is equal to the square root of the variance.