Measurement, Methodology and Other: Measurement and Methodology Flashcards

Question

What is the Hawthorne effect?

Answer 1

Individuals who are being experimented on **behave differently** than in their everyday life.

Answer 2

A **within-subjects design** exposes each participant to the treatment and compares their pre-test and post-test results. This design can also compare the results of two different treatments administered.

Answer 3

Research design in which the subjects are **unaware** if they are in the control or experimental group.

Answer 4

Research design in which **neither the experimenter nor the subjects** are aware who is in the control or experimental group.

Answer 5

demand characteristics

Answer 6

experimenter bias

Answer 7

Random assignment is not possible in quasi-experiments.

Answer 8

Differences in behavior between: * males and females * various age groups * students in different classes

Answer 9

* establishes a relationship between two variables * does not determine cause and effect * used to make predictions and generate future research

Answer 10

1. naturalistic observation 2. surveys 3. tests

Answer 11

1. The researcher manipulates the independent variable. 2. All participants are randomly assigned to the experimental and control condition. ## Footnote So, for instance, a study that compares how men versus women do on a given task would not be a true experiment because it is not possible to assign people to a group (gender). (This example would be a quasi-experiment.)

Answer 12

It consists of **field observation** of **naturally occurring behavior**, such as the way students behave in the classroom. There is **no manipulation** of variables.

Answer 13

* It is a type of correlational research. * Questionnaires and interviews given to a large group of people about their thoughts or behavior. * Individuals aim to be politically correct and socially accepted, leading them to give false answers.

Answer 14

Research method that **measures individual traits** at a **specific time and place**.

Answer 15

Ex post facto

Answer 16

* **Reliable – consistent.** When administered properly, does a test give similar results when used on different occasions? * **Valid – useful, meaningful.** Does it measure what it claims to measure? ## Footnote In order to be valid, a measure must be reliable. However, a measure can be reliable without being valid. For instance, imagine a scale that always reads 212 pounds, no matter what the weight is of the person who stands on it. That scale would be a reliable measure, but not a valid measure.

Answer 17

* Detailed examination of one person or a small group. * Beneficial for understanding rare and complex phenomena in clinical research. * Not always representative of the larger population.

Answer 18

Strengths: * determine cause and effect relationship between variables * control over confounding variables Weaknesses: * no real-world generalizability * expensive * time-consuming

Answer 19

Strengths: * easy to administer surveys or tests * inexpensive * minimal time needed * substantial real-world generalizability Weaknesses: * no control over confounding variables * skewed or biased results * establishes a relationship, not causation

Answer 20

**Analysis of numerical data** regarding representative samples.

Answer 21

Quantitative

Answer 22

Qualitative

Answer 23

1. nominal 2. ordinal 3. interval 4. ratio

Answer 24

Data that are categorical: Numbers have **no meaning** except for convenience as labels. ## Footnote Examples: Hair Color (possibly coded red = 1; grey = 2; black = 3; brown = 4; blond = 5...) Political Party (possibly coded Democrat = 1; Republican = 2; Independent/Other =3) Gender (Male = 1; Female = 2; Prefer not to reply = 3).

Answer 25

Numbers are used as ranks. ## Footnote Examples: The runner who wins the race is scored as 1, the runner who comes in second is scored as 2, the third is scored as 3, and so on.

Answer 26

Numbers that have a meaningful difference between them. ## Footnote Example: Temperature: The difference between 10°F and 20°F is the same as between 30°F and 40°F.

Answer 27

Numbers that have a meaningful ratio between them on a scale with a real zero point. ## Footnote Example: Weight and height: If you weigh zero pounds, you have no weight. 100 pounds is twice as heavy as 50 pounds.

Answer 28

interval scale ## Footnote If the temperature is 0°F, there is not "no temperature." There is not a meaningful ratio between values. 100°F is not twice as hot as 50°F.

Answer 29

Numbers that summarize a set of research data from a sample.

Answer 30

An orderly arrangement of scores indicating the frequency of each score.

Answer 31

* A histogram is a **bar graph**. * A frequency polygon is a **line graph** or a bell curve.

Answer 32

Measures of central tendency describe the most typical scores for a set of research data. 1. mode 2. median 3. mean

Answer 33

Most frequently occurring score in the data set.

Answer 34

The middle score when the data is ordered by size.

Answer 35

Arithmetic average of the scores in the data set.

Answer 36

multimodal

Answer 37

* **Mean** is usually most representative, unless there are extreme outliers that pull the mean in a particular direction. * **Median** is less sensitive to outliers, but is a weak statistic. * **Mode** is the least representative.
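
A minimal sketch of how a single extreme score affects each measure, using Python's standard `statistics` module. The data set is hypothetical and chosen only for illustration.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5]        # hypothetical data set
with_outlier = scores + [40]    # the same scores plus one extreme value

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

In this example the outlier moves the mean from 3.4 to 9.5, while the median shifts only from 3 to 3.5 and the mode stays at 3.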

Answer 38

A bell-shaped, symmetrical curve that represents data about many characteristics, including the **distribution of many human characteristics**. ## Footnote In a normal distribution, approximately two thirds of the population will be within plus or minus one standard deviation of the norm (mean). Approximately 95% of the population will be within plus or minus two standard deviations of the mean. Over 99% of the population will fall within plus or minus three standard deviations of the mean.
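
The proportions quoted for one, two, and three standard deviations can be checked numerically. This is a small sketch using `statistics.NormalDist` (Python 3.8+); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within +/- k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The three values come out at roughly 0.68, 0.95, and 0.997, matching "about two thirds", "approximately 95%", and "over 99%".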

Answer 39

skewed ## Footnote Positively skewed distributions include a lot of small values and negatively skewed distributions include a lot of large values.

Answer 40

They describe the dispersion of scores for a set of research data. 1. range 2. variance 3. standard deviation

Answer 41

Difference between the largest score and the smallest score.

Answer 42

Average squared difference between each score and the mean of the data set. ## Footnote Taller, narrower curves have less variance than shorter, wider curves.
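
A short sketch of the measures of variability on a hypothetical data set. It assumes the population formulas (dividing by the number of scores), where variance is the mean of the squared deviations from the mean and the standard deviation is its square root.

```python
from statistics import pvariance, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set, mean = 5

score_range = max(scores) - min(scores)   # largest minus smallest score
variance = pvariance(scores)              # mean squared deviation from the mean
std_dev = pstdev(scores)                  # square root of the variance

print(score_range, variance, std_dev)
```

For these scores the range is 7, the variance is 4, and the standard deviation is 2.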

Answer 43

* allows for comparison between different scales * subtract mean from each score and divide by standard deviation * mean has a z score of zero
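
The calculation can be sketched in a few lines. The function name and the two test scenarios below are hypothetical, chosen to show how z scores put different scales on a common footing.

```python
def z_score(score, mean, sd):
    """Subtract the mean from the score and divide by the standard deviation."""
    return (score - mean) / sd

# Two hypothetical tests on very different scales
z_test_a = z_score(650, mean=500, sd=100)
z_test_b = z_score(26, mean=20, sd=4)

print(z_test_a, z_test_b, z_score(500, mean=500, sd=100))
```

Both scores sit 1.5 standard deviations above their own mean, so they are equivalent in relative standing; a score equal to the mean has a z score of zero.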

Answer 44

Percentage of scores at or below a particular score between 1 and 99. ## Footnote Example: If you are in the 70th percentile, 70% of the scores are the same as or below yours.
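
As an illustration, the "at or below" definition can be computed directly. The function name and the ten scores are assumptions made for this sketch; published tests may use slightly different percentile conventions.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores that are the same as or below the given score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)

class_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical scores
print(percentile_rank(7, class_scores))
```

Here 7 of the 10 scores are at or below 7, which places that score at the 70th percentile.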

Answer 45

* statistical linear measure of the relationship between two sets of data * varies from -1 to +1 * helps to make predictions about variables

Answer 46

1. r = +1 **direct relationship:** as one variable increases or decreases, the other does the same 2. r = 0 **no relationship** 3. r = -1 **inverse relationship:** as one variable increases or decreases, the other does the opposite
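
The three cases can be reproduced with a small Pearson correlation function. This is a sketch written from the standard formula; the function name and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_variation / (spread_x * spread_y)

x = [1, 2, 3, 4, 5]
direct = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x
inverse = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises
none = pearson_r(x, [2, 1, 3, 1, 2])       # no linear trend

print(round(direct, 2), round(inverse, 2), round(none, 2))
```

The three results are approximately +1, -1, and 0 respectively.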

Answer 47

scatterplot

Answer 48

line of best fit or regression line

Answer 49

* **Null** hypotheses state that a treatment had **no effect**. * **Alternative** hypotheses state the treatment did **have an effect** in the experiment.

Answer 50

* **Type I** errors, or false positives, occur if the researcher **rejects a true null hypothesis**. * **Type II** errors, or false negatives, occur if the researcher **fails to reject a false null hypothesis**.

Answer 51

It indicates whether a finding is statistically significant by giving the likelihood that the findings are the result of chance. The lower the *p* value, the less likely it is that the findings are due to chance. ## Footnote In order for a finding to be considered statistically significant, the *p* value must be less than or equal to .05; in other words, a 5% or less likelihood that the finding is due to chance.

Answer 52

In psychology, a finding is considered statistically significant if the probability (alpha) that the finding is due to chance is less than 1 in 20 (p is less than or equal to 0.05).

Answer 53

meta-analysis

Answer 54

* Guidelines were set in place in the late 20th century to stress responsibility and morality in research and clinical practice. * Dangerous and inhumane experiments such as Harlow's rhesus monkeys, Zimbardo's prison role-playing, and Milgram's shock test led to the implementation of rules.

Answer 55

* Approve research being conducted at their particular institution. * Require participants give informed consent after hearing the risks and procedures. * Require debriefing of participants afterward with results of research. * Require humane and ethical treatment of animal and human subjects.

Answer 56

Wilhelm Wundt

Answer 57

Hermann Ebbinghaus

Answer 58

Külpe was one of the earliest experimental psychologists. He performed numerous experiments on "**imageless thought**" to counter Titchener's work and show that some thoughts have no images to be analyzed.

Answer 59

James McKeen Cattell

Answer 60

Created by Simon and Binet in 1905 to rank the intelligence of French children and identify those with intellectual disabilities.

Answer 61

Intelligence quotient (IQ)

Answer 62

Lewis Terman

Answer 63

stratified random sampling

Answer 64

matched-subjects design

Answer 65

This is an experimental technique in which we make sure both the **experimental and control group** will experience both levels of the independent variable, just at **different times**.

Answer 66

nonequivalent group design

Answer 67

external validity

Answer 68

inferential statistics

Answer 69

A normal distribution is represented by a **normal curve**. About 68% of the scores fall within 1 standard deviation of the mean, and about 95% of the scores fall within 2 standard deviations of the mean.

Answer 70

Similar to a Z-score, a T-score sets up a curve such that the **mean is always 50 and each standard deviation is 10**. You simply convert each number to the T-score value for easy comparison and analysis.
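
The conversion can be sketched as a z score rescaled to a mean of 50 and a standard deviation of 10. The function name and the raw-score scale in the example are hypothetical.

```python
def t_score(raw, mean, sd):
    """Convert a raw score to a T-score (mean 50, standard deviation 10)."""
    z = (raw - mean) / sd
    return 50 + 10 * z

# Hypothetical raw scale with mean 100 and standard deviation 15
print(t_score(130, mean=100, sd=15))   # two standard deviations above the mean
print(t_score(100, mean=100, sd=15))   # exactly at the mean
```

A raw score two standard deviations above the mean converts to a T-score of 70, and a score at the mean converts to 50.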

Answer 71

* A **positive** correlation is one in which if **one value increases, the other value will increase**. * A **negative** correlation is one in which if **one value decreases, the other value increases**.

Answer 72

line of best fit

Answer 73

It uses multiple sets of correlations to see which variable correlations cluster together to create a factor or group of variables which are presumed to be measuring the same value, based on their high rates of correlation.

Answer 74

* The **null** hypothesis states that there is **no relationship** between the two values tested. * The **research** hypothesis states that **there is a statistically significant relationship** between the two values in our experiment.

Answer 75

alpha level ## Footnote This is usually set at a 1 in 20 chance or an alpha level of 0.05.

Answer 76

type I error

Answer 77

type II error

Answer 78

analysis of variance (ANOVA)

Answer 79

categorical

Answer 80

Gather as many sources about the topic as possible, examine for multiple themes, publish the results of the meta-analysis for the larger community.

Answer 81

A test in which one's score is compared to that of all of the other test-takers, such as "Brian's score is in the 66th percentile."

Answer 82

Domain-referenced

Answer 83

1. dependability 2. consistency 3. repeatability

Answer 84

a test's reliability

Answer 85

How much a test measures what it claims to measure.

Answer 86

Examining the actual content of the test to make sure that it accurately and completely meets all of the facets of the construct that are being tested.

Answer 87

Whether the questions on the test appear, on the surface, to be about the subject of the test; this is the least objective form of validity.

Answer 88

Determine whether high scores on the SAT predict high GPAs in college.

Answer 89

How well the test addresses what you were trying to measure.

Answer 90

1. convergent validity 2. divergent validity

Answer 91

* Someone's score on an **aptitude** test predicts **future** ability with training and growth. * Someone's score on an **achievement** test shows how much s/he knows **right now**.

Answer 92

* statements about personality * questions that assess likes and dislikes * self-selected ideals

Answer 93

WISC (Wechsler Intelligence Scale for Children)

Answer 94

It has 10 clinical scales, plus validity scales that detect carelessness, faking, and distorting.

Answer 95

This is a process for **creating test questions** in which the developers choose from thousands of test questions placed in groups to differentiate between sick and healthy people with a variety of scores.

Answer 96

The CPI is most like the **MMPI**, but is especially intended for test takers ages 13 to young adult.

Answer 97

A test with **ambiguous stimuli** that has a subjective scoring system because there are **limitless responses** that the patient can give to the presented stimuli. ## Footnote Projective tests are highly controversial. Critics point out research demonstrating projective tests' lack of reliability and validity. Yet projective tests remain in use in clinical settings and in legal and clinical decision making.

Answer 98

Projective tests are highly controversial. Unfortunately, projective tests, such as the Rorschach, have been and continue to be used in making legal determinations (e.g., custody) despite evidence that such tests lack validity for assessing mental health (e.g., the Rorschach overpathologizes, frequently misidentifying people as having mental illness when they do not). ## Footnote For an in-depth discussion of the problems with using the Rorschach Ink Blot Test to assess mental health, please read [this resource](https://skepticalinquirer.org/2003/07/the-rorschach-inkblot-test-fortune-tellers-and-cold-reading/). To view the ink blot images, please see [this resource](https://en.wikipedia.org/wiki/Rorschach_test).

Answer 99

Thematic Apperception Test (TAT) ## Footnote The TAT was developed at Harvard in the 1930s by Murray and Morgan. Murray and Morgan used ambiguous images selected from magazines. Participants construct stories based on individually presented images. The test was developed to assess personality. In addition to personality, the TAT has been (and continues to be) used to assess personal growth and mental health. However, the TAT, like other projective tests, lacks both reliability and validity. Including the TAT in a test battery can, in some circumstances, introduce enough error that it reduces the battery's overall reliability and validity.

Answer 100

Blacky Pictures

Answer 101

Forty sentence stems that the test-taker fills out with whatever comes to mind.

Answer 102

* Good for breaking the ice * Some skilled clinicians may be able to use them to get information not captured in other types of tests. (maybe)

Answer 103

* Validity evidence is scarce; psychologists cannot be sure about what responses mean. * Expensive and time-consuming. * Other less expensive tests work as well or better.

Answer 104

It is a **career placement** test based around the test-taker's **interests**.

Answer 105

1. realistic 2. investigative 3. artistic 4. social 5. enterprising 6. conventional

Answer 106

Racial differences in IQ are genetically related. ## Footnote **Important critique:** Jensen did not adequately address other factors, including the lack of culture-fair tests, epigenetic effects, and the impact of socioeconomic status (SES) on educational opportunities and achievement. In addition, critics of Jensen's perspective note that he ignored research that was inconsistent with his hypotheses and Jensen misunderstood the nuances of heritability, resulting in Jensen making deeply flawed conclusions.

Answer 107

* low precision of measurement * the state of the participant * the state of the experimenter * variation in the environment

Answer 108

It occurs if one has a predicted hypothesis about a relationship (and the direction of relationship) between variables prior to collecting data. ## Footnote Findings based on an a priori hypothesis are considered stronger/more persuasive than findings based on a post hoc (after the fact) analysis. This is because a finding based on an a priori hypothesis is less likely to be the result of chance.

Answer 109

* Be careful! * Use a standardized procedure or protocol * Measure something that is important and engages participants * When using multiple measures, be aware of order effects (Does doing A before asking B influence the answers for B?) * Note anything unusual about the data collection. For instance, if a fire alarm goes off during data collection, or if the participant reports being in an unusual mood or unwell, make a note of it. Similarly, if you were collecting data on mood states the day after 9/11/2001, your data would likely have been impacted by participants' reactions to current events.

Answer 110

**Culture**, **Biases**, and **Situation** strongly influence our Observations, Responses, and Behaviors. ## Footnote Here is a helpful way of thinking about this issue: “…the assumptions you end up making as you try to bridge the imaginative gap are, of course, your own, and the most misleading assumptions are the ones you don't even know you're making.” Douglas Adams & Mark Carwardine, "On Meeting a Gorilla." from _Last Chance to See_ (writing about when they went to see gorillas in the wild) Try, in as much as you are able, to be aware of the effects of these on you.

Answer 111

To rule out randomness or chance as an explanation. ## Footnote Human brains have evolved to detect patterns. A by-product of being very good at pattern detection is that human beings are prone to sometimes perceive patterns, even when there are no patterns.

Answer 112

* A threat to research validity; it is the cumulative effect of extraneous variables. * Often referred to as noise in the data and an error variance.

Answer 113

1. Self-Report 2. Life Outcomes 3. Behavioral Observations 4. Informant ## Footnote **Self-Report** - the participant's perceptions of himself or herself (e.g., data collected from surveys or interviews). **Life Outcomes** - real-life verifiable facts (e.g., criminal record/history of incarceration). **Behavioral Observations** - observing a person's behavior (e.g., how a participant performs on a task, such as a Stroop test or an IQ test). **Informant** - asking someone who knows the person to share their perceptions (e.g., asking a parent to describe his or her child's strengths and interests).

Answer 114

No-shows do not provide data, so they are not represented in the data and subsequent findings. As a group, non-participants/no-shows probably differ meaningfully from participants. There may be relevant, important personality or demographic differences between these groups. Thus, no-shows are a threat to study validity and the generalizability of findings. ## Footnote (This is not an issue in animal research; lab mice do not have the option of deciding not to participate.)

Answer 115

**W**estern, **E**ducated, **I**ndustrialized, **R**ich, and **D**emocratic. Most psychological research is conducted in WEIRD countries (such as the U.S., Canada, and the U.K.), so findings from such research may or may not generalize to other, non-WEIRD populations.

Answer 116

The **larger** the sample size, the **more reliable and valid** the findings, assuming there is no significant sampling error.

Answer 117

* **Type I** error occurs when a researcher **incorrectly concludes** that a result is significant when it is not (a false positive). * **Type II** error occurs when a researcher **fails to detect a significant result** that actually exists (a false negative). ## Footnote Psychological research tends to focus on working to avoid making Type I errors, although both are harmful.

Answer 118

A response set is the tendency for a participant **to have a pattern** in how she or he **responds to questionnaire items or interview questions**, and this pattern or tendency occurs independently of the content of the items. Response sets are a problem because they introduce **systematic bias/error** into the data set. ## Footnote What are examples? Some participants tend to say yes to researchers conducting an interview (an acquiescence bias), even when the answer is unknown, ambiguous, or even no. Other participants tend to give extreme answers. In some instances, cultural differences can lead to response sets.

Answer 119

It measures the **strength of a relationship or finding**, indicating how large the observed effect is. It can be categorized as small, moderate, or large, depending on its magnitude. ## Footnote One widely used and effective measure of effect size is **Cohen's d**, which helps quantify the difference between two group means.

Answer 120

It means using more than one method to assess a dependent variable. As long as all the measures are valid, employing multiple measures significantly enhances your ability to detect effects or differences in the study, providing a more robust evaluation of the findings. ## Footnote If you want to test an intervention to treat postpartum depression, then you could use multiple measures, such as the BDI, a rating from a family member, and a structured clinical interview. If there is any problem collecting or interpreting a measure, having multiple outcome measures reduces the problem's impact. E.g., what if you used only the rating from family members, and it turned out that not all of the participants have a relative close enough to them to provide a valid rating?

Answer 121

Whereas a ***p* value** conveys the likelihood that a finding is due to **chance** (i.e., how likely it is that the finding is real), an **effect size** conveys how **big or strong the difference** between the groups is.

Answer 122

* Informed consent for deception is not possible. * When does the deception stop? * Harms the credibility of psychology

Answer 123

Researchers sometimes use deception when collecting data to **prevent participants' awareness from influencing the results**. Deception is typically employed only when being direct could significantly bias the data. Its use must be pre-approved by an Institutional Review Board (IRB), ensuring that the potential harm does not outweigh the anticipated benefits, and participants must be fully debriefed afterward.

Answer 124

A **measure** of how closely the **data in a sample** or **population** cluster around the **mean**. The standard deviation is equal to the square root of the variance. For a more in-depth explanation of standard deviations, see [this resource](https://www.khanacademy.org/math/probability/descriptive-statistics/variance-std-deviation/v/statistics-standard-deviation).

Answer 125

The proportion of test-takers who answer an item correctly. ## Footnote Item difficulty ranges from 0 to 1. A higher value indicates an easier question, as more test-takers answer it correctly.
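
A minimal sketch of the calculation, assuming each response to the item is coded 1 for correct and 0 for incorrect. The function name and the response data are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of test-takers who answered the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

item_responses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]   # hypothetical: 7 of 10 correct
print(item_difficulty(item_responses))
```

Seven of ten test-takers answered correctly, so the difficulty value is 0.7, a relatively easy item.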

Answer 126

True ## Footnote The item discrimination index assesses how well an item can differentiate between test-takers who perform well overall and those who do not. Values closer to 1 suggest better discrimination.
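
One common way to compute such an index is the upper-lower method: compare the proportion answering the item correctly among the highest total scorers with the proportion among the lowest scorers. The sketch below assumes that method and a 27% cutoff for each group; the function name, cutoff parameter, and data are illustrative choices, not something specified by this card.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower index: proportion correct in the top group minus the bottom group."""
    # Rank test-takers from lowest to highest total score
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    group_size = max(1, round(len(order) * fraction))
    lower, upper = order[:group_size], order[-group_size:]
    p_upper = sum(item_correct[i] for i in upper) / group_size
    p_lower = sum(item_correct[i] for i in lower) / group_size
    return p_upper - p_lower

totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # hypothetical total test scores
sharp_item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]          # missed by low scorers only
flat_item = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]           # answered correctly by everyone

print(discrimination_index(sharp_item, totals))
print(discrimination_index(flat_item, totals))
```

The first item separates high and low scorers perfectly and yields 1.0; the second is answered correctly by everyone and yields 0.0.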

Answer 127

internal consistency ## Footnote Cronbach's α assesses the reliability of a test by examining the average correlation among items. Higher values indicate greater internal consistency.

Answer 128

* **KR-20** is **specific to dichotomous** items. * **Cronbach’s α** is used for **continuous or ordinal data**. ## Footnote Both are measures of internal consistency, but KR-20 is used for tests with binary (right/wrong) scoring.
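
The relationship between the two coefficients can be sketched numerically. The code below assumes population variances throughout (a convention chosen for this example), with rows as test-takers and columns as items. Under that assumption the two formulas give the same value for right/wrong items. Function names and data are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha from a persons-by-items table of item scores."""
    k = len(rows[0])
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def kr20(rows):
    """KR-20 for dichotomous (0/1) items; p * (1 - p) is each item's variance."""
    n, k = len(rows), len(rows[0])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in rows) / n   # proportion answering item j correctly
        sum_pq += p * (1 - p)
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_pq / total_variance)

# Hypothetical right/wrong data: 4 test-takers, 3 items
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(answers), kr20(answers))
```

Both functions return 0.75 for this data set, which illustrates that KR-20 is the special case of Cronbach's α for binary scoring.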

Answer 129

A framework for **understanding test scores** based on the idea that each score is composed of a true score and error. ## Footnote CTT assumes that every observed score is the sum of a true score and random error, emphasizing the importance of reliability and validity.
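
Written as an equation, the decomposition looks like the following. The symbols $X$ (observed score), $T$ (true score), and $E$ (error) are labels chosen here, and the variance identity assumes the error is uncorrelated with the true score.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2}
```

Under these assumptions, reliability is the share of observed-score variance that comes from true scores rather than error.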

Answer 130

**Modern test** theory focuses on **item-level data** and models the **probability of a response** given various item and person parameters. ## Footnote Also known as item response theory (IRT), it allows for more precise measurement and analysis across different populations and test forms.

Answer 131

**Standards** derived from a large group used to interpret individual test scores. ## Footnote Norms provide a context for understanding where an individual's score falls relative to a representative sample, aiding in meaningful interpretation.

Answer 132

It ensures that testing conditions are **consistent** and results are **comparable** across different administrations. ## Footnote Standardization reduces variability unrelated to the construct being measured, enhancing the reliability and validity of test results.

Answer 133

* Administration instructions * Scoring procedures * Normative data * Reliability and validity evidence ## Footnote A comprehensive test manual helps ensure standardized administration and accurate interpretation of test results.

Answer 134

**Test bias** occurs when a test systematically **disadvantages** certain groups, whereas **fairness** involves **equitable** treatment and outcomes for all examinees. ## Footnote Bias is a statistical property, while fairness is a broader social concept. A fair test minimizes bias and ensures valid results for all demographic groups.

Answer 135

* **Factorial designs** involve **more than one** independent variable. * **Simple designs** involve only **one independent** variable. ## Footnote Factorial designs allow researchers to investigate the interaction effects between multiple variables, providing a more comprehensive understanding of complex phenomena.

Answer 136

* **Longitudinal** studies track the **same** participants **over time**. * **Cross-sectional** studies analyze data from participants at a **single point in time**. ## Footnote Longitudinal studies are valuable for observing developmental changes and causality, while cross-sectional studies are efficient for examining differences across age groups or demographics.

Answer 137

True ## Footnote Mixed-methods research integrates both qualitative and quantitative data to provide a more complete understanding of research questions, leveraging the strengths of both methodologies.

Answer 138

single subject (or case) ## Footnote Single-case designs are often used in clinical and applied settings to observe the effects of an intervention on an individual, allowing for detailed analysis and customization of treatment.

Answer 139

History ## Footnote History refers to external events that occur during the course of a study that could influence participants' behavior or responses, potentially confounding the results.

Answer 140

* changes within participants over time * Maturation involves natural changes that occur within participants over the course of a study, such as aging or learning, which can affect the outcomes independently of the experimental treatment. ## Footnote Maturation can be controlled by including a control group, which helps differentiate changes due to the experimental manipulation from those occurring naturally.

Answer 141

* Purpose of the research * Procedures involved * Risks and benefits * Confidentiality details * Voluntary participation * Contact information for questions ## Footnote Informed consent is essential to respect participants' autonomy and ensure they understand what participation entails, allowing them to make an informed decision about their involvement.

Answer 142

True ## Footnote Anonymity ensures that participants' identities are not linked to their data, enhancing the privacy and security of sensitive information.

Answer 143

* **Confidentiality** means the researcher **knows** the participants' identities but keeps them **private**. * **Anonymity** means even the researcher **does not know** the participants' identities. ## Footnote Confidentiality requires robust data protection measures to prevent unauthorized access, maintaining trust between researchers and participants.

Answer 144

explanation ## Footnote Debriefing provides participants with comprehensive information about the study, helping to alleviate any potential misconceptions and offering closure regarding their involvement.

Answer 145

* Secure storage of test materials * Controlled access to tests * Regular monitoring of test use * Training for test administrators ## Footnote Test security is crucial to uphold the validity and reliability of assessments, preventing unauthorized access and misuse that could compromise results.

Answer 146

* Inaccurate diagnoses * Unfair treatment decisions ## Footnote Test misuse can lead to harmful outcomes for individuals, including misinformed clinical decisions and biased employment or educational opportunities.

Answer 147

A range of values derived from sample data that is likely to contain the true population parameter. ## Footnote Confidence intervals provide an estimated range of values that is believed to contain the population parameter with a certain level of confidence, usually 95% or 99%.
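
A rough sketch of a confidence interval for a mean, using the normal approximation (mean plus or minus a critical z value times the standard error). With small samples a t distribution is normally used instead, so treat this as illustrative only; the function name and sample are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    critical_z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    standard_error = stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - critical_z * standard_error, center + critical_z * standard_error

sample = [98, 102, 100, 101, 99, 100, 103, 97]   # hypothetical scores, mean = 100
low_95, high_95 = mean_confidence_interval(sample, 0.95)
low_99, high_99 = mean_confidence_interval(sample, 0.99)
print(round(low_95, 2), round(high_95, 2))
print(round(low_99, 2), round(high_99, 2))
```

The 99% interval is wider than the 95% interval: more confidence requires a larger range.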

Answer 148

2; 27 ## Footnote In an ANOVA report, the numbers in parentheses represent the degrees of freedom for the effect (first number) and the degrees of freedom for the error (second number).
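
For a one-way between-subjects ANOVA, these two numbers follow from the number of groups and the total number of participants. The sketch below assumes that design; the function name and the group and participant counts are hypothetical.

```python
def anova_degrees_of_freedom(num_groups, total_participants):
    """Return (effect df, error df) for a one-way between-subjects ANOVA."""
    df_effect = num_groups - 1                   # between-groups degrees of freedom
    df_error = total_participants - num_groups   # within-groups degrees of freedom
    return df_effect, df_error

print(anova_degrees_of_freedom(num_groups=3, total_participants=30))
```

Three groups with 30 participants in total would be reported with degrees of freedom of 2 and 27.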

Answer 149

True ## Footnote (R²) values range from 0 to 1, where 0 indicates no explanatory power and 1 indicates perfect explanation of the variability of the dependent variable by the independent variables.

Answer 150

* The size of the effect * The difference between two means in terms of standard deviation ## Footnote Cohen's d is a measure of effect size used to indicate the standardized difference between two means. It is important for understanding the practical significance of research findings.
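
A minimal sketch of Cohen's d, assuming the pooled-standard-deviation version for two independent groups. The function name and the group data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between two group means in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = ((n_a - 1) * variance(group_a) +
                       (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)

treatment = [6, 7, 8, 9, 10]   # hypothetical scores, mean = 8
control = [4, 5, 6, 7, 8]      # hypothetical scores, mean = 6
print(round(cohens_d(treatment, control), 2))
```

The means differ by 2 points and the pooled standard deviation is about 1.58, giving d of about 1.26.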

Answer 151

It measures the proportion of total variance that is attributable to an effect. ## Footnote Eta squared is a measure of effect size for ANOVA that indicates the proportion of the total variability in the dependent variable that is associated with the factor under consideration.
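
As a sketch for a one-way design, eta squared can be computed as the between-groups sum of squares divided by the total sum of squares. The function name and data are hypothetical.

```python
def eta_squared(groups):
    """Between-groups sum of squares divided by total sum of squares."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)
    ss_between = sum(len(group) * (sum(group) / len(group) - grand_mean) ** 2
                     for group in groups)
    return ss_between / ss_total

conditions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical scores in three conditions
print(eta_squared(conditions))
```

Here the between-groups sum of squares is 54 out of a total of 60, so the factor accounts for about 90% of the variability.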

You will be able to interpret core research principles and statistical methods used in psychological science, including psychometric assessment and ethical standards. (182 cards)