220 final exam last chap Flashcards
(36 cards)
What does statistical conclusion validity refer to?
A. Accuracy of measuring instruments
B. Whether we used qualitative data
C. Reasonableness of conclusions about variable relationships
D. Whether our sample matches the population
C. Reasonableness of conclusions about variable relationships
➡️ Is there enough statistical evidence to say, “Yes, these variables are related”?
Statistical conclusion validity is about using proper statistical methods to determine whether your data truly supports a relationship between the IV and DV, or if it’s just a result of random chance.
It’s like ensuring your data doesn’t lead you to the wrong conclusion by applying the right statistical tests, having an appropriate sample size, and avoiding errors like:
Type I errors (false positives)
Type II errors (false negatives)
What are the two possible conclusions in statistical analysis?
A. One variable caused the other OR there’s no effect
B. Correlation is weak OR strong
C. We used descriptive OR inferential statistics
D. There is a relationship OR there isn’t a relationship
D. There is a relationship OR there isn’t a relationship
If you’re studying the relationship between study hours and test scores, you could conclude that there is a relationship (study hours affect test scores) or there isn’t a relationship (study hours do not impact test scores).
Conclusion validity checks if the math (stats) shows a real result or just random chance.
Construct validity checks if you’re measuring what you actually meant to measure.
Internal validity checks if your result was truly caused by your variable, not something else.
External validity checks if your results can apply to other people or situations.
What is a Type I error?
A. Failing to detect a real relationship
B. Concluding a relationship exists when it doesn’t
C. Mixing up descriptive and inferential stats
D. Ignoring statistical power
B. Concluding that a relationship exists when it doesn’t
A study looks at whether gang programs reduce youth crime.
The results show an effect, so they say the program works.
But in truth, the program does not help; the apparent effect was just random chance.
What is a Type II error?
A. Concluding no relationship exists when there actually is one
B. Assuming causation from correlation
C. Using the wrong variables
D. Using descriptive stats for multivariate data
A. Concluding no relationship exists when there actually is one
A researcher studies whether police patrols reduce car thefts.
They find no significant effect, so they say patrols don’t help.
But in reality, patrols do reduce thefts — the study just didn’t have enough data.
✅ A real relationship exists
❌ The study failed to detect it
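The two error types can be laid out as a small decision table. The sketch below is only an illustration: the function name `classify_conclusion` and its two parameters are made up for this card, and in real research the "truth" is usually unknown.

```python
def classify_conclusion(study_found_relationship: bool, relationship_is_real: bool) -> str:
    """Label a study's conclusion by comparing it with the (usually unknown) truth."""
    if study_found_relationship and not relationship_is_real:
        return "Type I error (false positive)"
    if not study_found_relationship and relationship_is_real:
        return "Type II error (false negative)"
    return "Correct conclusion"


# Police patrol example: patrols really reduce thefts, but the study found no effect.
print(classify_conclusion(study_found_relationship=False, relationship_is_real=True))
# Type II error (false negative)
```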
Type II Threats (The Haystack Problem)
In the “needle in a haystack” analogy, what does the needle represent?
A. Sample size
B. External validity
C. The true relationship you’re trying to detect
D. Random assignment
C. The true relationship you’re trying to detect
The needle: the relationship you are trying to see
The haystack: the “noise” that obscures your vision
Type I Threat: Searching for patterns randomly can make you find false relationships.
Example: Finding a fake link between eating chocolate and happiness because you didn’t have a clear plan.
Type II Threat: A weak study design can make you miss real relationships.
Example: Missing the effect of a drug because your study had too small a sample.
Which of the following is NOT a source of noise in statistical conclusion validity?
A. Random heterogeneity of participants
B. Poor implementation fidelity
C. High statistical power
D. Low reliability of measures
C. High statistical power
The actual sources of noise are listed below:
Low reliability of measures
Poor implementation fidelity
Random irrelevancies in the setting
Random heterogeneity of participants
Insufficient statistical power
Statistical noise = Random variations in data due to factors that aren’t part of the study, like individual differences, measurement errors, or other variables that affect the outcome.
What does insufficient statistical power affect?
A. The central tendency of the data
B. Ability to detect the needle in the haystack
C. Calculation of percentiles
D. External validity
B. Ability to detect the needle in the haystack
Haystack = noise/sample size/variability
Small needle in a big haystack = hard to find = low statistical power
Big needle = easy to find -> high statistical power
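The needle-and-haystack idea can be put into rough numbers. The sketch below is not from the course: it assumes a two-sided one-sample z-test and a standardized effect size, uses a normal approximation, and the name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate chance of detecting an effect (assumed two-sided z-test)."""
    z = NormalDist()                       # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)      # cutoff for significance
    shift = effect_size * sqrt(n)          # bigger needle or bigger sample = bigger shift
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Small needle vs. big needle, same sample size
print(round(approx_power(0.2, 50), 2), round(approx_power(0.5, 50), 2))
# Same small needle, larger sample
print(round(approx_power(0.2, 200), 2))
```

Under these assumptions, power rises when the effect is larger or when the sample is larger, which matches the card: a big needle is easy to find.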
If a result is significant at p < .05, what does this mean?
A. There’s less than a 5% chance the result is due to luck
B. It’s more than 50% true
C. It’s always generalizable
D. The test was one-tailed
A. There’s less than a 5% chance the result is due to luck
A p-value of 0.05 or less indicates a statistically significant result. More precisely, if there were really no relationship, results at least this strong would turn up by random chance less than 5% of the time.
If the p-value is less than 0.05 (p < 0.05), the result is statistically significant (unlikely to be chance alone).
If the p-value is greater than 0.05 (p > 0.05), the result is not statistically significant (it could be due to random variation).
Which of the following are conventional levels of significance (the standard cutoffs for deciding if a result is statistically significant)?
A. .75, .25, .10
B. .05, .01, .001
C. .95, .85, .50
D. .03, .07, .15
B. .05, .01, .001
A conventional level of significance refers to the commonly accepted thresholds that researchers use to decide if a result is statistically significant.
So if the p-value is at or below the chosen level (for example, .05), the result is considered statistically significant.
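As a quick check, a p-value can be compared against each conventional level. This is a minimal sketch: `levels_met` is a made-up helper, and treating "at or below" the cutoff as significant is an assumption (some texts use strictly below).

```python
CONVENTIONAL_LEVELS = (0.05, 0.01, 0.001)


def levels_met(p_value: float) -> list:
    """Return the conventional significance levels that this p-value satisfies."""
    return [alpha for alpha in CONVENTIONAL_LEVELS if p_value <= alpha]


print(levels_met(0.03))    # [0.05]
print(levels_met(0.0004))  # [0.05, 0.01, 0.001]
print(levels_met(0.20))    # []
```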
What is a Type I threat often caused by?
A. Small sample sizes
B. Data fishing (running too many tests)
C. Using qualitative variables
D. High statistical power
B. Data fishing (running too many tests)
You say there is a relationship when really there isn’t one (a false positive).
Out of many tests, one might show a result that looks promising (typically considered “significant”) just by chance.
The researcher might think this is a real finding because it’s what they were looking for.
Running many tests this way increases the risk of a Type I error.
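The risk from data fishing can be estimated with simple arithmetic. The sketch below assumes the tests are independent and that no real relationships exist; the function name is hypothetical.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests independent tests looks
    'significant' purely by luck, when no real relationship exists."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 2))
# 1 0.05
# 5 0.23
# 20 0.64
```

With 20 tests at the .05 level, there is roughly a 64% chance of at least one false positive under these assumptions.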
What is the main purpose of descriptive statistics?
A. Summarize or describe sample data
B. Prove a hypothesis
C. Generalize to populations
D. Predict future events
A. Summarize or describe sample data
What is the purpose of inferential statistics?
A. Clean the data
B. Find frequency distributions
C. Make conclusions about populations from sample data
D. Create visual charts
C. Make conclusions about populations from sample data
Inferential statistics use a sample of data to make generalizations or predictions about a larger population. They help researchers go beyond the data at hand and draw conclusions about a broader group.
They let researchers study large populations without needing to ask every single person (sampling).
What does “univariate” analysis focus on?
A. Two variables
B. One variable
C. No variables
D. Three or more variables
B. One variable
Univariate analysis focuses on just one variable at a time.
You’re not comparing it to any other variable.
You collect data on students’ test scores and look at the average score.
That’s univariate — you’re only analyzing test scores, nothing else.
What are the three main features of univariate analysis?
A. Distribution, central tendency, dispersion
B. Sampling, testing, plotting
C. Normality, reliability, validity
D. Mean, correlation, frequency
A. Distribution, central tendency, dispersion
Distribution: How often each value occurs (the overall shape of the data).
Central Tendency: The “center” of the data (mean, median, mode).
Dispersion: How spread out the data is (range, variance, standard deviation).
Imagine two classes of students with their test scores:
Class A: 90, 91, 92, 93, 94
Class B: 60, 70, 80, 90, 100
Distribution: Both classes have scores, but Class A’s scores are closely packed around the 90s, while Class B’s scores are more spread out across a larger range.
Dispersion: Class B has higher dispersion because their scores vary more widely (from 60 to 100), while Class A has low dispersion because the scores are tightly grouped together.
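The two classes can be compared directly. This is a small sketch using Python's standard `statistics` module; the `describe` helper is made up, and dividing by N (population standard deviation) is an assumption, since some courses divide by N − 1.

```python
from statistics import mean, pstdev

class_a = [90, 91, 92, 93, 94]
class_b = [60, 70, 80, 90, 100]


def describe(scores):
    """Summarize central tendency (mean) and dispersion (range, SD)."""
    return {
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "sd": round(pstdev(scores), 2),  # population SD (divides by N)
    }


print("Class A:", describe(class_a))
print("Class B:", describe(class_b))
```

Class A has a range of 4 and an SD of about 1.41, while Class B has a range of 40 and an SD of about 14.14, so Class B is far more dispersed.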
What is the mean?
A. Middle number in a list
B. Most frequent value
C. Average of all values
D. Difference between max and min
C. Average of all values
What is the median for this list: 0, 2, 3, 3, 4, 4, 6?
A. 2
B. 3
C. 3.5
D. 4
B. 3
What is the mode in this list: 0, 2, 3, 3, 4, 4, 4, 6?
A. 3
B. 2
C. 4
D. 6
C. 4
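The median and mode answers above can be double-checked with Python's standard `statistics` module (a quick sketch, not part of the course material):

```python
from statistics import median, mode

median_list = [0, 2, 3, 3, 4, 4, 6]
mode_list = [0, 2, 3, 3, 4, 4, 4, 6]

print(median(median_list))  # 3  (the 4th of 7 sorted values)
print(mode(mode_list))      # 4  (appears three times)
```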
In a normal distribution, which statement is true?
A. Mean is higher than median
B. Mode is the lowest value
C. Mean = Median = Mode
D. There’s no dispersion
C. Mean = Median = Mode
Imagine you measured the heights of 100 people. If their heights form a normal distribution:
The mean height will be the same as the median height (middle value) and the mode (most common height).
Why might the mean be misleading?
A. It’s hard to calculate
B. It can be affected by outliers
C. It doesn’t apply to normal data
D. It’s always the same as the median
B. It can be affected by outliers (unusually extreme values)
Suppose you are looking at the number of burglaries in 5 neighborhoods:
Neighborhood 1: 5 burglaries
Neighborhood 2: 7 burglaries
Neighborhood 3: 6 burglaries
Neighborhood 4: 4 burglaries
Neighborhood 5: 50 burglaries (this is the outlier)
Calculation of Mean:
Add up the burglaries: 5 + 7 + 6 + 4 + 50 = 72 burglaries
Divide by the number of neighborhoods (5): 72 ÷ 5 = 14.4 burglaries
The mean number of burglaries is 14.4, but most neighborhoods have far fewer burglaries. Neighborhood 5 has an unusually high number of burglaries, which pulls the mean up, making it seem like burglaries are higher across all neighborhoods than they actually are.
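The burglary example can be reproduced in a few lines. The comparison with the median and with the outlier removed is an added illustration, not part of the original card.

```python
from statistics import mean, median

burglaries = [5, 7, 6, 4, 50]  # Neighborhood 5 (50) is the outlier

print(mean(burglaries))    # 14.4
print(median(burglaries))  # 6

# Dropping the outlier shows how strongly it pulls the mean upward.
without_outlier = [b for b in burglaries if b != 50]
print(mean(without_outlier))  # 5.5
```

The median (6) stays close to the typical neighborhood, which is why it is often reported alongside the mean when outliers are present.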
What is the range?
A. The average of all scores
B. The difference between highest and lowest values
C. The middle value
D. The most common value
B. The difference between highest and lowest values
What does a percentile rank show?
A. What percent of scores fall at or below a value
B. How tall the histogram is
C. The SD of scores
D. The average error
A. What percent of scores fall at or below a value
A percentile rank tells you what percentage of people scored lower than or the same as you.
Imagine you are in a criminology class with 100 students, and you get a score of 80 on a test.
If your percentile rank is 90, it means 90% of the students scored lower than or the same as you.
So, you scored as well as or better than 90 out of 100 students.
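A percentile rank can be computed by counting the scores at or below a value. The sketch below uses a hypothetical list of 10 scores (not the 100-student class above) and a made-up helper name.

```python
def percentile_rank(scores, value):
    """Percent of scores that fall at or below the given value."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


# Hypothetical class scores, invented for illustration only
class_scores = [55, 60, 65, 70, 72, 75, 78, 80, 85, 90]

print(percentile_rank(class_scores, 80))  # 80.0 (8 of 10 scores are at or below 80)
```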
Which is NOT a valid quartile range?
A. 0–25%
B. 26–50%
C. 51–75%
D. 60–100%
D. 60–100%
Quartiles divide data into four equal parts, each representing a range of 25% of the data. These ranges are:
0–25%: The first quartile (Q1), or the lower 25% of the data.
26–50%: The second quartile (Q2), or the second 25% of the data (its upper boundary is the median).
51–75%: The third quartile (Q3), or the third 25% of the data.
76–100%: The fourth quartile (Q4), or the top 25% of the data.
So D is not valid: 60–100% covers 40% of the data and does not match any of the four 25% ranges.
Standard Deviation (Steps)
What is the first step to compute standard deviation?
A. Find each value’s distance from the mean
B. Calculate the median
C. Multiply the mean by the number of values
D. Divide total by 100
A. Find each value’s distance from the mean
Why do we square the deviations when calculating SD?
A. To prevent negative values from canceling out
B. To make values easier
C. To find the median
D. Because it’s required by SPSS
A. To prevent negative values from canceling out
Deviation = Score − Mean
8 − 10 = −2
(−2)^2 = 4
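The standard deviation steps can be walked through in code. The three scores below are hypothetical, chosen so the mean is 10 and the first deviation matches the 8 − 10 example. Dividing by N is an assumption; some courses divide by N − 1.

```python
from math import sqrt

scores = [8, 10, 12]  # hypothetical scores with a mean of 10

mean = sum(scores) / len(scores)         # find the mean first
deviations = [x - mean for x in scores]  # each value's distance from the mean
squared = [d ** 2 for d in deviations]   # square so negatives don't cancel out
variance = sum(squared) / len(scores)    # average squared deviation (divides by N)
sd = sqrt(variance)                      # square root returns to the original units

print(deviations)       # [-2.0, 0.0, 2.0]
print(sum(deviations))  # 0.0, which is why squaring is needed
print(squared)          # [4.0, 0.0, 4.0]
print(round(sd, 2))     # 1.63
```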