High School Statistics Flashcards

Learning High School Statistics (149 cards)

1
Q

The conditional probability formula

A

P(A∣B)= P(A∩B)/P(B)

2
Q

When do we use conditional probability?

A

Conditional probability is used when the occurrence of one event affects the probability of another event. It helps answer questions like:

“What is the probability of A, if we already know B happened?”

3
Q

When are two events considered dependent?

A

When P(A|B) does not equal P(A)
When the occurrence of one affects the probability of the other.

4
Q

How do you check for dependence?

A

Calculate P(A|B)
Compare it to P(A)
If P(A|B) = P(A), the events are independent.
If P(A|B) does not equal P(A), the events are dependent
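
The check above can be sketched in Python. This is a minimal sketch using a hypothetical example that is not part of the original cards: one card drawn from a standard 52-card deck, with A = "the card is a king" and B = "the card is a heart".

```python
from fractions import Fraction

# Hypothetical example (not from the cards): draw one card from a 52-card deck.
# A = "the card is a king", B = "the card is a heart".
p_a = Fraction(4, 52)        # 4 kings in the deck
p_b = Fraction(13, 52)       # 13 hearts in the deck
p_a_and_b = Fraction(1, 52)  # only the king of hearts is both

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

# The events are independent exactly when P(A|B) equals P(A).
independent = p_a_given_b == p_a
print(p_a_given_b, p_a, independent)  # 1/13 1/13 True
```

Exact fractions are used so the equality test is not affected by floating-point rounding.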

5
Q

What is joint probability?

A

P(A and B), also written P(A∩B): the probability that A and B both occur.
It focuses on the overlap between the two events.

6
Q

What is the mean?

A

The mean is the sum of the values divided by the number of values in a data set. It represents the “average” score.

7
Q

What is the median?

A

The median is the middle point in an ordered data set.
If there is an even number of data points, the median is the average of the middle two points.

8
Q

How do you determine if the mean or the median is the best measure of center?

A

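The “best” measure should be representative of a “typical” score in a data set. Use the mean when the distribution is roughly symmetric with no extreme outliers; use the median when the distribution is skewed or has outliers.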

9
Q

What is IQR?

A

The Interquartile Range (IQR) describes the spread of the middle 50% of a dataset.

10
Q

Visualize how to calculate IQR

A

Consider the dataset: [1,3,5,7,9,11,13]
Step 1: Order the Data:
Data is already in ascending order: [1,3,5,7,9,11,13]
Step 2: Find the Quartiles:
Median (Q2): Middle value = 7.
Lower half: [1,3,5]
Median of lower half (Q1) = 3.
Upper half: [9,11,13]
Median of upper half (Q3) = 11
Step 3: Calculate IQR:
IQR=Q3−Q1=11−3=8
So, the IQR is 8.
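
A minimal Python sketch of the median-of-halves method used above. The helper name quartiles is hypothetical, and calculators or software may use other quartile conventions that give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    When the number of values is odd, the middle value is left out
    of both halves, as in the worked example on the card.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 3 7 11 8
```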

11
Q

Mode

A

The mode is the value that occurs most frequently in the dataset. A dataset can:

Have no mode (if all values occur equally often).
Be unimodal (one mode).
Be bimodal (two modes) or multimodal (more than two modes).

12
Q

How do you determine outliers in a dataset?

A

Outliers are data points that fall outside this range:

Lower Bound: Q1−1.5⋅IQR
Upper Bound: Q3+1.5⋅IQR
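
A short Python sketch of these bounds. It assumes quartiles Q1 = 3 and Q3 = 11, as in the dataset [1,3,5,7,9,11,13]; the candidate values being tested are hypothetical.

```python
def outlier_bounds(q1, q3):
    """Return the (lower, upper) bounds from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Assumed quartiles: Q1 = 3 and Q3 = 11, so IQR = 8.
lower, upper = outlier_bounds(3, 11)

# Hypothetical values to test against the bounds.
candidates = [-10, 0, 13, 30]
outliers = [x for x in candidates if x < lower or x > upper]
print(lower, upper, outliers)  # -9.0 23.0 [-10, 30]
```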

13
Q

Sample variance

A

A measure of the spread or variability of data in a sample. It is roughly the average squared distance of the data points in a sample from the sample mean.

14
Q

Steps to calculating variance

A

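Example dataset: [4,8,6,10,12]

Find the mean: (4+8+6+10+12)/5 = 8
Calculate the deviations (DP − mean): −4, 0, −2, 2, 4
Square the deviations: 16, 0, 4, 4, 16
Sum the squared deviations: 40
Divide by n−1 (sample) or n (population): 40/4 = 10 or 40/5 = 8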
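
The same steps can be sketched in Python for the dataset [4,8,6,10,12], with the standard library's statistics.variance and statistics.pvariance used as a cross-check. Variable names are illustrative.

```python
from statistics import pvariance, variance

data = [4, 8, 6, 10, 12]

mean = sum(data) / len(data)            # step 1: find the mean
deviations = [x - mean for x in data]   # step 2: data point minus mean
squared = [d ** 2 for d in deviations]  # step 3: square the deviations
total = sum(squared)                    # step 4: sum the squared deviations

sample_variance = total / (len(data) - 1)  # step 5: divide by n - 1 (sample)
population_variance = total / len(data)    # or divide by n (population)

print(mean, total, sample_variance, population_variance)  # 8.0 40.0 10.0 8.0
print(variance(data), pvariance(data))  # library cross-check of both results
```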

15
Q

Interpretation of Sample Variance

A

Small variance: The data points are close to the mean, indicating low variability.
Large variance: The data points are spread out from the mean, indicating high variability.

16
Q

Standard Deviation

A

The standard deviation is the square root of the variance. It measures spread in the same units as the data.

17
Q

When to use and advantages of mean and standard deviation

A

When to Use:
Best for symmetrical distributions without extreme outliers.
Provides a complete summary of the dataset, using all data values.
Advantages:
Captures the overall pattern of the data.
Good for normal (bell-shaped) distributions.
Disadvantages:
Sensitive to outliers: A single extreme value can greatly affect the mean and standard deviation.

18
Q

When to use Median and Interquartile Range (IQR)

A

When to Use:
Best for skewed distributions or datasets with outliers.
Resistant to outliers because it focuses on the middle portion of the data.
Disadvantages:
Ignores extreme values and doesn’t use all the data.

19
Q

Percentile

A

Tells us what percent of observations are less than or equal to a given value in a distribution.

20
Q

Visualize how to calculate the 25th percentile for
3,8,7,5,12

A

Order the dataset: 3,5,7,8,12
Calculate the percentile position = (P/100) x (n + 1)
P = 25, n = 5 (the number of data points)
(25/100) x (5 + 1) = 1.5
Position 1.5 falls halfway between the 1st and 2nd data points
The 1st data point is 3 and the 2nd data point is 5, so interpolate:
lower value + (fraction x (difference between values))
3 + 0.5 x (5 − 3) = 3 + 1 = 4
The 25th percentile is 4.
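
A minimal Python sketch of this position-and-interpolate method. The function name percentile is hypothetical, and other percentile conventions exist that can give slightly different answers.

```python
def percentile(data, p):
    """Percentile using position = (p / 100) * (n + 1) with linear interpolation."""
    values = sorted(data)
    n = len(values)
    position = (p / 100) * (n + 1)    # 1-based position in the ordered data
    lower_rank = int(position)        # whole-number part of the position
    fraction = position - lower_rank  # how far to move toward the next value

    # Positions outside the data are clamped to the smallest or largest value.
    if lower_rank < 1:
        return values[0]
    if lower_rank >= n:
        return values[-1]

    lower_value = values[lower_rank - 1]
    upper_value = values[lower_rank]
    return lower_value + fraction * (upper_value - lower_value)

result = percentile([3, 8, 7, 5, 12], 25)
print(result)  # 4.0
```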

21
Q

Z-score

A

A z-score (or standard score) measures how many standard deviations a data point is from the mean of a dataset.

z = (x - mean)/standard deviation

22
Q

z-score formula

A

z = (dp - mean)/standard deviation
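
A small Python sketch of the formula. The numbers (a score of 85 in a distribution with mean 75 and standard deviation 5) are hypothetical.

```python
def z_score(data_point, mean, standard_deviation):
    """How many standard deviations the data point is from the mean."""
    return (data_point - mean) / standard_deviation

# Hypothetical values: score 85, mean 75, standard deviation 5.
z = z_score(85, 75, 5)
print(z)  # 2.0, meaning 2 standard deviations above the mean
```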

23
Q

Interpreting z-scores

A

Positive z-score: Above the mean.
Negative z-score: Below the mean.
Outliers often have z-scores greater than 3 or less than −3.
A z-score of 0 indicates the value is exactly the mean.

24
Q

Empirical rule

A

For a bell-shaped (approximately normal) distribution:
About 68% of the data falls within 1 standard deviation of the mean
About 95% of the data falls within 2 standard deviations of the mean
About 99.7% of the data falls within 3 standard deviations of the mean
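
These percentages can be checked against an exact normal distribution. The sketch below uses Python's statistics.NormalDist; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # 0.6827, 0.9545, 0.9973
```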

25
Marginal distribution
Marginal distributions show the totals or proportions for one variable, summed over all categories of the other variable(s).
26
Conditional distributions
Focuses on the distribution of one variable given a specific condition on another variable.
27
What is a standard normal table?
A z-table, which provides the cumulative probabilities for the standard normal distribution.
28
What does z-score = 1.28 mean?
Look up z = 1.28 in the z-table. The table shows P(Z < 1.28) = 0.8997. Interpretation: about 89.97% of the data lies below z = 1.28.
29
What does P (Z > 1.28) mean?
P(Z > 1.28) = 1 - P(Z < 1.28) = 1 - 0.8997 = 0.1003. Interpretation: about 10.03% of the data lies above z = 1.28.
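The table lookups for z = 1.28 can be reproduced in Python. This sketch uses statistics.NormalDist in place of a printed z-table, so values agree with the table only up to rounding.

```python
from statistics import NormalDist

z = 1.28
below = NormalDist().cdf(z)  # P(Z < 1.28), the value a z-table reports
above = 1 - below            # P(Z > 1.28), by the complement rule

print(round(below, 4), round(above, 4))  # 0.8997 0.1003
```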
30
What are thresholds for low percentiles
Usually the cutoff value below which 5% (z= -1.645) or 10% (z = -1.28) of the data lies.
31
What is the slope of a regression line?
In y = mx + b, the slope m represents the rate of change: how much the dependent variable (y) changes for each one-unit increase in the independent variable (x) in a linear relationship.
32
If y=2x+10 represents a company's sales (y) based on advertising (x), what does it mean?
Slope (m = 2): for every $1 increase in advertising spending, sales are predicted to increase by $2. Y-intercept (b = 10): if advertising is $0, sales are expected to be $10.
33
The correlation coefficient (r)
A statistical measure that quantifies the strength and direction of the linear relationship between two variables.
34
Define the range for Correlation Coefficient (r)
r=+1: Perfect positive linear relationship. As one variable increases, the other increases proportionally. r=−1: Perfect negative linear relationship. As one variable increases, the other decreases proportionally. r=0: No linear relationship between the variables.
35
What is the residual
The difference between the observed value (y) and the predicted value (ŷ) from a regression line. Residual = y − ŷ
36
Define y > predicted
The observed value is above the predicted value, so the residual (y − ŷ) is positive.
37
Define y < predicted
The observed value is below the predicted value, so the residual (y − ŷ) is negative.
38
Explanatory variable
An explanatory variable (also called an independent variable) is the variable in a study or analysis that is used to explain, predict, or influence changes in another variable
39
Response variable
The response variable (dependent) is the "output" or result that depends on the explanatory variable
40
Treatments
Treatments are the specific conditions or procedures that researchers apply to the subjects or units being studied.
41
How do you convert a percent to a decimal?
Divide by 100
42
When you have a total and want to find the part
You multiply. 10 * 25% = 10 * (1/4) = 2.5
43
When you have a part and want to find the total.
You divide 10 / 25% = 10 / (1/4) = 10 * 4/1 = 40
44
Wait times are approximately normal with a mean = 185 sec and a SD = 11 sec. Amelia will only stand for an avg wait time in the bottom 10%. What's the max avg wait time Amelia will wait?
The bottom 10 percent corresponds to a z-score of about -1.28. z-score = (DP - mean) / standard deviation, so -1.28 = (DP - 185)/11 and DP = 185 - 1.28 × 11 = 170.92 sec.
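A Python sketch of the same calculation. It shows the table-based answer next to the value from statistics.NormalDist.inv_cdf, which uses an unrounded z-score and so differs slightly in the second decimal place.

```python
from statistics import NormalDist

mean, sd = 185, 11

# Table-based approach: z is about -1.28 for the bottom 10%.
table_cutoff = mean + (-1.28) * sd

# Software approach: ask the normal distribution for its 10th percentile.
exact_cutoff = NormalDist(mu=mean, sigma=sd).inv_cdf(0.10)

print(round(table_cutoff, 2), round(exact_cutoff, 2))  # 170.92 170.9
```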
45
Populations vs. samples
The population is the entire group, while the sample is the part of the population that we collect data from.
46
Observational studies
Collect data to discover relationships without assigning individuals to groups or imposing treatments. They can show associations between variables, but cannot establish cause and effect because of possible confounding variables.
47
Experiments
Random assignment of individuals to treatment and control groups. Experiments can support causal conclusions because random assignment helps control for confounding variables.
48
Retrospective studies
Observe data that already exists for a sample of individuals
49
Sample Surveys
Collect data from a sample of individuals to learn about the population at that point in time
50
Prospective studies
Follow individuals into the future and record data over time
51
Response bias
When people are systematically dishonest when answering
52
Undercoverage
When some members of the population are systematically left out of the sampling process, so they have no chance of being chosen for the sample
53
Non response
When people chosen for the sample cannot be reached or refuse to participate
54
Voluntary response
When people choose for themselves whether to participate in a study, rather than being randomly selected
55
Biased wording
When survey questions cause people to favor certain responses over others
56
Simple random sample
A sample of n individuals is chosen from the population so that every possible group of n individuals has an equal chance of being the sample
57
Stratified random sample
Dividing the population into strata and then sampling individuals from each group. A stratified random sample is helpful when we expect that there may be substantial differences between the groups.
58
Systematic random sample
We put the population into an ordered arrangement, randomly select a starting individual from the first k individuals, and then choose every kth individual after that
59
Cluster random sample (sampling method)
We divide the population into groups called clusters and randomly choose some of the clusters. Every member of each chosen cluster is included in the sample.
60
What is the addition rule of probability: You go to an ice cream shop. The probability of choosing chocolate is 0.3, and vanilla is 0.5. What’s the chance someone picks either chocolate or vanilla?
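The addition rule applies to both mutually exclusive events (can't happen at the same time) and non-mutually exclusive events (can happen at the same time). ME: P(A or B) = P(A) + P(B). NME: P(A or B) = P(A) + P(B) - P(A and B). If each person picks a single flavor, chocolate and vanilla are mutually exclusive, so P(chocolate or vanilla) = 0.3 + 0.5 = 0.8.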
61
How do you calculate the probability of two dependent events?
P(A and B) = P(A) * P(B|A), where P(B|A) is the probability that B occurs given that A has occurred.
62
How do you calculate the probability of two independent events?
P(A and B) = P(A) * P(B). B occurs with the same probability regardless of A; A does not influence B.
63
What's the formula for permutations
nPr = n!/(n-r)! Order matters in permutations. nPr is the number of unique arrangements of r items chosen from a set of n.
64
What's the formula for combinations
nCr = n!/(r!(n-r)!) Order does not matter in combinations.
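Both counting formulas can be sketched in Python. The values n = 5 and r = 3 are hypothetical, and math.perm and math.comb (Python 3.8+) are used only as a cross-check of the factorial formulas.

```python
from math import comb, factorial, perm

n, r = 5, 3  # hypothetical values: choose 3 items from 5

# Permutations: order matters, n! / (n - r)!
permutations = factorial(n) // factorial(n - r)

# Combinations: order does not matter, n! / (r! * (n - r)!)
combinations = factorial(n) // (factorial(r) * factorial(n - r))

print(permutations, combinations)  # 60 10
print(perm(n, r), comb(n, r))      # 60 10 (standard-library cross-check)
```

Each group of 3 items can be arranged in 3! = 6 orders, which is why there are 6 times as many permutations as combinations here.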
65
What is experimental probability
P(event) = # of times event occurs/total # of trials
66
Nia is 1 of 24 students in a class and the teacher is going to select 4 students as President, VP, Sec, and Treas. What's the probability that Nia is president any given month? No two students can hold two positions at once.
Does order matter? Yes, so use permutations. Strategy: (arrangements with Nia as President) / (total number of arrangements) = 23P3 / 24P4 = 1/24
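A quick Python check of this ratio, as a sketch using math.perm and exact fractions. The variable names are illustrative.

```python
from fractions import Fraction
from math import perm

# Nia is President; the other 3 offices are filled from the remaining 23 students.
favorable = perm(23, 3)

# All ways to fill the 4 offices from 24 students.
total = perm(24, 4)

probability = Fraction(favorable, total)
print(favorable, total, probability)  # 10626 255024 1/24
```

The result matches the intuition that each of the 24 students is equally likely to be President.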
67
What are order-based outcomes and how do you calculate them?
Another name for permutations. Use them when the sequence of events or items is significant and each arrangement is unique. nPr = n!/(n-r)!
68
Counting totals vs. groupings
If counting totals (like total number of heads or total number of scores), use combinations. If counting sequences like basketball scoring or passwords, use order-based methods, which are permutations.
69
Which one is it where groups must be unique and there's no repetition allowed? A. Combinations B. Permutations
A. Combinations
70
Combinations or Permutations: Arranging 3 people in line out of 5 people A. Combinations B. Permutations
B
71
Combinations or Permutations: Selecting 3 people for a team from 5 people A. Combinations B. Permutations
A Because: You're just forming a team, not assigning roles. Order doesn’t matter — picking Alice, Bob, Charlie is the same as Bob, Charlie, Alice. So you're counting groups, not arrangements.
72
Combinations or Permutations: Rolling 2 dice and considering the order of rolls A. Combinations B. Permutations
B
73
Combinations or Permutations: Rolling 2 dice and only caring about the sum A. Combinations B. Permutations
A
74
Combinations or Permutations: Creating a 3-character password from A,B,C A. Combinations B. Permutations
B
75
Combinations or Permutations: Choosing 3 letters from ABC (ignoring order) A. Combinations B. Permutations
A
76
Combinations or Permutations: Choosing 3 toppings: (Cheese,Pepperoni,Olives) is different from (Olives,Pepperoni,Cheese) A. Combinations B. Permutations
B
77
Combinations or Permutations: Choosing 3 toppings (Cheese,Pepperoni,Olives) if only the types of toppings matter, so that Cheese, Pepperoni, and Olives in any order is the same group. A. Combinations B. Permutations
A
78
Mutually Exclusive Events
Events cannot happen at the same time P (A or B) = P(A) + P(B)
79
Not Mutually Exclusive
Events can happen at the same time. P(A or B) = P(A) + P(B) - P(A ∩ B)
80
What is expected outcome E(X)?
The mean (or expected value, 𝐸(𝑋)) of a discrete random variable is the weighted average of all possible values, where the weights are the probabilities of each value. It represents the long-term average outcome if an experiment is repeated many times.
81
An insurance company uses expected value to calculate premiums. The company insures a car for $20,000. There’s a 0.01 probability the car will be stolen. Calculate the expected outcome E(X)
E(X)=20,000⋅0.01+0⋅0.99=200 The company should charge a premium higher than $200 to make a profit
82
A business must decide whether to invest in a project with uncertain profits: Profit of $50,000 with probability 0.6 Loss of $20,000 with probability 0.4 Calculate the expected outcome E(X)
E(X)=(50,000⋅0.6)+(−20,000⋅0.4)=30,000−8,000=22,000 The project has a positive expected value, so it may be worth pursuing.
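Both expected-value calculations can be sketched with one small Python helper. The function name expected_value is hypothetical, and exact fractions stand in for the decimal probabilities to avoid floating-point rounding.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Weighted average of (value, probability) pairs; probabilities must sum to 1."""
    assert sum(prob for _, prob in outcomes) == 1
    return sum(value * prob for value, prob in outcomes)

# Insurance card: pay out $20,000 with probability 0.01, otherwise $0.
insurance = expected_value([(20_000, Fraction(1, 100)), (0, Fraction(99, 100))])

# Investment card: profit $50,000 with probability 0.6, lose $20,000 with probability 0.4.
project = expected_value([(50_000, Fraction(6, 10)), (-20_000, Fraction(4, 10))])

print(insurance, project)  # 200 22000
```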
83
If Lyssa sells a certain type of light bulb in packages that each contain 24 bulbs, and the back of each package says, "The expected number of broken or defective bulbs per package is .25" Why is this statement incorrect: "If we look at 100 packages, we expect to see a total of about 250 broken or defective bulbs."
Since expected value is a long-term average, we can use it to think about the results of repeated trials. If the expected value is an average of 0.25 bad bulbs per package, then over 100 packages we'd expect 0.25 * 100 = 25 bad bulbs, not 250.
84
Interpolation
Interpolation is a method used to estimate values that fall between known data points, especially discrete data.
85
Two-way relative frequency tables
Shows us percentages rather than counts. They are good for seeing if there is an association between two variables. We can use row relative frequencies or column relative frequencies, it just depends on the context of the problem.
86
Two-way Frequency tables
Two-way tables organize data based on two categorical variables. *note not relative frequency
87
What is round-off error?
When the percentages in your two-way relative frequency table do not add up to 100 percent even though you rounded properly.
88
What number below is rounded to the hundredth? A. 92.1 B. 50.00 C. 44.553
B
89
What number below is rounded to the tenth percent? A. 93.6 B. 52.44432 C. 34
A
90
A skewed to the left distribution
The mean is to the left of the median and the mode. And there is a tail to the left. The tail to the left means there are a few extremely low values (outliers).
91
A skewed to the right distribution
The mean is to the right of the median and mode. And there is a tail to the right. The tail to the right means there are a few extremely high values (outliers).
92
Clusters
A cluster is a group of data points that are close together in a distribution. It indicates that many values are concentrated in a specific range.
93
Gaps
A gap is a large space between data points where no values exist. Gaps indicate missing values or low frequency in that range. They can suggest a break or divide in the data.
94
Peaks
A peak is a high point in a distribution where many values occur. It represents the mode (most frequent value or range).
95
What do you think about when creating a chart?
Do you want to show the center (median)? Box plot. Do you want to show counts of objects? Histogram, bar chart. Do you want to show distribution or spread? Box plot, dot plot. Do you want to show the mode? Dot plot.
96
What does the IQR represent, and how is it shown in a boxplot?
The IQR is the width of the interval that contains the middle 50 percent of the data. In a box plot, it is the length of the box: the distance between the left edge (Q1) and the right edge (Q3).
97
Right Skewed vs. Left Skewed
We say a distribution is skewed if one side of its graph is much longer than the other side. The direction of the skewness (left or right) is toward the longer tail.
98
Which is a true statement? A. When there are outliers, the median is closer to the tail than the mean. B. When there are outliers, the mean is closer to the tail than the median.
B: When a data set is not symmetrical, the mean is usually closer to the outliers or tail than the median is.
99
A cat gave birth to 3 kittens who each had a different weight between 147 and 159 g. Then, the cat gave birth to a 4th kitten that weighed 57 g. How will the birth of the 4th kitten affect the mean and median?
Both the mean and median will decrease, but the mean will decrease more.
100
Lara is a wildlife researcher. They were analyzing the mean and median lengths of 7 fish their team had observed. The fish all had different lengths between 15 cm and 33 cm. Lara found out that they were misreading the longest length. It was actually 88 cm, not 33. How will this length increasing affect the mean and median?
The mean will increase, and the median will stay the same.
101
T or F: The mean always is the balancing point
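True. The total distance of the data points below the mean always equals the total distance of the data points above the mean. For example, with two points below the mean (at distances a and b) and two points above it (at distances c and d): a + b = c + d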
102
What do you think deviation means in statistics
In statistics, when discussing measures of spread, deviation is the amount by which a single measurement differs from the mean.
103
10 service workers cleaned parks for the year. The median number of parks cleaned is 4. The range is 7. T or F: The greatest number of parks cleaned can be 12.
False. MAX - MIN = RANGE, so a maximum of 12 would mean a minimum of 12 - 7 = 5. The minimum number of parks cleaned cannot be greater than 4 because 4 is the median.
104
What is n-1 and when do you use it?
You use n−1 when you're estimating the population variance from a sample. Dividing by n tends to underestimate the population variance; using n−1 instead of n corrects this bias, making the estimate unbiased. This is called Bessel’s correction.
105
T or F: You should use n-1 when you have the entire population.
False. Use n. There is no need to estimate anything—you already have all the data.
106
Mean Absolute Deviation
The mean absolute deviation (MAD) is the mean of the distances of the data points from the mean. MAD = (|DP1 − mean| + |DP2 − mean| + ... + |DPn − mean|) / n
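A minimal Python sketch of the MAD calculation, using the dataset [4,8,6,10,12] for illustration. The function name is hypothetical.

```python
def mean_absolute_deviation(data):
    """Mean of the absolute distances of each data point from the mean."""
    mean = sum(data) / len(data)
    distances = [abs(x - mean) for x in data]
    return sum(distances) / len(data)

# Mean is 8, distances are 4, 0, 2, 2, 4, so MAD = 12 / 5.
mad = mean_absolute_deviation([4, 8, 6, 10, 12])
print(mad)  # 2.4
```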
107
The following table shows the five-number summary for the number of accounts each sales manager at Force Inc. manages. Min = 30, Q1 = 45, Median = 50, Q3 = 65, Max = 85. About 50% of sales managers at Force Inc. manage fewer than what number of accounts?
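Each quartile helps us split the data points into fourths: Minimum = 0%, Q1 = 25%, Median = 50% (half), Q3 = 75%, Maximum = 100% (all). About 50% of the data lies below the median, so about 50% of sales managers manage fewer than 50 accounts.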
108
What are systems of equations?
A system of equations is a set of two or more equations that have the same variables. The goal is to find values for the variables that satisfy all the equations at the same time. x+y=10 2x−y=4
109
Density Curve
Represents the distribution of a dataset. Instead of showing individual data points or bars (like a dot plot or histogram), it shows the overall shape of the data as a smooth curve. The total area under the curve = 1 (100% of the data).
110
How do you estimate where Data Falls Using Width of Data Points
We estimate the proportion of data in a given range by calculating the area under the curve within that range. 1. Find the width of the interval you’re interested in. 2. Find the height of the curve over that interval (if provided). 3. Estimate the proportion of data by using area approximations.
111
What happens to the mean or median when you increase or decrease all of the data?
The mean and median both increase or decrease by that same amount. By contrast, changing only one or two data points will change the mean but often leaves the median unchanged.
112
What happens if you add or subtract a datapoint to or from the data set? How does that impact the mean or median?
This can change both the mean and the median, increasing or decreasing them depending on the value added or removed.
113
What happens when you multiply or divide all data points.
Both the mean and the median will scale by the same factor when multiplying or dividing all data points.
114
Binomial Distribution
Discrete → Only whole number outcomes (e.g., 0, 1, 2…). Fixed number of trials (n). Two outcomes per trial → success or failure. Probability stays the same each time. Example: Toss a coin 5 times. What’s the probability of getting exactly 3 heads? → Binomial! You’re counting how many "successes" in a set number of tries.
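The coin example can be worked out with the binomial formula P(X = k) = nCk × p^k × (1 − p)^(n − k). The sketch below assumes a fair coin (p = 0.5), which the card does not state, and the function name binomial_pmf is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Assumption: the coin is fair, so p = 0.5.
prob_three_heads = binomial_pmf(3, 5, 0.5)
print(prob_three_heads)  # 0.3125
```

There are 10 ways to place 3 heads among 5 tosses, and each specific sequence has probability 1/32, giving 10/32 = 0.3125.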
115
Normal Distribution
Continuous → Any value within a range (including decimals). Shaped like a bell curve. Mean = center, and the spread is measured by standard deviation. Used when data naturally clusters near the average. 🧠 Example: What’s the probability a person is between 5'6" and 6' tall?→ Normal! Heights are measurements and follow a bell-shaped curve.
116
When does a binomial distribution look like a normal distribution?
When the number of trials is large and the probability of success isn't close to 0 or 1.
117
What visual trick can you use to remember converting decimals?
"Percent to Decimal → move left 2 spots" "Decimal to Percent → move right 2 spots" 👉 Examples: 0.25% → 0.0025 (👈 move decimal 2 left) 0.004 → 0.4% (👉 move decimal 2 right)
118
Convert .25% to a decimal
0.25% = .25/100 = .0025
119
Convert 1.2% to a decimal
1.2% = 1.2/100 = 0.012
120
Convert 0.07% to a decimal
0.07% = 0.07/100 = .0007
121
Convert .0008 to a percent
.0008 = .0008* 100 = 0.08%
122
Convert 0.004 to a percent
0.004 = .004 * 100 = 0.4%
123
Convert 0.025 to a percent
0.025 = 0.025 * 100 = 2.5%
124
Baruti, a ranger in Kruger National Park in South Africa, collected data about the elephant population in the park. She compared the foot lengths of the elephants and their shoulder height (both in centimeters) and created the following scatter plot. Which of the following is the best estimate of the average change in shoulder height associated with a 1 cm increase in foot length?
1. Draw a line of best fit through the points. 2. Pick two points that fall on the line, such as (10,40) and (20,100). 3. Calculate the slope: (y2 - y1)/(x2 - x1) = (100 - 40)/(20 - 10) = 6. The slope is the estimate: about a 6 cm increase in shoulder height per 1 cm increase in foot length. 4. If the full equation is needed, plug the slope and a point on the line into y = (rise/run)x + b and solve for b.
125
The choir director wants to predict ticket revenue (y) based on the number of seats occupied (x). Here are the summary stats: Mean of x = 75.8, Standard deviation of x = 14.8 Mean of y = 696, Standard deviation of y = 177.6 Correlation (r) = 0.81 What is the equation of the least-squares regression line? (Round to the nearest hundredth)
Step 1: Find the slope (b): b = r × (sy / sx) b = 0.81 × (177.6 / 14.8) ≈ 9.72 Step 2: Find the y-intercept (a): a = ȳ − b × x̄ a = 696 − 9.72 × 75.8 ≈ -40.78 ✅ Final equation: ŷ = -40.78 + 9.72x
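The two steps can be sketched in Python using the choir director's summary statistics. The function name least_squares_line is hypothetical.

```python
def least_squares_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    slope = r * (sd_y / sd_x)              # b = r * (sy / sx)
    intercept = mean_y - slope * mean_x    # a = y-bar - b * x-bar
    return intercept, slope

a, b = least_squares_line(mean_x=75.8, sd_x=14.8, mean_y=696, sd_y=177.6, r=0.81)
print(round(a, 2), round(b, 2))  # -40.78 9.72
```

Rounding only at the end avoids carrying a rounded slope into the intercept calculation.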
126
Blanca wants to predict the weight (y) of a wall hanging based on its diagonal length (x). Here are the summary stats: Mean of x = 24.1 Standard deviation of x = 12 Mean of y = 12.9 Standard deviation of y = 16.2 Correlation (r) = 0.9 What is the equation of the least-squares regression line?
Step 1: Find the slope (b): b = r × (sy / sx) b = 0.9 × (16.2 / 12) = 1.215 ≈ 1.22 Step 2: Find the y-intercept (a), using the unrounded slope: a = ȳ − b × x̄ a = 12.9 − 1.215 × 24.1 ≈ -16.38 ✅ Final equation: ŷ = -16.38 + 1.22x
127
r correlation coefficient
Measures the strength and direction of a linear relationship. Ranges from −1 to +1.
128
r-squared (coefficient of determination)
Tells us how much of the variation in y is explained by x. It’s the square of r, so it's always between 0 and 1 (or 0% to 100%). Example: If r = 0.9 → r² = 0.81 That means 81% of the variation in y is explained by x.
129
Imagine you're trying to guess how heavy a backpack is by how many books are inside and you want to use r (correlation coefficient) to determine this. "How good is book-count at guessing backpack weight?"
r = 0.9, so r² = 0.81 (that’s 81%) That means: 👉 81% of the changes in backpack weight can be predicted just by knowing how many books are inside.
130
Let’s say we’re looking at how hours of video games played affects math test scores. What does r-squared =.36 mean?
36% of the reason test scores go up or down can be explained by how many hours of games were played. But...64% must come from other stuff like sleep, studying, or how hard the test was! So: The closer r² is to 1 (or 100%), the better x (games) explains y (test score).
131
What is the Root Mean Square Error (RMSE)?
On average, how far off your predictions are.
132
What does the RMSE mean if you predicted test scores and obtained the following results, making RMSE ≈ 3.3: Real score: 90 → You guessed 88 (off by 2) Real score: 85 → You guessed 83 (off by 2) Real score: 80 → You guessed 75 (off by 5)
You square the errors (so no negatives), average them, then take the square root: √((4 + 4 + 25)/3) = √11 ≈ 3.3. “On average, my guesses are about 3.3 points off from the real scores.”
133
Visualize how to calculate RMSE
Find the error for each point (observed - predicted) Square the errors Add all the squared errors Divide by how many points you have (that’s the mean part) Take the square root
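These steps can be sketched in Python, using the three test-score guesses (real scores 90, 85, 80 and guesses 88, 83, 75) as input. The function name rmse is hypothetical.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    errors = [o - p for o, p in zip(observed, predicted)]  # observed - predicted
    squared = [e ** 2 for e in errors]                     # square the errors
    mean_squared = sum(squared) / len(squared)             # add them up and average
    return sqrt(mean_squared)                              # take the square root

value = rmse([90, 85, 80], [88, 83, 75])
print(round(value, 2))  # 3.32
```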
134
When reading a Regression Output (like from a computer or software), what is a Predictor?
These are the variables you're using to predict your outcome. The x-variable you're using to predict y
135
When reading a Regression Output (like from a computer or software), what is a Constant?
The Constant row is the intercept (what y would be when x = 0). The y-intercept of the line
136
When reading a Regression Output (like from a computer or software), what is Coef?
These are the numbers used in the regression equation y = a + bx. The Coef in the Constant row is the intercept a. The Coef in the predictor row is the slope b (how much y changes for each 1-unit increase in x, the explanatory variable).
137
When reading a Regression Output (like from a computer or software), what is SE Coef?
This shows how much uncertainty is in each coefficient. Smaller = better (more precise). If SE is large, we’re not as sure the true slope or intercept is close to our sample estimate.
138
When reading a Regression Output (like from a computer or software), what is R squared?
🔹 R-Sq (R² or Coefficient of Determination) Tells you how well your model explains the variation in y. For example, if R² = 86.7% in a model that predicts test scores from study hours, then 86.7% of the variation in test scores is explained by study hours.
139
When reading a Regression Output (like from a computer or software), what is R squared adj?
🔹 R-Sq(adj) (Adjusted R²) This adjusts R² when you have more than one predictor. Helps avoid overfitting (when your model is too complex). For simple linear regression (1 x-variable), R² and adjusted R² are very close.
140
When reading a Regression Output (like from a computer or software), what is P-Value
P tells you if the predictor is statistically significant. 📘 If P < 0.05, we usually say the predictor matters (it's unlikely the relationship is just random).
141
When reading a Regression Output (like from a computer or software), what is the standard deviation of the residuals or S?
S measures the size of a typical prediction error in the y variable.
142
What happens to r², r, and the standard deviation of residuals (s) if the point (75, 86) — which lies close to the regression line but has an extreme x-value — is removed?
- Effect on r²: The point fits the general pattern (small residual). Removing it weakens the overall relationship. → r² decreases. - Effect on r: The correlation weakens but remains positive. → r moves closer to 0. - Effect on s (standard deviation of residuals): The point had a small residual. Removing it increases the average size of residuals. → s increases.
143
What happens to r², r, and the standard deviation of residuals (s) if a point far from the regression line and far from the mean x-value is removed?
- Effect on r²: The point does NOT fit the general pattern (large residual). Removing it strengthens the relationship. → r² increases. - Effect on r: The point weakens the correlation. Removing it makes the correlation stronger. → r moves closer to 1 (if positive trend) or -1 (if negative trend). - Effect on s (standard deviation of residuals): The point had a large residual. Removing it reduces the typical size of residuals. → s decreases.
144
What experiment design is this an example of? A pharmaceutical company is carrying out a trial for a new flea prevention medication that is designed to work for dogs or cats. They recruit pet owners and ask them what flea prevention they currently use. They randomly assign half of the dogs in the study to the new flea prevention medication, and the other half of the dogs in the study will receive their current prevention. A similar process is carried out with the cats in the study.
In a randomized block design, random assignment is carried out within each block. The pets were first divided by type, and random assignment occurred within each type, so the types are the blocks.
145
Why is this not a matched pair design? A pharmaceutical company is carrying out a trial for a new flea prevention medication that is designed to work for dogs or cats. They recruit pet owners and ask them what flea prevention they currently use. They randomly assign half of the dogs in the study to the new flea prevention medication, and the other half of the dogs in the study will receive their current prevention. A similar process is carried out with the cats in the study.
There are no pairs involved: the pets were not matched into pairs of similar animals, and no pet was given both flea medications. The comparison treatment is simply each pet's existing flea prevention.
146
Emanuel surveyed a random sample of 50 subscribers to Auto Wheel magazine about the number of cars that they own. Of the subscribers surveyed, 15 own fewer than 2 vehicles. There are 340 subscribers to Auto Wheel magazine. Based on the data, what is the most reasonable estimate for the number of Auto Wheel magazine subscribers who own fewer than 2 vehicles?
340 * (15/50) = 102
147
Wholesome Pizza surveyed a random sample of 40 Trinity Food Festival attendees about their favorite type of pizza. Of the attendees surveyed, 21 said that sausage pizza was their favorite type of pizza. There are 1600 Trinity Food Festival attendees. Based on the data, what is the most reasonable estimate for the number of Trinity Food Festival attendees whose favorite type of pizza is sausage pizza?
1600 * (21/40) = 840
148
T or F: You can think of Parameter = Population fact
T. A parameter is a numerical value that describes a characteristic of an entire population. Examples of parameters are the population mean, standard deviation, proportion (p), or size N (the total number in the population).
149
T or F: A sample study compares two parameters of a population.
False. Observational studies and experiments do so, while a sample study collects data from a smaller group (a sample) to estimate a parameter of a larger group (the population).