High School Statistics Flashcards
Learning High School Statistics (149 cards)
The conditional probability formula
P(A∣B) = P(A∩B) / P(B), provided P(B) > 0
When do we use conditional probability?
Conditional probability is used when the occurrence of one event affects the probability of another event. It helps answer questions like:
“What is the probability of A if we already know B happened?”
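The formula can be tried on a small case. The sketch below uses a hypothetical example that is not part of the deck: drawing one card from a standard 52-card deck, with A = "the card is a king" and B = "the card is a face card". The function name is an illustrative choice.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_b):
    """Return P(A|B) = P(A and B) / P(B); undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b

# Hypothetical example: one card drawn from a standard 52-card deck.
# A = "card is a king", B = "card is a face card (J, Q, K)".
p_b = Fraction(12, 52)        # 12 face cards in the deck
p_a_and_b = Fraction(4, 52)   # every king is also a face card
p_a_given_b = conditional_probability(p_a_and_b, p_b)
print(p_a_given_b)  # 1/3
```

Knowing the card is a face card raises the chance of a king from 4/52 to 1/3.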
When are two events considered dependent?
When P(A|B) does not equal P(A)
When the occurrence of one affects the probability of the other.
How do you check for dependence?
Calculate P(A|B)
Compare it to P(A)
If P(A|B) = P(A), the events are independent.
If P(A|B) does not equal P(A), the events are dependent
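The check above can be sketched in a few lines. The example is hypothetical (one roll of a fair six-sided die) and the helper name is an assumption made for illustration. Because every outcome is equally likely, P(A|B) is simply the share of B's outcomes that are also in A.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}    # one roll of a fair die (hypothetical example)
a = {2, 4, 6}                    # A = "roll is even"
b = {4, 5, 6}                    # B = "roll is greater than 3"
c = {1, 2}                       # C = "roll is at most 2"

def p_given(event, condition):
    """P(event | condition) for equally likely outcomes."""
    return Fraction(len(event & condition), len(condition))

p_a = Fraction(len(a), len(outcomes))   # 1/2
p_a_given_b = p_given(a, b)             # 2/3, differs from P(A)
p_a_given_c = p_given(a, c)             # 1/2, equals P(A)

print("A and B dependent:", p_a_given_b != p_a)    # True
print("A and C independent:", p_a_given_c == p_a)  # True
```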
What is joint probability?
The probability that both A and B occur: P(A and B), also written P(A∩B).
It focuses on the overlap of the two events.
What is the mean?
The mean is the sum of the values divided by the number of values in a data set. It represents the “average” score.
What is the median?
The median is the middle point in an ordered data set.
If there is an even number of data points, the median is the average of the middle two points.
How do you determine if the mean or the median is the best measure of center?
The “best” measure should be representative of a “typical” score in a data set.
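A short sketch of how one extreme value affects the two measures of center. Both datasets are made up for illustration, and the calls come from Python's standard `statistics` module.

```python
import statistics

typical = [2, 4, 4, 5, 6]        # hypothetical scores, no extreme values
with_outlier = [2, 4, 4, 5, 40]  # same scores, but the last one is extreme

mean_typical = statistics.mean(typical)            # 4.2
median_typical = statistics.median(typical)        # 4
mean_outlier = statistics.mean(with_outlier)       # 11
median_outlier = statistics.median(with_outlier)   # 4

print(mean_typical, median_typical)
print(mean_outlier, median_outlier)
```

The mean jumps from 4.2 to 11 while the median stays at 4, so in the second dataset the median is closer to a "typical" score.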
What is IQR?
The Interquartile Range (IQR) describes the spread of the middle 50% of a dataset.
Visualize how to calculate IQR
Consider the dataset: [1,3,5,7,9,11,13]
Step 1: Order the Data:
Data is already in ascending order: [1,3,5,7,9,11,13]
Step 2: Find the Quartiles:
Median (Q2): Middle value = 7.
Lower half: [1,3,5]
Median of lower half (Q1) = 3.
Upper half: [9,11,13]
Median of upper half (Q3) = 11
Step 3: Calculate IQR:
IQR=Q3−Q1=11−3=8
So, the IQR is 8.
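The steps above can be sketched in Python. This follows the card's method (median of each half, leaving out the overall median when the count is odd). Other quartile conventions exist and can give slightly different values, so treat the function below as one possible approach; its name is an illustrative choice.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)                  # Step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                  # values below the median position
    # For an odd count, skip the middle value itself.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
iqr = q3 - q1                               # Step 3: IQR = Q3 - Q1
print(q1, q2, q3, iqr)  # 3 7 11 8
```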
Mode
The mode is the value that occurs most frequently in the dataset. A dataset can:
Have no mode (if all values occur equally often).
Be unimodal (one mode).
Be bimodal (two modes) or multimodal (more than two modes).
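A minimal sketch of finding the mode or modes by counting. It follows the card's convention that a dataset in which every value occurs equally often has no mode; some textbooks and libraries treat that case differently. The function name and sample lists are illustrative.

```python
from collections import Counter

def modes(data):
    """Return a sorted list of modes; an empty list means no mode."""
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    # All distinct values tie (and there is more than one): no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([1, 2, 3]))        # []      no mode
print(modes([1, 2, 2, 3]))     # [2]     unimodal
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  bimodal
```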
How do you determine outliers in a dataset?
A common rule treats data points as outliers if they fall below the lower bound or above the upper bound:
Lower Bound: Q1−1.5⋅IQR
Upper Bound: Q3+1.5⋅IQR
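The two bounds can be sketched as a small helper. The quartiles used here (Q1 = 3, Q3 = 11) are taken from the IQR card's dataset. The function names and the test values 13 and 30 are assumptions made for illustration.

```python
def outlier_bounds(q1, q3):
    """Return (lower, upper) bounds using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """True when the value lies outside the bounds."""
    lower, upper = outlier_bounds(q1, q3)
    return value < lower or value > upper

lower, upper = outlier_bounds(3, 11)   # IQR = 8
print(lower, upper)                    # -9.0 23.0
print(is_outlier(13, 3, 11))           # False
print(is_outlier(30, 3, 11))           # True
```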
Sample variance
A measure of the spread or variability of data in a sample. It is roughly the average squared distance of the data points from the sample mean.
Steps to calculating variance
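Example dataset: [4,8,6,10,12]
Find the mean: (4+8+6+10+12)/5 = 8
Calculate the deviations (data point − mean): −4, 0, −2, 2, 4
Square the deviations: 16, 0, 4, 4, 16
Sum the squared deviations: 40
Divide by n−1 (sample) or n (population): 40/4 = 10 (sample) or 40/5 = 8 (population)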
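The same steps can be followed in Python on the card's dataset. This is a sketch that writes out each step by hand. The variable names are illustrative, and the final lines compare the result with the standard `statistics` module.

```python
import statistics

data = [4, 8, 6, 10, 12]
n = len(data)

mean = sum(data) / n                      # 8.0
deviations = [x - mean for x in data]     # [-4.0, 0.0, -2.0, 2.0, 4.0]
squared = [d ** 2 for d in deviations]    # [16.0, 0.0, 4.0, 4.0, 16.0]
total = sum(squared)                      # 40.0

sample_variance = total / (n - 1)         # 10.0
population_variance = total / n           # 8.0
print(sample_variance, population_variance)

# The library functions should agree with the hand calculation.
print(statistics.variance(data), statistics.pvariance(data))
```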
Interpretation of Sample Variance
Small variance: The data points are close to the mean, indicating low variability.
Large variance: The data points are spread out from the mean, indicating high variability.
Standard Deviation
The standard deviation is the square root of the variance. It measures spread in the same units as the data.
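As a quick sketch, the sample standard deviation of the variance card's dataset [4,8,6,10,12] can be computed by taking the square root of its sample variance. The variable names are illustrative.

```python
import math
import statistics

data = [4, 8, 6, 10, 12]

sample_variance = statistics.variance(data)   # divides by n - 1, gives 10
sample_sd = math.sqrt(sample_variance)        # square root of the variance
print(round(sample_sd, 4))                    # 3.1623

# statistics.stdev performs both steps in one call.
print(round(statistics.stdev(data), 4))       # 3.1623
```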
When to use and advantages of mean and standard deviation
When to Use:
Best for symmetrical distributions without extreme outliers.
Provides a complete summary of the dataset, using all data values.
Advantages:
Captures the overall pattern of the data.
Good for normal (bell-shaped) distributions.
Disadvantages:
Sensitive to outliers: A single extreme value can greatly affect the mean and standard deviation.
When to use Median and Interquartile Range (IQR)
When to Use:
Best for skewed distributions or datasets with outliers.
Resistant to outliers because it focuses on the middle portion of the data.
Disadvantages:
Ignores extreme values and doesn’t use all the data.
Percentile
Tells us what percent of observations are less than or equal to a given value in a distribution.
Visualize how to calculate the 25th percentile for
3,8,7,5,12
Order the dataset: 3,5,7,8,12
Calculate the position (rank) of the percentile: Position = (P/100) × (n+1)
P = 25, n = 5 (the number of data points)
(25/100) × (5+1) = 1.5
The position 1.5 falls halfway between the 1st and 2nd data points.
The 1st data point is 3 and the 2nd data point is 5, so interpolate:
lower value + fraction × (difference between values)
3 + 0.5 × (5−3) = 3 + 1 = 4
The 25th percentile is 4.
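The procedure above can be sketched as a function. It uses the card's (P/100) × (n+1) position rule with linear interpolation. Clamping positions that fall before the first or after the last data point is an assumption added here, since the card does not cover that case. Other percentile conventions can give different answers.

```python
def percentile(data, p):
    """Percentile using position = (p / 100) * (n + 1) with interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = (p / 100) * (n + 1)      # 1-based position in the ordered data
    if position <= 1:                   # assumption: clamp to the smallest value
        return ordered[0]
    if position >= n:                   # assumption: clamp to the largest value
        return ordered[-1]
    lower_rank = int(position)          # rank of the data point just below
    fraction = position - lower_rank    # how far toward the next data point
    lower = ordered[lower_rank - 1]
    upper = ordered[lower_rank]
    return lower + fraction * (upper - lower)

print(percentile([3, 8, 7, 5, 12], 25))  # 4.0
```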
Z-score
A z-score (or standard score) measures how many standard deviations a data point is from the mean of a dataset.
z = (x - mean)/standard deviation
z-score formula
z = (x − mean) / standard deviation, where x is the data point
Interpreting z-scores
Positive z-score: Above the mean.
Negative z-score: Below the mean.
Outliers often have z-scores greater than 3 or less than −3.
A z-score of 0 indicates the value is exactly the mean.
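A minimal sketch of the z-score formula and its interpretation. The mean of 70 and standard deviation of 10 are made-up values for illustration, and the function name is an assumption.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / standard_deviation

# Hypothetical test scores with mean 70 and standard deviation 10.
above = z_score(85, 70, 10)     # 1.5, above the mean
at_mean = z_score(70, 70, 10)   # 0.0, exactly the mean
below = z_score(55, 70, 10)     # -1.5, below the mean
extreme = z_score(105, 70, 10)  # 3.5, beyond 3, a possible outlier
print(above, at_mean, below, extreme)
```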
Empirical rule
For a bell-shaped (normal) curve:
About 68% of the data falls within 1 standard deviation of the mean
About 95% of the data falls within 2 standard deviations of the mean
About 99.7% of the data falls within 3 standard deviations of the mean
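These percentages can be checked against an exact normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The rule's 68% and 95% are rounded versions of these values.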