1.3 and 1.4 Overview and Descriptive Stats Flashcards
(25 cards)
Smoothed histogram
Density estimate
Sample Statistics
Numbers describing a sample distribution
- measures of CENTER (mean, median)
- measures of spread (standard deviation, range, IQR)
- other: min, max, quartiles
sample mean
x bar
population mean
mu (u)
usually impossible to find bc you can't get data from every member of the population
dealt with theoretically, including for infinite populations
drawback of the Mean
sensitive to outliers
-if the data is not symmetric, the mean isn’t very good at measuring the center
median
~x
middle value
outliers DO NOT change the median = resistant
median when n is odd vs even
odd, after ordering, median = (n+1)/2 th data value
even, after ordering, median = avg of n/2 th and (n/2 + 1)th data values
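The odd/even rule above can be sketched in Python. This is only an illustration; the function name `median_by_position` and the example data are made up for this card.

```python
def median_by_position(values):
    """Median using the odd/even position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no data")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the (n+1)/2 th value, which is index n // 2 when counting from 0
        return ordered[mid]
    # even n: average of the n/2 th and (n/2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = median_by_position([7, 1, 5])       # ordered: 1, 5, 7
even_example = median_by_position([7, 1, 5, 3])   # ordered: 1, 3, 5, 7
with_outlier = median_by_position([700, 1, 5])    # outlier does not move the median
```

Here `odd_example` and `with_outlier` are both 5, which shows the resistance to outliers, and `even_example` is the average of 3 and 5.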
using population mean and population median
1. symmetric distribution: u = ~u (mean & median are equal or close together)
2. negatively skewed distribution: u < ~u
3. positively skewed distribution: u > ~u
the mean gets pulled toward the long tail (the direction of the skew); the median is resistant and moves much less
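A quick check of the three cases, using small made-up samples as stand-ins for populations (so sample mean and sample median play the role of u and ~u here):

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]      # long tail to the right (positive skew)
left_skewed = [-x for x in right_skewed]      # mirror image (negative skew)

# (mean, median) for each data set
summary = {
    "symmetric": (mean(symmetric), median(symmetric)),
    "right_skewed": (mean(right_skewed), median(right_skewed)),
    "left_skewed": (mean(left_skewed), median(left_skewed)),
}
```

In the skewed samples the mean sits on the tail side of the median, and in the symmetric sample the two agree.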
Quartiles
Q1: median of the data values below ~x (25th percentile)
Q3: median of the data values above ~x (75th percentile)
range = max - min
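A minimal sketch of the "median of each half" method for quartiles. It assumes the halves are split by position and that the median itself is left out of both halves when n is odd; other textbooks and software use slightly different conventions, so results can differ a little. The function name `quartiles_by_halves` is made up.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = ordered[:half]               # values below the median position
    upper = ordered[half + (n % 2):]     # values above it (skips the median if n is odd)
    return median(lower), median(ordered), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
q1, med, q3 = quartiles_by_halves(data)
data_range = max(data) - min(data)       # range = max - min
```

For this data the lower half is 1, 3, 4, 6 and the upper half is 8, 10, 12, 15.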
Boxplot
visual representation of the 5 number summary (min, Q1, median, Q3, max)
Interquartile Range (IQR)
range of middle 50% of data
IQR = Q3 - Q1
How to identify outliers
- find IQR
- multiply by 1.5 (1.5 IQR rule)
- subtract that # from Q1 and add it to Q3
- anything below Q1 - # or above Q3 + # = outlier
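The 1.5 IQR rule can be sketched as follows: an outlier is anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. The function names and the data are made up, and the quartiles are passed in (here computed by hand as the medians of the lower and upper halves).

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs from the 1.5 IQR rule."""
    iqr = q3 - q1
    step = k * iqr
    return q1 - step, q3 + step

def find_outliers(values, q1, q3):
    """Values below the lower cutoff or above the upper cutoff."""
    low, high = iqr_fences(q1, q3)
    return [x for x in values if x < low or x > high]

data = [1, 3, 4, 6, 7, 8, 10, 12, 40]
# by hand: lower half 1, 3, 4, 6 gives Q1 = 3.5; upper half 8, 10, 12, 40 gives Q3 = 11
low, high = iqr_fences(3.5, 11.0)        # IQR = 7.5, 1.5 * IQR = 11.25
outliers = find_outliers(data, 3.5, 11.0)
```

Only 40 falls outside the cutoffs in this example.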
Deviation from mean
xi - x bar
summation of (xi-x bar) =
0, always
bc…
sample variance s^2
= summation of (xi - x bar)^2 / (n - 1)
*if you know the mean and n-1 of the values, then you know the last value
n-1
degrees of freedom
since summation of (xi - x bar) = 0, the last deviation can always be calculated if the first n-1 deviations are known
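A small numeric check of the last few cards: the deviations sum to 0, the sample variance divides by n - 1, and the last deviation can be recovered from the first n - 1. The data set is made up.

```python
from statistics import variance

data = [2, 4, 4, 5, 10]
n = len(data)
x_bar = sum(data) / n                           # sample mean

deviations = [x - x_bar for x in data]          # xi - x bar
total_deviation = sum(deviations)               # 0, up to floating-point rounding

# sample variance: sum of squared deviations divided by n - 1
s_squared = sum(d ** 2 for d in deviations) / (n - 1)

# degrees of freedom: the last deviation is fixed by the first n - 1
last_deviation = -sum(deviations[:-1])

library_value = variance(data)                  # statistics.variance also uses n - 1
```

Here the squared deviations are 9, 1, 1, 0, 25, so `s_squared` is 36 / 4.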
Sample Standard Deviation
how spread out the data is
(only really good with symmetric data)
s = sqrt(s^2)
NOT resistant to outliers
When to use standard deviation (s) vs IQR as measure of spread***
- measure of center = mean, use (s)
- measure of center = median, use IQR
Finite population standard deviation
sigma (o-bar)
sqrt(summation of (xi - u)^2 / N)
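To compare the two formulas, here is a sketch that treats one made-up data set first as a whole finite population (divide by N) and then as a sample (divide by n - 1). `pstdev` and `stdev` are the standard library functions for these two cases.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(data)
mu = sum(data) / N                                        # population mean

# population formula: divide the sum of squared deviations by N
sigma_by_hand = sqrt(sum((x - mu) ** 2 for x in data) / N)

sigma = pstdev(data)    # population standard deviation (divides by N)
s = stdev(data)         # sample standard deviation (divides by n - 1)
```

The sum of squared deviations is 32, so sigma = sqrt(32 / 8) while s = sqrt(32 / 7), which is slightly larger.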
Letters rule of thumb
roman = sample (statistics)
greek = population (parameters)
mu (u) = population mean
x bar = sample mean
sigma (o bar) = population standard deviation
s = sample standard deviation
Standard deviation and the mean tell if your data is
centered, symmetric, or dispersed
Bell shaped data =
approximately normally distributed
Empirical Rule
68-95-99.7% rule
within 1 standard deviation of the mean = about 68% of data
within 2 standard deviations of the mean = about 95% of data
within 3 standard deviations of the mean = about 99.7% of data
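The 68-95-99.7 percentages can be checked against an exact normal distribution with the standard library's `NormalDist` (Python 3.8+). This is a sketch for the standard normal (mean 0, standard deviation 1); real bell-shaped data will only match approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# proportion of the distribution within k standard deviations of the mean
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}
```

`within[1]`, `within[2]`, and `within[3]` come out close to 0.68, 0.95, and 0.997.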
Z-score
- standard deviation = ruler on normal distribution
- measurements = z-scores
- = how many standard deviations above or below the mean a data point is
- z = (x - mean) / standard deviation
- no units, mean of all z-scores = 0, standard deviation of all = 1
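A minimal sketch of z-scores for a made-up sample, using the sample mean and sample standard deviation. The function name `z_scores` is hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """z = (x - mean) / standard deviation, for every data value."""
    x_bar = mean(values)
    s = stdev(values)          # sample standard deviation (n - 1)
    return [(x - x_bar) / s for x in values]

data = [2, 4, 4, 5, 10]        # mean 5, s = 3
z = z_scores(data)

z_mean = mean(z)               # 0, up to rounding
z_spread = stdev(z)            # 1, up to rounding
```

The value 2 is one standard deviation below the mean, so its z-score is -1. The standard deviation of the z-scores comes out to 1 as long as the same kind of standard deviation (sample here) is used in both steps.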