Midterm - Important Flashcards
(22 cards)
Mode
The value that occurs most often. If no value repeats, the data has no mode.
Standard Deviation
How far a group of numbers is spread out from the mean. Square root of the variance.
sqrt of ((sum of (x - mean)^2)/N)
Variance
The average of the squared distances of the data points from the mean.
(sum of (x - mean)^2)/N
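The variance and standard deviation formulas above can be sketched in a few lines of Python. This is a minimal illustration that divides by N (the population form used on these cards); the function names and the sample data are made up for the example.

```python
import math

def population_variance(xs):
    """Average squared distance from the mean, dividing by N."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def population_std_dev(xs):
    """Square root of the population variance."""
    return math.sqrt(population_variance(xs))

# Illustrative data only (not from the course material).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(population_std_dev(data))   # 2.0
```

Here the mean is 5, the squared deviations sum to 32, and 32/8 = 4, so the standard deviation is 2.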
Standardized Score/z-score
How many standard deviations a value lies from the mean.
(x - mean)/(std dev)
Values with |z| > 1.96 fall outside ~95% of the data (for roughly normal data)
Values with |z| > 2.58 fall outside ~99% of the data (for roughly normal data)
Values with |z| > 3.0 are “definite outliers”
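A small sketch of the z-score formula and the cutoffs listed on this card. The function names, the labels returned, and the example numbers (mean 100, standard deviation 15) are assumptions for illustration; the cutoffs themselves only make sense when the data is roughly normal.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def describe_z(z):
    """Label a z-score using the cutoffs from the card."""
    size = abs(z)
    if size > 3.0:
        return "definite outlier"
    if size > 2.58:
        return "outside ~99% of the data"
    if size > 1.96:
        return "outside ~95% of the data"
    return "within ~95% of the data"

# Hypothetical example: mean 100, standard deviation 15.
z = z_score(130, 100, 15)
print(z)              # 2.0
print(describe_z(z))  # outside ~95% of the data
```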
Percentile
Divide the data into 100 equal parts. The nth percentile is the value below which n% of observations fall.
Quantiles
General term for dividing data into equal-sized groups.
4 parts: quartiles
5 parts: quintiles
10 parts: deciles
100 parts: percentiles
Percentile Calculation - Greater Than
To find the value greater than p% of the values (n = number of values, sorted ascending):
1. Multiply (p/100) * n
2. Round up
3. Add 1 and use that value.
Percentile Calculation - Greater Than or Equal To
To find the value greater than or equal to p% of the values:
1. Multiply (p/100) * n, where n is the number of values (sorted ascending)
2. Round up and use that value
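The two steps on this card can be sketched as follows. This is only an illustration: the function name and the data are hypothetical, ranks are treated as 1-based positions in the sorted list, and a rank of 0 (when p is 0) is bumped to 1 as an added assumption.

```python
import math

def percentile_at_least(values, p):
    """Value that is greater than or equal to p% of the values."""
    ordered = sorted(values)
    n = len(ordered)
    rank = math.ceil(p * n / 100)  # steps 1 and 2: multiply, then round up
    rank = max(rank, 1)            # assumption: p = 0 maps to the smallest value
    return ordered[rank - 1]       # ranks are 1-based, list indexes are 0-based

# Illustrative data only.
data = [15, 20, 35, 40, 50]
print(percentile_at_least(data, 40))  # 0.40 * 5 = 2   -> rank 2 -> 20
print(percentile_at_least(data, 50))  # 0.50 * 5 = 2.5 -> rank 3 -> 35
```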
Percentile Calculation - Interpolation
To find the pth percentile:
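1. rank = (p/100) * (n + 1), where n is the number of values (sorted ascending)
2. If rank is an integer, use the value at that rank.
3. Otherwise, take the values at the ranks just below and just above.
4. Multiply the difference between those two values by the fractional part of rank.
5. Add the result to the lower-rank value (equivalently, subtract (1 - fraction) times the difference from the higher-rank value).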
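The interpolation steps above can be sketched as follows, treating p as a percentage from 0 to 100 so that the rank is (p/100) * (n + 1). The function name and data are hypothetical, and clamping the rank into the range 1..n is an added assumption for percentiles very close to 0 or 100.

```python
import math

def percentile_interpolated(values, p):
    """pth percentile using rank = (p/100) * (n + 1) with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    rank = p * (n + 1) / 100       # step 1: 1-based rank, may be fractional
    rank = min(max(rank, 1), n)    # assumption: keep the rank inside 1..n
    lower = math.floor(rank)
    fraction = rank - lower
    if fraction == 0:              # step 2: integer rank, use that value
        return ordered[lower - 1]
    low_val = ordered[lower - 1]   # step 3: values at the ranks below and above
    high_val = ordered[lower]
    # Steps 4 and 5: scale the gap by the fraction, add to the lower value.
    return low_val + fraction * (high_val - low_val)

# Illustrative data only.
data = [15, 20, 35, 40, 50]
print(percentile_interpolated(data, 25))  # rank 1.5 -> 15 + 0.5 * (20 - 15) = 17.5
print(percentile_interpolated(data, 50))  # rank 3   -> 35
```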
Symmetric
The left side and the right side are roughly mirrored
mean ≈ median
skewness ≈ 0
Skewed Left
The left side has a long tail, while the right side has a cluster of values
mean < median
skewness < 0
Skewed Right
The left side has a cluster of values, while the right side has a long tail
mean > median
skewness > 0
Skewness
A measure of the amount and direction of skew, or departure from symmetry.
|skewness| < 0.5 is approximately symmetric
|skewness| between 0.5 and 1 is slight skewness
|skewness| > 1 is substantial skewness
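The card does not say which skewness formula the course uses. As an assumption, the sketch below uses the common moment-based coefficient (third central moment divided by the second central moment raised to the power 1.5) and applies the cutoffs above to its absolute value. The function names and data are hypothetical.

```python
def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5 (assumed formula)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def describe_skew(g):
    """Apply the cutoffs from the card to the size of the skewness."""
    size = abs(g)
    if size < 0.5:
        amount = "approximately symmetric"
    elif size <= 1:
        amount = "slight skewness"
    else:
        amount = "substantial skewness"
    direction = "right" if g > 0 else "left" if g < 0 else "none"
    return amount, direction

# Illustrative data only.
print(describe_skew(skewness([1, 2, 3, 4, 5])))   # symmetric sample
print(describe_skew(skewness([1, 1, 1, 2, 10])))  # long right tail
```

A negative result points to a long left tail and a positive result to a long right tail, matching the Skewed Left and Skewed Right cards.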
Kurtosis
A measure of tail heaviness. Larger values of kurtosis indicate a greater presence of extreme values in the distribution.
Mesokurtic
Kurtosis is roughly = 3. Matches a normal distribution.
Excess kurtosis
Kurtosis minus 3. Mesokurtic is approximately zero.
Leptokurtic
Kurtosis > 3. Heavy (fat) tails and a peaked center. More outliers.
Platykurtic
Kurtosis < 3. Lighter (thinner) tails and a flatter peak. Fewer outliers.
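A rough sketch tying the kurtosis cards together, assuming the moment-based definition (fourth central moment divided by the squared variance), which gives about 3 for a normal distribution. The function names, the sample data, and the tolerance used to decide what counts as "roughly 3" are hypothetical choices for illustration.

```python
def kurtosis(xs):
    """Moment-based kurtosis: m4 / m2**2 (assumed formula)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

def excess_kurtosis(xs):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return kurtosis(xs) - 3

def classify_kurtosis(k, tolerance=0.5):
    """Label kurtosis; the tolerance around 3 is a hypothetical choice."""
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

# Illustrative data only.
flat = [1, 2, 3, 4, 5]                      # evenly spread, no extremes
spiky = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]   # mostly central, two extremes
print(kurtosis(flat), classify_kurtosis(kurtosis(flat)))    # 1.7 platykurtic
print(kurtosis(spiky), classify_kurtosis(kurtosis(spiky)))  # 5.0 leptokurtic
```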
Correlation
A statistical measure that expresses the extent to which two variables are linearly related.
Positive - change in same direction
Neutral (zero) - no linear relationship
Negative - change in opposite directions
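The card does not name a specific coefficient. Assuming the usual Pearson correlation coefficient r, a minimal sketch looks like this; the function name and data are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # close to +1: change in same direction
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # close to -1: change in opposite directions

# y = x^2 on a symmetric range: clearly related, but r is 0
# because the relationship is not linear.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

The last case shows why r only measures linear association: a coefficient of zero does not rule out some other kind of relationship.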
Causation
One event is the result of the occurrence of the other event. Cause and effect.
Simpson's Paradox
Groups of data show one trend which is reversed when the groups are combined.
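A small numeric sketch of the reversal. The counts below are hypothetical, chosen only to make the effect visible: option A has the higher success rate inside each group, yet option B has the higher rate once the groups are pooled, because A was mostly tried on the hard cases.

```python
# Hypothetical (successes, attempts) counts, invented for illustration.
groups = {
    "easy cases": {"A": (19, 20), "B": (72, 80)},
    "hard cases": {"A": (40, 80), "B": (8, 20)},
}

def success_rate(successes, attempts):
    return successes / attempts

# Within each group, A beats B.
for name, options in groups.items():
    rate_a = success_rate(*options["A"])
    rate_b = success_rate(*options["B"])
    print(f"{name}: A = {rate_a:.2f}, B = {rate_b:.2f}")

# Pool the groups by adding up successes and attempts per option.
overall = {}
for option in ("A", "B"):
    successes = sum(groups[g][option][0] for g in groups)
    attempts = sum(groups[g][option][1] for g in groups)
    overall[option] = success_rate(successes, attempts)

# Combined, the trend reverses: B beats A.
print(f"combined: A = {overall['A']:.2f}, B = {overall['B']:.2f}")
```

With these counts the within-group rates are 0.95 vs 0.90 and 0.50 vs 0.40 in favor of A, while the combined rates are 0.59 for A and 0.80 for B.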