Ch.4 Numerical Descriptive Techniques Flashcards
(20 cards)
Measures of Central Tendency
summarize large sets of data with just one number, like an average. This could be the mean (the average you’re most familiar with), the median (the middle number), or the mode (the most frequently occurring number). These measures help us understand and communicate large sets of data more easily.
The Mean
often referred to as the average, is a single score that provides a typical value for all of the scores in a data set. It is calculated by summing all the values in the data set and then dividing by the number of values.
Mean Population Calculation
µ = ∑x / N
Mean Sample Calculation
x̄ or M = ∑x / n
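As a minimal sketch, the two formulas above use the same arithmetic and differ only in notation (µ and N for a population, x̄ and n for a sample). The function name `mean` and the data are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


scores = [4, 8, 6, 5, 7]  # hypothetical data
print(mean(scores))  # 30 / 5 = 6.0
```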
The Median
is the measure of central tendency that represents the midpoint of a distribution. It is particularly useful in skewed data sets or when there are a lot of outliers.
To Find the Median,
- Put the data in order
- Find the middle score.
- If the data set has an even number of scores, the median is the average of the two middle scores.
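The three steps above can be sketched in Python as follows. The function name `median` and the sample values are assumptions made for illustration, not part of the original cards.

```python
def median(values):
    """Median: the middle score of the ordered data."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # step 1: put the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd count, take the middle score
        return ordered[mid]
    # step 3: even count, average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9]))     # ordered: 3, 7, 9 -> 7
print(median([7, 3, 9, 1]))  # ordered: 1, 3, 7, 9 -> (3 + 7) / 2 = 5.0
```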
The Mode
is the most common value in a data set and is easily identifiable as the value with the highest frequency in a frequency distribution table or a histogram. Unlike the mean and the median, the mode must be a score in your data set.
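One way to find the mode is to count how often each value occurs and keep the value(s) with the highest count. This is a sketch only; the helper name `modes` is hypothetical, and returning every tied value is a design choice made here, not something the card specifies.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 3, 3, 5, 7, 7, 7]))  # [7]
print(modes([1, 1, 2, 2, 3]))        # two values tie: [1, 2]
```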
The median is the preferred measure of central tendency in two main scenarios:
- Data containing extreme outliers or heavily skewed data
With extreme outliers or heavily skewed data, the median provides a more accurate representation of the data set than the mean.
- Ordinal data, meaning the values have a meaningful order but are not measured on a true numerical scale.
With ordinal data, the median allows for a measure of central tendency when a numerical sum (and therefore the mean) cannot be calculated.
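The outlier scenario can be illustrated with a small, made-up data set. The salary figures below are hypothetical; the sketch uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical salaries in thousands of dollars; the last value is an extreme outlier.
salaries = [30, 32, 35, 38, 40, 425]

mean_salary = statistics.mean(salaries)      # 600 / 6 = 100
median_salary = statistics.median(salaries)  # (35 + 38) / 2 = 36.5
print(mean_salary, median_salary)
```

The single outlier pulls the mean far above five of the six salaries, while the median stays close to them.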
The Geometric Mean
is a type of average that is calculated by multiplying all the numbers in a set together, then taking the nth root, where ‘n’ is the total number of values. It’s especially useful in situations involving proportional growth or rates of return, but can only be used with positive numbers.
Geometric Mean Formula
μ_geometric = [(1 + R1)(1 + R2) … (1 + Rn)]^(1/n) − 1
where: R1 … Rn are the returns of an asset (or other observations for averaging).
The Geometric Mean may be more accurate than the arithmetic mean for calculating average rates of return, growth rates, or compounding interest rates over several past periods. However,
for estimating the rate of return in any single future period, the arithmetic mean is more appropriate.
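A short sketch of the geometric mean of returns, assuming returns are given as decimals (0.10 for 10%). The function name `geometric_mean_return` and the two-year example are hypothetical.

```python
import math


def geometric_mean_return(returns):
    """Geometric mean of per-period returns: [(1+R1)...(1+Rn)]^(1/n) - 1."""
    if not returns:
        raise ValueError("at least one return is required")
    if any(r < -1 for r in returns):
        raise ValueError("a return below -100% makes a growth factor negative")
    growth = math.prod(1 + r for r in returns)  # compound growth over all periods
    return growth ** (1 / len(returns)) - 1


returns = [1.0, -0.5]  # hypothetical: +100% in year 1, then -50% in year 2
print(sum(returns) / len(returns))     # arithmetic mean: 0.25
print(geometric_mean_return(returns))  # (2.0 * 0.5) ** 0.5 - 1 = 0.0
```

An investment that doubles and then halves ends where it started, which the geometric mean of 0 reflects and the arithmetic mean of 25% does not.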
Variability measures
such as range, variance, and standard deviation provide additional insights into how spread out data is around the mean.
The Range
is simply the difference between the maximum and minimum values in the data. Its simplicity is also its limitation: it considers only the two extreme values and ignores the rest of the data.
Range formula
range = largest observation - smallest observation
- does not consider all data in its calculation
- very sensitive to extreme values in data
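A minimal sketch of the range formula and its sensitivity to a single extreme value. The name `data_range` and the data are made up for illustration.

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


print(data_range([3, 5, 4, 6, 5]))      # 6 - 3 = 3
print(data_range([3, 5, 4, 6, 5, 50]))  # one extreme value: 50 - 3 = 47
```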
The Empirical Rule
also known as the 68-95-99.7 rule, states that for a normal distribution, nearly all data will fall within three standard deviations of the mean. Specifically, approximately 68% of data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
Deviations
∑(x - µ) = 0 (the deviations from the mean MUST ALWAYS sum to 0)
Variance (avg squared deviation)
squared deviation = (x - µ)^2, then -> σ^2 = ∑(x - µ)^2 / N (POPULATION) or s^2 = ∑(x - x̄)^2 / (n - 1) (SAMPLE)
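The deviation and variance formulas can be checked on a small data set. This is a sketch under the assumption that the same list is treated first as a full population and then as a sample; the function names and data are hypothetical.

```python
def deviations(values):
    """Deviation of each value from the mean of the data."""
    m = sum(values) / len(values)
    return [x - m for x in values]


def population_variance(values):
    """Average squared deviation, dividing by N."""
    return sum(d ** 2 for d in deviations(values)) / len(values)


def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    if len(values) < 2:
        raise ValueError("a sample variance needs at least two values")
    return sum(d ** 2 for d in deviations(values)) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5
print(sum(deviations(data)))       # deviations sum to 0.0
print(population_variance(data))   # 32 / 8 = 4.0
print(sample_variance(data))       # 32 / 7, about 4.571
```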
The Empirical Rule, also known as the 68-95-99.7 rule, states that
for a normal distribution, nearly all data will fall within three standard deviations of the mean. Specifically, approximately 68% of data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
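The 68-95-99.7 percentages are rounded values of the normal distribution's cumulative probabilities. As a sketch, they can be recovered with `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later); the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))  # 0.6827, 0.9545, 0.9973
```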
Chebysheff’s Theorem:
The proportion of observations in any sample or population that lie within k standard deviations of the mean is at least
1 - 1/k^2
k = number of standard deviations
Note - This applies to any shape of distribution, not just a bell-shaped one
k must be > 1
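A minimal sketch of the bound, with a hypothetical function name. Unlike the Empirical Rule, the result is a guaranteed minimum proportion rather than an approximation.

```python
def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


print(chebysheff_lower_bound(2))  # 1 - 1/4 = 0.75
print(chebysheff_lower_bound(3))  # 1 - 1/9, about 0.889
```

For k = 2 the theorem guarantees at least 75% of observations, compared with about 95% under the Empirical Rule for a normal distribution.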
Coefficient-of-Variation
CV = σ / µ