Topic 1 Descriptive Statistics Flashcards
Data are time-consuming to collect, ____, and of varying quality.
commercially sensitive
A data set of size __ is denoted as: {xi}ᵢ₌₁,…,ₙ
n
What’s the difference between a dot plot and a histogram?
Dot plots show each individual data point (repeated values are stacked), while histograms group data into bins and show the frequency per bin.
What affects the appearance of a histogram the most?
A. Sample size
B. Axis label
C. Bin size
D. Title
C. Bin size
Relative frequency = frequency / ____
total number of data points (n)
What is plotted on the vertical axis of a histogram?
The absolute (or relative) frequency is plotted on the vertical axis.
What does F(x) in a CDF plot represent?
The relative frequency of data ≤ x
What is the downside of using histograms?
The appearance of a histogram changes depending on the chosen bin size.
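To illustrate the bin-size effect, here is a minimal sketch that computes relative frequencies (frequency / n) for the same data with two bin widths. The helper name `relative_frequencies`, the bin convention [left, left + width), and the sample values are assumptions made for this example only.

```python
import math
from collections import Counter

def relative_frequencies(data, bin_width):
    """Map each bin's left edge to its relative frequency (count / n)."""
    n = len(data)
    # Each value falls into the bin [left, left + bin_width)
    counts = Counter(math.floor(x / bin_width) * bin_width for x in data)
    return {left: count / n for left, count in sorted(counts.items())}

sample = [1, 2, 2, 3, 5, 6, 6, 9]  # made-up values

print(relative_frequencies(sample, 2))
# {0: 0.125, 2: 0.375, 4: 0.125, 6: 0.25, 8: 0.125}
print(relative_frequencies(sample, 5))
# {0: 0.5, 5: 0.5}
```

With a width of 2 the data show a peak near 2 to 4; with a width of 5 the same data look perfectly flat.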
F(x) =
- 0 if x < x₍₁₎
- j/n if x₍ⱼ₎ ≤ x < x₍ⱼ₊₁₎ (x₍ⱼ₎ is the j-th smallest value)
- 1 if x ≥ x₍ₙ₎
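The piecewise definition of F(x) can be sketched in a few lines: sort the data, then count how many values are ≤ x and divide by n. The function name `ecdf` and the sample values are assumptions for illustration.

```python
from bisect import bisect_right

def ecdf(data):
    """Return a function F with F(x) = (number of data points <= x) / n."""
    sorted_data = sorted(data)
    n = len(sorted_data)

    def F(x):
        # bisect_right returns how many sorted values are <= x
        return bisect_right(sorted_data, x) / n

    return F

sample = [3.0, 1.0, 4.0, 2.0]  # made-up values
F = ecdf(sample)

print(F(0.5))  # 0.0  (x is below the smallest value)
print(F(2.0))  # 0.5  (two of four values are <= 2.0)
print(F(2.5))  # 0.5  (F stays flat between data points)
print(F(4.0))  # 1.0  (x is at or above the largest value)
```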
Formula for Arithmetic Mean
x̄ = (1/n) ∑xi
Which measure is affected most by outliers?
Mean
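A small sketch of why the mean reacts strongly to outliers, comparing it with the median before and after adding one extreme value. The data and the helper name `arithmetic_mean` are made up for this example.

```python
from statistics import median

def arithmetic_mean(data):
    """x̄ = (1/n) * sum of all x_i."""
    return sum(data) / len(data)

clean = [2, 3, 3, 4, 5]        # made-up values
with_outlier = clean + [100]   # same data plus one extreme value

print(arithmetic_mean(clean), median(clean))                # 3.4 3
print(arithmetic_mean(with_outlier), median(with_outlier))  # 19.5 3.5
```

One extreme value moves the mean from 3.4 to 19.5, while the median only shifts from 3 to 3.5.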
Which is not a measure of central tendency?
A. Median
B. Mode
C. Range
D. Mean
C. Range
The geometric mean is only used for ____ data.
positive (non-zero)
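As a sketch, the geometric mean is the n-th root of the product of the values, which only makes sense when every value is strictly positive. The function below is an illustrative implementation that computes it through logarithms; the name `geometric_mean` and the sample growth factors are assumptions for this example.

```python
import math

def geometric_mean(data):
    """n-th root of the product of the values, computed via logarithms."""
    if any(x <= 0 for x in data):
        raise ValueError("geometric mean needs strictly positive data")
    # Averaging logarithms avoids overflow from multiplying many values
    return math.exp(sum(math.log(x) for x in data) / len(data))

growth_factors = [1.10, 1.20, 0.90]  # made-up yearly growth factors
print(geometric_mean(growth_factors))
print(geometric_mean([2, 8]))  # close to 4, since sqrt(2 * 8) = 4
```

Python's standard library also offers `statistics.geometric_mean`, which can be used to cross-check the result.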
What does interquartile range (IQR) measure?
The spread between Q3 and Q1 (middle 50% of data)
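A minimal sketch of computing the IQR with Python's standard library. Quartile conventions differ between textbooks and software, so the exact values of Q1 and Q3 may not match a hand calculation; here the default method of `statistics.quantiles` is assumed, and the data are made up.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up values

# n=4 splits the data into four parts and returns the three cut points
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q3, iqr)
```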
Formula for Sample variance (unbiased)
s² = (1/(n−1)) ∑(xi − x̄)²
Formula for Sample Standard Deviation (unbiased)
s = √(s²) = √[(1/(n−1)) ∑(xi − x̄)²]
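The two formulas above translate directly into code. This sketch implements them by hand and cross-checks against `statistics.variance` and `statistics.stdev`, which also use the n − 1 divisor; the function names and data are chosen for this example.

```python
import math
import statistics

def sample_variance(data):
    """s² = (1/(n−1)) * sum of squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """s = sqrt(s²)."""
    return math.sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, mean = 5

print(sample_variance(data), statistics.variance(data))  # both about 4.571 (= 32/7)
print(sample_std(data), statistics.stdev(data))          # both about 2.138
```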
What measures asymmetry?
Skewness
The coefficient of variation is given by: vx = ____ / x̄
standard deviation (sx)
Why is the mean absolute deviation more robust than standard deviation?
It is less influenced by outliers
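To make the robustness claim concrete, this sketch measures how much each spread measure grows when a single outlier is added. It assumes "mean absolute deviation" means the average absolute distance from the mean, (1/n) ∑|xi − x̄|; the function names and data are made up.

```python
import math

def mean_absolute_deviation(data):
    """Average absolute distance from the mean: (1/n) * sum |x_i - x̄|."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

def sample_std(data):
    """Sample standard deviation with the n − 1 divisor."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

clean = list(range(1, 11))      # 1, 2, ..., 10
with_outlier = clean + [1000]   # same data plus one extreme value

mad_growth = mean_absolute_deviation(with_outlier) / mean_absolute_deviation(clean)
std_growth = sample_std(with_outlier) / sample_std(clean)

print(mad_growth, std_growth)  # the standard deviation grows by the larger factor
```

Squaring the deviations gives the outlier far more weight in the standard deviation than taking absolute values does.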
What does a positive skew indicate about the data?
A longer right tail; typically Mean > Median > Mode
Formula for Biased Skewness
g₁(x) = [(1/n) ∑(xi − x̄)³] / sₓ³, where sₓ is the standard deviation
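A minimal sketch of the biased skewness, assuming the denominator is the cube of the standard deviation computed with the 1/n divisor (the usual convention for the biased estimator; the flashcard does not spell this out). The function name and data are made up.

```python
def biased_skewness(data):
    """g1 = m3 / m2**1.5, with m_k = (1/n) * sum (x_i - x̄)**k."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # biased variance (assumed 1/n divisor)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # long right tail

print(biased_skewness(symmetric))     # 0.0
print(biased_skewness(right_skewed))  # positive
```

Because numerator and denominator both carry the cube of the data's unit, the result has no unit.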
Skewness is a ____ quantity (unit-less).
non-dimensional
What is the median if n is odd?
median(x) = x₍ₖ₎ with k = (n + 1)/2 (the middle value of the sorted data)
What is the mode?
The most common value
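The median and mode can be sketched as follows. The odd-n branch matches the formula above; the even-n branch (average of the two middle values) is the usual convention and is added here as an assumption, since the flashcards only cover odd n. Function names and data are made up.

```python
from collections import Counter

def median(data):
    """Middle value of the sorted data."""
    x = sorted(data)
    n = len(x)
    mid = n // 2
    if n % 2 == 1:
        return x[mid]  # 0-based index of the ((n + 1)/2)-th value
    return (x[mid - 1] + x[mid]) / 2  # assumed convention for even n

def modes(data):
    """All values that occur most often (there can be more than one)."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

sample = [7, 3, 5, 3, 9]  # made-up values, n = 5

print(median(sample))  # 5
print(modes(sample))   # [3]
```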