Classification of Data Flashcards
(21 cards)
What are the four levels of measurement?
Nominal, Ordinal, Interval, Ratio.
What characterizes nominal data?
Named categories with no inherent order, i.e., a taxonomic classification (e.g., soil types, climate zones).
How does ordinal data differ?
Categories can also be ranked, but the differences between ranks are not measured (e.g., a hierarchy of transportation routes or of political boundaries).
Key difference between interval and ratio data?
Interval data have an arbitrary zero (e.g., temperature in °C or °F). Ratio data have a true zero (e.g., distance), so statements like "twice as far" are meaningful.
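A quick numeric check of the interval/ratio distinction, using made-up values: converting temperatures between units shifts the zero point, so ratios change, while converting distances only rescales, so ratios survive. The helper names c_to_f and km_to_mi are hypothetical.

```python
def c_to_f(celsius):
    """Celsius to Fahrenheit: rescales AND shifts the zero point."""
    return celsius * 9 / 5 + 32

def km_to_mi(km):
    """Kilometres to miles: rescales only, so zero stays zero."""
    return km * 0.621371

# Hypothetical readings: 20 looks like "twice" 10 in the original units.
temp_ratio_f = c_to_f(20) / c_to_f(10)       # 68 / 50
dist_ratio_mi = km_to_mi(20) / km_to_mi(10)  # scale factor cancels

print(f"20 C vs 10 C expressed in F: ratio {temp_ratio_f:.2f}")      # 1.36, not 2
print(f"20 km vs 10 km expressed in mi: ratio {dist_ratio_mi:.2f}")  # 2.00
```

Only the ratio-scale variable keeps its "twice as much" meaning after a change of units.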
What statistical methods apply to interval/ratio data?
Arithmetic-based statistics (means, standard deviations) and inferential methods such as correlation and regression.
What is the equal intervals method?
Each class spans the same numeric range: class width = (maximum - minimum) / number of classes.
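A minimal sketch of equal-interval classification, assuming classes are closed on the right with the minimum placed in the first class (GIS packages may treat boundary values differently). The function names and the sample data are hypothetical.

```python
from bisect import bisect_left

def equal_interval_breaks(values, k):
    """Class boundaries for k equal-interval classes: width = (max - min) / k."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # k + 1 boundaries; the last is set to max exactly to avoid float drift.
    return [lo + i * width for i in range(k)] + [hi]

def class_counts(values, breaks):
    """Count observations per class; classes are (b[i], b[i+1]], min goes to class 0."""
    k = len(breaks) - 1
    counts = [0] * k
    for v in values:
        # Search only the interior boundaries so min -> class 0 and max -> class k-1.
        counts[bisect_left(breaks, v, 1, k) - 1] += 1
    return counts

data = [1, 2, 3, 4, 5, 20]          # hypothetical values with one outlier
breaks = equal_interval_breaks(data, 4)
print(breaks)                       # [1.0, 5.75, 10.5, 15.25, 20]
print(class_counts(data, breaks))   # [5, 0, 0, 1]
```

The two empty middle classes show how a single outlier stretches the range and leaves classes unused.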
Best use case for equal intervals?
Data with familiar ranges (e.g., temperature bands).
Main drawback of equal intervals?
May create empty classes or lump dissimilar values into one class, because the breaks ignore how the data are distributed.
How does quantile classification work?
Each class holds the same number of observations: class size = total observations / number of classes.
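A minimal sketch of quantile classification by rank, assuming at least as many observations as classes. When n is not divisible by k, class sizes differ by at most one; equal values may fall in different classes, and real GIS tools resolve ties in their own ways. The function name and data are hypothetical.

```python
def quantile_classes(values, k):
    """Split values into k rank-based classes of n // k or n // k + 1 observations.

    Assumes len(values) >= k; otherwise some classes would be empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Integer slice points spread any remainder (n % k) across the classes.
    cuts = [i * n // k for i in range(k + 1)]
    return [ordered[cuts[i]:cuts[i + 1]] for i in range(k)]

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]   # hypothetical values
classes = quantile_classes(data, 4)
print(classes)                    # [[1, 1, 2], [3, 3, 4], [5, 5, 5], [6, 8, 9]]
print([c[-1] for c in classes])   # upper class limits: [2, 4, 5, 9]
```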
Advantage of quantile method?
No empty classes; suits ordinal (rank-based) data and comparisons between maps.
What does mean-standard deviation require?
Normally distributed data; class breaks fall at the mean plus or minus multiples of the standard deviation.
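A minimal sketch of mean-standard deviation breaks. Using the population standard deviation (pstdev) and breaks at -1, 0 and +1 SD are assumptions for illustration; other multiples (e.g., half-SD steps) are equally possible. The function name and data are hypothetical.

```python
from statistics import mean, pstdev

def mean_sd_breaks(values, multiples=(-1, 0, 1)):
    """Interior class boundaries at mean + m * SD for each multiple m.

    With the default (-1, 0, 1) this gives four classes:
    below -1 SD, -1 SD to mean, mean to +1 SD, above +1 SD.
    """
    mu, sd = mean(values), pstdev(values)
    return [mu + m * sd for m in multiples]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical; mean 5, population SD 2
print(mean_sd_breaks(data))       # [3.0, 5.0, 7.0]
```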
Problem with mean-SD for raw data?
If the data aren't normally distributed (raw values are often skewed), the classes become unbalanced and misleading.
How do maximum breaks work?
Sort the values and place class breaks at the largest gaps between adjacent values.
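A minimal sketch of maximum breaks: sort, measure every gap between neighbours, and cut at the k - 1 widest. Returning the midpoint of each gap as the break value is an assumption; a boundary could equally be placed at either edge of the gap. The function name and data are hypothetical.

```python
def maximum_breaks(values, k):
    """Place k - 1 breaks at the largest gaps between adjacent sorted values."""
    ordered = sorted(values)
    # (gap size, index of the lower value) for every adjacent pair.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    # Take the k - 1 widest gaps, then restore their left-to-right order.
    widest = sorted(sorted(gaps, reverse=True)[:k - 1], key=lambda g: g[1])
    # Report each break as the midpoint of its gap (assumed convention).
    return [(ordered[i] + ordered[i + 1]) / 2 for _, i in widest]

data = [1, 2, 3, 10, 11, 12, 30, 31]   # hypothetical values
print(maximum_breaks(data, 3))          # [6.5, 21.0]
```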
Limitation of maximum breaks?
It considers only gap size, so it may miss natural clusters in the data.
What defines natural breaks?
Minimizes within-class variance, maximizes between-class variance.
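A sketch of the natural-breaks criterion solved exactly by dynamic programming: it finds the contiguous partition of the sorted values with the smallest total within-class sum of squared deviations. Because the total sum of squares is fixed, this also maximizes between-class variance. This exact approach is usually credited to Fisher; GIS tools may use this or an iterative approximation, so treat it as an illustration of the criterion, not any package's implementation. The function name and data are hypothetical.

```python
def natural_breaks(values, k):
    """Return the upper limit of each of k classes minimizing within-class squared deviation.

    Exact dynamic programming, O(k * n^2); fine for small illustrative datasets.
    """
    x = sorted(values)
    n = len(x)
    # Prefix sums give any class's sum of squared deviations in O(1).
    s, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations of x[i:j] from its own mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j sorted values into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            # Last class is x[i:j]; the first i values form c - 1 classes.
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], split[c][j] = candidate, i
    # Walk back through the stored split points to recover class limits.
    limits, j = [], n
    for c in range(k, 0, -1):
        limits.append(x[j - 1])
        j = split[c][j]
    return limits[::-1]

data = [1, 2, 3, 10, 11, 12, 30, 31]   # hypothetical values
print(natural_breaks(data, 3))          # [3, 12, 31]
```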
Common default classification method in GIS software?
Natural breaks; when the breaks are placed by eye rather than computed, the result is subjective.
When to use head/tail breaks?
Heavy-tailed distributions; split the data at the mean, then repeat the split on the values above it (the head).
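A minimal sketch of head/tail breaks. The stopping rule, splitting again only while the head is under 40 percent of the current values, follows a commonly cited threshold but is treated here as an assumed parameter (head_limit). The function name and data are hypothetical.

```python
from statistics import mean

def head_tail_breaks(values, head_limit=0.4):
    """Return the successive means used as class breaks.

    Split at the mean, keep the head (values above it), and repeat while
    the head stays a minority below the assumed head_limit fraction.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        m = mean(data)
        head = [v for v in data if v > m]
        # Stop once the head is empty or no longer a clear minority.
        if not head or len(head) / len(data) >= head_limit:
            break
        breaks.append(m)
        data = head
    return breaks

# Hypothetical heavy-tailed sample: many small values, a few very large ones.
data = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 8, 16, 64]
print(head_tail_breaks(data))
```

Here the first split is at the overall mean (6.875) and the second at the mean of the head [8, 16, 64]; the number of classes is not fixed in advance but falls out of the data.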
First step in choosing classification?
Check data distribution via histogram.
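A quick way to inspect a distribution before picking a method: a text histogram with equal-width bins, plus a skewness value as an extra numeric check (the skewness helper is an addition for illustration, not part of the card). Function names and data are hypothetical.

```python
from statistics import mean, pstdev

def text_histogram(values, bins=5):
    """Print a text histogram with equal-width bins and return the bin counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1   # guard against all-equal data
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end; clamp it into the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:8.2f} | {'#' * c}")
    return counts

def skewness(values):
    """Population skewness: near 0 for symmetric data, large and positive for a long right tail."""
    mu, sd = mean(values), pstdev(values)
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)

data = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 8, 16, 64]   # hypothetical values
counts = text_histogram(data)
print(f"skewness: {skewness(data):.2f}")
```

A tall first bar with a long sparse tail, together with a large positive skewness, points away from mean-standard deviation classes.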
Best method for normal distribution?
Mean-standard deviation.
Best for uniform data without outliers?
Equal intervals.
Recommended for irregular distributions?
Quantile or natural breaks.