Lecture 17 ARM Flashcards

Visualising Anthropological Data - 4/7 (22 cards)

1
Q

Inter-Quartile Range (IQR)

A

Definition: The range of the middle half of all the observations
Less influence of outliers - more robust
More information about the shape

Calculation: Q3 - Q1 (75th percentile - 25th percentile)

Gets rid of the outer values

Eg important when studying inequality

Eg exam question: What is the interquartile range used for?
Answer: To describe the spread of the middle half of the data without distortion from things like outliers, eg when studying a community.
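A minimal sketch of the Q3 - Q1 calculation, comparing the IQR with the plain range. The income figures are invented for illustration, and the "inclusive" quartile method is an assumption - other quartile conventions give slightly different cut points.

```python
import statistics

def iqr(values):
    # Inclusive quartiles; other conventions give slightly different cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical household incomes (arbitrary units), invented for illustration.
typical = [12, 15, 18, 20, 22, 25, 27, 30, 35]
with_outlier = [12, 15, 18, 20, 22, 25, 27, 30, 400]

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(label, "- range:", max(data) - min(data), "IQR:", iqr(data))
```

The range jumps from 23 to 388 when the single extreme value is added, while the IQR stays at 9 in both cases.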

2
Q

Variance 1 (yap from kevin)

A

To understand how varied the data is - only used for interval/ratio - useful for normally distributed data.

Gives an indication of how far, on average, the values are from the central tendency (mean, usually).

Why do we need to know variance?
Provides a measure of variability that takes into account the distance of each value from the mean. By squaring the deviations, variance gives more weight to larger differences - making it sensitive to outliers.

Often used as an intermediate step to get the standard deviation.

3
Q

Standard deviation

A

The square root of the variance.
Measures roughly the average distance of the data points from the mean - but expressed in the same units as the original data. This makes it a more interpretable measure of variability. IT PROVIDES A CLEAR INDICATION OF HOW MUCH THE VALUES DEVIATE FROM THE MEAN.

Differs from IQR - standard deviation IS influenced by outliers. This is a weakness or a strength depending on whether the outliers are measurement errors or meaningful data.

Large SD: data is widely scattered
Small SD: data clustered tightly around the mean.
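A small sketch of large vs small SD: two hypothetical villages with the same mean but very different spread. The ages are invented for illustration, and the population formula (pstdev) is an assumed choice.

```python
import statistics

# Hypothetical ages at first marriage in two villages (invented data).
village_a = [24, 25, 25, 26, 25]   # tightly clustered
village_b = [15, 20, 25, 30, 35]   # widely scattered

mean_a = statistics.mean(village_a)
mean_b = statistics.mean(village_b)
sd_a = statistics.pstdev(village_a)  # population SD, an assumed choice
sd_b = statistics.pstdev(village_b)

print("means:", mean_a, mean_b)                 # both 25
print("SDs:", round(sd_a, 2), round(sd_b, 2))   # about 0.63 and 7.07
```

Both villages have a mean of 25, so the average alone would hide the difference that the SD makes visible.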

4
Q

Variance and Standard deviation for anthropology

A

Used in various subfields
Eg linguistic anthro - measure variability of linguistic structures
- quantify and compare variability in data - reveals more patterns, nuances etc

5
Q

Variance 2 (slide)

A

Definition: Refers to the spread or dispersion of data values in a set - how much they differ from each other

Calculation: Average of squared differences from the mean

If all the data points are almost the same - variation is low. If they vary widely, variation is high

In everyday terms: The degree of difference among observations (also called VARIABILITY or SPREAD)

Anthro - finding differences and spread provides more context, which averages / centrality measures may overlook or hide.
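The calculation above ("average of squared differences from the mean") can be sketched step by step. The values are invented, and dividing by n (treating the data as a whole population) is an assumption.

```python
import math

# Invented values, treated here as a whole population (divide by n).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)                 # 5.0
squared_diffs = [(x - mean) ** 2 for x in values]
variance = sum(squared_diffs) / len(values)      # average of squared differences
sd = math.sqrt(variance)                         # back in the original units

print("mean:", mean, "variance:", variance, "sd:", sd)
```

Here the squared differences sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.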

6
Q

Why variation matters in anthropology

A
  1. Reveals complexity - shows diversity within groups, showing that populations are not homogeneous
  2. Cultural and individual differences - anthros study cultural complexity, eg variation in practices, beliefs
  3. Avoids misleading averages - focusing only on averages can hide important differences - eg inequalities or subgroup patterns - hence it provides context
7
Q

Range

A

Definition: The simplest measure of variation - the difference between the highest and lowest values in a dataset
Calculation: max value - min value
Sensitive to outliers!
Insight: Gives a quick sense of span, but does not tell us about distribution in between - can be affected by extreme values (outliers)

8
Q

Variance and sd in the sample versus in the population

A

s = sample standard deviation
s^2 = sample variance
σ = population sd
σ^2 = population variance
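A short sketch of how the two pairs differ in practice. The card does not spell this out, but the standard convention is that the sample versions (s, s^2) divide by n - 1 while the population versions (σ, σ^2) divide by n. The data are invented.

```python
import statistics

data = [1, 2, 3, 4, 5]  # invented values, mean = 3

pop_var = statistics.pvariance(data)   # σ^2: sum of squared diffs / n
samp_var = statistics.variance(data)   # s^2: sum of squared diffs / (n - 1)
pop_sd = statistics.pstdev(data)       # σ
samp_sd = statistics.stdev(data)       # s

print("σ^2:", pop_var, "s^2:", samp_var)
print("σ:", round(pop_sd, 3), "s:", round(samp_sd, 3))
```

The squared differences sum to 10, so σ^2 = 10 / 5 = 2 and s^2 = 10 / 4 = 2.5; the sample version is always slightly larger.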

9
Q

Understanding Outliers

A

Definition: An unusual data point that lies far outside the range of the majority of the data - much higher or lower than the rest of the observations
Impact: can skew averages and distort analysis - but can also be an important signal - maybe an error, maybe a meaningful exception
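A quick sketch of how one outlier skews an average. The incomes are invented; the median is shown alongside the mean only as a point of comparison.

```python
import statistics

# Invented incomes (arbitrary units).
incomes = [20, 22, 25, 27, 30]
incomes_with_outlier = incomes + [400]

print("mean:", statistics.mean(incomes),
      "median:", statistics.median(incomes))
print("mean with outlier:", round(statistics.mean(incomes_with_outlier), 1),
      "median with outlier:", statistics.median(incomes_with_outlier))
```

One extra value pulls the mean from 24.8 to about 87.3, while the median only moves from 25 to 26.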

10
Q

Why visualize data?

A
  1. Reveal patterns: Display the distribution of data quickly (skewed, symmetric), peaks (common values, eg unimodal, bimodal, multimodal), clusters and outliers
  2. Complement numbers: Averages and SD give a summary - charts show detail - eg bimodality or skews
  3. Accessibility: Many people can understand a chart at a glance
  4. Anthropological insight: Helps spot cultural or biological patterns (diversity, anomalies) that merit further inquiry (interdisciplinary audiences)
  5. Keep it clear: simple, direct, effective
11
Q

Histogram

A

Definition: A histogram plots a numeric variable’s distribution as a series of bars (no space between the bars)

x-axis: data value ranges (bins, eg 0- … any number)
y-axis: frequency (count of observations in each bar/bin)

Purpose: reveal SHAPE of data / distribution
- where is the data concentrated
- spread: are values tightly clustered or widely dispersed
- skewness: longer tail on left or right? multiple peaks? outliers?

Anthropological uses
- Eg distribution of household sizes in villages…
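A rough text-only sketch of what a histogram does: group values into bins and count how many fall in each. The household sizes and the bin width of 2 are invented for illustration, and bin_counts is a hypothetical helper, not a library function.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values per half-open bin [start, start + width).

    Bins with no observations are simply left out of the result.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Invented household sizes for one village.
household_sizes = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 11]
bin_width = 2  # assumed; a different width changes the apparent shape

counts = bin_counts(household_sizes, bin_width)
for start, count in counts.items():
    print(f"[{start:>2}, {start + bin_width:>2}) | {'#' * count}")
```

The tallest bars sit at the 2-3 and 4-5 bins, with a tail to the right towards the household of 11, ie a right-skewed shape.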

12
Q

Central tendency and variation in histogram

A

Central values show up as the tallest bar region (mode) - variability is indicated by the width or spread of the bars (the range of bins with counts).

13
Q

Examples of histograms

A
  1. Symmetric / unimodal
  2. Skew left
  3. Skew right
  4. Uniform
  5. Bimodal
  6. Multimodal
13
Q

Common pitfall histograms

A
  • Misreading bin ranges - each bar covers A RANGE - not a single value! (that would be a bar chart)
  • Comparing histograms with different bin widths without caution (can distort perception of shape)
  • Confusing histogram (numeric bins) with bar chart (categorical) - histograms are for continuous data
14
Q

Boxplots

A

Also known as the box-and-whisker plot - shows median, quartiles and outliers

Box: Spans the IQR, aka Q3 - Q1, the middle 50% of the data
Median: Line inside the box (Q2 - 50th percentile)
Whiskers: Extend from the box to the most extreme data points within 1.5 x IQR of the box edges (for normally distributed data this is roughly 2.7 SD from the mean)
Outliers: points beyond the whiskers
See illustration
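A minimal sketch of the numbers behind a boxplot, using the 1.5 x IQR rule described above. The data are invented, boxplot_summary is a hypothetical helper, and the "inclusive" quartile method is an assumption - plotting software may use a slightly different quartile convention.

```python
import statistics

def boxplot_summary(values):
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Invented household sizes; 20 is far above the rest.
summary = boxplot_summary([2, 3, 4, 4, 5, 5, 6, 7, 20])
print(summary)
```

Here the box runs from 4 to 6 with the median at 5, the whiskers reach 2 and 7, and 20 is plotted as a separate outlier point.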

15
Q

Purpose boxplot

A

Summarise distribution SHAPE and VARIABILITY in a compact form - great for comparing multiple groups side by side
- see median differences at a glance
- see IQR (box height)
- check symmetry vs skew (is median centered in box - are whisker lengths equal?)
- identify outliers - plotted as dots

15
Q

Boxplots in anthropology

A

Compare distribution across categories
- Eg nutritional status by region, income by ethnic group etc
- conveys which groups tend to be higher or more variable

16
Q

Common pitfalls boxplots

A

1) Confusing median with mean (mean is NOT shown)
2) Assuming whiskers = min and max values - the whiskers reach at most 1.5 x IQR from the box - outliers are plotted separately
3) Ignoring outlier points - they can be important
4) Comparing boxplot heights without considering sample size - also small n can lead to misleading boxplots.

17
Q

Scatterplot

A

Definition: Plot of individual data points on two axes (x vs y) - each point is one observation with a value for variable x and variable y (eg country and income)

Purpose: Reveal ASSOCIATION/CORRELATION between two quantitative variables
- Direction: positive correlation (upward trend), negative correlation (downward trend), or no correlation (no clear trend)
- Form: Linear, curved, clustered, or outliers influencing patterns
- Strength: How tightly do points cluster around a line or trend - tight = strong, scattered = weak
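Direction and strength can be put into a single number. The sketch below uses Pearson's r, which is an assumed choice (the card only says "correlation") and only captures linear association. The data are invented and pearson_r is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect upward line, -1 perfect downward line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)

# Invented observations: x could be years of schooling, y an income score.
xs = [1, 2, 3, 4, 5]
r_perfect = pearson_r(xs, [2, 4, 6, 8, 10])   # points exactly on a line
r_positive = pearson_r(xs, [2, 4, 5, 4, 5])   # upward trend with scatter
r_negative = pearson_r(xs, [5, 4, 5, 4, 2])   # downward trend with scatter

print(r_perfect, round(r_positive, 2), round(r_negative, 2))
```

The three results are 1.0, about 0.77 and about -0.77: the sign gives the direction, and the distance from zero gives the strength.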

18
Q

Anthropological use of scatterplots

A

- Examine a potential relationship - see if higher X tends to go with higher or lower Y
- Identify subgroups or anomalies

19
Q

Central tendency and variation in scatterplots

A
  • No single center shown - focus on co-variation instead of a variable’s mean
  • Variation: the spread of points around any trend indicates how consistent or strongly correlated the relationship is. Wide scatter = high variability and thus weak correlation
20
Q

Common pitfalls scatterplots

A
  • Correlation vs causation trap
  • Overplotting in large datasets - too many points can obscure patterns
  • Ignoring non-linear patterns
  • Letting outliers distort perception
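Two of the pitfalls above can be sketched with numbers. Pearson's r is an assumed choice of correlation measure, the data are invented, and pearson_r is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)

# Non-linear pattern: y depends perfectly on x (y = x squared), yet r is 0.
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# Outlier: four points with a weak downward trend, then one extreme point.
r_without = pearson_r([1, 2, 3, 4], [2, 1, 2, 1])
r_with = pearson_r([1, 2, 3, 4, 10], [2, 1, 2, 1, 10])

print(r_curve, round(r_without, 2), round(r_with, 2))
```

The curved data give r = 0 despite a clear pattern, and a single extreme point flips the correlation from about -0.45 to about 0.92, which is why the plot itself should be checked and not only the coefficient.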