Lecture 16 ARM Flashcards

Question

Simple random sampling

Answer 1

Steps:
- Select the population
- Make a sampling frame (list of people who can be selected)
- Determine the sample size
- Select randomly from the pool
Everyone in the frame has an equal chance of being selected
Depends on the study
Like a lottery!
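
The steps above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 made-up people, the sample size of 10 and the seed are assumptions, not part of the lecture.

```python
import random

# Hypothetical sampling frame: everyone who can be selected.
sampling_frame = [f"person_{i}" for i in range(1, 101)]
sample_size = 10  # an absolute number, chosen only for illustration

rng = random.Random(42)  # fixed seed so the draw can be repeated
# sample() draws without replacement, so every person in the frame
# has the same chance of ending up in the sample (like a lottery).
sample = rng.sample(sampling_frame, sample_size)
print(sample)
```

Nobody can be drawn twice, and nobody outside the frame can be drawn at all.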

Answer 2

Every x-th person in the sampling frame is selected
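
A minimal sketch of picking every x-th person, assuming a made-up frame of 100 people and x = 10. The random starting position is a common convention and an assumption here; the card itself does not mention it.

```python
import random

# Hypothetical sampling frame of 100 people.
sampling_frame = [f"person_{i}" for i in range(1, 101)]
x = 10  # sampling interval: take every 10th person

rng = random.Random(0)
start = rng.randrange(x)  # assumed: random start within the first interval
# Slice with a step of x: start, start + x, start + 2x, ...
sample = sampling_frame[start::x]
print(start, sample)
```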

Answer 3

Divide the population into subgroups (strata) based on shared characteristics, then randomly select from each stratum
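
As a rough sketch of this idea: group the population by a characteristic, then draw randomly inside each group. The urban/rural characteristic, the population of 90 and the choice of 5 people per stratum are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical population: (name, characteristic) pairs.
population = [
    (f"person_{i}", "rural" if i % 3 == 0 else "urban")
    for i in range(1, 91)
]

# Step 1: divide the population into strata by the characteristic.
strata = defaultdict(list)
for name, group in population:
    strata[group].append(name)

# Step 2: draw a simple random sample inside every stratum.
rng = random.Random(1)
per_stratum = 5  # assumed: the same number from each stratum
sample = {
    group: rng.sample(members, per_stratum)
    for group, members in strata.items()
}
print(sample)
```

Drawing the same number from each stratum is only one option. A study could instead draw in proportion to the size of each stratum.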

Answer 4

Used when the population is large and geographically dispersed (many strata)
E.g. when the population is distributed across provinces:
- Randomly select provinces
- From those provinces, randomly sample villages
- From those villages, randomly sample households
E.g. the Netherlands
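
The province, village and household stages can be sketched as nested random draws. The frame below is invented (12 provinces, 5 villages each, 20 households each), and the numbers drawn at each stage are assumptions.

```python
import random

# Hypothetical nested frame: province -> village -> households.
frame = {
    f"province_{p}": {
        f"village_{p}_{v}": [f"household_{p}_{v}_{h}" for h in range(1, 21)]
        for v in range(1, 6)
    }
    for p in range(1, 13)
}

rng = random.Random(7)
households = []
# Stage 1: randomly select provinces.
for province in rng.sample(sorted(frame), 3):
    villages = frame[province]
    # Stage 2: randomly select villages within each chosen province.
    for village in rng.sample(sorted(villages), 2):
        # Stage 3: randomly select households within each chosen village.
        households.extend(rng.sample(villages[village], 4))

print(len(households))  # 3 provinces * 2 villages * 4 households
```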

Answer 5

The required sample size must be given in absolute numbers, not as a percentage
Heuristic: from a sample size higher than 29 (n >= 30), it is possible to generalise across a group using the standard normal distribution (below 30: use a t-distribution)
You can never generalise beyond your sampling frame
Sampling errors

Answer 6

From a sample size higher than 29 (n >= 30), it is possible to generalise across a group using the standard normal distribution (below 30: use a t-distribution)

Answer 7

Summarise data to identify patterns and typical values

Answer 8

Definition: A statistic that identifies the center of a distribution of data
Purpose: Finds a single value that best represents all observations in the sample
Three types: mean, median, mode - all different notions of "average"
Use in research: In papers to summarise data, often in tables of sample characteristics - gives a sense of typical values

Answer 9

Measures of spread (range, IQR, variance, standard deviation)

Answer 10

Mean: Sum of all values divided by the number of values - uses ALL data points in the calculation
Interpretation: The balance point of the data - very common
Type of data: Requires interval/ratio (only makes sense for quantitative data)
Challenge: Sensitive to outliers - one very high or low value can skew the mean significantly
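
A small worked example of the mean and its sensitivity to outliers, using made-up numbers and Python's standard `statistics` module:

```python
from statistics import mean

values = [2, 4, 4, 5, 5]  # made-up interval/ratio data
# Sum of all values divided by the number of values: 20 / 5.
print(sum(values) / len(values), mean(values))

# One very high value is enough to drag the mean far from the rest.
with_outlier = values + [100]
print(mean(with_outlier))  # 120 / 6
```

Without the outlier the mean is 4. With it the mean is 20, which is larger than five of the six values.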

Answer 11

An unusually high or low value in the data set that can skew the data (or the mean)
Researchers should comment on outliers - they can be valuable data as well

Answer 12

Median: The value that falls in the middle of the SORTED dataset (from smallest to largest) - the 50th percentile
Type of data: Requires data that can be ordered or ranked, thus ordinal or interval/ratio
Benefit: Resistant to outliers - not affected by extremely high or low values
E.g. adding a millionaire to a sample of household incomes would barely change the median, but would raise the mean a lot
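
The millionaire example can be checked with a few invented incomes (the figures are assumptions, chosen only to make the arithmetic easy):

```python
from statistics import mean, median

# Made-up household incomes.
incomes = [28_000, 31_000, 35_000, 40_000, 46_000]
median_before, mean_before = median(incomes), mean(incomes)

# Add one millionaire to the sample.
incomes_with_millionaire = incomes + [1_000_000]
median_after = median(incomes_with_millionaire)
mean_after = mean(incomes_with_millionaire)

print(median_before, median_after)  # the median moves only slightly
print(mean_before, round(mean_after))  # the mean jumps
```

With six values the median is the average of the two middle values, so it moves from 35,000 to 37,500. The mean rises from 36,000 to roughly 196,667.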

Answer 13

Mode: The most frequently occurring value in the dataset (Norwegian: typetall)
Type of data: Applicable to ANY level of measurement - the only measure that works for nominal data
Usage: Highlights the most common category or value - useful for categorical data or to report a typical category
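
A short sketch with nominal data (the transport categories are made up). `statistics.mode` returns a single most common value, and `statistics.multimode` returns all of them when there is a tie:

```python
from statistics import mode, multimode

# Made-up nominal data: no order, no arithmetic, but a mode exists.
transport = ["bike", "car", "bike", "train", "bike", "car"]
most_common = mode(transport)
print(most_common)

# With a tie, multimode lists every value that shares the top count.
tied = multimode(["a", "a", "b", "b", "c"])
print(tied)
```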

Answer 14

Match to data level:
- Nominal --> mode ONLY
- Ordinal --> median (mode also works)
- Interval/Ratio --> mean usually, but median if the distribution is SKEWED (mode also works)
Distribution shape:
- Symmetric (normal-ish) --> mean = median = mode (all similar)
- Skewed (long tail - the bell curve is pulled in one direction or the other) --> median is a better representation than the mean (the mean gets pulled by the tail)
Outliers:
- If outliers are present, the median is best
- If outliers are not present, the mean is more powerful
In practice: both are important to report
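
These rules of thumb can be written down as a small lookup. The function name `recommended_measure` and its parameters are hypothetical, and the function only encodes the first-choice measure from the card, not a complete decision procedure.

```python
def recommended_measure(level, skewed_or_outliers=False):
    """Return the usual first-choice measure of central tendency.

    Hypothetical helper: `level` is the level of measurement,
    `skewed_or_outliers` flags a skewed distribution or outliers.
    """
    level = level.lower()
    if level == "nominal":
        return "mode"  # the only measure that works for nominal data
    if level == "ordinal":
        return "median"
    if level in ("interval", "ratio"):
        # The mean gets pulled by the tail, so fall back to the median.
        return "median" if skewed_or_outliers else "mean"
    raise ValueError(f"unknown level of measurement: {level}")


print(recommended_measure("ratio"))
print(recommended_measure("ratio", skewed_or_outliers=True))
```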

Answer 15

Analysing a single variable at a time

Answer 16

Check table

Answer 17

A dataset with a single peak (one mode)

Answer 18

- Symmetrical distribution - Mode=median=mean - located at the peak - Normal distribution

Answer 19

- Not symmetrical - Mode, median and mean occur at distinct points - Skewed data

Answer 20

Stems from the mean being larger or smaller than the other measures (median and mode) - possibly because of outliers

Answer 21

- The TAIL is to the right, and the bump is to the left - POSITIVE SKEW (think right = positive!) - Mean is GREATER THAN the median - Mean AND median are GREATER THAN the mode (the mode sits at the bump, on the left/lower side of the x-axis)

Answer 22

- The TAIL is to the left, and the bump is to the right - NEGATIVE SKEW - Mean is LESS THAN the median (think left = less) - Mean AND median are LESS THAN the mode (the mode sits at the bump, on the right/higher side of the x-axis)
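
The ordering of mean, median and mode under skew can be checked on two small invented datasets, one with a tail to the right and one with a tail to the left:

```python
from statistics import mean, median, mode

# Made-up data with a long tail to the right (positive skew).
right_tail = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Made-up data with a long tail to the left (negative skew).
left_tail = [1, 7, 11, 12, 13, 13, 14, 14, 14, 15]

for name, data in [("positive", right_tail), ("negative", left_tail)]:
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

In the first dataset mean > median > mode, in the second mean < median < mode. The mean is always the measure pulled furthest towards the tail.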

Answer 23

Do not take outliers into account, THEREFORE we look at dispersion! (frequencies, range, IQR, variance, standard deviation)

Answer 24

Type of dispersion measure
Definition: Number of occurrences of each value in the data set
Use: Finding the most common and least common categories in a data set
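
Counting frequencies is a one-liner with `collections.Counter`. The survey answers below are made up for illustration:

```python
from collections import Counter

# Made-up survey answers.
answers = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

# Number of occurrences of each value.
frequencies = Counter(answers)
print(frequencies)

# most_common() sorts from highest to lowest count.
ranked = frequencies.most_common()
most_common_category = ranked[0]
least_common_category = ranked[-1]
print(most_common_category, least_common_category)
```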

Answer 25

Measure of dispersion
Definition: Difference between the maximum and minimum value of a data set
Maximum value - minimum value = range
Use: Finding the total spread of the data
High range - large variability
Small range - small variability
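
A quick worked example with made-up values:

```python
values = [12, 7, 3, 18, 10]  # made-up data

# Range = maximum value - minimum value.
value_range = max(values) - min(values)
print(value_range)  # 18 - 3
```

Only the two extreme values enter the calculation, so a single outlier changes the range directly.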


(49 cards)