2 - Summarising Data Flashcards
What does each row & each column represent?
Each row = an OBSERVATION (or record) & represents 1 person
Each column = a VARIABLE (e.g race, gender, DOB)
What are the 4 types of variables ?
- A nominal-scale variable
- An ordinal-scale variable
- An interval-scale variable
- A ratio-scale variable
A nominal-scale variable
- Values are categories w/out numerical ranking e.g country of residence
- Nominal variables w/ only 2 categories are v common: alive/dead, ill/well, vax/unvax, smoked/didn’t smoke
- A nominal variable w/ 2 mutually exclusive categories = DICHOTOMOUS VARIABLE
An ordinal-scale variable
- Has values that can be ranked but aren’t necessarily evenly spaced e.g stage of cancer
An interval-scale variable
- Measured on a scale of equally spaced units, but w/out a true 0 point e.g DOB
A ratio-scale variable
- Interval variable w/ a true 0 pt e.g height in cm, systolic bp in mmHg, duration of illness in days
What kind of variables are nominal- & ordinal-scale variables ?
QUALITATIVE or CATEGORICAL
What kind of variables are interval- & ratio-scale variables?
QUANTITATIVE or CONTINUOUS
Frequency distributions are represented in a histogram, with 3 main features. What are they?
- Central location (peak of distribution)
- Spread (how widely dispersed it is on both sides of peak)
- Shape (whether it is approx symmetrical or skewed)
What are the 3 measures of central location?
- Mean
- Median
- Mode
What is spread & what are the 2 measures?
Aka variation or dispersion
- Range
- Standard deviation
What are the 2 possible shapes of a frequency distribution?
skewed vs symmetrical
What does skewness refer to? What does +vely or -vely skewed mean?
skewness refers to the TAIL, not the hump → so a distribution skewed to L has a long L tail
If skewed to R → +vely skewed
If skewed to L → -vely skewed
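The sign convention above can be checked numerically. A minimal sketch, assuming the moment coefficient of skewness (mean of cubed z-scores) as the measure; the cards themselves do not name a formula, and the function name and data below are made up for illustration.

```python
import statistics

def skewness(values):
    """Mean of cubed z-scores (assumed measure, not from the cards).

    Positive when the long tail is on the right, negative when it is on the left.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)

# Hypothetical data: one large value creates a long right tail
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image: the long tail is now on the left
left_tailed = [-x for x in right_tailed]
symmetric = [1, 2, 3, 4, 5]

print("right tail:", skewness(right_tailed) > 0)  # +vely skewed
print("left tail:", skewness(left_tailed) < 0)    # -vely skewed
```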
What is the normal or Gaussian distribution?
Classic bell-shaped curve
What is the median?
Middle value of a set of data that's been put into rank order; the value that divides the data into 2 halves
50th percentile (of the distribution)
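The rank-order definition can be sketched as below. Averaging the 2 middle values when the count is even is the usual convention, assumed here because the card does not say; the function name and data are made up.

```python
import statistics

def median_by_rank(values):
    """Sort the data, then take the middle value (or the mean of the 2 middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: no single middle value, so average the 2 central ones (assumed convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_count = [7, 3, 9, 1, 5]   # sorted: 1 3 5 7 9
even_count = [7, 3, 9, 1]     # sorted: 1 3 7 9

print(median_by_rank(odd_count))   # 5
print(median_by_rank(even_count))  # 5.0
```

The standard library's `statistics.median` gives the same results for these inputs.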
What is the mean?
Aka average
Best descriptive measure for data that are normally distributed
What is used instead of MEAN for data values which are skewed or have outliers?
MEDIAN
How does one select to use mean, median or mode?
- Characteristics of data – eg normally distributed or skewed & with/without outliers
- Reason for calculating the measure – eg descriptive or analytical purposes
Mean = measure of choice when data are normally distributed
Median = measure of choice for data not normally distributed
When data are not normally distributed, the mean is not preferred. True or false?
True
Mean uses all the data & is sensitive to outliers
Mode & median → unaffected by outliers
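A small illustration of the outlier point, using made-up durations of illness in days. The only difference between the 2 lists is that the largest value is replaced by an extreme one.

```python
import statistics

def centre(values):
    """Return the 3 measures of central location for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }

# Hypothetical data, not from the cards
typical = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 60]

print(centre(typical))
print(centre(with_outlier))
```

Here the mean moves from 3.4 to 14.4, while the median and mode both stay at 3.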
What are the 3 measures of spread?
- Range
- IQR
- SD
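A minimal sketch of range and SD on made-up data (IQR is covered by a later card). The cards do not say whether SD divides by n or n - 1, so both are shown and the choice is left as an assumption.

```python
import statistics

# Hypothetical data, not from the cards
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)   # largest minus smallest
sd_population = statistics.pstdev(values)  # divides by n
sd_sample = statistics.stdev(values)       # divides by n - 1

print("range:", value_range)
print("SD (population):", sd_population)
print("SD (sample):", round(sd_sample, 3))
```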
What are percentiles?
Divide the data in a distribution into 100 equal parts
Pth percentile (P goes from 0 to 100) = value that has P% of values falling at or below it → 90th percentile has 90% of values falling at or below it
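The "at or below" definition maps onto the nearest-rank method, sketched here. This is one of several percentile conventions (others interpolate between values), chosen as an assumption because it matches the wording of the card; the function name and data are made up.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest data value with at least p% of the values at or below it."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Rank of the cut-off value, counting from 1; p = 0 maps to the minimum
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

scores = list(range(1, 21))  # hypothetical data: 1, 2, ..., 20

print(percentile_nearest_rank(scores, 90))  # 18 (18 of 20 values, i.e. 90%, are <= 18)
print(percentile_nearest_rank(scores, 50))  # 10
```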
What are quartiles?
= grouping data into 4 equal parts/quartile
Each quartile = 25% of the data
Cut-off for the 1st quartile is the 25th percentile
Cut-off for the 2nd quartile is the 50th percentile (the median)
Cut-off for the 3rd quartile is the 75th percentile
What is the IQR (interquartile range)?
Measure of spread used most commonly w/ the median
Represents the central portion of the distribution, from the 25th percentile to 75th percentile
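A short sketch of the IQR as the distance between the 25th and 75th percentiles, using `statistics.quantiles` from the Python standard library with its default method. Quartile conventions differ slightly between tools, so other software may give slightly different cut-offs; the data are made up.

```python
import statistics

# Hypothetical data with one extreme value
values = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 40]

# n=4 returns the 3 cut points: 25th, 50th and 75th percentiles
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print("Q1:", q1, "median:", q2, "Q3:", q3)
print("IQR:", iqr)
print("range:", max(values) - min(values))
```

For these values the IQR is 6 while the range is 38, since the IQR only looks at the central portion of the distribution.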