2 - Summarising Data Flashcards

1
Q

What does each row & each column represent?

A

Each row = an OBSERVATION (or record) & represents 1 person

Each column = a VARIABLE (e.g race, gender, DOB)

2
Q

What are the 4 types of variables?

A
  1. A nominal-scale variable
    - Values are categories w/out numerical ranking e.g country of residence
    - Nominal variables w/ only 2 categories are v common: alive/dead, ill/well, vax/unvax, smoked/didn’t smoke
    - A nominal variable w/ 2 mutually exclusive categories = DICHOTOMOUS VARIABLE
  2. An ordinal-scale variable
    - Has values that can be ranked but aren’t necessarily evenly spaced e.g stage of cancer
  3. An interval-scale variable
    - Measured on a scale of equally spaced units, but w/out a true 0 point e.g DOB
  4. A ratio-scale variable
    - Interval variable w/ a true 0 pt e.g height in cm, systolic bp in mmHg, duration of illness in days
3
Q

A nominal-scale variable

A
  • Values are categories w/out numerical ranking e.g country of residence
  • Nominal variables w/ only 2 categories are v common: alive/dead, ill/well, vax/unvax, smoked/didn’t smoke
  • A nominal variable w/ 2 mutually exclusive categories = DICHOTOMOUS VARIABLE
4
Q

An ordinal-scale variable

A
  • Has values that can be ranked but aren’t necessarily evenly spaced e.g stage of cancer
5
Q

An interval-scale variable

A
  • Measured on a scale of equally spaced units, but w/out a true 0 point e.g DOB
6
Q

A ratio-scale variable

A
  • Interval variable w/ a true 0 pt e.g height in cm, systolic bp in mmHg, duration of illness in days
7
Q

What kind of variables are nominal- & ordinal-scale variables?

A

QUALITATIVE or CATEGORICAL

8
Q

What kind of variables are interval- & ratio-scale variables?

A

QUANTITATIVE or CONTINUOUS

9
Q

Frequency distributions can be displayed as a histogram & described by 3 main features. What are they?

A
  1. Central location (peak of distribution)
  2. Spread (how widely the values are dispersed on either side of the peak)
  3. Shape (whether it is approx symmetrical or skewed)
10
Q

What are the 3 measures of central location?

A
  1. Mean
  2. Median
  3. Mode
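A minimal sketch of the 3 measures using Python's standard-library `statistics` module. The data values are hypothetical, made up for illustration.

```python
import statistics

# Hypothetical data: days of illness for 7 patients
days_ill = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(days_ill)      # sum of values / number of values
median = statistics.median(days_ill)  # middle value once the data are sorted
mode = statistics.mode(days_ill)      # most frequently occurring value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```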
11
Q

What is spread & what are the 2 measures?

A

How widely the values are dispersed around the central location; aka variation or dispersion

  1. Range
  2. Standard deviation
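A rough sketch of how both measures are computed, using hypothetical blood-pressure values. The SD here is the sample SD (dividing by n − 1), which is what `statistics.stdev` also returns; the cross-check against the library is only there to confirm the hand-written formula.

```python
import math
import statistics

# Hypothetical data: systolic bp (mmHg) for 8 people
sbp = [110, 120, 120, 120, 125, 125, 135, 145]

# Range = largest value minus smallest value
value_range = max(sbp) - min(sbp)

# Sample SD = square root of (sum of squared deviations from the mean) / (n - 1)
n = len(sbp)
mean = sum(sbp) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sbp) / (n - 1))

# Cross-check the hand-written formula against the standard library
assert math.isclose(sd, statistics.stdev(sbp))

print(f"range = {value_range} mmHg, SD = {sd:.2f} mmHg")
```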
12
Q

What are the 2 possible shapes of a frequency distribution?

A

skewed vs symmetrical

13
Q

What does skewness refer to? What does +vely or -vely skewed mean?

A

Skewness refers to the TAIL, not the hump → so a distribution skewed to the L has a long L tail

If skewed to R → +vely skewed
If skewed to L → -vely skewed
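One common way to put a number on skewness is the moment coefficient g1 = m3 / m2^1.5 (an assumption here: other skewness formulas exist and give slightly different values). Its sign follows the tail: positive for a long R tail, negative for a long L tail. The data are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5.
    > 0: long tail to the R (+vely skewed); < 0: long tail to the L (-vely skewed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # 3rd central moment
    return m3 / m2 ** 1.5

# Hypothetical data: most illnesses are short, one is much longer (long R tail)
days_ill = [1, 2, 2, 3, 3, 3, 4, 10]

print(f"skewness = {skewness(days_ill):.2f}")
# In a +vely skewed distribution the mean is pulled towards the tail, above the median
print("mean =", statistics.mean(days_ill), "median =", statistics.median(days_ill))
```

Flipping the sign of every value mirrors the distribution, giving a long L tail and a negative g1.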

14
Q

What is the normal or Gaussian distribution?

A

Classic bell-shaped curve
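A small sketch using `statistics.NormalDist` (Python 3.8+) to show 2 properties of the bell shape: the curve is symmetrical about its peak, and its mean, median & mode coincide. The mean of 120 & SD of 15 are hypothetical values.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 120, SD 15 (e.g. systolic bp in mmHg)
bp = NormalDist(mu=120, sigma=15)

# Symmetry: the curve has the same height 1 SD below & 1 SD above the mean
print(bp.pdf(105), bp.pdf(135))

# The peak is at the centre, where mean, median & mode are all the same value
print(bp.mean, bp.median, bp.mode)
```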

15
Q

What is the median?

A

Middle value of a set of data that has been put into rank order; the value that divides the data into 2 halves

50th percentile (of the distribution)
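The definition translates directly into a few lines of code: rank the data, then take the middle value. With an even number of values there are 2 middle values, and the usual convention (assumed here) is to average them.

```python
def median(values):
    """Median: sort the values, then take the middle one
    (or the average of the 2 middle values when n is even).
    Assumes at least 1 value."""
    ranked = sorted(values)                      # put the data into rank order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd n: a single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: average the 2 middle values

print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```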

16
Q

What is the mean?

A

Aka the (arithmetic) average: the sum of all values ÷ the number of values

Best descriptive measure for data that are normally distributed

17
Q

What is used instead of the MEAN for data values which are skewed or have outliers?

A

The median

18
Q

How do you choose whether to use the mean, median or mode?

A
  1. Characteristics of data – eg normally distributed or skewed & with/without outliers
  2. Reason for calculating the measure – eg descriptive or analytical purposes
Mean = measure of choice when data are normally distributed
Median = measure of choice when data are not normally distributed
19
Q

When data are not normally distributed, the mean is not the preferred measure. True or false?

A

True
The mean uses all the data & is sensitive to outliers
Median & mode → largely unaffected by outliers
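A quick illustration with hypothetical data: replacing one value with an extreme outlier drags the mean up a long way, while the median & mode do not move.

```python
import statistics

# Hypothetical data, then the same data with one extreme value (an outlier)
typical = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(label,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "mode =", statistics.mode(data))
```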

20
Q

What are the 3 measures of spread?

A
  1. Range
  2. IQR
  3. SD
21
Q

What are percentiles?

A

Divide a distribution of data into 100 equal parts

Pth percentile (P goes from 0 to 100) = value that has P % of values falling at or below it → the 90th percentile has 90% of values falling at or below it
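There are several conventions for calculating percentiles, and software packages do not all agree. The sketch below uses the nearest-rank method (an assumption, chosen because it matches the "at or below" definition directly); the data are hypothetical.

```python
import math

def percentile_nearest_rank(values, p):
    """P-th percentile by the nearest-rank method: the smallest value
    with at least P% of the data at or below it (0 < p <= 100)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ranked = sorted(values)
    rank = math.ceil(p * len(ranked) / 100)  # 1-based rank in the sorted data
    return ranked[rank - 1]

scores = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10
print(percentile_nearest_rank(scores, 90))  # 9: 9 of the 10 values (90%) are <= 9
print(percentile_nearest_rank(scores, 25))  # 3
```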

22
Q

What are quartiles?

A

= dividing the data into 4 equal parts (quartiles)
Each quartile = 25% of the data
Cut-off for the 1st quartile is the 25th percentile
Cut-off for the 2nd quartile is the 50th percentile (= the median)
Cut-off for the 3rd quartile is the 75th percentile
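Python's `statistics.quantiles` with `n=4` returns the 3 quartile cut-offs. The exact values depend on the interpolation method; the default `'exclusive'` method is used here, and other tools may give slightly different numbers. The data are hypothetical.

```python
import statistics

data = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10

# n=4 returns the 3 cut-offs: the 25th, 50th (median) & 75th percentiles
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 2.75 5.5 8.25 with the default 'exclusive' method
```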

23
Q

What is the IQR (interquartile range)?

A

Measure of spread used most commonly w/ the median

Represents the central portion of the distribution, from the 25th percentile to 75th percentile
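A sketch of IQR = 75th percentile − 25th percentile, using `statistics.quantiles` (default method) for the quartiles. The hypothetical comparison shows why the IQR suits skewed data: one outlier inflates the range, but leaves the IQR unchanged because it only looks at the central 50% of values.

```python
import statistics

def iqr(values):
    """Interquartile range = 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]           # hypothetical data
data_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # same, but the largest value is an outlier

for d in (data, data_outlier):
    print("range =", max(d) - min(d), " IQR =", iqr(d))
```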

24
Q

The IQR is generally used in conjunction with what?

A

The median → together, they are useful to characterize the central location & spread of any freq distribution → but esp skewed (asymmetrical) ones

25
Q

What is a box plot?

A

Graphical representation of the location, spread & skewness of groups of numerical data through their quartiles
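A basic box plot is drawn from 5 numbers: minimum, Q1, median, Q3 & maximum. The sketch below computes them with `statistics.quantiles` (default method). Whisker conventions vary: many plotting tools extend whiskers only to 1.5 × IQR and plot values beyond that as individual points, so treat this as the simplest version. The data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """The 5 numbers a basic box plot is drawn from:
    minimum, Q1 (25th pct), median, Q3 (75th pct), maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Hypothetical, right-skewed data (days of illness)
days_ill = [1, 2, 2, 3, 3, 3, 4, 10]
lo, q1, med, q3, hi = five_number_summary(days_ill)
print(lo, q1, med, q3, hi)
```

Here the upper whisker (Q3 to max) is much longer than the lower one (min to Q1), which is how a box plot shows a +vely skewed distribution.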

26
Q

Uses of IQR

A

If distrib is non-symmetric – use range & IQR (so the median goes together w/ range & IQR)

27
Q

What is standard deviation (SD)?

A

Measure of the variability (spread) in a set of data
Commonly used w/ the mean

28
Q

When is SD used?

A

Only when data are normally distributed (i.e. data fall into a bell-shaped curve)
For normally distributed data:
  - Mean = recommended measure of central location
  - SD = recommended measure of spread

29
Q

What is the standard error (se) of the mean?

A

Variability we may expect in the means of repeated samples taken from the same population
Calculated by dividing the SD by the square root of n (the sample size)

30
Q

How is se calculated?

A

Divide SD by square root of n
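As a worked sketch with hypothetical bp readings: take the sample SD, then divide by √n.

```python
import math
import statistics

# Hypothetical sample of systolic bp readings (mmHg)
sample = [110, 120, 120, 120, 125, 125, 135, 145]

sd = statistics.stdev(sample)  # sample standard deviation
n = len(sample)
se = sd / math.sqrt(n)         # standard error of the mean = SD / sqrt(n)
print(f"SD = {sd:.2f}, n = {n}, se = {se:.2f}")
```

Because √n grows with the sample size, the se shrinks as n increases even when the SD stays the same.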

31
Q

What is the standard error (se) of the mean used for?

A

Calculation of confidence intervals (confidence limits) around the mean
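A minimal sketch of a 95% CI around a mean as mean ± z × se, assuming the normal approximation (z ≈ 1.96). For a sample as small as this hypothetical one, a t-distribution multiplier would normally be used instead; z is used here only to keep the sketch to the standard library.

```python
import math
import statistics

def mean_ci(values, level=0.95):
    """Approximate CI for a mean: mean +/- z * se (normal approximation)."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # ~1.96 for a 95% CI
    return mean - z * se, mean + z * se

sample = [110, 120, 120, 120, 125, 125, 135, 145]  # hypothetical data (mmHg)
low, high = mean_ci(sample)
print(f"95% CI: {low:.1f} to {high:.1f} mmHg")
```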

32
Q

What is "inference"?

A

Epidemiologists using the results of a study to make generalizations about the larger population

33
Q

What does a narrow vs wide confidence interval (CI) mean?

A

Narrow CI → high precision
Wide CI → low precision
The narrower the interval, the more precise the estimate
Big studies → NARROW confidence intervals (more confidence in the estimate obtained)
Small studies → WIDE confidence intervals
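To see why bigger studies give narrower CIs, hold the SD fixed (a hypothetical 15, assumed equal in every study) and vary n. Using the normal-approximation width 2 × 1.96 × SD / √n, quadrupling the sample size halves the width of the interval.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% CI
sd = 15                          # hypothetical SD, assumed the same in every study

# Full width of mean +/- z * se, where se = SD / sqrt(n)
widths = {n: 2 * z * sd / math.sqrt(n) for n in (25, 100, 400)}

for n, width in widths.items():
    print(f"n = {n:>3}: 95% CI width = {width:.1f}")
```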

34
Q

What are confidence intervals (CIs) used for?

A

Calculated for means but ALSO for:
  - proportions, rates, risk ratios, odds ratios (& other measures where the purpose = to draw inferences from a study to the population)
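As one example of a CI for something other than a mean, here is a sketch of the simple Wald (normal-approximation) interval for a proportion, p ± z × √(p(1 − p)/n). This particular method is an assumption chosen for simplicity; it is known to perform poorly when n is small or p is close to 0 or 1, where other methods (e.g. Wilson) are usually preferred. The counts are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(cases, n, level=0.95):
    """Wald (normal-approximation) CI for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)              # standard error of a proportion
    z = NormalDist().inv_cdf(0.5 + level / 2)    # ~1.96 for a 95% CI
    return p - z * se, p + z * se

# Hypothetical: 30 of 200 people surveyed were ill
low, high = proportion_ci(30, 200)
print(f"proportion ill = {30 / 200:.3f}, 95% CI {low:.3f} to {high:.3f}")
```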