Week 9 Kuracloud: Measuring and Summarising Data Flashcards

1
Q

Statistics

=

(Kirkwood & Sterne. Essential Medical Statistics, 2nd ed., 2010)

A

= “the science of collecting, summarising, presenting and interpreting data, and of using them to estimate the magnitude of associations and test hypotheses”

(Kirkwood & Sterne. Essential Medical Statistics, 2nd ed., 2010)

2
Q

Descriptive Statistics

A

= describes features of a data sample
“summarising, presenting and interpreting data”

3
Q

Inferential Statistics

A

= generalises (infers) findings from a sample to the target population
“estimate the magnitude of associations and test hypotheses”

4
Q

Data

=

A

= “a set of values of subjects with respect to qualitative or quantitative variables”

5
Q

Raw Data

=

A

= observations

6
Q

Data set

=

A

= collection of information regarding a group of people or other items

7
Q

Variables

=, 2

A

= characteristics that you can measure or observe and may take any one of a specified set of values
- Numerical (quantitative) (or interval/ratio data)
- Categorical (qualitative)

8
Q

Categorical Variables

2,1

A
  • ordered/ordinal = observations ranked into categories with a natural order
  • unordered/nominal = observations placed in named, unordered groups
    • dichotomous/binary = only 2 possible categories
9
Q

Numerical Variables

2

A
  • continuous = on a continuous scale, can take any value in a range
  • discrete = takes only distinct, countable values (usually whole numbers)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Derived variable

=,

A

= new variable created from an existing variable
e.g. variable measured as numerical --> recoded as categorical
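
Deriving a categorical variable from a numerical one can be sketched as below. This is only an illustration: the function name `bmi_category`, the cut-offs and the BMI values are assumptions made up for the example, not taken from the course material.

```python
def bmi_category(bmi: float) -> str:
    """Recode a numerical BMI value into an ordered category (illustrative cut-offs)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "healthy"
    if bmi < 30:
        return "overweight"
    return "obese"


# Made-up numerical observations (the existing variable)
bmi_values = [17.9, 22.4, 27.0, 31.2]

# The derived (ordinal) variable, one category per observation
bmi_groups = [bmi_category(b) for b in bmi_values]
print(bmi_groups)
```

The derived variable is ordinal, so it would be summarised with frequencies and proportions rather than a mean.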

11
Q

Spreadsheets of datasets

3

A
  • Columns: each represents 1 variable (first usually identifier)
  • Rows: each represents data for 1 person (record)
  • Cells: value of 1 variable for 1 person = observation
12
Q

Outcome variable

=, (3)

A

= focus of attention, we try to explain its variation
(dependent variable/response variable/y-variable)

13
Q

Exposure Variable

=, (3)

A

= influences variation of outcome variable
(independent variable/predictor variable/x-variable)

14
Q

Operationalising Variables

=,

A

= deciding which category designates an individual as having the outcome/being exposed
- dictates interpretation of results

15
Q

Nominal (unordered categorical) variable measurement

2

A
  • frequencies (no. observations in each category)
  • proportions (relative frequencies)
16
Q

Ordinal (ordered categorical) measurement

2

A
  • frequencies
  • proportions
  • sometimes means and medians
17
Q

Numerical (interval/ratio) measurement

3

A
  • mean
  • median
  • standard deviation
18
Q

Nominal (unordered categorical) graphical representation

3

A
  • pie chart
  • column/bar graph
  • stacked column/bar graph
19
Q

Ordinal (ordered categorical) graphical representation

1

A
  • column/bar graph
20
Q

Numerical (interval/ratio) graphical representation

4

A
  • bar graph (data grouped)
  • histogram (data grouped)
  • box and whisker plot (summary statistics)
  • line graph (over time)
21
Q

Relative frequencies

=, 3

A

= proportion/percentage of total number
presented in:
- table
- bar graph
- pie chart
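
As a minimal sketch of how relative frequencies are obtained from raw categorical observations, assuming a made-up list of blood groups (the data and variable names are hypothetical):

```python
from collections import Counter

# Made-up nominal observations
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O"]

# Frequencies: number of observations in each category
counts = Counter(blood_groups)
total = sum(counts.values())

# Relative frequencies: each count as a proportion of the total
relative = {group: n / total for group, n in counts.items()}

for group, n in counts.items():
    print(f"{group}: n = {n}, {relative[group]:.1%}")
```

The proportions always add up to 1 (100%), which is a quick check on a frequency table.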

22
Q

Epidemiological prevalence or cumulative incidence

2

A

Presentation: proportion/percentage
Type: dichotomous categorical variables

23
Q

Frequency distribution

=, 2, 2

A

= distribution of values of a numerical variable
- first step in analysing numerical data
- displayed in a histogram
- for discrete: individual frequencies displayed
- for continuous: frequencies of formed groups/ranges
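
For a continuous variable, forming groups and counting the frequency in each could look like the sketch below. The heights, the bin width of 10 cm and the helper `bin_start` are assumptions chosen for illustration.

```python
from collections import Counter

# Made-up continuous observations (heights in cm)
heights_cm = [152.0, 158.5, 161.2, 163.0, 167.4, 168.9, 171.3, 174.8, 179.1, 186.0]
width = 10  # assumed group width in cm


def bin_start(value: float, width: int) -> int:
    """Return the lower bound of the group that contains value."""
    return int(value // width) * width


# Frequency of observations in each formed group
freq = Counter(bin_start(h, width) for h in heights_cm)

# Text version of a histogram: one '#' per observation
for start in sorted(freq):
    print(f"{start}-{start + width} cm: {'#' * freq[start]}")
```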

24
Q

Histogram vs Bar graph

A

a histogram has no gaps between bars because the data are continuous; a bar graph has gaps between bars because the categories are separate

25
Histograms show us: | 5
- spread - skew - mode - gaps - unusual values
26
Histogram Shapes
- positively skewed - symmetrical - negatively skewed
27
Positively Skewed | =,
= asymmetrical distribution in which "upper tail is longer than lower tail" (higher frequency at left/lower values)
- shape: ^\__
- mean > median
28
Symmetrical | =,
= symmetrical distribution around centre (bell curve, normal distribution, Gaussian distribution)
- shape: _/^\_
- mean, median, mode almost equal
29
Negatively Skewed | =,
= asymmetrical distribution in which "lower tail is longer than upper tail" (higher frequency at higher/right values)
- shape: /^
- mean < median
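
The link between skew and the mean/median comparison can be checked on a small positively skewed sample. The lengths of stay below are made-up values with one long upper-tail observation.

```python
import statistics

# Made-up lengths of hospital stay in days; the value 30 forms a long upper tail
stay_days = [1, 2, 2, 3, 3, 4, 5, 30]

mean_stay = statistics.mean(stay_days)      # pulled upwards by the extreme value
median_stay = statistics.median(stay_days)  # middle value, barely affected

print(f"mean = {mean_stay}, median = {median_stay}")
print("positively skewed pattern:", mean_stay > median_stay)
```

For a negatively skewed sample the comparison reverses, with the mean falling below the median.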
30
Measures of Central Tendency | 3
- mean - median - mode
31
Measures of Variability | 3
- range - interquartile range/IQR (difference between 1st and 3rd quartiles) - standard deviation
32
Standard deviation (SD)
= measure of spread about the mean
calculation:
1. Take the difference of each observation from the mean (deviations)
2. Square the deviations
3. Add the squared deviations together
4. Divide by no. observations - 1 (= variance = SD squared)
5. Take the square root
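
The calculation steps for the standard deviation can be followed one at a time in a short sketch. The observations are made up, and the final line compares the result with Python's built-in sample standard deviation as a sanity check.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

mean = sum(values) / len(values)

# 1. Deviation of each observation from the mean
deviations = [x - mean for x in values]
# 2. Square the deviations
squared = [d ** 2 for d in deviations]
# 3. Add the squared deviations together
total = sum(squared)
# 4. Divide by (number of observations - 1) to get the variance
variance = total / (len(values) - 1)
# 5. Square root of the variance gives the SD
sd = math.sqrt(variance)

print(f"variance = {variance:.3f}, SD = {sd:.3f}")
print("matches statistics.stdev:", math.isclose(sd, statistics.stdev(values)))
```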
33
Theoretical Frequency Distribution/Standard Normal Distribution properties (or PDF = probability density function) | 8
- symmetrical about mean (bell curve)
- standard normal: mean = 0, SD = 1
- tall and narrow for small SD, short and wide for large SD
- 68% lie within 1 SD of mean
- 95% lie within 2 (actually 1.96) SDs of mean
- 99.7% lie within 3 SDs of mean
- use mean and SD to find proportion lying between any two values
- probability of any specific value is 0
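
The proportions within 1, 2 and 3 SDs can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `within` is an assumption introduced for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean = 0, SD = 1
z = NormalDist(mu=0, sigma=1)


def within(k: float) -> float:
    """Proportion of the distribution lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {within(k):.4f}")

# Proportion lying between any two values, here between -1 and 2
print(f"between -1 and 2: {z.cdf(2) - z.cdf(-1):.4f}")
```

The same subtraction of two cumulative probabilities works for any pair of values, which is how the mean and SD are used to find the proportion between them.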
34
95% reference range/central reference range | =
= range of expected normal values in a population, i.e. values that enclose 95% of the population (1.96, or approximately 2, SDs either side of mean)
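
A 95% reference range follows directly from the mean and SD, provided the variable is roughly normally distributed. In this sketch the function name `reference_range_95` and the example mean and SD are hypothetical.

```python
def reference_range_95(mean: float, sd: float) -> tuple[float, float]:
    """Return (lower, upper) limits enclosing about 95% of a normal population."""
    return (mean - 1.96 * sd, mean + 1.96 * sd)


# Hypothetical systolic blood pressure: mean 120 mmHg, SD 10 mmHg
low, high = reference_range_95(120.0, 10.0)
print(f"95% reference range: {low:.1f} to {high:.1f} mmHg")
```

If the lower limit comes out negative for a variable that cannot be negative, that suggests the distribution is skewed and the normality assumption is doubtful.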
35
Assumption of Normality | =, 2
= assuming values of a continuous variable are normally distributed before calculations
Distribution may be skewed if:
1. Mean and median are very different
2. SD is very large, so the 95% reference range falls outside the possible values (e.g. is negative for a variable that cannot be negative)
36
Aggregated Data | =
= units of observation are combined (group level), not recorded at individual level
37
Univariate analysis | =
= describes single variable
38
Bivariate analysis | =,
= relationship between 2 variables - exposure --> outcome, test hypothesis
39
When both variables categorical: | 4
display relationship by cross-tabulating in a contingency table
- rows: exposure
- columns: outcomes ("no outcome" column can be omitted if percentages are shown)
- used to calculate odds ratios
40
Categorical Measures of association | 3
- odds ratio = strength of association between variables (yes/no --> odds of outcome in exposed / odds of outcome in unexposed)
- risk ratio (only in longitudinal)
- prevalence ratio (good for cross-sectional)
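
A worked example from a 2x2 contingency table may help. The counts below are made up, and the cell labels `a`, `b`, `c`, `d` are a common convention assumed here rather than taken from the cards.

```python
# Made-up 2x2 table (rows: exposure, columns: outcome)
#               outcome yes   outcome no
# exposed           a = 30       b = 70
# unexposed         c = 10       d = 90
a, b, c, d = 30, 70, 10, 90

# Odds = number with the outcome / number without it, within each exposure group
odds_exposed = a / b
odds_unexposed = c / d
odds_ratio = odds_exposed / odds_unexposed

# Risk (or prevalence) = number with the outcome / total in the exposure group
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
risk_ratio = risk_exposed / risk_unexposed

print(f"odds ratio = {odds_ratio:.2f}")
print(f"risk ratio = {risk_ratio:.2f}")
```

The same ratio of proportions is read as a risk ratio in longitudinal data and as a prevalence ratio in cross-sectional data.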
41
When both variables numerical
Scatterplot - x-axis: exposure - y-axis: outcome
42
Numerical Measures of Association | ===,4
r = correlation coefficient
= strength of linear association between two continuous variables
= number of SDs that the outcome changes, on average, for a 1 SD change in exposure
- always between -1 and 1
- r < 0: inverse (negative) correlation
- r = 0: no linear association
- r > 0: positive correlation
- r = 1: perfect positive correlation, straight line (r = -1: perfect inverse correlation)
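
As a rough sketch, the correlation coefficient can be computed from paired observations with Pearson's formula. The function name `pearson_r` and the exposure/outcome values are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


exposure = [1, 2, 3, 4]
outcome_up = [2, 4, 6, 8]    # rises in a straight line with exposure
outcome_down = [8, 6, 4, 2]  # falls in a straight line with exposure

print(pearson_r(exposure, outcome_up))
print(pearson_r(exposure, outcome_down))
```

A perfect straight-line increase gives r = 1 and a perfect straight-line decrease gives r = -1; real data fall somewhere in between.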