4. Data, Distribution, Populations, and Samples Flashcards
What is a categorical variable?
A variable that classifies individuals into one
of several categories
What are different types of categorical variables?
Binary variables: Only two categories e.g. female/not female,
yes/no
Ordinal variables: >2 categories and they are ordered
e.g. pain (none/mild/moderate/severe)
Nominal variables: Neither binary nor ordinal
e.g. ethnicity (Caucasian, Asian, Black)
What's a binary variable?
Only two categories e.g. female/not female,
yes/no
What is an ordinal variable?
>2 categories and they are ordered
e.g. pain (none/mild/moderate/severe)
What is a nominal variable?
Neither binary nor ordinal
e.g. ethnicity (Caucasian, Asian, Black)
What is a numerical variable?
Measured on a number scale
What are different types of numerical variables?
Discrete variables: A countable set of distinct values
e.g. age in years, number of transfusions, parity
Continuous variables: Any value within a certain range
e.g. blood pressure, head circumference
What is a discrete variable?
A countable set of distinct values
e.g. age in years, number of transfusions, parity
What is a continuous variable?
Any value within a certain range
e.g. blood pressure, head circumference
What is important about collecting data?
That you collect it in its most informative (most detailed) form, as you will never be able to return to the point of collection. Detailed data can be transformed into coarser types later on (e.g. exact age into age groups), but not the other way round.
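To illustrate why the most detailed measure is worth collecting: exact age can be collapsed into age groups at any time, but age groups cannot be turned back into exact ages. A minimal Python sketch, where the function name, cut-offs and group labels are illustrative assumptions:

```python
def age_group(age_years):
    """Collapse an exact age into a coarse category (cut-offs are assumed)."""
    if age_years < 18:
        return "child"
    if age_years < 65:
        return "adult"
    return "older adult"


# Hypothetical ages collected as a discrete numerical variable.
ages = [4, 17, 18, 42, 64, 65, 90]

# Numerical -> ordinal categorical: detail is lost and cannot be recovered.
groups = [age_group(a) for a in ages]
print(groups)
```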
What are different ways of summarising categorical data?
Proportion = Number in the category divided by the total
Percentage = proportion x 100
Rate = number with the event divided by the number of people (or the amount of person-time) at risk
Odds = number in the category / number not in that category
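A short Python sketch of these four summaries. The counts and the follow-up time are made-up numbers chosen only to keep the arithmetic easy to check:

```python
# Hypothetical data: 20 of 80 patients have the event.
n_event = 20
n_total = 80

proportion = n_event / n_total            # 20 / 80 = 0.25
percentage = proportion * 100             # 25.0
odds = n_event / (n_total - n_event)      # 20 / 60, about 0.333

# Rate: events per unit of person-time (400 person-years is assumed).
person_years = 400
rate_per_100_py = n_event * 100 / person_years   # 5.0 per 100 person-years

print(proportion, percentage, round(odds, 3), rate_per_100_py)
```

Note that the proportion uses the whole group as its denominator, whereas the odds use only those not in the category, so the two differ unless the category is rare.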
What is the Mean?
Most widely known measure of centre
= sum of all measurements/total number of measurements
What's the median?
The value that falls halfway along the frequency distribution (50th centile)
How do you find the median for an odd or even number of observations?
Odd - the value exactly in the middle of the ordered data
Even - the average of the middle two values
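The odd/even rule can be sketched in Python as follows. The function name and the example values are illustrative; the standard library's `statistics.median` follows the same rule:

```python
def median(values):
    """Return the middle value of the ordered data (mean of the two middle values if n is even)."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd: exactly in the middle
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average of middle two


print(median([7, 3, 9]))      # 7
print(median([7, 3, 9, 5]))   # 6.0
```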
When data are symmetric, how will this affect the mean and median?
They will be the same (or very nearly the same in real sample data).
What is bad about the mean as a comparison?
The mean is highly influenced by a single extreme value. In a skewed distribution the extreme values pull the mean towards them, so it may not accurately represent the centre of the data set.
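A small Python illustration of this sensitivity, using made-up lengths of hospital stay in days (the data and variable names are assumptions for the example):

```python
from statistics import mean, median

stays = [2, 3, 3, 4, 5]           # hypothetical lengths of stay (days)
with_outlier = stays + [60]       # one extreme value added

print(mean(stays), median(stays))                   # 3.4 3
print(mean(with_outlier), median(with_outlier))     # mean about 12.83, median 3.5
```

One extreme stay moves the mean from 3.4 to nearly 13 days, while the median barely changes.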
What is the aim when summarising numeric measurements?
Aim: to best summarise the data
Median = always representative of the centre of the data
Whereas, the mean is only representative if the distribution of the
data is symmetric
Mean = each measurement is directly involved in its calculation, so it is very
sensitive to changes in the data and heavily influenced by outlying
measurements
Describe the Range and its positives and negatives.
Range: difference between the largest and the
smallest values of the distribution
It ignores the bulk of the data
By definition, it depends on the two most
extreme (and hence possibly ‘odd’) values
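A minimal Python sketch of how the range depends only on the two extreme values. The helper name and the scores are made up for illustration:

```python
def value_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


scores = [2, 3, 4, 5, 6, 7, 8, 9, 10]       # hypothetical measurements
one_odd_value = scores[:-1] + [100]         # replace the largest value with 100

print(value_range(scores))          # 8
print(value_range(one_odd_value))   # 98
```

Eight of the nine values are identical in both lists, yet the range changes from 8 to 98.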
Describe the IQR and its + and -.
Inter-quartile range: the range within which the middle 50% of the
sample values lie (75th centile minus 25th centile)
More representative of the majority of the data
Does not depend on the oddest or extreme values like
the range does
More stable summary measure
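A Python sketch of the IQR's stability, using `statistics.quantiles` from the standard library. The data are made up, and the `"inclusive"` quartile method is one of several conventions; other conventions can give slightly different quartiles:

```python
from statistics import quantiles


def iqr(values):
    """75th centile minus 25th centile (inclusive quartile convention assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


scores = [2, 3, 4, 5, 6, 7, 8, 9, 10]       # hypothetical measurements
one_odd_value = scores[:-1] + [100]         # largest value replaced by 100

print(iqr(scores))          # 4.0
print(iqr(one_odd_value))   # 4.0
```

The extreme value of 100 leaves the IQR unchanged, because only the middle half of the ordered data is used.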
What is variance?
A measure of variation. It is (roughly) the average squared deviation of the observations from the mean.
How is variance (s²) calculated?
Sum of squared deviations from the mean / (n - 1)
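A step-by-step Python sketch of the sample variance, with made-up values. Each deviation is squared before summing, because the raw deviations from the mean always add up to zero:

```python
from statistics import variance

values = [2, 4, 4, 6, 9]                      # hypothetical measurements
n = len(values)
mean = sum(values) / n                        # 25 / 5 = 5.0

deviations = [x - mean for x in values]       # [-3.0, -1.0, -1.0, 1.0, 4.0]
sum_of_squares = sum(d ** 2 for d in deviations)   # 9 + 1 + 1 + 1 + 16 = 28.0
s2 = sum_of_squares / (n - 1)                 # 28 / 4 = 7.0

print(sum(deviations))     # 0.0, which is why the deviations are squared
print(s2)                  # 7.0
print(variance(values))    # standard-library result, for comparison
```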
What is the standard deviation?
Roughly the typical (average) deviation of an observation from the mean, in the same units as the data.
How can the SD be calculated?
Square root of variance.
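In Python this is a one-line step once the variance is known. The values below are made up, and `statistics.stdev` is shown only as a cross-check:

```python
import math
from statistics import stdev, variance

values = [2, 4, 4, 6, 9]          # hypothetical measurements
s2 = variance(values)             # sample variance (divides by n - 1)
sd = math.sqrt(s2)                # standard deviation = square root of variance

print(round(sd, 3))               # 2.646
print(math.isclose(sd, stdev(values)))   # True
```

Unlike the variance, the SD is in the same units as the original measurements.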
Describe the SD and its + and -.
Standard deviation is more sensitive to changes in the
data than the range and inter-quartile range
Standard deviation = more powerful summary measure
of the spread of the data as it makes more
comprehensive use of the entire dataset
If the mean is NOT a meaningful summary of the centre
of the data, then the same follows for the standard
deviation as this is based on distances from the mean