Flashcards in Stats Deck (53)
A non-numerical variable, like hair colour or eye colour
A numerical variable, like length, time
Can take any value within a given range, e.g. height, time, age.
Can only take certain values, e.g. shoe size, cost in £ and p, number of coins.
What type of data do histograms deal with?
Continuous data, so there are no gaps between the bars.
Define the mode
The value which occurs most often.
Define the mean average
The sum of all the values divided by the number of values.
Define the median
The median is the middle number in an ordered list.
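As a quick illustration of the three averages defined above, here is a minimal sketch using Python's standard statistics module. The shoe sizes are made-up values for the example, not data from the deck.

```python
from statistics import mean, median, mode

# Hypothetical sample: shoe sizes of seven people (illustrative data only).
shoe_sizes = [5, 6, 6, 7, 8, 9, 15]

sample_mean = mean(shoe_sizes)      # sum of the values / number of values
sample_median = median(shoe_sizes)  # middle value of the ordered list
sample_mode = mode(shoe_sizes)      # value that occurs most often

print(sample_mean, sample_median, sample_mode)
```

With these values the three averages all differ, which is typical when one value (here 15) sits far from the rest.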
When is the mode used?
You should use the mode if the data is qualitative (colour etc.) or if quantitative (numbers) with a clearly defined mode (or bi-modal). It is not much use if the distribution is fairly even.
When is the mean used?
Use the mean for quantitative data (numbers). It uses every piece of data, so it is pulled towards extreme values and should only be used if the data is fairly symmetrical (not skewed).
When is the median used?
You should use this for quantitative data (numbers), when the data is skewed, i.e. when the median, mean and mode are probably not equal, and when there might be extreme values (outliers).
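The effect of an extreme value on the mean and the median can be seen in a small sketch. The ages below are invented for illustration; the second list is the same group with one value replaced by an outlier.

```python
from statistics import mean, median

# Hypothetical ages in a small group (illustrative data only).
ages = [20, 22, 23, 25, 27]
# Same group, but the largest value is replaced by an extreme value.
ages_with_outlier = [20, 22, 23, 25, 200]

mean_before = mean(ages)
mean_after = mean(ages_with_outlier)
median_before = median(ages)
median_after = median(ages_with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean more than doubles while the median does not move, which is why the median is preferred for skewed data or data with outliers.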
What is the range?
The range is the largest number minus the smallest (including outliers).
What's an outlier?
An extreme value
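Because the range uses only the largest and smallest values, a single outlier can dominate it. A minimal sketch, with hypothetical waiting times in minutes:

```python
# Hypothetical waiting times in minutes (illustrative data only).
waiting_times = [3, 4, 4, 5, 6, 30]

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

range_with_outlier = value_range(waiting_times)

# Drop the single extreme value (30) to see how much it contributed.
range_without_outlier = value_range([t for t in waiting_times if t != 30])

print(range_with_outlier, range_without_outlier)
```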
An ordinal variable is a categorical variable for which the possible values are ordered.
Nominal variables have two or more categories without having any kind of natural order. They are variables with no numeric value, such as occupation or political party affiliation.
Binary variables are variables which only take two values.
What are the various sources of data?
Routinely collected data:
- Mortality and census data
- Hospital activity data
- Primary care data
- Infectious disease notifications
- Regular national surveys (e.g. Health Survey for England)
Research study data
Have numerical values
Measurements are on a continuous scale i.e. can take an infinite number of distinct values
Also known as count variables
Have numerical values, but they must be integers, e.g. number of fillings
What are the appropriate graphical presentations to use for 1 categorical variable?
What are the appropriate graphical presentations to use for 1 continuous variable?
What is the appropriate graphical presentation to use for a categorical outcome and categorical exposure?
What is the appropriate graphical presentation to use for a numerical outcome and categorical exposure?
Box and whisker plot
What is the appropriate graphical presentation to use for a numerical outcome and numerical exposure?
What are other terms for 'exposure'?
What are other terms for 'outcome'?
heights of the bars are proportional to the frequencies
useful for comparing the frequencies in each category relative to the others
areas of the sectors are proportional to the frequencies
useful for comparing the frequencies in each category with the whole group
A distribution where the mean, median, and mode are roughly similar
A probability distribution that describes data that is symmetric around a mean
Describe positive skewness
Mode < median < mean
Also known as a right-skewed distribution because it has a long right tail.
Describe negative skewness
Mode > median > mean
Also known as a left-skewed distribution because it has a long left tail
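The orderings of mode, median and mean under skewness can be checked on a small example. The data below are invented so that the pattern is easy to see; real skewed data will usually, but not always, follow the same ordering.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirror the data around 10 to get a left-skewed version (long left tail).
left_skewed = [20 - x for x in right_skewed]

right_summary = (mode(right_skewed), median(right_skewed), mean(right_skewed))
left_summary = (mode(left_skewed), median(left_skewed), mean(left_skewed))

print("right-skewed (mode, median, mean):", right_summary)
print("left-skewed  (mode, median, mean):", left_summary)
```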
Define the term 'standard deviation'
Measure of spread of observations around the mean
Define the term 'interquartile range'
Range from first (25%) to third (75%) quartiles of a distribution
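Both measures of spread can be computed with Python's statistics module. This is a sketch on made-up values. Note that software packages use slightly different rules for locating quartiles, so the interquartile range from another tool may differ a little from the one computed here with the module's default method.

```python
from statistics import pstdev, stdev, quantiles

# Hypothetical observations (illustrative data only).
observations = [2, 4, 4, 4, 5, 5, 7, 9]

# Standard deviation: typical distance of observations from the mean.
population_sd = pstdev(observations)  # divides by n
sample_sd = stdev(observations)       # divides by n - 1

# Quartiles split the ordered data into four equal parts.
q1, q2, q3 = quantiles(observations, n=4)
interquartile_range = q3 - q1

print(population_sd, sample_sd, q1, q3, interquartile_range)
```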
What is a distribution?
• describes the frequency (or probability) of occurrence for a given value
• describes the shape of the data
What can we do with a distribution?
- make inferences about a wider population
- generate confidence intervals (assessing variability of estimates)
- test hypotheses
- calculate sample size
A measure of the asymmetry of the distribution
What is a null hypothesis?
A hypothesis saying that the outcome is not associated with the exposure
What is an alternative hypothesis?
A hypothesis saying that the outcome is associated with the exposure
Why use statistical tests?
We use statistical tests to help us judge whether the observed effect could plausibly have arisen by chance alone or whether it reflects a real association.
What is the significance level?
• The probability of finding an effect when none actually exists (a false positive, also called a type I error)
• The strength of evidence needed to reject the null hypothesis
• Normally set to 5%
Define the term 'inferential statistics'
Inferential statistics allows you to make predictions (“inferences”) from data. With inferential statistics, you take data from samples and make generalisations about a population.
What is meant by standard error?
• Standard error is an inferential statistic.
• It is an estimate of how variable a statistic would be if we repeated our study numerous times.
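The idea of repeating a study many times can be simulated directly. The sketch below assumes a normally distributed population with a mean of 100 and a standard deviation of 15 (arbitrary choices for illustration). It compares the spread of the simulated sample means with the usual formula for the standard error of a mean, the standard deviation divided by the square root of the sample size.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed population parameters and study design (illustrative only).
POP_MEAN, POP_SD = 100.0, 15.0
SAMPLE_SIZE = 25
N_STUDIES = 2000

def run_study():
    """Draw one sample from the population and return its mean."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    return mean(sample)

study_means = [run_study() for _ in range(N_STUDIES)]

# Spread of the statistic across repeated studies.
empirical_se = stdev(study_means)
# Formula for the standard error of a mean: SD / sqrt(n).
theoretical_se = POP_SD / SAMPLE_SIZE ** 0.5

print(round(empirical_se, 2), theoretical_se)
```

The two numbers should be close, which is what makes it possible to estimate the standard error from a single study.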
What are p-values?
A p-value is the probability of observing an effect size at least as large as the one we observed if the null hypothesis is true, i.e. if the true effect size is zero.
What do p-values tell you?
P-values tell us the strength of the evidence against the null hypothesis that there is no association.
As the p-value decreases the evidence against the null hypothesis increases.
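One way to see what a p-value measures is an exact permutation test, sketched below. This is an illustrative technique chosen for its transparency, not a method named in the deck, and the measurements are invented. Under the null hypothesis the exposure labels are interchangeable, so the p-value is the fraction of all possible relabellings that give a difference in means at least as large as the observed one.

```python
from itertools import combinations

# Hypothetical outcome measurements in two exposure groups (illustrative only).
exposed = [12, 14, 15, 16]
unexposed = [8, 9, 10, 11]

def mean_difference(group_a, group_b):
    """Difference between the two group means."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

observed = mean_difference(exposed, unexposed)
pooled = exposed + unexposed

n_relabellings = 0
n_as_extreme = 0
# Enumerate every way of assigning the "exposed" label to 4 of the 8 values.
for chosen in combinations(range(len(pooled)), len(exposed)):
    chosen_set = set(chosen)
    group_a = [pooled[i] for i in chosen_set]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
    n_relabellings += 1
    # Two-sided: count differences at least as large in either direction.
    if abs(mean_difference(group_a, group_b)) >= abs(observed) - 1e-12:
        n_as_extreme += 1

p_value = n_as_extreme / n_relabellings
print(observed, n_as_extreme, n_relabellings, round(p_value, 4))
```

Only 2 of the 70 relabellings are as extreme as the observed split, so the p-value is small and the evidence against the null hypothesis is fairly strong.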
What do confidence intervals tell you?
The confidence interval shows the range of values in which the true effect size is likely to lie.
A 95% confidence interval is constructed so that, if the experiment were repeated many times, about 95% of the intervals calculated would contain the true value.
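The repeated-experiment reading of a 95% confidence interval can be checked by simulation. The sketch below assumes a normal population with a known true mean (values chosen arbitrarily). It uses the common large-sample interval, mean ± 1.96 × standard error, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumed true population and study design (illustrative only).
TRUE_MEAN, TRUE_SD = 50.0, 10.0
SAMPLE_SIZE = 50
N_EXPERIMENTS = 2000
Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

def confidence_interval(sample):
    """Large-sample 95% confidence interval for the mean."""
    centre = mean(sample)
    standard_error = stdev(sample) / len(sample) ** 0.5
    return centre - Z * standard_error, centre + Z * standard_error

covered = 0
for _ in range(N_EXPERIMENTS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    low, high = confidence_interval(sample)
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / N_EXPERIMENTS
print(coverage)
```

The coverage should come out close to 0.95. It is typically slightly lower here because the normal multiplier is used with an estimated standard deviation.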
How can the concept of disease be influenced?
• Evidence of symptoms
• Technological and medical development
• Sociocultural environment
Define the term 'abnormality'
Different from what is usual or average, especially in a way that is bad
List the three types of abnormality
• Abnormal if unusual
• Abnormal if associated with clinical abnormality
• Abnormal if increased risk of future disease
Abnormal if unusual
It is common in laboratory testing to define normal as the range that includes 95% of values found in healthy subjects. Abnormal then means the top and bottom 2.5% of values found in healthy subjects.
What's wrong with defining abnormal as unusual?
By definition 5% of healthy people will have “abnormal” i.e. “unusual” values
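The 5% figure follows directly from how the reference range is built, as this sketch shows. The test values are simulated from an assumed normal distribution for a healthy population (mean 5.0, standard deviation 1.0, both arbitrary), and the reference range is taken as the 2.5th to 97.5th percentile.

```python
import random
from statistics import quantiles

random.seed(2)  # fixed seed so the sketch is reproducible

# Simulated test results for healthy subjects (assumed distribution).
healthy_values = [random.gauss(5.0, 1.0) for _ in range(10_000)]

# With n=40 the cut points are 2.5% apart, so the first and last
# cut points are the 2.5th and 97.5th percentiles.
cut_points = quantiles(healthy_values, n=40)
lower_limit, upper_limit = cut_points[0], cut_points[-1]

# Healthy subjects labelled "abnormal" because they fall outside the range.
flagged = sum(1 for v in healthy_values if v < lower_limit or v > upper_limit)
fraction_flagged = flagged / len(healthy_values)

print(round(lower_limit, 2), round(upper_limit, 2), fraction_flagged)
```

About 5% of these entirely healthy subjects are flagged, regardless of the shape of the distribution.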
Abnormal if associated with clinical abnormality
It is more logical to label values of a test as abnormal if these values are clearly associated with the presence of a disease state.
What's wrong with defining abnormal as being associated with clinical abnormality?
There's almost always overlap between values in diseased subjects and those in healthy subjects