Critical numbers - statistics Flashcards
What is the target population and sample population?
The target population is the whole group we want to learn about. We can't collect information from everyone, so we take a subset of that population; this is known as the sample population.
What is sampling bias?
What is recall bias?
Social desirability bias?
Information bias?
Sampling bias = some individuals are more/less likely to be included in the study than others
Recall bias = individuals cannot accurately remember past events or exposures when asked
Social-desirability bias = individuals tell us incorrect information because they feel a societal pressure
Information bias = measurement bias
What is a background/confounding factor?
Something that is responsible for the outcome and related to the exposure.
Example: screen use and poor vision. Confounder = lack of natural light.
Types of study design:
Experimental vs observational
Retrospective vs prospective
Individual vs population
Experimental = researcher changes something/ has intervened
Observational = researcher just collected data
Retrospective = look back to see if exposure caused outcome
Prospective = collect information to see if current exposure leads to outcome
Individual = info collected on an individual - usual study design
Population/ecological = whole populations looked at
Types of study:
Case control
Take individuals with the outcome (cases) and matched individuals without it (controls), then look back to see who had the exposure.
Good for investigating rare disease
Cross sectional study:
Look at what is happening now (snapshot of time)
Who currently has exposure and the outcome
Difficult to establish order of events
Cohort study:
Collect information on a sample: some have the exposure, some do not, and no one has the outcome yet. Then follow up to see whether those with the exposure go on to have more outcomes.
Time consuming, expensive
Randomised controlled trial:
Have multiple groups (also known as arms)
Give a different exposure to each group
Compare the outcomes between the groups
Steps to avoid bias:
Blinding - single and double
Randomisation - e.g. flipping a coin
Placebos
Matching - groups are identical, with the only difference being the exposure
Gold standard, but expensive and not suitable for every exposure
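A minimal sketch of randomisation, assuming a two-arm trial with hypothetical participant IDs. Rather than a literal coin flip per person, it shuffles the list and deals participants out in turn, so the arms end up equal in size. The function name `randomise`, the arm labels and the seed are illustrative assumptions.

```python
import random

def randomise(participants, arms=("treatment", "placebo"), seed=None):
    """Randomly allocate participants to arms of (near) equal size."""
    rng = random.Random(seed)      # seeded only so the example is repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)          # random order removes researcher choice
    # Deal the shuffled participants out to each arm in turn.
    return {arm: shuffled[i::len(arms)] for i, arm in enumerate(arms)}

# Hypothetical participant IDs 1 to 20.
allocation = randomise(range(1, 21), seed=42)
for arm, members in allocation.items():
    print(arm, sorted(members))
```

A per-person coin flip is also valid randomisation, but it can leave the arms unequal in size by chance.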
Crossover trial:
Extension of an RCT where everyone in the study receives all of the different exposures. Therefore each person's results can be compared with their own.
The order in which they receive the treatments/exposures is randomised
Not always suitable as there may be carry-over effects
What is a variable?
A measured characteristic that varies between individuals.
What is a categoric variable and what are the subtypes?
Categoric variables fit into a particular category.
Binary = 2 categories, e.g. yes or no
Ordinal = more than 2 categories with a natural ordering, e.g. low, medium and high
Nominal = more than 2 categories with no ordering, e.g. hair colour, ethnicity
What is a numeric variable and what are the subtypes?
A variable that is measured on a scale.
Discrete = can only take a distinct set of values, e.g. age in years
Continuous = can take any value within its limits, e.g. weight
What is descriptive statistics?
Collection of statistical measures used to describe the data sample we have.
Definitions of:
Proportion
Probability
Odds
Rate
Proportion = number with outcome/total number
Probability = chance of the outcome, estimated by the proportion (between 0 and 1); proportion x 100 gives the percentage
Odds = number with outcome/number without
Rate = number of times something happens per quantified unit, e.g. x per 100 people
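A short worked example of these four measures. The counts are made up for illustration (20 of 100 people with the outcome).

```python
# Hypothetical counts, invented for illustration only.
with_outcome = 20
without_outcome = 80
total = with_outcome + without_outcome

proportion = with_outcome / total           # share of everyone who has the outcome
percentage = proportion * 100               # the same proportion written as a percentage
odds = with_outcome / without_outcome       # those with the outcome vs those without
rate_per_100 = with_outcome / total * 100   # events per 100 people

print(proportion, percentage, odds, rate_per_100)
```

Note that the odds (0.25) differ from the proportion (0.2) because the denominator is only those without the outcome, not everyone.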
What is the risk difference?
Risk ratio?
Odds ratio?
Risk difference = subtraction of one proportion from the other
“Risk with X is … percentage points higher than with Y”
Risk ratio = Group A proportion (or percentage)/Group B proportion (or percentage)
The group on top is the focus, compared with the group on the bottom
If greater than 1, the risk in group A is larger than in B; if 1, it is the same; if less than 1, it is smaller.
1.85 shows an increased risk of 85%
0.80 shows a decreased risk of 20%
Odds ratio = Group A odds/ Group B odds
Odds increased or decreased by X
Remember an odds ratio of 2 is only a 100% increase in odds
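The three comparisons can be sketched from the counts in two groups. The function name `compare_groups` and the counts (30 of 100 in Group A, 20 of 100 in Group B) are assumptions made up for this example.

```python
def compare_groups(a_events, a_total, b_events, b_total):
    """Compare Group A (focus, on top) with Group B (reference, on the bottom)."""
    risk_a = a_events / a_total
    risk_b = b_events / b_total
    odds_a = a_events / (a_total - a_events)
    odds_b = b_events / (b_total - b_events)
    return {
        "risk_difference": risk_a - risk_b,   # subtraction of proportions
        "risk_ratio": risk_a / risk_b,        # > 1 means higher risk in A
        "odds_ratio": odds_a / odds_b,        # > 1 means higher odds in A
    }

# Hypothetical data: 30/100 with the outcome in A, 20/100 in B.
result = compare_groups(30, 100, 20, 100)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Here the risk ratio is 1.5 (a 50% increase in risk), while the odds ratio is about 1.71, so the two are not interchangeable.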
Odds ratio and risk ratio can cause what?
Can cause unnecessary panic: a 200% increase may sound large, but the actual (absolute) risk could still be very small.
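To illustrate with made-up numbers: suppose 1 in 100,000 unexposed people and 3 in 100,000 exposed people get the outcome. All figures here are hypothetical.

```python
per = 100_000          # population size used for both groups (hypothetical)
baseline_cases = 1     # cases per 100,000 without the exposure
exposed_cases = 3      # cases per 100,000 with the exposure

risk_ratio = exposed_cases / baseline_cases              # relative comparison
relative_increase_pct = (risk_ratio - 1) * 100           # "200% increase"
extra_cases_per_100k = exposed_cases - baseline_cases    # absolute comparison
absolute_risk_exposed = exposed_cases / per              # still a tiny risk

print(relative_increase_pct, extra_cases_per_100k, absolute_risk_exposed)
```

The relative increase is 200%, yet only 2 extra people per 100,000 are affected.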
What does standard deviation show?
Shows the spread of data about the mean
Sigma is the symbol for
Standard deviation
Variance =
SD squared
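A quick check of the relationship between SD and variance, using Python's standard `statistics` module on a small made-up data set. The population versions (`pstdev`, `pvariance`) are used here; `stdev` and `variance` are the sample versions, which divide by n - 1 instead of n.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values with mean 5

sd = statistics.pstdev(data)            # population standard deviation (sigma)
variance = statistics.pvariance(data)   # population variance (sigma squared)

print(sd, variance)
print(math.isclose(sd ** 2, variance))  # variance is the SD squared
```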
Mean =
Sum of numbers/total number
Median =
middle number of data set
If there are 2 middle numbers (an even number of values), take their average
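A small sketch of both calculations with Python's standard `statistics` module. The values are invented; one list has an odd number of values and the other an even number.

```python
import statistics

odd_count = [11, 1, 3]       # sorted: 1, 3, 11 -> one middle number
even_count = [8, 2, 6, 4]    # sorted: 2, 4, 6, 8 -> two middle numbers

mean_odd = statistics.mean(odd_count)        # (11 + 1 + 3) / 3
median_odd = statistics.median(odd_count)    # the single middle value
mean_even = statistics.mean(even_count)      # (8 + 2 + 6 + 4) / 4
median_even = statistics.median(even_count)  # average of 4 and 6

print(mean_odd, median_odd, mean_even, median_even)
```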
In a perfect symmetric distribution the mean and median are…
Equal
When the distribution is not symmetric you are said to have a…
Skewed distribution
It can be right or left skewed depending on the position of the outliers.
The outliers pull the tail of the distribution (and the mean) in that direction
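The effect of a single outlier on the mean and median can be seen with two made-up data sets that differ only in their largest value.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]        # hypothetical, evenly spread values
right_skewed = [1, 2, 3, 4, 100]   # same values with one large outlier

sym_mean = statistics.mean(symmetric)
sym_median = statistics.median(symmetric)
skew_mean = statistics.mean(right_skewed)
skew_median = statistics.median(right_skewed)

print(sym_mean, sym_median)     # equal in the symmetric case
print(skew_mean, skew_median)   # the mean is pulled towards the outlier
```

The median stays at 3 in both cases, while the mean moves from 3 to 22.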
What are the three measures of spread we have learnt about?
Range
SD
Interquartile range
How to work out range?
Is it useful?
Largest minus smallest value
Not good for data with outliers, as it depends only on the two most extreme values
IQR
What is it and is it good?
Represents the middle 50% of the data.
To calculate, order the values, then find the 25th percentile and 75th percentile. Report them as they are and/or subtract the 25th from the 75th.
Associated with the median and is better for data with outliers.
Standard deviation
What is it and is it good?
Spread of data about the mean
Again it can be skewed by outliers as it takes into account the mean
However it is more powerful as it uses all the data. Therefore it should be used in statistics unless the data is skewed. If the data is skewed then IQR should be used.
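The three measures of spread can be compared on made-up data with and without an outlier. The helper name `spread` is an assumption for this sketch. `statistics.quantiles` uses its default "exclusive" method here, so other software may give slightly different quartiles.

```python
import statistics

def spread(values):
    """Return the range, interquartile range and sample SD of the values."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": statistics.stdev(values),
    }

clean = list(range(1, 12))           # hypothetical values 1 to 11
with_outlier = clean[:-1] + [110]    # replace the largest value with an outlier

clean_spread = spread(clean)
outlier_spread = spread(with_outlier)
print(clean_spread)
print(outlier_spread)
```

The range and SD both jump when the outlier is added, while the IQR is unchanged.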
In a symmetric distribution what should be used to summarise data?
In a non symmetric distribution what should be used to summarise the data?
Symmetric = use mean and SD
Non symmetric = use median and IQR, as they are not affected by outliers.
What is a normal distribution?
Certain numerical values, e.g. weight, will follow a normal distribution when plotted.
This is because most people have values that are in the middle around the mean, with only a few extremes either side.
The shape of the curve is bell shaped and hence it is sometimes referred to as a bell curve.
The mean is in the middle and the larger the SD the flatter and wider the curve will be.
Using a normal distribution mean and SD what can we work out?
We know that 1 SD either side of the mean = 68% of values
1.96 SD either side of the mean = 95% of values
3 SD = a value further from the mean than this is a potential outlier
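These percentages can be checked with `NormalDist` from Python's standard `statistics` module (available from Python 3.8). The helper name `share_within` is an assumption for this sketch.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    z = NormalDist()   # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 1.96, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

Roughly 68% of values lie within 1 SD, 95% within 1.96 SD, and about 99.7% within 3 SD, which is why values beyond 3 SD are treated as potential outliers.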
To work out the reference range what do we do and what does it show?
The reference range is the range that the central 95% of values fall within.
We do mean - 1.96 x SD
And mean + 1.96 x SD
This shows in our sample that 95% of observed values fall between … and …
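A minimal sketch of the calculation, assuming a made-up sample with mean 70 and SD 10. The function name `reference_range` is illustrative.

```python
def reference_range(mean, sd, z=1.96):
    """Return (lower, upper) limits: mean - 1.96 x SD and mean + 1.96 x SD."""
    return mean - z * sd, mean + z * sd

# Hypothetical summary statistics, e.g. weight in kg.
low, high = reference_range(70, 10)
print(f"95% of observed values fall between {low:.1f} and {high:.1f}")
```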
When not to use the reference range?
If the data are not normally distributed or outliers are present, you may get a range that is impossible or factually incorrect.
There will not be 2.5% of values on either side.
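As an illustration, applying the same formula to skewed, made-up data (hospital stay in days, which cannot be negative) gives an impossible lower limit.

```python
import statistics

# Hypothetical, right-skewed data: most stays are short, one is very long.
stay_days = [1, 1, 1, 2, 2, 2, 3, 3, 4, 30]

mean = statistics.mean(stay_days)
sd = statistics.stdev(stay_days)
lower = mean - 1.96 * sd
upper = mean + 1.96 * sd

below = sum(v < lower for v in stay_days)   # values below the lower limit
print(f"reference range: {lower:.1f} to {upper:.1f} days")
print(f"values below the lower limit: {below}")
```

The lower limit is negative, which no stay can be, and no values fall below it rather than the expected 2.5%.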
How can you quantify the difference in numerical data between groups?
Look at the differences in means.
If not possible you can use the difference in medians.
What is Pearson's correlation coefficient?
What does 1, 0 and -1 show?
A statistic that measures linear correlation between two variables X and Y. It has a value between +1 and −1.
1 shows a perfect positive linear association
0 shows no linear correlation
-1 shows a perfect negative linear association
Positive values = as x increases y increases
Negative values = as x increases y decreases
Closer to 1 or -1 will show a stronger correlation
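A minimal sketch of how the coefficient is computed, written out by hand so each step is visible. The function name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])      # y rises in a straight line with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])    # y falls in a straight line with x
r_curve = pearson_r(x, [4, 1, 0, 1, 4])    # U-shaped: related, but not linearly

print(r_up, r_down, r_curve)
```

The U-shaped example gives 0 even though y clearly depends on x, because the coefficient only measures linear association.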