Critical numbers - statistics Flashcards

1
Q

What is the target population and sample population?

A

The target population is the whole group we want to find out about. We can’t collect info from everyone, so we take a subset of the whole population; this is known as the sample population.

2
Q

What is sampling bias?

What is recall bias?

Social desirability bias?

Information bias?

A

Sampling bias = some individuals are more or less likely to be included in the study than others, so the sample does not represent the target population

Recall bias = individuals cannot accurately remember past events or exposures

Social-desirability bias = individuals give incorrect information because they feel societal pressure

Information bias = measurement bias (the exposure or outcome is measured or recorded inaccurately)

3
Q

What is a background/confounding factor?

A

Something that is responsible for the outcome and related to the exposure.

E.g. screen use and poor vision: confounder = lack of natural light.

4
Q

Types of study design:

Experimental vs observational

Retrospective vs prospective

Individual vs population

A

Experimental = researcher changes something/ has intervened

Observational = researcher just collected data

Retrospective = look back to see if exposure caused outcome

Prospective = collect information to see if current exposure leads to outcome

Individual = info collected on an individual - usual study design

Population/ecological = whole populations looked at

5
Q

Types of study:

Case control

A

Take individuals with the outcome (cases) and matched individuals without it (controls), then look back to see who had the exposure.

Good for investigating rare disease

6
Q

Cross sectional study:

A

Look at what is happening now (snapshot of time)
Who currently has exposure and the outcome

Difficult to establish order of events

7
Q

Cohort study:

A

Collect information on a sample: some have the exposure, some do not, and no one has the outcome yet. Then follow up and see whether those with the exposure go on to develop the outcome more often.

Time consuming, expensive

8
Q

Randomised controlled trial:

A

Have multiple groups (also known as arms)
Give a different exposure to each group
Compare the outcomes between the groups

Steps to avoid bias:
Blinding - single and double
Randomisation - e.g. flipping a coin
Placebos
Matching - groups identical, the only difference is the exposure

Gold standard, but expensive and not suitable for every exposure

9
Q

Crossover trial:

A

Extension of an RCT where everyone in the study receives all the different exposures. Therefore you can compare each person's results with their own (each participant is their own control).
The order in which they receive the treatments/exposures is randomised

Not always suitable as there may be carry-over effects

10
Q

What is a variable?

A

A measurable characteristic that varies between individuals (it can be categoric or numeric).

11
Q

What is a categoric variable and what are the subtypes?

A

Categoric variables fit into a particular category.

Binary = 2 categories - yes or no

Ordinal = more than 2 categories with a natural ordering, e.g. low, medium and high

Nominal = more than 2 categories with no ordering, e.g. hair colour, ethnicity

12
Q

What is a numeric variable and what are the subtypes?

A

A variable that is measured on a numeric scale.

Discrete = can only take a distinct set of values, e.g. age in whole years

Continuous = can take any value within its limits, e.g. weight

13
Q

What is descriptive statistics?

A

Collection of statistical measures used to describe the data sample we have.

14
Q

Definitions of:

Proportion
Probability
Odds
Rate

A

Proportion = number with outcome/total number

Probability = the chance of the outcome occurring, between 0 and 1 (estimated by the proportion; x 100 gives a percentage)

Odds = number with outcome/number without

Rate = number of times something happens per specified quantity, e.g. x per 100 people
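
A minimal Python sketch of these definitions. The counts (20 people with the outcome out of 100) and the variable names are made up purely for illustration:

```python
# Hypothetical counts, chosen only for illustration
with_outcome = 20
total = 100
without_outcome = total - with_outcome

proportion = with_outcome / total        # between 0 and 1
percentage = proportion * 100            # the same figure as a percentage
odds = with_outcome / without_outcome    # with outcome compared to without

print(proportion, percentage, odds)
```

Note that the odds (20 to 80) are larger than the proportion (20 out of 100) because the denominator leaves out the people with the outcome.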

15
Q

What is the risk difference?

Risk ratio?

Odds ratio?

A

Risk difference = subtraction of one proportion from the other
“Risk with X is …% higher than with Y”

Risk ratio = Group A proportion (or percentage)/Group B proportion (or percentage)
The focus is on the top group compared with the bottom group
If greater than 1, the risk in group A is larger than in B; if 1, it is the same; if less than 1, it is smaller.
1.85 shows an increased risk of 85%
0.80 shows a decreased risk of 20%

Odds ratio = Group A odds/Group B odds
Odds increased or decreased by X
Remember an odds ratio of 2 is only a 100% increase in odds
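
As a rough worked example, the three measures could be computed like this in Python. The function name `compare_groups` and the counts (30 of 100 in group A, 20 of 100 in group B) are assumptions for illustration only:

```python
def compare_groups(events_a, n_a, events_b, n_b):
    """Return risk difference, risk ratio and odds ratio for group A vs group B."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return {
        "risk_difference": risk_a - risk_b,
        "risk_ratio": risk_a / risk_b,
        "odds_ratio": odds_a / odds_b,
    }

# Hypothetical data: 30/100 with the outcome in group A, 20/100 in group B
result = compare_groups(30, 100, 20, 100)
for name, value in result.items():
    print(name, round(value, 2))
```

With these made-up numbers the risk ratio is 1.5 (a 50% increase in risk), while the risk difference is only 10 percentage points.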

16
Q

Odds ratio and risk ratio can cause what?

A

Can cause unnecessary panic: a 200% increase may sound large, but the actual (absolute) risk could still be very small.
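
A small sketch of why this happens, assuming an invented baseline risk of 1 in 100,000 and a risk ratio of 3 (a "200% increase"):

```python
# Hypothetical figures, for illustration only
baseline_risk = 1 / 100_000   # 1 in 100,000 people affected
risk_ratio = 3                # reported as a "200% increase"

new_risk = baseline_risk * risk_ratio
extra_cases_per_100k = (new_risk - baseline_risk) * 100_000

print(round(extra_cases_per_100k, 6))   # extra cases per 100,000 people
```

Under these assumptions the headline figure of 200% corresponds to only 2 extra cases per 100,000 people.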

17
Q

What does standard deviation show?

A

Shows the spread of data about the mean

18
Q

Sigma is the symbol for

A

Standard deviation

19
Q

Variance =

A

SD squared
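
A quick check of this relationship using Python's built-in `statistics` module. The data are made up, and the sample versions (dividing by n - 1) are used here as an assumption; `pstdev` and `pvariance` are the population versions:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data with mean 5

sd = statistics.stdev(values)            # sample SD (divides by n - 1)
variance = statistics.variance(values)   # sample variance

print(round(sd, 3), round(variance, 3))
print(abs(sd ** 2 - variance) < 1e-9)    # True: variance is the SD squared
```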

20
Q

Mean =

A

Sum of numbers/total number

21
Q

Median =

A

Middle number of the ordered data set

If there are 2 middle numbers (an even number of values), take their average
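
A short illustration with made-up numbers, using Python's `statistics` module (the lists are hypothetical):

```python
import statistics

odd_set = [7, 1, 5, 3, 9]        # ordered: 1 3 5 7 9 -> middle value is 5
even_set = [7, 1, 5, 3, 9, 11]   # ordered: 1 3 5 7 9 11 -> middle two are 5 and 7

print(statistics.median(odd_set))    # single middle value
print(statistics.median(even_set))   # average of the two middle values
print(statistics.mean(odd_set))      # sum (25) divided by the number of values (5)
```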

22
Q

In a perfect symmetric distribution the mean and median are…

A

Equal

23
Q

When the distribution is not symmetric you are said to have a…

A

Skewed distribution
It can be right or left skewed depending on the position of the outliers (the long tail).

The outliers will skew it in that direction, e.g. a few very large values give a right skew

24
Q

What are the three measures of spread we have learnt about?

A

Range
SD
Interquartile range

25
Q

How to work out range?

Is it useful?

A

Largest minus smallest value

Not good for data with outliers

26
Q

IQR

What is it and is it good?

A

Represents the middle 50% of the data.

To calculate, order the values, then find the 25th percentile and 75th percentile. Leave them like that (25th to 75th) and/or subtract the 25th from the 75th.

Associated with the median and is better for data with outliers.
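
A sketch comparing the range and the IQR on made-up data, with and without one extreme value. The helper name `spread` is hypothetical, and `statistics.quantiles` uses one particular percentile rule, so other software may give slightly different quartiles:

```python
import statistics

def spread(values):
    """Return (range, IQR) for a list of numbers."""
    ordered = sorted(values)
    q1, _median, q3 = statistics.quantiles(ordered, n=4)  # 25th, 50th, 75th percentiles
    return ordered[-1] - ordered[0], q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 110]   # one extreme value

print(spread(clean))          # range 10, IQR 6
print(spread(with_outlier))   # range 109, IQR still 6
```

The single outlier changes the range from 10 to 109 but leaves the IQR untouched.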

27
Q

Standard deviation

What is it and is it good?

A

Spread of data about the mean

Like the range, it can be distorted by outliers, as it is calculated from the mean

However it is more powerful as it uses all the data. Therefore it should be used in statistics unless the data is skewed. If the data is skewed then IQR should be used.

28
Q

In a symmetric distribution what should be used to summarise data?

In a non symmetric distribution what should be used to summarise the data?

A

Symmetric = use mean and SD

Non-symmetric = use median and IQR, as they are not affected by outliers.

29
Q

What is a normal distribution?

A

Certain numerical values, e.g. weight, will follow a normal distribution when plotted.
This is because most people have values that are in the middle around the mean, with only a few extremes either side.

The curve is bell shaped and hence it is sometimes referred to as a bell curve.

The mean is in the middle and the larger the SD the flatter and wider the curve will be.

30
Q

Using a normal distribution mean and SD what can we work out?

A

We know that 1 SD either side of the mean = 68% of values

1.96 SD either side = 95% of values

3 SD = potential outlier if further from the mean than this point
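
These percentages can be checked with Python's `statistics.NormalDist`; the helper name `share_within` is just an illustrative choice:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 3):
    print(k, round(share_within(k) * 100, 1))   # roughly 68.3, 95.0 and 99.7
```

So only about 0.3% of values are expected to lie more than 3 SD from the mean, which is why such values are flagged as potential outliers.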

31
Q

To work out the reference range what do we do and what does it show?

A

The reference range is the range in which 95% of values lie.

We do mean - 1.96 x SD
And mean + 1.96 x SD

This shows that in our sample 95% of observed values fall between … and …
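
A minimal sketch of the calculation, assuming a made-up example of heights with mean 170 cm and SD 10 cm (the function name `reference_range` is hypothetical):

```python
def reference_range(mean, sd):
    """Return (lower, upper) limits expected to hold 95% of normally distributed values."""
    return mean - 1.96 * sd, mean + 1.96 * sd

# Hypothetical example: mean height 170 cm, SD 10 cm
lower, upper = reference_range(170, 10)
print(round(lower, 1), round(upper, 1))   # 150.4 189.6
```

This would be reported as "95% of observed heights fall between 150.4 cm and 189.6 cm".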

32
Q

When not to use the reference range?

A

If the data are not normally distributed or there are outliers present, you may get a range that is not possible or is factually incorrect (e.g. a negative lower limit for a measurement that cannot be negative).

There will not be 2.5% of values on either side.

33
Q

To quantify the difference of numerical data you can?

A

Look at the differences in means.

If not possible you can use the difference in medians.

34
Q

What is Pearson’s correlation coefficient?

What does 1, 0 and -1 show?

A

A statistic that measures linear correlation between two variables X and Y. It has a value between +1 and −1.

1 shows a perfect positive linear association
0 shows no linear correlation
-1 shows a perfect negative linear association

1 = as x increases y increases
-1 = as x increases y decreases

Closer to 1 or -1 will show a stronger correlation
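
A sketch of how the coefficient could be computed by hand in Python. The function name `pearson_r` and all the data are made up for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfect positive line
print(pearson_r(x, [10, 8, 6, 4, 2]))   # perfect negative line
print(pearson_r(x, [3, 1, 4, 1, 3]))    # no linear trend in this made-up data
```

The three calls give values of 1, -1 and approximately 0 respectively.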

35
Q

What should a graph have?

6 points for a good graph

A
  1. The amount of information should be maximised for the minimum amount of ink
  2. Figures should have a title explaining what is being displayed
  3. Axes should be clearly labelled
  4. Gridlines should be kept to a minimum
  5. Avoid 3-D charts as these can be difficult to read
  6. The number of observations should be included
36
Q

What types of graph should categorical data be displayed as?

A

Bar chart - frequency vs categories - better

Pie chart - always 2d

37
Q

What types of graph should numerical data be displayed as?

And give an overview of each type…

A

Dot plots - dots on a continuous scale

Histograms - frequency density vs continuous non overlapping categories
Can see distribution from this graph.

Box plots - min, LQ, median, UQ, max; also any outliers (more than 1.5 box lengths beyond the box)

Scatter plot - used to see association between 2 variables - correlation
Dependent variable on the y axis, independent variable on the x axis

38
Q

What is the symbol for the sample mean and what is the symbol for the population mean?

A

Sample mean = x̄ (x-bar)

Population mean = μ (mu)

39
Q

Is the sample mean the best guess of our target mean?

A

Yes, whenever we do a study our sample mean is our best guess of the target population mean.

40
Q

What is standard error?

A

An estimate of precision of our mean (in this case)

= SD/square root of n for mean

This makes sense: the smaller the SD, the more similar the values in the sample are, and the larger the sample size, the better it represents the population. Both lead to a smaller SE.
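
A small sketch of the formula, assuming a made-up SD of 10 and three hypothetical sample sizes:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Hypothetical SD of 10, with increasing sample sizes
for n in (25, 100, 400):
    print(n, standard_error(10, n))   # 2.0, then 1.0, then 0.5
```

Because of the square root, the sample size has to be quadrupled to halve the SE.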

41
Q

What is the best way to reduce SE?

A

Increase n

42
Q

How do you use SE?

2 ways?

A

Can compare the SE of two groups: the smaller the SE, the more precise the mean of that group.

Also used in confidence intervals!

43
Q

What is a confidence interval?

How is it calculated?

A

A 95% confidence interval is the range of values within which we are 95% confident the true (population) mean lies.

It is calculated by mean + or - 1.96*SE

Sum it up by saying: my estimate is my best guess of the mean, and I am 95% confident that the true mean is between the two limits.
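
A sketch of the calculation in Python on an invented sample of 25 measurements. The function name `mean_ci_95` is hypothetical, and the fixed 1.96 multiplier follows the formula above (for small samples, software would normally use a slightly larger multiplier from the t distribution):

```python
import math
import statistics

def mean_ci_95(values):
    """Return (mean, lower, upper) using mean +/- 1.96 * SE."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - 1.96 * se, mean + 1.96 * se

# Made-up sample of 25 measurements
sample = [48, 52, 50, 47, 53, 51, 49, 50, 54, 46,
          50, 52, 48, 51, 49, 50, 53, 47, 50, 52,
          49, 51, 50, 48, 50]
mean, lower, upper = mean_ci_95(sample)
print(round(mean, 2), round(lower, 2), round(upper, 2))
```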

44
Q

Would a 99% confidence interval have a narrower or wider CI?

A

Wider as more confident it contains the true mean
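
To see why, the multiplier on the SE can be looked up from the normal distribution. This is a sketch using Python's `statistics.NormalDist`; the helper name `z_multiplier` is an assumption:

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Number of SEs either side of the mean for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

print(round(z_multiplier(0.95), 2))   # 1.96
print(round(z_multiplier(0.99), 2))   # 2.58
```

A 99% CI uses mean + or - 2.58*SE rather than 1.96*SE, so it is wider for the same data.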

45
Q

When can CI not be used?

A

If the data are not normally distributed we can’t use the SE and therefore we can’t use the CI

The sample size also has to be greater than 20

46
Q

What can CI be used for?

A

Means

Difference between 2 means (H0 = no difference)

Relative risk

Etc

If one estimate is not statistically significant but another is, the difference between them may still be statistically significant

47
Q

What does a reference range show?

A

It shows the range within which 95% of the data lie and is calculated by mean + or - 1.96*SD

48
Q

What is the point of a significance test?

A

To see if any observed difference is statistically significant, i.e. unlikely to be due to chance alone.

49
Q

What is the null hypothesis?

A

There is no difference between x and y…

E.g. there is no difference in IQ between students at UoS and SHU

Start off by assuming this hypothesis is true.

50
Q

What can probability range between?

What is the percentage probability of 0.01?

A

Can range between 0-1

= 1% probability of occurring

51
Q

On a normal distribution the further away from the mean the smaller the area under the curve and smaller the p value…

A

52
Q

A value observed near the null hypothesis will have a p value of…

Far away from the null hypothesis will have a p value of…

A

Near 1 (the data are consistent with the null hypothesis - no evidence to reject it, any deviation could be due to chance)

Close to 0 (there is evidence to reject the null hypothesis - statistically significant difference)

53
Q

The lower the p value, the more … the results are

A

Statistically significant

54
Q

Is SE related to the p value?

A

Yes - for the same observed difference, the smaller the SE, the smaller the p value

55
Q

What does a p value of 0.001 mean?

A

If the null hypothesis were true, we would see a difference at least this large by chance only 1 time in 1000

56
Q

If the null hypothesis value (no difference value) is not in the 95% CI then…

If the null value is in the 95% CI then your p value will most likely be

A

Then your p value will be less than 0.05

Greater than 0.05
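
A sketch of this link between the 95% CI and the p value. It assumes a normal (z) approximation for the p value, and the estimates, SEs and function names are all made up; a real t test would use the t distribution instead:

```python
from statistics import NormalDist

def two_sided_p(estimate, null_value, se):
    """Two-sided p value from a normal approximation (z test)."""
    z = abs(estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(z))

def ci_95(estimate, se):
    """95% confidence interval as estimate +/- 1.96 * SE."""
    return estimate - 1.96 * se, estimate + 1.96 * se

# Hypothetical difference in means of 5 with SE 2; the null value is 0
print(ci_95(5, 2), two_sided_p(5, 0, 2))   # CI excludes 0, p below 0.05
# Same difference but a larger SE of 4
print(ci_95(5, 4), two_sided_p(5, 0, 4))   # CI includes 0, p above 0.05
```

The same example also shows that, for a fixed difference, a smaller SE gives a smaller p value.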

57
Q

Testing a mean against a hypothesised value - this test is called?

A

One-sample t test

58
Q

Are all statistically significant findings clinically relevant?

A

No - a difference can be statistically significant but too small to matter clinically.

59
Q

What is a p value?

A

The probability of seeing a difference at least as large as the one you observed if the null hypothesis were true

60
Q

What does a correlation of 1 show?

A

It shows a perfect positive linear association between two continuous variables (as one increases so does the other)
It is a measure of strength

However it does not take into account the gradient of the line.

A shallow line could have the same r value as a steep line.

61
Q

What is regression?

What is the meaning of the line?

A

It is an extension of correlation that fits a line to the data, which can be used to make predictions

It takes the general formula y = mx + c

Y = outcome - dependent variable
X = predictor - independent variable

C or a = y intercept, the value of y when x = 0
M or b = gradient/coefficient

62
Q

How is a regression line calculated?

A

Use our software to fit a regression line to the data.
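
Statistical software typically fits the line by ordinary least squares. As a rough sketch of what it might compute (the function name `fit_line` and the data are assumptions, not the course software):

```python
def fit_line(xs, ys):
    """Least-squares estimates of gradient m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - m * mean_x
    return m, c

# Made-up data lying exactly on y = 2x + 1
m, c = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(m, c)         # gradient 2.0, intercept 1.0
print(m * 10 + c)   # predicted y when x = 10: 21.0
```

Real data will not lie exactly on a line, so software also reports the SE, p value and confidence interval for the gradient.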

63
Q

What is the most interesting value of the regression line?

What can you do with it?

A

M, also known as b (the gradient)

For every 1 you go across (1 unit increase in x), y goes up or down by m

You can do inferential statistics
SE, p value, confidence intervals
And hence see if this value is significant, and so whether the relationship between the 2 variables is significant.

H0 = the gradient would be 0

64
Q

What is linear regression?

A

A regression model where the outcome is a single continuous variable.

65
Q

What is a multiple regression?

Why is it beneficial?

A

It is a regression model with lots of predictor variables.

One Y, but lots of Xs (predictors)

As you are using lots of independent variables, it accounts for confounding factors, so you can see the relationship between the main x and y more clearly.

The p value and CI for the main x are adjusted for the other x variables. If these have a large effect on y, they explain some of the random noise and the p value for the main x will drop.

66
Q

What might an author say when using a regression model with multiple predictors?

A

After accounting for other variables in the model…

67
Q

What are categoric variables and are they used in regression?

A

They are variables whose values are categories, and they can be used in regression models. The important thing to remember is that they always have a reference category.

Each coefficient is a comparison with the reference category: a positive coefficient means a positive association and a negative coefficient a negative association. Again, they can have p values.

68
Q

As we include more factors that have a relationship with y, what happens to the p value for our x?

A

It will decrease.

69
Q

What does prevalence mean?

A

Proportion of a population with a disease at a point in time

= number of cases at a point of time/ total population

70
Q

What does incidence mean?

And equation?

A

Rate at which new cases occur in a population in a certain time period.

= number of new cases in the time period/population at risk
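
A worked sketch contrasting prevalence with incidence, using an invented town of 10,000 people (all figures and names are hypothetical):

```python
# Hypothetical town, numbers invented for illustration
population = 10_000
existing_cases = 500        # people with the disease today
new_cases_this_year = 95    # first diagnosed during the year

prevalence = existing_cases / population
population_at_risk = population - existing_cases   # people who could still become a case
incidence = new_cases_this_year / population_at_risk

print(f"Prevalence: {prevalence:.1%}")
print(f"Incidence: {incidence * 1000:.0f} per 1000 at risk per year")
```

Here prevalence counts everyone who currently has the disease, while incidence counts only new cases among those who did not already have it.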

71
Q

What is an ecological study?

A

An ecological study is an observational study defined by the level at which data are analysed, namely at the population or group level, rather than individual level.

Looking at rates of smoking in a country and then rates of lung cancer

72
Q

What are the advantages of an ecological study?

A

• Uses routinely collected data - quick, cheap
• Units of analysis are populations - groups of people
• Can examine patterns of ill-health by age, sex, ethnicity, country and/or by time
• Few ethical issues
• Useful for generating hypotheses

73
Q

What are the disadvantages of using an ecological study?

A

• No link between individual exposure and effect
• Bias - variation in diagnostic criteria
• Absence of records of individual attributes
• Unsuitable format of records
• Inconsistency in data presentation

74
Q

Advantages of using a cross-sectional study?

A

• Results used to generate hypotheses
• Rapid feedback of current events in the community
• Quick and cheap
• Few ethical problems

75
Q

Disadvantages of using a cross-sectional study?

A

• Could just be reporting a medical oddity
• Prone to bias, e.g. sampling, subject and observer variation
• No time reference

76
Q

Advantages of a case control study?

A
  • By concentrating effort on the identification of affected individuals and recruiting controls from the unaffected population, the number of subjects required to obtain significant results is kept to a minimum (so good for rare diseases)
  • Results can be obtained relatively quickly because the investigation does not have to wait for the disease to develop (compare this with Cohort studies – see later) and can look for multiple causes
  • It is a relatively inexpensive type of study
77
Q

Disadvantages of a case control study?

A

• Generally rely on retrospective data, which has its own dangers. The ability of individuals to recall past events tends to be unreliable due to a tendency for memory to be selective. Records of past events may be incomplete.
• Because data are collected retrospectively, it is difficult to say if an association is causal or not. This is less of a problem when the exposure is highly specific or where the time between exposure and disease is short
• Prone to selection and information biases
• There can be difficulties choosing controls
• The incidence of disease within a population cannot be calculated from this type of study

78
Q

Advantages of a cohort study?

A
  • The main advantage is that it is possible to distinguish antecedent causes from concurrent associated factors (cause comes before effect)
  • Since incidence can be determined for both exposed and non-exposed groups, we can determine absolute, relative and attributable risks
  • We can study more than one outcome to the same exposure
  • There is less chance of bias since exposure is measured before development of disease
79
Q

What are the disadvantages of a cohort study?

A

• Cannot be certain that exposures are causal - this requires controlled studies
• Long periods of study and large populations mean that cohort studies are expensive
• Follow-up can be a problem, especially if the period of study is long - this needs to be considered in the design of the study
• Diagnosis of cases may change over the years as medical science becomes more advanced - better at detecting the disease or with different criteria for a diagnosis

80
Q

What are the advantages of a RCT?

A

• Randomization should mean that confounding factors (age, sex etc.) are equally distributed. This helps to concentrate the study on the effect of the intervention
• By randomly allocating patients to interventions, it is likely that staff and patients will not break the blinding
• Statistical tests for significance are easier to interpret when the study design removes confounders
• Confounders and many biases minimised

81
Q

What are the disadvantages of a RCT?

A
  • To allow sufficient numbers to balance confounders these tend to be large and expensive trials. They are often multicentre and may even be multinational
  • There is always a chance that volunteer bias will be a problem: what about people who refuse to be included in the trial or those who are never asked?
  • There may be ethical difficulties in withholding treatment from the control group or offering what is believed to be an inferior treatment to one group
  • May lose statistical power if poor compliance
82
Q

There are specific questions you can ask to help critically appraise an article - see lecture for more details.

Also Axis and CASP - will be focussed upon

A

..

83
Q

What is AXIS?

A

20 questions that assess different key aspects of an article (AXIS = Appraisal tool for Cross-Sectional Studies).

84
Q

How should you display the data collection methods?

A

Via a flow diagram:

Number approached
Who left and why
Who was analysed

85
Q

What is a parametric test, what is a non-parametric test and when are they used?

A

Parametric test = a test that relies on particular assumptions about the data (e.g. that they are normally distributed). If these are not met then a non-parametric alternative should be used.

However, parametric tests use all the data, non parametric tests only use the ranks and are therefore less powerful.

86
Q

Why is it important to have critical appraisal skills as a doctor?

A

Patients may read an article, worry and ask questions

You may need to read the article, appraise it and decide whether it is relevant to their concern