Critical numbers - statistics Flashcards

Question

How to work out range? | Is it useful?

Answer 1

Largest value minus smallest value. Not good for data with outliers.

Answer 2

Represents the middle 50% of the data. To calculate, order the values, then find the 25th percentile and the 75th percentile; either report them as they are or subtract the 25th from the 75th. Associated with the median and is better for data with outliers.
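A minimal Python sketch of the range and IQR calculations from the two cards above. The data values are made up for illustration, and the "inclusive" quartile method is only one of several conventions, so other software may give slightly different quartiles.

```python
import statistics

# Hypothetical sample; 40 is a deliberate outlier.
values = [2, 4, 4, 5, 7, 9, 10, 12, 40]

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# Quartiles: statistics.quantiles sorts the data itself.
# method="inclusive" is an assumed convention, not the only one.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# IQR: 75th percentile minus 25th percentile.
iqr = q3 - q1

print(data_range, q1, median, q3, iqr)
```

The single outlier drives the range but leaves the IQR unchanged, which is why the IQR is preferred for data with outliers.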

Answer 3

Spread of the data about the mean. Again, it can be skewed by outliers as it takes the mean into account. However, it is more powerful as it uses all the data, so it should be used in statistics unless the data is skewed. If the data is skewed then the IQR should be used.
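A short Python sketch of how one outlier affects the SD compared with the median. The numbers are hypothetical and chosen only to make the effect visible.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical, roughly symmetric data
with_outlier = values + [60]        # same data plus one extreme value

mean = statistics.mean(values)
sd = statistics.stdev(values)                 # sample SD (divides by n - 1)
sd_outlier = statistics.stdev(with_outlier)   # SD after adding the outlier

# The median barely moves when the outlier is added.
median_shift = statistics.median(with_outlier) - statistics.median(values)

print(mean, round(sd, 2), round(sd_outlier, 2), median_shift)
```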

Answer 4

Symmetric = use mean and SD. Non-symmetric = use median and IQR, as they are not affected by outliers.

Answer 5

Certain numerical values, e.g. weight, when plotted will follow a normal distribution. This is because most people have values that are in the middle around the mean, with only a few extremes either side. The shape of the curve is bell shaped and hence it is sometimes referred to as a bell curve. The mean is in the middle, and the larger the SD the flatter and wider the curve will be.

Answer 6

We know that 1 SD either side of the mean = 68% of values, 1.96 SD = 95% of values, and 3 SD = potential outlier if further than this point.
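These proportions can be checked with a small Python sketch using the standard normal distribution. The helper name `coverage` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 3):
    print(k, round(coverage(k), 4))
```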

Answer 7

The reference range covers 95% of the population. We calculate mean - 1.96 x SD and mean + 1.96 x SD. This shows that, in our sample, 95% of observed values fall between ... and ...
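A minimal Python sketch of the reference range calculation, assuming the data are roughly normal. The function name and the weights are hypothetical.

```python
import statistics

def reference_range(values, z=1.96):
    """Return (lower, upper) as mean -/+ z * SD; only sensible for normal data."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD
    return mean - z * sd, mean + z * sd

# Hypothetical measurements in kg.
weights_kg = [62, 65, 68, 70, 70, 72, 75, 78]
lower, upper = reference_range(weights_kg)
print(round(lower, 1), round(upper, 1))
```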

Answer 8

If the data are not normally distributed and there are outliers present, you may get a range that is not possible or is factually incorrect. There will not be 2.5% of values on either side.

Answer 9

Look at the differences in means. If not possible you can use the difference in medians.

Answer 10

A statistic that measures linear correlation between two variables X and Y. It has a value between +1 and −1. +1 shows a perfect positive linear association (as x increases, y increases), 0 shows no linear correlation, and −1 shows a perfect negative linear association (as x increases, y decreases). The closer to +1 or −1, the stronger the correlation.
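A Python sketch of the correlation coefficient written out from its usual definition (Pearson's r). The function name and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])     # y rises with x
r_down = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls as x rises
r_mixed = pearson_r(xs, [2, 1, 4, 3, 5])   # positive but not perfect
print(r_up, r_down, r_mixed)
```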

Answer 11

1. The amount of information should be maximised for the minimum amount of ink 2. Figures should have a title explaining what is being displayed 3. Axes should be clearly labelled 4. Gridlines should be kept to a minimum 5. Avoid 3-D charts as these can be difficult to read 6. The number of observations should be included

Answer 12

Bar chart - frequency vs categories - better | Pie chart - always 2d

Answer 13

Dot plots - dots on a continuous scale. Histograms - frequency density vs continuous non-overlapping categories; can see the distribution from this graph. Box plots - min, LQ, median, UQ, max, plus any outliers more than 1.5 box lengths away from the box. Scatter plot - used to see the association (correlation) between 2 variables; dependent variable on y, independent on x.
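The box plot outlier rule can be sketched in Python as below. This assumes the common convention that a point is flagged when it lies more than 1.5 box lengths (IQRs) beyond the lower or upper quartile; the function name and the data are made up.

```python
import statistics

def boxplot_outliers(values):
    """Return values more than 1.5 box lengths outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1                   # the IQR
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    return [v for v in values if v < low_fence or v > high_fence]

print(boxplot_outliers([2, 4, 4, 5, 7, 9, 10, 12, 40]))
```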

Answer 14

Yes. Whenever we do a study, our sample mean is our best guess of the mean of our target population.

Answer 15

An estimate of the precision of our mean (in this case). SE = SD / square root of n for a mean. This makes sense: the smaller our SD, the more similar the values in the sample are, and the larger the sample size, the better the representation; hence both lead to a smaller SE.

Answer 16

Increase n
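A minimal Python sketch of SE = SD / square root of n, showing that a larger n gives a smaller SE for the same SD. The SD and sample sizes are hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of a mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

se_small_sample = standard_error(sd=10, n=25)
se_large_sample = standard_error(sd=10, n=100)
print(se_small_sample, se_large_sample)
```

Quadrupling the sample size halves the SE, because n sits under a square root.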

Answer 17

Can compare the SE of two groups: the smaller the SE, the more precise the mean of that group. Also used in confidence intervals!

Answer 18

A 95% confidence interval is the range of values we are 95% confident the true mean lies between. It is calculated by mean + or - 1.96*SE. Sum it up by saying: my estimate is my best guess of the mean, and I am 95% confident that the true mean is between the two limits.

Answer 19

Wider as more confident it contains the true mean
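A Python sketch of the confidence interval calculation, using the normal multiplier (about 1.96 for 95%). The function name and the summary statistics are assumptions made for the example.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper) as mean -/+ z * SE, with z taken from the normal curve."""
    se = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # roughly 1.96 when level is 0.95
    return mean - z * se, mean + z * se

# Hypothetical sample: mean 100, SD 15, n 36.
ci95 = confidence_interval(mean=100, sd=15, n=36)
ci99 = confidence_interval(mean=100, sd=15, n=36, level=0.99)
print(ci95, ci99)
```

The 99% interval comes out wider than the 95% one, since a larger multiplier is needed to be more confident of containing the true mean.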

Answer 20

If the data is not normally distributed we can't use the SE, and therefore we can't use this CI. The sample size also has to be greater than 20.

Answer 21

Means. Difference between 2 means (Ho = no difference). Relative risk. Etc. If one doesn’t show statistically significant results but another does, then the difference may still be statistically significant.

Answer 22

It shows the range within which 95% of the data lies and is calculated by mean + or - 1.96*SD.

Answer 23

To see if any observed difference is important/ significant.

Answer 24

There is no difference between x and y, e.g. there is no difference in IQ between students at UoS or SHU. Start off by believing this hypothesis.

Answer 25

Can range between 0 and 1, e.g. 0.01 = 1% probability of occurring.

Answer 26

Near 1 (the null hypothesis appears to be true - evidence to accept, any deviation due to chance) Close to 0 (there is evidence to reject the null hypothesis - statistically significant difference)

Answer 27

Statistically significant

Answer 28

Yes the smaller the SE, the smaller the P

Answer 29

We would have seen this much of a difference by chance 1 in 1000 times if the null was true.

Answer 30

Then your p value will be less than 0.05 | Greater than 0.05

Answer 31

One-sample t test

Answer 32

The probability of seeing the difference you have if the null hypothesis was true
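A rough Python sketch of how a two-sided p value follows from an estimate and its SE. It uses a normal (z) approximation rather than the t test named in the earlier card, which is an assumption made to keep the example short; the numbers are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(estimate, null_value, se):
    """Two-sided p value from a z statistic (normal approximation)."""
    z = (estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed difference (103 vs a null of 100), two different SEs.
p_precise = two_sided_p(103, 100, se=1.5)   # z = 2
p_noisy = two_sided_p(103, 100, se=3.0)     # z = 1
print(round(p_precise, 4), round(p_noisy, 4))
```

For the same difference, the smaller SE gives the smaller p value.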

Answer 33

It shows the strength of the linear association between two continuous variables (as one increases, so does the other). It is a measure of strength. However, it does not take into account the gradient of the line: a flat line could have the same r value as a steep line.

Answer 34

It is an advanced correlation which can be used to make future predictions. It takes the general formula y = mx + c. Y = outcome (dependent variable). X = predictor (independent variable). C or a = y intercept, when x = 0. M or b = gradient/coefficient.
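A minimal Python sketch of fitting y = mx + c by ordinary least squares and using the line to predict. Least squares is the usual fitting method but is an assumption here, as the card does not name one; the function name and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of gradient m and intercept c for y = mx + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # change in y for every 1 unit increase in x
    c = mean_y - m * mean_x    # value of y when x = 0
    return m, c

xs = [1, 2, 3, 4, 5]    # predictor (independent variable)
ys = [3, 5, 7, 9, 11]   # outcome (dependent variable)
m, c = fit_line(xs, ys)
predicted = m * 6 + c   # prediction for a new x value of 6
print(m, c, predicted)
```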

Answer 35

Use our software to fit a regression line to the data.

Answer 36

M, also known as b (the gradient). For every 1 you go across, you go up or down by m. You can do inferential statistics on it (SE, p value, confidence intervals) and hence see if this value is significant, and so whether the relationship between the 2 variables is significant. Ho would be m = 0.

Answer 37

A regression model where the outcome is a single continuous variable.

Answer 38

It is a regression line but with lots of continuous variables: one Y, but lots of Xs (predictors). As you are using lots of independent variables, it accounts for confounding factors, so you can see the relationship between the main x and y more clearly. The p value and CI will be adjusted for the new x values; if they have a large effect, the significance of your p value will drop (randomise noise).

Answer 39

After accounting for other variables in the model...

Answer 40

They are categorical data and can be used in regression models. The important thing to remember is that they always have a reference category. Each coefficient is in comparison to the reference value: a positive coefficient means a positive association and a negative coefficient a negative association. Again, they can have p values.

Answer 41

It will decrease.

Answer 42

Proportion of a population with a disease at a point in time = number of cases at a point of time/ total population

Answer 43

Rate at which new cases occur in a population in a certain time period. = number of new cases/ population at risk
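The prevalence and incidence formulas from the two cards above, as a small Python sketch. The function names and the counts are hypothetical.

```python
def prevalence(existing_cases, total_population):
    """Proportion of the population with the disease at a point in time."""
    return existing_cases / total_population

def incidence(new_cases, population_at_risk):
    """New cases over a time period divided by the population at risk."""
    return new_cases / population_at_risk

# Hypothetical town of 1000 people, 50 of whom already have the disease.
point_prevalence = prevalence(50, 1000)

# People who already have the disease are no longer at risk of a new case.
at_risk = 1000 - 50
yearly_incidence = incidence(19, at_risk)

print(point_prevalence, yearly_incidence)
```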

Answer 44

An ecological study is an observational study defined by the level at which data are analysed, namely at the population or group level, rather than individual level. E.g. looking at rates of smoking in a country and then rates of lung cancer.

Answer 45

Uses routinely collected data - Quick, cheap • Units of analysis are populations - groups of people • Can examine patterns of ill-health by age, sex, ethnicity, country and/or by time • Few ethical issues • Useful for generating hypotheses

Answer 46

No link between individual exposure and effect • Bias - variation in diagnostic criteria • Absence of records of individual attributes • Unsuitable format of records • Inconsistency in data presentation

Answer 47

Results used to generate hypotheses • Rapid feedback of current events in the community • Quick and cheap • Few ethical problems

Answer 48

Could just be reporting a medical oddity | • Prone to bias, e.g. sampling, subject and observer variation • No time reference

Answer 49

* By concentrating effort on the identification of affected individuals and recruiting controls from the unaffected population, the number of subjects required to obtain significant results is kept to a minimum (so good for rare diseases) * Results can be obtained relatively quickly because the investigation does not have to wait for the disease to develop (compare this with Cohort studies – see later) and can look for multiple causes * It is a relatively inexpensive type of study

Answer 50

Generally rely on retrospective data, which has its own dangers. The ability of individuals to recall past events tends to be unreliable due to a tendency for memory to be selective. Records of past events may be incomplete. • Because data are collected retrospectively, it is difficult to say if an association is causal or not. This is less of a problem when the exposure is highly specific or where the time between exposure and disease is short • Prone to selection and information biases • There can be difficulties choosing controls • The incidence of disease within a population cannot be calculated from this type of study

Answer 51

* The main advantage is that it is possible to distinguish antecedent causes from concurrent associated factors (cause comes before effect) * Since incidence can be determined for both exposed and non- exposed groups, we can determine absolute, relative and attributable risks * We can study more than one outcome to the same exposure * There is less chance of bias since exposure is measured before development of disease
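A Python sketch of the absolute, relative and attributable risks mentioned above, assuming attributable risk means the risk difference between the exposed and non-exposed groups. The function name and the cohort counts are made up.

```python
def cohort_risks(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risks from a cohort study with an exposed and a non-exposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return {
        "absolute_risk_exposed": risk_exposed,
        "absolute_risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        # Assumed definition: risk difference between the two groups.
        "attributable_risk": risk_exposed - risk_unexposed,
    }

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 non-exposed develop disease.
risks = cohort_risks(30, 100, 10, 100)
print(risks)
```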

Answer 52

• Cannot be certain that exposures are causal- this requires controlled studies • Long periods of study, and large populations mean that cohort studies are expensive • Follow-up can be a problem- especially if the period of study is long- this needs to be considered in the design of the study • Diagnosis of cases may change over the years as medical science becomes more advanced- better at detecting the disease or with different criteria for a diagnosis

Answer 53

Randomization should mean that confounding factors (age, sex etc.) are equally distributed. This helps to concentrate the study on the effect of the intervention • By randomly allocating patients to interventions, it is likely that staff and patients will not break the blinding • Statistical tests for significance are easier to interpret when the study design removes confounders • Confounders and many biases minimised

Answer 54

* To allow sufficient numbers to balance confounders these tend to be large and expensive trials. They are often multicentre and may even be multinational * There is always a chance that volunteer bias will be a problem: what about people that refuse to be included in the trial or those that are never asked. * There may be ethical difficulties in withholding treatment from the control group or offering what is believed to be an inferior treatment to one group * May lose statistical power if poor compliance

Answer 55

20 questions that assess different key aspects of an article.

Answer 56

Via a flow diagram: Number approached Who left and why Who was analysed

Answer 57

Parametric test = a test that follows particular assumptions; if these are not met then a non-parametric alternative should be used. However, parametric tests use all the data, whereas non-parametric tests only use the ranks and are therefore less powerful.

Answer 58

Patients may read an article, worry and ask questions. You may need to read the article, appraise it and see if it is relevant to their concern.

(86 cards)