Flashcards in stats Deck (91):

1

## median

###
50th percentile

quantifies average

50% of data above median, 50% below
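
A minimal Python sketch of the 50/50 split, using only the standard library. The `sample` values and the `median_split` helper are made up for illustration and are not part of the deck:

```python
import statistics

def median_split(values):
    """Return the median plus the counts of values strictly below and above it."""
    m = statistics.median(values)
    below = sum(1 for v in values if v < m)
    above = sum(1 for v in values if v > m)
    return m, below, above

sample = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical data, even number of values
m, below, above = median_split(sample)
print(m, below, above)
```

With an even number of values, `statistics.median` averages the two middle values, so half of the sample falls on each side.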

2

## when is data symmetrical with respect to the median

### when median is equidistant from upper and lower quartile boundaries

3

## when is negative skew seen with respect to the median

### when median is closer to upper quartile
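
The quartile checks on the symmetry and negative-skew cards can be sketched as below. This is an illustration under assumptions: `quartile_shape` is a hypothetical helper, the "inclusive" quartile method is one of several conventions, and the "positive skew" branch is simply the mirror image of the negative-skew rule rather than something the deck states:

```python
import math
import statistics

def quartile_shape(values, tol=1e-9):
    """Classify shape from where the median sits between the quartiles."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower_gap = med - q1  # distance from lower quartile to median
    upper_gap = q3 - med  # distance from median to upper quartile
    if math.isclose(lower_gap, upper_gap, abs_tol=tol):
        return "symmetric"
    # Median closer to the upper quartile suggests negative skew
    return "negative skew" if upper_gap < lower_gap else "positive skew"

print(quartile_shape([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(quartile_shape([1, 5, 8, 9, 10, 10, 11]))
```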

4

## how do you check symmetry of variables

###
box and whisker plot

histogram

5

## difference of 99% CI compared to 95% CI

### 99% CI would be a wider range than 95% CI and extend it at both extremes
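
A rough sketch of why the 99% CI is wider, assuming a Normal approximation (estimate plus or minus z times the standard error). The estimate of 120 and standard error of 2 are invented numbers, and `normal_ci` is a hypothetical helper:

```python
from statistics import NormalDist

def normal_ci(estimate, standard_error, level=0.95):
    """Two-sided CI under a Normal approximation: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * standard_error, estimate + z * standard_error

ci95 = normal_ci(120.0, 2.0, level=0.95)
ci99 = normal_ci(120.0, 2.0, level=0.99)
print(ci95, ci99)
```

The 99% level needs a larger z multiplier, so the interval extends further at both ends.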

6

## if p>0.05

###
no evidence

there may truly be no difference in the mean of the variables

the sample may be too small to detect a difference

7

## smaller standard error means

### the estimate of the mean is more precise

8

## 2-tailed test

### difference in sample means in either direction provides evidence against null hypothesis

9

## when is mann whitney test used

###
if the outcome variable is ordinal or discrete

if the assumptions of a parametric test (e.g. Normality) are not met

10

## a parametric test makes strong assumptions about...

### distribution of data

11

## what does wilcoxon signed-rank test compare

###
distribution between first and second measurement

assesses whether population mean ranks differ

12

## when is wilcoxon signed-rank test used

###
matched/paired data

when assumptions of paired t-test do not fit

13

## what does standard error indicate

### indicates how far the study estimate would be from the true value in the population if you were to repeat the study multiple times with different samples
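
For the mean, the standard error is usually estimated as SD / sqrt(n). The sketch below assumes that formula (the deck does not give it) and uses made-up measurements; `standard_error_of_mean` is a hypothetical helper:

```python
import math
import statistics

def standard_error_of_mean(values):
    """Estimated standard error of the sample mean: sample SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
se = standard_error_of_mean(sample)
print(round(se, 4))
```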

14

## p-value if CI excludes the null hypothesis value

###
p<0.05

there is some evidence

15

## define odds

### how common a binary characteristic is within a single group: the probability that it occurs divided by the probability that it does not

16

## odds ratio

###
measure of association between exposure and outcome

odds of one group compared to another
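
A small sketch of odds and the odds ratio from counts. The counts are invented and the helper names `odds` and `odds_ratio` are illustrative only:

```python
def odds(with_characteristic, without_characteristic):
    """Odds within one group: count with the characteristic / count without."""
    return with_characteristic / without_characteristic

def odds_ratio(group, reference):
    """Odds in one group divided by odds in the reference group.

    Each argument is a (with_outcome, without_outcome) pair of counts.
    """
    return odds(*group) / odds(*reference)

exposed = (30, 70)    # hypothetical: 30 with the outcome, 70 without
unexposed = (10, 90)  # hypothetical reference group
or_value = odds_ratio(exposed, unexposed)
print(round(or_value, 2))
```

Comparing the reference group with itself gives an odds ratio of 1, which is why the reference category is fixed at 1.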

17

## reference category

###
odds of ref category = 1

used to compare odds

18

## pearson's correlation coefficient

###
r

quantifies the strength of linear association between two variables

19

## assumptions for pearson's correlation

### linear relationship between variables

20

## what does r squared (pearson's) refer to

### the proportion of variation in one variable explained by the other variable
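
One way to compute r and r squared by hand, shown here as a sketch with invented height and weight values (`pearson_r` is a hypothetical helper, written out rather than taken from a library):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights = [150, 160, 170, 180, 190]  # hypothetical data
weights = [52, 58, 69, 77, 84]       # hypothetical data
r = pearson_r(heights, weights)
r_squared = r ** 2  # proportion of variation in one variable explained by the other
print(round(r, 4), round(r_squared, 4))
```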

21

## what does linear regression describe

###
the relationship between two quantitative variables

one variable is independent and affects the other, dependent variable

22

## equation for linear regression

### outcome = a + b(predictor)
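
A minimal sketch of estimating a and b by least squares and then predicting. The data are made up so that the points lie exactly on a line, and `fit_line` is a hypothetical helper:

```python
def fit_line(predictor, outcome):
    """Return (a, b) for outcome = a + b * predictor, fitted by least squares."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
predicted = a + b * 6  # predicted outcome when the predictor is 6
print(a, b, predicted)
```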

23

## how do you calculate diagnostic accuracy

###
PPV

NPV

24

## how do you calculate sensitivity

### no. who correctly tested +ve for the disease / total no. who have the disease

25

## how do you calculate specificity

### no. of people who correctly test -ve / total no. of healthy people

26

## how do you calculate PPV

### no. of people who correctly test +ve / total no. of people who test +ve
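
The sensitivity, specificity and PPV formulas can be sketched from a 2x2 table of test result against true disease status. The counts are invented, `diagnostic_summary` is a hypothetical helper, and the NPV line is the standard mirror of the PPV formula rather than something spelled out in the deck:

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Accuracy measures from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # correct +ve / all with the disease
        "specificity": tn / (tn + fp),  # correct -ve / all without the disease
        "ppv": tp / (tp + fp),          # correct +ve / all who test +ve
        "npv": tn / (tn + fn),          # correct -ve / all who test -ve
    }

summary = diagnostic_summary(tp=90, fp=40, fn=10, tn=160)  # hypothetical counts
for name, value in summary.items():
    print(name, round(value, 3))
```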

27

## use of normal distribution

### determines choice of statistical methods

28

## mean and sd define

### normal distribution

29

## define population

###
full set of units (people) to which the study results will be generalised

usually infinite in size

30

## why might there be uncertainty in the answer provided by the sample data

###
variability between people

sample is only a subset of the population - not fully representative

31

## what are statistics for

###
summarising sample data

quantifying uncertainty in results

32

## 2 types of statistics

###
inferential

descriptive

33

## descriptive statistics

### describe basic features/characteristics in the sample

34

## inferential statistics

###
make inferences about relationships in the population using the sample

however can never be 100% certain

e.g. standard error, CI, p-values

35

## sampling distribution

### all the different estimates from different samples and their frequencies

36

## effect of sample size on CI

### the larger the sample size the narrower the CI
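
A sketch of the sample size effect, assuming a Normal-approximation CI for a mean with a fixed SD of 10 (an invented value). `ci_width` is a hypothetical helper:

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level=0.95):
    """Full width of a Normal-approximation CI for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sd / math.sqrt(n)

widths = {n: ci_width(sd=10.0, n=n) for n in (25, 100, 400)}
for n, width in widths.items():
    print(n, round(width, 2))
```

Under this approximation, quadrupling the sample size halves the width of the interval.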

37

## effect of CI on certainty/uncertainty

### the wider the CI, the greater the uncertainty

38

## what do p-values quantify

### the extent to which the sample estimate contradicts the null hypothesis

39

## what does PICO stand for

###
population/patient

intervention

comparison

outcome

40

## what does the t in PICO(T) stand for

### type of study design that would work best

41

## why is PICO used

### to frame or answer a health related question

42

## when is data paired

###
if participants are matched on criteria (e.g. age/gender) before the trial arms are compared

if measurements are taken before and after an intervention

43

## what does paired data analyse

### within-pair differences

44

## parametric methods

###
e.g. t-test, analysis of variance (ANOVA)

make distributional assumptions e.g. Normal

summarise data using means and sd

45

## parametric method for 3 or more independent groups

### ANOVA

46

## what does ANOVA stand for

### analysis of variance

47

## parametric methods for 3 or more dependent groups

###
paired test

repeated measures ANOVA

48

## when do you use a non-parametric test

###
if variables are skewed

small sample size

if sd is different across groups

if the variables are more ordinal than quantitative

49

## when using non-parametric tests you should...

###
analyse the rank ordering in the data (not actual scores)

only provide p-values (not CIs)

compare entire distribution rather than just means
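
Analysing rank ordering means replacing each score with its position in the sorted data. The sketch below assumes the common convention that tied values share the average of their ranks; `average_ranks` is a hypothetical helper:

```python
def average_ranks(values):
    """1-based ranks in the original order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

print(average_ranks([30, 10, 20, 20]))
```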

50

## how do you summarise non-parametric data

###
IQR

median

51

## non-parametric test for 2 independent groups

### Mann Whitney

52

## non-parametric test for 2 paired groups

### Wilcoxon signed-rank

53

## non-parametric for 3 or more independent groups

### Kruskal Wallis

54

## non-parametric for 3 or more paired groups

### Friedman

55

## advantages of non-parametric tests

###
they are always valid for quantitative data

(parametric only valid if assumptions are satisfied)

56

## disadvantages of non-parametric tests

###
no CIs

based only on analysis of ranks

no direct inferences about a parameter

57

## what defines a large sample size

### sample greater than 50

58

## how do you calculate variance

### SD squared

59

## how do you calculate whether the variances are 'equal'

### variance in one group should be no more than 4x the variance of the other group
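
The variance and "no more than 4x" rule of thumb can be sketched as follows. The SD values are invented, the helper names are hypothetical, and both SDs are assumed to be greater than zero:

```python
def variance(sd):
    """Variance is the standard deviation squared."""
    return sd ** 2

def variances_roughly_equal(sd_a, sd_b, max_ratio=4.0):
    """True if the larger variance is no more than max_ratio times the smaller."""
    larger = max(variance(sd_a), variance(sd_b))
    smaller = min(variance(sd_a), variance(sd_b))  # assumed > 0
    return larger / smaller <= max_ratio

print(variances_roughly_equal(2.0, 3.0))  # variances 4 and 9
print(variances_roughly_equal(1.0, 3.0))  # variances 1 and 9
```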

60

## how can you compare CIs between groups

### calculate a single CI for the difference between groups

61

## effect of proportion on odds

### the higher the proportion the higher the odds

62

## how do you calculate proportion

### no. of participants in category of interest / total no. of participants

63

## relationship between exposure variable and outcome variable

### the exposure variable is the potential cause of the outcome variable

64

## tests for binary hypothesis testing

###
chi-squared (large samples)

fisher's exact (small samples)

65

## risk difference of 0

###
no risk difference

groups equally likely to have the disease

66

## how do you calculate risk difference

### proportion in group A - proportion in group B

67

## how do you calculate risk ratio

### proportion in group A / proportion in group B

68

## what do risk ratio and odds ratio quantify

### the strength of association between the intervention and binary variable

69

## risk ratio = 1

### no difference in risk between two groups

70

## NNT stands for

### number needed to treat

71

## how do you calculate NNT

### 1 / risk difference
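
Risk difference, risk ratio and NNT can be worked through together. This is a sketch with invented counts; `risk_summary` is a hypothetical helper, and taking the absolute value of the risk difference for NNT is an assumption made here so the result is positive whichever group is listed first:

```python
def risk_summary(events_a, total_a, events_b, total_b):
    """Return (risk difference, risk ratio, NNT) for groups A and B."""
    risk_a = events_a / total_a  # proportion with the outcome in group A
    risk_b = events_b / total_b  # proportion with the outcome in group B
    risk_difference = risk_a - risk_b
    risk_ratio = risk_a / risk_b
    nnt = 1 / abs(risk_difference)  # undefined when the risk difference is 0
    return risk_difference, risk_ratio, nnt

rd, rr, nnt = risk_summary(20, 100, 10, 100)  # hypothetical trial counts
print(round(rd, 2), round(rr, 2), round(nnt, 1))
```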

72

## what is NNT

### the number of people that need to receive intervention before 1 person benefits from it

73

## what is NNT better for

### quantifying the impact of an intervention in a given population

74

## what does NNT do

###
measures the effectiveness of an intervention

(based on risk difference)

75

## what is correlation

### the association between two variables

76

## graphical description of correlation

###
scatter plot

outcome = y-axis

predictor = x-axis

77

## numerical description of correlation

###
correlation coefficient

pearson's = linear

spearman's = monotonic (can be non-linear)

78

## assumptions for spearman's correlation coefficient

###
non-linear correlation

e.g. curved line

must be 'monotonic' - either never decreasing or never increasing

e.g. graph cannot be U-shaped
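
Spearman's coefficient is Pearson's coefficient applied to the ranks of the data. The sketch below uses an invented curved but monotonic relationship; `simple_ranks` is a hypothetical helper that assumes no tied values:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """1-based ranks; assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(simple_ranks(xs), simple_ranks(ys))

dose = [1, 2, 3, 4, 5]            # hypothetical predictor
response = [1, 8, 27, 64, 125]    # curved but always increasing
print(round(pearson_r(dose, response), 3), round(spearman_rho(dose, response), 3))
```

Because the curve never changes direction, the ranks agree perfectly even though the relationship is not a straight line.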

79

## if r squared = 1

### then all the variation in one variable is explained by the other variable

80

## what is the predictor

###
the independent variable

the explanatory variable - potential cause of the outcome variable

81

## what is the least squares regression line

### line that makes the vertical distance from the data points to the regression line as small as possible

82

## what is a residual (e)

### the vertical distance between the observed data point and the regression line (predicted value)

83

## equation for calculating errors in prediction

### outcome = a + b(predictor) + e
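
Rearranging the equation gives e = outcome - (a + b(predictor)). A small sketch with an invented line (a = 1, b = 2) and invented observations; `residuals` is a hypothetical helper:

```python
def residuals(a, b, predictor, outcome):
    """Observed outcome minus the value predicted by the line a + b * predictor."""
    return [y - (a + b * x) for x, y in zip(predictor, outcome)]

e = residuals(1.0, 2.0, [1, 2, 3], [3.5, 4.5, 7.0])
print(e)
```

A positive residual means the observed point lies above the line, a negative one below it.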

84

## are most biological variables continuous?

###
yes

e.g. blood pressure

85

## why is it impossible to choose a cut-off line to correctly classify all subjects to a disease status

### most distributions of diagnostic test scores will overlap

86

## what are the probability-based estimates of accuracy

###
specificity

sensitivity

PPV

NPV

87

## what factors affect sensitivity of a test

### the severity of the disease

88

## assumption for sensitivity of a test

### populations have similar disease severity

89

## what factors affect specificity of tests

### if patients without the disease also show symptoms, specificity is reduced

90

## what does PPV quantify

### the likelihood that somebody has the disease given a positive test result

91