stats Flashcards

Flashcards in stats Deck (91):
1

median

50th percentile
a measure of the average (the middle value)
50% of data above median, 50% below

2

when is data symmetrical with respect to the median

when median is equidistant from upper and lower quartile boundaries

3

when is negative skew seen with respect to the median

when median is closer to upper quartile

4

how do you check symmetry of variables

box and whisker plot
histogram

5

difference between a 99% CI and a 95% CI

a 99% CI is wider than a 95% CI, extending further at both extremes
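A minimal sketch of why the 99% CI is wider, assuming a simple Normal-approximation CI for a mean (mean ± z × SE). The data and the function name `ci_for_mean` are hypothetical; for a sample this small a t multiplier would usually be used instead of z.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_for_mean(values, level=0.95):
    """Normal-approximation CI for a mean: mean +/- z * SE (assumes a large sample)."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))  # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%, about 2.576 for 99%
    return m - z * se, m + z * se

data = [5.1, 4.8, 6.2, 5.5, 5.9, 4.7, 5.3, 6.0, 5.6, 5.2]  # hypothetical measurements
lo95, hi95 = ci_for_mean(data, 0.95)
lo99, hi99 = ci_for_mean(data, 0.99)
print(f"95% CI: {lo95:.2f} to {hi95:.2f}")
print(f"99% CI: {lo99:.2f} to {hi99:.2f}")  # wider at both ends
```

Only the multiplier changes between the two intervals, so the 99% CI extends further on both sides of the same mean.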

6

if p>0.05

little or no evidence against the null hypothesis

there may truly be no difference in the means of the variables
the sample may be too small to detect a difference

7

smaller standard error means

the estimate of the mean is more precise
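A short sketch of the standard error of a mean, SE = SD / √n, using a hypothetical SD of 10. It shows why larger samples give smaller SEs (more precise estimates) and hence narrower CIs; `standard_error` is an illustrative name.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of the sample size."""
    return sd / sqrt(n)

# Same hypothetical SD, increasing sample sizes
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```

Quadrupling the sample size halves the SE (2.0, then 1.0, then 0.5).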

8

2-tailed test

difference in sample means in either direction provides evidence against null hypothesis

9

when is the Mann-Whitney test used

to compare 2 independent groups
if variables are discrete/ordinal
if data do not meet parametric assumptions (e.g. skewed)

10

a parametric test makes strong assumptions about...

the distribution of the data (e.g. Normal)

11

what does the Wilcoxon signed-rank test compare

the distributions of the first and second (paired) measurements

assesses whether the population mean ranks differ

12

when is the Wilcoxon signed-rank test used

matched/paired data
when the assumptions of the paired t-test are not met

13

what does standard error indicate

indicates how much the study estimate would typically vary around the true population value if the study were repeated many times with different samples

14

p-value if the 95% CI excludes the null hypothesis value

p<0.05
there is some evidence against the null hypothesis

15

define odds

how common a binary characteristic is within a single group
no. with the characteristic / no. without it (p / (1 - p))

16

odds ratio

measure of association between exposure and outcome
the odds in one group divided by the odds in another
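A minimal sketch of odds and the odds ratio from counts, assuming hypothetical numbers (20 of 100 exposed and 10 of 100 unexposed people have the outcome). The function names are illustrative; the unexposed group acts as the reference category.

```python
def odds(with_outcome, without_outcome):
    """Odds = number with the characteristic / number without it."""
    return with_outcome / without_outcome

def odds_ratio(exposed_yes, exposed_no, unexposed_yes, unexposed_no):
    """Odds in the exposed group divided by odds in the unexposed (reference) group."""
    return odds(exposed_yes, exposed_no) / odds(unexposed_yes, unexposed_no)

print(odds(20, 80))                          # 0.25
print(round(odds(10, 90), 3))                # 0.111
print(round(odds_ratio(20, 80, 10, 90), 2))  # 2.25: odds are 2.25 times higher if exposed
```

Comparing the reference group with itself gives an odds ratio of exactly 1.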

17

reference category

odds ratio of the reference category = 1
the category against which the odds of the other categories are compared

18

Pearson's correlation coefficient

r
quantifies the strength of linear association between two variables

19

assumptions for Pearson's correlation

linear relationship between variables

20

what does r squared (Pearson's) refer to

the proportion of variation in one variable explained by the other variable
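The cards on Pearson's r and r squared can be illustrated with a hand-rolled calculation. This is a sketch on hypothetical, roughly linear data; `pearson_r` is an illustrative name, not a library function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation: sum of cross-products scaled by both spreads (range -1 to +1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # hypothetical, roughly linear
r = pearson_r(x, y)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")  # r squared = proportion of variation explained
```

A perfect straight line gives r = 1 (or -1 if it slopes downwards), so r squared = 1.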

21

what does linear regression describe

the relationship between two quantitative variables
one variable (independent) is assumed to affect the other (dependent) variable

22

equation for linear regression

outcome = a + b(predictor)
a = intercept, b = slope (change in outcome per 1-unit increase in predictor)

23

how do you measure diagnostic accuracy

sensitivity
specificity
PPV
NPV

24

how do you calculate sensitivity

no. who correctly tested +ve for the disease / total no. who have the disease

25

how do you calculate specificity

no. of people who correctly test -ve / total no. of healthy people

26

how do you calculate PPV

no. of people who correctly test +ve / total no. of people who test +ve
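The sensitivity, specificity and PPV formulas above (plus NPV, defined analogously) can be computed from a 2x2 table of test result against true disease status. A sketch with hypothetical counts (100 people with the disease, 900 without); `diagnostic_accuracy` is an illustrative name.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures from true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),  # correct +ve / all with the disease
        "specificity": tn / (tn + fp),  # correct -ve / all healthy people
        "PPV": tp / (tp + fp),          # correct +ve / all who test +ve
        "NPV": tn / (tn + fn),          # correct -ve / all who test -ve
    }

result = diagnostic_accuracy(tp=90, fp=45, fn=10, tn=855)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Here sensitivity is 0.900 and specificity 0.950, yet only two thirds of positive results are true positives.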

27

use of the Normal distribution

whether data are Normally distributed determines the choice of statistical methods (parametric vs non-parametric)

28

mean and sd define

normal distribution

29

define population

full set of units (people) to which the study results will be generalised
usually very large (often treated as infinite)

30

why might there be uncertainty in the answer provided by the sample data

variability between people
sample is only a subset of the population - not fully representative

31

what are statistics for

summarising sample data
quantifying uncertainty in results

32

2 types of statistics

inferential
descriptive

33

descriptive statistics

describe basic features/characteristics in the sample

34

inferential statistics

make inferences about relationships in the population using the sample
however can never be 100% certain

e.g. standard error, CI, p-values

35

sampling distribution

all the different estimates from different samples and their frequencies

36

effect of sample size on CI

the larger the sample size the narrower the CI

37

effect of CI on certainty/uncertainty

the wider the CI, the greater the uncertainty

38

what do p-values quantify

the extent to which the sample estimate contradicts the null hypothesis

39

what does PICO stand for

population/patient
intervention
comparison
outcome

40

what does the t in PICO(T) stand for

type of study design that would work best

41

why is PICO used

to frame or answer a health related question

42

when is data paired

if participants are matched on criteria (e.g. age/gender) across the trial arms before comparing
if measurements are taken before and after an intervention

43

what does paired data analyse

within-pair differences
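A small sketch of what "within-pair differences" means, assuming hypothetical before/after measurements on the same five participants. A paired test would then ask whether the mean difference is 0.

```python
from statistics import mean

# Hypothetical before/after measurements on the same five participants
before = [140, 152, 138, 160, 147]
after = [132, 148, 135, 151, 145]

# Paired analysis works on each participant's difference, not on the two columns separately
differences = [b - a for b, a in zip(before, after)]
print(differences)        # [8, 4, 3, 9, 2]
print(mean(differences))  # 5.2
```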

44

parametric methods

e.g. t-test, analysis of variance (ANOVA)

make distributional assumptions, e.g. Normal
summarise data using means and sd

45

parametric method for 3 or more independent groups

ANOVA

46

what does ANOVA stand for

analysis of variance

47

parametric method for 3 or more dependent (paired) groups

repeated measures ANOVA

48

when do you use a non-parametric test

if variables are skewed
small sample size
if sd is different across groups
if the variables are more ordinal than quantitative

49

when using non-parametric tests you should...

analyse the rank ordering in the data (not actual scores)
only provide p-values (not CIs)
compare entire distribution rather than just means

50

how do you summarise non-parametric data

IQR
median

51

non-parametric test for 2 independent groups

Mann-Whitney
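A minimal sketch of the Mann-Whitney U statistic, computed by counting pairs across the two groups (ties count one half). It depends only on the ordering of the values, which is why the test works on ranks. Data and function name are hypothetical; a library such as SciPy (`scipy.stats.mannwhitneyu`) would normally supply the p-value.

```python
def mann_whitney_u(group_a, group_b):
    """U for group A: number of (a, b) pairs where a is larger than b, ties counting 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u

# Hypothetical ordinal scores (e.g. pain on a 0-10 scale) in two independent groups
treated = [2, 3, 3, 4, 5]
control = [4, 5, 6, 6, 7]
u_treated = mann_whitney_u(treated, control)
u_control = mann_whitney_u(control, treated)
print(u_treated, u_control)  # 2.0 23.0
```

The two U values always add up to len(group_a) × len(group_b); a very small U for one group suggests its values tend to be lower.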

52

non-parametric test for 2 paired groups

Wilcoxon signed-rank
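A sketch of the Wilcoxon signed-rank statistic: take the paired differences, drop zeros, rank their absolute sizes (ties get the average rank), then sum the ranks of the positive and negative differences. The data and function names are hypothetical; SciPy's `scipy.stats.wilcoxon` is one library option for the full test with a p-value.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(first, second):
    """Return (W+, W-): rank sums of |difference| for positive and negative differences."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical first and second measurements on six participants
before = [12, 15, 11, 18, 14, 16]
after = [14, 15, 15, 21, 13, 19]
print(wilcoxon_signed_rank(before, after))  # (14.0, 1.0)
```

If there were no systematic change, W+ and W- would be expected to be similar; here almost all the rank weight is on increases.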

53

non-parametric test for 3 or more independent groups

Kruskal-Wallis

54

non-parametric test for 3 or more paired groups

Friedman

55

advantages of non-parametric tests

they are valid for quantitative data whatever the shape of its distribution
(parametric tests are only valid if their assumptions are satisfied)

56

disadvantages of non-parametric tests

no CIs
based only on analysis of ranks
no direct inferences about a parameter

57

what defines a large sample size

sample greater than 50

58

how do you calculate variance

SD squared

59

how do you calculate whether the variances are 'equal'

variance in one group should be no more than 4x the variance of the other group
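A sketch of the variance rule of thumb: variance is SD squared, and the variances are treated as "equal" if the larger is no more than 4 times the smaller. The groups and function name are hypothetical.

```python
from statistics import stdev

def variances_roughly_equal(group_a, group_b, max_ratio=4):
    """Rule of thumb: larger variance no more than max_ratio times the smaller."""
    var_a = stdev(group_a) ** 2  # variance = SD squared
    var_b = stdev(group_b) ** 2
    return max(var_a, var_b) / min(var_a, var_b) <= max_ratio

print(variances_roughly_equal([4, 5, 6], [3, 5, 7]))  # True: variances 1 and 4
print(variances_roughly_equal([4, 5, 6], [1, 5, 9]))  # False: variances 1 and 16
```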

60

how can you compare CIs between groups

calculate a single CI for the difference between groups

61

effect of proportion on odds

the higher the proportion the higher the odds

62

how do you calculate proportion

no. of participants in category of interest / total no. of participants

63

relationship between exposure variable and outcome variable

the exposure variable is the potential cause of the outcome variable

64

tests for hypothesis testing with a binary outcome

chi-squared (large samples)
Fisher's exact (small samples)
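A minimal sketch of the chi-squared test for a 2x2 table, comparing observed counts with the counts expected if there were no association. It assumes no continuity correction and hypothetical counts; with small expected counts Fisher's exact test is preferred (e.g. SciPy's `scipy.stats.fisher_exact`).

```python
from math import erfc, sqrt

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared for the table [[a, b], [c, d]]; returns (statistic, p-value, 1 df)."""
    observed = [[a, b], [c, d]]
    total = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-squared with 1 degree of freedom
    return stat, p_value

# Hypothetical: outcome in 20/100 exposed vs 10/100 unexposed
stat, p = chi_squared_2x2(20, 80, 10, 90)
print(f"chi-squared = {stat:.2f}, p = {p:.3f}")
```

Here the statistic is about 3.92, just above the 3.84 cut-off for 1 degree of freedom, so p is slightly below 0.05.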

65

risk difference of 0

no risk difference
groups equally likely to have the disease

66

how do you calculate risk difference

proportion in group A - proportion in group B

67

how do you calculate risk ratio

proportion in group A / proportion in group B

68

what do risk ratio and odds ratio quantify

the strength of association between the exposure/intervention and a binary outcome

69

risk ratio = 1

no difference in risk between two groups

70

NNT stands for

number needed to treat

71

how do you calculate NNT

1 / absolute risk difference
(usually rounded up to a whole number)
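A sketch of the risk difference, risk ratio and NNT from hypothetical trial counts (10/100 events on treatment vs 20/100 on control). It uses exact fractions so that rounding the NNT up is not thrown off by floating-point error; the function name is illustrative.

```python
from fractions import Fraction
from math import ceil

def risk_measures(events_a, total_a, events_b, total_b):
    """Risk difference, risk ratio and NNT comparing group A with group B."""
    risk_a = Fraction(events_a, total_a)  # proportion with the outcome in group A
    risk_b = Fraction(events_b, total_b)
    rd = risk_a - risk_b                  # risk difference (negative = fewer events in A)
    rr = risk_a / risk_b                  # risk ratio
    nnt = ceil(1 / abs(rd)) if rd != 0 else None  # 1 / |RD|, rounded up
    return rd, rr, nnt

rd, rr, nnt = risk_measures(10, 100, 20, 100)
print(float(rd), float(rr), nnt)  # -0.1 0.5 10
```

About 10 people would need the treatment for 1 additional person to avoid the event.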

72

what is NNT

the number of people who need to receive the intervention for 1 additional person to benefit from it

73

what is NNT better for

quantifying the impact of an intervention in a given population

74

what does NNT do

measures the effectiveness of an intervention
(based on risk difference)

75

what is correlation

the association between two variables

76

graphical description of correlation

scatter plot
outcome = y-axis
predictor = x-axis

77

numerical description of correlation

correlation coefficient
Pearson's = linear
Spearman's = monotonic (can be non-linear)

78

assumptions for Spearman's correlation coefficient

relationship can be non-linear
e.g. curved line
must be 'monotonic': the relationship is either never decreasing or never increasing
e.g. graph cannot be U-shaped
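Spearman's coefficient can be sketched as Pearson's r calculated on the ranks instead of the raw values. The data and function names are hypothetical; the example contrasts a curved but monotonic relationship with a U shape.

```python
from math import sqrt

def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean of their positions."""
    s = sorted(values)
    return [(s.index(v) + 1 + len(s) - s[::-1].index(v)) / 2 for v in values]

def pearson(x, y):
    """Pearson's r on whatever values it is given."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # curved but monotonic (always increasing)
print(round(pearson(x, y), 3), spearman(x, y))        # Pearson below 1, Spearman 1.0
print(spearman([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0: a U shape is not monotonic
```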

79

if r squared = 1

then all the variation in one variable is explained by the other variable

80

what is the predictor

the independent variable
the explanatory variable - potential cause of the outcome variable

81

what is the least squares regression line

the line that makes the sum of the squared vertical distances from the data points to the line as small as possible

82

what is a residual (e)

the vertical distance between the observed data point and the regression line (predicted value)

83

equation including the error in prediction

outcome = a + b(predictor) + e
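The least squares line and its residuals can be sketched directly from the formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄, with each residual e = observed - predicted. The data and function name are hypothetical.

```python
def least_squares(x, y):
    """Fit outcome = a + b * predictor by minimising the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx  # the fitted line passes through (mean x, mean y)
    return a, b

predictor = [1, 2, 3, 4, 5]  # hypothetical data
outcome = [3, 5, 6, 9, 10]
a, b = least_squares(predictor, outcome)
residuals = [yi - (a + b * xi) for xi, yi in zip(predictor, outcome)]  # e = observed - predicted
print(round(a, 3), round(b, 3))
print([round(e, 3) for e in residuals])  # least squares residuals sum to (about) zero
```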

84

are most biological variables continuous?

yes
e.g. blood pressure

85

why is it impossible to choose a cut-off line to correctly classify all subjects to a disease status

most distributions of diagnostic test scores will overlap

86

what are the probability-based estimates of accuracy

specificity
sensitivity
PPV
NPV

87

what factors affect sensitivity of a test

the severity of the disease

88

assumption for the sensitivity of a test

populations have similar disease severity

89

what factors affect the specificity of a test

if people without the disease show similar symptoms, specificity is reduced

90

what does PPV quantify

the probability that somebody who tests positive actually has the disease

91

how does the prevalence of a disease affect the PPV

if a disease has a greater prevalence (is more common) then the PPV will increase
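A sketch of how prevalence drives PPV for a test with fixed sensitivity and specificity, writing PPV = true positives / all positives in terms of prevalence (Bayes' theorem). The 90%/95% test and the prevalences are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """PPV from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence               # diseased and test +ve
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy but test +ve
    return true_pos / (true_pos + false_pos)

# Same hypothetical test used in populations with different prevalence
for prev in (0.01, 0.10, 0.30):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.90, 0.95, prev):.2f}")
```

PPV rises from about 0.15 at 1% prevalence to about 0.89 at 30%, although the test itself is unchanged.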