Stats Flashcards

1
Q

Standard deviation

A

measure of variability in data

2
Q

Parameter

A

numerical characteristic which is descriptive of the population

3
Q

Statistic

A

numerical characteristic of your sample, used to estimate unknown population parameters

4
Q

Standard error

A

measure of accuracy of a statistic

5
Q

Simpson’s paradox

A

a phenomenon in probability and statistics, in which a trend appears in different groups of data but disappears or reverses when these groups are combined.

6
Q

Continuous variable

A

numeric measurement that can take any value within a range

7
Q

Discrete variable

A

numeric measurement from counting

8
Q

Nominal variable

A

categorical, no natural order

9
Q

Ordinal variable

A

categorical, natural order

10
Q

RCT

A

comparative experiment to eliminate the placebo effect; randomization eliminates bias

11
Q

Power analysis

A

used to determine necessary sample size based on the size of the effect, the residual variability and the design of the experiment

12
Q

Random sampling

A

computer generated, each member of the population has an equal chance of being chosen

13
Q

Systematic sampling

A

every nth person from sample list chosen

14
Q

Stratified random sampling

A

population split into strata and chosen to best represent the population

15
Q

Disproportionate sampling

A

used when strata in a population are of substantially unequal sizes but you want to select an equal number from each stratum

16
Q

Cluster sampling

A

random sampling of a series of units in a population, convenient but may have sampling bias

17
Q

Non-probability sampling

A

convenience, quota, purposive and snowball sampling

18
Q

Convenience sampling

A

non-probability, chosen based on availability

19
Q

Quota sampling

A

non-probability, researcher guides sampling process until quotas are met

20
Q

Purposive sampling

A

non-probability, hand-picked based on certain criteria

21
Q

Snowball sampling

A

non-probability, relies on original participants referring

22
Q

p value

A

if the null hypothesis is true, the probability of getting data at least as extreme as yours by chance. <0.01 = strong, 0.01-0.05 = moderate, 0.05-0.1 = weak, >0.1 = inconclusive

23
Q

Parallel RCT design

A

two groups, two different treatments (typically treatment and control)

24
Q

Crossover RCT design

A

two groups, both treatments in a different order, “wash-out” period

25
Q

Factorial RCT design

A

several factors compared at the same time

26
Q

Quasi-experimental studies

A
  • one group post-test
  • one group pre-test/post-test
  • non-equivalent control group
  • non-equivalent control group pre-test/post-test
  • single subject (structured experiment, not case study, poor external validity)
27
Q

Longitudinal studies

A

follow subjects across time and look for outcomes; two subgroups are followed until the event occurs and their characteristics are then compared

28
Q

Attrition

A

introduction of bias due to loss of subjects to a study

29
Q

Case control study

A

retrospective; individuals with the outcome of interest are matched with similar controls and their histories analysed

30
Q

Cross-sectional study

A

snapshot of a population at a given time eg. census

31
Q

Case report

A

clinical history of a single patient

32
Q

Surveys

A
  • open/closed ended questions
  • dichotomous response
  • likert scale
  • visual analogue scale
33
Q

Numeric values

A

location, spread, shape, deviations

34
Q

Histograms

A

good for visualizing large amounts of data; difficult to compare more than 2 groups

35
Q

Density plots

A

alternative to histograms, show a line estimating the density at each value

36
Q

Box plots

A

summarise location (median), spread (wide or narrow) and shape (symmetrical or skewed), good for comparing multiple distributions

37
Q

Bar charts

A

used for categorical data

38
Q

Spine plots

A

categorical data, bar widths proportional to group size

39
Q

Scatter plots

A

direction (pos/neg), linearity (does it curve), strength (how closely it follows the pattern)

40
Q

Sample variance

A

determining variability of data

ANOVA –> F statistic –> p value

41
Q

Confidence interval

A

used for noting the margin of error (in a normal distribution ~95% of values lie within 2SD of the mean; a 95% CI for the mean is roughly mean ± 2 standard errors)

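A minimal sketch of this in Python, using invented numbers and the normal approximation (for small samples a t critical value would be more accurate):

```python
import math

# Invented sample data for illustration
data = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
sem = sd / math.sqrt(n)

# Rough 95% CI: estimate +/- ~2 standard errors (1.96 under normality)
ci = (mean - 1.96 * sem, mean + 1.96 * sem)
```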
42
Q

Pearson’s correlation use

A

showing strength of correlation in normal, linear relationships (always create a scatterplot to confirm linearity). 0 indicates no correlation, -1 a perfect negative correlation, 1 a perfect positive correlation.

43
Q

Spearman’s rank correlation coefficient

A

used if the data are not normal or the relationship is non-linear

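A quick check of both coefficients with SciPy, on invented data where y increases monotonically (and roughly linearly) with x:

```python
from scipy import stats

# Invented data: y rises monotonically with x, close to a straight line
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 10.0]

pearson_r, _ = stats.pearsonr(x, y)      # linear correlation
spearman_rho, _ = stats.spearmanr(x, y)  # rank-based correlation
```

Because the ranks agree perfectly, Spearman's rho is exactly 1 even though the points do not lie exactly on a line.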
44
Q

Standard error use

A

measuring the accuracy of a statistic –> estimates the standard deviation of the sample mean. The greater the sample size, the smaller the standard error. Determines the t statistic

45
Q

Sample variance calculation

A

((x₁ - x̄)² + (x₂ - x̄)² + … + (xₙ - x̄)²)/(n - 1)

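A sketch of the calculation in plain Python, with invented values:

```python
# Sample variance: squared deviations from the mean, divided by n - 1
data = [4.0, 7.0, 6.0, 3.0, 5.0]  # invented values
n = len(data)
mean = sum(data) / n
sample_variance = sum((x - mean) ** 2 for x in data) / (n - 1)
```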
46
Q

Standard error calculation

A

SD/√n
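The same formula as a small Python sketch (invented values):

```python
import math

data = [4.0, 7.0, 6.0, 3.0, 5.0]  # invented values
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
sem = sd / math.sqrt(n)  # standard error = SD / sqrt(n)
```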

47
Q

T statistic

A

number of standard errors the estimate is from the hypothesised value

48
Q

One Sample T Test calculation

A

(sample mean - hypothesised mean)/SEM
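A sketch with SciPy, using invented measurements and a hypothesised mean of 5.0:

```python
from scipy import stats

data = [5.1, 4.9, 5.6, 4.7, 5.3, 5.0]  # invented measurements
# t = (sample mean - hypothesised mean) / SEM
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)
```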

49
Q

Independent Sample T-test calculation

A

(diff b/w mean of 2 groups)/(SEM of difference)

50
Q

T-test assumptions

A
  • the 2 groups are independent
  • the populations have normal variability
  • the variances are equal
  • -> if equal use a pooled T-test
  • -> if not equal use Welch’s T-test
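The pooled/Welch choice maps directly onto SciPy's `equal_var` flag; a sketch with invented groups of equal spread:

```python
from scipy import stats

# Invented groups with equal spread
group_a = [12.0, 14.0, 11.0, 13.0, 15.0]
group_b = [16.0, 18.0, 17.0, 15.0, 19.0]

# equal_var=True  -> pooled t-test
# equal_var=False -> Welch's t-test
t_pooled, p_pooled = stats.ttest_ind(group_a, group_b, equal_var=True)
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)
```

With equal variances the two statistics coincide; they diverge as the spreads differ.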
51
Q

Analysis of Variance use

A

comparing means in the context of the variability. determines F statistic and therefore p value

52
Q

ANOVA calculation

A

measure variability (sums of squares and mean squares, ie. sample variances), then compare variability within and between the groups. Significant if ‘within group variability’ is much smaller than ‘total variability’, ie. there is large ‘between group variability’: knowing a person’s group gives us information about them
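A one-way ANOVA sketch with SciPy on invented groups whose means are well separated relative to the within-group spread:

```python
from scipy import stats

# Invented groups: clearly separated means, small within-group spread
g1 = [4.0, 5.0, 6.0]
g2 = [7.0, 8.0, 9.0]
g3 = [10.0, 11.0, 12.0]

# F = between-group mean square / within-group mean square
f_stat, p_value = stats.f_oneway(g1, g2, g3)
```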

53
Q

ANOVA assumptions

A
  • independent groups
  • normal variability (unless large sample size)
  • equal variances (otherwise use a transformation or Welch’s ANOVA test)
54
Q

ANOVA null hypothesis

A

the mean is the same for all groups

55
Q

F statistic use

A

determining the signal-to-noise ratio; the larger the F statistic, the more different the groups are

56
Q

F statistic calculation

A

ratio of (between group variability):(within group variability) –> mean square values

57
Q

R² use

A

statistical measure of how well the regression line approximates the real data points (a measure of the usefulness of the model). If R² = 0.1, being part of the group explains 10% of the variability

58
Q

R² calculation

A

variability explained by model/total variability

59
Q

Residuals use

A

checking assumptions (“prediction error”), estimation of within-group variability

60
Q

Residuals calculation

A

observed value - predicted value (the group mean)
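A sketch of group-mean residuals in plain Python (invented data):

```python
# Residual = observed value - group mean (the "predicted" value)
groups = {"a": [4.0, 5.0, 6.0], "b": [7.0, 9.0, 11.0]}  # invented data

residuals = []
for values in groups.values():
    mean = sum(values) / len(values)
    residuals.extend(v - mean for v in values)
# Residuals sum to zero within each group by construction
```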

61
Q

Checking normal variability

A

Normal variability is the assumption that residuals have a normal distribution. Check with normal Q-Q plot. If not normal, linear model predictions will be undermined. Consider transforming data. Less effect if large sample size.

62
Q

Post-hoc test use

A

determining specific difference between groups once a difference has been found using ANOVA

63
Q

Tukey’s HSD null hypothesis

A

the mean is the same for both groups

64
Q

Linear regression use

A

testing for a linear association between variables (as opposed to comparing group means)

65
Q

Linear regression calculation

A

the line is chosen so that it minimises the sum of the squared residuals

66
Q

Linear regression assumptions

A
  • independent observation
  • linear association
  • normal variability
  • equal variances
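A least-squares fit sketch with SciPy, using invented points that lie exactly on y = 2x + 1:

```python
from scipy import stats

# Invented data lying exactly on the line y = 2x + 1
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

# Least squares: slope and intercept minimise the sum of squared residuals
result = stats.linregress(x, y)
```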
67
Q

Power use

A

determining the probability of detecting an effect when there is indeed an effect

68
Q

Power calculation

A

1 - (probability of making a Type II error)

69
Q

Type I error

A

reject the null hypothesis when it is true

70
Q

Type II error

A

don’t reject the null hypothesis when it is false

71
Q

How to improve power

A
  • increased effect size
  • decreased variability
  • increased sample size
  • increased significance threshold
72
Q

Power analysis

A

used to determine necessary sample size based on the estimated size of the effect, the residual variability and the design of the experiment. ideally ≥80%
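Power can also be approximated by simulation; a sketch where the effect size, n, alpha and repetition count are all illustrative choices:

```python
import random
from scipy import stats

random.seed(0)

# Simulated power: the fraction of repeated experiments in which a
# two-sample t-test rejects at alpha = 0.05, given a true effect of
# 0.5 SD and n = 30 per group (all numbers are illustrative choices)
n, effect, alpha, reps = 30, 0.5, 0.05, 500
rejections = 0
for _ in range(reps):
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(effect, 1.0) for _ in range(n)]
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

power = rejections / reps  # theory puts this near 0.48 for these settings
```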

73
Q

Levene’s test for equal variances null hypothesis

A

the population variances are equal (always use a boxplot to check)

74
Q

Non-parametric test use

A

comparing ranks of values (does not require normal distribution, can be more robust, used for ordinal, can be used for numeric if non-normal)

75
Q

Mann-Whitney U test assumptions

A

both distributions are identical in shape and scale

76
Q

Mann-Whitney U test null hypothesis

A

it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample
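A sketch with SciPy on invented samples that do not overlap at all:

```python
from scipy import stats

# Invented samples with no overlap: every value in a is below every value in b
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]

u_stat, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
# U counts, over all pairs, how often a value from the first sample
# exceeds one from the second; with no overlap here it is 0
```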

77
Q

Regression use

A

A technique for determining the statistical relationship between two or more variables where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables.

78
Q

Regression B =

A

the effect (gradient) of a variable after the other variables have been taken into account

79
Q

Regression df

A

no. of variables

80
Q

Residual df

A

n-2

81
Q

Regression null hypothesis

A

the slope is equal to 0

82
Q

Normal distribution

A

A normal quantile-quantile plot compares quantiles from our data with the quantiles of the normal distribution.
1SD = 68%, 2SD = 95%, 3SD = 99.7%
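The 68/95/99.7 figures can be checked directly from the normal CDF:

```python
from scipy.stats import norm

# Share of a normal distribution within 1, 2 and 3 SDs of the mean
within_1sd = norm.cdf(1) - norm.cdf(-1)  # ~68%
within_2sd = norm.cdf(2) - norm.cdf(-2)  # ~95%
within_3sd = norm.cdf(3) - norm.cdf(-3)  # ~99.7%
```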

83
Q

Central limit theorem

A

If the population has a normal distribution, the sample mean is normal for any sample size.
If the population is not normal, the sample mean is approx normal, and becomes more normal as the sample size increases.
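A small simulation sketch of this, drawing from a non-normal (uniform) population; the sample sizes and repetition count are arbitrary choices:

```python
import random

random.seed(1)

# Means of samples drawn from a non-normal (uniform) population:
# the spread of the sample mean shrinks as n grows
def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

means_n5 = [sample_mean(5) for _ in range(2000)]
means_n50 = [sample_mean(50) for _ in range(2000)]

def sd(values):
    m = sum(values) / len(values)
    return (sum((x - m) ** 2 for x in values) / (len(values) - 1)) ** 0.5
```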

84
Q

Cronbach’s alpha

A

Used for measuring internal consistency based on correlation b/w items on a scale (ie. how closely related a set of items are as a group).
0 = completely independent
good internal consistency ≥0.8

85
Q

Cohen’s kappa (inter-rater) use

A

measuring strength of agreement, by proportion of agreement not by chance.
1 = perfect agreement
0 = no agreement
0.4 = acceptable

86
Q

Cohen’s kappa calculation

A

(raw agreement% - chance agreement%)/(1 - chance agreement%)
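The same calculation as a one-line Python sketch, with invented agreement proportions:

```python
# Cohen's kappa from agreement proportions (invented values)
raw_agreement = 0.80     # raters agree on 80% of items
chance_agreement = 0.50  # agreement expected by chance alone

kappa = (raw_agreement - chance_agreement) / (1 - chance_agreement)
```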

87
Q

Weighted kappa

A

used for measuring seriousness of disagreement (eg. scaled option vs yes/no option)

88
Q

Chi-squared test use

A

investigating whether distributions of categorical variables differ from one another (how far observed values are from expected values)

89
Q

Chi-squared null hypothesis

A

there is no difference between groups

90
Q

Observational study

A

no control over independent variable

91
Q

Experimental study

A

involves a treatment or intervention, aims to provide evidence of the effect of the independent variable on the dependent variable

92
Q

Inclusion and exclusion criteria purpose

A

clear definition of target population in order to achieve a representative sample

93
Q

Parametric tests

A

more powerful than non-parametric if assumptions are satisfied, provide direct estimates for effects, including confidence intervals

94
Q

Pearson’s correlation strength

A
  • 0.7-1.0 = strong
  • 0.3-0.69 = moderate
  • 0-0.29 = none to weak
95
Q

Sampling variability

A

the process whereby statistics, such as the sample mean, would give different results if the random sampling process was repeated

96
Q

Double-blind

A

the participants and experimenter do not know which treatment they are receiving/giving

97
Q

Independent variable

A

the variable that is changed

98
Q

Dependent variable

A

the outcome of the change of independent variable

99
Q

Comparative experiments

A

eliminate the placebo effect

100
Q

Randomisation

A

helps remove bias in a comparative experiment

101
Q

Advantages of sampling

A
  • more economical
  • time efficient
  • can be more accurate due to greater control over measurements
102
Q

Sampling bias

A

when members of a sample over/under represent attributes of a population

103
Q

Null hypothesis definition

A

no correlation between the independent and dependent values

104
Q

Degrees of freedom

A

number of independent pieces of information to estimate a parameter

105
Q

Sample mean

A

on average gives the population mean; the variability (spread) of this estimate gets smaller as the sample size increases

106
Q

Student’s T Distribution

A

distribution of the t statistic, used to estimate population parameters when sample size is small and/or when population variance is unknown

107
Q

One-way ANOVA

A

one independent variable

108
Q

Two-way ANOVA

A

studies interaction between factors influencing a variable

two independent variables

109
Q

Welch’s ANOVA

A

used when populations do not have equal variances

null hypothesis that means are equal

110
Q

Prediction error

A

= residuals

111
Q

Spread of residuals

A

if normal variability, should show constant variability and no obvious pattern

112
Q

Chi squared calculation

A

square differences and divide by expected value
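A sketch with SciPy, using invented counts against an even expected split:

```python
from scipy.stats import chisquare

# Invented counts in four categories vs. an even expected split
observed = [18, 22, 20, 20]
expected = [20, 20, 20, 20]

# chi2 = sum of (observed - expected)^2 / expected
chi2, p_value = chisquare(observed, f_exp=expected)
```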

113
Q

Inter-rater reliability

A

the degree to which ratings given by different observers agree

114
Q

Intra-rater reliability

A

the degree to which ratings given by the same observer on different occasions agree