Stats Flashcards
Standard deviation
measure of variability in data
Parameter
numerical characteristic which is descriptive of the population
Statistic
numerical characteristic of your sample, used to estimate unknown population parameters
Standard error
measure of the precision of a statistic (how much it would vary from sample to sample)
Simpson's paradox
a phenomenon in probability and statistics, in which a trend appears in different groups of data but disappears or reverses when these groups are combined.
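As a rough illustration of the definition above, the sketch below compares two treatments within two groups and then pooled. The counts are illustrative, in the style of the well-known kidney-stone example, and the names (counts, within, pooled) are assumptions made for this sketch.

```python
# Illustrative counts (assumed for this sketch): (successes, patients)
# for treatments A and B, split by a grouping variable.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total

# Success rate of each treatment within each group.
within = {
    group: {arm: rate(*cell) for arm, cell in arms.items()}
    for group, arms in counts.items()
}

# Success rate of each treatment after pooling the groups.
pooled = {}
for arm in ("A", "B"):
    successes = sum(counts[group][arm][0] for group in counts)
    total = sum(counts[group][arm][1] for group in counts)
    pooled[arm] = rate(successes, total)

for group, arms in within.items():
    print(group, {arm: round(r, 3) for arm, r in arms.items()})
print("pooled", {arm: round(r, 3) for arm, r in pooled.items()})
```

With these counts, A has the higher success rate inside each group, yet B has the higher rate once the groups are combined, because the group sizes are very unbalanced between treatments.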
Continuous variable
continuous numeric measurement
Discrete variable
numeric measurement from counting
Nominal variable
categorical, no natural order
Ordinal variable
categorical, natural order
RCT
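randomized controlled trial: comparative experiment in which participants are randomly allocated to treatment or control; the control group accounts for the placebo effect, randomization reduces selection bias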
Power analysis
used to determine necessary sample size based on the size of the effect, the residual variability and the design of the experiment
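One common textbook case can be sketched as follows: the sample size per group needed to compare two means, using a normal approximation. The function name n_per_group, the default alpha of 0.05 and power of 0.80, and the two-group equal-size design are all assumptions for this sketch, not something the card specifies.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means
    (two-sided test, equal group sizes, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# Halving the effect size roughly quadruples the required sample.
print(n_per_group(effect=1.0, sd=1.0))
print(n_per_group(effect=0.5, sd=1.0))
```

The three inputs map onto the card: effect is the size of the effect, sd is the residual variability, and the formula itself encodes the design. An exact calculation based on the t distribution gives a slightly larger answer.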
Random sampling
computer generated, each member of the population has an equal chance of being chosen
Systematic sampling
every nth person on the sampling list is chosen
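The two schemes above (simple random sampling and taking every nth person) could be sketched like this. The function names, the seed parameter, and the population of 100 numbered people are hypothetical; the random starting point for the every-nth scheme is a common convention rather than something the card states.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Each member has an equal chance of being chosen (no repeats)."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, seed=None):
    """Every `step`-th member, starting from a random position
    within the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

people = list(range(1, 101))  # hypothetical sampling list of 100 people
random_pick = simple_random_sample(people, 10, seed=0)
every_tenth = systematic_sample(people, 10, seed=0)
print(random_pick)
print(every_tenth)
```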
Stratified random sampling
population split into strata and a random sample drawn from each stratum, usually in proportion to its size, so that the sample best represents the population
Disproportionate sampling
used when the strata in a population are of substantially unequal sizes but you want to select an equal number from each stratum
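The difference between stratified and disproportionate sampling comes down to how many units are drawn from each stratum. A minimal sketch, assuming a hypothetical population of 80 urban and 20 rural members (all names here are illustrative):

```python
import random

def proportional_allocation(strata, n_total):
    """Sample size per stratum in proportion to stratum size.
    Rounding means the sizes may not sum exactly to n_total."""
    population = sum(len(members) for members in strata.values())
    return {name: round(n_total * len(members) / population)
            for name, members in strata.items()}

def equal_allocation(strata, n_each):
    """Same sample size from every stratum (disproportionate sampling)."""
    return {name: n_each for name in strata}

def draw(strata, allocation, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(strata[name], allocation[name]) for name in strata}

strata = {"urban": list(range(80)), "rural": list(range(80, 100))}
proportional = draw(strata, proportional_allocation(strata, 10), seed=1)
equal = draw(strata, equal_allocation(strata, 5), seed=1)
print({name: len(s) for name, s in proportional.items()})
print({name: len(s) for name, s in equal.items()})
```

With equal allocation the small stratum is over-represented, so estimates for the whole population would normally need to be weighted back by stratum size.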
Cluster sampling
random sampling of whole clusters (naturally occurring groups of units) in a population, convenient but may have sampling bias
Non-probability sampling
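A minimal sketch of the idea, assuming the clusters are hypothetical schools and that every member of a chosen cluster is included (one-stage cluster sampling):

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: pupils grouped by school.
schools = {
    "school_a": ["a1", "a2", "a3"],
    "school_b": ["b1", "b2"],
    "school_c": ["c1", "c2", "c3", "c4"],
    "school_d": ["d1", "d2"],
}
picked = cluster_sample(schools, 2, seed=3)
print(picked)
```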
convenience, quota, purposive and snowball sampling
Convenience sampling
non-probability, chosen based on availability
Quota sampling
non-probability, researcher guides sampling process until quotas are met
Purposive sampling
non-probability, hand-picked based on certain criteria
Snowball sampling
non-probability, relies on original participants referring further participants
p value
if the null hypothesis is true, the probability of getting data at least as extreme as yours by chance. Strength of evidence against the null: 0-0.01 = strong, 0.01-0.05 = moderate, 0.05-0.1 = weak, >0.1 = inconclusive
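One way to make the definition concrete is an exact permutation test: if the null hypothesis is true, the group labels are arbitrary, so the p value is the share of relabelings that give a difference at least as large as the one observed. This is a sketch of one possible test, not the only way to get a p value; the data and function names are made up for illustration.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

def evidence_label(p):
    """Rule-of-thumb labels taken from the card's thresholds."""
    if p <= 0.01:
        return "strong"
    if p <= 0.05:
        return "moderate"
    if p <= 0.1:
        return "weak"
    return "inconclusive"

p = permutation_p_value([12, 15, 14], [8, 9, 10])  # made-up measurements
print(p, evidence_label(p))
```

With only three values per group there are just 20 possible relabelings, so the smallest p value this test can produce is 2/20; larger samples are needed for stronger evidence.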
Parallel RCT design
two groups, two different treatments (typically treatment and control)
Crossover RCT design
two groups, both treatments in a different order, “wash-out” period
Factorial RCT design
several factors compared at the same time
Quasi-experimental studies
- one group post-test
- one group pre-test/post-test
- non-equivalent control group
- non-equivalent control group pre-test/post-test
- single subject (structured experiment, not case study, poor external validity)
Longitudinal studies
follow across time and look for outcomes, two subgroups, follow until event occurs and then compare characteristics
Attrition
introduction of bias due to loss of subjects from a study
Case control study
retrospective, individuals with the outcome of interest (cases) matched with similar individuals without it (controls) and their histories analysed
Cross-sectional study
snapshot of a population at a given time, e.g. census
Case report
clinical history of a single patient
Surveys
- open/closed ended questions
- dichotomous response
- Likert scale
- visual analogue scale
Numeric values
location, spread, shape, deviations
Histograms
good for visualizing large numbers, difficult to compare >2 groups
Density plots
alternative to histograms, show a smooth line estimating the density at each value
Box plots
summarise location (median), spread (wide or narrow) and shape (symmetrical or skewed), good for comparing multiple distributions
Bar charts
used for categorical data
Spine plots
categorical data, proportional to group size
Scatter plots
direction (pos/neg), linearity (does it curve), strength (how closely it follows the pattern)
Sample variance
determining variability of data
ANOVA –> F statistic –> p value
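The first step of that chain, going from group variability to an F statistic, can be sketched for a one-way ANOVA as below. The function name and the three small groups are made up for illustration.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic for a one-way ANOVA:
    between-group mean square divided by within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # made-up data
print(f_stat)
```

Turning the F statistic into a p value needs the F distribution with (k - 1, n - k) degrees of freedom, which the Python standard library does not provide; a statistics package such as SciPy (for example scipy.stats.f.sf) would be one option.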
Confidence interval
used for reporting the margin of error (in a normal distribution, a 95% confidence interval for the population mean is roughly the sample mean ± 2 standard errors)
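A minimal sketch of a 95% interval for a mean, assuming a normal approximation (critical value about 1.96) and made-up data. For small samples a t critical value would usually be preferred; the normal one is used here only to keep the example in the standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    centre = mean(data)
    se = stdev(data) / sqrt(n)                 # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return centre - z * se, centre + z * se

low, high = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])  # made-up data
print(round(low, 2), round(high, 2))
```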
Pearson’s correlation use
showing the strength of correlation in normal, linear relationships. Always create a scatterplot to confirm. 0 indicates no linear correlation. -1 indicates a perfect negative correlation. 1 indicates a perfect positive correlation.
Spearman’s rank correlation coefficient
used if the data are not normally distributed or the relationship is monotonic but non-linear; calculated from the ranks of the values
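The contrast between the two coefficients can be sketched on a curved but strictly increasing relationship. Here Spearman's coefficient is computed as Pearson's coefficient on the ranks (tied values get the average rank); the function names and data are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # increasing, but not a straight line
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Spearman's coefficient is exactly 1 here because the ordering is perfectly preserved, while Pearson's falls short of 1 because the points do not lie on a straight line.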
Standard error use
measuring the precision of a statistic –> estimates the standard deviation of the sampling distribution of the sample mean (sample standard deviation divided by the square root of the sample size). The larger the sample, the smaller the standard error. Used to calculate the t-statistic
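A short sketch of how the standard error of the mean feeds into a one-sample t-statistic. The data and the hypothesised mean mu0 are made up, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(data):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(data) / sqrt(len(data))

def t_statistic(data, mu0):
    """One-sample t-statistic: distance of the sample mean from mu0,
    measured in standard errors."""
    return (mean(data) - mu0) / standard_error(data)

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data
se_small = standard_error(sample)
se_large = standard_error(sample * 2)  # same values, twice as many observations
t = t_statistic(sample, mu0=4)
print(round(se_small, 3), round(se_large, 3), round(t, 3))
```

Doubling the number of observations while keeping the same spread shrinks the standard error, which is the "larger sample, smaller standard error" point on the card.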
Sample variance calculation
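$s^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2}{n-1}$, where $\bar{x}$ is the sample mean and $n$ is the sample size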
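The calculation can be checked by hand in a few lines and compared with the standard library's statistics.variance, which also divides by n - 1. The data are made up for illustration.

```python
from statistics import mean, variance

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    centre = mean(data)
    return sum((x - centre) ** 2 for x in data) / (len(data) - 1)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5
by_hand = sample_variance(values)
print(by_hand, variance(values))
print(by_hand ** 0.5)  # the sample standard deviation is its square root
```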