Stats Flashcards

Question 1

Q

Standard deviation

Answer

A

measure of variability in data

Question 2

Q

Parameter

Answer

A

numerical characteristic which is descriptive of the population

Question 3

Q

Statistic

Answer

A

numerical characteristic of your sample, used to estimate unknown population parameters

Question 4

Q

Standard error

Answer

A

measure of accuracy of a statistic

Question 5

Q

Simpson’s paradox

Answer

A

a phenomenon in probability and statistics, in which a trend appears in different groups of data but disappears or reverses when these groups are combined.
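
A small numeric sketch may make the reversal concrete. The counts below are hypothetical, invented only for illustration: treatment A has the higher success rate inside each group, yet treatment B looks better once the groups are pooled, because the two treatments were applied to the groups in very different proportions.

```python
# Hypothetical (successes, trials) counts, invented purely for illustration.
data = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (20, 100), "B": (1, 10)},
}


def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials


# Within each group, check whether treatment A beats treatment B.
within = {g: rate(*arms["A"]) > rate(*arms["B"]) for g, arms in data.items()}


def pooled(arm):
    """Success rate for one treatment after combining all groups."""
    s = sum(arms[arm][0] for arms in data.values())
    n = sum(arms[arm][1] for arms in data.values())
    return rate(s, n)


pooled_a, pooled_b = pooled("A"), pooled("B")
print(within)                                # A is ahead in every group
print(round(pooled_a, 3), round(pooled_b, 3))  # but behind when pooled
```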

Question 6

Q

Continuous variable

Answer

A

continuous numeric measurement

Question 7

Q

Discrete variable

Answer

A

numeric measurement from counting

Question 8

Q

Nominal variable

Answer

A

categorical, no natural order

Question 9

Q

Ordinal variable

Answer

A

categorical, natural order

Question 10

Q

RCT

Answer

A

randomised controlled trial: comparative experiment with a control group to account for the placebo effect; randomization of participants to groups reduces bias

Question 11

Q

Power analysis

Answer

A

used to determine necessary sample size based on the size of the effect, the residual variability and the design of the experiment

Question 12

Q

Random sampling

Answer

A

computer generated, each member of the population has an equal chance of being chosen

Question 13

Q

Systematic sampling

Answer

A

every nth person from sample list chosen
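
The every-nth selection can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the random starting offset, the function name `systematic_sample` and the list of 100 numbered people are all assumptions added here.

```python
import random


def systematic_sample(frame, k, seed=None):
    """Return every k-th item of `frame`, starting from a random offset in [0, k)."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]


frame = list(range(1, 101))  # hypothetical list of 100 people, numbered 1 to 100
sample = systematic_sample(frame, k=10, seed=0)
print(sample)
```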

Question 14

Q

Stratified random sampling

Answer

A

population split into strata and chosen to best represent the population

Question 15

Q

Disproportionate sampling

Answer

A

used when strata in a population are of substantially unequal sizes, but you want to select an equal number from each stratum

Question 16

Q

Cluster sampling

Answer

A

population divided into clusters (a series of units), then whole clusters are randomly sampled; convenient but may have sampling bias

Question 17

Q

Non-probability sampling

Answer

A

convenience, quota, purposive and snowball sampling

Question 18

Q

Convenience sampling

Answer

A

non-probability, chosen based on availability

Question 19

Q

Quota sampling

Answer

A

non-probability, researcher guides sampling process until quotas are met

Question 20

Q

Purposive sampling

Answer

A

non-probability, hand-picked based on certain criteria

Question 21

Q

Snowball sampling

Answer

A

non-probability, relies on original participants referring

Question 22

Q

p value

Answer

A

if the null hypothesis is true, the probability of getting data at least as extreme as yours by chance. Strength of evidence against the null: 0-0.01 = strong, 0.01-0.05 = moderate, 0.05-0.1 = weak, >0.1 = inconclusive

Question 23

Q

Parallel RCT design

Answer

A

two groups, two different treatments (typically treatment and control)

Question 24

Q

Crossover RCT design

Answer

A

two groups, both treatments in a different order, “wash-out” period

Question 25

Q

Factorial RCT design

Answer

A

several factors compared at the same time

Question 26

Q

Quasi-experimental studies

Answer

A

one group post-test
one group pre-test/post-test
non-equivalent control group
non-equivalent control group pre-test/post-test
single subject (structured experiment, not case study, poor external validity)

Question 27

Q

Longitudinal studies

Answer

A

follow across time and look for outcomes, two subgroups, follow until event occurs and then compare characteristics

Question 28

Q

Attrition

Answer

A

introduction of bias due to loss of subjects to a study

Question 29

Q

Case control study

Answer

A

retrospective, similar individuals with one key interest matched and histories analysed

Question 30

Q

Cross-sectional study

Answer

A

snapshot of a population at a given time eg. census

Question 31

Q

Case report

Answer

A

clinical history of a single patient

Question 32

Q

Surveys

Answer

A

open/closed ended questions
dichotomous response
Likert scale
visual analogue scale

Question 33

Q

Numeric values

Answer

A

location, spread, shape, deviations

Question 34

Q

Histograms

Answer

A

good for visualizing large numbers, difficult to compare >2 groups

Question 35

Q

Density plots

Answer

A

alternative to histograms, show a line estimating the density at each value

Question 36

Q

Box plots

Answer

A

summarise location (median), spread (wide or narrow) and shape (symmetrical or skewed), good for comparing multiple distributions

Question 37

Q

Bar charts

Answer

A

used for categorical data

Question 38

Q

Spine plots

Answer

A

categorical data, proportional to group size

Question 39

Q

Scatter plots

Answer

A

direction (pos/neg), linearity (does it curve), strength (how closely it follows the pattern)

Question 40

Q

Sample variance

Answer

A

determining variability of data

ANOVA –> F statistic –> p value

Question 41

Q

Confidence interval

Answer

A

used for noting margin of error (a 95% confidence interval for the population mean is approximately the sample mean ± 2 standard errors, assuming a normal sampling distribution)

Question 42

Q

Pearson’s correlation use

Answer

A

showing strength of correlation in normal, linear relationships. always create a scatterplot to confirm. 0 indicates no linear correlation. -1 indicates a perfect negative correlation. 1 indicates a perfect positive correlation.

Question 43

Q

Spearman’s rank correlation coefficient

Answer

A

used if the data are not normal or the relationship is non-linear (but still monotonic)

Question 44

Q

Standard error use

Answer

A

measuring the accuracy of a statistic –> estimates the standard deviation of the sample mean around the population mean (ie. how much the sample mean would vary across repeated samples). The larger the sample, the smaller the standard error. Used to calculate the t-statistic
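
As a rough sketch of why larger samples give smaller standard errors, the snippet below uses the usual formula for the standard error of the mean, s/√n. That formula is an addition here rather than something stated on the card, and the sample values are hypothetical.

```python
import math
import statistics

sample = [4.1, 5.0, 5.6, 4.7, 5.3, 4.9, 5.2, 4.4]  # hypothetical measurements

s = statistics.stdev(sample)        # sample standard deviation (n - 1 divisor)
n = len(sample)
sem = s / math.sqrt(n)              # standard error of the mean

# With the same spread, four times as many observations halves the standard error.
sem_4n = s / math.sqrt(4 * n)

print(round(sem, 4), round(sem_4n, 4))
```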

Question 45

Q

Sample variance calculation

Answer

A

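$s^2 = \dfrac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2}{n-1}$, where $\bar{x}$ is the sample mean and $n$ is the sample size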
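
The sample variance calculation can be sketched directly: subtract the sample mean from each value, square, add up, and divide by n - 1. The data are hypothetical, and the helper name `sample_variance` is chosen here for illustration; the result is compared against the standard library's `statistics.variance`.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
var = sample_variance(values)
print(var, statistics.variance(values))
```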

Question 46

Q

Standard error calculation

Question 47

Q

T statistic

Answer

A

number of standard errors the estimate is from the hypothesised value

Question 48

Q

One Sample T Test calculation

Question 49

Q

Independent Sample T-test calculation

Answer

A

(diff b/w mean of 2 groups)/(SEM of difference)
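
A minimal sketch of this ratio, assuming equal variances so that the standard error of the difference comes from a pooled variance (the pooled form is one choice among several, and the data and the name `pooled_t` are hypothetical):

```python
import math
import statistics


def pooled_t(a, b):
    """t = (mean(a) - mean(b)) / SE of the difference, using a pooled variance."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se_diff


group_a = [5, 6, 7, 8, 9]  # hypothetical, mean 7
group_b = [1, 2, 3, 4, 5]  # hypothetical, mean 3
t = pooled_t(group_a, group_b)
print(t)
```

The resulting t would then be compared against a t distribution with n1 + n2 - 2 degrees of freedom to obtain a p value.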

Question 50

Q

T-test assumptions

Answer

A

the 2 groups are independent
the populations have normal variability
the variances are equal
-> if equal use a pooled T-test
-> if not equal use Welch’s T-test

Question 51

Q

Analysis of Variance use

Answer

A

comparing means in the context of the variability. determines F statistic and therefore p value

Question 52

Q

ANOVA calculation

Answer

A

measuring variability (sum of squares and mean square (ie. sample variance)), then comparing variability within and between each group. Significant if ‘within group variability’ is much smaller than ‘total variability’ ie. large ‘between group variability’. Ie. knowing a person’s group gives us information about them

Question 53

Q

ANOVA assumptions

Answer

A

independent groups
normal variability (unless large sample size)
equal variances otherwise use a transformation or Welch’s ANOVA test

Question 54

Q

ANOVA null hypothesis

Answer

A

the mean is the same for all groups

Question 55

Q

F statistic use

Answer

A

determining signal-to-noise ratio, the larger the f stat, the more different the groups are

Question 56

Q

F statistic calculation

Answer

A

ratio of (between group variability):(within group variability) –> mean square values
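
The ratio of mean squares can be worked through by hand for a one-way layout. The sketch below uses three small hypothetical groups; the function name `f_statistic` is chosen here for illustration.

```python
def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)     # n - k degrees of freedom
    return ms_between / ms_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data
f = f_statistic(groups)
print(f)
```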

Question 57

Q

R^2 use

Answer

A

statistical measure of how well the regression line approximates the real data points (measure of usefulness of the model). if R^2 = 0.1, being part of the group explains 10% of the variability

Question 58

Q

R^2 calculation

Answer

A

variability explained by model/total variability
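
As an illustration of this ratio, the sketch below treats "the model" as predicting each value by its group mean, which is an assumption made here for simplicity. The data are hypothetical.

```python
groups = {"a": [1, 2, 3], "b": [2, 3, 4], "c": [5, 6, 7]}  # hypothetical data

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Total variability: squared deviations from the grand mean.
total_ss = sum((x - grand_mean) ** 2 for x in all_values)

# Residual variability: squared deviations from each value's own group mean.
residual_ss = sum(
    (x - sum(values) / len(values)) ** 2
    for values in groups.values()
    for x in values
)

explained_ss = total_ss - residual_ss
r_squared = explained_ss / total_ss
print(r_squared)
```

Here group membership accounts for most of the variability, so the ratio is close to 1.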

Question 59

Q

Residuals use

Answer

A

checking assumptions (“prediction error”), estimation of within-group variability

Question 60

Q

Residuals calculation

Answer

A

observed response - mean (“predicted value”)

Question 61

Q

Checking normal variability

Answer

A

Normal variability is the assumption that residuals have a normal distribution. Check with normal Q-Q plot. If not normal, linear model predictions will be undermined. Consider transforming data. Less effect if large sample size.

Question 62

Q

Post-hoc test use

Answer

A

determining specific difference between groups once a difference has been found using ANOVA

Question 63

Q

Tukey’s HSD null hypothesis

Answer

A

the mean is the same for both groups

Question 64

Q

Linear regression use

Answer

A

determining difference from linear association (as opposed to mean)

Answer 63

A

line is such that it minimises the sum of the squared residuals
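
The least-squares criterion can be checked numerically. The sketch below fits a line with the standard closed-form slope and intercept (stated here as an addition, not taken from the cards) on hypothetical data, then confirms that nudging the slope makes the sum of squared residuals larger.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed - predicted)^2 for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
worse = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best, worse)
```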

Answer 64

A

independent observation
linear association
normal variability
equal variances

Answer 65

A

determining the probability of detecting an effect when there is indeed an effect

Answer 66

A

1 - (probability of making a type II error)

Answer 67

A

reject the null hypothesis when it is true

Answer 68

A

don’t reject the null hypothesis when it is false

Answer 69

A

increased effect size
decreased variability
increased sample size
increased significance threshold
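
Each of these four factors can be seen moving power in a simple setting. The sketch assumes a one-sided, one-sample z-test with known standard deviation, chosen only because its power has a closed form; the function name `z_test_power` and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a one-sided one-sample z-test: 1 - P(type II error)."""
    z = NormalDist()                      # standard normal
    critical = z.inv_cdf(1 - alpha)       # rejection cut-off under the null
    shift = effect * math.sqrt(n) / sd    # true effect measured in standard errors
    return z.cdf(shift - critical)


base = z_test_power(effect=0.5, sd=1.0, n=20)
print(round(base, 3))
print(z_test_power(0.8, 1.0, 20) > base)        # increased effect size
print(z_test_power(0.5, 0.7, 20) > base)        # decreased variability
print(z_test_power(0.5, 1.0, 40) > base)        # increased sample size
print(z_test_power(0.5, 1.0, 20, 0.10) > base)  # increased significance threshold
```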

Answer 70

A

used to determine necessary sample size based on the estimated size of the effect, the residual variability and the design of the experiment. ideally ≥80%

Answer 71

A

the population variances are equal (always use a boxplot to check)

Answer 72

A

comparing ranks of values (does not require normal distribution, can be more robust, used for ordinal, can be used for numeric if non-normal)

Answer 73

A

both distributions are identical in shape and scale

Answer 74

A

it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample

Answer 75

A

A technique for determining the statistical relationship between two or more variables where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables.

Answer 76

A

significance of variable after other variable has been taken into account (also gradient)

Answer 77

A

no. of variables

Answer 78

A

the slope is equal to 0

Answer 79

A

A normal quantile-quantile plot compares quantiles from our data with the quantiles of the normal distribution.
1SD = 68%, 2SD = 95%, 3SD = 99.7%
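
The 68/95/99.7 figures can be checked against the standard normal distribution. This is a quick sketch using Python's `statistics.NormalDist`; the helper name `within_k_sd` is chosen here for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def within_k_sd(k):
    """Probability that a normal value falls within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


coverage = {k: within_k_sd(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(k, round(p, 4))
```

The exact two-SD coverage is slightly above 95%, which is why 1.96 rather than 2 is often quoted for 95% intervals.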

Answer 80

A

If the population has a normal distribution, the sample mean is normal for any sample size.
If the population is not normal, the sample mean is approx normal, and becomes more normal as the sample size increases.

Answer 81

A

Used for measuring internal consistency based on correlation b/w items on a scale (ie. how closely related a set of items are as a group).
0 = completely independent
good internal consistency ≥0.8

Answer 82

A

measuring strength of agreement, by proportion of agreement not by chance.
1 = perfect agreement
0 = no agreement
0.4 = acceptable

Answer 83

A

(raw agreement - chance agreement)/(1 - chance agreement), with both agreements expressed as proportions
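
This chance-corrected agreement is commonly known as Cohen's kappa. A minimal sketch for two raters follows, with chance agreement estimated from each rater's own category frequencies; the ratings are hypothetical and the function name `cohens_kappa` is chosen here for illustration.

```python
def cohens_kappa(rater1, rater2):
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    categories = set(rater1) | set(rater2)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own observed frequencies.
    chance = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    return (observed - chance) / (1 - chance)


# Hypothetical yes/no ratings of the same 10 items by two observers.
rater1 = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "n"]
rater2 = ["y", "y", "y", "y", "n", "y", "n", "n", "n", "n"]

kappa = cohens_kappa(rater1, rater2)
print(round(kappa, 3))
```

The raters agree on 8 of 10 items, but half of that agreement would be expected by chance, so the corrected value is well below 0.8.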

Answer 84

A

used for measuring seriousness of disagreement (eg. scaled option vs yes/no option)

Answer 85

A

investigating whether distributions of categorical variables differ from one another (how far observed values are from expected values)

Answer 86

A

there is no difference between groups

Answer 87

A

no control over independent variable

Answer 88

A

involves a treatment or intervention, aims to provide evidence of effect of independent variable on dependent variable

Answer 89

A

clear definition of target population in order to achieve a representative sample

Answer 90

A

more powerful than non-parametric if assumptions are satisfied, provide direct estimates for effects, including confidence intervals

Answer 91

A

0.7-1.0 = strong
0.3-0.69 = moderate
0-0.29 = none to weak

Answer 92

A

the process whereby statistics, such as the sample mean, would give different results if the random sampling process was repeated

Answer 93

A

the participants and experimenter do not know which treatment they are receiving/giving

Answer 94

A

the variable that is changed

Answer 95

A

the outcome of the change of independent variable

Answer 96

A

eliminate the placebo

Answer 97

A

helps remove bias in a comparative experiment

Answer 98

A

more economical
time efficient
can be more accurate due to greater control over measurements

Answer 99

A

when members of a sample over/under represent attributes of a population

Answer 100

A

no correlation between the independent and dependent values

Answer 101

A

number of independent pieces of information to estimate a parameter

Answer 102

A

on average gives the population mean, variability in this estimate gets smaller as the sample size increases, spread gets smaller as n increases

Answer 103

A

distribution of the t statistic, used to estimate population parameters when sample size is small and/or when population variance is unknown

Answer 104

A

one independent variable

Answer 105

A

studies interaction between factors influencing a variable

two independent variables

Answer 106

A

used when populations do not have equal variances

null hypothesis that means are equal

Answer 107

A

= residuals

Answer 108

A

if normal variability, should show constant variability and no obvious pattern

Answer 109

A

square differences and divide by expected value
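
This reads like the Pearson chi-square statistic, and the sketch below assumes so: for every cell, square the difference between observed and expected counts, divide by the expected count, and add the results. Expected counts are taken from the row and column totals. The 2x2 table is hypothetical and the function name `chi_square` is chosen here for illustration.

```python
def chi_square(table):
    """Sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


table = [[10, 20], [20, 10]]  # hypothetical counts; every expected count is 15
chi2 = chi_square(table)
print(round(chi2, 3))
```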

Answer 110

A

the degree to which ratings given by different observers agree

Answer 111

A

the degree to which ratings given by the same observer on different occasions agree
