Statistics Flashcards

(62 cards)

1
Q

Inferential statistics

A

Allows generalisations to be made about a population from a representative sample

2
Q

What are common problems in biological data?

A
  • small sample size
  • unequal sample size
  • correlation within data (measurements from a subject over time, or from brain regions at the same time, will always be correlated)
  • unequal variance (heterogeneity)
  • non-normal (skewed) distribution

3
Q

Discrete variables

A

variables whose values are finite, or countably infinite, within a range
- eg: ‘pain relief’ vs ‘no pain relief’, or subjective rating scales

these numbers do not carry the same mathematical weight as continuous values (thus means have ‘less’ meaning)

4
Q

continuous variables

A

variables whose values exist on an infinite continuum/are uncountable
- e.g. frequency, temperature, amplitude, enzyme concentration, receptor density

5
Q

Binary variables

A

Variables with only two possible outcomes (e.g. yes or no)

6
Q

Nominal variables

A

Represent groups with no ‘rank’ or ‘order’ within them
- eg: species, colour, brands

7
Q

Ordinal variables

A

Groups that are ranked in a specific order
- eg: Likert scales

8
Q

Parametric statistics

A

Statistics that assume the data follow a normal distribution, and that there is equal variance within each group (homogeneity of variance)

9
Q

Nonparametric statistics

A

used when the data does not follow a normal/known distribution
- tend to be less statistically powerful

10
Q

Parametric test used for 2 unpaired groups

A

Unpaired t-test

11
Q

Non-parametric test used for 2 unpaired groups

A

Mann-Whitney U test

12
Q

Parametric test used for 2 paired groups

A

Paired t-test

13
Q

Non-parametric test used for 2 paired groups

A

Wilcoxon signed-rank test

14
Q

Parametric test used for ≥3 unmatched groups

A

One-way ANOVA

15
Q

Non-parametric test used for ≥3 unmatched groups

A

Kruskal-Wallis test

16
Q

Parametric test used for ≥3 matched groups

A

Repeated measures ANOVA

17
Q

Non-parametric test used for ≥3 matched groups

A

Friedman test

18
Q

Parametric test used to determine association between two variables

A

Pearson correlation

19
Q

Non-parametric test used to determine association between two variables

A

Spearman correlation

20
Q

Parametric test used to predict a value for one variable from other(s)

A

Simple linear/non-linear regression
Multiple linear/non-linear regression

21
Q

Non-parametric test used to predict a value for one variable from other(s)

A

Non-parametric regression

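Note: the test pairs on the preceding cards map directly onto functions in Python's scipy.stats. A minimal sketch, assuming scipy and numpy are available; a, b and c are invented placeholder samples:

    from scipy import stats
    import numpy as np

    rng = np.random.default_rng(0)
    a, b, c = rng.normal(0, 1, 20), rng.normal(0.5, 1, 20), rng.normal(1, 1, 20)

    stats.ttest_ind(a, b)             # parametric, 2 unpaired groups
    stats.mannwhitneyu(a, b)          # non-parametric, 2 unpaired groups
    stats.ttest_rel(a, b)             # parametric, 2 paired groups
    stats.wilcoxon(a, b)              # non-parametric, 2 paired groups
    stats.f_oneway(a, b, c)           # parametric, ≥3 unmatched groups
    stats.kruskal(a, b, c)            # non-parametric, ≥3 unmatched groups
    stats.friedmanchisquare(a, b, c)  # non-parametric, ≥3 matched groups
    stats.pearsonr(a, b)              # parametric association
    stats.spearmanr(a, b)             # non-parametric association

Each call returns a test statistic and a p-value.
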
22
Q

Central limit theorem

A

As the sample size increases, the distribution of the sample mean approaches a normal distribution, regardless of the shape of the underlying distribution

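Note: a minimal simulation sketch of the theorem, assuming numpy and scipy; the exponential source distribution is an arbitrary skewed choice:

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)
    for n in (2, 10, 100):
        # 10,000 sample means, each from a sample of size n drawn from a skewed distribution
        means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
        print(n, round(skew(means), 3))  # skewness shrinks toward 0 (normality) as n grows
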
23
Q

The null hypothesis

A

Assumes that there is no difference between groups

24
Q

Power

A

(1 - β)
The probability of correctly rejecting the null hypothesis when it is false
- increasing sample size results in decreased variability of the sample mean, and thus greater power

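Note: statsmodels can solve the power relationship for the required sample size; a hedged sketch (the effect size of 0.5 is an arbitrary assumption):

    from statsmodels.stats.power import TTestIndPower

    # n per group needed to detect a medium effect (d = 0.5) at alpha = 0.05 with 80% power
    n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(round(n))  # larger n -> smaller variability of the mean -> greater power
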
25
Q

α (alpha)

A

Type I error rate; the rate of incorrect rejection of the null hypothesis
- equal to the significance level (typically 0.05, meaning a 5% probability of falsely rejecting the null hypothesis)

26
Q

β (beta)

A

Type II error rate; the rate of incorrect acceptance (failure to reject) of the null hypothesis

27
Q

What metrics cannot be altered to increase power?

A

- variability, as it is fixed depending on the type of data
- type I error rate

28
Q

T-test

A

Ratio of the difference between two groups to a measure of variability (the standard error)

29
Q

Examples of the two types of t-test

A

- unpaired: comparing cannabis treatment to placebo treatment in different groups
- paired: comparing cannabis treatment to saline treatment in the same group

30
Q

ANOVA

A

Analysis of variance
Used to determine whether ≥3 means are significantly different
Takes into account variance both between groups (treatment variance) and within groups (error variance)

31
Q

1 factor ANOVA

A

Used when examining different treatments on different groups

32
Q

Repeated measures ANOVA

A

Used when investigating different treatments on the same group

33
Q

Multilevel ANOVA

A

Used when investigating ≥2 independent variables and the interactions between them
- provides a separate F value for each independent variable and interaction

34
Q

Multiple comparisons

A

Using multiple t-tests is advised against, as it increases the type I error rate
- Bonferroni corrections counteract this
Hence ANOVAs are preferred for ≥3 groups

35
Q

Post-hoc tests

A

aka multiple comparison tests
Used after completing an ANOVA to determine which groups are significantly different
- Dunnett test
- Tukey-Kramer test

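Note: a sketch of a one-way ANOVA followed by a Tukey-style post-hoc comparison, assuming scipy and statsmodels; the three dose groups are placeholder data:

    import numpy as np
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(2)
    low, mid, high = (rng.normal(m, 1, 12) for m in (5.0, 5.5, 7.0))

    F, p = stats.f_oneway(low, mid, high)      # omnibus: are any of the ≥3 means different?

    values = np.concatenate([low, mid, high])  # post-hoc: which specific pairs differ?
    groups = ["low"] * 12 + ["mid"] * 12 + ["high"] * 12
    print(pairwise_tukeyhsd(values, groups, alpha=0.05))
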
36
Q

Pseudo-replication

A

Occurs when the number of measured values or data points exceeds the number of genuine replicates
- eg: confusing # slices with # animals
Leads to an inflation of sample size, and thus an artificial inflation of power

37
Q

Linear mixed model analysis (when used + assumptions)

A

A statistical method used when data are not independent and errors are correlated
Assumptions:
- does not assume independence of data
- does not assume a balanced design
- does not assume homogeneous variance
- assumes random sampling
- the covariance structure must be specified

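Note: a minimal sketch with statsmodels' mixedlm, assuming a long-format DataFrame; the columns 'score', 'treatment' and 'subject' are hypothetical:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    df = pd.DataFrame({
        "subject": np.repeat(np.arange(10), 4),    # 4 measurements per animal
        "treatment": np.tile(["ctrl", "drug"], 20),
        "score": rng.normal(5, 1, 40),
    })

    # a random intercept per subject models the within-subject correlation
    model = smf.mixedlm("score ~ treatment", data=df, groups=df["subject"])
    print(model.fit().summary())
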
38
Q

Covariance

A

Measures how two variables vary together
- the sign indicates the direction of the relationship
- the magnitude depends on the scale of the variables, so it does not quantify the strength of the association

39
Q

Correlation (r)

A

Measures the degree of association between two variables
- not sensitive to scale
- quantifies the strength of the correlation
Defined as a number r, where -1 ≤ r ≤ 1

40
Q

R^2

A

The coefficient of determination:
- a metric of correlation that allows comparison of two correlations
- = (variance around mean - variance around line) / variance around mean
- = 1 - RSS/TSS
R^2 should be ≥ 0.80
eg: R^2 = 0.80 means the relationship between the two variables accounts for 80% of the variation

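Note: a sketch of 1 - RSS/TSS computed directly with numpy on invented data:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line of best fit
    predicted = slope * x + intercept

    rss = np.sum((y - predicted) ** 2)  # variance around the line
    tss = np.sum((y - y.mean()) ** 2)   # variance around the mean
    r_squared = 1 - rss / tss
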
41
Q

Regression analyses

A

Statistical methods that allow examination of the relationship between ≥2 variables of interest through the generation of a line of best fit
- linear or non-linear
- t-tests/ANOVA can be used to determine the significance of the regression

42
Q

Sum of squares

A

Total sum of squares (TSS) = variation of the data about the mean
Residual sum of squares (RSS) = variation not explained by the regression line
Sum of squares of the regression (SSR) = variation explained by the regression

43
Q

Simple regression

A

A statistical method that allows examination of the relationship between two variables of interest
- calculate the residual sum of squares
- a smaller RSS indicates a better fit
- used for all standard curves

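Note: scipy's linregress covers this in one call, returning the slope, intercept, r and the slope's standard error (used on the next cards); the standard-curve values are placeholders:

    from scipy import stats
    import numpy as np

    concentration = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
    absorbance = np.array([0.02, 0.24, 0.50, 0.98, 2.01])

    fit = stats.linregress(concentration, absorbance)
    print(fit.slope, fit.intercept, fit.rvalue**2, fit.pvalue, fit.stderr)
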
44
Q

T-test for regression

A

t = regression coefficient (slope) / standard error of the slope coefficient = b/SE(b)
- can also be expressed as a confidence interval: b ± t(α)·SE(b)
- typically set at 95%, indicating 95% confidence that the true slope lies within the interval

45
Q

ANOVA for regression

A

Determines whether the amount of variation accounted for by the regression line (SSR) is greater than the variation NOT explained (RSS)
- signal > noise

46
Q

Assumptions for t-test/ANOVA for regression

A

- residuals are normally distributed
- constant variance (SD) of residuals
- independent samples
If these are not fulfilled, the type I error rate increases

47
Q

Non-linear regression

A

A statistical method that uses calculus and matrix algebra to determine the line of best fit for a non-linear relationship
- requires initial estimates of the parameters
- can be used to interpolate values
- useful for obtaining Bmax, Ka, EC50 etc.

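Note: a sketch with scipy's curve_fit on a hypothetical concentration-response data set; p0 supplies the initial parameter estimates the card flags as a requirement:

    import numpy as np
    from scipy.optimize import curve_fit

    def hill(conc, emax, ec50):
        # simple Hill equation with slope fixed at 1 - an assumed model
        return emax * conc / (ec50 + conc)

    conc = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
    resp = np.array([1.0, 9.0, 48.0, 90.0, 99.0])   # placeholder responses

    # p0 = initial estimates for (emax, ec50); the fit can fail without sensible values
    params, cov = curve_fit(hill, conc, resp, p0=(100.0, 1.0))
    emax, ec50 = params
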
48
Q

Linearising transforms

A

Data can be transformed so that they fit the assumptions for linear regression, eg:
- Scatchard plots for binding data
- Lineweaver-Burk plots for enzyme kinetics
- logarithmic plots for kinetic data
THE TRANSFORM DISTORTS THE ERROR
- violates the regression assumptions of normally distributed error and ~equal SE for each x value

49
Q

Issues with Scatchard plots

A

X (bound drug) is often used to calculate Y (bound/free), i.e. the independent variable is part of the dependent variable
- results in inaccurate Y values
- violates assumptions of linear regression (normal distribution and homoscedasticity, i.e. equal variance of errors)

50
Q

Multivariate statistics

A

Statistical analyses used when there are multiple dependent and/or independent variables
- commonly used in clinical neuropharmacology
- becoming more common in genomics and proteomics

51
Q

Multiple linear regression

A

An equation composed of multiple regression coefficients for different independent variables (x1, x2, ...) but a single dependent variable (y):
y = b1x1 + b2x2 + ... + c
Requires an adjusted R^2 to take into account the number of variables as a function of sample size

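Note: a sketch with statsmodels OLS, which reports both R^2 and adjusted R^2; the column names and coefficients are invented:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
    df["y"] = 2.0 * df["x1"] - 1.0 * df["x2"] + rng.normal(size=50)

    X = sm.add_constant(df[["x1", "x2"]])  # adds the intercept term c
    result = sm.OLS(df["y"], X).fit()
    print(result.params)        # c, b1 and b2
    print(result.rsquared_adj)  # adjusted for number of variables vs sample size
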
52
Q

Multi-collinearity

A

Occurs when regression variables are highly correlated, inflating the variance estimates obtained through the sums of squares
- inaccurate coefficients
- can lead to a significant F value but no significant differences between any specific groups
Highly correlated variables should be removed, as they are REDUNDANT

53
Q

Principal component analysis

A

- identifies the most important features (principal components) that contribute to variation
- ranks these components in order of importance according to their 'eigenvalue'
- the second PC is always perpendicular to the first

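Note: a sketch with scikit-learn on placeholder data; explained_variance_ holds the eigenvalues used to rank the components:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(5)
    X = rng.normal(size=(100, 6))         # 100 samples, 6 features

    pca = PCA(n_components=3)
    scores = pca.fit_transform(X)         # data projected onto the top 3 PCs
    print(pca.explained_variance_)        # eigenvalues, in decreasing order
    print(pca.explained_variance_ratio_)  # fraction of total variation per PC
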
54
Q

Discriminant analysis

A

A statistical method that helps to identify the most important variables distinguishing the different groups in the data
- principal component analysis
- factor analysis

55
Q

Factor analysis

A

Used to simplify complex data by identifying common factors that explain the relationships between dependent variables

56
Q

Random forest classification

A

A machine learning method that builds multiple 'decision trees' and averages their outputs to give a final result
- can be used to determine how well an independent variable predicts the dependent variable
- error plateaus after ~100 trees

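Note: a sketch with scikit-learn on invented data; n_estimators=100 matches the ~100-tree plateau mentioned above, and feature_importances_ shows how well each independent variable predicts the outcome:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(6)
    X = rng.normal(size=(200, 5))                  # 5 candidate predictors
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # outcome driven by the first two

    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(forest.feature_importances_)             # predictive value of each variable
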
57
Q

Eigenvalues

A

'Components' or 'factors' (mathematically known as 'roots') that explain most of the variation in the data
- in an analogous way to ANOVA, these eigenvalues represent the major sources of variation in the covariance matrix

58
Q

Cluster analysis

A

An exploratory technique, often used on very large data sets, to show variables that typically vary together, i.e. have a relationship
- results are often shown using a 'dendrogram'
- often requires a 'z' transform
- different algorithms can be used to determine clusters

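Note: a sketch with scipy's hierarchical clustering, including the 'z' transform mentioned above; the data are placeholders:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import zscore
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(7)
    X = rng.normal(size=(20, 4))    # 20 observations, 4 variables

    Xz = zscore(X, axis=0)          # 'z' transform puts variables on a common scale
    Z = linkage(Xz, method="ward")  # one of several possible clustering algorithms
    dendrogram(Z)                   # the usual way of displaying the result
    plt.show()
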
59
Q

Canonical correlation analysis

A

Used to identify and measure the associations between two sets of variables
- appropriate in the same situations as multiple regression, but where there are multiple intercorrelated outcome variables

60
Q

Non-parametric multivariate analysis

A

Makes few assumptions about the data
- eg: random forest classification/regression, PCA, and cluster analysis

61
Q

Scree plot

A

A way of interpreting data from a PCA
- plots each principal component in order based on the amount of variation it explains (eigenvalue)
