Statistics Flashcards

(36 cards)

1
Q

What is a p-value?

A

The probability, assuming the null hypothesis is true, of observing data at least as extreme as the data actually observed.
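
A minimal sketch of the definition, assuming a hypothetical coin-flip experiment where the null hypothesis is a fair coin. The function name and the one-sided choice are illustrative assumptions.

```python
from math import comb

def binomial_p_value(heads: int, flips: int) -> float:
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, 0.5)."""
    # Sum the null-hypothesis probabilities of every outcome at least as extreme.
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

# 9 or more heads in 10 flips of a fair coin: (10 + 1) / 1024
p = binomial_p_value(heads=9, flips=10)
```

A small p means the observed result would be rare if the null hypothesis were true; it is not the probability that the null hypothesis is true.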

2
Q

What does a 95% confidence interval mean?

A

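If the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true population parameter. It does not mean there is a 95% probability that one specific computed interval contains it.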
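
The repeated-sampling meaning can be checked by simulation. This is a sketch under simplifying assumptions: normally distributed data, a known population SD (so a z-interval applies), and made-up parameter values.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

def coverage(true_mean=10.0, sigma=2.0, n=25, repeats=2000, seed=0):
    """Fraction of repeated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / repeats
```

Calling `coverage()` should return a value close to 0.95, which is the long-run property the "95%" refers to.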

3
Q

When should you use a t-test?

A

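When comparing the means of two groups (or one sample mean against a reference value) and the data are approximately normally distributed with unknown population variance.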
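
A sketch of the two-sample test statistic. It uses Welch's variant, which does not assume equal variances; that choice is an assumption made here, not something the card specifies. Converting t to a p-value needs the t distribution, which the Python standard library does not provide.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```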

4
Q

What is ANOVA and when is it used?

A

Analysis of variance; used to compare means across ≥3 groups by comparing between-group variance with within-group variance.
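
A sketch of the one-way ANOVA F statistic, written out from its definition. The function name and the toy groups are illustrative.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A large F means the group means differ more than the scatter inside the groups would suggest.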

5
Q

What is multicollinearity?

A

When two or more predictors in a regression model are highly correlated, which inflates the variance of the coefficient estimates and makes them unstable.

6
Q

Define statistical power.

A

The probability of correctly rejecting the null hypothesis when it is false; equal to 1 − β, where β is the type II error rate.
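
A sketch of a power calculation for the simplest case, a two-sided one-sample z-test with known SD. The test choice and parameter names are assumptions for illustration.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a true mean shift of `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # true shift in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
```

Power rises with sample size and effect size; with no true effect it equals alpha.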

7
Q

What is the Bonferroni correction?

A

Dividing the significance threshold (α) by the number of comparisons, so that the family-wise error rate (the chance of at least one false positive) stays at or below α.
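
A minimal sketch with made-up p-values; the function name is illustrative.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni-adjusted threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Four tests, so the threshold becomes 0.05 / 4 = 0.0125.
flags = bonferroni_significant([0.001, 0.02, 0.04, 0.3])
```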

8
Q

What is FDR?

A

False Discovery Rate: the expected proportion of false positives among the results declared significant.
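
FDR is commonly controlled with the Benjamini-Hochberg step-up procedure. The card does not name a procedure, so this choice is an assumption; the p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

rejected = benjamini_hochberg([0.01, 0.045, 0.025, 0.005, 0.5])
```

Because it is a step-up procedure, a p-value can be rejected even if it misses its own rank threshold, as long as a larger p-value passes its threshold.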

9
Q

What is overfitting?

A

When a model learns noise rather than the true pattern, leading to poor generalization.

10
Q

What is cross-validation?

A

A method to assess model generalizability by repeatedly splitting the data into training and testing subsets (e.g., k folds) and averaging performance on the held-out subsets.
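
A sketch of the splitting step for k-fold cross-validation, one common scheme. Model fitting is left out; the function name and defaults are illustrative.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_indices(10, k=5):
    pass  # fit on `train`, score on `test`, then average the k scores
```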

11
Q

Why is scaling important in metabolomics?

A

Metabolite concentrations vary over orders of magnitude; scaling puts variables on a comparable footing so that high-abundance metabolites do not dominate the analysis.

12
Q

What is the difference between autoscaling and Pareto scaling?

A

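Autoscaling: mean-centering + divide by SD (every variable gets unit variance). Pareto: mean-centering + divide by √SD. Pareto keeps more of the original variance structure, so large changes stay more influential while noise in low-variance variables is amplified less.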
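
A sketch of both scalings for a single metabolite's values across samples. Pareto scaling is shown with mean-centering first, which is the usual definition; the input values are made up.

```python
from statistics import mean, stdev

def autoscale(values):
    """Mean-centre, then divide by the standard deviation (unit variance)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pareto_scale(values):
    """Mean-centre, then divide by the square root of the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s ** 0.5 for v in values]

x = [10.0, 20.0, 30.0, 40.0, 50.0]
auto, pareto = autoscale(x), pareto_scale(x)
```

After autoscaling the SD is 1; after Pareto scaling the SD is √SD of the original, so the variable's original spread is only partly removed.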

13
Q

What is normalization in metabolomics?

A

Adjusting data to correct for unwanted variation such as differences in sample amount, instrument drift, or batch effects (e.g., total-area normalization, internal standards).
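
A sketch of total-area normalization for one sample. The metabolite names and peak areas are hypothetical.

```python
def total_area_normalize(sample):
    """Divide each peak area by the sample's summed area so the values sum to 1."""
    total = sum(sample.values())
    return {metabolite: area / total for metabolite, area in sample.items()}

# Hypothetical peak areas for one sample.
normalized = total_area_normalize({"alanine": 100.0, "glucose": 300.0})
```

This removes differences in overall signal between samples, at the cost of assuming the total signal should be the same in every sample.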

14
Q

What is a VIP score in PLS-DA?

A

Variable Importance in Projection; indicates the contribution of each variable to the model. Variables with VIP > 1 are commonly treated as important.

15
Q

How is model validity assessed in PLS-DA?

A

Using cross-validation, permutation tests, and metrics such as R² and Q².

16
Q

What is the difference between R² and Q²?

A

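R²: fraction of variance explained on the training data (goodness of fit). Q²: cross-validated predictive ability. A high Q² close to R² indicates a robust model; a large gap between them suggests overfitting.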

17
Q

Why are univariate and multivariate analyses both used?

A

Univariate pinpoints individual features; multivariate captures patterns and correlations among metabolites.

18
Q

What is a volcano plot?

A

A scatterplot of –log10(p-value) on the y-axis against log2(fold change) on the x-axis, used to highlight metabolites whose differences are both statistically significant and large.
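
A sketch of how one metabolite's plot coordinates are computed. Plotting itself is omitted, and the cutoffs in `is_hit` (2-fold change, p < 0.05) are hypothetical defaults, not values from the card.

```python
from math import log2, log10

def volcano_point(mean_case, mean_control, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one metabolite."""
    return log2(mean_case / mean_control), -log10(p_value)

def is_hit(x, y, min_abs_log2_fc=1.0, max_p=0.05):
    """Hypothetical cutoffs: at least a 2-fold change and p below max_p."""
    return abs(x) >= min_abs_log2_fc and y > -log10(max_p)

x, y = volcano_point(mean_case=8.0, mean_control=2.0, p_value=0.001)
```

Points in the upper left and upper right corners are the metabolites that change the most and most significantly.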

19
Q

What is hierarchical clustering?

A

An unsupervised method that groups samples or variables into a tree (dendrogram) based on similarity, measured with a distance metric.

20
Q

What is k-means clustering?

A

An unsupervised method that partitions data into k clusters by minimizing within-cluster variance.
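
A sketch of Lloyd's algorithm, the standard iterative procedure for k-means. The initialization (random data points), the fixed iteration count, and the toy points are assumptions made for brevity.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen data points

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
centroids, labels = k_means(points, k=2)
```

Each iteration can only lower (or keep) the total within-cluster sum of squares, but the result depends on the starting centroids, so several restarts are common in practice.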

21
Q

What is a heatmap used for in metabolomics?

A

To visualize patterns across samples and features; often clustered by similarity.

22
Q

What are metabolite identification confidence levels?

A

A scale from Level 1 (identity confirmed with authentic standards) to Level 4 (unknown), as defined by the Metabolomics Standards Initiative (MSI).

23
Q

What is ROC analysis used for?

A

Evaluating the diagnostic accuracy of a classifier or biomarker across decision thresholds, summarized by AUC, sensitivity, and specificity.
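
A sketch of the three metrics on made-up scores, with labels coded 1 (positive) and 0 (negative). AUC is computed from its pairwise interpretation rather than by tracing the curve.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive and return (sensitivity, specificity)."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    return tp / labels.count(1), tn / labels.count(0)

auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.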

24
Q

What is bootstrapping?

A

A resampling method that draws repeated samples with replacement from the observed data to estimate the variability of a statistic.
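
A sketch that estimates the standard error of a statistic. The function name, the default of 2000 resamples, and the data are illustrative.

```python
import random
from statistics import mean, stdev

def bootstrap_se(data, statistic=mean, resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)
    estimates = [
        statistic(rng.choices(data, k=len(data)))  # one bootstrap replicate
        for _ in range(resamples)
    ]
    return stdev(estimates)

data = [float(i) for i in range(1, 21)]
se = bootstrap_se(data)
```

The same function works for statistics with no simple standard-error formula, such as the median, by passing a different `statistic`.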

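25
Q

What is permutation testing?

A

Repeatedly shuffling the class labels and re-evaluating the model to build a null distribution, which is then used to test the significance of a supervised model.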
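
A sketch of the idea with labels coded 0 and 1. The score here is the absolute difference in group means, a simple stand-in for a supervised model's performance metric; that substitution is an assumption made to keep the example short.

```python
import random
from statistics import mean

def permutation_p_value(values, labels, permutations=2000, seed=0):
    """P-value for the absolute difference in means between label 0 and label 1."""
    def score(lbls):
        group1 = [v for v, l in zip(values, lbls) if l == 1]
        group0 = [v for v, l in zip(values, lbls) if l == 0]
        return abs(mean(group1) - mean(group0))

    rng = random.Random(seed)
    observed = score(labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any real link between values and labels
        at_least_as_extreme += score(shuffled) >= observed
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (permutations + 1)
```

If the observed score is rarely matched by the shuffled-label scores, the model is unlikely to be picking up chance structure.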

26
Q

What is Mahalanobis distance?

A

A multivariate distance metric accounting for correlations between variables.
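
A sketch restricted to two variables so the covariance matrix can be inverted by hand. The function name and data are illustrative; real data sets would use a linear algebra library.

```python
from statistics import mean

def mahalanobis_2d(point, data):
    """Mahalanobis distance of a 2-D point from the centre of 2-D `data`."""
    xs, ys = zip(*data)
    mx, my = mean(xs), mean(ys)
    n = len(data)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, written out for the 2x2 case.
    d_squared = (syy * dx ** 2 - 2 * sxy * dx * dy + sxx * dy ** 2) / det
    return d_squared ** 0.5

data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.0)]
along = mahalanobis_2d((3.0, 3.0), data)    # follows the correlation
against = mahalanobis_2d((3.0, 1.0), data)  # same Euclidean distance, off-trend
```

Both test points are equally far from the centre in Euclidean terms, but the off-trend point gets a much larger Mahalanobis distance.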

27
Q

What is the curse of dimensionality?

A

As dimensionality increases, the data becomes sparse, reducing model performance and interpretability.

28
Q

What is LASSO regression?

A

Linear regression with an L1 penalty; it shrinks coefficients (regularization) and sets some exactly to zero (variable selection).

29
Q

What is multiple testing correction?

A

Adjusting for the number of hypotheses tested to reduce false positives (e.g., FDR, Bonferroni).

30
Q

What is metabolite imputation?

A

Filling in missing values using strategies like k-NN, minimum value, or model-based methods.
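
A sketch of a minimum-value strategy for one metabolite, with missing values represented as `None`. Using a fraction of the minimum (half by default) is a common variant and an assumption here; set `fraction=1.0` for plain minimum-value imputation.

```python
def impute_minimum(values, fraction=0.5):
    """Replace None entries with `fraction` times the smallest observed value."""
    observed = [v for v in values if v is not None]
    fill = min(observed) * fraction
    return [fill if v is None else v for v in values]

filled = impute_minimum([4.0, None, 2.0, None])
```

This assumes values are missing because they fall below the detection limit, which is not always the case.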

31
Q

What are QC samples used for?

A

Monitoring analytical performance and enabling correction of signal drift.

32
Q

What is LOESS normalization?

A

Locally weighted regression to correct batch or signal drift trends.

33
Q

What is the difference between sPLS-DA and PLS-DA?

A

sPLS-DA includes sparsity (variable selection); PLS-DA uses all variables.
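
34
Q

What is an eigenvalue in PCA?

A

It represents the variance explained by the corresponding principal component; dividing it by the sum of all eigenvalues gives the proportion of variance explained.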
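
A sketch for two variables, where the eigenvalues of the covariance matrix have a closed form. The function names and data are illustrative; larger problems would use a linear algebra library.

```python
from statistics import mean

def pca_eigenvalues_2d(data):
    """Eigenvalues of the 2x2 sample covariance matrix, largest first."""
    xs, ys = zip(*data)
    mx, my, n = mean(xs), mean(ys), len(data)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    # Roots of the characteristic polynomial of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = (((sxx - syy) / 2) ** 2 + sxy ** 2) ** 0.5
    return half_trace + radius, half_trace - radius

def explained_variance_ratio(eigenvalues):
    """Share of the total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Points lying exactly on a line: the first component carries all the variance.
eigenvalues = pca_eigenvalues_2d([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
```

The eigenvalues always sum to the total variance of the variables, which is why their ratios can be read as proportions of variance explained.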

35
Q

How do you validate pathway analysis results?

A

Cross-check with literature, compare across datasets, or validate using biological experiments.
36