Chapter 11 + 12 Flashcards
Corresponding to the lectures (HCs) of week 1 of the second part of the course
Similarities between the data analysis pipelines of different omics approaches
- All technologies yield many measurements for each sample
- High dimensionality is handled in the same way
- Hundreds or thousands of variables per sample, such as genes, proteins or metabolites
Samples organised in a matrix
Rows: the samples
Columns: the variables (like genes)
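The layout can be pictured as a small data matrix in plain Python. This is only a minimal sketch: the sample names, gene names and values are hypothetical placeholders, not course data.

```python
# Hypothetical omics data matrix: one row per sample, one column per variable
samples = ["control_1", "control_2", "treated_1", "treated_2"]
variables = ["gene_A", "gene_B", "gene_C"]
matrix = [
    [5.1, 2.3, 7.8],  # control_1
    [4.9, 2.5, 7.6],  # control_2
    [6.2, 2.4, 3.1],  # treated_1
    [6.0, 2.2, 3.3],  # treated_2
]

def column(data, j):
    """Return the measurements of variable j across all samples."""
    return [row[j] for row in data]

n_samples, n_variables = len(matrix), len(matrix[0])
print(n_samples, n_variables)                     # 4 3
print(column(matrix, variables.index("gene_C")))  # [7.8, 7.6, 3.1, 3.3]
```

In a real omics data set the number of columns is usually far larger than the number of rows, which is the high dimensionality mentioned above.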
Four components of the generalized data analysis pipeline
- Experimental design and data collection
- Data preprocessing and quality control
- Data analysis
- Biological interpretation
The experimental design can have a large impact on the statistical power and therefore on the …
Conclusions that are reached
First step in experimental design
Frame a biological question
What is the aim of the biological question?
To determine the hypothesis that will be tested and the statistical test that will be performed
What does the biological question determine that is needed for an interpretable and successful outcome?
The experimental preconditions
Three main types of objectives, each requiring a different type of experimental design
- Detection of responsive features under controlled experimental conditions (perturbation study)
- Detection of biomarkers
- Identification of regulatory or mechanistic relationships between variables
Next step in experimental design after framing the biological question
Identify noise factors and design the experiment
Noise factors
Factors that can disturb a proper measurement (from the biological experiment up to and including the measurement)
Noise factors can lead to …
bias
Three basic principles for dealing with noise factors
- Replication
- Randomization
- Blocking
What is the aim of the experimental design?
Ensure reliable measurements free from bias
Replication
Duplicate, repeat or perform the same measurement more than once
> to obtain an estimate of the experimental error
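As an illustration of how replicates give an estimate of the experimental error, the sketch below computes the mean, standard deviation (SD) and coefficient of variation (CV) of hypothetical replicate values. Using the SD and CV as the error measure is an assumption made for simplicity; the function name is hypothetical.

```python
import statistics

def experimental_error(replicates):
    """Estimate the experimental error from replicate measurements of one variable."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)   # sample standard deviation (n - 1 denominator)
    cv_percent = 100 * sd / mean        # coefficient of variation, in percent
    return mean, sd, cv_percent

# Hypothetical replicate measurements of one metabolite
mean, sd, cv = experimental_error([10.2, 9.8, 10.1, 9.9])
print(f"mean = {mean:.2f}, SD = {sd:.3f}, CV = {cv:.1f}%")
```

The CV expresses the error relative to the signal, which makes it easier to compare variables measured on very different scales.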
On what does the type of error estimated by replication depend?
On how the replication is done
> e.g. for estimating and controlling biological variability, samples from different organisms or different batches of cells should be processed in the same manner
Types of replication errors
- Repeatability: error based on repeated measurements of the same sample
- Reproducibility: error based on the sample workup or sampling, i.e. the whole experiment (larger errors)
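A sketch of how the two error types can be separated, assuming a balanced design in which several independently worked-up samples are each measured repeatedly. It uses a one-way ANOVA variance decomposition (a common choice, assumed here rather than taken from the course): the within-workup variance estimates repeatability, and adding the between-workup variance gives reproducibility, which is therefore never smaller. The function name and data are hypothetical.

```python
import statistics

def repeatability_reproducibility(workups):
    """workups: list of equal-length lists; each inner list holds repeated
    measurements of one independently worked-up sample.
    Returns (repeatability SD, reproducibility SD)."""
    k, n = len(workups), len(workups[0])
    if k < 2 or n < 2 or any(len(w) != n for w in workups):
        raise ValueError("sketch assumes >= 2 workups with the same number (>= 2) of repeats")
    means = [statistics.mean(w) for w in workups]
    grand_mean = statistics.mean(means)  # equals the overall mean in a balanced design
    # Within-workup mean square: spread of repeated measurements of the same sample
    ms_within = sum((x - m) ** 2 for w, m in zip(workups, means) for x in w) / (k * (n - 1))
    # Between-workup mean square: spread of the workup means
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    var_between = max(0.0, (ms_between - ms_within) / n)  # negative estimates set to 0
    return ms_within ** 0.5, (ms_within + var_between) ** 0.5

# Hypothetical data: three workups of the same material, each measured twice
s_r, s_R = repeatability_reproducibility([[10.0, 10.2], [11.0, 11.2], [9.0, 9.2]])
print(f"repeatability SD = {s_r:.3f}, reproducibility SD = {s_R:.3f}")
```

Here the repeated measurements agree closely while the workups differ by about one unit, so the reproducibility error is much larger than the repeatability error.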
Types of replicates
- Biological replicates: error based on the whole experiment, including the organisms > the interest is not in one individual
- Technical replicates: repeated measurements of the same sample, to gain statistical power
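The distinction matters for the statistics: technical replicates of one organism are usually averaged, and the organisms (biological replicates) are then the independent units for a test. The sketch below assumes equal numbers of technical replicates per organism; the function name and mouse data are hypothetical.

```python
import statistics

def replicate_summary(data):
    """data: dict organism -> list of technical replicate measurements (equal counts assumed).
    Returns the per-organism means, the pooled technical SD and the biological SD."""
    # Averaging technical replicates leaves one value per organism (the biological replicate)
    bio_means = {name: statistics.mean(reps) for name, reps in data.items()}
    # Pooled technical SD: square root of the average within-organism variance
    technical_sd = statistics.mean(statistics.variance(reps) for reps in data.values()) ** 0.5
    # Biological SD: spread of the organism means; n for a test = number of organisms
    biological_sd = statistics.stdev(list(bio_means.values()))
    return bio_means, technical_sd, biological_sd

measurements = {
    "mouse_1": [5.0, 5.1, 4.9],
    "mouse_2": [6.0, 6.2, 5.8],
    "mouse_3": [4.0, 4.1, 3.9],
}
means, tech_sd, bio_sd = replicate_summary(measurements)
print(means, round(tech_sd, 3), round(bio_sd, 3))
```

In this made-up example the variation between mice is far larger than the measurement error, so adding more mice would help the comparison more than adding more technical repeats.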
Randomization
Requiring the experimenter to use random choices for every factor that is not of interest but might influence the outcome of the experiment
> e.g. random assignment of individuals to the groups
> e.g. hybridization of mRNA samples from the treatment and control groups is sensitive to external factors, so do not measure all controls first and then all treated samples: otherwise the time effect (not of interest) cannot be distinguished from the treatment effect (of interest)
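A minimal sketch of randomization with Python's `random` module: individuals are randomly assigned to the treated and control groups, and the measurement order is drawn independently, so that controls and treated samples are interleaved by chance instead of run one group after the other. The function name, the rat identifiers and the group sizes are hypothetical.

```python
import random

def randomize_study(individuals, n_treated, seed=None):
    """Randomly assign individuals to treated/control groups and draw an
    independent random measurement order."""
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    # The first n_treated individuals of the shuffled list form the treated group
    assignment = {ind: ("treated" if i < n_treated else "control")
                  for i, ind in enumerate(shuffled)}
    # A separate shuffle decides the order in which samples are measured
    run_order = list(individuals)
    rng.shuffle(run_order)
    return assignment, run_order

individuals = [f"rat_{i}" for i in range(1, 9)]
assignment, run_order = randomize_study(individuals, n_treated=4, seed=42)
for position, ind in enumerate(run_order, start=1):
    print(position, ind, assignment[ind])
```

Fixing the seed makes the plan reproducible, which is convenient for documenting the design; leaving it as `None` draws a new plan each time.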
Confounder
A confounder is a variable whose presence affects the variables being studied, so that the results do not reflect the actual relationship (e.g. time: randomize over time to eliminate this bias)
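A small check, offered as a sketch, shows what it means for a nuisance factor such as measurement day to be confounded with the treatment: if every day contains only one group, day effects and treatment effects cannot be separated. The function name and the two example designs are hypothetical, and the check only detects complete confounding, not partial imbalance.

```python
from collections import defaultdict

def completely_confounded(design, factor, nuisance):
    """True if each level of the nuisance factor (e.g. measurement day) holds only
    one level of the factor of interest (e.g. treatment group)."""
    seen = defaultdict(set)
    for sample in design:
        seen[sample[nuisance]].add(sample[factor])
    return all(len(levels) == 1 for levels in seen.values())

# Hypothetical designs: all controls on day 1 and all treated on day 2, versus mixed days
sequential = [{"group": "control", "day": 1}] * 4 + [{"group": "treated", "day": 2}] * 4
mixed = [{"group": g, "day": d} for d in (1, 2) for g in ("control", "treated") for _ in range(2)]

print(completely_confounded(sequential, "group", "day"))  # True
print(completely_confounded(mixed, "group", "day"))       # False
```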
Blocking
Arranging experimental samples in groups (blocks) that are similar to one another
(e.g. by gender, or by LC column)
> but within each block, the treated/control distribution needs to be similar
> or: blocks by measurement day, because not all measurements can be done on one day
> this eliminates the confounding effect of gender or LC column
General rule for blocking
Block what you can, randomize what you cannot (treated/control is not blockable)
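The rule can be sketched as follows, assuming the blocks are measurement days or LC columns and the treated/control labels are already fixed: samples of both groups are spread evenly over the blocks (blocking), and the run order within each block is shuffled (randomization). The function name, sample IDs and block count are hypothetical.

```python
import random

def block_and_randomize(treated, control, n_blocks, seed=None):
    """Spread treated and control samples evenly over n_blocks (e.g. measurement
    days or LC columns), then randomize the run order within every block."""
    rng = random.Random(seed)
    treated, control = list(treated), list(control)
    rng.shuffle(treated)
    rng.shuffle(control)
    blocks = [[] for _ in range(n_blocks)]
    # Block what you can: deal each group round-robin so every block gets a (near-)equal share
    for label, group in (("treated", treated), ("control", control)):
        for i, sample in enumerate(group):
            blocks[i % n_blocks].append((sample, label))
    # Randomize what you cannot: the run order inside each block
    for block in blocks:
        rng.shuffle(block)
    return blocks

treated_ids = [f"T{i}" for i in range(1, 7)]
control_ids = [f"C{i}" for i in range(1, 7)]
plan = block_and_randomize(treated_ids, control_ids, n_blocks=3, seed=7)
for day, block in enumerate(plan, start=1):
    print(f"day {day}:", [sample for sample, _ in block])
```

Because every day holds two treated and two control samples, a day-to-day shift affects both groups equally instead of masquerading as a treatment effect.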
Which instruments show drift in time?
GCMS and LCMS (for metabolomics and proteomics)
Where is the order in which the samples are measured defined, and why is it important?
In the measurement design. This is important because in LCMS or GCMS, when the number of samples is large and several batches are needed, instrumental drift causes samples measured at the beginning of the series to differ slightly from samples measured at the end.
Why is randomization across batches crucial in LCMS or GCMS?
Because of instrumental drift: without randomization, an observed difference could be due solely to instrumental drift, which introduces bias. The actual results are then indistinguishable from the bias.
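To see why drift creates bias, the sketch below simulates a measurement series with a linear instrumental drift and no true treatment effect (both assumptions made for illustration; the function name and drift size are hypothetical). Measuring all controls first and all treated samples last produces an apparent treatment effect, while averaged over many random run orders the apparent effect is close to zero. A single random order can still be somewhat unbalanced by chance, which is one reason to also block what can be blocked.

```python
import random
import statistics

def apparent_difference(run_order, drift_per_run=0.5, baseline=10.0):
    """Simulate a series with linear instrumental drift and NO true treatment
    effect; return mean(treated) - mean(control) as it would be observed."""
    values = {"treated": [], "control": []}
    for position, group in enumerate(run_order):
        values[group].append(baseline + drift_per_run * position)
    return statistics.mean(values["treated"]) - statistics.mean(values["control"])

# Not randomized: all controls first, then all treated samples
sequential = ["control"] * 4 + ["treated"] * 4
print(apparent_difference(sequential))  # 2.0, although the true effect is zero

# Randomized: average the apparent effect over many random run orders
rng = random.Random(1)
diffs = []
for _ in range(5000):
    order = sequential[:]
    rng.shuffle(order)
    diffs.append(apparent_difference(order))
mean_random_diff = statistics.mean(diffs)
print(round(mean_random_diff, 2))  # close to 0
```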