HC 11 - Metabolomics: Preprocessing of Data Flashcards

Lecture 11

1
Q

LC-MS definition

A

Liquid chromatography - Mass Spectrometry

2
Q

What do you want to get from the Raw LC-MS data?

A

-Metabolite ID
-Metabolite
-Amount per sample

3
Q

Metabolism

A

All biochemical processes that take place in the cell

4
Q

Metabolite

A

All organic molecules in an organism with a molar mass < 1500 Da

5
Q

Glucose is 180 Da. How many amu is that?

A

180 amu (1 Da = 1 amu)

6
Q

How is the molar mass of an atom calculated?

A

As the average of its isotope masses, weighted by the natural abundance of each isotope
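
A minimal sketch of this weighted average, using carbon as an example. The isotope masses and abundances are rounded textbook values chosen for illustration, and the function name `average_atomic_mass` is hypothetical.

```python
# Isotope masses (Da) and natural abundances (fractions) for carbon.
# Rounded textbook values, used here only for illustration.
CARBON_ISOTOPES = [
    (12.000000, 0.9893),  # carbon-12
    (13.003355, 0.0107),  # carbon-13
]

def average_atomic_mass(isotopes):
    """Return the abundance-weighted mean of the isotope masses."""
    total_abundance = sum(abundance for _, abundance in isotopes)
    weighted_sum = sum(mass * abundance for mass, abundance in isotopes)
    return weighted_sum / total_abundance

carbon_mass = average_atomic_mass(CARBON_ISOTOPES)
print(round(carbon_mass, 3))  # 12.011
```

The result is close to the value listed for carbon in the periodic table.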

7
Q

Mass range of metabolites

A

100-1500 Da

8
Q

Mass of the human genome

A

22,000 billion Da (2.2 × 10^13 Da)

9
Q

Metabolomics stappenplan

A

-Sampling & sample preparation: protocol > samples
-Data acquisition: samples > raw data
-Data Pre-processing: raw data > list of peaks/metabolites
-Data analysis: list of peaks > relevant metabolites and connectivities
-Biological interpretation

10
Q

Challenges with metabolomics

A

-Sample complexity
>body fluids/tissues
>hundreds or thousands of metabolites per sample
-Chemical properties
>polarity
>size and mass
-concentration range is very large

11
Q

LC-MS: define the steps of LC and MS

A

LC: separation into analytical fractions based on polarity
MS: separation and characterization based on mass

12
Q

Formula SNR (Signal to Noise Ratio)

A

S/N = mean_s / SD_n (mean of the signal divided by the standard deviation of the noise)
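
A small sketch of this ratio on made-up numbers. It assumes the sample standard deviation (`statistics.stdev`) for the noise, which the card does not specify; the function name and the intensity values are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(signal_values, noise_values):
    """S/N = mean of the signal / standard deviation of the noise."""
    return mean(signal_values) / stdev(noise_values)

# Hypothetical intensities: a signal region and a signal-free baseline region.
signal = [98.0, 102.0, 100.0]
noise = [1.0, 3.0, 2.0, 4.0, 0.0]

snr = signal_to_noise(signal, noise)
print(round(snr, 1))  # 63.2
```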

13
Q

Mass spec: measurements, main steps

A

-Measuring m/z
-Steps
>Conversion of molecules into gas-phase ions and transfer into the vacuum (ESI operates at atmospheric pressure, which is high compared with the vacuum)
>Separation according to m/z
>Detection of ions and signal processing into mass spectrum

14
Q

How does ionization take place (protonation)?

A

High voltage power supply connected to the ESI needle
> Ionization in the liquid
> Electrons are removed
> Protonation of the molecule: [M+H]+
> Droplet formation
> Solvent evaporation
> Coulomb fission
> Continued fission and evaporation
> Desolvated ions in the gas phase

15
Q

Which modes does ESI have?

A

Positive ion mode [M+H]+ and negative ion mode [M-H]-

16
Q

negative ion mode ESI

A

Addition of electrons instead of removal > [M-H]- > removal of H+

17
Q

Protonation of bases and deprotonation

A

Creation of [M+H]+
R-NH2 + H3O+ <-> R-NH3+ + H2O
Creation of [M-H]-
R-COOH + OH- <-> R-COO- + H2O

18
Q

What is the base peak in the ESI MS spectrum?

A

The most intense peak in the mass spectrum.

19
Q

Differences in axes in the chromatogram and mass spectrum

A

LC > intensity (y) over retention time (x)
MS > intensity (y) over m/z (x)

20
Q

LC-MS has three dimensions. Which?

A

Intensity (y) over retention time (x) and m/z (z)

21
Q

What is an EIC?

A

An Extracted Ion Chromatogram
> a chromatogram for a specific m/z value or range: at each time point, the intensities within that m/z range are summed.

22
Q

What is a TIC?

A

A Total Ion Chromatogram
> a chromatogram (intensity over RT) covering all m/z values: at each time point, the intensities of all ions are summed

23
Q

Mixture mass spectrum

A

Mass spectrum (I over m/z) for all time points summed up per m/z
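
The TIC, EIC and mixture mass spectrum are all sums over one axis of the same intensity table. The sketch below illustrates this on a tiny made-up run; the matrix layout (rows are scans, columns are m/z channels) and all names are assumptions for illustration.

```python
# Hypothetical LC-MS run: 4 scans (rows) x 3 m/z channels (columns).
mz_values = [100.0, 200.0, 300.0]
intensities = [
    [0, 5, 0],
    [10, 50, 1],
    [20, 5, 2],
    [0, 0, 30],
]

def total_ion_chromatogram(matrix):
    """Sum over all m/z channels at each time point."""
    return [sum(scan) for scan in matrix]

def extracted_ion_chromatogram(matrix, mzs, mz_min, mz_max):
    """Sum only the channels with m/z in [mz_min, mz_max] at each time point."""
    keep = [j for j, mz in enumerate(mzs) if mz_min <= mz <= mz_max]
    return [sum(scan[j] for j in keep) for scan in matrix]

def mixture_mass_spectrum(matrix):
    """Sum over all time points for each m/z channel."""
    return [sum(column) for column in zip(*matrix)]

print(total_ion_chromatogram(intensities))                               # [5, 61, 27, 30]
print(extracted_ion_chromatogram(intensities, mz_values, 150.0, 250.0))  # [5, 50, 5, 0]
print(mixture_mass_spectrum(intensities))                                # [30, 60, 33]
```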

24
Q

Why can’t a single analysis of LC-MS provide full coverage?

A

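Because a single LC-MS run measures hundreds of metabolites across about 4 orders of magnitude in concentration (linear range), whereas a sample can contain thousands of metabolites over a much larger concentration range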

25
Q

Within metabolomics: different kinds of sample preparation, chromatography and MS ion modes

A

1 sample
>
Sample Preparation
-Lipidomics
-Polar Metabolomics
>
Chrom for lipidomics
-Normal phase LC: amphipathic
-Reverse phase LC: apolar
Chrom for polar metabolomics
-cHILIC
>
MS ion modes for all
-positive or negative

26
Q

How can coverage be expanded?

A

-Changing the measurement conditions (sample preparation, chromatographic phase, polarity)
-Separate complex mixtures in LC-MS by making use of chemical properties over a single run

27
Q

Drawbacks of LC-MS

A

-Identification is challenging
-Absolute quantification only possible with good analytical standard materials (isotope labeled)
-Sensitivity is different for each metabolite
-Destructive >once a biological sample is measured, it cannot be measured again.
-Ion suppression
-Column degradation

28
Q

If the m/z is 600 in negative ion mode, what is the neutral molecule mass? (z = -1)

A

600 × 1 + 1 (add back the proton lost by deprotonation) = 601 Da
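
The same calculation as a short sketch that covers both ion modes. It assumes the ion is [M+nH]n+ or [M-nH]n- and uses the proton mass (about 1.007276 Da) rather than the rounded value of 1; the function name is hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (H+)

def neutral_mass(mz, charge):
    """Neutral mass M from m/z, assuming [M+nH]n+ (charge > 0) or [M-nH]n- (charge < 0)."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    n = abs(charge)
    if charge > 0:
        # Positive mode: n protons were added, so subtract them.
        return mz * n - n * PROTON_MASS
    # Negative mode: n protons were removed, so add them back.
    return mz * n + n * PROTON_MASS

print(round(neutral_mass(600.0, -1), 3))  # 601.007
print(round(neutral_mass(600.0, +1), 3))  # 598.993
```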

29
Q

Applications TIC

A

Sum of the intensities at each time point
-Understanding overall chromatography and hunting for new peaks
-Useless for complex samples > so many compounds elute during the run that the peaks are unresolved

30
Q

Base peak chromatogram

A

Reconstructed ion chromatogram: maximum intensity across all m/z values per time point.
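
As a sketch, the base peak chromatogram takes the maximum per scan instead of the sum. The matrix layout (rows are scans, columns are m/z channels) and the values are made up for illustration.

```python
def base_peak_chromatogram(matrix):
    """Maximum intensity across all m/z channels at each time point."""
    return [max(scan) for scan in matrix]

# Hypothetical run: rows are scans (time points), columns are m/z channels.
intensities = [
    [0, 5, 0],
    [10, 50, 1],
    [20, 5, 2],
    [0, 0, 30],
]
print(base_peak_chromatogram(intensities))  # [5, 50, 20, 30]
```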

31
Q

Reconstructed ion chromatogram is a synonym for:

A

The Extracted Ion Chromatogram (EIC, XIC)

32
Q

EIC

A
  • Total intensity within a mass tolerance window around a particular analyte’s mass-to-charge ratio is plotted at every time point in analysis
33
Q

Applications EIC

A

-Detect previously unsuspected analytes
-highlight potential isomers
-resolve suspected co-eluting substances
-provide clean chromatograms of the compounds of interest

34
Q

Raw data in LC-MS (lists for multiple samples) must be processed to:

A

Peak table

35
Q

XCMS workflow approach pre-processing

A
  1. Import LC-MS data (EIC)
  2. Find peaks
    >only in time dimension > filter data > identify peaks > integrate
  3. Correspondence
    > match peaks across samples > retention time correction > multiple times, different orders
  4. Fill in missing peak data
36
Q

How can peaks be detected by XCMS?

A

Slicing the data into EICs (EIC slices), each covering a narrow m/z range, which avoids the problem of searching for peaks in the m/z direction.

37
Q

EIC slice creation

A

-Take the maximum at each time point
-Take peaks on the slice borders into account
-XCMS uses raw data to obtain correct m/z values for the detected peaks
-EIC in slice calculation
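
A rough sketch of the slicing idea, not XCMS's actual implementation: raw points are assigned to fixed-width m/z slices and each slice keeps the maximum intensity per scan. The slice width, the data and the function name are hypothetical, and peaks on slice borders are not handled here.

```python
from collections import defaultdict

def eic_slices(points, n_scans, slice_width):
    """Group raw (scan, m/z, intensity) points into m/z slices.

    Each slice becomes one EIC holding the maximum intensity per scan.
    Returns {slice_index: [intensity per scan]}.
    """
    slices = defaultdict(lambda: [0.0] * n_scans)
    for scan, mz, intensity in points:
        index = int(mz // slice_width)         # which m/z slice the point falls in
        eic = slices[index]
        eic[scan] = max(eic[scan], intensity)  # keep the maximum at this time point
    return dict(slices)

# Hypothetical raw data: (scan number, m/z, intensity).
raw_points = [
    (0, 100.02, 5.0), (1, 100.03, 40.0), (1, 100.07, 12.0), (2, 100.04, 8.0),
    (1, 250.51, 3.0), (2, 250.52, 30.0),
]
slices = eic_slices(raw_points, n_scans=3, slice_width=0.5)
for index in sorted(slices):
    # Print the lower m/z bound of the slice and its EIC.
    print(index * 0.5, slices[index])
```

In this toy data set, the two points in scan 1 near m/z 100 fall in the same slice, so only the larger intensity (40.0) is kept.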

38
Q

In the EIC slices graph, where are the slices from which EICs are calculated and from which scan is the spectrum measured?

A

m/z on y and RT on x, dots are intensities.
>vertical line across all m/z for a time point: the scan (spectrum)
>horizontal line across all time points: the slice (EIC calculated)

39
Q

What information does XCMS need to detect peaks in EICs?

A

-Retention time
-m/z value
-Intensity
-Peak area
> for all the peaks

40
Q

Which peaks should not be detected (not true ions)?

A

False positives

41
Q

Why is identifying all signals caused by true ions difficult?

A

-Peak shapes
-Overlapping peaks
-Artifacts
>White noise (background)
>Drift (flicker)
>Interference

42
Q

What is drift (flicker)? and interference?

A

Drift: changes in response with changing operations or conditions
Interference: noise spikes of random occurrence and intensity

43
Q

Which functions are very sensitive to noise?

A

The derivatives of the function.

44
Q

What does the first derivative correct?

A

the linear baseline (background)

45
Q

Steps in the peak detection

A

-Raw data
-Smoothed data (moving average)
-Baseline corrected and smoothed data
-Detected peaks

46
Q

Which derivative is used by XCMS for peak detection, and how?

A

The negative second derivative
> maximum in original corresponds to a minimum in the second derivative
> maximum in original corresponds to maximum in negative second derivative
XCMS
> removes noise and calculates 2nd derivative in one step
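
A simplified sketch of the idea. Unlike the single-step filter described above, it smooths with a moving average first and then takes a finite-difference second derivative. The threshold, the data and the function names are hypothetical.

```python
def moving_average(values, window=3):
    """Smooth with a centered moving average; edges use a shorter window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def negative_second_derivative(values):
    """Finite-difference estimate of -f''; the two end points are set to 0."""
    result = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        result[i] = -(values[i - 1] - 2 * values[i] + values[i + 1])
    return result

def detect_peaks(values, threshold):
    """Indices where -f'' of the smoothed data is a local maximum above the threshold."""
    d2 = negative_second_derivative(moving_average(values))
    return [i for i in range(1, len(d2) - 1)
            if d2[i] > threshold and d2[i] >= d2[i - 1] and d2[i] > d2[i + 1]]

# Hypothetical EIC: one peak centered at index 5 on a rising linear baseline.
eic = [0, 1, 2, 8, 24, 45, 26, 12, 8, 9, 10]
print(detect_peaks(eic, threshold=5.0))  # [5]
```

The rising baseline contributes nothing to the second derivative away from the edges, so only the peak at index 5 is reported.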

47
Q

Result of peak detection?

A

Peak table for each individual sample

48
Q

Which data can be found in a peak table?

A

Left > right
-The peak number
-m/z and min/max
-RT and min/max
-peak intensities integrated/baseline corrected/maximum
-Index (sample no)

49
Q

Which data is found in the raw data table?

A

L>R
-peak no
-m/z
-RT
-peak intensities raw/filtered data
-peak area raw/filtered data

50
Q

Which correction is performed during correspondence?

A

RT correction > match peaks

51
Q

Peak matching

A

Deciding when two peaks are the same metabolite
> the m/z and RT may vary across the samples

52
Q

Which variation in m/z and RT is usually found across samples?

A

Small in m/z
Large in RT

53
Q

Binning

A

Grouping RT - m/z pairs across the sample into a ‘bin’.
> within the bin:
-small variation in m/z
-RT is variable

54
Q

What does a binning graph look like?

A

Dots in clusters, with colors corresponding to the different samples, plotted as time (x) against m/z (y). The bins appear as horizontal bands.

55
Q

What is the Gaussian Kernel Density Estimation?

A

An estimation to group together peaks from a cluster from one metabolite
> peak density is shown as a smooth black line (y-axis) over time (x-axis)
> relative intensity is shown as red peaks into the graph.
> boundaries are placed next to clusters of peaks where a peak of density is located

56
Q

What needs to be set for Gaussian Kernel Density Estimation?

A

The kernel width (standard deviation), which determines where the gates are placed
For example: with a Gaussian kernel of SD = 30 s, gates are placed around peaks that lie within roughly that width of each other. Only then do they count as one cluster.
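
A rough sketch of density-based grouping within one m/z bin. It sums Gaussian kernels with a chosen SD and, as a simplifying assumption, places the gates at the local minima of the density. This is not XCMS's actual algorithm; the data, the grid step and the function names are hypothetical.

```python
import math

def gaussian_density(rts, grid, sd):
    """Sum of Gaussian kernels (standard deviation sd) centered on each peak RT."""
    return [sum(math.exp(-0.5 * ((t - rt) / sd) ** 2) for rt in rts) for t in grid]

def group_by_density(rts, sd, step=1.0):
    """Split retention times into groups at the local minima of the density."""
    start, stop = min(rts) - 3 * sd, max(rts) + 3 * sd
    grid = [start + i * step for i in range(int((stop - start) / step) + 1)]
    density = gaussian_density(rts, grid, sd)
    # Local minima of the density serve as the boundaries ("gates") between groups.
    gates = [grid[i] for i in range(1, len(grid) - 1)
             if density[i] < density[i - 1] and density[i] <= density[i + 1]]
    groups = [[] for _ in range(len(gates) + 1)]
    for rt in sorted(rts):
        # The number of gates below rt gives the index of its group.
        groups[sum(rt > gate for gate in gates)].append(rt)
    return [g for g in groups if g]

# Hypothetical peak RTs (s) from several samples within one m/z bin.
peak_rts = [300.0, 305.0, 310.0, 600.0, 604.0, 612.0]
print(group_by_density(peak_rts, sd=30.0))
# [[300.0, 305.0, 310.0], [600.0, 604.0, 612.0]]
```

A larger SD merges neighboring clusters into one group; a smaller SD splits them further.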

57
Q

What is a meta-peak in the Gaussian Kernel graph?

A

Smoothed peak density profile shown as a rectangle over the red intensity peaks which is darker at the denser parts

58
Q

What is a well behaved peak group and what is not?

A

A well behaved peak group is a peak group set within a bin which contains almost all different samples. If a peak group contains only one sample, it has to be removed.

59
Q

After binning and Gaussian Kernel Density Estimation, what is needed?

A

A retention time correction, since the gate around each cluster is still too wide > the grouping is accurate now, but it needs to be more precise

60
Q

What does RT correction mean?

A

Overlaying the RT for the different samples.
> only for the well behaved peak groups: to do this, calculate the median of the Retention Time.

61
Q

For a peak to be well behaved, what is needed?

A

That it is present in all or nearly all samples (missing in one or two samples, or extra peaks in a group, is allowed)
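
A small sketch of such a check. The limits `max_missing` and `max_extra` are hypothetical parameters chosen for illustration; the card only says that a few missing samples or extra peaks are tolerated.

```python
def is_well_behaved(group_samples, all_samples, max_missing=1, max_extra=1):
    """A group is well behaved if at most max_missing samples have no peak
    and there are at most max_extra surplus peaks in total."""
    counts = {sample: group_samples.count(sample) for sample in all_samples}
    missing = sum(1 for n in counts.values() if n == 0)
    extra = sum(n - 1 for n in counts.values() if n > 1)
    return missing <= max_missing and extra <= max_extra

samples = ["s1", "s2", "s3", "s4", "s5"]
print(is_well_behaved(["s1", "s2", "s3", "s4", "s5"], samples))  # True
print(is_well_behaved(["s1", "s2", "s3", "s4"], samples))        # True (one sample missing)
print(is_well_behaved(["s1"], samples))                          # False (four samples missing)
```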

62
Q

Steps of RT correction

A

-Calculate the median RT of all well behaved groups
-If a sample elutes before the median: negative deviation (logic, sd)
-Place this sample on a graph in which the y-axis plots the deviation from the median against time
-do this for all gates for the same color sample (eg red) (negative and positive deviations within each gate)
-Local regression (loess)
-Adjust retention times according to deviations fit (so that the line is stable around zero deviation)
-Nonlinear alignment: different corrections at different time points
-repeat for all samples
-use the correction factors to correct RT
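
The steps above can be sketched as follows. Linear interpolation between the well-behaved groups stands in for the loess fit, so this is a simplification and not the method itself. The sample names, retention times and function names are hypothetical.

```python
from statistics import median

def fit_deviation(anchor_rts, deviations):
    """Return a function rt -> deviation by linear interpolation between anchors.

    Stand-in for the loess fit; outside the anchor range the nearest
    anchor's deviation is used.
    """
    pairs = sorted(zip(anchor_rts, deviations))
    def predict(rt):
        if rt <= pairs[0][0]:
            return pairs[0][1]
        if rt >= pairs[-1][0]:
            return pairs[-1][1]
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
            if x0 <= rt <= x1:
                return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)
    return predict

# Hypothetical well-behaved groups: RT (s) of the same peak in each sample.
groups = [
    {"A": 100.0, "B": 102.0, "C": 98.0},
    {"A": 300.0, "B": 306.0, "C": 297.0},
    {"A": 500.0, "B": 510.0, "C": 496.0},
]
medians = [median(g.values()) for g in groups]  # median RT per group

def corrected_rt(sample, rt):
    """Subtract the fitted deviation of this sample at retention time rt."""
    deviations = [g[sample] - m for g, m in zip(groups, medians)]
    anchors = [g[sample] for g in groups]
    return rt - fit_deviation(anchors, deviations)(rt)

print(corrected_rt("B", 306.0))  # 300.0
print(corrected_rt("B", 408.0))  # 400.0
```

Sample B elutes later than the median, and increasingly so over the run, so the correction grows with retention time. The second call shows the fitted correction applied to a peak that is not part of any well-behaved group.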

63
Q

What is done with the correction factors from RT correction?

A

They are applied to well behaved AND not well behaved groups (all)

64
Q

Filling in the missing peak data

A

Determine which samples are missing for each peak group (no peak detected) and use information from the peak detection about where peaks begin and end and aligned RT for each sample.
> integrate the raw LC-MS data to fill in intensity values for missing data points
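
A minimal sketch of the integration step, assuming the RT and m/z window of the peak group is already known. It uses trapezoidal integration, which is an assumption and not necessarily what XCMS does; the data and the function name are hypothetical.

```python
def integrate_region(raw_points, rt_min, rt_max, mz_min, mz_max):
    """Trapezoidal area under the summed intensity within an RT and m/z window.

    raw_points: list of (retention time, m/z, intensity) from one sample.
    """
    per_scan = {}
    for rt, mz, intensity in raw_points:
        if rt_min <= rt <= rt_max and mz_min <= mz <= mz_max:
            per_scan[rt] = per_scan.get(rt, 0.0) + intensity
    times = sorted(per_scan)
    return sum((t1 - t0) * (per_scan[t0] + per_scan[t1]) / 2
               for t0, t1 in zip(times, times[1:]))

# Hypothetical raw data of a sample in which no peak was detected for this group.
raw = [(100.0, 181.07, 2.0), (101.0, 181.07, 6.0), (102.0, 181.08, 4.0),
       (101.0, 250.00, 99.0), (110.0, 181.07, 50.0)]
# Window taken from the peak group: aligned RT range and m/z range.
area = integrate_region(raw, rt_min=99.0, rt_max=103.0, mz_min=181.0, mz_max=181.2)
print(area)  # 9.0
```

Points outside the window, such as the intense signal at m/z 250 or the later scan at 110 s, do not contribute to the filled-in value.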

65
Q

Why is the filling of missing peak data needed?

A

Because a significant number of potential peaks can be missed in the detection and they are needed for robust statistical analysis.

66
Q

Filter and identify peaks in 3 steps

A
  1. EICs
  2. apply model peak matched filter
  3. identify and integrate peak data
67
Q

Match peaks across samples in 3 steps

A
  1. Segment global peak list by mass
  2. identify times w/ high peak density
  3. create groups in dense regions
68
Q

Retention time correction in 3 steps

A
  1. Identify groups for use as standards
  2. calculate retention time deviations
  3. align peaks using non-linear warping
69
Q

Fill in missing peak data in 3 steps

A
  1. Identify groups missing samples
  2. calculate mass/time peak regions
  3. integrate intensity from raw data
70
Q

What can be done with the final peak table?

A

Statistical analysis > visualization of the important peaks

71
Q

Applications final peak table with compounds and relative concentrations

A

-Statistical analysis
-Multivariate statistical analysis
-Pathway analysis
-Systems biology

72
Q

Metabolite identification settings

A

m/z, ion mode (pos or neg)
> assume [M+H] or [M-H]
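
A sketch of such a lookup under the [M+H]+ / [M-H]- assumption. The candidate table holds a few rounded monoisotopic masses for illustration only, and the ppm tolerance and function name are hypothetical; a real search would query a metabolite database.

```python
PROTON_MASS = 1.007276  # Da

# Small illustrative lookup table of monoisotopic masses (Da), rounded.
CANDIDATES = {"alanine": 89.0477, "glucose": 180.0634, "citric acid": 192.0270}

def identify(mz, ion_mode, tolerance_ppm=10.0):
    """Return candidate names whose mass matches, assuming [M+H]+ or [M-H]-."""
    if ion_mode == "positive":
        neutral = mz - PROTON_MASS  # undo protonation
    elif ion_mode == "negative":
        neutral = mz + PROTON_MASS  # undo deprotonation
    else:
        raise ValueError("ion_mode must be 'positive' or 'negative'")
    return [name for name, mass in CANDIDATES.items()
            if abs(neutral - mass) / mass * 1e6 <= tolerance_ppm]

print(identify(181.0707, "positive"))  # ['glucose']
print(identify(191.0197, "negative"))  # ['citric acid']
```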

73
Q

What is systems biology?

A

a biology-based inter-disciplinary field of study that focuses on complex non-linear interactions within biological systems, using a more holistic perspective to biological and biomedical research instead of the traditional reductionistic approach. Combination of wet-lab experiments and computational approaches.

74
Q

What is systems medicine?

A

the scientific discipline that aims at systems level understanding of for example biological networks or cells or organs or organisms. It extends systems biology by focussing on the application of system based approaches to clinically relevant applications in order to improve patient health.

75
Q

What is needed for systems medicine?

A

Integration of different types of data
phenotype/clinical, molecular, information about cells, organs, social networks/epidemiology

76
Q

Contributions of systems medicine

A

-Improved molecular disease signatures
-Improved stratification (detection of sub-groups) of disease subtypes
-Improved personalized treatment
-Improved treatments through detection of novel genes that are mutated in diseases