Metagenomics Flashcards
(57 cards)
What environmental processes are microbes responsible for?
o Most of the biogeochemical cycles on Earth [the pathways by which a substance moves through the biotic and abiotic compartments of Earth]
o Waste processing
o Growth & reproduction of plants & animals
o Production of antibiotics, food fermentation & maintenance of human health
What is Metagenomics?
- The study of genetic material recovered directly from environmental samples
- It involves pooling and studying the genomes of all the organisms in a community -> all the functions encoded in the community’s DNA (metagenome) can be studied
What does Metagenomics let us find?
o Genetic info on potentially novel biocatalysts / enzymes
o Genomic linkages between function & phylogeny for uncultured organisms
o Evolutionary profiles of community function & structure
What are the steps in a Typical Sequence-Based Metagenome Project?
- Experimental Design
- Sampling
- Sample fractionation
- DNA extraction
- DNA sequencing
- Assembly
- Annotation
- Statistical analysis
- Data storage
- Data sharing
What is the foundation of a good Metagenomics study?
Experimental Design
What criteria should extracted DNA satisfy?
o High quality
o Representative of all cells present in sample
o In sufficient amounts for library production & sequencing
What are 4 processing methods used in Metagenomics studies?
- Physical fractionation:
o Applicable only when certain parts of the community are the target of analysis (like viruses in seawater)
- Physical separation & isolation of cells from samples:
o Might be necessary to maximise DNA yield or avoid co-extraction of enzymatic inhibitors (like humic acids in soil, which stick to exposed DNA)
- Lysis of cells:
o Direct lysis in soil has quantifiable bias vs indirect lysis in terms of: microbial diversity, DNA yield, resulting sequence fragment length
- Multiple Displacement Amplification (MDA):
What is the process of Multiple Displacement Amplification ?
- Non-PCR-based DNA amplification technique
- Anneals random hexamer primers to the template
o No denaturation required; an increase in [hexamers] is sufficient to allow a slow initial priming step
- Once the reaction starts, the strand-displacing mechanism of MDA releases single-stranded template for ongoing priming & amplification
- phi29 polymerase extends primers till they reach the next primer (start of a dsDNA section)
- phi29 displaces the strand it just hit and continues polymerization (‘under’ the displaced strand)
- New primers bind to displaced strand -> polymerization again -> hyperbranched structure
- MDA generates larger sized product with lower error frequency than conventional PCR amplification
What are 3 sequencing methods used in Metagenomics?
- Classical Sanger Sequencing
- 454/Roche System / Pyrosequencing
- Illumina
Why is Sanger still considered the gold standard sequencing technology?
o Low error rate
o Large insert sizes
o Long read length (>700bp)
When is Sanger sequencing applicable?
- Applicable if the objective is generating close-to-complete genomes in low-diversity environments
What are the disadvantages of Sanger sequencing?
o Labor-intensive
o Bias against genes toxic to host
- [because of large insert size, full length genes could be included which would be expressed and kill host]
o Overall cost per Gb (±400 000 USD)
Roche system summary info
- Based on the ‘sequencing by synthesis’ principle
- Relies on detection of pyrophosphate release on nucleotide incorporation
o Sanger relies on chain termination with dideoxynucleotides (ddNTPs)
- Uses emulsion polymerase chain reaction (ePCR) to clonally amplify random DNA fragments attached to microscopic beads
- Much cheaper than Sanger (± 20 000 USD per Gbp)
- Avg read length = 600-800bp
- Offers multiplexing (up to 12 samples of ±500 Mbp in a single run)
What are the steps of Pyrosequencing via the 454/Roche System?
- DNA Library constructed -> DNA Fragments ligated with adaptors
- Strand amplification by ePCR on surfaces of 100 000’s of agarose beads
- Surfaces of beads have millions of oligomers -> each is complementary to adaptors on fragments
- ePCR uses vigorously mixed oil & aqueous mixture -> isolates individual agarose beads (each bead with an individual unique DNA fragment hybridized to its surface)
a. Isolated in aqueous micelles that also contain the PCR reactants
- Micelles pipetted into wells of microtiter plate -> temp cycling produces > 1mil sequence-ready beads
- Each bead has up to 1mil copies of original annealed fragment
- Beads added to surface of 454 pico titer plate (PTP)
a. PTP: Single wells in tips of fused fiber optic strands (1 bead in each well)
- Smaller magnetic & latex beads (attached to active enzymes needed for pyrosequencing) added to surround DNA-containing agarose beads in PTP
- PTP placed in sequencer, nucleotide & reagent solutions delivered into it in sequential fashion
- Incorporation of a nucleotide releases PPi -> ATP sulfurylase + APS converts PPi to ATP -> ATP + luciferase -> oxidation of luciferin -> light
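Because light is only produced when the delivered nucleotide is actually incorporated, and a run of identical bases releases proportionally more PPi, the read can be pictured as a series of per-flow signals. The sketch below is an idealised, noise-free model of that idea, not the instrument's real base caller; the flow order "TACG" and the function names are assumptions made for illustration.

```python
def flowgram(template, flow_order="TACG", n_cycles=2):
    """Simulate idealised pyrosequencing signals.

    template   : bases in the order they get incorporated (5'->3' of the new strand)
    flow_order : order in which nucleotides are delivered (assumed, illustrative)
    Returns a list of (nucleotide, signal) pairs; signal = number of bases
    incorporated during that flow (0 means no light).
    """
    pos = 0
    signals = []
    for _ in range(n_cycles):
        for base in flow_order:
            run = 0
            # A homopolymer run is incorporated in a single flow,
            # so the light signal scales with the run length.
            while pos < len(template) and template[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals


def call_bases(signals):
    """Turn (nucleotide, signal) pairs back into a sequence."""
    return "".join(base * n for base, n in signals)


signals = flowgram("TTACGGGA")
print(signals)
print(call_bases(signals))
```

In real data the signal for long homopolymers is not cleanly proportional, which is why homopolymer length is a known weak point of this chemistry.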
What is the Illumina sequencing average read length?
±150bp
What is the cost of illumina?
±50 USD per Gbp
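To put the three per-Gbp figures quoted in these cards side by side, here is a small arithmetic sketch. The prices are the approximate values from the cards above; the 10 Gbp project size and the function name are hypothetical and only there for illustration.

```python
# Approximate cost per Gbp in USD, as quoted in the cards above.
COST_PER_GBP_USD = {
    "Sanger": 400_000,
    "454/Roche": 20_000,
    "Illumina": 50,
}


def sequencing_cost(gbp, platform):
    """Estimated sequencing cost in USD for `gbp` gigabase pairs."""
    return gbp * COST_PER_GBP_USD[platform]


project_gbp = 10  # hypothetical project size
for platform in COST_PER_GBP_USD:
    print(f"{platform:10s} {sequencing_cost(project_gbp, platform):>12,.0f} USD")

# Fold difference between the most and least expensive platform
fold = COST_PER_GBP_USD["Sanger"] / COST_PER_GBP_USD["Illumina"]
print(f"Sanger is about {fold:,.0f}x more expensive per Gbp than Illumina")
```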
What are the drawbacks of Illumina?
- Limited read length -> increased proportion of unassembled reads which may be too short for functional annotation
- Limited systematic errors
- But some datasets have high error rates at tail ends of reads
o Can clip reads to eliminate the ‘bad’ tail ends
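Clipping can be sketched as trimming bases off the 3' end of a read until a base with acceptable quality is reached. This is a minimal illustration, assuming Phred-style integer quality scores and an arbitrary cutoff of 20; real trimming tools use more elaborate rules (for example sliding windows), and the function name is hypothetical.

```python
def clip_tail(seq, quals, min_q=20):
    """Remove low-quality bases from the 3' end of a read.

    seq   : read sequence
    quals : per-base quality scores (same length as seq)
    min_q : assumed quality cutoff; bases below it are clipped from the tail
    """
    end = len(seq)
    # Walk back from the 3' end while the quality is below the cutoff.
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


read = "ACGTACGT"
quals = [35, 36, 34, 30, 28, 15, 10, 2]  # quality drops towards the tail
print(clip_tail(read, quals))
```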
Why is Assembly necessary?
- Assembly of short read fragments is necessary to obtain longer genomic contigs to:
o Determine genome sequence of uncultured organisms
o Obtain full-length CDS (coding DNA sequence) for subsequent characterization
What is a Pangenome?
o Entire gene set of all strains of a species. Includes:
o Core genome (genes present in all strains)
o Variable genome (genes present in only some strains)
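In set terms, the pangenome is the union of the strains' gene sets, the core genome is their intersection, and the variable genome is what remains. A minimal sketch with made-up strain names and gene content (all hypothetical):

```python
# Hypothetical gene content of three strains of one species.
strains = {
    "strain_A": {"gyrB", "recA", "rpoB", "toxin1"},
    "strain_B": {"gyrB", "recA", "rpoB", "phageX"},
    "strain_C": {"gyrB", "recA", "rpoB", "toxin1", "abcT"},
}


def pangenome_parts(strains):
    """Return (pangenome, core genome, variable genome) as sets of gene names."""
    gene_sets = list(strains.values())
    pan = set().union(*gene_sets)            # every gene seen in any strain
    core = set.intersection(*gene_sets)      # genes present in all strains
    variable = pan - core                    # genes present in only some strains
    return pan, core, variable


pan, core, variable = pangenome_parts(strains)
print("pangenome:", sorted(pan))
print("core:", sorted(core))
print("variable:", sorted(variable))
```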
Why are assembly algorithms that assume clonal genomes less suitable for Metagenomics?
- Microbial communities have significant variation at strain & species level
o The ‘clonal’ assumptions built into many assemblers might lead to suppression of contig formation for some heterogeneous taxa at specific parameter settings
o De Bruijn-type assemblers deal explicitly with non-clonality of natural populations
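To see why non-clonality matters, it helps to look at what strain variation does to a de Bruijn graph: two sequences that differ at one position share most k-mers but split into two paths at the variant. The sketch below only builds the graph and reports the branching node; it does not show how any particular assembler resolves such branches. The sequences, k = 4, and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k=4):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return graph


def branch_nodes(graph):
    """Nodes with more than one successor, i.e. where paths diverge."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Two hypothetical strains differing by a single base (G vs T at position 6).
strain_1 = "ATGGCGTACC"
strain_2 = "ATGGCTTACC"

graph = de_bruijn([strain_1, strain_2], k=4)
print(branch_nodes(graph))   # node(s) where the two strains diverge
print(sorted(graph["GGC"]))  # the two alternative continuations
```

An assembler that assumes a single clonal genome may treat one of the two branches as error, or stop the contig at the branch, which is the contig suppression described above.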
What are the 2 Assembly strategies for Metagenomics samples?
o Reference-based assembly (co-assembly):
- Works well if closely related reference genomes are available
- BUT: differences between sample genome & reference (large insertions, deletions etc.) can lead to a fragmented assembly or to divergent regions not being covered
o De novo assembly:
- Typically requires larger computation resources
What is Binning?
The process of sorting DNA sequences into groups that might represent an individual genome or genomes from closely related organisms
What are the 2 types of info within a DNA sequence that binning algorithms use?
- Genomes have a conserved nucleotide composition which will also be reflected in genomic DNA fragments
- An unknown DNA fragment might encode a gene which is similar to known genes in a reference database
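The first kind of information (composition) can be illustrated by comparing k-mer frequency profiles: a fragment is placed in the bin whose profile it resembles most. This is a toy sketch only, using dinucleotides (k = 2), cosine similarity, and invented sequences; real composition-based binners typically use longer k-mers and far longer sequences, and the similarity-based approach is not covered here. All names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Relative frequency of each k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def assign_bin(fragment, bins, k=2):
    """Return the name of the bin whose composition is closest to the fragment."""
    frag = kmer_profile(fragment, k)
    return max(bins, key=lambda name: cosine(frag, kmer_profile(bins[name], k)))


# Invented reference sequences standing in for two genomes.
bins = {
    "GC_rich_genome": "GCGCGGCCGCGGGCCGCGCC",
    "AT_rich_genome": "ATATTAAATTATAATATTTA",
}

print(assign_bin("GGCCGCGCGG", bins))
print(assign_bin("TTATAATTAA", bins))
```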
When using any binning algorithm what important considerations should be thought about?
o The type of input data available
o The existence of suitable training datasets or reference genomes