5 Genomics Applications: Metagenomics and Amplicon Sequencing Flashcards
(28 cards)
What two methods are used for analyzing microbiomes
Amplicon sequencing
Metagenomics
What are metagenomics and amplicon sequencing used for
Used to understand microbial ecology and find:
Who is there: what is in the microbiome
- the species richness (the number of different taxa present in the microbiome)
- the species evenness (how evenly the relative abundance is spread across the taxa in the microbiome)
What are they doing:
- metagenomics can tell us about the functional chemistry of the organisms (like their metabolic functions, food sources, etc)
- also tells us what environmental products they consume and excrete
How are they doing it:
- can predict the type of enzymes and pathways in the organisms that are carrying out specific functions
- can tell us division of labour: if all enzymes or pathways are present in one organism or if they are stratified between multiple members of the community (this is 3D genomic technology)
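The "who is there" measures above (richness and evenness) can be computed from a table of read counts per taxon. The sketch below is a minimal illustration, not a standard pipeline: the taxon names and counts are made up, and evenness is measured with Simpson's evenness (inverse Simpson index divided by richness), which is only one of several possible evenness measures.

```python
def richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for c in counts.values() if c > 0)


def relative_abundance(counts):
    """Fraction of all reads that belongs to each taxon."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: c / total for taxon, c in counts.items()}


def simpson_evenness(counts):
    """Inverse Simpson index divided by richness; 1.0 means perfectly even."""
    proportions = relative_abundance(counts).values()
    inverse_simpson = 1.0 / sum(p * p for p in proportions)
    return inverse_simpson / richness(counts)


# Hypothetical samples: same richness, very different evenness.
even_sample = {"taxon_a": 25, "taxon_b": 25, "taxon_c": 25, "taxon_d": 25}
skewed_sample = {"taxon_a": 97, "taxon_b": 1, "taxon_c": 1, "taxon_d": 1}

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, richness(sample), round(simpson_evenness(sample), 3))
```

Both samples contain four taxa, so richness alone cannot tell them apart; the evenness value is what separates a balanced community from one dominated by a single taxon.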
Does amplicon sequencing equal metagenomics
No, the purpose of doing them and analysis of the data is different
Amplicon sequencing shows diversity at very specific loci
Metagenomics shows function
What is 16S rRNA important for
It’s a highly conserved type of rRNA in bacteria
Used in amplicon sequencing to identify and classify bacteria into different groups and to give an idea of the population diversity in large environmental communities of bacteria
Can be used to build phylogenetic trees of bacterial species
Characteristics of the 16S rRNA
How many bp
It’s the SSU (small subunit) rRNA and is about 1500 bp long
Has conserved and variable regions.
The variable regions are species specific (their sequences differ between species, which allows for classification)
What are the benefits and downsides of short reads (Illumina) and long reads (PacBio)
Short read:
- gives good depth since each region is sequenced many times, but covers a shorter part of the amplicon
- doesn’t give as good a taxonomic classification as long read does
Long read:
- has lower depth because the output of those sequencers is smaller (longer reads but fewer of them)
- can give good taxonomic classifications because it covers the full length of the rRNA gene
What are OTU
What is the rDNA identity threshold to be classified as in the same genus or species
Operational taxonomic unit
A working species distinction in microbiology. It uses SSU (small subunit) rRNA and a percent similarity threshold to classify microbes as being within the same or different OTUs
Genus: they’re in the same genus if 16S rDNA identity is > 95 %
Species: they’re in the same species if 16S rDNA identity is > 98.7 %
But these thresholds don’t work in all situations
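The identity thresholds above can be illustrated with a small calculation. This is a simplified sketch: it assumes the two 16S rDNA sequences are already aligned and the same length, it ignores gaps, and the function names and example sequences are hypothetical. Real workflows use alignment or clustering tools rather than position-by-position comparison.

```python
SPECIES_THRESHOLD = 98.7  # percent identity, from the card above
GENUS_THRESHOLD = 95.0    # percent identity, from the card above


def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)


def classify_pair(identity):
    """Apply the genus and species thresholds to a percent identity."""
    if identity > SPECIES_THRESHOLD:
        return "same species"
    if identity > GENUS_THRESHOLD:
        return "same genus"
    return "different genus"


# Made-up 100 bp reference and three variants with 1, 3 and 10 mismatches.
reference = "ACGT" * 25
for mismatches in (1, 3, 10):
    variant = "N" * mismatches + reference[mismatches:]
    identity = percent_identity(reference, variant)
    print(mismatches, identity, classify_pair(identity))
```

Because the thresholds are hard cut-offs, two sequences at 98.6 % and 98.8 % identity land in different categories, which is one reason the rule does not work in every situation.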
How can databases be used to tell the phylogenies of the organism
There are databases of genes that hold a variety of sequences
Comparing an organism’s genes to these databases can help you determine its phylogeny, based on conserved loci in its own genes
Explain how amplicon sequencing works
Uses two-step PCR
- Overhang adapter:
- have a primer that binds the gene you want to amplify, at the conserved region of the 16S rDNA
- these primers also have adapters of synthetic DNA attached to them, which complement another primer later
- use PCR to amplify the rDNA region; the product now has the overhang adapter sequence and the primer sequence on either end
- Index (genetic barcode):
- use another primer that has a sequencing adapter (to bind the flow cell) and a genetic barcode (to identify one amplicon sample from the next and multiplex) on it to bind the overhang adapter
- PCR
- the amplicon PCR product now has the sequencing adapter and barcode
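The layout of the final library molecule, and why the barcode allows multiplexing, can be sketched with plain strings. This is a toy model under loud assumptions: every sequence below is a made-up placeholder (not a real primer, adapter, or index), each molecule is written as a single strand read left to right, and reverse complements are ignored.

```python
def amplicon_pcr(target, primer_f, primer_r, overhang_f, overhang_r):
    """Step 1: product has the primer and overhang adapter on either end."""
    return overhang_f + primer_f + target + primer_r + overhang_r


def index_pcr(step1_product, adapter_f, adapter_r, barcode):
    """Step 2: add sequencing adapter and barcode outside the overhangs."""
    return adapter_f + barcode + step1_product + barcode + adapter_r


def demultiplex(library, adapter_f, barcodes):
    """Sort pooled molecules by the barcode that follows the adapter."""
    assigned = {sample: [] for sample in barcodes}
    start = len(adapter_f)
    for molecule in library:
        for sample, barcode in barcodes.items():
            if molecule.startswith(barcode, start):
                assigned[sample].append(molecule)
    return assigned


# Placeholder sequences (hypothetical, for illustration only).
PRIMER_F, PRIMER_R = "CCTACG", "GGATTA"
OVERHANG_F, OVERHANG_R = "TCGTCG", "GTCTCG"
ADAPTER_F, ADAPTER_R = "AATGAT", "CAAGCA"
BARCODES = {"sample_1": "AAC", "sample_2": "GGT"}
TARGETS = {"sample_1": "ATATATAT", "sample_2": "CGCGCGCG"}

pool = []
for sample, target in TARGETS.items():
    step1 = amplicon_pcr(target, PRIMER_F, PRIMER_R, OVERHANG_F, OVERHANG_R)
    pool.append(index_pcr(step1, ADAPTER_F, ADAPTER_R, BARCODES[sample]))

sorted_reads = demultiplex(pool, ADAPTER_F, BARCODES)
print({sample: len(reads) for sample, reads in sorted_reads.items()})
```

After pooling, the only thing that distinguishes the two samples is the barcode, which is what lets many samples share one sequencing run.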
What are curated databases
Give example of one
There has been an extra quality control/human intervention step that gives the data in the database higher reliability and fidelity
This makes a curated database, which you can use to build phylogenies. Ex. Greengenes
If you use an uncurated database to do a taxonomic/phylogeny assignment based on 16S rRNA, then there’s a chance the data is misinterpreted
What is the data analysis for amplicon sequencing
Hardest step
You do statistical and functional analysis like:
Using Ecological theory:
- Shannon diversity index
- alpha, beta, and gamma diversity estimates
- Procrustes analysis
Making community functional inferences:
- use a bioinformatics algorithm (like PICRUSt) to predict the function of the species in the environment you’re looking at
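Two of the ecological measures listed above can be sketched directly from read counts. This is a minimal illustration with made-up counts, assuming each sample contains at least one read: the Shannon index is used as a within-sample (alpha) measure, and Bray-Curtis dissimilarity is swapped in as one common between-sample (beta) measure; the card does not say which beta measure the course uses.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with at least one read."""
    total = sum(counts.values())
    proportions = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 0 = identical samples, 1 = no shared taxa."""
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 1.0 - 2.0 * shared / total


# Hypothetical read counts for two samples.
sample_1 = {"taxon_a": 10, "taxon_b": 10}
sample_2 = {"taxon_a": 10, "taxon_c": 10}

print(shannon_index(sample_1))
print(bray_curtis(sample_1, sample_2))
```

The Shannon index rises with both richness and evenness, so it summarizes one sample, while Bray-Curtis compares the composition of two samples.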
What are the downsides of using PICRUSt to infer the function of the species in the environment you’re looking at
How do you fix this
It’ll assume similar function based on genetic similarity:
- Since it uses just 16S rDNA sequences, organisms can look very similar in DNA (because they’re in the same environment) even though they’re different species
Also:
- There is a lot of diversity in bacteria and you can’t assume that any two strains are going to have the same function (so although they’re the same species they could have different functions)
Fix: run lab tests to check your predictions
What is the output of amplicon sequencing
A chart that shows the relative abundance of bacterial taxa
Ex. If you have cystic fibrosis patients, you can do amplicon sequencing on their lung extract to see the diversity of bacteria:
- the abundance of different bacterial taxa changes depending on whether you have the disease or not. Can use amplicon sequencing to show which taxa are most abundant
What are the problems with sequencing-based 16S workflows (and basically all amplicon sequencing)
Contamination:
- since you amplified your sample using PCR, if contaminants (like contaminating microbes) show up, their sequences are also amplified and the results are thrown off
Extraction bias:
- based on how you extract your sample, you could lose certain genes/organisms present in the sample
- this can skew the abundance of what’s in the sample and then the analysis
PCR jackpot:
- Can get random errors (which look like polymorphisms) in the genes early in PCR
- Then the error gets amplified in the later cycles of PCR
- Then it gets fixed into the population of amplified genes
- This can throw off the estimate of OTUs, which throws off your classifications
Also chimeras
How can using Illumina sequencing for amplicon sequencing be odd
For amplicon sequencing you want the longest read possible
Since Illumina makes short reads, a single read isn’t long enough to cover the entire 16S rDNA amplicon that you got from PCR
So the sequencer does one short read from one end of the amplicon and another from the other end
Then the computer does stitching, where it takes the two reads of the one amplicon and joins them at their overlap, removing the lowest-confidence ends
Then you get the fully sequenced amplicon
But this won’t work well if the reads are too far apart (not enough overlap)
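The stitching step can be sketched as finding the overlap between the forward read and the reverse complement of the reverse read. This is a simplified illustration: it requires an exact match in the overlap and ignores quality scores, so it leaves out the low-confidence trimming described above. The function names, the `min_overlap` default, and the example amplicon are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward_read, reverse_read, min_overlap=5):
    """Stitch a read pair at its longest exact overlap, or return None."""
    rev = reverse_complement(reverse_read)  # put both reads on one strand
    longest = min(len(forward_read), len(rev))
    for overlap in range(longest, min_overlap - 1, -1):
        if forward_read[-overlap:] == rev[:overlap]:
            return forward_read + rev[overlap:]
    return None  # reads are too far apart to stitch


# Made-up 30 bp amplicon, read 20 bases from one end and 18 from the other.
amplicon = "ACGTTGCAAGGCTTACCGATGGATCCTAGA"
forward = amplicon[:20]
reverse = reverse_complement(amplicon[12:])

print(merge_pair(forward, reverse) == amplicon)

# Reads that do not reach each other cannot be stitched.
print(merge_pair(amplicon[:10], reverse_complement(amplicon[20:])))
```

The second call shows the failure case from the card: when the two reads do not overlap, there is nothing to join them on.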
Explain the problem with chimeric 16S rDNA sequences in amplicon sequencing
During PCR, the polymerase can fall off so extension is aborted and the rest of the amplified product isn’t made
Then in a later cycle the partly amplified DNA acts as a primer: it anneals at a conserved region of a different template and is extended, so the new strand joins sequence from two different templates
Then you get a chimera (hybrid) that looks like a completely new sequence that never existed
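Chimera formation can be mimicked with strings: an aborted product from one template ends in a conserved region, then gets extended along a second template that shares that region. This is a toy sketch; the templates, the conserved region, and the function name are all made up, and real chimera detection relies on dedicated tools rather than anything like this.

```python
def form_chimera(template_a, template_b, conserved_region):
    """Join the start of template A to the end of template B at a shared region."""
    break_a = template_a.find(conserved_region)
    break_b = template_b.find(conserved_region)
    if break_a == -1 or break_b == -1:
        raise ValueError("both templates must contain the conserved region")
    # Aborted extension: the product stops at the end of the conserved region.
    partial_product = template_a[: break_a + len(conserved_region)]
    # Next cycle: the partial product anneals to template B and is extended.
    downstream_of_b = template_b[break_b + len(conserved_region):]
    return partial_product + downstream_of_b


# Hypothetical templates: variable region, conserved region, variable region.
CONSERVED = "GGATTAGG"
species_a = "ACACACAC" + CONSERVED + "TGTGTGTG"
species_b = "CTCTCTCT" + CONSERVED + "GAGAGAGA"

chimera = form_chimera(species_a, species_b, CONSERVED)
print(chimera)
print(chimera in (species_a, species_b))
```

The result has a variable region from each parent, so it matches neither template and can be mistaken for a new taxon.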
What is benchmarking
Depending on the amplicon sequencing workflow and the algorithm used to analyze the reads, you can get different outcomes, so the organisms can be classified in different ways
So you need to be careful about how you interpret and present the data
What is a metagenome and a genome
Genome: the entirety of an organism’s hereditary info
Metagenome: all the hereditary information in an environmental sample, consisting of the genomes of many individual organisms
What is the workflow of metagenomics
Making a DNA library of all the organisms present in the environmental sample you collected (so not just the DNA of one organism and not just 16S rDNA)
Once you have the DNA, it’s the same as any other sequencing workflow
What are the uses of metagenomics
Exploration of what was in the sample based on the DNA
Comparisons between environments or across different days
Identify specific functions of the organisms based on the DNA
Extract whole genomes of organisms from the metagenomic data