MPS Technology Part 2 Flashcards
(22 cards)
What are the common features of massively parallel sequencing technologies?
Even though they share these features, are they all different?
Random fragmentation of DNA
Ligation with custom adaptors to make a library
Paired-end read capability (read the sequence from either end)
“Digital” reads that enable quantitative comparison between one sample and the next
Yes, they are all very different; these are just the features they share
What is an Illumina sequencer?
How does it work?
Second (next) generation sequencing: amplification- and cluster-based sequencing
Don’t use it anymore
In vivo:
1. DNA fragmentation of the sample you want to sequence
- Cloning and amplification (in vivo) or adaptor ligation (in vitro), then denature the two strands by changing the pH
- The adaptors on the fragments bind to oligonucleotide probes fused to the glass of the flow cell
- Each type of nucleotide added carries a different-coloured fluorescent label
- As a nucleotide gets added to a fragment on the chip, it shows up as a single fluorescent dot of a single colour
- Via PCR, each bound fragment is amplified into a bundle of copies in one place, so the dot is bright enough to see. This bundle is a polony (PCR colony)
What does third generation sequencing mainly focus on?
Individual (single-molecule) sequencing
Explain in depth the sequencing by synthesis in Illumina sequencing
You start with single-stranded (ss) fragmented DNA, which binds to the surface of the flow cell
Bridge amplification: the free adaptor end of the ssDNA on the chip bends over and hybridizes to a complementary oligonucleotide coupled to the chip
Then nucleotides are added and a PCR-type reaction extends from the adaptor, building up bundles (clusters) of bridged copies
The double-stranded DNA in each bundle is then denatured so the sequencing reaction can happen
Flush in labelled nucleotides that are blocked so only one nucleotide is added at a time; the next nucleotide in the strand is incorporated
Capture a fluorescence image using microscopy; since the same nucleotide is added across the whole bundle, you see one intense fluorescent signal
Unblock the nucleotide (so another can be added), flush in new nucleotides, then image again
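The flush, image, unblock cycle described on this card can be sketched as a toy loop. This is only an illustration: the function name, the colour assignments, and the example template are made up, and strand direction is ignored for simplicity.

```python
# Toy model of sequencing by synthesis with blocked, labelled nucleotides.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical colour assignment, for illustration only.
DYE_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}

def sequence_by_synthesis(template, cycles):
    """Return the read and the list of colours imaged, one per cycle."""
    read = []
    colours = []
    for position in range(min(cycles, len(template))):
        # 1. Flush in blocked nucleotides: only the one complementary to the
        #    next template base is incorporated, and the block stops extension.
        base = COMPLEMENT[template[position]]
        # 2. Image the cluster: the colour identifies the incorporated base.
        colours.append(DYE_COLOUR[base])
        read.append(base)
        # 3. Cleave the dye and unblock (implicit here), then start the next cycle.
    return "".join(read), colours

read, colours = sequence_by_synthesis("TACGGA", cycles=4)
print(read)     # ATGC
print(colours)  # ['green', 'red', 'yellow', 'blue']
```

The number of cycles sets the read length, which is one reason the reads from this chemistry are short.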
What does sequencing by synthesis mean?
Using enzymatic (DNA polymerase) synthesis of DNA to carry the reaction forward
Challenges with Illumina sequencing data
Lots of depth, but the reads can't easily be assembled
It's short-read sequencing (you get a bunch of small reads), so you end up with lots of contigs
However, the genome has a lot of repeats, so if a repeat is 500 bp but the machine can only give 200 bp reads, you can't assemble across the rest of the repeat
If you try to map the contigs onto a reference genome (a finished genome), it's very hard because there are so many different small pieces to put together
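The repeat problem comes down to simple arithmetic: a read can only anchor a repeat if it covers the whole repeat plus some unique sequence on both sides. A minimal sketch, with hypothetical function names and a "one unique base on each side" rule chosen only to keep the example small:

```python
def read_spans_repeat(read_start, read_length, repeat_start, repeat_length):
    """True if the read covers the whole repeat plus at least one base on each side."""
    read_end = read_start + read_length        # exclusive end
    repeat_end = repeat_start + repeat_length  # exclusive end
    return read_start < repeat_start and read_end > repeat_end

def any_read_can_span(read_length, repeat_length):
    """A read needs the full repeat plus one flanking base on each side."""
    return read_length >= repeat_length + 2

print(any_read_can_span(200, 500))     # False: a 200 bp read cannot resolve a 500 bp repeat
print(any_read_can_span(10_000, 500))  # True: a long read can
```

When no read can span the repeat, the assembler cannot tell which flanking sequences belong together, so the assembly breaks into separate contigs there.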
What is SMRT (PacBio)?
Single molecule real-time sequencing
Massively parallel, since there is a different template in each well
Produces very long reads, which make it easier to computationally assemble the genome than with Illumina
Still a big machine, so not that accessible
Sequencing by synthesis at the level of single molecules
How is the sample prepared for PacBio SMRT?
Do shotgun sequencing or PCR amplification
Then ligate adapters to the repaired ends of the fragmented DNA
This gives adapters at the ends of the DNA, which can then be denatured and circularized at a certain temperature
Explain in depth how SMRT sequencing works
Nanofluidic chambers (wells); at the bottom of each well is an immobilized DNA polymerase
The fragmented DNA with the circularizing adaptors is threaded through the polymerase
Since the template is circular, the polymerase can keep going around and re-reading it to get a better consensus
The well's diameter is smaller than the wavelength of light, so the detection volume is small enough that only one nucleotide is detected at a time
There are labelled nucleotides in the well; when a nucleotide gets added, there is a brief pulse of fluorescence at the bottom of the well (because the nucleotide is fluorescent), which identifies what was added
Then the fluorophore detaches and the pulse is gone, and the next nucleotide comes in
Each well holds a different DNA fragment, so you get a long read of each fragment
What would be a source of error for PacBio SMRT sequencing?
Irregular fluorescence rate: some nucleotides get added slowly and some really fast, and the irregular rate causes problems in how the fluorescent signal is detected
Also, in homopolymeric tracts, the same fluorophore gets added multiple times in a row: if this happens really quickly, the computer can't tell how many repeats of that nucleotide there were
This leads to misjudging the number of bases (the length) of the homopolymeric tract
How do you fix the homopolymeric tract error in PacBio SMRT?
Use statistics: if 20% of reads are wrong, keep repeating the sequencing of the same molecule
Then make a consensus read, which has a really high average accuracy
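Why repeated passes help can be shown with a small probability calculation. This is a simplified sketch, not how PacBio's consensus software works: it assumes each pass miscalls a given base independently with the same probability (0.2, taken from the 20% figure on this card) and that the consensus is a simple majority vote. The function name is made up for the example.

```python
from math import comb

def majority_error(per_pass_error, passes):
    """Probability that more than half of `passes` independent passes miscall a base."""
    needed = passes // 2 + 1  # strict majority of wrong calls
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error)**(passes - k)
        for k in range(needed, passes + 1)
    )

# Odd pass counts avoid ties in the vote.
for passes in (1, 5, 15):
    print(passes, majority_error(0.2, passes))
```

Under these assumptions the consensus error drops quickly as the number of passes over the same circular molecule goes up.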
What is the order of read lengths for PacBio, Sanger, and Illumina?
Illumina: shortest
Sanger: in between (600-1000 bp)
PacBio: longest (more than 10,000 bp)
How do you filter the data from PacBio SMRT?
What does coverage mean?
Throw out the lowest 75% of reads and keep the best 25%
Coverage is how many times a given nucleotide has been read; high coverage makes the call more accurate
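Both ideas on this card are easy to express in a few lines. The sketch below is illustrative only: the card does not say what the reads are ranked by, so a per-read quality score is assumed here, and the function names and example reads are made up. Mean coverage is computed as total bases read divided by genome length.

```python
def keep_best_quarter(reads):
    """reads: list of (sequence, quality_score) tuples; keep the top 25% by score."""
    ranked = sorted(reads, key=lambda read: read[1], reverse=True)
    keep = max(1, len(ranked) // 4)  # always keep at least one read
    return ranked[:keep]

def mean_coverage(read_lengths, genome_length):
    """Average number of times each base in the genome is read."""
    return sum(read_lengths) / genome_length

# Made-up reads: (sequence, assumed quality score between 0 and 1).
reads = [("ACGTACGTACGT", 0.80), ("ACGTAC", 0.95), ("GG", 0.60), ("ACGTACGT", 0.90)]
print(keep_best_quarter(reads))  # [('ACGTAC', 0.95)]
print(mean_coverage([len(seq) for seq, _ in reads], genome_length=14))  # 2.0
```

Filtering lowers coverage, so throwing out 75% of reads only works when the run produced plenty of data to begin with.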
Explain HGAP
Hierarchical genome assembly process
Take the longest reads, assemble them as the longest seed reads, then assemble those seed reads into a finished genome using a reference
It does this in a hierarchical way
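The first step, picking the longest reads as seeds, can be sketched as below. This only illustrates the selection idea and is not the actual HGAP code: the function name and the rule of stopping once a target seed coverage is reached are assumptions made for the example.

```python
def select_seed_reads(read_lengths, genome_length, seed_coverage):
    """Pick the longest reads until they add up to seed_coverage x genome_length."""
    target = seed_coverage * genome_length
    seeds = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target:
            break  # enough seed bases collected
        seeds.append(length)
        total += length
    return seeds

# Made-up read lengths for a 10 bp toy genome, aiming for 3x seed coverage.
print(select_seed_reads([5, 20, 10, 15, 8], genome_length=10, seed_coverage=3))  # [20, 15]
```

The reads left over are not wasted in a hierarchical scheme; they can be used to support the seeds in later steps.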
Short reads and long reads
Illumina: short
PacBio and nanopore: long
How does nanopore sequencing work?
A nanometer-sized pore is inserted into a membrane
Current devices use the E. coli CsgG porin as the nanopore
A single DNA strand goes through the pore, and ions flow with it
The ions cause a change in conductance of the membrane that corresponds to the base going through the pore
1D reading (only one strand of DNA read)
What are the characteristics of nanopore?
What is the read size in nanopore limited by?
Lowest accuracy, but you can make up for this by repeating the experiment many times; hybrid assemblies also give better-quality genomes
Read size is limited only by the shearing of DNA during extraction
Accessible: low cost, $1000 for a starter kit
Bioinformatics solutions for it have improved (they used to be more limited)
Reads >100,000 bp
What are Flongle and nanopore designed for?
What should you consider for Illumina and PacBio?
To be smaller scale and operated in a single lab
But they don't have the biggest output (you might need more data than they actually give)
E.g. if you want to finish a bacterial genome, PacBio is good for that, but it has so much throughput that you wouldn't sequence just a single bacterial genome; you can sequence it many times over
But for a human genome PacBio may not be sufficient, because there are too many base pairs
If you want many reads and are counting individual reads to see what molecules are there, Illumina is good
What technology to use for a particular workflow
Okay