Sequencing and Bioinformatics Flashcards
dNTP vs ddNTP?
What do these abbreviations stand for?
Similarities and Differences?
Nickname for ddNTP?
Both can incorporate into a newly synthesised DNA strand
Deoxy Nucleotide (dNTP) - Oxygen present on 3’ C allows for chain extension
Dideoxy Nucleotide (ddNTP) - 3’ oxygen removed preventing chain extension; Terminator nucleotide
Explain Sanger Dideoxy sequencing
It is an example of ‘sequencing by _______’?
Sequencing by Synthesis
DNA is denatured, and new strand is synthesised with a mixture of dNTPs and ddNTPs
New DNA strand extends a known primer
As each nucleotide is added to the chain, there’s a chance that a terminator nucleotide will be added
If this occurs then no more bases can be added
The products are then run on a gel to figure out the sequence
Name of sequence once a ddNTP is added
Truncated sequence
Alternate ways of carrying out Sanger sequencing?
Running 4 reactions, each with a different ddNTP i.e. ddATP, ddGTP etc.
Explain Dye-Terminator Sequencing
What are the benefits?
Using fluorescently labelled ddNTPs
- This allows all the products to be run in the same lane in capillary gel electrophoresis
Each base is identified depending on the colour/wavelength of the fluorescent tag
This process can be automated and scaled up to industrialise the process of sequencing to sequence large mounts of DNA
How many bp can the ABI 370 sequencer read at a time?
800bp
Problems with sequencing human genome?
3 billion base pairs; Takes a long time if reading 800bp at a time
Piecing together the genome is also a challenge
Process the Human Genome Project Strategy (IHGSC) used? (hint: BACs)
- Extract human genomic DNA from multiple people
- Fragment DNA so they are small enough for the sequencers to read
- Size selection of 100-200kb fragments
- Clone fragments into Bacterial Artificial Chromosomes (BACs)
What are BACs and what do they do?
They are plasmids which contain sequence elements which trick E. coli into replacing/copying them during the cell cycle (as if they were a native plasmid)
Explain how BAC cloning is performed
BAC cloning amplifies DNA
BACs are transformed into E. coli cells and these grow colonies
These colonies grown contain millions of copies of the same fragment
What are genetic and physical maps used for in the sequencing of the genome?
Established a dense set of genetic markers across the human genome
What do genetic maps rely on?
What is the relationship between distance between markers and recombination frequency?
They rely on recombination frequency between markers
The further apart markers are from one another, the higher the recombination frequency
How do they link BAC clones to the genetic and physical maps?
Clones are tested for PCR markers that have known locations on genetic and physical maps
How are overlapping clones identified?
What is the likely origin for such clones?
The end of the insert, in the BAC containing the marker, is sequenced
Using the sequence used to design PCR primers, BAC clones containing that end sequence are looked for
Such BAC clones are likely to come from a neighbouring genetic locus
How are BAC clones organised?
Which types of BAC clones are identified?
They are ordered relative to the genetic map, with PCR primers
- These PCR primers are complementary to the sequence of a particular marker, allowing us to test and identify BAC clones that overlap the genetic marker
- Once we know that a BAC clone overlaps a marker, we know roughly where to place it on the genetic map
How is sequence data from the end of the insert obtained?
What is done with this data?
Sequence from the vector backbone into the BAC insert
This sequence data is used to design a new set of primers
How are these primers used with the BAC library?
Where are the clones identified likely to be?
They are ran with the BAC library to identify clones which contain the end sequence but not the original genetic marker
These are likely to be derived from a genomic region adjacent to the source of the original BAC clone
How is this BAC insert sequencing process repeated?
What can be done in the other direction?
The end of the second BAC clone insert is sequenced, allowing for another set of primers to be designed to identify a third BAC clone with an insert that overlaps the second
The same thing can be done at the other end of the original clone which overlaps the marker, to obtain overlapping BAC clones in the other direction
What is this whole approach of identifying and ordering overlapping BAC inserts called?
Chromosome walking
Allows a library of clones to be built up in the correct order relative to the genetic map
How does shotgun sequencing work?
Many fragments are sequenced at random and then assembled
Size of BAC inserts used in shotgun sequencing and how are the BAC clones generated?
BAC clone DNA is fragmented into smaller 5-10kb fragments and cloned into plasmid vectors
How are primers used in shotgun sequencing to derive plasmid insert sequence?
The sequence of the plasmid that the insert is in is known, so primers could be designed to sequence the insert
Done many times to derive a consensus of the insert sequence
What did Celera want to do with the human genome data?
Patent and commercialise it
What was the sequencing approach Celera used?
How is this different to IHGSC?
They used a shotgun Whole Genome Sequencing (WGS) approach
Instead of creating BAC clones, Celera fragmented into much smaller fragments of 2-50Kbp
What did Celera demonstrate with shotgun sequencing?
That shotgun sequencing was feasible for even large and repetitive genomes
Problems with Sanger sequencing on an industrial scale?
How did IHGSC get around this?
Doesn’t scale well as when reading, each capillary can only produce 1 sequence at a time
The IHGSC had ‘factories’ with hundreds of sequencers
What happened to the cost of sequencing in 2007 and why?
Price to sequence human genome dropped drastically from 2007 onwards due to Next-Generation Sequencing
What is massively parallel sequencing and how does it improve on Sanger sequencing?
What became the dominant platform for technologies like this?
Lots of molecules are sequenced at the same time as opposed to one; Reduced costs
Illumina is the main platform for this technology
How many sequence reads does 1 HiSeq run produce?
Length of sequencing reads?
≈8 billion sequence reads
≈100bp sequence reads
Illumina uses sequencing by _______?
What is used to detect each base?
Sequencing by synthesis
Fluorescent bases are used to detect each base
Why and how are the molecules amplified in Illumina sequencing?
Optical sensors are not sensitive enough to detect the signal from a single template molecule
PCR amplification of template molecules is done via a process called bridge amplification
First step of Illumina sequencing?
Fragment DNA and size select fragments of ≈500bp
Refer to cluster generation process images and explain the process
Outcomes of cluster generation?
Overlapping?
Generate millions of separate clusters, each with sequence data from a different region of the genome
Clusters are large enough to be detected when they fluoresce
Some clusters overlap, potentially resulting in the loss of a few reads; Doesn’t matter as there are so many clusters
What is used in Illumina’s cluster sequencing by synthesis process? (type of term. nucleotide)
Reversible terminator nucleotides; Block chain extension, but the block and dye can be removed
Once the block is removed it acts like a dNTP
Refer to cluster sequencing process and explain the process
Also explain how it is repeated on different clusters to generate the second read?
Size of fragments selected in Illumina library preparation?
500bp
How far apart are read pairs when doing 2 rounds of sequencing clusters?
500bp
Advantages of 3rd Gen sequencing over Illumina? (4 advantages)
Single molecule sequencing – No amplification required
Real time sequencing – Data is generated during the run
Ultra-long read lengths - Up to 50kb (PacBio) or >2Mb (nanopore)
Can directly identify base modifications such as methylation
Disadvantages of 3rd Gen sequencing over Illumina?
Fewer reads per run than Illumina
More expensive per base
Individual reads have a high error rate (although consensus accuracy is good) (high error rate is inevitable as you are sequencing a single molecule at a time)
What are ZMWs in PacBio SMRT cells?
How many?
Zero mode waveguides are wells that cover the aluminium surface
150,000 wells
What is the ds-template DNA bound to and where?
DNA is bound to a DNA polymerase and a sequencing primer
Polymerase is immobilised at the bottom of a well; 1 polymerase per well
What does ZMW do to light used to excite the fluorescently labelled nucleotides?
What does this allow for?
Only allows the light to penetrate a small distance into the well, exciting a very small volume
This allows the signal from a single fluorescently labelled nucleotide to be detected