Lecture 07: DNA sequencing Flashcards

Question 1

Q

Human genome facts

Answer

A

Human genome length (nucleotides)?
- ~3.2 Gb (haploid)

Human genome length (metres)?
- ~2 m

Human genome mass?
- 0.000000000003 g (about 3 pg)
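
As a rough cross-check of these figures, here is a minimal sketch. It assumes B-form DNA with a rise of about 0.34 nm per base pair, an average molar mass of about 650 g/mol per base pair, and a haploid genome of about 3.2 x 10^9 bp. These are common textbook approximations, not values taken from the lecture, and the function names are made up for illustration.

```python
AVOGADRO = 6.022e23        # molecules per mole
RISE_PER_BP_M = 0.34e-9    # assumed rise per base pair in B-form DNA, in metres
MASS_PER_BP_G_MOL = 650.0  # assumed average molar mass of one base pair, in g/mol


def genome_length_m(base_pairs):
    """Contour length in metres of a stretched-out double helix."""
    return base_pairs * RISE_PER_BP_M


def genome_mass_g(base_pairs):
    """Mass in grams of double-stranded DNA with the given number of base pairs."""
    return base_pairs * MASS_PER_BP_G_MOL / AVOGADRO


haploid_bp = 3.2e9  # assumed haploid genome size

print(f"diploid length: {genome_length_m(2 * haploid_bp):.2f} m")
print(f"haploid mass:   {genome_mass_g(haploid_bp):.2e} g")
```

Under these assumptions, the ~2 m length matches the diploid genome of one cell (both chromosome sets), while a mass of about 3 pg matches a single haploid copy.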

Question 2

Q

What can be sequenced?

Answer

A

Whole genome: de novo sequencing or re-sequencing
Targeted (SNP, RAD, exome)
Single individual vs. multiple individuals (poolseq)
Multiple taxa (metabarcoding, metagenomics)
RNA → DNA (transcriptome)

Exome: all the exons, i.e. the sequences that remain in the transcript after splicing -> all sections of the genome that potentially code for proteins

Question 3

Q

De-novo sequencing - vocab

Answer

A

Reads: the original sequences produced by the sequencer; the computer compares them and arranges them by sequence overlap -> the longer the reads, the better for analysis
Contigs: contiguous sequences built by joining multiple overlapping reads
Scaffold: contigs placed in order and orientation, with the estimated distances (gaps) between them
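
To make the step from reads to contigs concrete, here is a toy greedy overlap assembler. It is only a sketch under strong assumptions: reads are error-free, all come from the same strand, and a minimum overlap of 3 bases is enough to join them. The function names are made up, and real assemblers use far more elaborate methods. Scaffolding, which needs extra distance information, is not shown.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "ACCTTAG"]
print(greedy_assemble(reads))  # → ['ATGGCGTACCTTAG']
```

Longer reads give longer and less ambiguous overlaps, which is one reason read length matters so much for assembly.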

Question 4

Q

Notable achievements

Answer

A

2008: first human genome sequenced by massively parallel DNA sequencing
2010: 185 low-coverage human genomes, 697 exomes
2010: first Pleistocene human genome
2014: 48 bird genome assemblies

-> Conclusion: learn bioinformatics & programming because the data sets get bigger and bigger

Question 5

Q

Technology timeline

Answer

A

1975: Sanger sequencing
Automated Sanger sequencing
2005: “Next generation” Sequencing (NGS)

Question 6

Q

Sanger Sequencing

Answer

A

First DNA sequencing method, used for about 30 years
Like most sequencing methods, it is template based
Start with a single strand of DNA, which is produced by using a single primer
You set up four “sequencing reactions”, each of which contains:
DNA template
Primers
Nucleotides
A small proportion of one 32P-labelled dideoxynucleotide (A, T, G or C)
The dideoxynucleotides stop extension of the DNA chain
Different chains will have different lengths -> they can be separated by gel electrophoresis
Separate your four reactions on four lanes of a large gel
(Later 4 different fluorophores were used meaning you could run 1 sample per lane)
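
The logic of reading a four-lane Sanger gel can be sketched as follows. This is an idealised illustration, not a simulation of the chemistry: it assumes that every possible terminated fragment is present and perfectly resolved, and the function names are made up. Each lane holds the lengths of the fragments that ended in that lane's dideoxynucleotide. Reading the bands from shortest to longest across all four lanes recovers the newly synthesised strand.

```python
def sanger_lanes(new_strand):
    """For each dideoxynucleotide reaction, list the fragment lengths that terminate there."""
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)  # a chain stopped here is `position` bases long
    return lanes


def read_gel(lanes):
    """Read the bands from shortest to longest fragment across the four lanes."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = sanger_lanes("GATTACA")
print(lanes["A"])       # → [2, 5, 7]
print(read_gel(lanes))  # → GATTACA
```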

Question 7

Q

Pre- versus post-NGS

Answer

A

Sanger sequencing:
384 reads per run, up to ~300,000 bp in total
Roche 454 sequencing (2005):
300,000 reads per run, up to 20,000,000 bp in total
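
A quick bit of arithmetic on these numbers, assuming they are totals per run (an interpretation, not something the card states explicitly):

```python
def mean_read_length(total_bp, reads):
    """Average read length implied by a run's total output and read count."""
    return total_bp / reads


sanger_reads, sanger_bp = 384, 300_000
roche_reads, roche_bp = 300_000, 20_000_000

print(f"Sanger mean read length: {mean_read_length(sanger_bp, sanger_reads):.0f} bp")
print(f"454 mean read length:    {mean_read_length(roche_bp, roche_reads):.0f} bp")
print(f"Output per run:          {roche_bp / sanger_bp:.0f}x higher with 454")
```

On this reading, the first NGS platform produced far shorter reads than Sanger sequencing, but so many more of them that output per run rose by well over an order of magnitude.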

Question 8

Q

Current sequencing technologies: Illumina

Answer

A

Sample prep
Bind DNA to flowcell, generate clusters
Sequencing by synthesis
Data analysis

Question 9

Q

Illumina sequencing more detailed

Answer

A

Ligation of adapters to each end of the DNA molecule
Single strands are coupled to glass slides via the adapters
Bridge amplification produces “PCR colonies” (“polonies”), i.e. clusters
For subsequent sequencing, nucleotides are blocked, so no more than one can be incorporated per cycle
Four fluorescent dyes, one for each base, allow detection by imaging
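
The detection step can be pictured as follows: in every cycle each cluster lights up in one of four colours, and the brightest colour is taken as the base. This is a deliberately simplified sketch with made-up intensity values and hypothetical function names. Real base callers also correct for effects such as signal cross-talk between dyes and strands within a cluster falling out of step, which are ignored here.

```python
CHANNELS = "ACGT"


def call_base(intensities):
    """Pick the base whose dye channel is brightest for one cluster in one cycle."""
    return max(CHANNELS, key=lambda base: intensities[base])


def call_read(cycles):
    """One base per cycle, because blocked nucleotides allow only one incorporation."""
    return "".join(call_base(cycle) for cycle in cycles)


# Hypothetical intensities for one cluster over four cycles.
cycles = [
    {"A": 0.05, "C": 0.10, "G": 0.90, "T": 0.02},
    {"A": 0.85, "C": 0.07, "G": 0.11, "T": 0.04},
    {"A": 0.03, "C": 0.06, "G": 0.09, "T": 0.88},
    {"A": 0.80, "C": 0.12, "G": 0.05, "T": 0.10},
]
print(call_read(cycles))  # → GATA
```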

Question 10

Q

Illumina- Considerations

Answer

A

Cluster density: under-clustered, optimally clustered or over-clustered
Read length and quality:
* High throughput
* High sequencing quality
* Limited read length (to some extent – up to 2 x 300 bp now possible)
Assembly is a problem

-> lowest cost per base, but a full run costs about $10,000

Advantages: high throughput and high sequencing quality, relatively cheap

Disadvantages: limited read length, quality declines with higher read lengths

Question 11

Q

Hi-C sequencing

Answer

A

Based on Illumina sequencing
Uses chromatin conformation information
Allows better scaffolding
Example: Chinese mitten crab

Question 12

Q

Newer technologies

Answer

A

PacBio
Oxford Nanopore
(Bionano)

Primary focus: increase read length → improved genome assemblies

Question 13

Q

Pacific Biosciences (PacBio)

Answer

A

Library preparation: make a circular template by ligating adapters onto dsDNA
Add primer and polymerase to the sample
SMRT Cell with zero-mode waveguides (ZMWs)
Each template molecule sits in its own ZMW
Every labelled nucleotide incorporated by the polymerase emits light

Real time sequencing

Question 14

Q

Pacific Biosciences

Answer

A

Strictly, single molecule reaction monitoring
No washing: cheap on reagents
No stop-and-go synthesis as in other systems
Recent upgrade to 8x more reads per run
Read length is up to ~40,000 bases
Initially high error rate
Highly competitive for long reads

Question 15

Q

Pacific Biosciences HiFi

Answer

A

Strictly, single molecule reaction monitoring
No washing: cheap on reagents
No stop-and-go synthesis as in other systems
Repeated sequencing of circularized molecule
Read length is “only” ~15,000 bases on average
Very high accuracy
Excellent for de-novo assembly
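
Why repeated sequencing of the same circular molecule raises accuracy can be shown with a simple majority vote. This sketch assumes the passes are already aligned, have equal length and contain only substitution errors. Real long reads also contain insertions and deletions, so actual consensus calling needs an alignment step first. The function name and the example passes are made up.

```python
from collections import Counter


def consensus(passes):
    """Majority vote per position over several passes of the same molecule."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch needs passes of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five hypothetical passes over the same molecule; three contain one error each.
passes = [
    "ACGTACGT",
    "ACCTACGT",
    "ACGTACGA",
    "ACGTTCGT",
    "ACGTACGT",
]
print(consensus(passes))  # → ACGTACGT
```

Because random errors rarely hit the same position in most passes, the consensus is much more accurate than any single pass.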

Question 16

Q

Nanopore technology

Answer

A

Proposed and started in the early 1990s in Santa Cruz and Harvard!
* Based on threading a single strand of DNA through a microscopic hole in a membrane
* Creating an electric field across the membrane causes the DNA to pass through
* Measuring the electrical properties of the hole (in current devices, the ionic current through it) should tell you which base is passing through it
* Accuracy proved to be a bit of a problem at first, but it is now also excellent – up to 99.9%
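
To get a feel for what an accuracy figure means on a long read, the expected number of wrong bases is simply read length times error rate. In the sketch below, the 50,000-base read and the 90% accuracy used for comparison are illustrative assumptions, not figures from the lecture.

```python
def expected_errors(read_length, accuracy):
    """Expected number of miscalled bases in one read at a given per-base accuracy."""
    return read_length * (1.0 - accuracy)


read_length = 50_000  # hypothetical long read

for accuracy in (0.90, 0.999):
    errors = expected_errors(read_length, accuracy)
    print(f"accuracy {accuracy:.1%}: about {errors:.0f} errors per read")
```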

Question 17

Q

Oxford Nanopore principle

Answer

A

Array of microscaffolds
Each microscaffold supports a membrane and embedded nanopore.

Sensor chip
Each microscaffold corresponds to its own electrode that is connected to a channel in the sensor array chip.

ASIC
Each nanopore channel is controlled and measured individually by the bespoke ASIC. This allows for multiple nanopore experiments to be performed in parallel.

Question 18

Q

Oxford Nanopore in detail (Picture)

Question 19

Q

Some applications for Nanopore sequencing

Answer

A

Genome assembly
Detection of structural variants, e.g. long (> 1 kb) tandem repeats
RNA seq analysis of splice isoforms
Real-time detection of pathogens (e.g. Ebola in recent epidemic)

Question 20

Q

Oxford Nanopore sequencing summary

Answer

A

Extremely long reads (N50 of 50 kb, even > 4 Mb have been reported)
Directly accessing original DNA molecules
Capable of distinguishing modified bases directly
Comparatively cheap and fast, especially sample preparation
Small versions portable
Originally low accuracy, now also > 99%
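
The N50 mentioned above is a length-weighted summary: sort the reads from longest to shortest and add up their lengths until half of all bases are covered; the length of the read that crosses that point is the N50. A minimal sketch, with a made-up function name and made-up read lengths:

```python
def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Hypothetical read lengths in kb.
print(n50([80, 70, 50, 40, 30, 20]))  # → 70
```

The same statistic is commonly used for contig and scaffold lengths when comparing genome assemblies.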

Question 21

Q

Conclusion

Answer

A

DNA sequencing throughput has increased massively and continues to do so
NGS has transformed numerous fields of molecular biology
Data analysis and storage are the current bottleneck
New technologies focus on read length
Long read sequencing for de-novo assemblies, Illumina for re-sequencing
