Lecture 07: DNA sequencing Flashcards

1
Q

Human genome facts

A

Human genome length (nucleotides)?
- ~3.2 Gb (haploid)

Human genome length (metres)?
- ~2 m (all the DNA of one diploid cell, stretched out)

Human genome mass?
- 0.000000000003 g (about 3 pg)
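
These figures can be checked with a back-of-the-envelope calculation. The sketch below is only an estimate: the genome size of 3.2 × 10^9 bp, the rise of 0.34 nm per base pair (B-form DNA) and the average mass of about 650 g/mol per base pair are textbook approximations assumed here, not values taken from the lecture.

```python
AVOGADRO = 6.022e23            # molecules per mole
RISE_PER_BP_M = 0.34e-9        # assumed: metres per base pair in B-form DNA
MASS_PER_BP_G_PER_MOL = 650.0  # assumed: average molar mass of one base pair


def dna_length_m(base_pairs):
    """Stretched-out length of double-stranded DNA in metres."""
    return base_pairs * RISE_PER_BP_M


def dna_mass_g(base_pairs):
    """Approximate mass of double-stranded DNA in grams."""
    return base_pairs * MASS_PER_BP_G_PER_MOL / AVOGADRO


haploid_bp = 3.2e9             # assumed haploid genome size
diploid_bp = 2 * haploid_bp    # two copies per diploid cell

print(f"haploid length: {dna_length_m(haploid_bp):.2f} m")
print(f"diploid length: {dna_length_m(diploid_bp):.2f} m")
print(f"haploid mass:   {dna_mass_g(haploid_bp):.2e} g")
```

Under these assumptions, the ~2 m figure matches a diploid cell, while a mass of a few picograms matches one haploid copy.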

2
Q

What can be sequenced?

A
  • Whole genome: de novo sequencing or re-sequencing
  • Targeted (SNP, RAD, exome)
  • Single individual vs. multiple individuals (poolseq)
  • Multiple taxa (metabarcoding, metagenomics)
  • RNA → DNA (transcriptome)

Exome: all the exons of the genome, i.e. the sequences that remain in the mature RNA after splicing -> all sections that potentially code for proteins

3
Q

De-novo sequencing - vocab

A
  • Reads: the original sequences; the computer compares them and arranges them by overlapping similarity -> the longer the reads, the better for analysis
  • Contigs: multiple overlapping reads joined into one continuous sequence
  • Scaffolds: contigs placed in order and orientation, with the estimated distances (gaps) between them
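
The step from reads to a contig can be illustrated with a toy example. This is a minimal sketch that assumes error-free reads given in the right order. Real assemblers use overlap or de Bruijn graphs and tolerate sequencing errors, so the function names and the `min_overlap` parameter below are purely illustrative.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads into one sequence, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Three hypothetical reads that overlap by four bases each.
reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]

contig = reads[0]
for read in reads[1:]:
    merged = merge_reads(contig, read)
    if merged is None:
        break  # no overlap found: the contig ends here
    contig = merged

print(contig)  # ATGCGTACCTGATTAG
```

Longer reads give longer and less ambiguous overlaps, which is why read length matters so much for assembly.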
4
Q

Notable achievements

A
  • 2008: first human genome sequenced with massively parallel DNA sequencing
  • 2010: 185 low-coverage human genomes, 697 exomes
  • 2010: first Pleistocene human genome
  • 2014: 48 bird genome assemblies

-> Conclusion: learn bioinformatics and programming, because the data sets keep getting bigger

5
Q

Technology timeline

A
  1. 1975: Sanger sequencing
  2. Automated Sanger sequencing
  3. 2005: “Next generation” Sequencing (NGS)
6
Q

Sanger Sequencing

A
  • First DNA sequencing method; used for about 30 years
  • Like most sequencing methods, it is template-based
  • Start with a single strand of DNA, which is produced by using a single primer
  • You set up four “sequencing reactions”, each of which contains:
    • DNA template
    • Primers
    • Nucleotides
    • A small proportion of one 32P-labelled dideoxynucleotide (A, T, G or C)
  • The dideoxynucleotides stop extension of the DNA chain
  • Different chains end up with different lengths -> they can be separated by gel electrophoresis
  • Separate your four reactions on four lanes of a large gel
    (Later, four different fluorophores were used, so the four reactions of one sample could be run in a single lane)
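
How the gel is turned back into a sequence can be sketched in a few lines. This is a deliberately simplified model: it assumes every possible terminated chain is present, ignores the primer length, and uses hypothetical function names. It only shows the logic of reading bands from the shortest fragment to the longest.

```python
def sanger_fragments(synthesized_strand):
    """For each of the four ddNTP reactions, list the lengths of the terminated chains.

    A chain of length n ends up in the reaction whose dideoxynucleotide
    matches the base at position n of the newly synthesized strand.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the four lanes from the shortest fragment to the longest."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


lanes = sanger_fragments("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```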
7
Q

Pre- versus post-NGS

A
  • Sanger sequencing:
    384 reads per run, up to ~300,000 bp in total
  • Roche 454 sequencing (2005):
    300,000 reads per run, up to 20,000,000 bp in total
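
The numbers on this card can be compared directly. The sketch below assumes that both base-pair figures are totals per run, which is an interpretation of the card rather than something it states explicitly.

```python
# Figures from the card, interpreted as per-run totals (assumption).
runs = {
    "Sanger": {"reads": 384, "total_bp": 300_000},
    "Roche 454 (2005)": {"reads": 300_000, "total_bp": 20_000_000},
}


def mean_read_length(run):
    """Average read length in bp, given a read count and the total yield."""
    return run["total_bp"] / run["reads"]


for name, run in runs.items():
    print(f"{name}: ~{mean_read_length(run):.0f} bp per read")

yield_gain = runs["Roche 454 (2005)"]["total_bp"] / runs["Sanger"]["total_bp"]
print(f"Yield per run: ~{yield_gain:.0f}x higher with 454")
```

Read this way, the first NGS platform produced far more sequence per run, but as much shorter reads.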
8
Q

Current sequencing technologies: Illumina

A
  1. Sample prep
  2. Bind DNA to flowcell, generate clusters
  3. Sequencing by synthesis
  4. Data analysis
9
Q

Illumina sequencing more detailed

A
  • Adapters are ligated to each end of the DNA molecule
  • Single strands are coupled to a glass slide (the flowcell) via the adapters
  • Bridge amplification produces “PCR colonies” (“polonies”), also called clusters
  • For the subsequent sequencing, nucleotides are blocked, so no more than one can be incorporated per cycle
  • Four fluorescent dyes, one per base, allow detection by imaging
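
Because only one blocked nucleotide is incorporated per cycle, each image yields one base per cluster. The following sketch assumes idealized, made-up intensity values for a single cluster and simply picks the brightest of the four dye channels in every cycle. Real base callers also correct for channel cross-talk and phasing, which is not modelled here.

```python
CHANNELS = ("A", "C", "G", "T")  # one fluorescent dye per base


def call_bases(cycle_intensities):
    """Call one base per cycle by choosing the brightest dye channel."""
    read = []
    for intensities in cycle_intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities (A, C, G, T) for one cluster over four cycles.
cluster_signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 0.7),
    (0.2, 0.9, 0.1, 0.0),
]

print(call_bases(cluster_signal))  # AGTC
```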
10
Q

Illumina: considerations

A
  1. Cluster density: under-clustered, optimally clustered or over-clustered
  2. Read length and quality:
    * High throughput
    * High sequencing quality
    * Limited read length (up to 2 x 300 bp is now possible)
  3. Assembly is a problem because the reads are short

-> Lowest cost per base, but a full run costs about $10,000

Advantages: high throughput, high sequencing quality, relatively cheap

Disadvantages: limited read length; quality declines as read length increases

11
Q

Hi-C sequencing

A
  • Based on Illumina sequencing
  • Uses chromatin conformation information
  • Allows better scaffolding
    Example: Chinese mitten crab
12
Q

Newer technologies

A
  • Pac Bio
  • Oxford Nanopore
  • (Bionano)

Primary focus: increase read length → improved genome assemblies

13
Q

Pacific Biosciences (PacBio)

A
  1. Library design: a circular template is made by ligating adapters to dsDNA
  2. Add primer and polymerase to the sample
  3. Load onto a SMRT Cell containing zero-mode waveguides (ZMWs)
  4. Each template molecule sits in its own ZMW
  5. Every labelled nucleotide incorporated by the polymerase emits light

-> Real-time sequencing

14
Q

Pacific Biosciences

A
  • Strictly, single molecule reaction monitoring
  • No washing: cheap on reagents
  • No stop-and-go synthesis as in other systems
  • Recent upgrade to 8x more reads per run
  • Read length is up to ~40,000 bases
  • Initially high error rate
  • Highly competitive for long reads
15
Q

Pacific Biosciences HiFi

A
  • Strictly, single molecule reaction monitoring
  • No washing: cheap on reagents
  • No stop-and-go synthesis as in other systems
  • Repeated sequencing of circularized molecule
  • Read length is “only” ~15,000 bases on average
  • Very high accuracy
  • Excellent for de-novo assembly
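
Why repeated sequencing of the circularized molecule raises accuracy can be shown with a simple majority vote. This is only an illustration of the idea: it assumes the passes are already aligned and of equal length, whereas the actual consensus calling used for HiFi reads is more sophisticated. The pass sequences are invented.

```python
from collections import Counter


def consensus(passes):
    """Majority-vote consensus over equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch assumes equal-length, pre-aligned passes")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five hypothetical passes over the same molecule; three contain one random error.
passes = [
    "ACGTTGCA",
    "ACGTAGCA",  # error at position 5
    "ACCTTGCA",  # error at position 3
    "ACGTTGCA",
    "ACGTTGGA",  # error at position 7
]

print(consensus(passes))  # ACGTTGCA
```

Random errors rarely hit the same position in several passes, so they are outvoted; this is also why a HiFi read is limited to molecules short enough to be read around several times.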
16
Q

Nanopore technology

A

Proposed and first developed in the early 1990s at Santa Cruz and Harvard.
  • Based on threading a single strand of DNA through a microscopic hole (pore) in a membrane
  • Creating an electric field across the membrane causes the DNA to pass through
  • Measuring the electrical properties of the pore (in practice, the ionic current through it) should tell you which base is passing through
  • Accuracy proved to be a problem for a long time, but it is now excellent: up to 99.9%

17
Q

Oxford Nanopore principle

A

Array of microscaffolds
Each microscaffold supports a membrane and embedded nanopore.

Sensor chip
Each microscaffold corresponds to its own electrode that is connected to a channel in the sensor array chip.

ASIC
Each nanopore channel is controlled and measured individually by the bespoke ASIC. This allows for multiple nanopore experiments to be performed in parallel.

18
Q

Oxford Nanopore in detail (Picture)

A
19
Q

Some applications for Nanopore sequencing

A
  • Genome assembly
  • Detection of structural variants, e.g. long (> 1kb) tandem repeats
  • RNA seq analysis of splice isoforms
  • Real-time detection of pathogens (e.g. Ebola during the recent epidemic)
20
Q

Oxford Nanopore sequencing summary

A
  • Extremely long reads (N50 of 50 kb, even > 4 Mb have been reported)
  • Directly accessing original DNA molecules
  • Capable of distinguishing modified bases directly
  • Comparatively cheap and fast, especially sample preparation
  • Small versions portable
  • Originally low accuracy, now also > 99%
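
N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of the calculation, using made-up read lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_of_total = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_total:
            return length
    raise ValueError("need at least one length")


# Hypothetical read lengths in bp (200 kb in total).
read_lengths = [80_000, 50_000, 30_000, 20_000, 10_000, 10_000]
print(n50(read_lengths))  # 50000
```

Because N50 is weighted by bases rather than by read count, a few very long reads raise it much more than many short ones lower it.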
21
Q

Conclusion

A
  • DNA sequencing throughput has increased massively and continues doing so
  • NGS has transformed numerous fields of molecular biology
  • Data analysis and storage are the current bottlenecks
  • New technologies focus on read length
  • Long read sequencing for de-novo assemblies, Illumina for re-sequencing