Term 2 Lecture 10: DNA sequencing and introduction to genomics Flashcards

1
Q

Why is sequencing a genome useful?

A

- confirms that you have the correct clone
- confirms structure, e.g. that the open reading frame (ORF) is intact, which is important for fusion proteins
- identifies mutations
- allows annotation, comparison (alignment to a wild-type reference) and prediction of function
- identifies species
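
The ORF check can be illustrated with a minimal sketch. The function name `first_in_frame_stop` and the tag and insert sequences are made up for illustration, and it assumes the standard stop codons and a reading frame that starts at the first base.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq, frame=0):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical fusion: a short tag followed by an insert that ends in TAA.
tag = "ATGGCTAAAGGT"
insert = "GAACTGTAA"
fusion = tag + insert

stop_at = first_in_frame_stop(fusion)
# The ORF is intact only if the first stop codon is the final codon.
orf_intact = stop_at == len(fusion) - 3
```

If a sequenced clone gave a stop index earlier than the final codon, that would suggest a mutation or frameshift had truncated the fusion protein.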

2
Q

Historical perspective

A

1960s-70s DNA sequencing methods first developed - the Sanger dideoxy method was the most successful. This method is still used today although technology has changed a lot.

1980s-2000s Shotgun cloning strategies developed, based on the Sanger dideoxy method, and the first genomes were sequenced. A draft of the human genome was completed in 2000.

2005-present Next Generation Sequencing (NGS) methods/technologies developed, currently dominated by Illumina.

3
Q

Sanger sequencing

A

Works by controlling termination of replication
- This is done using dideoxyribonucleoside triphosphates (ddNTPs): dNTPs that lack the 3’ hydroxyl (OH) group needed to attach the next nucleotide, so they act as chain terminators
- initially it was all done by hand on the bench and involved pouring, loading and running 75cm long gels then reading them by eye
- these gels were made of polyacrylamide which has fine pores that allow you to separate out DNA fragments that vary by just one base

4
Q

Sanger sequencing process

A

1) the piece of DNA that you want to sequence is denatured to give ssDNA and a primer is used to initiate DNA synthesis - in the old-fashioned method the primer was labelled with a radioactive label e.g. 32P

2) 4 reactions are then set up in individual Eppendorf tubes, each containing all four normal dNTPs plus one of the 4 ddNTPs (ddATP, ddTTP, ddCTP, ddGTP)

3) DNA polymerase is added. Whenever a ddNTP is incorporated instead of a dNTP, the strand being synthesised can no longer be extended, giving a truncated DNA product.

4) this results in an array of different sized products terminated wherever a ddNTP has been incorporated instead of a dNTP

5) the contents of the tubes are then loaded into separate lanes on the gel (A, T, C and G lanes). Smaller fragments travel further, and adjacent bands differ in length by one base (resolvable because polyacrylamide gel has small pores)

6) the DNA sequence is read directly from the bottom of the gel upwards (shortest to longest, so in the 5’ to 3’ direction of the newly synthesised strand). The gel itself is radioactive, so it is exposed to X-ray film to give a readable autoradiogram

Key points:
- all normal dNTPs are present in all 4 tubes; the ddNTP concentration must be worked out carefully so that dNTPs are in excess
- terminated fragments are separated by size
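
The logic of the four lanes can be sketched in a few lines of Python. This is an idealised model, not a simulation of the chemistry: it assumes every possible terminated product is present, and the function names `run_sanger_lanes` and `read_gel` are made up for illustration.

```python
def run_sanger_lanes(new_strand):
    """Return, for each ddNTP lane, the lengths of the terminated fragments.

    new_strand is the newly synthesised strand (5' to 3'), primer excluded.
    A fragment of length n ends in the base found at position n.
    """
    lanes = {base: [] for base in "ATCG"}
    for position, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest (bottom of the gel upwards)."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = run_sanger_lanes("GATTACA")
sequence = read_gel(lanes)
```

Here the A lane holds fragments of length 2, 5 and 7, the T lane lengths 3 and 4, and reading all four lanes in order of fragment length recovers the sequence of the new strand.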

5
Q

Sanger sequencing with fluorescent markers

A

Sanger sequencing was originally very laborious - using the original bench method, sequencing a gene could take an entire PhD to complete.
Now you can sequence a gene in a 5-week 3rd year undergraduate project.
This is possible because slab gel electrophoresis has been replaced with capillary electrophoresis. Everything is now carried out in microtitre plates (microplates), and liquid handling is done by robots in 48-well plates.
The terminal base of each fragment is detected by fluorescence, using ddNTPs each labelled with a different fluorescent dye.
A laser is shone on the fragments as they pass the detector, and the detector readings are plotted as a chromatogram.
Using this method 1000bp can be sequenced per day.

You can sequence from both ends and cover a whole gene in one run; it is rapid and relatively cheap, costing only £3-4.

Primer binding sites either side of the multiple cloning site (MCS) can be used for sequencing too. Sequencing a recombinant plasmid confirms that the correct plasmid has been cloned and that there are no errors.

6
Q

Shotgun sequencing

A

The human genome was sequenced using shotgun sequencing. The Human Genome Project ran for 13 years (1990-2003), with the draft announced in 2000.
All the human DNA had to be fragmented, cloned into vectors and introduced into bacteria; the plasmids were then isolated and the inserts sequenced.
This resulted in millions of fragments that had to be aligned. As this was the first time the genome had been sequenced, there was no reference to align them to, which is why it took so long.
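
Aligning fragments without a reference means finding where they overlap each other. The sketch below is a toy greedy overlap merge on three made-up reads, not the assembler used for the human genome; the function names and the `min_len` cutoff are assumptions for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left overlaps; return separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]
    return contigs


# Hypothetical reads taken from the sequence ATGGCGTACGTTAGC, in shuffled order.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
assembled = greedy_assemble(reads)
```

Every pair of sequences is compared on every round, which hints at why aligning millions of fragments with no reference was so slow.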

7
Q

Next Generation Sequencing (NGS)

A

Genome sequencing is getting cheaper and faster - now you can sequence a genome and have it fully annotated in a matter of weeks and soon it will be possible in days and costs may drop to just $100 to sequence a genome.

Using the reference sequence created in 2000, the genome can be sequenced much more rapidly as short fragments.
These short sequences can be aligned computationally to the reference genome very quickly.
Each aligned sequence can then be checked against the reference to identify any differences.
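
As a rough illustration of aligning to a reference and checking for differences, the sketch below slides a read along a reference and keeps the placement with the fewest mismatches. It is a simplification: real aligners use indexed searches and allow gaps, and the sequences and function names here are made up. It assumes the read is no longer than the reference.

```python
def best_alignment(read, reference):
    """Return (position, mismatches) for the best ungapped placement."""
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for r, g in zip(read, window) if r != g)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def differences(read, reference):
    """List (reference position, reference base, read base) for mismatches."""
    pos, _ = best_alignment(read, reference)
    window = reference[pos:pos + len(read)]
    return [
        (pos + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]


# Hypothetical reference and a read carrying a single base change.
reference = "ATGGCGTACGTTAGCCTA"
read = "GTACCTTAG"
placement = best_alignment(read, reference)
variants = differences(read, reference)
```

The read is placed at position 5 with one mismatch, and the reported difference is a G in the reference replaced by a C in the read at position 9 (counting from 0).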

What are the new technologies?

  • microfluidics - using microscopic plates
  • high resolution optics
  • higher computing power
  • sequencing by DNA synthesis rather than by termination - following the synthesis reaction as it happens, using nucleotides that give off a light signal each time they are incorporated. The signal identifies which nucleotide has been added.

They are more complex processes but the basic rules of DNA synthesis still apply
- complementary DNA sequences base pair
- primers bind to template strands to initiate DNA synthesis
- DNA is synthesised 5’ to 3’

8
Q

Illumina process

A

Developed by Illumina - it is the most common method; according to the lecture, this is because every time a new technology comes out Illumina buys it and shuts it down.
Illumina sequencing uses flow cells: tiny, thick glass slides with ridges that create channels/lanes for microfluidics.
The channels allow all the components for DNA synthesis to be washed through them. Each lane is randomly coated with a lawn of DNA oligos that are complementary to the ends of the DNA fragments being sequenced.
The DNA fragments base pair with the oligos. There are two types of oligo: the kind that DNA fragments bind to initially, and the complementary kind that they bind to secondarily in step 6.

Start by prepping a sample library:

1) genomic DNA is isolated and sheared into small pieces (approx. 100bp) by sonication
2) adaptors are added by blunt end ligation, providing:
a) a region the sequencing primer can bind to
b) a linker region that will attach/hybridise the DNA fragments to the flow cell
3) DNA fragments are hybridised to the primer lawn
4) the flow cell oligo acts as a primer and is extended by DNA Pol 5’ to 3’
5) the original template is washed away
6) the single-stranded molecule flops over and forms a bridge by hybridising to the adjacent complementary primer
7) this 2nd primer is extended by DNA Pol, synthesising a dsDNA fragment again
8) a ds bridge is formed between the oligo pair
9) the ds bridge is denatured, resulting in 2 copies of covalently bound ssDNA
10) bridge amplification is repeated until multiple bridges are formed
11) the dsDNA bridges are all denatured
12) reverse strands are cleaved and washed away, leaving a cluster with forward strands only - 10000 copies of each fragment
13) sequencing primer is hybridised to the adaptor sequence present on all forward strands
14) fluorescently labelled nucleotides are added
15) every time a fluorescent nucleotide is incorporated you get a flash of light of the colour specific to that nucleotide

This process results in millions of ‘reads’ - short sequences of around 100bp.
These can be aligned rapidly to the reference genome using high-power computing.
This is known as ‘massively parallel sequencing’ because the whole genome is being sequenced at the same time - all reads are sequenced on one flow cell.
As flow cells have multiple lanes, up to 6 genomes can be sequenced at the same time by the Illumina HiSeq machine.
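
The final read-out steps, and the sense in which they are parallel, can be sketched as below. The colour-to-base assignment, cluster names and function name are hypothetical; real instruments use their own dye chemistry and image processing. Each cycle is modelled as one image of the flow cell recording the colour seen at every cluster.

```python
# Hypothetical colour assignment, for illustration only.
COLOUR_TO_BASE = {"green": "A", "red": "C", "blue": "G", "yellow": "T"}


def decode_clusters(cycles):
    """Build one read per cluster from a list of per-cycle images.

    Each image is a dict mapping a cluster id to the colour it flashed
    during that synthesis cycle.
    """
    reads = {}
    for image in cycles:
        for cluster, colour in image.items():
            reads[cluster] = reads.get(cluster, "") + COLOUR_TO_BASE[colour]
    return reads


# Three cycles observed on two hypothetical clusters.
cycles = [
    {"cluster_1": "green", "cluster_2": "blue"},
    {"cluster_1": "yellow", "cluster_2": "blue"},
    {"cluster_1": "blue", "cluster_2": "red"},
]
reads = decode_clusters(cycles)
```

Every cluster gains one base per cycle, so the number of cycles sets the read length while the number of clusters sets how many reads are produced at once.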

9
Q

Recent application of human genome sequencing - cancer genomics (Feb 2020)

A

Signs of cancer can appear long before diagnosis. Research into genetic mutations suggests the possibility of tests that would detect cancer earlier.

10
Q

Third Generation Sequencing

A

The ideal DNA sequencing tool would have high throughput and be able to sequence long stretches of DNA
PacBio have developed Single Molecule Real-Time DNA sequencing (SMRT)
SMRT eavesdrops on an individual DNA Pol molecule in real time as it synthesises DNA from a template strand.
Theoretically the sequence length is limited only by the size of the DNA template (see Pierce 19.5)

11
Q

RNA sequencing (RNA Seq) is more commonly used for everyday sequencing

A

- RNA Seq, or mRNA transcript sequencing, is now a very common way to characterise gene expression patterns
- it is replacing microarray analysis (as it gets cheaper)
- it is quantitative like qPCR but more powerful
- it is revealing an extent of post-transcriptional regulation not previously observed, e.g. far more alternative splicing occurs than previously thought

12
Q

RNA Seq works by:

A
  • isolating RNA from a sample
  • isolating mRNA from that
  • reverse transcribing it to produce cDNA
  • cDNA is then fragmented
    -adapters added to either end of cDNA
  • same method as for genome DNA used from this stage on e.g. if using Illumina: using flow cell oligos to create clusters and observing fluorescent markers of nucleotides as they are added

So every RNA transcript present in the cell is copied into cDNA and sequenced, and the array of short sequences is aligned to a reference.

However, unlike genomic DNA, cDNA is aligned in a QUANTITATIVE WAY - the number of cDNA reads aligned to each gene shows the level at which that gene is expressed
^ this is known as transcriptomics
RNA Seq gives global quantitative data; from this a gene of interest can be identified and qPCR used to validate the results for individual transcripts
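
The quantitative step amounts to counting how many reads align to each gene. A minimal sketch is below, assuming the alignment has already produced one gene name per read; the gene names and the function name are made up. It scales counts to counts per million so that samples of different sizes can be compared, and it does not correct for gene length.

```python
from collections import Counter


def counts_per_million(read_assignments):
    """Convert a list of per-read gene assignments to counts per million."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical sample: ten reads assigned to three genes.
assignments = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
expression = counts_per_million(assignments)
most_expressed = max(expression, key=expression.get)
```

A gene that stands out in a table like this is the kind of candidate that would then be checked by qPCR.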

13
Q

Alternative splicing can be detected by RNA Seq

A

On an RNA Seq readout the reference genome is located at the bottom of the page showing predicted open reading frames expected from a particular gene.
The height of the readings represents the number of transcripts produced.

14
Q

Electronic Fluorescent Pictographs (eFP)

A

Can be used to show differential gene expression in different tissues at different stages of development (see the University of Toronto database)

15
Q

DNA Metabarcoding research at DU

A

Adriano Welch at DU is researching predators that could help farmers to protect their cocoa plantations from insect pests using DNA metabarcoding.

Diet metabarcoding:
Droppings are collected from predators
A mitochondrial gene present in all organisms is amplified using PCR primers and the fragments are sequenced by NGS
^ from this gene the species consumed can be identified

By identifying which pests each species eats, farmers can be informed as to what types of habitat they should protect/create - it was found that a bat species and a bird species consumed many pests, and it was therefore recommended that trees should be planted as roosting spots.
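
The identification step can be pictured as matching each sequenced barcode against a library of known barcodes. The sketch below is a simplification under stated assumptions: the species names, barcode sequences and the 0.9 identity threshold are all made up, and sequences are compared position by position without gaps, whereas real pipelines search curated reference databases.

```python
def identity(a, b):
    """Fraction of matching positions over the shorter of two sequences."""
    length = min(len(a), len(b))
    if length == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / length


def assign_species(read, reference_barcodes, min_identity=0.9):
    """Return the best-matching species, or None if below the threshold."""
    best_species, best_score = None, 0.0
    for species, barcode in reference_barcodes.items():
        score = identity(read, barcode)
        if score > best_score:
            best_species, best_score = species, score
    return best_species if best_score >= min_identity else None


# Hypothetical reference library of two prey barcodes.
reference_barcodes = {
    "moth_sp": "ACGTACGTACGTACGTACGT",
    "beetle_sp": "TTGCATGCCAGTTAGCATCA",
}
# A read from a dropping that differs from the moth barcode at one base.
diet_hit = assign_species("ACGTACGTACCTACGTACGT", reference_barcodes)
no_hit = assign_species("GGGGGGGGGGGGGGGGGGGG", reference_barcodes)
```

Tallying the assigned species across all droppings from one predator gives a picture of which pests it eats.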
