Sequencing genomes, NGS and bioinformatics Flashcards

1
Q

Describe chromosome maps

A
  • different types of map have different resolutions
  • lower resolution maps can generate higher resolution maps
2
Q

cM

A

centimorgan: unit of genetic map distance, proportional to percentage recombination between two markers in a single generation (1 cM ≈ 1% recombination)
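
As a rough worked example of this definition, the sketch below turns offspring counts into a percentage recombination and reads that directly as a distance in cM. The function name and the counts are hypothetical, and the linear conversion is only a reasonable approximation for closely linked markers.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Approximate genetic distance in cM as percentage recombination."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must lie between 0 and total_offspring")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical cross: 12 recombinant offspring out of 200 scored
print(map_distance_cm(12, 200))  # 6.0
```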

3
Q

Describe a karyotypic map

A

microscopic observation of chromosomal spreads

4
Q

Describe linkage maps

A
  • genetic maps derived from monitoring recombination between markers
  • cM units
5
Q

Describe physical maps

A
  • measured in bp
  • tiling path of overlapping BAC clones
6
Q

Describe a sequence map

A

sequence of bases along the chromosome

7
Q

Describe hierarchical genome sequencing

A
  • target genome cloned into highly redundant BAC vector library
  • creates contigs
  • identify minimal set of overlapping clones by restriction mapping and hybridisation
  • shotgun
  • sequencing and assembly
8
Q

contig

A
  • tiling path of BACs
  • approximately 100kb each
9
Q

plasmid subclones

A

approximately 2kb each

10
Q

Describe shotgun in hierarchical genome sequencing

A

fragment BAC and subclone pieces into plasmid vector

11
Q

Describe sequencing and assembly in hierarchical genome sequencing

A
  • compile sequences of individual overlapping plasmid subclones to produce sequence of entire BAC
  • compile sequences of individual overlapping BAC clones to produce sequence of entire chromosome / genome
12
Q

Describe shotgun genome sequencing

A
  • fragment entire genome and clone pieces directly into plasmid vector
  • forms plasmid clones
  • sequencing of plasmid clones at random
  • computational assembly
  • individual reads and sequence contigs are not anchored to a map
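
The "computational assembly" step can be illustrated with a toy greedy overlap merge. This is only a sketch under simplifying assumptions (error-free reads, one strand, no repeats); the function names and reads are hypothetical, and real assemblers use far more elaborate methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three hypothetical overlapping reads collapse into one contig
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # ['ATGGCGTACCTTA']
```

Reads that share no overlap of at least `min_len` bases stay as separate contigs, which mirrors the unanchored contigs of a shotgun assembly.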
13
Q

Describe NGS

A

enables “massively parallel sequencing”

14
Q

NGS

A

next generation sequencing

15
Q

What is massively parallel sequencing

A

analysis of millions of fragments from a single sample in parallel

16
Q

Describe ‘454’ pyrosequencing

A
  • pyrophosphate (PPi) released upon nucleotide incorporation by DNA polymerase
  • PPi used to fuel a downstream set of reactions that ultimately produces light
17
Q

How does pyrosequencing produce light

A

action of luciferase on luciferin

18
Q

Describe library preparation for ‘454’ pyrosequencing

A
  • shear genomic DNA to 300-800bp fragments
  • ligate oligonucleotide adapters
  • amplify fragments by PCR
19
Q

Describe ‘454’ pyrosequencing emulsion PCR

A
  • anneal DNA fragments to an excess of agarose beads that have oligonucleotides complementary to the A/B adaptors attached to them
  • 1 fragment per bead
  • disperse beads and PCR reagents in oil to form an emulsion
  • each water droplet carries a single bead
  • PCR amplifies the unique sequence on the surface of each bead
  • release beads
  • add beads to a sequencing plate
20
Q

Describe the functioning of water droplets in emulsion PCRs

A

each droplet functions as a discrete microreactor, eliminating cross-talk during PCR

21
Q

What does emulsion PCR produce?

A

millions of copies of an identical sequence on each of hundreds of thousands of beads

22
Q

Describe the sequencing plate used in ‘454’ pyrosequencing

A

1.6 million wells

23
Q

Describe the pyrosequencing element of ‘454’ pyrosequencing

A
  • smaller enzyme beads added to each well to surround the DNA-carrying beads
  • sequencing primer, DNA polymerase, APS and luciferin added
  • different dNTPs added sequentially to the wells in repeated cycles
  • nucleotide incorporation results in light emission
  • light intensity recorded
  • CCD camera identifies which wells have incorporated a new nucleotide, producing a signal image
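
The cycle of sequential dNTP flows for a single well can be sketched as a toy "flowgram" simulation. This is an idealised illustration only: the flow order, the function names and the example sequence are assumptions, the signal is taken to be exactly the number of bases incorporated, and the input is written as the strand being synthesised.

```python
def flowgram(synthesised: str, flow_order: str = "TACG", cycles: int = 2):
    """Return one (nucleotide, signal) pair per flow for a single well."""
    pos = 0
    signals = []
    for _ in range(cycles):
        for nt in flow_order:
            run = 0
            # Every consecutive matching base is incorporated in the same flow
            while pos < len(synthesised) and synthesised[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))  # run == 0 means no light in this flow
    return signals


def read_from_flowgram(signals) -> str:
    """Rebuild the read by repeating each nucleotide by its signal."""
    return "".join(nt * n for nt, n in signals)


flows = flowgram("TTAGGGC")
print(flows)
print(read_from_flowgram(flows))  # TTAGGGC
```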
24
Q

APS

A

adenosine 5’ phosphosulphate

25
Q

Describe the error of the pyrosequencing method

A
  • 2 or more consecutive identical bases generate a proportionally greater light intensity that is difficult to measure accurately
  • homopolymer errors
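
A small numerical sketch shows why longer runs are harder to call. It assumes an idealised signal equal to the run length and a hypothetical 12% measurement error; both function names are made up for illustration.

```python
def relative_step(run_length: int) -> float:
    """Fractional rise in ideal signal when a run grows by one base."""
    return 1.0 / run_length  # (n + 1) / n - 1


def call_run_length(signal: float) -> int:
    """Call the run length by rounding the measured signal to the nearest integer."""
    return max(0, int(signal + 0.5))


# The gap between n and n + 1 bases shrinks as the run gets longer
for n in (1, 2, 5, 10):
    print(n, relative_step(n))

# The same hypothetical +12% measurement error is harmless for a single
# base but turns a true run of 5 into a call of 6
print(call_run_length(1 * 1.12), call_run_length(5 * 1.12))  # 1 6
```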
26
Q

Describe the pyrophosphate reaction

A
  • ATP sulfurylase uses the released PPi and APS to produce ATP
  • ATP drives the oxidation of luciferin, catalysed by luciferase, to produce light + oxyluciferin
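
Written out as a reaction scheme, the cascade looks roughly as follows. This is a summary of the standard pyrosequencing chemistry rather than something stated on the cards, with by-products included for completeness:

```latex
\begin{align*}
(\mathrm{DNA})_n + \mathrm{dNTP} &\xrightarrow{\text{DNA polymerase}} (\mathrm{DNA})_{n+1} + \mathrm{PP_i} \\
\mathrm{PP_i} + \mathrm{APS} &\xrightarrow{\text{ATP sulfurylase}} \mathrm{ATP} + \mathrm{SO_4^{2-}} \\
\mathrm{ATP} + \text{luciferin} + \mathrm{O_2} &\xrightarrow{\text{luciferase}} \mathrm{AMP} + \mathrm{PP_i} + \text{oxyluciferin} + \mathrm{CO_2} + h\nu
\end{align*}
```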
27
Q

Describe Illumina sequencing

A
  • reversible terminator sequencing
  • bridge amplification
28
Q

Compare Illumina and Sanger sequencing

A
  • similar in principle
  • Illumina can remove terminator
29
Q

Describe the library preparation for Illumina sequencing

A
  • fragment DNA
  • repair ends
  • add A overhang
  • ligate adaptors
  • select ligated DNA
30
Q

Compare Illumina and 454 sequencing

A

bridge amplification functionally analogous to emulsion PCR

31
Q

Describe bridge amplification in Illumina sequencing

A
  • flow cell surface covered with a lawn of attached oligos that are complementary to the adaptors
  • library fragments bound to the flow cell by hybridisation to the oligos
  • PCR amplification uses only the bound oligos as primers
  • denature and wash original strand away
  • denature clusters and cleave to wash away reverse strands
  • ready for sequencing
32
Q

What is the effect of using only the bound oligos as primers

A

constrains the distribution of the products, producing clusters (‘colonies’) of numerous, single-stranded identical template fragments that form bridges

33
Q

polony

A

polymerase-generated colony

34
Q

Describe the sequencing element of Illumina sequencing

A
  • clusters supplied with polymerase and all 4 nucleotides, each tagged with a different fluorophore
  • because the nucleotides have their 3’OH chemically blocked, only one is incorporated per cycle
  • i.e. the growing strand is extended by a single base
  • after each incorporation cycle, cell is imaged to identify the new nucleotide incorporated at each cluster
  • chemical step removes the fluorescent tag and the 3’ block
  • generates base calls
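
The incorporate, image, cleave loop can be sketched as a toy simulation. The dye colours, function name and templates are hypothetical, each cluster is treated as a perfectly phased single signal, and templates are read left to right without modelling strand orientation.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical dye colours, one per nucleotide (not the real chemistry)
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {colour: base for base, colour in DYE.items()}


def sequence_clusters(templates, cycles: int):
    """Simulate reversible-terminator cycles; return one read per cluster."""
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        # Incorporation: the 3' block allows exactly one base per cluster
        image = []
        for template in templates:
            if cycle < len(template):
                image.append(DYE[COMPLEMENT[template[cycle]]])
            else:
                image.append(None)  # template exhausted, no signal
        # Imaging: the colour seen at each cluster gives the base call
        for i, colour in enumerate(image):
            if colour is not None:
                reads[i] += BASE_FOR_DYE[colour]
        # Cleavage of dye and 3' block is implied by moving to the next cycle
    return reads


print(sequence_clusters(["ACGT", "TTGA"], cycles=3))  # ['TGC', 'AAC']
```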
35
Q

base calls

A

image clusters after each cycle

36
Q

Sanger sequencing aka

A

chain termination

37
Q

Summarise Sanger sequencing

A
  • 400-900bp read length
  • 99.999% accurate
  • 96 reads per run
  • takes 20mins-3hrs
  • $5000 per million bases
38
Q

Summarise 454 pyrosequencing

A
  • 700bp read length
  • > 99% accuracy
  • 1 million reads per run
  • takes 24hrs
  • $10 per million bases
39
Q

Summarise Illumina sequencing

A
  • 150-300bp read length
  • > 99% accuracy
  • billions of reads per run
  • 1 to 10 days per run, depending on the sequencer
  • much less than $0.5 per million bases
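
To put the three cost figures side by side, the sketch below multiplies cost per million bases by the amount of sequence needed. The 100 Mb genome and 10x coverage are hypothetical values chosen for illustration, and the Illumina figure is used as an upper bound because the card says "much less than $0.5".

```python
# Cost per million bases (USD), taken from the summary cards
COST_PER_MB_USD = {"Sanger": 5000.0, "454": 10.0, "Illumina": 0.5}


def project_cost_usd(platform: str, genome_size_mb: float, coverage: float = 1.0) -> float:
    """Estimated cost = cost per Mb x genome size in Mb x fold coverage."""
    return COST_PER_MB_USD[platform] * genome_size_mb * coverage


# Hypothetical project: 100 Mb genome sequenced to 10x coverage
for platform in COST_PER_MB_USD:
    print(platform, project_cost_usd(platform, genome_size_mb=100, coverage=10))
```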
40
Q

Illumina sequencing aka

A

sequencing by synthesis

41
Q

Advantages of Sanger sequencing

A
  • long, accurate individual reads
  • cost effective for very small projects
42
Q

Disadvantages for Sanger sequencing

A
  • low throughput
  • expensive and impractical for large projects
43
Q

Advantages for 454 pyrosequencing

A
  • long read size
  • fast
44
Q

Disadvantages for 454 pyrosequencing

A
  • expensive
  • homopolymer errors
45
Q

Advantages for Illumina sequencing

A

extremely high sequence yield

46
Q

Disadvantages for Illumina sequencing

A
  • short reads difficult to assemble
  • equipment very expensive
47
Q

What are the second generation technologies

A
  • 454 pyrosequencing
  • Illumina
48
Q

Describe second generation technologies

A

rely on the parallel, phased sequencing of huge numbers of identical fragments to generate detectable signal

49
Q

Describe third generation technologies

A
  • can sequence single molecules
  • dephasing not an issue; much longer reads can be achieved
50
Q

List some third generation technologies

A
  • Pacific Biosciences: SMRT sequencing
  • Oxford Nanopore Technologies: MinION, PromethION
51
Q

SMRT sequencing

A

single molecule real-time

52
Q

Describe the characteristics of third generation sequencing

A
  • up to 100kb read length
  • high error rate
53
Q

Describe RNA-Seq

A

provides information on the transcriptome; which genes are expressed, relative transcript levels

54
Q

Summarise shotgun

A
  • sequence genome
  • computational assembly
55
Q

Summarise hierarchical shotgun

A
  • create BAC clone map
  • sequence BACs
  • computational assembly
56
Q

Summarise cDNA sequencing

A
  • extract mRNA
  • generate and sequence cDNAs
  • computational assembly
57
Q

Summarise resequencing

A
  • sequence genome
  • genome alignment
58
Q

Describe genome alignment

A
  • align to previously derived sequence
  • aids assembly
59
Q

What are the advantages of resequencing?

A

detect polymorphisms and mutations in different individuals, strains and mutants
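
A minimal sketch of the idea: place a read on a previously derived reference and report the positions where it differs. This assumes ungapped placement by fewest mismatches; the function name and sequences are hypothetical, and real aligners also handle indels, base qualities and both strands.

```python
def best_alignment(read: str, reference: str):
    """Return (offset, mismatches) for the ungapped placement with fewest mismatches.

    Each mismatch is (reference_position, reference_base, read_base).
    Returns (None, None) if the read is longer than the reference.
    """
    best_offset, best_mismatches = None, None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


reference = "ATGGCGTACCTTAGC"
read = "GTACGTTA"  # hypothetical read carrying one substitution
print(best_alignment(read, reference))  # (5, [(9, 'C', 'G')])
```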

60
Q

Describe sequence annotation

A
  • origin
  • background information
  • important regions of the sequence
  • links to protein sequence, and other information
  • can contain errors (dependent on researchers for entry)
61
Q

What is important origin data

A

species, variety/strain, tissue, cell line, clone, etc.

62
Q

What is important background information for annotation

A

literature, researcher, etc.

64
Q

What are important regions of the sequence to annotate?

A

promoter, introns, coding sequence, motifs, etc.

65
Q

How to annotate

A
  • identify ORFs
  • search databases using ORFs as queries
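
The first step, identifying ORFs, can be sketched as a scan for ATG...stop stretches. This is a simplified illustration: it looks only at the three forward reading frames, the function name and example sequence are hypothetical, and `min_codons` counts the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, sequence) for ATG...stop ORFs in the forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGGCTTGATAA"))  # [(2, 11, 'ATGGCTTGA')]
```

Each ORF found this way could then be translated and used as the query for a database search.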
66
Q

What will databases supply on searching for ORFs as queries

A
  • related genes (potentially of known function)
  • conserved functional domains or motifs
  • protein targeting sequences, TMDs, etc.
67
Q

Finding related genes

A
  • BLAST
68
Q

BLAST

A

Basic Local Alignment Search Tool

69
Q

BLASTN

A

nucleotide query sequence searched against a nucleotide database
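
As a rough illustration of searching a nucleotide query against a nucleotide subject, the sketch below finds exact short-word matches ("seeds") between the two. It is a toy example, not BLAST's actual implementation: the function names, the word size of 4 and the sequences are assumptions, and the extension and scoring of seeds are not shown.

```python
from collections import defaultdict


def build_word_index(subject: str, word_size: int = 4):
    """Map every word of length word_size to its start positions in subject."""
    index = defaultdict(list)
    for i in range(len(subject) - word_size + 1):
        index[subject[i:i + word_size]].append(i)
    return index


def seed_hits(query: str, subject: str, word_size: int = 4):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    index = build_word_index(subject, word_size)
    hits = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            hits.append((q, s))
    return hits


# Hits lying on the same diagonal (subject_pos - query_pos) suggest one local match
print(seed_hits("GCGTAC", "ATGGCGTACCTT"))  # [(0, 3), (1, 4), (2, 5)]
```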