Sorefan Flashcards

(54 cards)

1
Q

What is whole genome sequencing?

A
  • complete genome seq of organism at single time
  • inc seq of chromosomal DNA and mito/chloro etc DNA

2
Q

What are the challenges for genome sequencing?

A
  • NA extraction from cells –> needs high quality and conc
  • fragmentation
  • sub-fractionation size selection –> to isolate fragments of correct size
  • separating indiv molecules
  • amplification of signal
  • reading signal
  • data analysis
3
Q

What were the 3 phases of human genome project?

A
  • genetic and physical maps of human and mouse, seq yeast and worm
  • -> technology dev
  • draft seq –> inc many gaps and errors
  • finished seq –> fill in gaps and correcting errors
4
Q

How are genetic maps made?

A
  • analyse genetic distance between genes by measuring recombination freq
  • markers rely on variation of seq between parents and individuals
  • distance measured in centimorgans
  • mostly PCR based, eg. polymorphisms in genes and DNA markers
  • linkage map by looking at relative distances of 2 or more polymorphic genes and measuring RFs
  • DNA markers superseded phenotypic markers
  • DNA based mol markers could be RFLPs
  • -> methods to analyse are slow so moved onto using SSLPs as easy to analyse w/ PCR
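
Worked example for the recombination freq -> centimorgan step above. A minimal sketch, assuming the usual short-distance approximation that 1% recombinant offspring ≈ 1 cM (this breaks down for distant markers, where double crossovers are missed). The function name and the counts are made up for illustration.

```python
def map_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Approximate genetic distance in centimorgans from offspring counts.

    Assumption: 1% recombination frequency ~ 1 cM (only reasonable for
    closely linked markers).
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must be between 0 and total_offspring")
    return 100.0 * recombinants / total_offspring


# Hypothetical cross: 12 recombinant offspring out of 200 scored
distance = map_distance_cm(12, 200)
print(distance)
```

Two markers w/ a smaller value than this sit closer together on the linkage map.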
5
Q

What are SSLPs?

A
  • simple seq length polymorphisms
  • repeat regions in genome that vary in length between pops
  • usually mini and microsatellite seqs
6
Q

What are minisatellites?

A
  • repeat units up to 25bp
  • not spread evenly around genome, mostly at telomeric regions
  • several kb long
  • difficult to PCR
7
Q

What are microsatellites?

A
  • usually di or trinucleotide repeats
  • few 100 bases long
  • easy to PCR
  • 650,000 in genome
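
To make the di/trinucleotide repeat idea concrete, here is a rough sketch that scans a seq for tandem repeats w/ a regular expression. The function name, the `min_repeats` cut-off and the test seq are assumptions for illustration, not a standard microsatellite caller.

```python
import re


def find_microsatellites(seq: str, min_repeats: int = 4):
    """Return (start, repeat_unit, copies) for di/trinucleotide tandem repeats.

    Assumption: a run counts as a microsatellite once the unit is repeated
    at least `min_repeats` times; single-base runs such as AAAAAA are skipped.
    """
    # ([ACGT]{2,3}?) captures a 2 or 3 base unit, \1{n,} needs n more copies
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # homopolymer run, not a di/trinucleotide repeat
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Made-up seq: a (CA)6 repeat and a (GAT)4 repeat separated by unique seq
example = "GGT" + "CA" * 6 + "TTG" + "GAT" * 4 + "C"
print(find_microsatellites(example))
```

Copy number at each hit is what varies between individuals, which is why PCR across the repeat gives products of diff lengths.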
8
Q

Why are genetic maps in humans limited?

A
  • large pops of siblings don’t exist, so limited no. recombination events to study
  • recombination events not at random genome positions –> recombination hotspots
9
Q

How are physical maps created?

A
  • restriction mapping locates relative positions on DNA molecule of recognition seqs for REs
  • FISH = map marker locations by hybridising probe containing marker to intact chromosomes
  • STS = map positions of short seqs by PCR
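
Restriction mapping can be sketched in silico: find where a recognition seq occurs and work out the fragment sizes a complete digest of a linear molecule would give. EcoRI (GAATTC, cutting after the first G) is used as the example; the function names and the toy seq are assumptions.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of a recognition seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # +1 so overlapping sites are found
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment sizes after complete digestion of a linear molecule.

    cut_offset = number of bases into the site where the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy linear molecule w/ two EcoRI sites
dna = "AAAGAATTCCCCCGAATTCTT"
print(find_sites(dna, "GAATTC"))
print(fragment_lengths(dna, "GAATTC", cut_offset=1))
```

In the lab the logic runs the other way: fragment sizes are read off a gel and the site positions are inferred from them.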
10
Q

What are the advantages of creating BAC libraries from indiv chromosomes?

A
  • BAC clone library can be used to seq genome
  • BACs w/ inserts from each chromosome could be shared across consortium

11
Q

How is genome sequencing carried out clone by clone?

A
  • extract DNA
  • fragment DNA
  • -> ideally completely random so no parts missed out
  • -> by physical methods = sonication, hydrodynamic shearing
  • -> by enzymatic methods = restriction enzymes and transposase
  • -> by chemical methods (mostly used to fragment RNA) = heat and divalent cation (Zn and Mg)
  • size selection –> gel electrophoresis
  • clone 100-200kbp fragments into BAC plasmids to create library
  • transformation of bacteria for BACs
  • pick indiv colonies and extract vector (each tube has many copies of indiv DNA insert)
12
Q

How are clones positioned on genetic and physical maps?

A
  • test clones for PCR markers w/ known locations
  • BAC end sequencing using Sanger
  • -> known seq so can design primer
  • -> denature vector and Sanger seq
  • -> design primer to reverse strand to seq other direction
  • -> end seqs from same insert, so are paired end reads
13
Q

Why are paired end reads useful?

A
  • can physically link 1 end of seq w/ another, so can be used to resolve seq gaps
14
Q

How is it decided which BAC has insert next to insert of interest?

A
  • gen contiguous set of clones
  • if any of BACs inc end seq, then insert they contain must be next to it
  • test BAC library for end seq from desired vector by PCR
  • repeated over and over again until all BACs placed in order on each chromosome
  • created contig
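
The walking logic above can be sketched as a toy program: take the end seq of the current BAC, find the other BAC whose insert contains it, and repeat. This is an in-silico stand-in for the PCR screen, w/ made-up clone names and inserts; it assumes each end seq is unique in the library.

```python
def walk_contig(clones: dict[str, str], start: str, end_len: int = 4) -> list[str]:
    """Order clones by repeatedly looking for the insert that contains the
    end seq of the current clone (stand-in for PCR screening of the library).

    Assumption: each end seq occurs in exactly one neighbouring insert.
    """
    order = [start]
    current = start
    while True:
        end_seq = clones[current][-end_len:]
        neighbours = [
            name for name, insert in clones.items()
            if name not in order and end_seq in insert
        ]
        if not neighbours:
            break  # no unplaced clone carries this end seq, walk is finished
        current = neighbours[0]
        order.append(current)
    return order


# Toy 32 bp "chromosome" and three overlapping inserts cut from it
genome = "ATGCCGTAAGCTTGACCTGAAGGTCATTCGGA"
library = {
    "bac3": genome[16:32],
    "bac1": genome[0:14],
    "bac2": genome[8:22],
}
print(walk_contig(library, start="bac1"))
```

Repetitive end seqs break the uniqueness assumption, which is one reason repeats make contig building hard.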
15
Q

Why was shotgun seq of BAC clones needed?

A
  • as BAC end seq leaves most of middle of genomic insert still to seq
16
Q

How was shotgun seq of BAC clones carried out?

A
  • each BAC clone broken up into 5-10kb fragments
  • cloned into diff vector that accepts smaller inserts
  • if seq lots of paired end seqs can assemble large fragment (=consensus seq)
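
How overlapping reads build a consensus seq can be shown w/ a tiny greedy overlap assembler. This is a sketch only: it assumes error-free reads, one strand and no repeats, and real assemblers are far more involved. Function names and reads are made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads w/ the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more, remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three toy reads from a 20 bp fragment, each overlapping the next by 5 bases
shotgun_reads = ["CTTGACCTGA", "ATGCCGTAAG", "GTAAGCTTGA"]
print(greedy_assemble(shotgun_reads))
```

If the reads do not all overlap, the function returns several pieces, which is the toy equivalent of contigs separated by seq gaps.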
17
Q

How did Celera seq human genome?

A
  • fragmented genome into 2-50kbp fragments
  • cloned 2, 10 and 50kbp fragments into plasmids to create library
  • assemble reads to create consensus seq and seq contigs
  • draft genome had 98% bases
18
Q

Why did the IHGP use clone by clone instead of whole genome shotgun seq?

A
  • to prove feasible for complex repeat rich genome
  • assembly easier and could be performed confidently
  • could target gaps for finishing
  • better suited to diverse international consortium
19
Q

What needed to be done to finish the human genome?

A
  • fill in sequencing gaps and physical gaps
20
Q

Why were gaps present in human genome, and how could these problems be solved?

A
  • cloning bias
  • no restriction sites –> use diff RE, use physical or chem fragmentation method
  • insert unstable –> use diff vector
21
Q

How were seq gaps closed?

A
  • paired end seqs align to either side of gap
  • if gap < 1kbp = PCR across gap
  • if gap > 1kbp = sequential seq along insert
22
Q

How were physical gaps closed if know order of scaffolds?

A
  • if gap region absent from all gene libraries
  • PCR used to amplify genomic DNA spanning gaps and amplified DNA seq directly w/ or w/o cloning into vector
  • PCR products over 3kbp hard to amplify
23
Q

How were physical gaps closed if don’t know order of scaffolds?

A

How do we know which pairs of primers are adj and will give product?

  • try every poss and look for PCR reaction products using gDNA as template
  • in singleplex PCR reaction each combo of primers tested w/ genomic DNA as template
  • process sped up w/ multiplex PCR, as multiple pairs of primers tested in single PCR tube, so fewer reactions need to be performed
  • use algorithm to decide min no. primer combos
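
To see why trying every possibility gets expensive, this sketch lists the primer pairs that would need testing when scaffold order is unknown. It assumes one outward-facing primer at each end of every scaffold and that only primers on diff scaffolds are worth pairing; the names are hypothetical.

```python
from itertools import combinations


def candidate_primer_pairs(scaffolds: list[str]) -> list[tuple]:
    """All primer pairs that could span a gap between two diff scaffolds.

    Assumption: each scaffold has one primer at its left end and one at its
    right end; pairs from the same scaffold are skipped.
    """
    primers = [(name, end) for name in scaffolds for end in ("left", "right")]
    return [
        (a, b) for a, b in combinations(primers, 2)
        if a[0] != b[0]
    ]


pairs = candidate_primer_pairs(["s1", "s2", "s3", "s4"])
print(len(pairs))  # singleplex reactions needed if every pair is tested alone
```

The count grows roughly w/ the square of the number of scaffolds, which is why pooling primers in multiplex reactions cuts the work so much.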
24
Q

Where are repetitive seqs found in genome?

A
  • approx 45% of genome
  • mini and microsatellites
  • centromeres
  • telomeres
  • transposons
  • duplicated genes
25
What are the problems w/ repetitive seqs?
  • hard to assemble and usually resolved last
  • many poor quality genomes never get these regions resolved
  • better if had tech to seq v long fragments that span repetitive seq
26
How can repetitive seqs cause rearrangements?
  • if break up seq and reassemble can get shorter as don't know original order
27
How can repetitive seqs confuse computer, and how can this be solved?
  • tandem repeats can cause truncations --> computer can't be sure where to map reads to
  • worse if repetitive seqs on diff chromosomes
  • solution: seq across whole repetitive region to identify flanking seqs that are unique (IHGP couldn't do this as not poss at time)
  • anchor flanking seq regions to known positions of genome --> need genetic and physical maps
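
The mapping ambiguity can be shown directly: a read lying wholly inside a repeat matches several places, while a read that reaches into unique flanking seq matches only one. A minimal sketch w/ exact matching on a made-up genome; real read mappers tolerate mismatches and use indexes.

```python
def mapping_positions(genome: str, read: str) -> list[int]:
    """All 0-based positions where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy genome w/ the same (CAG)5 repeat in two places
toy_genome = "TTGACC" + "CAG" * 5 + "GGATCT" + "CAG" * 5 + "AACGTT"

repeat_only_read = "CAGCAGCAG"    # lies entirely within the repeat
anchored_read = "GACCCAGCAG"      # starts in unique flanking seq

print(mapping_positions(toy_genome, repeat_only_read))
print(mapping_positions(toy_genome, anchored_read))
```

Longer reads are more likely to include unique flanking seq, which is the argument for long-read tech when resolving repeats.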
28
What factors influence the species chosen for genome projects?
  • genetic model
  • commercial/medical relevance
  • genome size --> small easy, so bacteria most common
29
Why was NGS needed?
  • Sanger and Celera slow, expensive, cloning bias, low coverage and 1 seq at time
30
How was NGS initially dev?
  • 1st was 454 sequencing, dev by 454 Life Sciences (later bought by Roche)
  • Applied Biosystems dev competing tech called SOLiD
  • both superseded by Solexa (now Illumina)
31
How does NGS compare to Sanger seq?
  • NGS involves fragmenting genome and seq all fragments in parallel
  • no plasmid cloning req and cheap
  • NGS still has similar challenges --> 1 key challenge is how to separate indiv seqs and amplify insert so machine can measure signal
32
How is Illumina library prep carried out?
  • need to create library of gDNA inserts flanked by adaptors of known seq
  • fragment gDNA using physical/enzymatic/chemical methods
  • size selected on gel
  • end repair DNA so has blunt ends
  • -> DNA pol I fills in 5' overhang
  • -> exonuclease removes 3' overhang
  • add A tail so DNA ends not compatible and DNA can't concatemerise
  • adaptors ligated to ends of fragments using DNA ligase (adaptors are ds oligonucleotides w/ T overhang)
  • adaptor seqs elongated by PCR using extended oligonucleotide primers that inc unique 6 base index seq
  • allows amplification and quantification of library
33
What are the functions of adaptor seqs?
  • provide priming sites for PCR amplification
  • allow index seqs to be added
  • priming sites for bridge amplification
  • priming sites for seq
34
What are indexes in Illumina indexed libraries, and why are these included?
  • unique 6 base codes that identify each sample
  • can distinguish diff samples, so multiple samples can run at same time
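
Sorting pooled reads back into samples by index (demultiplexing) can be sketched as a dictionary lookup. The sketch assumes each read arrives as an (index seq, insert seq) pair and that only exact index matches count; the index seqs and sample names are illustrative.

```python
from collections import defaultdict


def demultiplex(reads, sample_indexes):
    """Group insert seqs by the sample whose 6 base index they carry.

    Assumption: exact matching only; anything w/ an unknown index goes
    into an "undetermined" bin.
    """
    by_sample = defaultdict(list)
    for index_seq, insert_seq in reads:
        sample = sample_indexes.get(index_seq, "undetermined")
        by_sample[sample].append(insert_seq)
    return dict(by_sample)


# Illustrative index -> sample table and a small pooled run
sample_indexes = {"ATCACG": "sample_A", "CGATGT": "sample_B"}
pooled_reads = [
    ("ATCACG", "TTGACCA"),
    ("CGATGT", "GGCATTA"),
    ("ATCACG", "CCGTAAG"),
    ("NNNNNN", "ACGTACG"),  # index could not be read
]
print(demultiplex(pooled_reads, sample_indexes))
```

Indexes are usually chosen to differ at several positions so a single seq error does not turn one sample's index into another's.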
35
Why is Illumina bridge amplification necessary?
  • Illumina sequencer not sensitive enough to measure signal from 1 DNA molecule
  • so bridge amplification used to prod localised clusters of ≈1000 identical molecules on glass slide
36
How is Illumina bridge amplification carried out?
  • after library prep, library diluted and denatured, ss library washed onto flow cell w/ lawn of oligonucleotides complementary to adaptor seqs
  • indiv molecules hybridise to oligos on flow cell
  • fragment strand then bends over, hybridising to complementary oligo on flow cell forming bridge
  • pol used to create complementary copy of fragment strand
  • original strand then washed away
  • repeated to create cluster of identical molecules at discrete location
  • next stage = seq fragments using fluorescently labelled nucleotides
37
Why is there an optimum length of fragments for bridge amplification?
  • want 2 initial molecules to be far apart so don't coalesce, so can be spread effectively
  • done by washing over at low conc
  • long arms mean could reach quicker and coalesce
38
How is Illumina sequencing by synthesis carried out?
  • sequencing primer hybridised
  • pol and 4 nts added
  • fluorophores at each cluster read by lasers
  • cleave fluorophore and unblock nt
  • wash
  • repeat (until achieve desired read length)
  • index seq primer hybridised
39
Why are Illumina sequencers so expensive?
  • optics, lasers and cameras req
40
What are the adv and disadv of Illumina seq by synthesis?
  • enormous output
  • accurate but not as accurate as Sanger
  • all nts can be added simultaneously
  • relatively slow
  • short read lengths
  • sample needs to be amplified --> PCR bias as doesn't amplify GC rich seq efficiently
  • ligation of adaptors biased --> ligases prefer certain seqs
41
What would the features of a better sequencing machine be?
  • single molecule seq w/o amplification
  • continuous reads (no stop starting)
  • v long reads
  • solid state electronics
  • cheap
  • small
42
What are some examples of 3rd gen seq?
  • Helicos (now obsolete)
  • Pacific Biosciences (PacBio)
  • Oxford Nanopore
  • NABsys
43
What are the advantages of PacBio?
  • v long reads
  • 1 mil reads
  • v high accuracy
  • shortest run time
  • least GC bias (can easily seq through v high/low GC content)
  • no amplification bias
  • discover broad spectrum of DNA base mods
44
How is PacBio library prep carried out?
  • prod genomic library of fragments
  • fragment DNA
  • repair DNA damage and ends A tailed
  • ligate SMRTBell adaptors
  • anneal seq primer to SMRTBell templates
  • -> complementary, so adaptor partially ds and partially ss
  • -> T overhang, complementary to A tail
  • streptavidin tagged pol and primer bound to SMRTBell adaptor cloned insert
  • sequence
45
Is PacBio library prep diff to other library preps?
  • similar
  • but genomic fragments ≈10kb and SMRTBell adaptors used
46
How is SMRT (single molecule real time) sequencing carried out?
  • add diff fluorescent label to each type of nucleobase but attach it to terminal phosphate released during polymerisation
  • measure fluorescence each time new base added, decays away when fluorescent tag released
  • use zero mode waveguide chambers to improve detection from tiny signals
  • read lengths prod of ≈500-3200 bases
47
How is Nanopore library gen?
  • rapid
  • DNA isolated using beads that bind DNA
  • transposase complex used to simultaneously cut and ligate 1st adaptor
  • then 2nd 1D adaptors and motor proteins added
  • amplification not req as device can read single molecules
  • 2 diff types adaptor can be added
  • 1D adaptors allow 1 strand to be seq
  • 2D adaptors allow both strands to be seq
48
What does nanopore tech involve?
  • protein nanopore
  • -> heptameric protein α-hemolysin
  • -> separated from bacteria allowing low cost and robust nanopores
  • -> pore embedded into synthetic membrane w/ high electrical resistance
  • -> 512 pores per MinION cell
  • synthetic polymer membrane
49
What occurs during 1D sequencing?
  • DNA attached to pore and motor protein controls DNA translocation speed through pore
  • 1 strand seq and other strand discarded
50
What occurs during 2D sequencing w/ hairpin adaptor?
  • hairpin adaptor allows seq of both strands
  • 1st strand seq then adaptor unwound and seq
  • opp strand seq
  • opp strand seq complementary to 1st strand, used to correct errors in seq to create '2 direction read'
  • not same as paired end seq
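
How the opp strand helps catch errors can be sketched by reverse complementing its read and comparing it base by base w/ the 1st strand read. This only flags disagreements; it assumes equal-length reads w/ no insertions or deletions, which real nanopore data does not guarantee. Names and seqs are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA seq written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def disagreements(template_read: str, complement_read: str) -> list[int]:
    """Positions (0-based, on the template) where the two strand reads differ.

    Assumption: both reads cover the same region and have the same length.
    """
    recovered = reverse_complement(complement_read)
    if len(recovered) != len(template_read):
        raise ValueError("reads must be the same length in this sketch")
    return [
        i for i, (a, b) in enumerate(zip(template_read, recovered))
        if a != b
    ]


# True seq is ATGCCGTTAC; the template read has one miscalled base at position 4
template_read = "ATGCAGTTAC"
complement_read = "GTAACGGCAT"  # error-free read of the opp strand
print(disagreements(template_read, complement_read))
```

A disagreement only says one of the two reads is wrong; deciding which base to keep needs extra information such as signal quality.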
51
What are the adv of Nanopore seq?
  • no amplification --> decreases artifacts from PCR
  • rapid
  • long reads --> simplifies assembly
  • solid state electronics --> cheaper and more reliable machines
  • portable
  • versatile --> can be changed to measure RNA, proteins or other compounds
52
What are the applications of NGS?
Research tools:
  • de novo genome seq
  • re seq genome and comparing to reference genome
  • seq transcript
  • methylation of DNA
  • seq small RNAs
  • protein binding sites

Clinical apps:
  • diagnosis
  • biomarkers
  • prenatal testing
53
How can NGS be used as a research tool?
  • seq genome of species
  • cataloguing variation between individuals in species
  • characterising differences between cells w/in individuals
  • describing underlying cellular mechanisms
54
What is NGS now used in clinic for?
  • personalised medicine