Human genome sequencing. Flashcards

(186 cards)

1
Q

What is the definition of whole genome sequencing?

A

Determining the complete genome sequence of an organism at one time, including nuclear, mitochondrial and (where applicable) chloroplast DNA.

2
Q

Whole genome sequencing is different from DNA profiling. What is DNA profiling?

A

DNA profiling determines the likelihood that genetic information comes from an individual or a group.

3
Q

How long did it take to sequence the first genome?

A

13 years.

4
Q

How much did it cost to sequence the first genome?

A

$3.8 billion.

5
Q

The first genome took 13 years to sequence. It is now possible to sequence hundreds in a matter of weeks. How much would this now cost?

A

$300,000.

6
Q

What would the perfect sequencer result in?

A

Instantaneous results being produced from unprocessed samples.

7
Q

What are 6 challenges that genome sequencing still faces (both NGS and first generation)?

A
  1. Nucleic acid extraction.
  2. Sub fractional size selection.
  3. Separation of molecules into individual positions.
  4. Amplification of the signal.
  5. Reading the signal.
  6. Data analysis.
8
Q

Genome sequencing still faces multiple problems. Which one of these problems has almost been resolved?

A

Extraction of the nucleic acid.

9
Q

What was the first method used to sequence the genome?

A

Sanger sequencing.

10
Q

Sanger sequencing used to be able to sequence 300 bp at a time. How many can it sequence now?

A

1000 bp. This is still not a whole chromosome, however.

11
Q

Most aspects of genome sequencing are considerably cheaper now. What aspect is still expensive?

A

Sanger sequencing.

12
Q

In what year was the human genome project started?

A

1990.

13
Q

What did the first phase of the human genome project involve?

A

The creation of genetic and physical maps of the human and mouse genomes.

14
Q

What two organisms were sequenced at the start of the human genome project and what sizes were their genomes?

A

The worm (100 Mb) and yeast (12 Mb).

15
Q

When was the first draft human genome sequence created?

A

1997-2000.

16
Q

The first draft of the human genome was mostly correct, however what did it contain?

A

Many gaps and errors.

17
Q

When a genome has fewer than a certain number of errors it is classed as complete. True or false?

A

False, there is no distinct limit. Most genomes are not as complete as the human genome, however there are some which are more complete.

18
Q

How many countries and US labs were involved in the human genome project?

A

18 countries and 200 US labs.

19
Q

Where was most of the human genome sequenced?

A

At the Wellcome Trust in the UK.

20
Q

What is a genetic map?

A

The order of genetic mapping markers and the genetic distance between them, based on recombination frequency. Distance is measured in centimorgans.
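
As a rough illustration of how recombination frequency becomes a map distance, the sketch below assumes the common approximation that 1% recombinant offspring corresponds to about 1 centimorgan, which only holds for closely linked markers. The function name and the offspring counts are hypothetical.

```python
def map_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Approximate genetic distance between two markers in centimorgans.

    Assumes 1% recombination frequency ~ 1 cM (closely linked markers only).
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinants / total_offspring


# Hypothetical cross: 12 recombinant offspring out of 200 scored.
print(map_distance_cm(12, 200))
```

For markers that are far apart, double crossovers make this estimate too low, so a mapping function would be needed instead.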

21
Q

What do genetic maps rely on?

A

Sequence variation between parents and individuals of 300bp.

22
Q

What are genetic maps mostly based on?

A

PCR to determine restriction fragment length polymorphisms, minisatellites and microsatellites.

23
Q

What is a physical map?

A

The actual location of DNA sequences in a genome.

24
Q

Is a physical or genetic map more useful?

A

Physical.

25
What are the four steps of genome sequencing through the clone-by-clone method?
1. Extract DNA. 2. Fragmentation of DNA. 3. Size selection of DNA. 4. Cloning of 100-200 kbp fragments into BACs, YACs or PACs to create a genome library.
26
What are the three main methods of fragmenting DNA?
Physical, enzymatic and chemical.
27
What are two examples of physical methods to fragment DNA?
Sonication and nebulisation (hydrodynamic shearing).
28
What do enzymatic methods of DNA fragmentation involve?
Restriction enzymes and transposases.
29
What are two examples of chemical methods used for DNA fragmentation?
Heat and divalent cations such as Zn2+ and Mg2+.
30
What methods of fragmentation are most often used with mRNA?
Chemical.
31
What do YACs include?
Yeast centromere, telomere and linear insert.
32
What was the first method used for the creation of a gene library in human genome sequencing and what was the disadvantage of this method?
YACs, which allowed recombination with other parts of the yeast genome.
33
What method of genome fragmentation creates the preferred random fragments?
Physical.
34
What does fragmentation with restriction enzymes result in?
An incomplete digest with unevenly distributed fragments.
35
Although restriction enzymes result in unevenly distributed fragments, what beneficial feature do they also produce?
Fragments with undamaged sticky ends.
36
Inserts can be up to 2000bp. How many base pairs can Sanger sequencing accurately sequence?
500-1000bp.
37
What does a clone contig stand for?
A continuous set of clones.
38
What can a supercontig also be called?
A scaffold.
39
Once you have seen if your fragments contain the markers what can you do?
Design primers to sequence the DNA.
40
If a BAC contains the end sequence of another BAC, what can you assume?
That they are found next to each other.
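
The idea that overlapping end sequences place two BACs next to each other can be sketched as a simple suffix/prefix comparison. This is a toy illustration with made-up sequences and hypothetical function names, assuming error-free reads; it is not how a real assembler works.

```python
def end_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_adjacent(left: str, right: str, min_len: int = 3) -> str:
    """Join two clones into one contig if their ends overlap."""
    size = end_overlap(left, right, min_len)
    if size == 0:
        raise ValueError("no overlap found; clones may not be adjacent")
    return left + right[size:]


# Hypothetical insert sequences for two BAC clones.
bac_a = "ATGGCGTACCTGA"
bac_b = "CCTGATTTGCA"
print(end_overlap(bac_a, bac_b))
print(merge_adjacent(bac_a, bac_b))
```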
41
Can supercontigs include gaps?
Yes.
42
What are the main steps of shotgun sequencing BAC clones?
1. Retrieve 40-1000 kb of DNA from the BAC clone. 2. Break up the DNA into 5-10 kb fragments. 3. Use universal primers to sequence the insert.
43
Why did the human genome project not 'walk along' the DNA with primers when shotgun sequencing the BAC clones?
It is a very expensive and time-consuming method that can only cover 1000 bp a day.
44
What do you need to sequence to assemble large fragments in shotgun sequencing of BACs?
Lots of paired sequences. You cannot sequence the middle section, however.
45
When the human genome project was originally completed what coverage of the human genome was originally desired?
5 times coverage.
46
When the human genome was first sequenced 5 times coverage was aimed for. What coverage is now aimed for?
30 times coverage.
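
Fold coverage can be worked out as (number of reads × read length) / genome size. The sketch below uses that standard relationship; the genome size of 3.2 Gb and the 150 bp read length are illustrative assumptions, not figures from these cards.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(genome_size: int, read_length: int, coverage: float) -> int:
    """Minimum number of reads required to reach a target fold coverage."""
    return math.ceil(coverage * genome_size / read_length)


GENOME_SIZE = 3_200_000_000  # assumed human genome size in bp
READ_LENGTH = 150            # assumed read length in bp

print(reads_needed(GENOME_SIZE, READ_LENGTH, 5))   # original 5x target
print(reads_needed(GENOME_SIZE, READ_LENGTH, 30))  # current 30x target
```

This is an average only: real coverage is uneven, which is one reason a higher target is chosen.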
47
What does BAC end sequencing by Sanger sequencing produce?
Super contigs and scaffolds.
48
Which order is correct: 'BACs, Sanger, contigs, shotgun' or 'BACs, shotgun, contigs, Sanger'?
BACs, Sanger, contigs, shotgun.
49
What for-profit organisation wanted to patent the human genome?
Celera Genomics.
50
Who was responsible for wanting to patent the human genome?
Craig Venter.
51
What percentage of the human genome was completed before Celera Genomics decided to also try to complete the human genome sequence?
5%-10%.
52
What was Celera Genomics' approach to completing the human genome?
Size select the DNA and then clone 10, 20 and 50 kbp fragments to create a library. These were assembled into a consensus sequence and into contigs, which could be sequenced automatically by the ABI 3700.
53
What fold coverage did Celera Genomics aim for?
5-fold.
54
Why were gaps found in the human genome when it was originally sequenced?
Cloning bias from restriction sites not being evenly distributed. Fragments were also not evenly inserted.
55
What techniques were used to minimise gaps in the human genome?
The use of different restriction enzymes and different physical and chemical fragmentation methods.
56
Why, on some rare occasions, were DNA inserts unstable in E. coli?
The insert could contain a gene that is lethal to E. coli.
57
For an unknown reason two types of vectors worked better in sequencing the human genome. What were these two?
BACs and PACs. YACs did not work as well.
58
What were the two types of gaps present in the draft human genome sequence, and what was the difference between them?
In a sequencing gap, the sequence is present in the gene libraries. In a physical gap, the sequence is absent from the gene libraries.
59
How would you close a sequencing gap?
Design a new sequencing primer based on end sequences and Sanger sequence from both ends. Fragment can be larger than 1kb but the whole process is very slow.
60
What do you know regarding physical gaps?
The order of the scaffolds but not the sequence in-between.
61
How would you close a physical gap?
1. Amplify the DNA across the gap by PCR. 2. This DNA can be further amplified through cloning, as PCR can only amplify 3 kb. 3. Sequence the PCR product, or clone it into a plasmid vector and end sequence the clones. The end sequences show which fragments they are.
62
If you do not know the order of the scaffolds how can you determine what primers to use to fill any gaps?
Try EVERY possibility of primers and look for a PCR reaction product using genomic DNA as the template. An algorithm can then be used to determine the minimum combination in which primers are adjacent.
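
To see why trying every primer combination is laborious, the sketch below enumerates the candidate primer pairs when each scaffold has one outward-facing primer at each end. The scaffold names and the helper function are hypothetical; only pairs spanning two different scaffolds are kept.

```python
from itertools import combinations


def candidate_primer_pairs(scaffolds):
    """List every pair of scaffold-end primers that could span a gap."""
    ends = [(name, side) for name in scaffolds for side in ("left", "right")]
    # A gap lies between two different scaffolds, so skip same-scaffold pairs.
    return [(a, b) for a, b in combinations(ends, 2) if a[0] != b[0]]


pairs = candidate_primer_pairs(["S1", "S2", "S3"])
print(len(pairs))   # number of PCR reactions to try for 3 scaffolds
print(pairs[0])
```

The number of reactions grows roughly with the square of the number of scaffolds, which is why an algorithm is used to narrow down the combinations.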
63
How much of the human genome is made of repetitive DNA sequences?
45%-50% of the human genome.
64
What are 6 examples of repetitive DNA sequences found in the human genome?
1. Minisatellites. 2. Microsatellites. 3. Centromeres. 4. Telomeres. 5. Transposons. 6. Duplicated genes.
65
What type of repetitive DNA sequence can be a problem in genome sequencing?
Duplicated genes.
66
What part of the genome is hard to assemble meaning it is the last part to be resolved?
The repetitive parts.
67
What parts of many poor-quality genomes never get resolved, and why?
Repetitive parts as they are expensive to resolve and rarely contain genes.
68
What two things can repetitive sequences cause?
Truncations and rearrangements.
69
What types of repeats often cause truncations?
Tandem repeats.
70
What three factors influenced the species chosen to have their genomes sequenced?
1. If they were genetic models. 2. Their commercial and medical relevance. 3. Their genome density and genome size.
71
What is the scientific name for a mouse?
M. musculus.
72
What four organisms were sequenced due to their commercial and medical relevance?
1. H. sapiens. 2. Oryza sativa (rice). 3. P. falciparum. 4. Haemophilus influenzae (representative bacterial disease).
73
Arabidopsis thaliana has the smallest plant genome at 100 Mb. What is it?
Cress.
74
What two organisms have not been sequenced despite their massive commercial use due to the fact that their genomes are just too big?
Maize (4.8 Gb) and wheat (17 Gb).
75
Wheat has not had its genome sequenced because it is far too big. What other reason has prevented its genome being sequenced?
It contains multiple repeats.
76
The human genome project used the clone by clone approach to sequence the human genome. What method would have been a better choice?
Whole genome shotgun sequencing.
77
What were some of the reasons that the human genome project chose to use the clone by clone method to sequence the genome rather than whole genome shotgun sequencing like Celera did?
1. Easier to prove to sceptics that it was feasible. 2. Less risky, which was important as it was government funded. 3. Assembly was easier. 4. Gaps could be targeted, allowing it to be finished. 5. Better suited to a diverse multinational consortium.
78
What department donated some of their labs to allow the human genome project to be completed?
The department of energy.
79
In what year was the first 24bp sequence published?
1973.
80
When was the first Sanger sequence method published?
1977.
81
When was Genbank started?
1982.
82
When was PCR developed?
1983.
83
When was the C. elegans genome sequenced?
1998.
84
What was the problem with the 454 Life Sciences NGS GS20?
Could not scale.
85
Why did the human genome project need to develop next generation sequencing (5 reasons)?
Sanger sequencing was not an ideal method, for the following reasons: 1. Too expensive. 2. Too slow. 3. Cloning bias. 4. Low coverage. 5. Only one sequence at a time.
86
What are the two alternative names for Next Generation Sequencing?
2nd generation sequencing, massively parallel sequencing.
87
What are three main advantages of NGS?
1. Rapid. 2. Cheap. 3. No plasmid needed.
88
First generation sequencing needed to be replaced as the data was not accurate enough to sequence a whole genome. True or false?
False, first generation sequencing was very accurate. It was however too slow as only one lane could be processed at a time and it required expensive chemicals.
89
What is the main benefit of Illumina sequencing?
More than one sample/genome can be run at a time.
90
What are indexes?
Unique 6 base codes that allow the identification of each sample.
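
A minimal sketch of how index codes let pooled reads be sorted back into samples (demultiplexing). The index sequences, sample names and function name are invented for illustration, and exact matching is assumed; real pipelines usually tolerate a mismatch or two.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads by sample using the 6-base index attached to each read.

    `reads` is an iterable of (index_sequence, read_sequence) tuples.
    Reads with an unknown index are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(read_seq)
    return dict(by_sample)


# Hypothetical 6-base indexes for two pooled samples.
indexes = {"ATCACG": "sample_1", "CGATGT": "sample_2"}
pooled = [
    ("ATCACG", "GGTACCA"),
    ("CGATGT", "TTAGGCA"),
    ("ATCACG", "CCATGGA"),
    ("NNNNNN", "ACGTACG"),
]
print(demultiplex(pooled, indexes))
```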
91
What NGS technique allows two-times coverage of the genome?
MiSeq.
92
What are the steps for Illumina library preparation?
1. Fragmentation. 2. Size select (200-500bp). 3. End repair with DNAP and exonucleases. 4. A tailing. 5. Ligation of adaptor. 6. PCR amplification.
93
What is A tailing?
A process used in Illumina library preparation, carried out by Taq polymerase. It adds an A overhang to the 3' ends, which makes the fragments complementary to the adaptors and stops them ligating to each other.
94
What do adaptors have?
A T overhang.
95
What happens during the PCR amplification step in Illumina library preparation?
Low sequences are picked up. Adaptors are extended and index sequences are added so that samples can be told apart.
96
What can be used to amplify the signal produced by Illumina Library preparation other than PCR?
Fluorescent nucleotides.
97
How many copies are present per cluster in Illumina bridge amplification?
1000.
98
How many clusters can be present per cm² in Illumina bridge amplification?
10 million.
99
Clustering in Illumina sequencing is a process in which each fragment is ________ . Flow cell is made of a glass slide coated with __________. One _____ is complementary to the ______ region. Polymerase makes this strand double stranded and then ___________. Adaptor now complementary to the other oligo, binds and _______ to create two single stranded DNA molecules. ____ strand is then cleaved and washed off.
1. Isothermally amplified. 2. A lawn of different oligos. 3. Oligo. 4. Adaptor. 5. The original template is removed. 6. Bridge amplification occurs. 7. Reverse.
100
Why are the 3' ends blocked in Illumina sequencing?
Prevents unwanted binding.
101
What type of sequencing is used with Illumina sequencing?
Sequencing by synthesis.
102
What determines the length of the read in sequencing by synthesis?
The number of cycles.
103
What determines the base incorporated in sequencing by synthesis?
Emission wavelength and signal intensity.
104
What is the role of the index 1 primer in sequencing by synthesis?
Allows the index read product to be obtained. This read is completed once the adapter is reached.
105
What has to happen before Index 2 can be sequenced in sequencing by synthesis?
The 3' end has to be unprotected.
106
What happens once the index 2 read has been washed away in sequencing by synthesis?
DNAP makes a double-stranded molecule, which is then linearised with the 3' ends being blocked. The original forward strand is cleaved off and washed away. Read two then occurs.
107
What are the main steps of Illumina sequencing by synthesis?
1. Sequencing primer hybridised. 2. Polymerase and nucleotides added. 3. Fluorophores on each cluster read by lasers. 4. Cleave fluorophores and unblock nucleotides. 5. Wash. 6. Repeat. 7. Index sequencing: primer hybridised.
108
How much output is given from Illumina sequencing?
An enormous output.
109
Is Sanger or Illumina sequencing more accurate?
Sanger.
110
Why is Illumina sequencing relatively slow?
Because it has a stop-start nature.
111
Does the sample have to be amplified with Illumina?
Yes.
112
How long is the read length for Illumina sequencing?
Short.
113
Why would solid state electronics be better than Illumina sequencing by synthesis?
Illumina sequencing requires expensive optics and chemicals.
114
What does the solid state chip measure in Ion Torrent?
pH changes.
115
What are the 8 steps with Ion Torrent?
1. DNA fragmentation. 2. Size selection. 3. End repair. 4. Adaptor ligation. 5. PCR amplification. 6. Emulsion PCR. 7. Ion torrent. 8. Sequencing.
116
What causes the change in pH that Ion Torrent relies on?
When a nucleotide is incorporated, an H+ ion is released.
117
What does the micro reactor contain in emulsion PCR?
Ideally one DNA strand, DNA, primers, one bead and PCR mix.
118
What are the stages of emulsion PCR (7 steps) ?
1. Denaturation of the library fragment. 2. Annealing of one reverse strand to the adaptor site on the bead. 3. Polymerase amplifies the forward strand starting from the bead towards the primer site. 4. Denaturation of the original reverse strand from the bead. 5. Annealing of the reverse strand to the adaptor site of the bead. Primer anneals to the forward strand. 6. Polymerase amplifies the forward strand starting at the bead and heading towards the primer site. The reverse strand is amplified in the opposite direction. 7. Repeat for up to 48 cycles.
119
How does the forward DNA strand connect to the bead in emulsion PCR?
Via the sugar-phosphate backbone of the DNA.
120
What is the main problem with homopolymers and Ion Torrent?
Hard to tell the difference between 10, 11 etc of the same base in a row.
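
The homopolymer problem follows from how flow-based sequencing works: one nucleotide type is washed over the chip at a time, and every matching base in a row is incorporated in the same flow, so the signal scales with run length. The toy model below assumes a simple repeating T, A, C, G flow order and noise-free signals; both are simplifications, and the function name is hypothetical.

```python
def flow_signals(read_seq: str, flow_order: str = "TACG", n_flows: int = 8):
    """Return (nucleotide, bases incorporated) for each flow.

    `read_seq` is the strand being synthesised, 5' to 3'.
    The repeating flow order is an assumption for illustration.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        # All consecutive matching bases are added within a single flow.
        while pos < len(read_seq) and read_seq[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


print(flow_signals("TTACCCG"))
```

In this idealised model a run of 10 gives a signal of exactly 10. On a real chip the pH signal is noisy, so the relative difference between 10 and 11 identical bases is small and easily miscalled.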
121
Does Illumina or Ion Torrent need expensive optics?
Illumina.
122
What does the ion-sensitive layer below the well do in Ion Torrent?
Detect changes in pH and convert these changes to voltage, indicating incorporation of that specific nucleotide.
123
How often are different nucleotides washed over the chips in Ion Torrent?
Every 15 seconds.
124
How much has the output of Ion Torrent increased in the last 2 years?
1000X.
125
What 6 things would a better sequencing machine do/include?
1. Allow for single molecule incorporation of a sequence without amplification. 2. Continuous reads. 3. Long reads. 4. Solid state electronics. 5. Cheap. 6. Small fragments.
126
What are three examples of first generation sequencing?
1. Fragmented ladders. 2. Sanger sequencing. 3. Maxam-Gilbert.
127
What first generation sequencing method is no longer used?
Maxam-Gilbert.
128
What sequencing method had 74% of the market in 2014?
Illumina.
129
What generation of sequencing is amplified DNA libraries, clonal arrays and cycling enzymatic reactions an example of?
2nd Generation sequencing.
130
SOLiD is a method of second generation sequencing that has become obsolete. What is an example of a second generation sequencing method that is almost obsolete?
454.
131
What third generation sequencing method has become obsolete?
Helicos.
132
What generation of sequencing are PacBio and Oxford Nanopore?
Third.
133
Name an example of fourth generation sequencing?
Experimental, cellular resolution and positional sequencing.
134
Why must second generation sequencing be amplified on beads or on a plate?
Ensures that the signal level is above the background noise.
135
Does second generation sequencing create large libraries?
Yes.
136
Why is the reaction paused after the addition of a base in second generation sequencing?
Allows the signal to be read.
137
What are four disadvantages of second generation sequencing?
1. Expensive library preparation. 2. Slow library preparation. 3. Relatively short read length at 100-200bp. 4. Bias introduced by PCR.
138
Why can PCR introduce bias?
GC rich sequences are not amplified as efficiently and ligases prefer certain sequences for ligation.
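
Whether a fragment is likely to be affected by this bias depends on its GC content, which is simple to compute. The sketch below is only that calculation; the function name and example sequences are hypothetical, and no threshold for "GC rich" is implied by the cards.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


for fragment in ("ATATATTA", "ATGCATGC", "GGCCGCGC"):
    print(fragment, gc_fraction(fragment))
```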
139
What sequencing techniques produce very long reads with half the data being over 14,000 bp and the longest reads being 40,000 bp long?
PacBio (Pacific Biosciences RS II).
140
What is the accuracy of PacBio?
99.999%
141
What is the shortest run time of PacBio?
10 bases per second.
142
Does GC bias affect low or high GC regions?
Both.
143
What sequencing method has least GC bias?
PacBio.
144
Why is there no amplification bias with PacBio?
As the sequences do not need to be amplified.
145
What can PacBio do which other sequencing methods cannot?
Discover a broad spectrum of DNA base modifications.
146
What adaptors are used with PacBio library preparations?
SMRTBell adaptors.
147
What is the purpose of SMRTBell adaptors in PacBio?
Ligate blunt hairpins and repair fragment ends.
148
What are the 6 main steps of the PacBio method?
1. Fragment DNA. 2. Repair DNA and damaged ends. 3. Ligate adaptors. 4. Anneal sequencing primer to SMRTBell templates.
149
What is the 5' SMRTBell adaptor sequence?
TCTCTCTC.
150
What is the 3' SMRTBell adaptor sequence?
GAGAGAGAT.
151
What does SMRT sequencing stand for?
Single Molecule Real Time sequencing.
152
Normally when nucleotides are made fluorescent the base is modified. What is modified to make the nucleotides fluoresce in SMRT sequencing?
The terminal phosphate.
153
What is measured in SMRT sequencing?
The fluorescence emitted when each base is added.
154
'Zero-mode waveguide chambers' are used in SMRT sequencing to improve detection, as the signals produced are tiny. What are these chambers coated in?
Aluminium and silicon dioxide.
155
How wide are zero-mode waveguide (ZMW) chambers?
70nm.
156
What are the read lengths produced in SMRT sequencing?
500-3200 bases.
157
How does SMRT sequencing work?
Nucleotides diffuse in and out of the ZMW chambers every microsecond. When one is incorporated by DNAP, it takes several milliseconds, so the fluorescent label has time to be excited and emit light that can be detected.
158
What method in addition to PacBio allows the incorporation of 10bp per second?
SMRT sequencing.
159
What method of sequencing is cheap and quick enough to potentially improve health care?
SMRT sequencing.
160
What does CCS, as in a CCS read produced by SMRT sequencing, stand for?
Circular consensus sequence.
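
Because the template is circular, the polymerase reads the same insert several times, and the passes can be combined into one consensus. The sketch below shows the idea with a simple per-position majority vote over passes that are assumed to be already aligned and of equal length. This is a deliberate simplification with made-up sequences, not the actual consensus algorithm used by the instrument software.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-position majority vote across pre-aligned, equal-length passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Three hypothetical passes over the same insert, each with one error.
passes = ["ACGTAC", "ACCTAC", "ACGTTC"]
print(majority_consensus(passes))
```

Random errors rarely fall at the same position in every pass, so the vote removes most of them.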
161
What release step is the Oxford Nanopore MinION GridION on?
Beta.
162
What was sequenced in 2014 by Oxford Nanopore sequencing?
E. coli and Scardovia.
163
What is the average read length of Oxford Nanopore?
5.4 kb.
164
Are some reads with Oxford Nanopore bigger than 5 kb, 10 kb, 15 kb or 20 kb?
10 kb.
165
What is the current error rate of Oxford Nanopore?
30-40%.
166
What are the steps involved in Nanopore library generation?
1. Fragment by nebulisation. 2. End repair. 3. A tailing. 4. Ligation of a 1D or a 2D adaptor. 5. Conditioning attaches a motor protein.
167
Is the 1D or the 2D adaptor used in Oxford Nanopore a hairpin shape?
2D.
168
What protein is used as the pore in nanopore technology?
Heptameric protein a-hemolysin.
169
Why is a-hemolysin an ideal protein to use in nano pore technology?
It is secreted from bacteria meaning it is low cost and robust.
170
What does nanopore technology involve?
A synthetic polymer membrane. Current is measured across the pore.
171
What happens in 1D Oxford Nanopore technology?
One strand is sequenced and the other is discarded.
172
What happens in 2D Oxford Nanopore technology?
The first strand is sequenced, then the hairpin adaptor is unwound and sequenced. The opposite strand (complementary to the first) is then sequenced and used to correct any errors, giving a corrected two-direction read.
173
What is the role of the motor protein in nanopore technology?
Ensures that only one base enters the pore at a time.
174
How does nanopore technology allow for the identification of molecules?
The membrane has a very high electrical resistance, and each molecule causes a distinctive disruption in the current, allowing it to be identified.
175
What is the role of the GridION node in nanopore technology?
Allows for data collection.
176
Is there a deterioration of accuracy with nanopore?
No.
177
How does nanopore technology need to be improved?
Read length.
178
What are 7 advantages of nanopore technology?
1. No amplification. 2. Rapid. 3. Long reads. 4. Electronic data in real time. 5. Solid state electronics. 6. Portable. 7. Versatile.
179
What is an advantage of having electronic data collected in real time in nanopore sequencing?
It means that sequencing can be stopped once the required data is obtained.
180
Nanopore technology is currently the fastest sequencing method available. True or false?
False. It sequences 10 bp a minute, like Illumina and Ion Torrent.
181
Solid state electronics make sequencing easier as they do not require expensive optics. What else do they do?
Imparts more reliability.
182
Why are nanopores versatile?
They can potentially be changed to measure RNA, proteins and other compounds.
183
What are 6 research related applications that NGS can be used for?
1. De novo genome sequencing. 2. Resequencing the genome and comparing it to the reference genome. 3. Sequencing transcripts (RNA-seq). 4. Studying methylation of DNA. 5. Sequencing small RNAs (sRNA-seq). 6. Studying protein binding sites (ChIP-seq).
184
What are three clinical applications of NGS?
1. Diagnostics. 2. Biomarkers. 3. Prenatal testing.
185
When did Illumina's MiSeqDx get FDA approval for diagnostics, assays and biomarkers?
19/11/2013.
186
What gene has 'Molecular Health' been testing for with NGS?
The HER2 gene.