Genomics Flashcards

1
Q

Dye terminator sequencing

A
  • each ddNTP is labelled with a different fluorophore
  • a single reaction contains all four ddNTPs

2
Q

Single sequence to genome

A
  • genome too long to sequence in one go
  • the genome can be fragmented into large pieces (50-200 kbp), which can be amplified in bacterial culture as bacterial artificial chromosomes (BACs)
  • similar strategies use artificial chromosomes in yeast, or cosmids/fosmids, which use other replication origins in bacteria
  • fragments of DNA are cloned into the vector of choice and then tens of thousands of individual colonies are picked to create a library
  • each clone contains a (hopefully) unique fragment of the genome sequence
3
Q

BAC library

A
  • digest with a rare-cutting enzyme (cutting roughly every 50-200 kb) and clone the fragments
  • clone into a BAC vector to give bacterial artificial chromosomes
  • pick each clone into a separate well - each well now contains a different genome fragment
  • each fragment can be purified and analysed
  • align by digest ‘fingerprint’
  • shotgun sequence each BAC individually
4
Q

Individual BAC clones

A
  • have a restriction digest pattern matching that of the original genome
  • by digesting with many enzymes the digestion patterns can be determined and matched to ‘tile’ BAC clones to give a physical map
  • may also contain known DNA sequences or markers which can then be used to improve the physical map and link it to the genetic map
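Matching digest fingerprints can be sketched as counting the fragment sizes two clones share. This is only a rough illustration: the function name `shared_fragments`, the 50 bp `tolerance` and the fragment sizes are hypothetical, and real fingerprinting software scores matches statistically.

```python
def shared_fragments(fp_a, fp_b, tolerance=50):
    """Count fragment sizes (bp) in fp_a that match an unused size in fp_b.

    Two sizes "match" if they differ by at most `tolerance` bp, an assumed
    stand-in for gel sizing error.
    """
    remaining = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]  # each fragment in fp_b is used once
                break
    return shared


# Hypothetical fingerprints for two BAC clones.
clone_a = [1200, 3400, 5100, 800]
clone_b = [3410, 5080, 2200, 950]
print(shared_fragments(clone_a, clone_b))  # 2
```

Clones that share many fragments probably overlap and can be placed next to each other when tiling the physical map.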
5
Q

‘Shotgun’ sequencing with Sanger

A
  • analyses all the bases at once for a single sequence
  • breaks the genome into manageable chunks at random, then sequences them
  • fragment genomic DNA -> clone into sequencing vector -> pick colonies and sequence
  • large libraries with tens of thousands of these clones can be constructed and mapped by restriction mapping
  • requires the source DNA to be broken into approx. 1000bp chunks
  • these are incorporated into a sequencing vector and sequenced using standard primers from both sides
6
Q

Overlap Layout Consensus Method

A
  • DNA is sequenced to produce a set of partial sequences (reads)
  • a computer is used to assemble the sequence reads into a series of overlapping fragments
  • the overlaps are removed by the computer to produce a single assembled sequence
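The steps above can be sketched as a toy greedy assembler. This is a simplification of overlap-layout-consensus, not a production method: the names `overlap` and `greedy_assemble`, the `min_len` threshold and the example reads are hypothetical, and the sketch assumes error-free reads from a single strand.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads stay separate
        # Merging keeps the overlapping bases only once.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGCGTAC", "GTACGGA", "CGGATTT"]))
# ['ATGCGTACGGATTT']
```

Greedy merging can go wrong when repeats create misleading overlaps, which is one reason repetitive regions are hard to assemble.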
7
Q

‘Next generation’ sequencing

A

Sanger

  • shotgun cloning is slow and expensive
  • sequences one molecule at a time
  • accurate

Illumina (Solexa)

  • sequences all molecules at the same time
  • quite accurate
  • other competing technologies as well
  • relatively short reads
  • expensive for single sequences, cheap for many
8
Q

High-throughput sequencing

A

Involves:

  • the chemical amplification of DNA fragments
  • the synthesis of complementary strands using fluorescently labelled nucleotides
  • now outdated and rarely used
  1. single DNA molecules are attached to a solid surface
  2. each molecule is amplified in place by PCR (each spot is a PCR colony or ‘polony’)
  3. the four nucleotides (as nucleotide triphosphates), each labelled with a different fluorescent dye, are added, along with DNA polymerase and a universal primer
  4. only one nucleotide is attached to the primer by DNA polymerase. Unincorporated nucleotides are removed
  5. the newly added nucleotide is detected by a camera
  6. the cycle is repeated about 100 times
9
Q

High-throughput sequencing

A
  1. The sequence of interest is first fragmented and fragments of a specific size isolated.
  2. Specific PCR primers are ligated onto the ends.
  3. These fragments are then hybridised to oligos on a flow cell (very dilute)
  4. The oligos attached to the flow cell act as primers to amplify the fragment attached to the slide. This forms a PCR colony of identical fragments
  5. The amplified fragments then go on to the sequencing process.
10
Q

The HTS cycle

A
  1. add to growing chain 5’-3’
  2. detect label with camera
  3. chemically cleave the label revealing the 3’OH
  • limit = about 130-150 cycles
  • further extension blocked by the dye label
  • immobilised template is hybridised with a 3’-labelled dNTP and the sequence extended by one base
  • as 3’ is blocked (the fluorescent label acts as a protecting group), the chain cannot be extended
  • excess reagents are removed and the presence of the fluorescent label is detected
  • the 3’ position is deprotected, ready for the next cycle
11
Q

Illumina HiSeq

A
  • a small portion of the sequencing slide can read >150 million sequences at the same time for each sample
  • scaling this up, an Illumina HiSeq has 8 lanes
  • each lane can be used for a different sample
  • each lane can give 20-40 million sequences up to 150 bases
  • 1 run takes 3-6 days (~1 hour per base)
  • 30×10^6 reads × 8 lanes × 150 bases = 36 Gb (approx. ten human genomes)
  • now up to 150Gb per run (50x coverage)
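The run-yield arithmetic on this card can be checked with a few lines of Python. The function name `run_yield_bases` is hypothetical, and the human genome size of about 3.2 Gb is an assumption that the card does not state.

```python
def run_yield_bases(reads_per_lane, lanes, read_length):
    """Total bases from one run: reads per lane x lanes x bases per read."""
    return reads_per_lane * lanes * read_length


# Figures from the card: 30 million reads per lane, 8 lanes, 150 bases.
total = run_yield_bases(reads_per_lane=30_000_000, lanes=8, read_length=150)
print(f"{total / 1e9:.0f} Gb")  # 36 Gb

# Assumed approximate haploid human genome size (not given on the card).
HUMAN_GENOME_BP = 3.2e9
print(f"about {total / HUMAN_GENOME_BP:.0f} human genome equivalents")
```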
12
Q

Chromosomes

A

have a single DNA molecule with specialised DNA sequences for the initiation of DNA replication, for spindle interactions in mitosis (centromeres), and for maintaining the integrity of the ends (telomeres)

13
Q

Protein gene expression

A

occurs at open reading frames, from which RNA polymerase transcribes mRNAs that are translated to form polypeptides, which become functioning proteins. Genes contain DNA sequences for control of their expression

14
Q

Protein coding genes

A

generally not repetitive, but there are some exceptions, e.g. filaggrin and high copy number genes

15
Q

Repetitive regions

A

microsatellites, telomeres, intron sequences

16
Q

tRNA

A

very similar sequences (but very short)

17
Q

rRNA

A

many copies of some ribosomal genes

18
Q

Transposons

A

mobile genetic elements - sequence of a few kb that can move about the genome. Thousands of copies in eukaryotes

19
Q

Size matters

A
  • the longest repeats in microbial genomes are about 7kb
  • with the latest technologies we can read right through them
  • without extra long reads we need to improvise with paired-end reads
20
Q

Contig

A

a contiguous (continuous) consensus sequence from an assembly

21
Q

Scaffold

A

a series of contigs where we have additional information to place them together in the right order and orientation but the sequence between the contigs is not complete

22
Q

Assembly

A

the set of scaffolds for one genome

23
Q

N50

A

the contig/scaffold length such that 50% of the assembled bases are in contigs/scaffolds of that length or longer
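A minimal sketch of computing N50 from a list of contig or scaffold lengths: sort from longest to shortest and stop once the running total reaches half of the assembled bases. The function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Total = 290, half = 145; 80 + 70 = 150 reaches half at the 70 contig.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```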

24
Q

Read length

A
  • A single read cannot span a repetitive region that is longer than the read length.
  • This prevents long contigs from forming.
  • The longer the read length the larger the repeat region that can be assembled.
25
Q

Read depth/coverage

A
  • The average number of reads that cover each base of the genome or assembly.
  • A coverage of 10X means that each base is on average found in 10 reads.
  • The deeper the coverage, the more clearly any sequence or structure changes can be discerned from sequence error
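Average coverage is commonly estimated as (number of reads × read length) / genome size. The sketch below applies that estimate; the function names and the 5 Mb example genome are hypothetical, and it assumes every read maps to the genome.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Estimated average depth: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# One million 150-base reads over an assumed 5 Mb bacterial genome.
print(coverage(1_000_000, 150, 5_000_000))  # 30.0
print(reads_needed(10, 150, 5_000_000))     # 333334
```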
26
Q

Ploidy

A
  • The number of copies of the genome in the organism.
  • Bacteria = 1; human = 2; potato = 4; strawberry = 8
  • The higher the ploidy, the harder it is to accurately assemble.
27
Q

Genomic resequencing

A
  • to look for a variant
  • identify differences between strains/organisms/individuals
  • assembly against a reference is much easier than de novo sequencing
  • may impact how you are treated medically in the future/potential of personalised medicine
28
Q

Resequencing steps

A
  • different to reference sequence
  • gap compared to reference sequence
  • duplicated gene or region?
29
Q

Challenges of short read re-sequencing

A
  • deletion of a whole gene: it is hard to look for something that’s not there
  • duplication is the same kind of problem
  • inversion: if the reads are short, it’s hard to tell
30
Q

Single molecule real time sequencing (PacBio)

A
  • long read (10kb+)
  • high error rate (14%)
  • circularising the template means it can be read many times and an accurate consensus obtained
  • Ion Torrent works in a similar way but detects the pH change on nucleotide addition
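Reading the same template many times and taking a consensus can be illustrated with a column-wise majority vote. This is a simplification under stated assumptions: the function name `majority_consensus` and the example passes are hypothetical, and the passes are assumed to be already aligned and of equal length with substitution errors only. Real consensus callers must first align the passes, because errors also include insertions and deletions.

```python
from collections import Counter


def majority_consensus(passes):
    """Column-wise majority vote over equal-length reads of one template."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch assumes all passes have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five noisy passes over the same template, each with at most one error.
passes = ["ACGTTGCA", "ACGATGCA", "ACGTTGCA", "TCGTTGCA", "ACGTTGGA"]
print(majority_consensus(passes))  # ACGTTGCA
```

Each error appears in only one pass, so it is outvoted at its position.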
31
Q

Nanopore sequencing

A
  • as DNA is passed through the nanopore by the molecular motor under the influence of a potential
  • the current changes in a detectable way depending on the bases occluding the pore
  • the current can be interpreted to read the DNA sequence
  • to improve accuracy, a hairpin adapter is ligated to the end of the DNA fragment
  • this causes both strands of DNA to be read sequentially
32
Q

Ultra long read issues

A

Accuracy
- at present around 95-98%
Throughput
- much slower than Illumina (5Gb/48hr vs 150Gb/96hr)
Toolset
- these are new technologies and the analysis tools are still being developed

33
Q

Sequencing summary

A
  • most sequencing is by synthesis
  • current sequencing technologies can produce terabases per day
  • assembly is a challenge, especially for large genomes
  • repetitive regions are challenging
  • small changes compared to a reference are challenging
  • new technologies are helping to solve the challenge
  • careful experimental design can help solve the challenge
  • long reads - lower throughput but better for genome structure
  • short reads - higher throughput and better for sequence accuracy
34
Q

How do we differ from one another?

A

we differ from each other in small polymorphisms and structural variation

35
Q

What is a reference genome?

A

A standard sequence against which we can compare other sequences

36
Q

What are the problems with the reference?

A
  • The reference is from a very small subset of donors and is a mosaic
  • People vary, in some regions far more than others. (GRCh38 has 261 alternate scaffolds)
  • The reference is incomplete (603 gaps)
37
Q

dbSNP

A

a collection of genomic variation for human and other species

38
Q

Single nucleotide variants/polymorphisms

A
  • substitution
  • deletion
  • insertion
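The three classes can be told apart by comparing the reference and alternate alleles. A small sketch, assuming alleles are written as plain strings in the style used by variant call files; the function name `classify_variant` is hypothetical.

```python
def classify_variant(ref, alt):
    """Classify a small variant by comparing allele lengths."""
    if len(ref) == len(alt):
        return "substitution" if ref != alt else "no change"
    return "insertion" if len(alt) > len(ref) else "deletion"


print(classify_variant("A", "G"))   # substitution
print(classify_variant("A", "AT"))  # insertion
print(classify_variant("AT", "A"))  # deletion
```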
39
Q

Structural variants

A

changes in the overall structure of the genome

  • duplication
  • loss
  • translocation
  • inversion
  • repeat
  • deletion
40
Q

Haemophilus influenzae

A
  • a human pathogen that lives in the upper respiratory tract
  • can cause conjunctivitis and meningitis
  • sequenced in 1995
  • first whole genome sequenced using a shotgun method
  • since then 28 different strains have been sequenced and the pathogens have been identified
41
Q

What does the presence of genes responsible for different functions determine?

A

the virulence of the organism

42
Q

Virulence

A

the degree of pathogenicity within a group or species of parasites as indicated by case fatality rates and/or the ability of the organism to invade the tissues of the host

43
Q

Virulence Factors

A

genes which produce products essential for virulence

44
Q

Where do gene differences between strains tend to occur?

A
  • in clustered regions on the chromosome
  • this clustering provided support for the concept of lateral gene transfer

45
Q

Lateral gene transfer

A

In some cases a specific cluster of genes can arise from a different source. Instead of progressive stepwise evolution, a cluster of genes from ‘foreign’ DNA is incorporated as a plasmid or integrated into the genome

46
Q

Mycoplasma genitalium

A
  • smallest known free-living organism - commensal in the genitourinary tract
47
Q

Chlamydia trachomatis

A
  • human pathogen
  • causes trachoma (blindness), pharyngitis, bronchitis
  • obligate intracellular parasite transmitted by sexual contact
48
Q

How does Chlamydia trachomatis affect HeLa cells during infection?

A
  • prevents fusion of phagosome and lysosome

- takes in ATP from host cells as it cannot produce it itself

49
Q

Metabolism of Chlamydia trachomatis

A
  • metabolic pathways are patchy
  • part of TCA missing
  • cannot synthesise ATP
  • doesn’t appear to synthesise amino acids
  • contains a well defined recombinase pathway
  • reported to recombine and reshuffle the genome quite readily
  • contains many fatty acid and phospholipid synthesis genes
50
Q

How do parasitic or commensal bacteria benefit?

A
  • resources of the host
  • taking things is more efficient than making them
  • their genomes have adapted and lost key metabolic processes
51
Q

DNA replication is expensive

A

there is no competitive advantage for a bacterium to keep DNA that is of no use or redundant

52
Q

What is the minimal genome?

A
  • take a small genome and make it smaller by knocking out genes
53
Q

Transposon

A
  • ‘mobile’ DNA element that codes for enzymes that allow it to relocate in the genome
  • can be many tens of kb long and include many genes
54
Q

Method applied to Mycoplasma genitalium

A
  • reduced essential gene count from 482 to 389
  • synthesised the entire genome and transplanted into an empty cell
  • the new synthetic organism grew
55
Q

Means of horizontal (lateral) gene transfer

A
  • virus
  • ingest DNA: foreign DNA can be taken up by a variety of mechanisms, e.g. phage infection or direct introduction of DNA into cells (competent cells)
  • ingest organism: by ingesting an organism and then using its DNA
  • conjugation (mating). Exchanging DNA with a related organism. Inside the cell the DNA could stay as a plasmid or integrate into the host genome via viral integrases
  • OR transposon jumps from ingested DNA to genome (mobile DNA element)
56
Q

Identifying horizontal gene transfer

A
  • using phylogeny (horizontally acquired gene cluster)
  • using sequence properties e.g. GC content
  • genes incorporated from different sources may have different baseline GC content, or different k-mer usage
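The GC-content approach can be sketched as a sliding-window scan that flags windows whose GC fraction differs strongly from the genome-wide baseline. The function names, window size, step and threshold below are all hypothetical choices for illustration, and the toy sequence is artificial.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """GC fraction for each full window, as (start, gc) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def flag_atypical(seq, window, step, threshold):
    """Windows whose GC fraction differs from the baseline by > threshold."""
    baseline = gc_content(seq)
    return [
        (start, gc)
        for start, gc in windowed_gc(seq, window, step)
        if abs(gc - baseline) > threshold
    ]


# Artificial AT-rich "genome" with a GC-rich island at positions 100-119.
genome = "AT" * 50 + "GC" * 10 + "AT" * 50
print(flag_atypical(genome, window=20, step=20, threshold=0.5))
# [(100, 1.0)]
```

A flagged window is only a candidate: atypical GC content suggests, but does not prove, that a region was acquired horizontally.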
57
Q

Horizontal gene transfer - organismal tree

A

genes are transferred laterally between species e.g. up and down between C and D

58
Q

Horizontal gene transfer - Gene x tree

A
  • apparent close relationship of lineages inferred from sequences of x reflects the lateral transfer of this gene rather than the phylogeny of the organisms
59
Q

Horizontal gene transfer - consensus tree

A

Based on multiple genes more accurately reflects the organismal phylogeny

60
Q

Lateral gene transfer complicates phylogenetic relationships

A
  • the phylogeny of four hypothetical prokaryote species, two of which have been involved in a lateral transfer of gene x
  • a tree based only on gene x shows the phylogeny of the laterally transferred gene, rather than the organismal phylogeny
  • a consensus tree based on multiple genes is more likely to reflect the true organismal phylogeny, especially if those genes come from a stable core of genes involved in fundamental processes
61
Q

Factors that help an organism invade the host

A
Cell attachment
- adhesins, fimbriae etc.
Capsules
- prevent attack by macrophages and digestion
Degrading enzymes
- hyaluronidase, proteases, lipases
62
Q

Factors that help an organism evade the host’s defences

A

Toxins
- endotoxins and exotoxins
Immunosuppressants
- e.g. anti-immunoglobulin proteases

63
Q

Endotoxins

A

part of the bacterial structure e.g. lipopolysaccharide

64
Q

Exotoxins

A

secreted by bacteria e.g. shiga toxin, pertussis toxin, cholera toxin, botulinum toxin (botox)

65
Q

Virulence factors

A

toxins etc and where they are coded

66
Q

B. cereus strains

A
  • they have incorporated the virulence factors into their genome
  • do not have a plasmid
67
Q

B. anthracis

A
  • two plasmids
  • pXO1 contains the toxins
  • pXO2 produces the capsule, preventing phagocytosis
68
Q

Sterne strain (34F2)

A

has lost the pXO2 plasmid and is used for immunization of domesticated animals worldwide

69
Q

Anthrax in chimpanzees

A

In 2010 a group of researchers identified a bacterium responsible for a fatal anthrax-like disease in chimpanzees which had closer sequence similarity to B. cereus and B. thuringiensis than to B. anthracis but contained the pXO1 and pXO2 plasmids (and a third plasmid).

70
Q

RNAseq

A

sequencing all the RNA molecules in a cell

71
Q

Metagenomics

A

Sequence every organism in the environment

72
Q

Microbial genomes are minimal

A

if a gene isn’t required then it tends to be lost

73
Q

Microbial genomes reflect the biology

A

the genes tell us about the life of the organism

74
Q

Microbial genomes are plastic

A

they are reshaped with additional plasmids, transposons etc. to add new functions

75
Q

Genome sequencing is opening up…

A

… new areas for study

76
Q

Large scale sequencing can…

A

identify disease loci through genome wide association studies