Microbial Genomics Flashcards
(87 cards)
How does Sanger sequencing work? (hint - What does it exploit?)
What is this method an example of?
It exploits dideoxynucleotides (ddNTPs): structural analogues of the deoxynucleotides (dNTPs) that are the standard building blocks of DNA, but which lack the 3'-OH group on the sugar
This means they can be incorporated into a DNA strand during DNA synthesis, but they block further extension of the chain (there is no 3'-OH to form a phosphodiester bond with the next nucleotide)
Sequencing by synthesis
What was the process of the original Sanger sequencing?
What was the original drawback and how was this improved?
As each nucleotide is added to the chain, there is a chance that a terminator nucleotide will be added instead
If that happens, no more bases can be added to that copy, and we end up with a truncated sequence
In Sanger’s original method, 4 separate reactions are carried out, each containing all four dNTPs plus one of the four ddNTPs
- Products are run side-by-side on a polyacrylamide gel to separate them according to size, and the sequence can be read off the gel from the bottom upwards
Running four separate reactions in four gel lanes per template was laborious; use of fluorescently labelled dideoxy bases meant that sequencing could be performed as a single reaction in a single capillary tube, with the bases distinguished by the colour of the fluorescent label
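The chain-termination and read-out logic above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions: every possible terminated product is formed, each product carries the label of its terminal ddNTP (as in the single-tube fluorescent version), and sorting by length stands in for gel or capillary separation. `sanger_read` and `reverse_complement` are hypothetical helper names.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_read(template: str, seed: int = 0) -> str:
    """Toy single-tube Sanger reaction on a template written 5'->3'."""
    # The new strand is synthesised complementary to the template.
    new_strand = reverse_complement(template)
    # A ddNTP can be incorporated at any position, so every length occurs.
    fragments = [new_strand[:length] for length in range(1, len(new_strand) + 1)]
    random.Random(seed).shuffle(fragments)  # the tube is an unordered mixture
    # Size separation: shortest products run furthest (bottom of the gel).
    fragments.sort(key=len)
    # Read the label (the terminal dideoxy base) of each band in order.
    return "".join(fragment[-1] for fragment in fragments)


print(sanger_read("ACGTTG"))  # CAACGT
```

Because the new strand is complementary to the template, the read is the reverse complement of the template strand.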
What was the first microbial genome sequenced?
What were 2 key features?
Bacteriophage ΦX174; a phage that infects E. coli
Genome is extremely compact
Overlapping open reading frames: the same section of DNA can encode two different proteins, read in different frames
What is a commonly used model organism?
How was it sequenced?
E. coli K-12
Clone-by-clone sequencing
- 250kb sections of the E. coli genome were cloned into bacteriophage λ, and ordered based on information from a genetic map
What was the first free-living organism whose genome was sequenced by whole-genome shotgun sequencing?
Haemophilus influenzae Rd in 1995
Explain shotgun sequencing
Why might this be preferred over other sequencing methods?
The genome is randomly fragmented (e.g. by mechanical shearing or partial restriction digestion), and the fragments are sorted by size and cloned into a vector
The vector is introduced into E. coli, which propagates (clones) the fragment
Followed by computational assembly of the complete chromosome
Small genomes
G+C base composition (38%) similar to that of the human genome
No physical map was available (or needed) to order the cloned fragments
Give more detail about the assembly step of shotgun sequencing (hint - paired and long)
What drawback prevents complete chromosome sequencing? (hint - not cloned)
How is this addressed?
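Reads are obtained from both ends of each DNA fragment (paired-end reads), so the approximate distance and orientation between the two reads are known
Reads are computationally assembled into long contiguous sequences (contigs), and the paired reads help to order and orient these contigs
There are likely to be gaps due to regions of the genome which could not be cloned, or due to repetitive sequences which could not be resolved during assembly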
A “finishing” step is needed to manually close gaps in the alignment through the use of molecular biology methods such as PCR, Southern blots and sequencing
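The assembly step can be sketched with a greedy overlap-merge approach: repeatedly join the pair of sequences with the longest suffix/prefix overlap. This is an illustrative simplification, not the assembler used in the H. influenzae project; it assumes error-free reads from one strand and ignores paired-end information. `greedy_assemble` and `min_overlap` are hypothetical names.

```python
from itertools import permutations


def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge reads into contigs by repeatedly joining the best-overlapping pair."""
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))  # remove exact duplicates, keep order
    while True:
        best = (0, None, None)
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_overlap)
            if olen > best[0]:
                best = (olen, a, b)
        olen, a, b = best
        if olen == 0:
            # No overlaps left: the remaining contigs are separated by gaps.
            return contigs
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[olen:])


reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAA']
```

A read that overlaps nothing stays as its own contig, which is how gaps appear; repeats longer than a read cannot be placed uniquely and would be collapsed by this greedy strategy.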
How is the chance of errors reduced in shotgun sequencing?
Enough sequence data is obtained to cover the genome several times over, so each base pair in the genome is sequenced multiple times
- This reduces errors in the assembly, as a random error in one read is unlikely to be present in the other reads covering the same region
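The benefit of redundant coverage can be sketched in two parts: average coverage computed as c = N × L / G (number of reads × read length / genome size), and a per-position majority vote in which an error present in one read is outvoted by the others. The function names and the simple majority rule are illustrative assumptions; the reads are assumed to be already placed at known offsets, and real consensus callers also weigh base quality.

```python
from collections import Counter


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: c = N * L / G."""
    return n_reads * read_length / genome_size


def consensus(aligned_reads: list[tuple[int, str]], genome_size: int) -> str:
    """Majority-vote consensus; each read is (start_position, sequence).

    Positions that no read covers are reported as 'N' (a gap).
    """
    columns = [Counter() for _ in range(genome_size)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Three overlapping reads of "ACGTACGT"; the second carries an error (T->G at position 3).
reads = [(0, "ACGTAC"), (1, "CGGACG"), (2, "GTACGT")]
print(mean_coverage(len(reads), 6, 8))  # 2.25
print(consensus(reads, 8))              # ACGTACGT
```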
How did we discover an essential minimal gene set for life? (hint - H. influenzae and M. genitalium)
Comparison of H. influenzae and M. genitalium showed ~240 genes conserved between the two
Further analysis, accounting for essential functions performed by non-orthologous genes, put the estimate closer to 256 genes
What is non-orthologous replacement?
When the same step in an essential pathway is performed by unrelated (non-homologous) proteins in different species, so the function is conserved even though the gene is not
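A toy comparison shows why counting only shared orthologs can miss essential functions carried out by non-homologous genes. The orthologous-group IDs and function labels below are invented placeholders, not real annotations, and the sketch assumes a shared orthologous group performs the same function in both species.

```python
def compare_gene_sets(genome_a: dict[str, str],
                      genome_b: dict[str, str]) -> tuple[set[str], set[str]]:
    """Each genome maps orthologous-group ID -> function it performs.

    Returns (shared orthologous groups, functions shared only via
    non-orthologous genes).
    """
    shared_groups = set(genome_a) & set(genome_b)
    shared_functions = set(genome_a.values()) & set(genome_b.values())
    functions_from_orthologs = {genome_a[g] for g in shared_groups}
    displaced = shared_functions - functions_from_orthologs
    return shared_groups, displaced


# Hypothetical example: both species need "step_3" but use unrelated genes for it.
species_1 = {"og1": "step_1", "og2": "step_2", "og3": "step_3"}
species_2 = {"og1": "step_1", "og2": "step_2", "og9": "step_3"}
groups, displaced = compare_gene_sets(species_1, species_2)
print(sorted(groups))     # ['og1', 'og2']
print(sorted(displaced))  # ['step_3']
```

A minimal gene set built from shared orthologs alone would omit "step_3", even though both species require that function.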
What are FUN genes and how did E. coli sequencing play a role in their discovery?
Function UnknowN (FUN) genes
Whole-genome sequencing (WGS) of E. coli K-12 identified a large number of ORFs (~20% of the genome) which had no known function and no similarity to previously characterised sequences
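Identifying ORFs starts with scanning for start and stop codons. The sketch below scans the three forward reading frames for ATG-to-stop stretches above a minimum length; real gene finders also scan the reverse strand and use statistical models, so `find_orfs` and its `min_codons` threshold are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ORFs on the forward strand.

    An ORF runs from an ATG to the next in-frame stop codon; `end` is
    exclusive and includes the stop. `min_codons` counts codons from the
    ATG up to (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 2, 14)]
```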
When has sequencing helped to identify how a pathogen evades the immune response? (hint - Campylobacter jejuni)
With Campylobacter jejuni
Sequencing revealed a large number of hypervariable sequences (homopolymeric tracts)
These regions can change length during DNA replication through slipped-strand mispairing, resulting in changes in the sequence or expression of some genes (phase variation), often ones involved in the synthesis of surface structures
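Why a one-base change in a homopolymeric tract can switch a gene on or off follows from the triplet code: gaining or losing a base shifts the downstream reading frame. The sketch uses an invented gene fragment (not a real C. jejuni sequence) with a poly-G tract, and treats the gene as "on" if no in-frame stop codon follows the start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gene_is_on(seq: str) -> bool:
    """True if no in-frame stop codon occurs after the ATG at position 0.

    Only complete codons are checked; a trailing partial codon is ignored.
    """
    for i in range(3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


def variant(tract_length: int) -> str:
    """Hypothetical fragment: start codon, poly-G tract, downstream sequence."""
    return "ATG" + "G" * tract_length + "CTAACTGAC"


for n in range(8, 13):
    print(n, "ON" if gene_is_on(variant(n)) else "OFF")
# 8 OFF, 9 ON, 10 OFF, 11 OFF, 12 ON
```

In this toy fragment, only tract lengths that keep the downstream sequence in its original frame leave the gene switched on.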
What did sequencing of M. tuberculosis provide insights into? (hint - antigenic)
What has availability of the reference genome sequence enabled? (hint - diversity)
Which vaccine has been characterised, and how has this sequencing information helped? (hint - vaccine)
Genes potentially associated with antigenic variation and pathogenicity, and the large portion of the genome devoted to lipid metabolism
Enabled characterisation of the global genetic diversity of M. tuberculosis strains and provided insights into the development of antimicrobial resistance
It has also allowed characterisation of the BCG vaccine, an attenuated derivative of a closely related organism (Mycobacterium bovis) which has a genome 99.95% identical to M. tuberculosis, but with a reduced genome size due to a series of deletions
What was discovered to be the cause of some gaps in shotgun sequencing, where fragments are cloned in E. coli? (hint - toxic)
What has this helped develop? (hint - novel)
Genes/fragments could not be cloned because they encoded gene products which are toxic to E. coli
Such toxic genes have considerable potential for the development of biotechnological applications and novel antimicrobial therapies
What is a likely source for 20% of genes in E. coli? (hint - not randomly distributed)
Horizontal gene transfer
How do we identify likely regions of horizontal gene transfer?
Bacterial genomes are believed to evolve towards a characteristic GC content
Genes acquired through phage infection or recombination may not have the GC content that is typical of the genome as a whole
- Atypical GC content is a marker of horizontal gene transfer
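Scanning for atypical GC content can be sketched as a sliding-window calculation compared against the genome-wide average. The window size, step and deviation threshold are arbitrary illustrative choices, and `flag_atypical_windows` is a hypothetical helper; real analyses also use other compositional signals such as codon usage.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_windows(genome: str, window: int, step: int,
                          threshold: float) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for windows whose GC content differs from
    the whole-genome GC content by more than `threshold`."""
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - genome_gc) > threshold:
            flagged.append((start, gc))
    return flagged


# Toy genome: mixed background with an AT-rich insert (a stand-in for acquired DNA).
genome = "ATGC" * 10 + "AT" * 5 + "ATGC" * 10
print(flag_atypical_windows(genome, window=10, step=10, threshold=0.3))  # [(40, 0.0)]
```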
What is E. coli strain O157:H7 often associated with?
What type of E. coli is this and what does it produce? (hint - EHEC)
Pathogenic E. coli outbreaks
Enterohaemorrhagic E. coli (EHEC)
- Produces Shiga toxin and is associated with haemorrhagic colitis and haemolytic uraemic syndrome (HUS), which can lead to kidney failure and is sometimes fatal
What is often a driving factor leading to the sequencing of certain microbes?
Importance as a pathogen e.g. E. coli strain O157:H7
When sequencing O157:H7 EDL933 and K-12 we discovered regions called O-islands and K-islands. What are these?
Why was their discovery a surprise? (hint - expected similarity)
O-islands - Clusters of DNA present in the O157:H7 strain but absent from K-12
K-islands - Regions unique to E.coli K-12
It was expected that the O157 genome would simply be the K-12 genome plus additional genes associated with pathogenicity, but K-12 was also found to carry its own unique regions (K-islands)
Several of the O- and K-islands were located at the same position in the genome; why?
These positions are likely recombination hotspots in the genome, where horizontally transferred DNA is easily acquired
What type of E. coli is strain CFT073? (hint - uropath)
They are not harmful in intestines, but where do they become pathogenic and cause infections?
What does ExPEC mean?
Uropathogenic E. coli (UPEC)
Associated with urinary tract infections (UTIs)
Extraintestinal pathogenic E. coli (ExPEC)
Explain the genome ‘patchwork’ structure
Shared co-linear backbone interrupted by strain-specific islands
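The patchwork structure can be sketched by comparing two genomes as ordered gene lists: genes present in both form the backbone, and runs of genes found in only one strain are its islands. Gene names are invented placeholders, and the sketch assumes the shared genes are already co-linear; real comparisons work from whole-genome alignments rather than gene lists.

```python
def strain_islands(genome: list[str], other: list[str]) -> list[list[str]]:
    """Group consecutive genes of `genome` that are absent from `other` into islands."""
    shared = set(other)
    islands, current = [], []
    for gene in genome:
        if gene in shared:
            # A backbone gene closes any island in progress.
            if current:
                islands.append(current)
                current = []
        else:
            current.append(gene)
    if current:
        islands.append(current)
    return islands


# Hypothetical gene orders for two strains sharing the backbone b1..b4.
strain_o = ["b1", "o1", "o2", "b2", "b3", "o3", "b4"]
strain_k = ["b1", "b2", "k1", "b3", "k2", "b4"]
print(strain_islands(strain_o, strain_k))  # [['o1', 'o2'], ['o3']]
print(strain_islands(strain_k, strain_o))  # [['k1'], ['k2']]
```

In this toy example, island o3 and island k2 sit at the same backbone position (between b3 and b4), the kind of pattern described for some O- and K-islands.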
What did the first large-scale comparison of bacterial genomes (8 strains of Streptococcus agalactiae) uncover? (hint - 3 types of genome)
Core genome – Total set of genes conserved across all strains of a species
Dispensable (accessory) genome – Non-core genes: present in some strains but absent from at least one other strain of the species
Pan-genome – Total non-redundant set of genes associated with any strain of a species
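With each strain represented as a set of gene families, the three definitions above reduce to set operations. Strain and gene names are invented placeholders, and the sketch assumes genes have already been clustered into families across strains.

```python
def pangenome_partition(
        strains: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Split gene families into core, accessory (dispensable) and pan-genome sets."""
    gene_sets = list(strains.values())
    pan = set().union(*gene_sets)         # every gene seen in any strain
    core = set.intersection(*gene_sets)   # genes present in all strains
    accessory = pan - core                # missing from at least one strain
    return core, accessory, pan


strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g6"},
}
core, accessory, pan = pangenome_partition(strains)
print(sorted(core))       # ['g1', 'g2']
print(sorted(accessory))  # ['g3', 'g4', 'g5', 'g6']
print(len(pan))           # 6
```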
How do we estimate the size of the core genome?
Randomise the order in which the sequenced genomes are added, and look at how the size of the core genome decreases as each additional genome is added to the analysis
This is repeated many times, and the median size of the core genome is calculated at each step as each additional genome is added
By adding a trendline we can estimate when the size of the core genome would plateau (i.e. the point at which the trend line would be horizontal)
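The resampling procedure can be sketched as follows: shuffle the genome order many times, record the core-genome size after each genome is added, and take the median at each step. The gene sets, number of permutations and seed are illustrative assumptions, and fitting the trend line to estimate the plateau is left out of this sketch.

```python
import random
from statistics import median


def core_size_curve(genomes: list[set[str]], n_permutations: int = 100,
                    seed: int = 0) -> list[float]:
    """Median core-genome size after adding 1, 2, ..., n genomes in random order."""
    rng = random.Random(seed)
    sizes_at_step = [[] for _ in genomes]
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        core = set(order[0])  # copy, so the input sets are not modified
        sizes_at_step[0].append(len(core))
        for step, genome in enumerate(order[1:], start=1):
            core &= genome  # a gene stays core only if every genome so far has it
            sizes_at_step[step].append(len(core))
    return [median(sizes) for sizes in sizes_at_step]


# Hypothetical gene sets for four genomes.
genomes = [
    {"g1", "g2", "g3", "g4", "g5"},
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3", "g6"},
    {"g1", "g2", "g7"},
]
print(core_size_curve(genomes))
```

The median curve can only stay level or fall as genomes are added; the value after the final genome is the core genome of the whole set.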