Chaudhuri Flashcards

Question

What are the clusters in Illumina flow cells and how are they distributed?

Answer 1

- each cluster derived from a single initial mol and corresponds to a separate read
- clusters distrib randomly on flow cell surface

Answer 2

- density of clusters determines total yield of reads, but if adj clusters are too close their seqs cannot be resolved (ie. want them as close as poss whilst still being able to resolve them)

Answer 3

- software improvements have allowed increased cluster density
- also technical improvements --> eg. higher res cameras in machines, so clusters can be smaller and closer together

Answer 4

- instead of a flat surface, flow cell covered w/ tiny nanowells
- primers for bridge amp only present w/in each well, so a single cluster is generated in each well from a single starting mol
- amp rapid, so well fills up, preventing other mols from entering
- know exact position of each well, so each cluster can be identified unambiguously
- cluster cannot spread outside its well, so no overlapping clusters, meaning clusters can be packed tightly

Answer 5

- Illumina HiSeq X Ten released in 2014
- new system targeted exclusively at seq human genomes
- machines cost ~$1 mil each and have to buy 10

Answer 6

- long-standing goal of genomics, as it is the point at which it becomes feasible to offer genome sequencing as a routine service in healthcare
- Illumina HiSeq X Ten can achieve this (~$1000 per genome), inc consumables, labour and depreciation

Answer 7

- HiSeq X Five and HiSeq 4000

Answer 8

- reads are limited in length as the quality of base calls reduces later in the read, resulting in more errors
- due to problems of phasing

Answer 9

- by chance a base may fail to incorporate into 1 of the strands in a cluster, so this strand lags behind by 1 base, giving a mixed signal
- this can happen repeatedly, and as it accumulates the signal becomes more mixed, so become less confident in the colour of the cluster

Answer 10

- early incorp of bases
- essentially the opp problem to phasing
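A toy model helps show why quality falls along a read. It assumes each strand in a cluster independently slips out of step (lagging = phasing, running ahead = prephasing) with a fixed small probability every cycle, and never comes back into step, so the in-phase fraction decays geometrically. The rates used are made up for illustration, not Illumina figures.

```python
def fraction_in_phase(cycles: int, p_lag: float, p_lead: float = 0.0) -> float:
    """Fraction of strands in a cluster still in sync after `cycles` cycles.

    Toy model (assumption): on every cycle each strand independently falls
    behind (phasing) with probability p_lag or runs ahead (prephasing) with
    probability p_lead; a strand that has slipped is never counted as in phase again.
    """
    p_ok = 1.0 - p_lag - p_lead
    return p_ok ** cycles


# Hypothetical rates: 0.2% lagging and 0.1% leading per cycle
for cycle in (50, 100, 150, 300):
    in_phase = fraction_in_phase(cycle, p_lag=0.002, p_lead=0.001)
    print(f"cycle {cycle}: {in_phase:.1%} of strands in phase")
```

Even per-cycle rates well under 1% leave a large share of the cluster out of step after a few hundred cycles, which is the growing mixed signal described on the phasing card.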

Answer 11

- often reads will be trimmed to remove low quality seq prior to analysis (quality gets v low after around 100bp)
- recent software improvements on Illumina MiSeq allowed dynamic correction of phasing problems, increasing read length to 300bp
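A minimal sketch of quality trimming, assuming Phred+33 quality strings (the usual FASTQ convention) and the simplest rule "chop the 3' end until the last base is at least Q20". The Q20 cut-off and the function names are illustrative assumptions; real trimming tools usually use more robust rules such as sliding windows.

```python
def phred33_to_quals(qual_string: str) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Drop bases from the 3' end until the last remaining base has quality >= min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


read = "ACGTACGT"
quals = phred33_to_quals("?????+&#")  # ? = Q30, + = Q10, & = Q5, # = Q2
print(trim_3prime(read, quals))
```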

Answer 12

- correction computationally intensive, so can only be used on a small scale

Answer 13

- 4 bases sequenced using only 2 colours, rather than 1 colour per base
- allows simpler optics in machine, therefore lower costs
- T = green, C = red, A = green and red, G = no colour
- used in NextSeq 500 and MiniSeq
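The 2-colour scheme can be decoded with a lookup table on which channels are "on". This sketch assumes intensities already normalised to 0-1 and an arbitrary 0.5 threshold (both hypothetical); real base callers are far more sophisticated.

```python
# Mapping from the card: T = green only, C = red only, A = both, G = neither (dark)
TWO_COLOUR = {
    (True, False): "T",
    (False, True): "C",
    (True, True): "A",
    (False, False): "G",
}


def call_base(green: float, red: float, threshold: float = 0.5) -> str:
    """Call one base from normalised green/red intensities (threshold is an assumption)."""
    return TWO_COLOUR[(green >= threshold, red >= threshold)]


def call_read(signals: list[tuple[float, float]]) -> str:
    """Call a read from per-cycle (green, red) intensity pairs."""
    return "".join(call_base(g, r) for g, r in signals)


# Hypothetical intensities for 4 cycles
print(call_read([(0.9, 0.1), (0.1, 0.8), (0.9, 0.9), (0.0, 0.1)]))
```

Because G is the dark state, a cluster that has stopped giving any signal looks the same as a G, which is why runs of G can appear at the ends of reads on 2-colour machines.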

Answer 14

- single molecule real time sequencing (SMRT)
- can gen v long reads (>10kb)
- apps inc finishing small genomes, microbial epigenetics, targeted seqs

Answer 15

- PacBio Sequel allowed more reads at lower cost
- now Sequel II released
- becoming more practical to get such long reads at a lower cost, mainly due to improvements in optics and chemistry

Answer 16

- already at theoretical max --> limit is length of DNA mol, w/ some reads of >2Mb reported

Answer 17

- highly portable MinIONs now commercially available, w/ a few early publications
- scale of sequencing has been improved by the release of the larger PromethION and GridION systems

Answer 18

- essentially lots of MinIONs together (25) --> potential to prod lots of high quality long reads

Answer 19

- 5 MinION flow cells at a time

Answer 20

- automated Oxford Nanopore library prep
- goal is to take any biological sample (eg. blood, bacterial culture) and deposit it straight on the machine, which will prep DNA suitable to pass straight onto the MinION sequencer

Answer 21

- only 2.3 x 10^6 bp had been seq, less than the size of most bacterial genomes
- largest genome sequenced was phage lambda, around 50,000 bp
- aim was to next seq bacterial genomes, and eventually humans

Answer 22

- E. coli K-12
- sequencing ordered clones based on a genetic map

Answer 23

- method was slow and laborious

Answer 24

- Venter adopted a shotgun sequencing approach and sequenced Haemophilus influenzae and Mycoplasma genitalium in 1995 (both have small genomes, <2Mb)

Answer 25

- computational assembly of seq from random clone libs

Answer 26

- take (usually circular) bacterial chromosome and use sonication/enzymatic methods to randomly shear DNA into small fragments
- size select, so all about same size
- clone indiv fragments into plasmid vectors
- pick colonies to create shotgun lib
- plasmid preps
- seq each insert w/ 2 primers (using Sanger), ie. from both ends
- assemble --> get large chunks of genome which are representative, but also gaps
- PCR over gap regions to fill them in (can be most time-consuming part)
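Why gaps remain after assembly can be estimated with the Lander-Waterman model: with N random reads of length L from a genome of size G, coverage c = NL/G and the expected fraction of the genome hit by no read is e^(-c). The sketch assumes reads land uniformly at random and ignores the minimum overlap needed to join reads, so it underestimates real gaps; the genome size, read length and read number are hypothetical.

```python
import math


def lander_waterman(genome_len: int, n_reads: int, read_len: int) -> dict[str, float]:
    """Expected coverage, uncovered fraction and no. of gaps for randomly placed reads
    (Lander-Waterman, ignoring the minimum overlap needed to join reads)."""
    coverage = n_reads * read_len / genome_len
    return {
        "coverage": coverage,
        "frac_uncovered": math.exp(-coverage),
        "expected_gaps": n_reads * math.exp(-coverage),
    }


# Hypothetical project: 2 Mb genome, 500 bp Sanger reads, 32,000 reads
result = lander_waterman(genome_len=2_000_000, n_reads=32_000, read_len=500)
print(result)
```

At 8x coverage the model still predicts about 11 gaps. Real projects also had gaps from sequences that could not be cloned at all, such as those toxic to E. coli.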

Answer 27

- reinvestigated Sanger seq data of bacterial genomes
- tested hypothesis that assembly gaps correspond to seqs toxic to E. coli (so cannot be cloned)
- identified many genes toxic to E. coli
- found novel toxins and restriction enzs, and new classes of small noncoding RNAs that reproducibly inhibit E. coli growth
- suggests new modes of antimicrobial intervention

Answer 28

- from convalescent diphtheria patient in 1922

Answer 29

- standard E. coli for lab studies, as grows rapidly

Answer 30

- may have been acquired by horizontal transfer

Answer 31

- paper found 755 candidate horizontally transferred genes
- at least 234 horizontal transfer events since diverged from Salmonella
- these genes tend to cluster together, as can be acquired together

Answer 32

- normal component of gut flora of humans and animals
- also a wide range of pathogenic strains

Answer 33

- E. coli O157:H7
- emergent human pathogen assoc w/ haemorrhagic colitis and haemolytic uraemic syndrome (HUS), which can lead to kidney failure and can sometimes be fatal

Answer 34

- genome approx 5.5Mb, around 1Mb bigger than K-12

Answer 35

- presence of O-islands and K-islands
- O-islands are regions of O157:H7 which do not have comparable seq in K-12, so likely derived from horizontal transfer
- also presence of K-islands (K-12 seqs absent from O157:H7), which was surprising

Answer 36

- CFT073
- strain of uropathogenic E. coli (UPEC)
- eg. of extraintestinal pathogenic E. coli (ExPEC), assoc w/ UTIs
- ExPEC can be harmless when in intestines but become pathogens when they invade urinary tract, blood or CSF
- UPEC strains responsible for 70-90% of 7 mil cases of acute cystitis and 250,000 cases of pyelonephritis reported annually in US

Answer 37

- similar in size
- 3 way comparison found extra seqs diff from those in O157:H7

Answer 38

- carried out by taking 1 strain at a time and identifying seqs present in all strains looked at so far
- around 2200 genes
- DIAG

Answer 39

- look at no. of unique genes: on av find approx 300 new genes in each additional strain (1st strain seq all unique, then fewer in 2nd etc., and this continues until get to around 300 new genes per strain)
- so effectively infinite in size (open pan-genome): no matter how many strains seq, will continue to find new genes
- DIAG
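The core- and pan-genome curves on the last two cards can be sketched with set operations on gene content. The gene names and strains are a hypothetical toy example; real analyses first cluster genes into families and usually average the curves over many random strain orderings, since the result depends on the order strains are added.

```python
def core_genome_sizes(strains: list[set[str]]) -> list[int]:
    """Size of the core genome (genes shared by ALL strains so far) after adding each strain."""
    core: set[str] = set(strains[0])
    sizes = [len(core)]
    for genes in strains[1:]:
        core &= genes
        sizes.append(len(core))
    return sizes


def new_genes_per_strain(strains: list[set[str]]) -> list[int]:
    """No. of genes in each strain not seen in any earlier strain (pan-genome growth)."""
    pan: set[str] = set()
    new_counts = []
    for genes in strains:
        new_counts.append(len(genes - pan))
        pan |= genes
    return new_counts


# Toy gene content for three hypothetical strains
strains = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "f", "g"},
]
print(core_genome_sizes(strains))
print(new_genes_per_strain(strains))
```

The core curve can only shrink as strains are added, while the new-gene count for an open pan-genome never drops to zero.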

Answer 40

- short read Illumina seq has dramatically increased no. of available bacterial genomes
- however, finishing (filling in gaps) and annotation remain laborious processes, so most genomes are left as drafts

Answer 41

- repeats, eg. IS elements (transposable elements), rRNA operons

Answer 42

- contigs can be reordered relative to a complete reference genome, assuming genome structure is conserved
- automated annotation pipelines, eg. Prokka, are increasingly used

Answer 43

- complete genomes are increasing
- but draft genomes increasing much more rapidly

Answer 44

- due to increased long read sequencing techs
- may soon be poss in automated manner, eg. Oxford Nanopore MinION and PacBio Sequel

Answer 45

- GWAS to find causal genetic factors underlying important phenotypes
- eg. identified vitamin B5 biosynthesis as a key host specificity factor in Campylobacter (common cause of gastroenteritis) t/ GWAS --> always present in cattle-derived strains, but not in chicken-derived strains (may be due to diffs in adaptation to host diet)

Answer 46

- on basis of motility, metabolic profile and clinical manifestation

Answer 47

- E. coli = motile
- Shigella = non-motile

Answer 48

- E. coli = usually commensal
- Shigella = obligate pathogens

Answer 49

- demonstrated that the O-antigen, H (flagellar) antigen and the K (capsular) antigen are useful for distinguishing between strains

Answer 50

- typing based on immune recognition of cell surface antigens
- bacteria of same serotype cross-react to the same Abs
- for E. coli the O, H (and sometimes K) antigens are used for serotyping

Answer 51

- Milkman (1973) began quantitative study of E. coli pop genetics by measuring electrophoretic mobility of enzs derived from diff E. coli strains (MLEE)

Answer 52

- involves assessing the electrophoretic mobility of a series of purified enzs
- produces quantitative mol data which can be used to understand the evolutionary relationships between strains

Answer 53

- no, genetically similar strains can have diff serotypes and distantly related strains can share the same serotype

Answer 54

- set of phylogenetically diverse E. coli strains chosen based on MLEE data
- selected to represent full diversity of species, but doesn't fully represent pathotypes, as mostly commensal
- 5 phylogenetic groups: A, B1, B2, D, E

Answer 55

- Shigella has arisen on multiple occasions from E. coli, each time independently acquiring some of the same specific biochemical properties = convergent evolution

Answer 56

- indiv strains can pick up a copy of a gene from a diff strain (recombination), so that gene will have a diff evolutionary history to the rest of the genome
- looked at diff genes in diff strains to see how related they were
- eg. K12 and RM191F were closely related for most genes, but quite divergent for another gene, suggesting recombination has occurred in this gene

Answer 57

- an alt to MLEE
- involves amplification and sequencing of regions of around 400bp (convenient amount to PCR amplify) of 7/8 housekeeping genes distrib around the genome
- housekeeping genes chosen as thought less likely to be affected by recomb, as so important (strong purifying selection)
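MLST boils down to turning sequences into allele numbers and allele profiles into a sequence type (ST). The locus names are those of the Achtman E. coli scheme, but every allele and ST number here is invented for illustration. Any new sequence at a locus counts as one new allele, however many SNPs it carries, so a single recombination event importing many differences changes only one number in the profile.

```python
from typing import Optional, Sequence

# Loci of the Achtman E. coli MLST scheme; allele and ST numbers below are INVENTED
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical allele profile (one allele no. per locus, in LOCI order) -> ST
PROFILES: dict[tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 100,
    (1, 2, 1, 1, 1, 1, 1): 101,
    (5, 7, 3, 2, 9, 4, 6): 200,
}


def allele_number(known: dict[str, int], seq: str) -> int:
    """Exact-match lookup of a locus sequence; an unseen sequence gets the next number."""
    if seq not in known:
        known[seq] = max(known.values(), default=0) + 1
    return known[seq]


def sequence_type(profile: Sequence[int]) -> Optional[int]:
    """ST for a full 7-locus profile, or None if the profile is not in the table."""
    return PROFILES.get(tuple(profile))


def loci_differing(p1: Sequence[int], p2: Sequence[int]) -> int:
    """No. of loci at which two profiles carry different alleles."""
    return sum(a != b for a, b in zip(p1, p2))


# 1 SNP or many: either way the fragment just becomes the next allele number
adk_alleles = {"ACGTACGT": 1}
print(allele_number(adk_alleles, "ACGTACGA"))
print(allele_number(adk_alleles, "TTTTTTTT"))
print(sequence_type((1, 2, 1, 1, 1, 1, 1)))
print(loci_differing((1, 1, 1, 1, 1, 1, 1), (1, 2, 1, 1, 1, 1, 1)))
```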

Answer 58

- carried out MLST
- chose pathogenic and nonpathogenic strains
- EHEC is pathogenic and contains a toxin; 2 strains have acquired this separately
- EPEC are similar but cause less serious disease; also have 2 indep groups, which have acquired the genes necessary for virulence on 2 sep occasions

Answer 59

- now can take core genome and understand relationships between bacteria
- but bacteria do sometimes recombine, so can be diffs in phylogeny when look at indiv genes, but get correct phylogeny when take the core genome as a whole

Answer 60

- found in all bacteria
- has important function in cell, so tends to evolve slowly as constantly being used

Answer 61

- 9 V loops (variable regions V1-V9)
- flanked by conserved regions which don't change much, so can design primers to them spanning one of the variable regions, amplify up the variable region and seq it
- V loops seem to have the right amount of diversity: can use them to look at diffs, but can still align them

Answer 62

- universal primers used to amp V regions of 16S rRNA gene, which are variable
- seq using Sanger, Illumina or PacBio
- seqs compared w/ rRNA databases such as Greengenes to identify taxa

Answer 63

- primers not completely universal, so may not work in some organisms
- contamination: amplify up contaminants' 16S genes instead; even low levels are a problem as amplified
- sequencing errors can result in overestimation of diversity of organisms present
- some organisms have multiple copies of 16S rRNA gene which vary in seq, can result in overestimation of no. of taxa present
- PCR bias may result in incorrect quantification of species

Answer 64

- showed archaea are a separate domain (previously thought to be part of bacteria) and actually more closely related to euks

Answer 65

- most of what's known about bacteria is from species that can be grown in the lab and studied
- up to 99% of bacteria cannot be cultured in the lab; these species referred to as ‘microbial dark matter’
- big problem in understanding microbial diversity

Answer 66

- metagenomics

Answer 67

- environmental whole genome shotgun sequencing
- seq all samples to get deep coverage
- identified 1.2 mil previously unknown genes and found many novel species

Answer 68

- revealed links between bacteria and health conditions that weren’t thought to be assoc w/ bacteria, as there was a diff in microbial diversity between people w/ and w/o the disease → inc cardiovascular disease, obesity, IBD

Answer 69

- DNA extracted from env sample

Answer 70

- DNA extracted, fragmented and sequenced, eg. on Illumina
- deep sequencing req, as lots of organisms present and otherwise will only see those present at high abundance
- indiv seq reads can be identified using software such as Kraken
- alternatively, de novo assembly can be used to piece together larger contigs
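A much-simplified sketch of k-mer based read classification. Kraken itself assigns each k-mer to the lowest common ancestor of the genomes containing it and scores paths through a taxonomy; this sketch swaps that for a flat majority vote over k-mers found in only one reference. The reference sequences, taxon names and k = 4 are toy assumptions (the original Kraken uses 31-mers).

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int):
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(refs: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the set of reference taxa that contain it."""
    index = defaultdict(set)
    for taxon, genome in refs.items():
        for km in kmers(genome, k):
            index[km].add(taxon)
    return dict(index)


def classify(read: str, index: dict[str, set[str]], k: int) -> str:
    """Vote using only k-mers found in exactly one reference; no hits -> unclassified."""
    votes = Counter()
    for km in kmers(read, k):
        taxa = index.get(km, set())
        if len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    return votes.most_common(1)[0][0] if votes else "unclassified"


# Toy references (hypothetical sequences and taxon names)
refs = {"taxonX": "ACGTACGGTCA", "taxonY": "TTGCAATGCCA"}
index = build_index(refs, k=4)
for read in ("CGTACGG", "GCAATG", "GGGGGG"):
    print(read, "->", classify(read, index, k=4))
```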

Answer 71

- assembly is complex, computationally intensive and error prone
- helped by long read techs, eg. PacBio and Oxford Nanopore --> Oxford Nanopore PromethION prod enough data to make long read metagenomics poss

Answer 72

- seq blank samples w/o any original DNA
- saw lots of species, due to reagents and kits used to extract DNA, ie. a base level of contaminating microbial DNA
- known as the ‘kit-ome’

Answer 73

- reported evidence for anthrax and bubonic plague
- but several later papers debunked this: the genes which cause plague or anthrax were not present
- original paper seq samples and looked in database for the most closely related organisms, without checking for the genes assoc w/ these diseases

Answer 74

- indiv cells iso by eg. laser microdissection, micropipetting, optical tweezers, cell sorting (FACS)
- the single copy of the genome is PCR amp, seq and assembled

Answer 75

- amp is challenging and assembled genomes will often have patchy coverage

Answer 76

- microfluidic tech
- samples diluted so on av 1 bacterial cell ends up in each chamber in the iChip
- chambers filled w/ molten agar and covered w/ semi-permeable mem
- iChip placed back in native env
- allows colony to grow from single cell, which provides enough material for DNA seq
- method to culture w/in native env, and can still carry out sequencing
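"On av 1 cell per chamber" still leaves many chambers empty or holding more than one cell. Assuming cells land in chambers independently, the number per chamber follows a Poisson distribution, sketched below; the dilutions tried are illustrative.

```python
import math


def chamber_occupancy(mean_cells: float) -> tuple[float, float, float]:
    """P(empty), P(exactly 1 cell), P(>1 cell) per chamber, assuming cells
    land in chambers independently so counts follow a Poisson distribution."""
    p_empty = math.exp(-mean_cells)
    p_single = mean_cells * math.exp(-mean_cells)
    return p_empty, p_single, 1.0 - p_empty - p_single


for lam in (1.0, 0.3):
    empty, single, multi = chamber_occupancy(lam)
    print(f"mean {lam}: empty {empty:.1%}, single {single:.1%}, >1 cell {multi:.1%}")
```

At a mean of 1 cell, about 37% of chambers are empty, 37% hold exactly one cell and 26% hold more than one; diluting further cuts mixed chambers at the cost of more empty ones.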

Answer 77

- claimed a new antibiotic that kills pathogens w/o detectable resistance = teixobactin
- v effective against gram +ve bacteria

Answer 78

- EPEC (enteropathogenic E. coli)

Answer 79

- encodes type III secretion system which it uses to inject host cells w/ effector molecules, allowing it to attach to the gut wall
- subverts host cellular machinery

Answer 80

- similar to EPECs, but also encode a Shiga toxin, which is assoc w/ the most severe disease (HUS)

Answer 81

- aggregate in a characteristic “stacked-brick” conformation to aid survival in gut
- assoc w/ mild self-limiting diarrhoea

Answer 82

- usually affects children and elderly (weaker immune system), but this affected otherwise healthy adults, in particular young women, and was unusually severe

Answer 83

- tracked diets of infected t/ filling in forms
- cucumbers 1st implicated --> salads often contaminated
- spread beyond Germany --> t/ people travelling, export etc.
- claimed it was Spanish cucumbers --> had huge economic impact, as Spanish cucumber crops weren’t bought
- but no evidence of the outbreak strain in Spanish cucumbers
- eventually narrowed down to bean sprouts

Answer 84

- Ion Torrent
- at the time Illumina took 2 weeks and this was o/n
- published data from Ion Torrent

Answer 85

- race to study indiv sequence reads and piece them together to make complete seq of lethal E. coli and find out why this outbreak was so severe
- crowdsourced analysis --> v quickly got genome seq data (the 1st time it was poss to do this during an outbreak)

Answer 86

- phylogenetic analysis found the outbreak strain is EAEC (not EHEC) --> core genome analysis showed it is almost identical to the complete genome seq of an existing EAEC strain

Answer 87

- PacBio seq of O104:H4 strain
- comp seq of TY2482 w/ 55989

Answer 88

- high incidence in adults
- greatly increased incidence of haemolytic uraemic syndrome (HUS) relative to other EHEC outbreaks (25% vs. 15%)
- particularly high prevalence in women, inc most of the HUS cases
- strains iso from cases exhibited a rare serotype (O104:H4)
- recognition of outbreak strains was hampered by inapprop use of diagnostic tests focused on O157:H7
- characterisation of the causative agent t/ whole genome sequencing was performed whilst outbreak still in progress

Answer 89

- rapid genome seq w/ eg. MinION

Answer 90

- much smaller, despite prod similar data
- v rapid
- rapid draft sequencing carried out for hospital outbreak of Salmonella --> gen data in less than half a day and could thus distinguish outbreak from sporadic cases

Answer 91

- sequencing was needed in the field but limited infrastructure in many of the outbreak locations
- so MinIONs were used (also needed PCR machines, reagents etc.)

Answer 92

- started from 1 initial virus but diverged into diff lineages as spread --> quickly gens mutations t/ error-prone rep
- so useful to know which particular lineage people had, and therefore where they got it from, so could find others at risk and quarantine them to minimise spread

Answer 93

- intermittent electricity supply, so have to back everything up w/ UPS (uninterruptible power supply) to stop sequencing run failing
- poor internet connection

Answer 94

- can see spread and how virus evolved
- also map of genome and see how diff regions vary over the course of the outbreak

Answer 95

- to seq DNA in space

Answer 96

- plugs into phone, input sample and can seq it

Answer 97

- mainly to do w/ how extract DNA, as need to keep v long fragments intact so they can be sequenced
- dev of protocols to extract v high mol weight DNA

Answer 98

- get periodic selection
- allows neutral variant to be fixed w/in pop by ‘hitch-hiking’ w/ beneficial mutation
- forces pop t/ bottleneck and reduces diversity at that locus

Answer 99

- seq 11 related EAEC strains for comparative analysis
- av read length of 2700 bp, w/ some much longer
- combined w/ circular consensus sequencing (CCS) to get 99.9% accuracy
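A simplified picture of how circular consensus turns noisy passes into an accurate read: take several passes over the same insert and vote at each position. The sketch assumes passes are already aligned and of equal length (real PacBio errors are mostly insertions/deletions, so real CCS must align first) and that pass errors are independent; the 10% per-pass error rate is hypothetical.

```python
from collections import Counter
from math import comb


def majority_consensus(passes: list[str]) -> str:
    """Per-position majority vote over passes that are ALREADY aligned and equal length."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def consensus_error(p: float, n_passes: int) -> float:
    """Chance a majority vote over n (odd) independent passes is wrong, pessimistically
    assuming every erroneous pass shows the SAME wrong base."""
    k_min = n_passes // 2 + 1
    return sum(comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
               for k in range(k_min, n_passes + 1))


print(majority_consensus(["ACGT", "ACGA", "TCGT"]))
for n in (1, 3, 5, 9):
    print(n, "passes:", consensus_error(0.1, n))
```

Each extra pair of passes cuts the consensus error sharply, which is how many noisy passes can add up to ~99.9% accuracy.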

Answer 100

- initially investigated w/ Illumina MiSeq, then w/ MinION
- both yielded reliable and actionable clinical info in less than half a day
- for Illumina used new draft sequencing protocol to reduce time, inc reducing read length and cycle time --> enough coverage to conclude all part of same outbreak
- w/ MinION could unambiguously identify strain in under 30 mins

Answer 101

- can gen genomic data directly from diagnostic patient samples

Answer 102

- seq DNA w/ semiconductor chip w/ millions of wells
- DNA cut into millions of fragments
- each fragment attaches to a bead, and is copied until it covers the bead
- beads washed over chip and deposited into wells
- chip flooded w/ 1 nt at a time: if incorp, H+ released, altering pH, which can be detected
- this is repeated w/ diff nts to get seq
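The detection step can be sketched as decoding a "flowgram": each flow of a single nucleotide gives a signal roughly proportional to the number of bases incorporated, so a homopolymer run gives a bigger signal. The repeating flow order TACG and the simple rounding rule are assumptions for illustration.

```python
def flowgram_to_seq(signals: list[float], flow_order: str = "TACG") -> str:
    """Decode per-flow signals into a sequence.

    Assumes the chip is flooded with one nucleotide per flow, cycling through
    flow_order, and that each signal is roughly proportional to the number of
    bases incorporated in that flow (0 = no incorporation, 2 = a 2-base homopolymer).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * max(0, round(signal)))
    return "".join(bases)


# Hypothetical noisy signals for flows T, A, C, G, T, A, C, G
print(flowgram_to_seq([1.02, 0.05, 2.1, 0.9, 0.1, 2.95, 0.0, 1.1]))
```

Rounding gets less reliable for long homopolymers, where the step between n and n+1 bases is a small fraction of the total signal, which is why homopolymer length errors are characteristic of this method.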

Answer 103

- utilises power of DNA pol
- diff fluorescent label attached to each base on terminal phosphate; label detected in zero mode waveguide (ZMW) chamber while nt held by pol
- pol then cleaves label as part of rep process, leaving a natural DNA strand

Answer 104

- uses protein nanopore inserted into synthetic membrane
- current applied so only flows through aperture of nanopore
- uses a strand sequencing method: ssDNA pulled t/ aperture, causing a characteristic disruption in current dep on bases in pore


(130 cards)