18. Genome-wide association studies and how we can use them to better understand bacteria Flashcards
(53 cards)
What is a GWAS?
- Genome-wide association study.
- Studies the entire genome of a large group of individuals.
- Searching for small variations in the population (SNPs)
What is a GWAS used for?
- To identify genetic variants associated with a specific trait.
- This is done by comparing the genomes of people with and without the trait.
- These genetic variants can include mutations, insertions, deletions, modifications.
- The variants are present at higher frequency in cases than the controls.
What sequencing technology does GWAS use?
- GWAS used to rely on DNA microarray analysis.
- Now GWAS uses whole genome sequencing.
What are the variants identified through GWAS linked to?
- Variants are linked to a disease
- Or they are in the same haplotype as a variant associated with a disease.
- These are present at higher frequency in the cases than the controls.
What is a haplotype?
A group of polymorphisms inherited together.
How is statistical analysis used in GWAS?
- It is used to determine how likely it is that a variant is associated with a trait.
- The P-value indicates the significance of the difference in frequency of the tested allele between cases and controls.
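As a minimal sketch of the case/control comparison described above, the snippet below runs a Pearson chi-square test on a 2x2 table of allele counts. The function name `allelic_chi_square` and the counts are hypothetical, and a real GWAS would normally use a dedicated tool and correct for multiple testing and population structure.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). Inputs are allele counts, not numbers of people.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: the variant allele is more frequent in cases than in controls.
chi2, p = allelic_chi_square(case_alt=300, case_ref=700, control_alt=200, control_ref=800)
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

A small P value here only says the allele frequencies differ between the two groups; it does not show that the variant is causal.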
What is the output of a GWAS?
- A Manhattan plot
- This plots -log10 P values against position in the genome.
What did one of the first successful GWASs show?
- It identified a genetic basis of age-related macular degeneration.
- It found a tyrosine to histidine change at position 402 in the complement factor H gene.
- This factor H variant has weaker affinity for oxidised lipids, so less C3b is converted to iC3b.
- This causes constant background complement activation.
- This causes retinal epithelial cell damage.
How can we use GWAS to investigate bacterial virulence, antibiotic resistance and predict outcomes of infectious disease?
- GWAS requires large collections of genome sequenced bacterial isolates.
- We can use these to interrogate bacterial phenotypes.
- These phenotypes could relate to pathogenicity or to antimicrobial resistance.
Is there enough whole genome bacterial data to use in GWAS?
- There has been a massive increase in sequencing due to advances in technology and reducing costs.
- The amount of sequence data is doubling every 18 months.
- There is a lot of sequence data available, but only 20 bacterial species make up 90% of it. The other 10% is spread across hundreds of species.
- So most GWAS are restricted to these pathogens as they are the ones with enough data for a GWAS.
- These 20 are all human pathogens.
- The data is still skewed, as some pathogens like E. coli have a lot more data than N. gonorrhoeae.
What are the prerequisites for a successful bacterial GWAS?
- The trait you are looking at needs to have a testable phenotype like toxicity, virulence or AMR.
- A collection of whole-genome-sequenced bacterial isolates. The more closely related the strains are, the easier it is to identify genetic mutations associated with the phenotype.
- An understanding of the genetic variation and population structure in the bacterial strains.
What bacterial phenotypes can be tested for in GWAS?
- A continuously varying quantitative phenotype or a binary phenotype.
- Whatever the phenotype, there needs to be a good high-throughput assay and enough isolates to test.
- GWAS can also estimate the effect size of a variant.
What is the effect size of a variant in GWAS?
- It is a measure of how strongly the variant correlates with the phenotype.
- Mutations/variants in a key regulator could account for the entire effect seen.
- Some mutations in other genes could have a more modest effect on the phenotype.
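For a binary phenotype, one common way to express effect size is the odds ratio. The sketch below is illustrative only: the function name `odds_ratio` and the isolate counts are hypothetical, and a quantitative phenotype would more usually be summarised by a regression slope.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds of carrying the variant in cases divided by the odds in controls."""
    return (case_with * control_without) / (case_without * control_with)

# Hypothetical isolate counts for two variants tested against the same binary phenotype.
regulator_variant = odds_ratio(45, 5, 5, 45)   # variant in a key regulator: large effect
modest_variant = odds_ratio(30, 20, 20, 30)    # variant in another gene: modest effect
print(regulator_variant, modest_variant)       # 81.0 2.25
```

An odds ratio of 1 means no association; the further it is from 1, the larger the effect.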
What is linkage disequilibrium?
- In humans, genetic recombination and chromosome segregation shuffle alleles, but a newly occurring mutation starts out linked to its neighbouring alleles as part of a haplotype.
- This linkage lasts until recombination breaks it.
- Linkage disequilibrium is the extent to which any 2 alleles within a population are contained on the same ancestral haplotype block of DNA.
- Mixing alleles between different genetic backgrounds is important for distinguishing causal loci from linked mutations.
- Linkage disequilibrium has a stronger effect on bacteria due to asexual reproduction and the clonal nature of the population.
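Linkage disequilibrium between two biallelic sites is often summarised as D and r squared. The sketch below computes both from haplotype counts; the function name and the counts are hypothetical, chosen to contrast a clonal population with a freely mixing one.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic sites from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / total      # frequency of allele B at site 2
    d = p_AB - p_A * p_B             # deviation from independent assortment
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared

# Clonal-style population: the two alleles always occur together (complete LD).
print(linkage_disequilibrium(50, 0, 0, 50))    # (0.25, 1.0)
# Freely mixing population: alleles are combined at random (no LD).
print(linkage_disequilibrium(25, 25, 25, 25))  # (0.0, 0.0)
```

When r squared is close to 1, the two sites carry almost the same information, so an association test cannot tell which of them is causal.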
What are homoplasious mutations?
- Mutations that occur repeatedly at the same site.
- Bacterial strains could share the same mutation at a particular genomic location not through common ancestry but because the variant arose independently.
- This introduces variability into the population
How can homoplasious mutations be introduced into bacteria?
- Horizontal gene transfer.
- Recombination
- Recurrent mutations.
What are recurrent mutations?
- The same mutation arising independently in different lineages.
- These are usually due to a selection pressure, such as antibiotic exposure driving AMR.
Why is considering population structure important in bacterial GWASs?
- Bacterial populations are clonal.
- In the absence of recombination all fixed genetic variants will be passed onto descendants and they will be in linkage disequilibrium with other mutations in that lineage.
- This means linkage disequilibrium has a strong effect on bacterial populations.
- Separation of causative variants and passively linked loci is a problem in association studies.
What is linear mixed modelling?
- A bioinformatics approach to deal with the bacterial population structure issue.
- It can control for the effects of relatedness as it captures population structures more accurately.
- Helps identify mutations associated with the phenotype and not the mutations from linkage disequilibrium.
- It pinpoints locus-specific effects where possible and identifies lineage-level differences.
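One common way to write a linear mixed model for association testing is shown below. This is a generic formulation rather than the one used by any specific tool, and all symbols are introduced here for illustration: y is the phenotype vector, x the genotype at the tested variant with fixed effect beta, W and alpha any other covariates and their effects, and K a relatedness (kinship) matrix estimated from the genomes.

```latex
\begin{aligned}
y &= W\alpha + x\beta + u + \varepsilon, \\
u &\sim \mathcal{N}\left(0,\ \sigma_g^{2} K\right), \qquad
\varepsilon \sim \mathcal{N}\left(0,\ \sigma_e^{2} I\right)
\end{aligned}
```

The random effect u absorbs phenotype similarity that is explained by shared ancestry, so a test of beta = 0 asks whether the variant matters beyond what lineage alone predicts.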
What is VISA?
- Vancomycin-intermediate S. aureus.
- These are S. aureus with a raised MIC but not fully resistant.
- It is caused by changes in multiple genetic loci to make the cell wall thicker.
- Tricky to define
How have GWASs been used to find genetic associations with VISAs?
- A GWAS examined 49 VSSA (vancomycin-susceptible S. aureus) and 26 VISA isolates.
- The phenotype of vancomycin resistance was determined with microdilution tests, E-test and PAP-AUC.
- It found around 55,000 SNPs across the strains.
- Lots of the SNPs were fixed in linkage disequilibrium, so they don’t help with VISA associations.
- 1 SNP in rpoB was highly significantly associated with increased vancomycin MIC.
- This SNP was at codon 481 of rpoB (H481Y/L/N).
- It had previously been associated with vancomycin resistance in an independent study.
What is PAP-AUC?
- Population analysis profile, area under the curve.
- Plot the amount of bacteria surviving against increasing concentrations of vancomycin.
- Calculate the area under the curve to estimate the level of resistance (MIC).
- Bigger area = higher MIC
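The area calculation can be sketched with the trapezoidal rule. The function name, the vancomycin concentrations and the survival values below are all hypothetical, and it is an assumption here that survival is recorded as log10 CFU/ml.

```python
def area_under_curve(concentrations, log10_cfu):
    """Trapezoidal area under a population analysis profile."""
    points = list(zip(concentrations, log10_cfu))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # area of one trapezoid segment
    return area

# Hypothetical profiles: log10 CFU/ml surviving at each vancomycin concentration (mg/L).
vancomycin = [0, 1, 2, 4, 8]
susceptible = [8, 6, 2, 0, 0]
intermediate = [8, 8, 7, 5, 1]

print(area_under_curve(vancomycin, susceptible))   # 13.0
print(area_under_curve(vancomycin, intermediate))  # 39.5
```

The isolate that keeps growing at higher vancomycin concentrations has the larger area, which matches the "bigger area = higher MIC" rule of thumb.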
Why do Manhattan plots use -log10 P?
- -log10 P is plotted on the y axis and genome location on the x axis.
- It is used to display significant association in an easier way.
- You transform the P value using -log10 so that a larger value indicates a more significant association.
- For example, if the significance threshold following GWAS was P < 5 x 10^-8, the -log10(P) would be 7.3.
- This makes it easier to interpret the P value
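The transformation is a one-liner. The sketch below, with a hypothetical helper `neg_log10`, shows how very small P values become conveniently sized positive numbers.

```python
import math

def neg_log10(p_value):
    """Transform a P value so that smaller P values give larger plotted values."""
    return -math.log10(p_value)

print(round(neg_log10(0.05), 1))   # 1.3
print(round(neg_log10(5e-8), 1))   # 7.3

# 5 x 10^-8 corresponds to 0.05 Bonferroni-corrected for one million tests.
print(round(neg_log10(0.05 / 1_000_000), 1))  # 7.3
```

On the plot, each extra unit on the y axis means a P value ten times smaller.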
What is a quantile-quantile (QQ) plot?
- It compares the distribution of observed p values against expected p values distribution under the null hypothesis.
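- If the 2 distributions are similar the points should fall on the diagonal line.
- If a SNP lies above the significance line, its P value is below the chosen significance threshold (for example P<0.05 after correction for multiple testing).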
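The points of a QQ plot can be generated as in the sketch below. It assumes the expected P value for the i-th smallest of n tests is i / (n + 1), the mean of the i-th order statistic of a uniform distribution. The function name and the example P values are hypothetical.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10 P pairs, smallest P value first."""
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected_p = rank / (n + 1)   # expected P value under the null hypothesis
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points

# Hypothetical results: three unremarkable SNPs and one strongly associated SNP.
for expected, observed in qq_points([0.9, 0.5, 0.1, 1e-6]):
    print(f"expected {expected:.2f}  observed {observed:.2f}")
```

Points that rise well above the diagonal at the right-hand end suggest real associations, while a curve that lifts off the diagonal along its whole length can point to uncorrected population structure.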