GWAS and How We Can Use It To Better Understand Bacteria Flashcards
What is a Genome-Wide Association Study (GWAS)
A GWAS tests genetic variants across the genomes of many individuals to identify associations between specific genetic loci (mutations, insertions, deletions) and phenotypic traits, including diseases
What is the key principle behind GWAS
Genetic variants that contribute to a trait or disease will occur more frequently in individuals with that trait (cases) than in those without (controls)
What technologies are used in GWAS
- Whole Genome sequencing
- DNA microarrays
Both are used to identify common variants across individuals
How is data analysed in GWAS
Statistical analysis compares allele frequencies between cases and controls. Variants significantly more frequent in cases are flagged as potentially linked to the trait
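As a rough illustration of the comparison described above, the sketch below runs a Pearson chi-square test on a 2x2 table of allele counts. The function name and the counts are hypothetical, and real GWAS pipelines typically use regression models with covariates rather than a bare chi-square test.

```python
import math

def allele_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and case status were independent
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: variant allele seen 60 times in 200 case alleles,
# 30 times in 200 control alleles.
chi2, p = allele_chi_square(60, 140, 30, 170)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```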
What is the significance of the p-value in GWAS
The p-value is the probability of seeing an association at least as strong as the one observed if the variant had no real link to the trait (i.e. by chance alone). Lower p-values indicate stronger evidence for an association
What is a haplotype and its role in GWAS
A haplotype is a group of genetic variants (alleles) on the same chromosome that are inherited together. A variant within a haplotype associated with a disease may appear frequently in cases, even if it is not the direct cause.
How can GWAS be used for investigating bacterial virulence, antibiotic resistance, and outcome prediction in infectious diseases
Can be used to investigate:
- What makes some bacterial strains more dangerous (virulent)
- Why some are resistant to antibiotics
- Whether we can predict clinical outcomes based on bacterial genetics
How are these genome sequences first collected
Many bacterial samples from different infections have their whole genomes sequenced - these are the genotypes (the genetic content of each bacterial strain)
How is this genetic data then defined
By its phenotype:
- The pathogenic phenotype/genetic basis of pathogenicity. This helps in identifying virulence factors - the genes responsible for the ability to cause disease.
- Antibiotic resistance/genetic basis of resistance. Which strains are resistant to antibiotics? Which genes or mutations are responsible? Can we predict resistance from the genome without growing bacteria in the lab?
How does GWAS play a role in bacterial infections
GWAS takes the genotype and phenotype data and runs statistical comparisons to find associations between specific genetic features:
- SNPs
- Genes
- Plasmids
and the traits of interest:
- Resistance
- Severity
- Immune evasion
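The genotype-versus-phenotype scan described above could be sketched as follows. The feature names, the presence/absence calls and the phenotype labels are all made up for illustration, and the one-sided exact (hypergeometric) test used here deliberately ignores population structure, which a real bacterial GWAS would need to control for.

```python
from math import comb

# Hypothetical presence/absence calls (1 = present) for eight strains.
features = {
    "snp_1":      [1, 1, 1, 1, 0, 0, 0, 0],
    "gene_A":     [1, 0, 1, 0, 1, 0, 1, 0],
    "plasmid_p1": [1, 1, 1, 0, 0, 0, 0, 1],
}
# Hypothetical binary phenotype for the same strains (1 = resistant).
resistant = [1, 1, 1, 1, 0, 0, 0, 0]

def enrichment_p(present, phenotype):
    """One-sided exact p-value: chance of seeing at least this many carriers
    among the phenotype-positive strains if the feature were unrelated to the trait."""
    n = len(phenotype)
    n_pos = sum(phenotype)
    carriers = sum(present)
    pos_carriers = sum(1 for g, y in zip(present, phenotype) if g and y)
    upper = min(carriers, n_pos)
    favourable = sum(
        comb(carriers, k) * comb(n - carriers, n_pos - k)
        for k in range(pos_carriers, upper + 1)
    )
    return favourable / comb(n, n_pos)

# Test every feature against the trait and rank by p-value (smallest first).
results = sorted((enrichment_p(calls, resistant), name) for name, calls in features.items())
for p, name in results:
    print(f"{name}: p = {p:.3f}")
```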
Why is GWAS powerful
Discovery of new resistance or virulence genes
Real-time surveillance during outbreaks - tracking the spread of a strain or a resistant clone
Personalised treatment decisions - choosing the best antibiotic based on the bacteria’s genome
How is the genome database for these bacteria developing
The pace of sequencing is still increasing; the amount of sequence deposited roughly doubles every 18 months, whereas with continued technological development the cost per sequence is decreasing
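To get a feel for what an 18-month doubling time implies, here is a small arithmetic sketch. It assumes the doubling rate stays constant, which is an idealisation, and the function name is only illustrative.

```python
def growth_factor(months, doubling_time_months=18.0):
    """Fold increase after `months` if the total doubles every `doubling_time_months`."""
    return 2 ** (months / doubling_time_months)

for years in (3, 6, 9):
    print(f"{years} years: about {growth_factor(12 * years):.0f}-fold more sequence")
```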
What are prerequisites for a successful GWAS
- A testable phenotype: this could be binary (yes/no), or quantitative (e.g. the minimum inhibitory concentration (MIC), or how much toxin a strain produces)
- Whole-genome-sequenced (WGS) bacterial isolates: the more closely related the strains are, the less interference there is from population structure
- Phenotype must be scalable: it is hard to test a phenotype on thousands of strains, so GWAS tends to focus on high-throughput phenotypes - things you can measure quickly and automatically in the lab
- Effect size: a measure of how strongly a genetic variant is associated with a trait (if a single mutation completely explains a trait - it has a large effect size which is ideal for GWAS)
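For a binary phenotype, one common way to express effect size is the odds ratio. The sketch below is a minimal illustration with hypothetical counts; the 0.5 continuity correction for empty cells (Haldane-Anscombe) is an assumption added here so the ratio stays finite, not something the notes prescribe.

```python
def odds_ratio(case_with, case_without, ctrl_with, ctrl_without):
    """Odds of having the trait with the variant versus without it."""
    cells = [case_with, case_without, ctrl_with, ctrl_without]
    if 0 in cells:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid dividing by zero
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)

# Hypothetical counts: a variant that almost fully explains the trait...
print(odds_ratio(48, 2, 3, 47))
# ...versus a variant with only a weak association.
print(odds_ratio(30, 20, 25, 25))
```

An odds ratio near 1 means the variant barely shifts the odds of the trait; the further it is from 1, the larger the effect and the easier it is to detect with a given number of strains.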
What is meant by less interference from population structure
There are fewer false associations in the GWAS results caused by genetic relatedness between bacterial strains
You would minimise background noise from unrelated mutations that have nothing to do with the trait you’re studying
You are more likely to find the true genetic causes of the phenotypes you’re testing
What is linkage disequilibrium
LD refers to how often two genetic variants are inherited together on the same stretch of DNA (haplotype block)
In humans, recombination shuffles genes during reproduction, breaking up these haplotypes over time
So in human GWAS, recombination often makes it possible to distinguish the causal variant from variants that are merely physically close to it.
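LD between two biallelic sites is often summarised with the r-squared statistic, computed from haplotype and allele frequencies. The following is a minimal sketch; the 0/1 allele coding and the toy haplotypes are assumptions made for illustration.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) pairs coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom

# The two alleles always travel together: complete LD.
print(ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))
# All four combinations equally common: no LD.
print(ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)]))
```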
Why are GWAS results more challenging in bacteria compared to humans
Bacteria reproduce asexually, so they don’t recombine DNA as frequently as humans.
This means new mutations stay linked to large chunks of the genome, making it harder to pinpoint the actual causal mutation.
Without frequent mixing (recombination), many traits appear genetically linked just due to shared ancestry, not actual causation.
How does recombination help identify the true cause of a trait in GWAS?
Recombination breaks up long DNA blocks into smaller ones, allowing scientists to separate the causal variant from other nearby, non-causal mutations.
In humans, this happens naturally and frequently.
In bacteria, recombination is rare, so it’s harder to rule out false positives (mutations that are inherited together but not functionally linked to the trait).
How does population structure interfere with GWAS in bacteria
Closely related bacteria share large chunks of identical DNA.
If a trait is common in one lineage, you may mistakenly associate it with many shared mutations, even if only one is responsible.
This is why understanding genetic backgrounds and controlling for population structure is crucial in bacterial GWAS.
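A toy example of this confounding, using entirely hypothetical isolates: a variant that is fixed in one lineage looks strongly associated with resistance when all strains are pooled, yet there is no within-lineage contrast to support it. Stratifying by lineage is used here only as a simple stand-in for proper population-structure control.

```python
from collections import defaultdict

# Hypothetical isolates: (lineage, carries_variant, resistant)
isolates = (
    [("A", 1, 1)] * 8 + [("A", 1, 0)] * 2    # lineage A: variant fixed, mostly resistant
    + [("B", 0, 1)] * 2 + [("B", 0, 0)] * 8  # lineage B: variant absent, mostly susceptible
)

def resistance_rate(rows, variant):
    """Fraction of resistant isolates among those with the given variant state."""
    group = [r for (_, v, r) in rows if v == variant]
    return sum(group) / len(group) if group else None

# Pooled comparison: carriers look far more resistant than non-carriers.
pooled_diff = resistance_rate(isolates, 1) - resistance_rate(isolates, 0)

# Within-lineage comparison: only possible where the variant actually varies.
by_lineage = defaultdict(list)
for row in isolates:
    by_lineage[row[0]].append(row)

within_diff = {}
for lineage, rows in by_lineage.items():
    with_v = resistance_rate(rows, 1)
    without_v = resistance_rate(rows, 0)
    if with_v is None or without_v is None:
        within_diff[lineage] = None  # variant does not vary in this lineage
    else:
        within_diff[lineage] = with_v - without_v

print("pooled difference:", pooled_diff)
print("within-lineage differences:", within_diff)
```

Here the variant is indistinguishable from every other mutation that defines lineage A, so the pooled signal alone cannot show that it is the causal one.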
What is a homoplasious mutation
A mutation that occurs repeatedly at the same site; e.g., bacterial strains could share the same mutation at a particular genomic location not through common ancestry but because the variant arose independently
What are the 3 mechanisms by which homoplasious mutations can be introduced into the genomes of bacterial populations
- Horizontal gene transfer (HGT)
- Recombination
- Recurrent mutations
Why is population structure important in bacterial GWAS
Because bacteria reproduce clonally, all genetic variants in a lineage are inherited together
As a result, it is difficult to tell whether a mutation causes a trait or is just linked to it through shared ancestry
Without proper control for population structure, you risk identifying false associations
How does lack of recombination affect GWAS
Due to the lack of recombination in bacteria, all fixed mutations in a lineage are passed on together, in linkage disequilibrium.
If a phenotype is present in a lineage, many linked mutations may appear associated, even if only one is causal
What is a Linear Mixed Model (LMM) and how does it help in GWAS
LMMs are statistical models that account for relatedness between bacterial strains
They help control for population structure by modelling the background genetic similarities across strains
This improves the ability to detect true associations between specific loci and phenotypes
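One common way to write such a model is shown below. The symbols are assumptions introduced here rather than notation from these notes: y is the vector of phenotypes across strains, W holds covariates with coefficients alpha, x is the genotype at the locus being tested with effect beta, K is a strain-by-strain relatedness (kinship) matrix, and I is the identity matrix.

```latex
\[
\mathbf{y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{x}\beta + \mathbf{u} + \boldsymbol{\varepsilon},
\qquad
\mathbf{u} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^{2}\mathbf{K}\right),
\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^{2}\mathbf{I}\right)
\]
```

The random effect u absorbs phenotype similarity that comes from shared genetic background, so the test of beta asks whether the locus explains anything beyond relatedness.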
What can LMMs help identify in GWAS
- Locus specific effects: mutations truly linked to the phenotype
- Lineage-level differences: broader patterns seen in entire strain groups
- Helps separate trait-causing mutations from those just carried by related bacteria
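The background genetic similarity that an LMM models is usually supplied as a strain-by-strain relatedness matrix. The sketch below shows one simple way to build such a matrix from 0/1 variant calls, assuming mean-centred genotypes; the toy data are hypothetical, and real tools may instead derive relatedness from a phylogeny or scale the variants differently.

```python
def relatedness_matrix(genotypes):
    """K = Z Z^T / m, where Z is the genotype matrix with each variant mean-centred.

    genotypes[i][j] is the 0/1 call for strain i at variant j.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    return [
        [sum(centred[a][j] * centred[b][j] for j in range(m)) / m for b in range(n)]
        for a in range(n)
    ]

# Hypothetical calls: strains 0 and 1 share one background, strains 2 and 3 another.
strains = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
K = relatedness_matrix(strains)
for row in K:
    print([round(value, 3) for value in row])
```

Strains from the same background get larger entries in K than strains from different backgrounds, which is the information the model uses to separate lineage-level effects from locus-specific ones.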