Lecture 5 Flashcards

1
Q

Simple genetic disorders: Autosomal dominant

A
  • Only one copy/allele required for the disease
  • Most affected people only have 1 disease allele
  • Equally common in both sexes
  • Offspring of affected people have 50% probability of inheriting the disease
2
Q

Simple genetic disorders: Autosomal recessive

A
  • Two alleles required for the disease
  • Equally common in both sexes
  • Offspring of two carriers have a 25% probability of inheriting the disease
  • Disease alleles are ‘masked’ in heterozygous carriers
  • The disease tends to skip generations: affected children usually have two unaffected carrier parents.
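
The 25% figure for two carriers, and the 50% figure for autosomal dominant disorders, can be checked by enumerating a one-locus cross. This is only a minimal sketch: the function name `cross` and the allele letters are illustrative, and each parent is assumed to transmit either allele with probability 1/2.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Return offspring genotype probabilities for a one-locus cross.

    Each parent is a two-character genotype such as "Aa"; each allele is
    assumed to be transmitted with probability 1/2.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}

# Recessive disease allele "a": two carriers (Aa x Aa).
carriers = cross("Aa", "Aa")
print(carriers)   # "aa" offspring are the affected class

# Dominant disease allele "D": affected heterozygote x unaffected (Dd x dd).
dominant = cross("Dd", "dd")
print(dominant)   # "Dd" offspring are the affected class
```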
3
Q

Simple genetic disorders: X-linked recessive

A
  • Females require 2 disease alleles, males only 1
  • More common in males
  • Sons of carrier females have 50% chance of disease
  • Sons of affected males are unaffected (a father passes his Y chromosome, not his X, to his sons)
    This pattern is the hardest to distinguish
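
The two statements about sons follow from which parent supplies the X chromosome. The sketch below is illustrative only: `offspring_risk` is a made-up helper, and lowercase "x" is used here to mark an X chromosome carrying the recessive disease allele.

```python
def offspring_risk(mother, father_x):
    """Return (P(son affected), P(daughter affected)) for an X-linked recessive trait.

    mother: her two X alleles, e.g. "Xx"; father_x: the father's single X allele.
    Lowercase "x" marks the disease allele (an illustrative convention).
    """
    # Sons receive the father's Y, so only the maternal X matters.
    sons = sum(allele == "x" for allele in mother) / 2
    # Daughters receive the father's X plus one maternal X; both must be "x".
    daughters = sum(allele == "x" and father_x == "x" for allele in mother) / 2
    return sons, daughters

carrier_mother = offspring_risk("Xx", "X")    # carrier female, unaffected male
affected_father = offspring_risk("XX", "x")   # non-carrier female, affected male
print(carrier_mother, affected_father)
```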
4
Q

Mapping Mendelian Traits

A

Non-recombinants (NR, 8/10): offspring of affected people who inherit allele 2 tend to get the disease, and offspring who do not inherit allele 2 are unaffected.

Recombinants (R, 2/10): offspring with allele 2 but not the disease, or with the disease but not allele 2.
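
As a rough worked example, the 2 recombinants and 8 non-recombinants give an estimated recombination fraction of 2/10, which can be turned into a LOD score by comparing the likelihood of the data at that fraction with the likelihood under free recombination (0.5). The sketch assumes every meiosis can be scored unambiguously as recombinant or non-recombinant; `lod_score` is an illustrative name.

```python
import math

def lod_score(recombinants, non_recombinants, theta):
    """log10 of L(theta) / L(0.5), assuming fully informative meioses."""
    n = recombinants + non_recombinants
    likelihood_linked = theta ** recombinants * (1 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** n
    return math.log10(likelihood_linked / likelihood_unlinked)

r, nr = 2, 8                      # counts from the card above
theta_hat = r / (r + nr)          # estimated recombination fraction
lod = lod_score(r, nr, theta_hat)
print(f"theta = {theta_hat:.2f}, LOD = {lod:.2f}")
```

With only ten offspring the LOD stays well below 3, which is why linkage studies usually combine evidence across many meioses or families.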

5
Q

Linkage mapping to find traits

A
  • Mendelian Traits are typically rare
  • Usually mapped by following the co-segregation of markers and phenotypes in affected families
  • Results often expressed as a LOD score (Logarithm of Odds)
  • Markers with the highest LOD scores are closest to the gene.
  • A LOD of 3 means the odds of linkage between a marker and a gene are 10^3:1, i.e. the data are 1000 times more likely under linkage than under non-linkage
  • Online Mendelian Inheritance in Man (OMIM) is an online database with descriptions of genes, literature, phenotypes etc related to each disease
6
Q

What is a LOD score?

A
  • The log10 of the odds that the marker is linked to the causal gene, relative to the marker not being linked to it. LOD scores are usually quoted on a scale of 1, 2, 3, 4 etc.
  • The odds are 10 raised to the power of the LOD score. A LOD score of 3 means odds of 10^3:1
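
The conversion between a LOD score and odds is just a power of ten, as this small sketch shows (`lod_to_odds` is an illustrative name):

```python
def lod_to_odds(lod):
    """Odds in favour of linkage implied by a LOD score."""
    return 10 ** lod

for lod in (1, 2, 3, 4):
    print(f"LOD {lod}: {lod_to_odds(lod):,}:1 in favour of linkage")
```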
7
Q

Examples

A

Most of these disease causing alleles are very rare.

The frequencies of many of these alleles vary between populations because of the effects of genetic drift. Even when the alleles are rare, they have a big effect on the phenotype, so we can find the genes responsible.
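
A quick way to see how drift alone can make a rare allele's frequency differ between populations is a Wright-Fisher style simulation. This is only a sketch under simple assumptions (neutral allele, constant population size, random mating); the starting frequency, population size and seed are arbitrary illustrative values.

```python
import random

def wright_fisher(p0, pop_size, generations, rng):
    """Final allele frequency after neutral drift in a diploid population."""
    copies = 2 * pop_size
    p = p0
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the current pool.
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
    return p

rng = random.Random(1)  # fixed seed so the run is repeatable
# Five isolated populations that all start with the same rare allele frequency.
final = [wright_fisher(0.05, pop_size=100, generations=50, rng=rng) for _ in range(5)]
print(final)
```

Each entry is one population's final frequency; the allele is typically lost in some populations and more common in others.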

8
Q

Mapping complex diseases and the two approaches

A

Common diseases e.g. heart disease, cancers, dementia, susceptibility to malaria etc are typically complex and involve a mixture of genetic and environmental causes

Two popular approaches trying to find genes responsible for these traits:

  • Exome capture - just sequence the coding bits of the genome (Lecture 2)
  • Genome-wide association studies (GWAS, more common): typically use SNP chips
9
Q

Concept behind GWAS

A
  • A new mutation arises that causes or contributes to a disease
  • Initially most of the linked SNPs will be in linkage disequilibrium i.e. statistically associated with it
  • But over time, recombination will break up these associations. Only the most closely linked loci will remain in LD
  • New mutation on an ancestral chromosome
    - Chromosomes in modern-day descendants who inherit the mutation will have allele 2 at the marker locus significantly more often than in the general population
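
The way recombination erodes these associations can be sketched with the standard expectation that the linkage disequilibrium coefficient D shrinks by a factor of (1 - r) each generation, where r is the recombination rate between the two loci. The starting value of D and the rates below are arbitrary illustrative numbers.

```python
def ld_after(d0, r, generations):
    """Expected LD coefficient D after the given number of generations."""
    return d0 * (1 - r) ** generations

d0 = 0.25  # illustrative starting LD
for r in (0.0001, 0.01, 0.1):
    decay = [round(ld_after(d0, r, t), 4) for t in (0, 10, 100, 1000)]
    print(f"r = {r}: {decay}")
```

Only the most tightly linked pair (smallest r) keeps appreciable LD after many generations, which is why associated SNPs point to a nearby causal variant.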
10
Q

GWAS – plotting the results

A
  • A GWAS typically involves typing a million SNPs in cases and controls
  • Every SNP is tested for an association with the trait/ phenotype of interest
    • The results are shown in a Manhattan plot, so called because you are looking for skyscrapers
    • Each SNP gets a P value; taking log10 of it and reversing the sign gives the statistic -log10 P. The higher it is, the stronger the evidence of association
  • Usually the expected and observed test statistics (results from chi-squared tests) are also plotted against each other
  • If the observed values are higher than expected, there could be a risk of false positives due to population structure
  • Anything above the threshold line is strongly suggestive of those SNPs being associated with your trait. The line is at roughly P = 0.00007.
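
The per-SNP test can be sketched as a chi-squared test on allele counts in cases and controls, followed by the -log10 P transformation used on the Manhattan plot. The SNP names and counts are made up for illustration, and the P value uses the 1-degree-of-freedom identity P = erfc(sqrt(chi2 / 2)) so that only the standard library is needed.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-squared statistic (1 df) for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def p_value_1df(chi2):
    """Upper-tail P value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical allele counts: (case alt, case ref, control alt, control ref).
snps = {"snp_A": (600, 1400, 500, 1500), "snp_B": (510, 1490, 500, 1500)}
scores = {}
for name, counts in snps.items():
    p = p_value_1df(allelic_chi2(*counts))
    scores[name] = -math.log10(p)
    print(f"{name}: P = {p:.3g}, -log10 P = {scores[name]:.2f}")
```

A real GWAS repeats this for every typed SNP and plots the scores by genomic position.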
11
Q

QQ plots – detecting structure

A
  • Each point is a SNP
  • X axis is expected -log10 P values and Y axis is observed -log10 P values
  • If the points lie above the line y = x, P values across the whole genome are more significant than expected under the null hypothesis. This suggests we could get false positives in a GWAS.
  • The most likely cause is population structure: allele frequencies differ between populations, which can cause false positives in a GWAS
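
The coordinates of a QQ plot can be sketched as follows. The expected values assume that, under the null hypothesis, the i-th smallest of n P values falls near i / (n + 1) (one common plotting convention); the P values listed are invented for illustration.

```python
import math

def qq_points(p_values):
    """Pair expected and observed -log10 P values, one point per SNP."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Under the null, the i-th smallest of n P values is expected near i / (n + 1).
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))

p_values = [0.0001, 0.003, 0.02, 0.2, 0.45, 0.6, 0.75, 0.9, 0.95]  # illustrative
points = qq_points(p_values)
above_line = sum(obs > exp for exp, obs in points)
print(f"{above_line} of {len(points)} SNPs lie above the y = x line")
```

If most points, not just the top few, sit above the line, that is the genome-wide inflation pattern the card associates with population structure.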
12
Q

False positives in GWAS studies

A

If there is genetic structure in a population, then false associations between a marker and a phenotype can arise by chance

13
Q

Studies of genetic structure

A
  • Observation that genetic structure influences GWAS results is important
  • It means that SNP chips that are good for finding disease variants in one population might not be so good in another population – motivation for HapMap projects (Lecture 2)
  • We need to understand human population genetic structure, and this can also tell us about our history
14
Q

Estimating and displaying human structure: two main approaches

A

Clustering: Idea is to group individuals into K different clusters, where individuals within a cluster are more similar to each other than individuals outside of it. Can use an a priori number for K or K can be estimated from the data. Best known program/method is Jonathan Pritchard’s STRUCTURE. Each individual is given a membership coefficient which tells us how well it fits its cluster and whether it contains genes from >1 cluster.

The aim is to work out how many distinct genetic clusters there are.

Multivariate approaches: Best known approach is Principal Component Analysis (PCA) which uses allele frequencies from many markers

15
Q

Human Population Structure

A
  • Used microsatellite markers (forerunners of SNPs), which did not capture as much genetic variation.
  • 93-95% of genetic variation is found within populations, not between them.
  • However, subtle differences between populations still make it possible to identify distinct genetic clusters
  • A K value of 2 splits East Asia, Oceania and the Americas from Eurasia and Africa
  • Plots like these are known as ‘Structure plots’ (after the program first written to identify clusters).
  • Each colour is a distinct genetic cluster
  • The number of colours represents the value of K: the aim is to work out how many genetic clusters there are and to begin identifying slightly different genetic structures.
16
Q

Example of a PCA approach

A
  • > 3000 Europeans typed at 500K SNPs
  • Restricted analysis to people whose 4 grandparents all came from the same location as one another, then tried to work out how the genetic variation was partitioned.
  • First two principal components separate out discrete populations
  • PC1 roughly SE-NW
  • PC2 roughly SW-NE
  • PC1 and PC2 only explain ~2% of the variation, but this is still enough to reveal structure: if you rotate the axes, the plot captures a map of European geography very well. Most of the variation is found within populations
  • Subtle differences in allele frequency tell us something about Europe
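
To illustrate how a principal component can separate groups from genotype data, here is a toy sketch that finds PC1 by power iteration using only the standard library. It is not the method used in the study above: the six individuals, six SNPs and 0/1/2 genotype codes are invented, and real analyses use dedicated software on hundreds of thousands of SNPs.

```python
import math

def first_principal_component(genotypes, iterations=200):
    """Score of each individual on PC1, found by power iteration.

    genotypes: one row per individual, one column per SNP, entries 0/1/2
    (copies of one allele).
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre each SNP column on its mean genotype.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    # Repeatedly applying X^T X to a vector converges on the PC1 SNP loadings.
    loadings = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-zero start
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, loadings)) for row in centred]
        loadings = [sum(centred[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(w * w for w in loadings))
        loadings = [w / norm for w in loadings]
    return [sum(x * w for x, w in zip(row, loadings)) for row in centred]

# Toy data: the first three individuals come from one population, the last three from another.
genotypes = [
    [2, 2, 1, 0, 0, 1],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 0, 1, 2, 2, 1],
    [0, 1, 0, 2, 1, 2],
    [1, 0, 0, 2, 2, 2],
]
pc1 = first_principal_component(genotypes)
print([round(s, 2) for s in pc1])
```

Individuals from the same toy population get PC1 scores of the same sign, which is the kind of separation seen on a much larger scale in the European data.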
17
Q

Origins of the UK population

A
  • Typed 2039 people who were controls in a Wellcome Trust study on multiple sclerosis (sampling like this is usually motivated by a gene mapping project)
  • The same samples were part of the People of the British Isles (POBI) project, carefully selected so that they came from the location all of their grandparents were from
  • Selected people with all 4 grandparents born within 80km of each other
  • Mean birth date 1885 – samples represent UK population before greater movement of 20th century
  • Typed at ~500K SNP markers
  • Another 6029 samples from 10 European countries (to provide context)
18
Q

Fine-scale structure in the UK

A

17 discrete clusters, that are geographically separated

Even close locations form discrete clusters (e.g. Cornwall and Wales)

FST between clusters is very low – 0.002

There is not one single ‘Celtic’ population

English cluster (red) is very large, perhaps because of fewer geographical / geo-political boundaries

The peripheral populations have the most distinct colours. The genetic differentiation between these clusters is very low: they are almost identical in genetic structure. Genetic drift is more prominent in the more remote areas because of their smaller population sizes.
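
To give a feel for how small an FST of 0.002 is, the sketch below computes FST for a single biallelic SNP in two populations as (HT - HS) / HT, where HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity of the pooled population. It assumes equal population sizes, and the allele frequencies are illustrative rather than taken from the study.

```python
def fst_two_populations(p1, p2):
    """FST = (HT - HS) / HT for one biallelic locus, two equal-sized populations."""
    h1 = 2 * p1 * (1 - p1)        # expected heterozygosity in population 1
    h2 = 2 * p2 * (1 - p2)        # expected heterozygosity in population 2
    hs = (h1 + h2) / 2            # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2         # pooled allele frequency
    ht = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    return (ht - hs) / ht

fst = fst_two_populations(0.50, 0.55)
print(f"FST = {fst:.4f}")
```

Under these assumptions an allele frequency difference of about five percentage points already gives an FST of this order, so the UK clusters differ only very slightly at any one SNP.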

19
Q

Inferring the origin of UK clusters

A

Lots of variation - people in North Wales had a big input from FRA17 etc.

Refer to previous slide

  • Groups that contribute to all UK clusters (e.g. GER6, BEL11 and FRA14) probably represent earliest ( post ice-age) migration events in the UK.
  • Some European clusters probably correspond to known historical events
  • e.g. NOR53 and NOR90 (Norwegian) groups contributed to Orkney clusters. These probably reflect Norse/Viking invasions
  • GER3 and DEN18 probably represent contributions from early Saxon migration events around 700 AD.
  • Data are consistent with the idea that Saxons contributed to current genetic variation rather than completely replaced ice-age settler genotypes
20
Q

Geographical variation in complex diseases

A

Two obvious questions

  1. Environmental or Genetic causes …. or both?
  2. Why do disease associated alleles persist?

Type 2 Diabetes (T2D) shows a broadly similar geographical pattern to obesity.

It is very common, and it varies between populations. Why?

Thrifty genotype hypothesis (Neel 1962). In our ancestors, a rapid release of insulin in response to elevated blood sugar was useful. It enabled the build up of fat stores, which could be used in times of hardship i.e. diabetes associated alleles were once advantageous

Drifty genotype hypothesis (Speakman 2008). In our ancestors, mutations in lipid storage genes were neutral, because people didn’t have a fatty diet. Population differences in allele frequencies arose through drift. With modern high-fat diets the effects of the mutations are more obvious. This is more like a null hypothesis.

Little support for the thrifty genotype hypothesis; harder to test the drifty hypothesis