Week 4.7.8: Genetic trait associations Flashcards

1
Q

Genetic trait associations

A

Genetic association studies, genome-wide association studies, missing heritability, genetic disease associations, single gene disorders, polygenic disorders

2
Q

In previous lectures we have been looking at;

What the human genome looks like
How we sequence genomes
How history shapes genomes
How human genomes differ from the genomes of other species

BUT HOW DO GENOMES INFLUENCE OUR PHENOTYPES? WHAT DO GENOMES DO?

How? What? Why?

A

A major goal of genomics is to identify which parts of the genome are responsible for which traits. We know that the genome has a major influence on traits like height, but how does it do this? How do we get from our DNA to our heart, lungs, and so on?

Two lines of evidence in genetic trait associations

  1. Genetics

  2. Traits
3
Q

Two lines of evidence in genetic trait associations

Genetics
Traits

So we have to look at both of those things: the genes and the traits.

A

We know that the baby did not come from that couple: looking at the traits the adults have, and at how those traits are inherited, it is unlikely that the baby is from those parents.

We know that a lot of the traits we see in this picture are heritable.

4
Q

Heritability
The heritability of a trait within a population is the proportion of the observable differences in that trait between individuals that is due to genetic differences.

A

Heritability is about the variability of a trait: how much is due to genes and how much is due to environment. All traits are a mixture of DNA and environment. For example, it is not just your genes that determine how large your stomach is; if you eat lots of doughnuts you are more likely to have a big stomach. Heritability might be one reason why someone has a big gut, but the environment has an effect too.
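As a rough illustration of the definition on this card, the sketch below treats heritability as the genetic share of the total variance in a trait. It assumes the total variance is simply genetic variance plus environmental variance (no gene-by-environment interaction), and the variance values are made-up numbers, not data from the lecture.

```python
def broad_sense_heritability(genetic_variance: float, environmental_variance: float) -> float:
    """Proportion of total trait variance that is due to genetic differences."""
    total_variance = genetic_variance + environmental_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return genetic_variance / total_variance


# Hypothetical variance components for an unnamed trait (not real data).
h2 = broad_sense_heritability(genetic_variance=8.0, environmental_variance=2.0)
print(h2)  # 0.8
```

A value near 1 would mean most of the variation between individuals is genetic; a value near 0 would mean most of it is environmental.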

5
Q

How do we untangle the genetic and environmental effects on traits?

One way of doing this is to use family studies and twin studies. We know that many human traits have a high heritability. If identical twins vary in a trait, we know that the variability is not due to their genes but to their environment, so twin studies let us begin to untangle environmental and genetic influences.

A

If we cannot use twins then we can use families instead.

We could study plants: we can clone them and grow the clones in different environments, which holds the genes constant.

However, we cannot clone humans, and even if we could, manipulating a clone’s environment would be very unethical.

Facebook experiment: tweaking people’s Facebook feeds to see if it affected their mood, which caused a huge backlash.

We just can’t do these experiments.

But we can work with twin and family studies to try to untangle genetics and environment.

We have known about heritability since long before genome sequencing.
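The twin-study idea on this card can be sketched with Falconer's formula, a classic estimator that is not named in the lecture and is brought in here only as an illustration. It compares how strongly a trait correlates within identical (monozygotic) twin pairs and within fraternal (dizygotic) pairs. The correlations below are hypothetical values.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate: twice the gap between MZ and DZ twin correlations."""
    return 2 * (r_mz - r_dz)


# Hypothetical trait correlations within twin pairs (illustrative only).
r_identical = 0.8  # identical twins share all their alleles
r_fraternal = 0.5  # fraternal twins share about half on average
h2_estimate = falconer_heritability(r_identical, r_fraternal)
print(round(h2_estimate, 2))  # 0.6
```

The estimate rests on the assumption that both kinds of twins share their environment to the same degree, which is one reason such figures can over- or under-state heritability.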

6
Q

Sir Francis Galton’s (1889) data showing the relationship between offspring height (928 individuals) as a function of mean parent height (205 sets of parents)

A
7
Q

Genetics without DNA

A

From 1850 to 1950 we did genetics without knowing DNA was the hereditary material

We have known about genes since Mendel, even before we knew about DNA

Genetic maps have existed since 1913, when Alfred Sturtevant made the first genetic map (of a Drosophila chromosome)

Looking at heritability is something we have been able to do for a long time

8
Q

Two lines of evidence

1. Patterns of heredity: tracing the inheritance of traits through generations

2. Patterns of DNA variation: DNA sequencing and comparison in multiple individuals

What two types of traits are there?

A

Two types of traits… Monogenic or polygenic

9
Q

**What is a monogenic trait?**

A

Monogenic
A monogenic trait will often show a clear pattern of Mendelian inheritance, like Mendel’s peas: either dominant or recessive, segregating in the F2 generation. Such traits tend to be present/absent in phenotype, and it is relatively easy to discover their genetic basis when you can do controlled crosses and generate large families of progeny.

In humans they are a bit harder to work on than in pea plants, but they are still fairly easy to work out.

10
Q

However,

Polygenic traits are not…

A

They are traits that involve many genes. They do not normally show clear Mendelian inheritance because they involve interactions between many genes (many loci interacting).

11
Q

Polygenic traits are not…

They are traits that involve many genes. They do not normally show clear Mendelian inheritance because they involve interactions between many genes (many loci interacting).

They interact with the environment in complex ways, so their genetic basis can be very hard to discover.

What approach do we use to study polygenic traits?

A

Quantitative trait association studies, which look for quantitative trait loci (QTLs)

Commonly studied with Genome Wide Association Studies (GWAS)

GWAS is a way of looking at highly polygenic traits

From the textbook, chapter 6, figure 6.9

Shows a monogenic trait and its inheritance in comparison with polygenic traits (we are looking at disorder traits)

As we know from Mendelian genetics, the monogenic trait on the left shows a simple inheritance pattern, whereas polygenic traits do not

We know there are many polygenic traits

12
Q

Three examples of monogenic traits?

A

Monogenic

· Cystic fibrosis (being monogenic is why we have known about its genetic basis for a long time)

· Sickle cell disease

· Phenylketonuria

13
Q

Three examples of polygenic traits?

A

Polygenic

· Type 2 diabetes

· Hypertension

· Rheumatoid arthritis
People are still working on locating loci

14
Q

Study sampling designs

With humans, we can’t design experiments on genetics as we can with other organisms
We have to make use of what variation and genealogical relationships we can discover existing in human populations

A

Two issues when we try to do a study:

Which people do I study?
How much of the genome do I study?

The more people and the more of the genome you study, the more expensive it will be, but you may learn a lot more by looking at more people and at their whole genomes.

15
Q

Case control studies

A

Compare a large group of people showing a trait with a large group of people not showing it. For example, for type 2 diabetes you recruit as many people who have the disease as people who don’t, then compare the alleles of the two groups. If you find an allele that is consistently more common in those with type 2 diabetes, you can infer that the allele has something to do with the disease.

But you have to take account of:

· genetic background (e.g. everyone is from Manchester, or everyone is from Munich)

· environmental exposure

· same trait but different genetic cause

Works best for discrete traits (cases/controls)
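The case-control comparison above can be sketched as a test on a 2x2 table of allele counts. This is only a minimal sketch: it uses a plain chi-square test with one degree of freedom, ignores the confounders listed on this card, and the counts are invented for illustration. The function name `allele_association` is hypothetical.

```python
import math


def allele_association(case_a: int, case_b: int, control_a: int, control_b: int):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column total is greater than zero.
    """
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + control_a, case_b + control_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A seen 60 times in cases and 40 times in controls.
chi2, p_value = allele_association(case_a=60, case_b=40, control_a=40, control_b=60)
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")  # chi2 = 8.0, p = 0.0047
```

A small p-value only says the allele frequencies differ between the groups; it does not by itself rule out differences in genetic background or environment.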

16
Q

Family-based studies

A

In a family-based study you know the genealogies (who the mother and father were, the granddad, the uncle, etc.) and you can do linkage analysis. Environmental exposures are often similar within a family, which helps control for environmental effects. Family-based studies have been very successful at discovering the basis of many Mendelian traits.

· More powerful methods

· Genetic background and environmental exposures often similar among family members

· Problem of numbers: families are small

· Used to discover basis of many Mendelian traits

· May discover rare mutations unique to a family

Might give great results, but they might be particular to that family

17
Q

Cohort study

A

You do not just sample people at one point in time; you study them over a long period

This allows for a better understanding of environment, so good for G x E studies

Studies like this are hard to manage and fund in practice

Large population studies

Often used for polygenic quantitative traits that show continuous variation (most polygenic traits do)

Need a lot of sequence data

But it can be hard to get accurate phenotypes, and it is expensive to get lots of genotypes and phenotypes

18
Q

Study design:
How much of the genome do I study?

A

Candidate gene studies
The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest and phenotypes or disease states. This is in contrast to genome-wide association studies (GWAS), which scan the entire genome for common genetic variation.

Focus on a particular gene at a particular locus, i.e. concentrate on a chosen genomic region. Prior knowledge points to that region: a previous family-based study, or a study of the gene’s function in mice or another organism. This lets you see whether there is variation in that gene. It is relatively cheap because you are looking at one thing, but these studies can be very hard to replicate because you might pick up something unique to the sample studied.

19
Q

But if you don’t look at candidate genes you do a Genome-wide study

A

Hypothesis free
Look across the whole genome, little prior knowledge needed

· using SNP markers or WGS

Expensive – lots of data needed and can be hard to replicate

Complex statistics: big possibility of false positives and negatives, because you are doing MANY statistical tests

20
Q

Two major types of study

1.Linkage analysis (Linkage mapping)

2.Genome wide association studies (Linkage disequilibrium mapping)

A

These two approaches draw on elements of the study designs above: which people you study and how much of the genome you study

Linkage analysis (Linkage mapping)

Brings together the two lines of evidence

· heredity patterns of traits

· genome sequence

You need to know the pedigree of every individual

You can start off with a genome-wide search, but sequential studies are then needed to gradually narrow down the genomic region containing a trait’s locus. You can also begin with a candidate region of the genome

The main aim of linkage analysis is to identify genetic markers that segregate with a trait of interest

Two lines of evidence

· Patterns of segregation of a trait in families

· Patterns of segregation of genetic markers in the same families

Segregation happens because:

· Chromosomes segregate in meiosis (the copies from mum and dad are separated)

· Recombination segregates loci within chromosomes

We have known about this for a very long time

21
Q

Genetic distance

A

Genetic distance between two loci is measured by the recombination fraction

If 1% of progeny from a cross are recombinant, then they are 1 centimorgan apart (1 cM)

i.e. if a trait co-occurs with a marker in 99% of the progeny of a cross, the marker is likely to be 1 cM from the trait locus

Genetic distance and physical distance (measured in bases) are somewhat different, because recombination is more frequent in some parts of the chromosome than in others; loci in those regions look further apart on the genetic map than they physically are
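The rule on this card (1% recombinant progeny = 1 cM) can be written out as a small calculation. This is a sketch of the simple proportional rule only, which is reasonable for closely linked loci; the function names are illustrative.

```python
def recombination_fraction(recombinant_progeny: int, total_progeny: int) -> float:
    """Fraction of progeny from a cross that are recombinant."""
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    return recombinant_progeny / total_progeny


def approx_map_distance_cm(theta: float) -> float:
    """Simple rule from the card: 1% recombinants corresponds to 1 centimorgan."""
    return 100.0 * theta


theta = recombination_fraction(recombinant_progeny=1, total_progeny=100)
distance = approx_map_distance_cm(theta)
print(f"{distance:.1f} cM")  # 1.0 cM
```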

22
Q

LOD Scores

Measure of linkage between loci: the log10 of the ratio between the likelihood of the observed data under linkage and its likelihood under the null hypothesis of no linkage at all

· LOD score above 3 may suggest significant linkage

· LOD score of less than -2 may suggest no linkage
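A minimal sketch of a LOD calculation follows, assuming the simplest textbook case: each offspring can be scored unambiguously as recombinant or non-recombinant, and the likelihood under linkage is evaluated at one chosen recombination fraction `theta`. The counts are hypothetical.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """log10 of the likelihood at recombination fraction theta over the likelihood at 0.5."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    n = recombinants + non_recombinants
    log_linked = recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    log_unlinked = n * math.log10(0.5)  # no linkage: theta = 0.5
    return log_linked - log_unlinked


# Hypothetical family data: 1 recombinant among 20 scored offspring.
lod = lod_score(recombinants=1, non_recombinants=19, theta=0.05)
print(round(lod, 2))  # 4.3
```

In this made-up example the score is above 3, so it would count as evidence for linkage under the rule of thumb on this card.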

A
23
Q

Genome wide association studies (GWAS) (Linkage disequilibrium mapping)

Linkage disequilibrium mapping
Do not need to know pedigrees
Hypothesis free
Look across the whole genome

· using SNP markers or WGS

How often does each locus have a variant that co-occurs with the disease?

A

How often does each locus have a variant that co-occurs with the disease?

Little prior knowledge needed
Expensive – lots of data needed
Can be hard to replicate: different cohorts of samples can give different results
Complex statistics: false positives and negatives

24
Q

Bus Analogy

3.2 billion seats, something goes wrong when seat number 116572 is occupied by a male

But all we know is which buses have gone wrong, and who was on each bus

How do we associate “male in seat 116572” with the problem?

This is very difficult. We have to look across all the buses, compare the occupants of every seat, and find the seat that is always occupied in the same way on the buses with the problem. Mathematically that is a hard problem: looking for something very small in a very large data set

A

What helps us is linkage disequilibrium

Our 3.2 billion loci are not assorting randomly in our genomes. We don’t have 3.2 billion chromosomes; we only have 23 pairs of chromosomes

Although recombination happens within those chromosomes, it is not frequent enough for the 3.2 billion loci to segregate randomly in EVERY generation. Because it does not happen all that often along the chromosome, variation in human genomes is structured into lots of blocks, so lots of loci are linked when we look at human populations. That is linkage disequilibrium. If everything were in equilibrium, it would be randomly assorted, as if we had 3.2 billion chromosomes.

The further apart two loci are, on either side, the less linked they are. Conceptually this is similar to linkage mapping (but in linkage mapping we look at one family or pedigree, tracing a lineage and its recombination events)

With linkage disequilibrium we are looking at populations, not pedigrees. We simply observe these patterns as a phenomenon that arises, and exploit it: because loci are associated in blocks, when we are looking for an allele associated with a trait we can look for a block of linked loci
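Linkage disequilibrium between two loci is commonly summarised with the statistics D and r squared, which the card does not name but which capture the idea of "not randomly assorted". The sketch below assumes two biallelic loci, with `p_ab` the frequency of haplotypes carrying allele A at the first locus and allele B at the second. All frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, r_squared) for two biallelic loci.

    D is the gap between the observed haplotype frequency and the frequency
    expected if the two loci assorted independently (equilibrium).
    Both loci must be variable (allele frequencies strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Alleles A and B always travel together: complete disequilibrium.
print(linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5))   # (0.25, 1.0)
# Haplotype frequency equals the product of allele frequencies: equilibrium.
print(linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5))  # (0.0, 0.0)
```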

25
Q

Different human populations have different patterns of linkage disequilibrium, and this partly depends on their history: the further back a population’s common ancestors lie, the more time recombination has had to break up blocks, so the less linkage disequilibrium you will find in that population

A

Here is part of chromosome 7 around a gene, in the Metabolic Risk Complications of Obesity Genes (MRC-OB) project cohort from Northern Europe

Each line is a SNP marker on that chromosome, and the chart shows how associated the SNPs are. RED means the two SNPs are highly associated (they are in linkage disequilibrium with each other), whereas white means there is NO linkage disequilibrium (they are in equilibrium)

Imagine diagonal lines going up from each SNP. The big block is usually inherited as one block: recombination hardly ever happens within it, so if there are two variable SNPs in the block they will always vary the same way

Within the block you only need to know which allele is present at one of these bases to know what is present at all the other loci in the block, given your knowledge of the variation in the population, because they are all closely linked. A little block where all the variation is inherited together is known as a haplotype block. We can see that along chromosome 7 there are a few haplotype blocks

We can infer the identity of ALL the SNP alleles in a block given knowledge of one of them. Using one known allele to infer which alleles are present at other loci is called imputation
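The imputation idea on this card can be shown with a toy lookup. This is a deliberately simplified sketch: real imputation methods are statistical and work with large reference panels, whereas here the "reference panel" is a hypothetical list of haplotype strings for one block, and a position is filled in only when every matching haplotype agrees.

```python
def impute_from_tag(reference_haplotypes, tag_index, observed_allele):
    """Infer the other alleles in a haplotype block from one genotyped (tag) SNP.

    Returns None if no reference haplotype carries the observed allele, and
    marks a position with "?" when the matching haplotypes disagree there.
    """
    matches = [h for h in reference_haplotypes if h[tag_index] == observed_allele]
    if not matches:
        return None
    imputed = []
    for position in range(len(matches[0])):
        alleles = {h[position] for h in matches}
        imputed.append(alleles.pop() if len(alleles) == 1 else "?")
    return "".join(imputed)


# Hypothetical haplotypes seen in a population for a four-SNP block.
panel = ["ACGT", "ACGT", "GTCA"]
print(impute_from_tag(panel, tag_index=0, observed_allele="G"))  # GTCA
```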

26
Q
A

Collection based on people in Utah with North European heritage.

To some extent the linkage blocks in the MRC-OB cohort are smaller than is often found

The block tends to be broken up into 4 sub-blocks in the Utah population

Even within European populations you can see slight differences, and that is even more the case when we look at Africa

Bone mass: in Europe you can see two big blocks in linkage disequilibrium, whereas in Africa the blocks are smaller

Higher linkage disequilibrium in Europe than in Africa

27
Q

Linkage disequilibrium is absolutely crucial in genome wide association studies

It is crucial because we are trying to associate these haplotype blocks with traits; we don’t want to have to associate 3.2 billion alleles

A

We are trying to genotype at least one SNP per block, then associate different sections of the genome with different traits. In the figure each column is a different individual: half are cases (trait) and half are controls (no trait). Looking at 8 blocks, we want to know to what extent the different alleles are present in the cases. At block number 4 the cases always show blank squares, whereas the controls show filled diamonds

Whereas the one at the bottom is pretty much the same in both groups: one allele is very slightly different, but it is probably not associated with the phenotype we are looking at

GWAS looks at thousands of loci scattered across the genome, at least one per haplotype block, and asks whether a particular allele is associated with the trait

28
Q

GWAS looks at thousands of loci scattered across the genome, at least one per haplotype block, and asks whether a particular allele is associated with the trait

This is normally shown in a Manhattan plot (so called because it looks like the New York skyline with lots of skyscrapers). Here we have the whole genome: 22 autosomes + X (female)

· X axis: distance along chromosome

· Y axis: negative log of the p-value estimated for the association between locus and trait

Common genetic variants on 5p14.1 associate with autism spectrum disorders

A lot of maths has gone into the Manhattan plot to get the values. Chromosome 5 shows a stronger signal of association with autism, so we zoom in on chromosome 5

A

Common genetic variants on 5p14.1 associate with autism spectrum disorders

A lot of maths has gone into the Manhattan plot to get the values. Chromosome 5 shows a stronger signal of association with autism, so we zoom in on chromosome 5

29
Q

Important to remember that the Y-axis is based on a P-value. We have to set a threshold for the P-value, and because we are doing multiple statistical tests we have to be very conservative: the more tests you do, the lower the P-value threshold has to be

GWAS will do thousands of tests, so the threshold is normally around 0.000001 or lower (5 × 10^-8 is a widely used genome-wide convention)

GWAS significance

A

GWAS significance

Null hypothesis

· There is no difference between cases and controls

· There is no relationship between a genetic variable and a quantitative trait

Statistical significance: P-value

Y axis is showing significance, not strength of effect

Threshold must be stringent (a very low P-value, which is high up the Y axis) due to multiple hypothesis testing

Loci that just cross the significance threshold may have a stronger effect than loci that cross it comfortably: the tallest peaks are not necessarily the loci with the strongest effect

Strength of effect in GWAS is normally given with an odds ratio
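To make the multiple-testing point concrete, here is a small sketch of how a significance threshold might be set and where it would sit on the Y axis of a Manhattan plot. It uses a Bonferroni correction (dividing the usual 0.05 by the number of tests), which is one common choice and an assumption here rather than something the lecture specifies. The number of tests is hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """P-value threshold that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests


def neg_log10(p_value: float) -> float:
    """Height on a Manhattan plot: smaller p-values give taller points."""
    return -math.log10(p_value)


n_tests = 1_000_000  # hypothetical number of SNPs tested
threshold = bonferroni_threshold(0.05, n_tests)
print(f"{threshold:.1e}, {neg_log10(threshold):.2f}")  # 5.0e-08, 7.30
```

Note that the height of a point reflects only how unlikely the association is under the null hypothesis, not how large the effect is.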

30
Q

**Effect size: odds ratio**


A

Odds Ratio is calculated once we know a trait is significantly associated with a locus

Odds

The odds is the ratio of the probability that the event of interest occurs to the probability that it does not.

The odds that a single throw of a die will produce a six are 1 to 5, or 1/5 = 0.2

The probability of a 6 is 1/6 = 0.166666667

The probability of a not 6 is 5/6

(1/6)/(5/6)=0.2
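The die example can be checked in a few lines. This sketch uses exact fractions so that the result matches the arithmetic on the card without rounding; the function name `odds` is just for illustration.

```python
from fractions import Fraction


def odds(probability: Fraction) -> Fraction:
    """Probability that the event occurs divided by the probability that it does not."""
    return probability / (1 - probability)


p_six = Fraction(1, 6)     # probability of throwing a six
print(odds(p_six))         # 1/5
print(float(odds(p_six)))  # 0.2
```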

Odds ratio (OR)
A ratio of two ratios

OR = odds of having the trait given you have the trait-associated allele
/
odds of having the trait given you do not have the trait-associated allele

31
Q

Odds ratio example
Disease associated locus: biallelic A/T

A

The odds of getting a disease given you have a T allele are 1 in 3

The odds of getting the disease if you have an A allele is 1 in 9

Odds ratio is (1/3)/(1/9) = 3 (the odds of getting the disease are three times higher with a T allele than with an A allele)

Odds Ratio (OR)

The odds ratio is an indicator of the strength of the relationship between a genetic variant and a trait

OR = 1

It doesn’t make a difference which allele you have

OR > 1

You are more likely to have the trait if you have the allele

OR < 1

You are less likely to have the trait if you have the allele, but the statistic is not directly interpretable
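The worked example on this card (odds of 1/3 with T, 1/9 with A) can be reproduced as follows. The helper names are illustrative, and exact fractions are used so the ratio comes out as exactly 3.

```python
from fractions import Fraction


def odds_ratio(odds_with_allele: Fraction, odds_without_allele: Fraction) -> Fraction:
    """A ratio of two odds: with the allele of interest versus without it."""
    return odds_with_allele / odds_without_allele


def interpret(or_value) -> str:
    """Plain-language reading of an odds ratio, following the three cases on the card."""
    if or_value > 1:
        return "trait more likely with the allele"
    if or_value < 1:
        return "trait less likely with the allele"
    return "allele makes no difference"


or_t_vs_a = odds_ratio(Fraction(1, 3), Fraction(1, 9))
print(or_t_vs_a, interpret(or_t_vs_a))  # 3 trait more likely with the allele
```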

32
Q

“Missing heritability”

Loci identified by GWAS generally only explain a small proportion of the known heritability of a trait.

We have long known that height is heritable, but we don’t find enough loci to account for the heritability that we see: each locus we find explains too small a proportion of the height differences observed in human populations, and even together they fall short.

A

Disease A is highly heritable, B equally so, C not so much, D hardly at all

The black box is the environmental component, and the SNPs explain part of the genetic influence, but the ? is unknown: it is the “missing heritability”

It is found within almost every genus – many reasons why that can be found

33
Q

Nature 2010 paper: they looked at 180,000 individuals and found 180 loci influencing adult height, but those 180 loci only explained 10% of the phenotypic variation in height. Even when you take out the environment they only explain 20% of the heritability, so 80% of the heritability is missing

Various things could be responsible for this. It could be due to epistasis (the interaction of genes that are not alleles, in particular the suppression of the effect of one such gene by another)

Could be due to smaller P values

A
34
Q

Where is the missing heritability?

A

Epistatic interactions among loci: if you change just one locus it can have effects on everything else

Small-effect variants that are hard to detect

Rare variants

Gene by environment (GxE) interactions

Heritability was over-estimated in the first place

35
Q

Common disease-common variant hypothesis

A

GWAS works best if common diseases are due to common variants

It can’t pick up cases where multiple recent mutations give rise to the same disease phenotype (mutation-selection hypothesis). This is likely to happen because any mutation that gives rise to disease is likely to be selected against, so natural selection should weed it out of populations before it becomes common

In that case disease is caused by rare variants that are continually weeded out by natural selection. If most disease is due to rare deleterious mutations, the loci will be hard to find when many different mutations each give rise to the same phenotype

36
Q

Ethnicity
GWAS results vary among ethnicities

A

SNP rs7612463 is associated with type 2 diabetes in East Asian populations, but it does not have this association in Caucasian populations. This could be because the linkage disequilibrium pattern is different, or because other genes involved in type 2 diabetes also differ, so that the locus near SNP rs7612463 doesn’t have the same effect

37
Q

Calculating personal risk

A

You discover you carry an allele with a significant association with a disease and a high odds ratio

What is your risk of getting that disease?

No generally accepted way of calculating this; it is pretty ad hoc

38
Q

Interpretation of trait associated data

Example from p.122-123 – in EPG (correction posted on QMplus)

Locus rs2230199

Biallelic SNP: C/G
Frequency in European populations:
19% C / 81% G (this doesn’t mean that 19% of people carry C and 81% carry G; some people are heterozygous)
C allele associated with age-related macular degeneration (ARMD) (we know that from the GWAS)
p < 5 × 10^-29
Odds ratio for C allele is 1.53
Average population incidence of ARMD is 8% (8% of people in Europe have it). Even if we didn’t know the odds ratio, we would know that not everyone who carries the C allele has ARMD, because the C allele is more common than the disease

We want to know: if you do carry the C allele, how likely are you to develop ARMD?

(Gets a bit dodgy now…)

Assume odds ratios for alleles are multiplicative (i.e. not dominant/recessive)

· Odds ratio for C is 1.53

· Odds ratio for G is 1.0 (we assume G makes no difference: it doesn’t make you more or less likely to get it)

· CC → 1.53 x 1.53 = 2.34

· CG → 1.53 x 1 = 1.53

· GG → 1 x 1 = 1

A

Assume Hardy-Weinberg equilibrium to calculate population genotype frequencies

· CC → 19% x 19% = 3.6%

· CG → 2 x 19% x 81% = 30.8%

· GG → 81% x 81% = 65.6%

Relative risk of ARMD for whole population
= 2.34 (odds ratio) x 0.036 + 1.53 x 0.308 + 1 x 0.656 = 1.21

Relative risk of ARMD for a CC individual
= 2.34/1.21 = 1.93

If average population incidence of ARMD is 8%
Overall risk of ARMD for a CC individual

= 0.08 x 1.93 = 15%
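The steps of this worked example can be sketched in code. The sketch follows the card's own simplifications: odds ratios are multiplicative across the two alleles, genotype frequencies follow Hardy-Weinberg equilibrium, and odds ratios are treated as if they were relative risks. It carries no intermediate rounding, so 1.53 x 1.53 evaluates to about 2.34, the population average to about 1.21, and the CC risk to roughly 15.5%. The function name `genotype_risks` is hypothetical.

```python
def genotype_risks(or_risk_allele: float, freq_risk_allele: float, incidence: float) -> dict:
    """Approximate disease risk per genotype at a biallelic C/G locus."""
    p, q = freq_risk_allele, 1 - freq_risk_allele
    # Multiplicative model: each copy of the C allele multiplies the odds ratio.
    odds_ratios = {"CC": or_risk_allele ** 2, "CG": or_risk_allele, "GG": 1.0}
    # Hardy-Weinberg genotype frequencies.
    frequencies = {"CC": p * p, "CG": 2 * p * q, "GG": q * q}
    # Average relative risk across the whole population.
    population_mean = sum(odds_ratios[g] * frequencies[g] for g in odds_ratios)
    # Scale the population incidence by each genotype's risk relative to that average.
    return {g: incidence * odds_ratios[g] / population_mean for g in odds_ratios}


risks = genotype_risks(or_risk_allele=1.53, freq_risk_allele=0.19, incidence=0.08)
for genotype, risk in risks.items():
    print(genotype, f"{risk:.1%}")
```

Weighting the three genotype risks by their frequencies recovers the 8% population incidence, which is a useful sanity check on the arithmetic.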

39
Q

Likelihood ratio (LR)

A

A ratio of two probabilities
Needs accurate measures of population frequencies of genotypes in affected and unaffected samples

These are often not available even when a GWAS has been done

You can often learn more about your probability of getting a disease by looking at the prevalence of the disease in your population than you can learn by looking at your genotype at disease-associated loci

Multilocus risk estimation

For a polygenic trait, we cannot assess our risk just from one locus

We need to combine information from many markers

Need to be sure each locus has been associated with exactly the same trait

Need to be sure each locus’ association was determined rigorously

Need to check loci are not linked in a haplotype block

Need single OR for each locus even if different GWAS studies have given different ORs
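One simple way to combine markers, shown here only as a hedged sketch, is to multiply the per-locus odds ratios. This assumes the conditions listed on this card hold (same trait, rigorous associations, loci not linked in a haplotype block, one agreed OR per locus) and also that the loci act independently, which the card does not state. The odds ratios below are placeholder values.

```python
import math


def combined_odds_ratio(per_locus_odds_ratios) -> float:
    """Multiply per-locus odds ratios, assuming independent, unlinked loci."""
    return math.prod(per_locus_odds_ratios)


# Hypothetical genotype odds ratios at three unlinked loci (placeholder values).
example_odds_ratios = [1.5, 1.2, 0.8]
print(round(combined_odds_ratio(example_odds_ratios), 2))  # 1.44
```

If two of the loci sat in the same haplotype block, multiplying them would count the same signal twice, which is why the linkage check matters.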

40
Q

Crohn’s disease

A chronic inflammation of the intestines, usually found in the terminal portion of the small intestine (the ileum)

Mapping Crohn’s disease

Segregation analyses suggested monogenic recessive mode of inheritance

Took an initial panel of 25 Caucasian families each containing at least two siblings with Crohn’s

Family members genotyped with 270 markers with known locations spread throughout the genome

A

Mapping Crohn’s disease

Linkage analysis with parametric LOD score method

LOD: Logarithm of Odds

the likelihood of obtaining the test data if two loci/markers/traits are linked, compared to the likelihood of observing the same data purely by chance