complex diseases Flashcards

(74 cards)

1
Q

monogenic diseases

A

Diseases in which there is a direct relationship between the disease gene and the disease status
Genotype and phenotype correlate closely (high penetrance). Variants CAUSE the disease (1 disease, 1 gene)
The traits presented so far are qualitative
= white eyed or red eyed flies
= cystic fibrosis or no cystic fibrosis

2
Q

Quantitative traits

A

Traits with variation showing a continuous range of phenotypes
e.g. human height, weight, colour, metabolic rate, behaviour

3
Q

polygenic

A

Varying phenotypes result from input of many genes

4
Q

Multifactorial or complex traits

A

result of a combination of several genes and environmental factors

Complex (polygenic) diseases often show genetic predisposition, but individual genes only marginally affect disease status

Genotype and phenotype poorly correlate (low penetrance)

Variants PREDISPOSE to the disease (1 disease, many genes)

5
Q

example of multifactorial inheritance

A
skin colour
additive effect
complex trait
- many genes
- environment
6
Q

single gene vs multifactorial

A

single gene

  • risk remains the same regardless of the number of affected children
  • if a parent carries a dominant disease allele, the risk to each child is 1/2
  • if 1 child has the disease, the risk for another child is still 1/2

multifactorial

  • recurrence risk increases with each affected child, because this shows the couple is high risk
  • if 1 child is affected, the recurrence risk is 1 in 25
  • if 2 children are affected, the recurrence risk is now 1 in 12
7
Q

Multifactorial disorders display familial clustering with no recognised pattern of Mendelian inheritance

A
  1. Most common cause of congenital malformations
  2. Cause of many common acquired diseases
  3. More prevalent than single gene disorders
  4. Harder to find the genetic factors / causes
8
Q

not all polygenic traits show continuous variation

A

in a large sample the data follow a normal distribution
instead of intervals of a measured value on the x axis (groups such as age bands), we plot the number of predisposing alleles in the genotype

beyond a certain number of predisposing alleles (the threshold) the frequency of disease is much higher, so the phenotype no longer varies continuously

9
Q

3 types of polygenic traits

A

continuous traits

meristic traits
- phenotype can be recorded by counting integers

threshold traits

  • polygenic and often multifactorial
  • small number of discrete phenotypic classes
  • increasing number of diseases show this pattern
10
Q

most common multifactorial diseases with a threshold

A
cleft lip
neural tube defect
congenital heart defect
asthma
diabetes
autism
11
Q

multi-gene hypothesis

A
  1. A quantitative trait has continuous variation that can be quantified (measured)
  2. Two or more loci scattered in the genome account for the hereditary influence on the trait in an additive way
  3. Each gene locus is occupied by either an additive allele or a non-additive allele
  4. The contribution of each additive allele is approximately equal
  5. Together, the additive alleles contributing to a single quantitative character produce substantial phenotypic variation
12
Q

calculating number of polygenes

A

The number of polygenes (n) contributing to a quantitative trait is estimated from the ratio of F2 individuals resembling either of the two extreme P phenotypes

  • 1/4^n = ratio of F2 individuals expressing either extreme phenotype
  • For a low number of polygenes: (2n + 1) = number of distinct phenotypic categories observed

i.e. 1 gene = 3 classes (1/4, 1/2, 1/4)
2 genes = 5 classes (1/16, 4/16, 6/16, 4/16, 1/16)
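
A minimal Python sketch of these two rules, assuming an F2 from a cross between the two extreme parents, unlinked loci, and equal additive effects (the function names are illustrative, not from the course material):

```python
from fractions import Fraction
from math import comb, log


def extreme_fraction(n):
    """Expected F2 fraction matching ONE parental extreme: 1/4^n."""
    return Fraction(1, 4 ** n)


def phenotype_classes(n):
    """Number of distinct phenotypic classes for n additive loci: 2n + 1."""
    return 2 * n + 1


def f2_class_frequencies(n):
    """Class frequencies from the binomial expansion over 2n alleles."""
    return [Fraction(comb(2 * n, k), 4 ** n) for k in range(2 * n + 1)]


def estimate_polygenes(extreme_count, f2_total):
    """Solve 1/4^n = extreme_count / f2_total for n (rounded)."""
    return round(log(f2_total / extreme_count, 4))


print(f2_class_frequencies(2))    # five classes for two genes
print(estimate_polygenes(1, 64))  # 1 in 64 F2 at one extreme
```

For example, if about 1 in 64 F2 individuals resembles one parental extreme, the estimate is n = 3, since 1/4^3 = 1/64.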

13
Q

Heritability (H²)

A

the proportion of the total phenotypic variance (VP) within a certain population that is due to genetic variance (VG)
H² = VG / VP

It differs between environments

A mean heritability estimate of 0.65 for human height does not mean that your height is 65% due to your genes, but rather that in the population sampled, on average, 65% of the overall variation in height could be explained by genotypic differences among individuals in that population.

14
Q

Familial

A

a trait shared by a family; they may not share the same genotype
e.g. an adopted child speaks the same language as the rest of the family. This is not heritable, because it is not genetic.

15
Q

Heritable

A

a trait shared by people with the same genotype

16
Q

If an environmental change affects all individuals in a population equally

A

the mean changes but the variance, and therefore the heritability, stays the same

if the variance changes, the heritability changes

17
Q

Gene-environment (G x E) interactions

A

interaction between genes and environment can play an important role in quantitative traits

18
Q

Broad-sense heritability H²

A

Measures the proportion of the variance in a population within a single generation that is due to genetic factors
Gives an estimate of 0 to 1

Low heritability = variation is due mainly to environmental effects
High heritability = variation is due mainly to genotypic effects
Ignores genotype-by-environment interactions

Includes genetic values due to dominance and epistasis

19
Q

additive gene action vs dominant gene action

A

for additive gene action the homozygotes are the two extremes and the heterozygote is intermediate

for dominant gene action the homozygotes are the two extremes and the heterozygote is the same as the dominant homozygote

20
Q

Narrow-sense heritability h²

A

only takes into account the fully additive genetic variance = all plants or animals with the desired trait are homozygous dominant

with dominant gene action the heterozygote also shows the desired trait, so selective breeding takes longer

h² = VA / VP
VA = additive genetic variance
VP = total phenotypic variance
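
A small Python sketch contrasting the two ratios. It assumes the usual partition VG = VA + VD + VI (additive, dominance, epistatic) and VP = VG + VE with no genotype-by-environment term; the variance values are hypothetical and chosen only for illustration:

```python
def broad_sense_heritability(v_g, v_p):
    """H^2 = VG / VP (all genetic variance, incl. dominance and epistasis)."""
    return v_g / v_p


def narrow_sense_heritability(v_a, v_p):
    """h^2 = VA / VP (additive genetic variance only)."""
    return v_a / v_p


# Hypothetical variance components for one population in one environment.
v_a, v_d, v_i, v_e = 4.0, 1.5, 0.5, 4.0
v_g = v_a + v_d + v_i  # total genetic variance (assumed partition)
v_p = v_g + v_e        # total phenotypic variance, G x E ignored

print(broad_sense_heritability(v_g, v_p))   # H^2
print(narrow_sense_heritability(v_a, v_p))  # h^2, never larger than H^2
```

Because VA is only one part of VG, h² can never exceed H² under this partition.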
21
Q

How to quantify and interpret heritability

A

A common way to assess if a trait is heritable is to look for a correlation between the parents and the offspring.
Narrow-sense heritability (h²) = a measure of how heritable a trait is, using family data
This measurement is used in animal and plant breeding to determine if a population can be changed by selective breeding.
Estimate narrow heritability by comparing the offspring value against the averaged value for the two parents (midparent value).
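
One common way to do this comparison is to regress offspring values on midparent values and read the slope as the h² estimate. The sketch below assumes that approach; the family data are made up, and the function name is illustrative:

```python
def midparent_heritability(families):
    """Estimate h^2 as the slope of offspring value on midparent value.

    families: list of (mother, father, offspring) trait values.
    """
    midparents = [(mother + father) / 2 for mother, father, _ in families]
    offspring = [child for _, _, child in families]
    n = len(families)
    mean_mid = sum(midparents) / n
    mean_off = sum(offspring) / n
    # Least-squares slope = covariance(midparent, offspring) / variance(midparent)
    cov = sum((x - mean_mid) * (y - mean_off)
              for x, y in zip(midparents, offspring))
    var = sum((x - mean_mid) ** 2 for x in midparents)
    return cov / var


# Hypothetical trait values: (mother, father, offspring)
families = [(9, 11, 11), (12, 12, 12), (13, 15, 13), (15, 17, 14)]
h2 = midparent_heritability(families)
print(h2)
```

A slope near 1 would mean offspring closely track their parents; a slope near 0 would mean selective breeding is unlikely to shift the population.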

22
Q

How do we determine if a family has a higher risk of disease?

A
  • Family members share a greater number of identical genetic variants than unrelated individuals
  • The degree of family clustering of a disease can be expressed by the relative risk ratio (λR)
  • It compares the risk in relatives (R) of an affected proband with the risk in the general population

relative risk ratio (λR) = disease prevalence in relatives R of probands / disease prevalence in the general population
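
As a quick illustration of the ratio, here is a Python sketch with hypothetical prevalence figures (not taken from any study):

```python
def relative_risk_ratio(prevalence_in_relatives, prevalence_in_population):
    """lambda_R = prevalence in relatives of probands / population prevalence."""
    return prevalence_in_relatives / prevalence_in_population


# Hypothetical: 4% of siblings of probands affected vs 0.1% of the population.
lambda_s = relative_risk_ratio(0.04, 0.001)
print(round(lambda_s, 1))
```

A value of 1 would mean relatives are at no greater risk than anyone else in the population.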

23
Q

Relative risk ratio interpretation

A

Higher λR values indicate greater proportion of risk in family compared to the population

Usually it increases with
• Increasing genetic contribution
• Decreasing population prevalence

24
Q

Familial clustering: the role of environment

A

Familial clustering is confounded by shared environment

If familial aggregation is detected, genetics is not necessarily the only explanation

25
Twin studies
DZ = dizygotic twins (fraternal, non-identical; genetically the same as ordinary siblings)
MZ = monozygotic (identical) twins
If a trait is purely genetic, it should always be the same in MZ twins
26
twin studies - concordance and discordance
Concordant twins: both affected (+ / +) or both unaffected (- / -)
Discordant twins: 1 affected, 1 unaffected (+ / -)
concordance ratio (r) = concordance in MZ / concordance in DZ
r > 1: genetics plays a role
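
A short Python sketch of the ratio. It assumes pairwise concordance among twin pairs with at least one affected twin (concordant-affected pairs divided by concordant-affected plus discordant pairs), which is one of several conventions; the pair counts are hypothetical:

```python
def pairwise_concordance(concordant_affected, discordant):
    """Share of pairs with both twins affected, among pairs with >= 1 affected."""
    return concordant_affected / (concordant_affected + discordant)


def concordance_ratio(mz_concordance, dz_concordance):
    """r = concordance in MZ twins / concordance in DZ twins."""
    return mz_concordance / dz_concordance


# Hypothetical counts of twin pairs.
mz = pairwise_concordance(concordant_affected=30, discordant=70)
dz = pairwise_concordance(concordant_affected=10, discordant=90)
r = concordance_ratio(mz, dz)
print(round(r, 2), "genetic contribution suggested" if r > 1 else "no excess in MZ")
```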
27
High concordance does not prove that a trait has a genetic component
Limitations of twin studies: DZ twins can be of different sex, MZ twins may share more environmental factors, and MZ twins can still differ through epigenetic changes accumulated over life, X-chromosome inactivation, post-zygotic somatic mutations, etc.
28
Adoption studies
Two approaches:
  • Find adopted people who suffer from a particular disease known to run in families and ask whether it runs in their biological or adoptive family
  • Find affected parents whose children have been adopted away from the family and ask whether being adopted saved the children from the family disease
Main obstacles: lack of information about the biological family, when the adoption happened, intrauterine factors, and selective placement
29
linkage
Linkage is a property of loci
  • Used to identify the biological mechanism for transmission of a trait
  • Requires family pedigrees
  • Uses polymorphic markers
30
association
Association is a property of alleles
  • Used to identify an association between an allele and a phenotype
  • Fine mapping (<1 cM)
  • Case-control or family approach
  • Usually bi-allelic SNPs
31
linkage analysis in complex disease
Affected sibling pair analysis: when affected siblings share a chromosome region more often than expected by chance, that region is likely to be involved in causing the disease
32
limitations of linkage
Even for a relative risk ratio of 4 (high), a linkage analysis needs a lot of families (sibling pairs); for anything less than 4 the number of families required increases drastically
33
successful linkage study - Alzheimer's
1991: Linkage analysis identified the proximal long arm of chromosome 19
  • Apolipoprotein E (APOE)
  • ε2 decreases risk
  • ε4 increases risk
  • 15-25% of the population carry 1 copy of ε4, 2-4% carry 2 copies
  • ε4 drives earlier and more abundant amyloid pathology in the brains of carriers
34
Most SNPs in a population are
rare
35
Most SNPs in an individual are
common
36
Why do most SNPs have a neutral effect on phenotype?
  1. Functionally important DNA sequences are the minority of our genome.
  2. Genetic redundancy: nucleotide substitutions that don't change the amino acid, or gene duplication.
  3. Functionally unimportant amino acid or nucleotide positions within proteins or within functionally important noncoding sequences.
37
Linkage disequilibrium
Chromosomal segments can exist as a block that is only rarely broken up by recombination, because the loci are so close together that they rarely recombine
  • Linkage disequilibrium (LD): the nonrandom association of alleles at different loci; some combinations of alleles occur more often than expected by chance
38
calculating LD
D = frequency of the haplotype (AB, Ab, aB or ab) minus the product of the frequencies of its individual alleles, e.g. D = f(AB) - f(A) x f(B)
  • If there is no LD: frequency of the haplotype = frequencies of the individual alleles multiplied together (D = 0)
  • If D' = 1: complete linkage disequilibrium (no recombination)
  • D' > 0.33: threshold used to determine LD
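
A minimal Python sketch of this calculation for two bi-allelic loci. It uses the standard definitions D = f(AB) - f(A) x f(B) and D' = |D| / Dmax, and also reports r², the other measure this deck mentions; the haplotype frequencies are hypothetical:

```python
def ld_stats(hap):
    """Return (D, |D'|, r^2) from haplotype frequencies.

    hap: dict with frequencies for 'AB', 'Ab', 'aB', 'ab' that sum to 1.
    """
    p_a = hap["AB"] + hap["Ab"]  # frequency of allele A
    p_b = hap["AB"] + hap["aB"]  # frequency of allele B
    d = hap["AB"] - p_a * p_b    # observed minus expected under no LD
    # Dmax is the largest |D| possible given the allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


complete_ld = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}
no_ld = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}
print(ld_stats(complete_ld))  # only two haplotypes exist
print(ld_stats(no_ld))        # haplotypes at product frequencies
```

In the first case only two of the four possible haplotypes are seen, so D' reaches 1; in the second every haplotype sits at the product of its allele frequencies, so D is 0.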
39
Haplotype
Sets of nearby SNPs on the same chromosome that are inherited as a block.
Haplotype blocks represent ancestral chromosome segments that have been transmitted intact through many generations
  • On an LD plot, the darker the block, the stronger the LD
  • The more generations the SNPs have been transmitted together, the more consistent the haplotype blocks are going to be
40
Haplotypes are population-specific
Populations share ancestry, but mutations that arose differently early on gave rise to different haplotypes, so the frequency of each haplotype depends on the population
41
recombination hotspots
  • Recombination is concentrated in 1-2 kb hotspots
  • We have ~30,000 hotspots, one every 50-100 kb
  • Recombination hotspots lie between haplotype blocks, where LD is low
  • Hotspots are associated with an epigenetic histone methylation mark
42
tag-SNPs
Tag-SNPs reduce the number of SNPs required to examine the entire genome for association with a phenotype
  • If SNPs are in LD, one of them can represent all the SNPs in that block
  • By genotyping a few tag-SNPs we can identify the genotypes of the other SNPs around them
43
determining if genotypes are phased cis and trans
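Phasing: the process of inferring haplotypes from genotype data, assigning alleles to the maternal or paternal chromosome
  • alleles on the same chromosome = cis
  • alleles on different (homologous) chromosomes = trans
  • genotypes are phased once this assignment is known, and unphased until then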
44
Tag-SNPs: imputing
Using knowledge of linkage disequilibrium to fill in genotypes at loci that were not part of the original experiment.
45
Tag-SNP imputation in practice
Let's say you have 6 SNPs
  • assume 1 and 2 are linked (i.e. D' = 1)
  • 3 and 5 are linked
  • 6 and 4 are linked
We can just use 1, 3 and 6 for single-SNP tests
  • likewise, if, say, A at SNP 1 and G at SNP 3 always go together, we can infer 6
46
Association analyses in complex diseases
Looks for co-occurrence (association) of alleles and phenotypes
We use candidate gene studies (individual genes; require biological insight) and GWAS
47
Candidate gene and association analysis in complex diseases
Looks for co-occurrence (association) of alleles and phenotypes, comparing cases and controls
e.g. we have two alleles, T and C
  • in cases, 62% have allele C and 38% have allele T
  • in controls, 49% have allele C and 51% have allele T
Calculate the association using the odds ratio, (a x d) / (b x c)
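
The odds ratio for this example can be checked with a few lines of Python. The 2 x 2 layout assumed here is a = cases with C, b = cases with T, c = controls with C, d = controls with T, using the percentages from the card as counts per 100 alleles:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a x d) / (b x c) for a 2 x 2 allele table.

    a: cases with the test allele      b: cases with the other allele
    c: controls with the test allele   d: controls with the other allele
    """
    return (a * d) / (b * c)


# Allele C vs allele T, per 100 alleles in each group.
or_c = odds_ratio(a=62, b=38, c=49, d=51)
print(round(or_c, 2))  # → 1.7
```

An odds ratio above 1 means allele C is more common in cases than in controls; whether that difference is statistically significant needs a separate test.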
48
Case-study: Identification of NARC1/PCSK9 | candidate gene study
  • Rare mutations in this gene correlate strongly with high cardiovascular disease risk
  • Found using linkage analysis followed by animal studies
  • The mutant protein binds to the LDL receptor, leading to lysosomal degradation of the receptor
  • The degraded receptor can no longer bind LDL -> high LDL, which leads to clogging of the arteries
  • Trials aim to lower LDL cholesterol by targeting the gene's mRNA with siRNA, which leads to degradation of that mRNA
49
Not all candidate gene studies were successful: Limitations
  • Inadequate matching of controls (not accounting for other factors)
  • Insufficient correction for multiple testing (Bonferroni)
  • Underpowered studies leading to lack of replication
50
Reasons for an association
  • Direct causation
  • Epistatic effect
  • Population stratification
  • Linkage disequilibrium
51
benefits in identification of susceptibility variants
new biological insights -> clinical advances - therapeutic targets - biomarkers - prevention
52
candidate genes vs whole genome
few SNPs + hypothesis | millions of SNPs and no hypothesis
53
GWAS
A hypothesis-free method
  • Uses large sample sizes of cases versus controls
  • Identifies regions of the human genome that are associated with a phenotype
  • Based on allele frequencies at hundreds of thousands of tag-SNPs
  • Association is usually confirmed through replication in independent datasets and/or GWAS meta-analyses
  • Requires fine mapping through linkage disequilibrium to identify specific variants
54
Methods to generate genetic information for GWAS
SNP arrays vs WGS
  • looks at tag-SNPs vs looks at the sequence of the whole genome
  • inexpensive vs expensive
  • reliable vs less accurate
55
GWAS major steps
  • data collection
  • genotyping (via SNP arrays and NGS)
  • quality control (look into different populations)
  • imputation (tag-SNPs)
  • association testing (Manhattan plot)
  • meta-analysis or replication
56
GWAS major steps dependent on
It is dependent on a number of important factors, such as:
  • (un)relatedness of individuals (if they share DNA there will be an unwanted association)
  • genetic architecture (quality control)
  • population stratification (quality control)
  • genetic model
57
P-value threshold for GWAS
If we assume P < 0.05 is significant: in 100 comparisons, about 5 associations are expected to be false positives
  • Need to use a multiple comparison adjustment (e.g. Bonferroni)
  • In GWAS we do 1 million tests (or more!): 1,000,000 x 0.05 = 50,000 false positives
  • Estimated that P (for most GWAS) should be < 5 x 10^-8 for common variants with MAF > 5% and LD r² = 0.8
58
Bonferroni
0.05 divided by the number of comparisons made | i.e. 1 million tests = 0.05 / 1,000,000 = 5 x 10^-8
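
The arithmetic can be sketched in Python as follows (the function names are illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected false positives if every test is run at alpha uncorrected."""
    return alpha * n_tests


n_tests = 1_000_000
print(bonferroni_threshold(0.05, n_tests))      # about 5e-08
print(expected_false_positives(0.05, n_tests))  # about 50,000
```

This recovers the conventional genome-wide threshold of 5 x 10^-8 for roughly one million independent tests.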
59
Visualising GWAS results: Manhattan plot
  • threshold: red line (normally 5 x 10^-8)
  • y-axis: -log10 of the p value for association
  • x-axis: chromosome number
  • each dot represents a SNP, plotted by its p value for association
  • the higher a dot sits on the plot (i.e. the smaller its p value), the more significant the association
  • every dot above the threshold is a SNP on that chromosome associated with the disease of interest
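
Manhattan plots conventionally show -log10(p) on the y-axis, so smaller p values sit higher. A small Python sketch of that transform, with made-up SNP names and p values:

```python
from math import log10

GENOME_WIDE_THRESHOLD = 5e-8  # the usual red line


def manhattan_height(p_value):
    """Height of a SNP on a Manhattan plot: -log10(p)."""
    return -log10(p_value)


# Hypothetical association p values.
p_values = {"snp_a": 1e-9, "snp_b": 3e-5, "snp_c": 0.2}

significant = [snp for snp, p in p_values.items() if p < GENOME_WIDE_THRESHOLD]
print(round(manhattan_height(GENOME_WIDE_THRESHOLD), 2))  # height of the red line
print(significant)
```

On this scale the red line at 5 x 10^-8 sits at a height of about 7.3, and only SNPs plotted above it are counted as genome-wide significant.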
60
in the past what chromosomes were not seen on GWAS
sex chromosomes | this is now starting to improve
61
case- inflammatory bowel disease
monogenic forms: few alleles, each with a large impact | the more complex the disease, the smaller the impact of each allele but the greater the number of alleles
62
where do we get the sample size
UK Biobank - 500,000 participants | there are many biobanks in Europe, America and Asia, but few in Africa and other regions: a demographic problem
63
case - height
top 697 variants explain 20% of heritability | top 10k variants explain 30% of Vp
64
case- blood pressure
heritability estimated at 30-70%
65
Where is the missing heritability?
  1. Due to rare variants with BIG effect
  2. Due to gene-gene and gene-environment interactions
  3. Due to epigenetic effects
  4. No missing heritability; family studies overestimate heritability
  5. GWAS underestimates heritability because tag-SNPs do not reliably detect the variants
  6. Much heritability is due to common variants with very small effects
66
What's next for complex disease
  • Whole-genome sequencing of large cohorts to find rare and uncommon variants
  • Interpreting the role of risk SNPs
67
Genome-wide polygenic risk score
can identify individuals at risk of common complex diseases
68
Polygenic risk score (PRS)
  • Single-value estimate of an individual's genetic liability to a phenotype
  • Sum of the genome-wide genotypes, weighted by genotype effect size (odds ratio) derived from GWAS summary statistic data
69
penetrance in complex diseases
  • GWAS: many variants with small effects, low penetrance
  • Mendelian: few variants with large effects, high penetrance
  • The missing alleles could be those of intermediate penetrance
70
GWAS identified SNPs associated with X, now what?
  1. Identify SNPs associated with the disease using GWAS
  2. Estimate SNP-based heritability and build candidate predictors
  3. Build polygenic risk scores
  4. Use the composite score for personalised risk prediction
71
example of PRS
They identified 4 alleles at 4 loci with different effects
  • locus 1: A, +1.5
  • locus 2: C, -0.5
  • locus 3: T, +2.0
  • locus 4: A, -1.5
Individual 1 has genotypes AT, CG, TT, CC
PRS = 1.5 (1 x A) - 0.5 (1 x C) + 4.0 (2 x T) - 0.0 (0 x A) = 5.0
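
The same sum can be written as a short Python sketch: count the copies of each effect allele in the genotype and weight them by the effect size. The data structure and function name are illustrative choices:

```python
def polygenic_risk_score(weights, genotypes):
    """Sum over loci of (copies of the effect allele) x (effect size).

    weights:   list of (effect_allele, effect_size), one entry per locus
    genotypes: list of two-letter genotypes in the same locus order
    """
    score = 0.0
    for (effect_allele, effect_size), genotype in zip(weights, genotypes):
        copies = genotype.count(effect_allele)  # 0, 1 or 2
        score += effect_size * copies
    return score


weights = [("A", 1.5), ("C", -0.5), ("T", 2.0), ("A", -1.5)]
individual_1 = ["AT", "CG", "TT", "CC"]
print(polygenic_risk_score(weights, individual_1))  # → 5.0
```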
72
When are PRS beneficial?
  • For risk calculation in European populations (restriction to these populations = LIMITATION)
  • Conditions with proven preventative measures
  • When the risk of disease outweighs the psychological impact of knowing you are at high genetic risk of disease
73
GWAS downstream analyses: Interpretation
causal variant genotyped = direct association | causal variant in LD with other genotyped variants = indirect association
74
Moving from association to causation
Variants are merely associated with a trait
We can use further genomic analysis tools to determine:
  • Coding vs regulatory variants
  • Fine mapping
  • Gene expression
Future: in vitro studies, animal studies, and clinical trials