Association Analysis Flashcards

1
Q

What is genetic association?

A

Genetic association is the presence of a variant allele at a higher frequency in unrelated subjects with a particular disease (cases) than in those without the disease (controls)

2
Q

What broader term is used for ‘disease’?

A

Instead of “disease” we can use the broader term “trait”; height, for example, is a trait but not a disease

3
Q

What is an allele?

A

One form of a variant in the genome

4
Q

What is a locus?

A

A position in the genome

5
Q

What is a genotype?

A

Both alleles at a locus

6
Q

What is the haplotype?

A

The set of alleles found together, in order, along a single chromosome

7
Q

What are cases in case control studies?

A

Cases are subjects with the disease of interest, e.g. obesity, schizophrenia, hypertension

8
Q

For a case-control study to be successful, what requirements must be fulfilled?

A

The definition of the disease must be applied rigorously and consistently

Controls must be matched as closely as possible for non-disease traits,
such as age, sex, ethnicity and location

9
Q

What forms a more reliable case control study?

A

Measure as many relevant factors as possible (ideally all of them) when recruiting people into the study

10
Q

How is a case control study carried out?

A

A case-control study is carried out by:
Recruiting cases and controls
Identifying gene variants in both groups
Comparing the variant frequencies between cases and controls

11
Q

How does the variant frequency differ between cases and controls?

A

The variant occurs at a higher frequency in cases than in controls

∴ The gene variant is associated with the disease
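
As a rough illustration of this comparison, the sketch below computes the variant allele frequency in each group from genotype counts. The function name and all the counts are made up for the example; they are not from a real study.

```python
def allele_frequency(n_hom_variant, n_het, n_hom_ref):
    """Variant allele frequency from genotype counts.

    Each person carries two alleles, so a homozygous-variant person
    contributes two variant alleles and a heterozygote contributes one.
    """
    total_alleles = 2 * (n_hom_variant + n_het + n_hom_ref)
    variant_alleles = 2 * n_hom_variant + n_het
    return variant_alleles / total_alleles


# Hypothetical genotype counts: (variant/variant, variant/reference, reference/reference)
cases_freq = allele_frequency(180, 480, 340)      # 1000 cases
controls_freq = allele_frequency(120, 440, 440)   # 1000 controls

print(f"cases: {cases_freq:.2f}, controls: {controls_freq:.2f}")
```

A higher frequency in cases only suggests association; whether the difference is larger than expected by chance still needs a statistical test.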

12
Q

What is the purpose of a case control study?

A

Allows the identification of a genomic region associated with disease, marked by either a single marker or a group of markers

13
Q

What are the hallmarks of a good case-control study?

A
  • Large numbers of well-defined cases (1000s)
  • Equal numbers of matched controls
  • Reliable genotyping technology (SNP array)
  • Standard statistical analysis (PLINK)
  • Positive associations should be replicated
14
Q

How diverse is a population?

A

Individuals in a population are genetically far more diverse than individuals in a single family

15
Q

How can we identify how diverse a population is?

A

To capture this genetic diversity we need to use 100,000s or millions of genetic markers

16
Q

Outline features of an ideal genetic marker

A
  • Polymorphic
  • Randomly distributed across the genome
  • Fixed location in genome
  • Frequent in genome
  • Frequent in population
  • Stable with time
  • Easy to assay (genotype)
17
Q

What is an SNP?

A

Single nucleotide polymorphism: a single-base position in the genome at which different alleles are found in the population

18
Q

How commonly do SNPs appear in the genome?

A

Common in the genome: roughly 1 in every 300 nucleotides

~12 million common SNPs have been identified in the human genome

19
Q

How do SNPs arise?

A

Generated when DNA replication errors are left uncorrected (or are mis-corrected) by mismatch repair during cell division

20
Q

Where are SNPs found within the genome?

A

Gene (coding region)

  • No amino acid change (synonymous)
  • Amino acid change (non-synonymous)
  • New stop codon (nonsense)

Gene (non-coding region)

  • Promoter – mRNA and protein level changed
  • Terminator – mRNA and protein level changed
  • Splice site – Altered mRNA, altered protein

Intergenic region

21
Q

How are SNPs presented within databases?

A

The SNP is shown with its flanking sequence on either side

22
Q

How do we calculate the frequency of SNPs?

A

The frequency of a SNP is characterised by its minor allele frequency (MAF)

23
Q

What are the two forms of SNP frequency?

A

Frequency in the general population, e.g.
C 0.567
G 0.433
→ the less common allele (here G) is called the minor allele

SNPs are often referred to by their minor allele

24
Q

What must the allele frequencies add up to?

A

Major Allele Frequency + Minor Allele Frequency = 1
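
A minimal sketch of how the two frequencies could be computed from a sample of observed alleles, and why they sum to 1 for a two-allele SNP. The helper names are illustrative, and the sample is constructed to mirror the C 0.567 / G 0.433 example rather than taken from real data.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele to its share of all observed alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def minor_allele(freqs):
    """Return (allele, frequency) for the least common allele."""
    return min(freqs.items(), key=lambda item: item[1])


# Hypothetical sample of 1000 alleles at one SNP.
alleles = ["C"] * 567 + ["G"] * 433
freqs = allele_frequencies(alleles)
minor, maf = minor_allele(freqs)

print(freqs, minor, maf)
```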

25
Q

How are SNPs identified using GWAS?

A

Genotype markers across the whole genome

using SNP microarrays

26
Q

How are GWAS results recorded?

A

Look for association between disease and each marker – chi-squared test
This has resulted in the detection of large numbers of disease-associated genes
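
For a single marker, the test can be sketched as a Pearson chi-squared test on a 2×2 table of allele counts (cases vs controls, variant vs reference allele). This is a simplified illustration with invented counts and no continuity correction; in practice a tool such as PLINK runs the test for every marker. With one degree of freedom the p-value can be obtained from the complementary error function, so only the standard library is needed.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the table [[a, b], [c, d]].

    Rows are cases and controls; columns are variant and reference
    allele counts. Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 2000 alleles from cases, 2000 from controls.
chi2, p_value = chi_squared_2x2(840, 1160, 680, 1320)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```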

27
Q

Describe how to analyse GWAS results

A

The P value describes the significance of each SNP's association
smaller P value = more significant

P value of 1 = no significance

28
Q

How are GWAS results plotted?

A

GWAS data is presented as a single graph called a Manhattan plot

X-axis : position of the SNP on the chromosome
Y-axis : –log10(p-value) of the association
if p = 10^-9 then –log10(p-value) = 9
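
The y-axis transformation can be sketched as follows. The function name and the three SNP records are hypothetical and only show how each p-value maps to a height on the plot; actually drawing the plot would need a plotting library, which is left out here.

```python
import math


def manhattan_y(p_value):
    """Height of a SNP on the Manhattan plot: -log10 of its p-value."""
    return -math.log10(p_value)


# Hypothetical (chromosome, position, p-value) results for three SNPs.
results = [
    ("chr1", 1_200_000, 1e-9),
    ("chr2", 3_400_000, 1e-5),
    ("chr3", 5_600_000, 0.5),
]

for chrom, pos, p in results:
    print(chrom, pos, round(manhattan_y(p), 2))
```

Because of the minus sign, smaller p-values give taller points, which is why the strongest associations appear as peaks.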

29
Q

What is a Manhattan plot?

A

The Manhattan plot is a simple way to visualise markers across the genome associated with the disease

30
Q

Describe the y-axis of a Manhattan plot

A

The y-axis of the plot is the –log10 of the p-value

If a marker is associated with disease with a p-value of 1 × 10^-9, its value on the y-axis is 9

31
Q

Describe the x-axis of a Manhattan plot

A

The x-axis is the location on the chromosome

32
Q

What do the peaks in Manhattan plots show?

A

The peak does not identify the gene causing the disease.

The peak identifies the genomic region associated with disease

33
Q

After carrying out GWAS, why is further investigation still required to identify the disease gene?

A

Many associated SNPs are found in adjacent genes

We still need to identify which gene is actually causing the disease

34
Q

What is the purpose of meta-analysis?

A

Meta-analysis allows the statistical combination of results from multiple studies
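
One common way to combine results is a fixed-effect, inverse-variance weighted average of the per-study effect sizes. The card does not say which method is meant, so treat this as an assumed example; the effect sizes (log odds ratios) and standard errors below are invented.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted pooled effect and its standard error.

    Each study is weighted by 1 / SE^2, so more precise studies
    contribute more to the pooled estimate.
    """
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios for one SNP from three smaller studies.
effects = [0.10, 0.14, 0.08]
std_errors = [0.05, 0.04, 0.06]

pooled, pooled_se = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the statistical gain from combining them.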

35
Q

Why are smaller studies combined rather than carrying out a single very large study?

A

Very large studies (>10,000 cases) are difficult to carry out
It is easier to combine smaller studies:
- Pre-experiment – Consortium
- Post-experiment – Meta-analysis

36
Q

Why is a lot of GWAS orientated towards obesity?

A

Obesity has significant primary effects, but the majority of problems are caused by its secondary complications

37
Q

Describe the relationship between obesity and cancer

A

Obesity is a strong predisposing factor for cancer; it is predicted that within the next 3-5 years obesity will overtake smoking as the primary cause of cancer in the UK

38
Q

What is the significance of carrying out GWAS for obesity?

A

Identifying and understanding the underlying cause of obesity is vital for prevention and treatment

39
Q

Outline the evidence suggesting obesity is strongly genetic

A

Twin studies - 70-80% of body shape is genetically determined
Adoption studies - 30-40%
Family studies - 40-60%

40
Q

What are the genes associated with obesity?

A

rs8050136 is in the FTO gene

rs12970134 is near the MC4R gene

41
Q

What is the advantage of using GWAS?

A

GWAS has identified associations that are statistically strong and reproducible

42
Q

What is the drawback of relying solely on GWAS?

A

The contribution of the associations identified by GWAS to the genetic component of disease is estimated to be low (<5%)

43
Q

What factors contribute to the low genetic contribution of identified associations found by GWAS?

A
  • Many common SNPs of small effect
  • Rare SNPs
  • Copy Number Variation
  • Epigenetic variation
  • Heritability is overestimated