Chapter 3 - Exome Sequencing Flashcards

Reader Ch.3

1
Q

Limiting factors of traditional gene-discovery strategies (linkage mapping and candidate gene resequencing)

A

-Only a small number of cases available
-Reduced penetrance
-Locus heterogeneity
-Substantially reduced reproductive fitness
-The responsible mutation may arise de novo

2
Q

Mendelian disorders

A

Inherited disorders caused by a mutation in a single gene, e.g. cystic fibrosis and sickle cell anaemia

3
Q

Analysis of coding variation using massively parallel DNA sequencing is called?

A

Exome sequencing

4
Q

Limitation of exome sequencing

A

It does not assess the impact of non-coding alleles; it is, however, well suited to discovering rare alleles underlying Mendelian phenotypes and complex traits

5
Q

Why is exome sequencing effective for detecting rare alleles in Mendelian disorders?

A

Positional cloning studies have been successful for monogenic disorders and showed that:
> most alleles underlying Mendelian disorders alter protein-coding sequence
> a large fraction of rare protein-altering variants are predicted to have functional consequences
> splice acceptor and donor sites, which are also targeted in exome sequencing, are enriched for highly functional variation

6
Q

How is the exome defined?

A

By the entire RefSeq collection plus a large number of hypothetical proteins (this definition has limitations)

7
Q

Limitations of the exome definition and capture

A

-Incomplete catalogue of protein-coding exons
-Variable efficiency of capture probes
-Not all templates are sequenced efficiently
-Not all sequences can be uniquely aligned to the reference genome

8
Q

Wet-lab workflow for exome sequencing

A
  1. Genomic DNA is sheared and used to construct an in vitro shotgun library
  2. Library fragments are flanked by adapters
  3. Enrichment for sequences corresponding to exons > aqueous-phase hybridization capture
  4. Recovery of hybridized fragments by biotin-streptavidin pulldown and washing
  5. Amplification and massively parallel sequencing
  6. Mapping > calling of candidate causal variants

9
Q

Bioinformatics steps in exome sequencing

A
  1. Probe design
  2. Quality control
  3. Map reads
  4. Determine variants
  5. Annotate variants
  6. Filter known variants
  7. Exome comparison
  8. Validation of candidate genes

10
Q

Probe design

A

Designing probes that capture exon fragments > probes should be unique and capture efficiently
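
Uniqueness can be illustrated with a toy check: a probe is unique if its sequence (or its reverse complement) occurs exactly once in the reference. A minimal sketch, assuming a short hypothetical reference string and hypothetical helper names; real probe design works genome-wide and also considers capture efficiency, which this does not model.

```python
# Toy illustration of probe uniqueness only (hypothetical helpers and reference).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(reference: str, probe: str) -> int:
    """Count (possibly overlapping) matches of the probe on both strands."""
    hits = 0
    # A set avoids double counting when the probe is its own reverse complement
    for target in {probe, reverse_complement(probe)}:
        start = reference.find(target)
        while start != -1:
            hits += 1
            start = reference.find(target, start + 1)
    return hits


def is_unique_probe(reference: str, probe: str) -> bool:
    """A probe is considered unique if it matches exactly one location."""
    return count_occurrences(reference, probe) == 1


reference = "ATGCGTACGTTAGCATGCGTACGTAACCGGTT"
print(is_unique_probe(reference, "TTAGCA"))     # True: occurs once
print(is_unique_probe(reference, "ATGCGTACG"))  # False: occurs twice
```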

11
Q

Quality control

A

High base quality and roughly equal nucleotide frequencies at every position along the reads
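
Both QC checks can be computed per read position. A minimal sketch, assuming reads are given as (sequence, quality string) pairs with Phred+33 quality encoding; the thresholds (mean quality 30, deviation 0.1 from 25% per base) are hypothetical values chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Read = Tuple[str, str]  # (sequence, quality string in Phred+33 encoding)


def per_position_stats(reads: List[Read]) -> List[Tuple[Dict[str, float], float]]:
    """For each read position: frequency of A/C/G/T and mean Phred base quality."""
    length = max(len(seq) for seq, _ in reads)
    stats = []
    for pos in range(length):
        bases: Counter = Counter()
        quals = []
        for seq, qual in reads:
            if pos < len(seq):
                bases[seq[pos]] += 1
                quals.append(ord(qual[pos]) - 33)  # Phred+33 decoding
        total = sum(bases.values())
        freqs = {b: bases[b] / total for b in "ACGT"}
        stats.append((freqs, sum(quals) / len(quals)))
    return stats


def flag_positions(stats, min_quality: float = 30.0, max_deviation: float = 0.1) -> List[int]:
    """Flag positions with low mean quality or clearly unequal base composition."""
    flagged = []
    for pos, (freqs, mean_q) in enumerate(stats):
        skewed = any(abs(f - 0.25) > max_deviation for f in freqs.values())
        if mean_q < min_quality or skewed:
            flagged.append(pos)
    return flagged


# Toy reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33
reads = [
    ("ACGT", "IIII"),
    ("CATG", "IIII"),
    ("GTCA", "II##"),
    ("TGAC", "II##"),
]
stats = per_position_stats(reads)
print(flag_positions(stats))  # [2, 3]: mean quality there is only 21
```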

12
Q

Mapping the reads (bwa)

A

Reads are aligned against the reference genome by a mapping algorithm
> unmapped and non-unique reads are discarded; low-confidence reads may still cause problems
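
The discard step can be sketched on SAM-style records: bit 0x4 of the SAM FLAG field marks an unmapped read, and BWA typically assigns MAPQ 0 to reads that align equally well to several locations, so a mapping-quality threshold also removes non-unique reads. A minimal sketch with hypothetical records; the cutoff of 20 is an assumed value. On a real BAM file, `samtools view -F 4 -q <min>` performs a comparable filter.

```python
from typing import NamedTuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


class Alignment(NamedTuple):
    name: str
    flag: int  # SAM FLAG field
    mapq: int  # SAM MAPQ field (mapping quality)


def keep_alignment(aln: Alignment, min_mapq: int = 20) -> bool:
    """Discard unmapped reads and reads with low mapping quality.

    min_mapq=20 is an assumed threshold; multi-mapping reads with
    MAPQ 0 are removed by it as well.
    """
    if aln.flag & UNMAPPED_FLAG:
        return False
    return aln.mapq >= min_mapq


alignments = [
    Alignment("read1", 0, 60),   # uniquely mapped, forward strand
    Alignment("read2", 4, 0),    # unmapped
    Alignment("read3", 16, 0),   # mapped (reverse strand) but non-unique
    Alignment("read4", 16, 37),  # uniquely mapped, reverse strand
]
kept = [a.name for a in alignments if keep_alignment(a)]
print(kept)  # ['read1', 'read4']
```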

13
Q

Determine variants (VarScan)

A

Detect differences between the reads and the reference genome; each difference is either a true variant or a sequencing error.

14
Q

VarScan criteria

A
  1. Coverage at the variant position of at least N reads (default 8)
  2. Of those reads, at least K support the variant (default 2)
  3. Average base quality at the variant position of at least Q (default 15)

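The three criteria can be applied to a summary of the reads at one position. A minimal sketch that follows the card's wording, not VarScan's source code; `PileupColumn` is a hypothetical structure. VarScan itself exposes these thresholds as `--min-coverage`, `--min-reads2` and `--min-avg-qual`, and applies further filters (such as a minimum variant allele frequency and a p-value) that are not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PileupColumn:
    """Hypothetical summary of all reads covering one reference position."""
    ref_base: str
    bases: List[str]      # base observed in each covering read
    qualities: List[int]  # Phred base quality of each observed base


def passes_criteria(col: PileupColumn, alt: str,
                    min_coverage: int = 8,       # N
                    min_variant_reads: int = 2,  # K
                    min_avg_qual: float = 15.0,  # Q
                    ) -> bool:
    """Apply the three criteria from the card to one candidate variant."""
    depth = len(col.bases)
    if depth < min_coverage:
        return False
    variant_reads = sum(1 for base in col.bases if base == alt)
    avg_qual = sum(col.qualities) / depth if depth else 0.0
    return variant_reads >= min_variant_reads and avg_qual >= min_avg_qual


# 9 reads, 3 supporting G, good quality: passes
col1 = PileupColumn("A", ["A"] * 6 + ["G"] * 3, [30] * 9)
# only 7 reads: fails the coverage criterion
col2 = PileupColumn("A", ["A"] * 5 + ["G"] * 2, [30] * 7)
# 9 reads but only 1 supports G: more likely a sequencing error
col3 = PileupColumn("A", ["A"] * 8 + ["G"], [30] * 9)
# enough reads, but low average base quality
col4 = PileupColumn("A", ["A"] * 6 + ["G"] * 3, [10] * 9)

for col in (col1, col2, col3, col4):
    print(passes_criteria(col, "G"))  # True, False, False, False
```
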
15
Q

Annotate variants

A

Each variant is assigned various properties: gene name, region, nucleotide position, type of mutation, number of reads, quality, etc.
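
One of these properties, the type of mutation, can be derived for a single-base substitution in a coding codon by translating the reference and mutant codons. A minimal sketch, assuming the codon is already known on the coding strand and using the standard genetic code; `classify_substitution` is a hypothetical helper, and dedicated annotation tools additionally handle transcripts, strands, indels and splice sites.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_substitution(codon: str, position: int, alt: str) -> str:
    """Classify a single-base substitution at 0-based `position` in a coding-strand codon."""
    mutant = codon[:position] + alt + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


print(classify_substitution("GAA", 2, "G"))  # synonymous: GAA -> GAG, Glu -> Glu
print(classify_substitution("GAG", 1, "T"))  # missense: GAG -> GTG, Glu -> Val
print(classify_substitution("GAA", 0, "T"))  # nonsense: GAA -> TAA, Glu -> stop
print(classify_substitution("TGA", 0, "C"))  # stop-loss: TGA -> CGA, stop -> Arg
```

The GAG -> GTG missense change is the classic sickle cell mutation at codon 6 of HBB.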

16
Q

Filter the known variants

A

Remove synonymous variants and variants present in public SNP databases or an in-house reference database, because they are unlikely to cause the disorder
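
This filter is essentially a set lookup on variant identity plus a check on the annotated mutation type. A minimal sketch, assuming variants are keyed by (chromosome, position, reference allele, alternative allele) and that the public and in-house databases have been merged into one hypothetical `known` set; all coordinates and gene names are made up.

```python
from typing import List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    effect: str  # annotated mutation type, e.g. "synonymous", "missense"


VariantKey = Tuple[str, int, str, str]


def filter_known(variants: List[Variant], known: Set[VariantKey]) -> List[Variant]:
    """Drop synonymous variants and variants already present in the known set."""
    return [
        v for v in variants
        if v.effect != "synonymous" and (v.chrom, v.pos, v.ref, v.alt) not in known
    ]


# Hypothetical variants and a hypothetical merged "known variants" set
# (standing in for public SNP databases plus an in-house database).
variants = [
    Variant("1", 1000, "A", "G", "GENE_A", "missense"),
    Variant("1", 2000, "C", "T", "GENE_A", "synonymous"),
    Variant("2", 3000, "G", "A", "GENE_B", "nonsense"),
    Variant("3", 4000, "T", "C", "GENE_C", "missense"),
]
known = {("3", 4000, "T", "C")}

print([v.gene for v in filter_known(variants, known)])  # ['GENE_A', 'GENE_B']
```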

17
Q

Exome comparison

A

Compare the exomes of different patients to find one or more genes affected in each patient (the same variant is not required)
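
Because only the gene has to match, exome comparison amounts to counting, per gene, how many patients carry at least one remaining candidate variant. A minimal sketch with hypothetical patients and genes; `min_patients` is an assumed parameter.

```python
from collections import Counter
from typing import Dict, List, Set


def shared_genes(patient_genes: Dict[str, Set[str]], min_patients: int) -> List[str]:
    """Return genes carrying a candidate variant in at least `min_patients` patients.

    Only the gene has to match between patients, not the exact variant.
    """
    counts = Counter(gene for genes in patient_genes.values() for gene in genes)
    return sorted(gene for gene, n in counts.items() if n >= min_patients)


# Hypothetical genes left after filtering, per patient
patients = {
    "patient1": {"GENE_A", "GENE_B", "GENE_C"},
    "patient2": {"GENE_A", "GENE_D"},
    "patient3": {"GENE_A", "GENE_B"},
}

print(shared_genes(patients, 3))  # ['GENE_A']
print(shared_genes(patients, 2))  # ['GENE_A', 'GENE_B']
```

Requiring every patient to share a gene is the strictest setting; lowering `min_patients` tolerates locus heterogeneity at the cost of more candidate genes.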

18
Q

Validation of candidate genes

A

Wet-lab validation, for example with Sanger sequencing, or comparison with other sets of exomes and genomes

19
Q

Factors that determine the strategy for identifying causal alleles (and thereby the sample size needed for adequate statistical power)

A

-Mode of inheritance (exome sequencing is more efficient for recessive disorders > fewer genes carry two novel protein-altering alleles)
-Pedigree or population structure
-Whether the phenotype arises de novo or is inherited > screening of family members
-Extent of locus heterogeneity for the trait
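
The recessive advantage can be illustrated by keeping only genes in which one individual carries at least two novel protein-altering alleles. A minimal sketch, assuming calls are already filtered to novel protein-altering variants and that genotypes use a hypothetical "het"/"hom" encoding; phase is not checked, so two heterozygous calls are only treated as a possible compound heterozygote.

```python
from collections import Counter
from typing import List, NamedTuple


class Call(NamedTuple):
    gene: str
    genotype: str  # hypothetical encoding: "het" or "hom"


def recessive_candidates(calls: List[Call]) -> List[str]:
    """Genes with at least two novel protein-altering alleles in one individual.

    A homozygous call counts as two alleles; two heterozygous calls in the same
    gene are treated as possible compound heterozygotes (phase is not checked).
    """
    alleles: Counter = Counter()
    for call in calls:
        alleles[call.gene] += 2 if call.genotype == "hom" else 1
    return sorted(gene for gene, n in alleles.items() if n >= 2)


# Calls assumed to be already filtered to novel, protein-altering variants
calls = [
    Call("GENE_A", "het"),
    Call("GENE_A", "het"),
    Call("GENE_B", "hom"),
    Call("GENE_C", "het"),
]
print(recessive_candidates(calls))  # ['GENE_A', 'GENE_B']
```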

20
Q

Steps in filtering the data

A
  1. Discrete filtering: comparing variants among individuals and against public databases/controls
  2. Stratification of variants

21
Q

Assessing the novelty of an allele

A

-Compare against a filter set of polymorphisms from public databases such as dbSNP and the 1000 Genomes Project
> presumed to come from unaffected individuals

22
Q

Filtering

A

Eliminating candidate genes by assuming any allele found in the filter set cannot be causative

23
Q

Assumption behind filtering against dbSNP, and problems with this assumption

A

Control individuals (the filter set) carry none of the causal alleles found in individuals with the disease phenotype
-Problems
> dbSNP is contaminated with a small number of pathogenic alleles
> some pathogenic alleles have a relatively high minor allele frequency, so a pathogenic variant can also occur in control exomes > risk of eliminating truly pathogenic alleles

24
Q

Stratification of candidates (name the groups)

A

-By mutation type > predicted impact/deleteriousness
-By segmental duplications > variants found in segmental duplications are discarded
-By pseudogenes: dysfunctional relatives of genes that have lost their protein-coding ability
-By function: predicted role of the protein product
-By functional impact: for non-synonymous alleles > prediction of the impact on the phenotype
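
Two of these groups can be combined in a small sketch: discard candidates that fall inside excluded intervals (segmental duplications or pseudogenes), then rank the rest by mutation type. The severity ordering and the interval are assumptions for illustration only, and stratification by protein function or predicted functional impact is not modelled.

```python
from typing import List, NamedTuple, Tuple

# Assumed ranking of mutation types, most severe first (illustrative only)
SEVERITY = {"nonsense": 0, "frameshift": 1, "splice_site": 2, "missense": 3}

Region = Tuple[str, int, int]  # (chrom, start, end), inclusive


class Candidate(NamedTuple):
    chrom: str
    pos: int
    gene: str
    effect: str


def in_regions(chrom: str, pos: int, regions: List[Region]) -> bool:
    """True if the position lies inside any of the given intervals."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def stratify(candidates: List[Candidate], excluded: List[Region]) -> List[Candidate]:
    """Discard candidates inside excluded regions, then sort by mutation-type severity."""
    kept = [c for c in candidates if not in_regions(c.chrom, c.pos, excluded)]
    # Unknown effect types are ranked after all listed ones
    return sorted(kept, key=lambda c: SEVERITY.get(c.effect, len(SEVERITY)))


candidates = [
    Candidate("1", 1500, "GENE_A", "missense"),
    Candidate("2", 500, "GENE_B", "nonsense"),
    Candidate("5", 7000, "GENE_C", "splice_site"),
]
# Hypothetical segmental duplication / pseudogene interval
excluded = [("5", 6000, 8000)]

print([c.gene for c in stratify(candidates, excluded)])  # ['GENE_B', 'GENE_A']
```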

25
Q

(Technical) reasons why exome sequencing can fail

A

-The causative gene (or part of it) is not in the target definition
-Inadequate coverage of the region containing the causal variant
-The causal variant is covered but not accurately called
-False variants are called in a gene because of mismapped reads or alignment errors

26
Q

Reasons why discrete filtering can fail

A

Reduced power due to genetic heterogeneity, or false-positive calls (e.g. from processed pseudogenes or segmental duplications)