SAQs Flashcards
SAQ (94 cards)
Question
Answer
a) Name three file types used in an NGS analysis pipeline (3)
Three from: FASTQ; BAM, SAM or CRAM; VCF; BED
b) For each of these file types describe their contents and use. (6) fastq, bam, vcf, bed
FASTQ – text file containing sequence reads and associated quality information. Standard format containing all reads from sequencing. Can be analysed to generate quality metrics, and used as input for read alignment tools. BAM, SAM or CRAM – aligned/mapped reads and associated quality information. Output of read alignment. Can be analysed to generate quality metrics. VCF – data lines containing information about a position in the genome, usually variants. May also include annotations. Output of variant calling. Annotations may be added prior to variant filtering and analysis. BED – genomic regions (chromosome, start and end). Used to define the regions of interest for the assay.
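As a rough illustration of the FASTQ and BED contents described above, the sketch below decodes a FASTQ quality string and parses a BED line. It assumes Phred+33 quality encoding and standard 0-based, half-open BED coordinates; the read, the region and the function names are made up for illustration.

```python
def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def parse_bed_line(line: str) -> tuple[str, int, int]:
    """Return (chromosome, start, end) from the first three BED columns."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


# A FASTQ record is four lines: identifier, sequence, separator, qualities.
fastq_record = ["@read1", "ACGT", "+", "II5!"]  # made-up read
print(phred_scores(fastq_record[3]))

# BED start is 0-based and end is exclusive, so the length is end - start.
chrom, start, end = parse_bed_line("chr1\t100\t200\ttarget_exon")  # made-up region
print(chrom, end - start)
```

Higher Phred scores mean a lower estimated probability that the base call is wrong, which is why the quality line feeds directly into read-level quality metrics.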
c) NGS analysis often involves aligning short DNA sequences (reads) to a reference genome. Give two reasons why a read might not align correctly to the reference. (2)
Two from: read maps to multiple locations in the reference genome (e.g. pseudogene); reference genome is incomplete so sequence is missing (e.g. centromeric regions); errors introduced during sequencing; variants in the sequence compared to the reference.
d) Reads that do not map uniquely to the reference genome (i.e. map to more than one location) are given a mapping score of 0 and may be excluded from downstream analysis. Explain possible reasons for non-unique mapping and what impact this might have on the clinical use of NGS. (3)
Duplicated regions of the genome (segmental duplications, pseudogenes) can result in the same sequence being present in two or more locations in the genome. NGS reads that map to these duplicated regions will not map uniquely and may therefore be removed from downstream analyses. If a clinically relevant gene has a pseudogene, it may be difficult to get sufficient coverage of the gene for variant calling. Alternatively, called variants may be in the pseudogene and not the gene itself. An alternative method, such as long-range PCR, may be required to confirm results in these genes.
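The exclusion of non-uniquely mapped reads can be sketched as a filter on the mapping quality (MAPQ) column, which is the fifth tab-separated field of a SAM alignment line. This is a minimal sketch: the read names and coordinates are made up, and the `min_mapq` cutoff of 1 is an assumption rather than a recommended pipeline setting.

```python
def split_by_mapq(sam_lines, min_mapq=1):
    """Split read names into (kept, excluded) by MAPQ (SAM column 5)."""
    kept, excluded = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])
        (kept if mapq >= min_mapq else excluded).append(fields[0])
    return kept, excluded


# Toy alignments: one uniquely placed read and one multi-mapped read (MAPQ 0).
toy_sam = [
    "@HD\tVN:1.6",
    "read_unique\t0\tchr7\t1000\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read_multi\t0\tchr7\t5000\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
kept, excluded = split_by_mapq(toy_sam)
print("kept:", kept)
print("excluded:", excluded)
```

If most reads over a gene land in the excluded list, coverage for variant calling in that gene drops accordingly.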
e) Give an example of a gene and an associated genetic disorder that might be difficult to analyse by NGS because reads do not map uniquely to the reference (2)
Possible examples: SMN1 and spinal muscular atrophy, or PMS2 and Lynch syndrome (both have pseudogenes).
Briefly describe paired-end sequencing and explain the advantages of paired-end over single-end sequencing for detecting variants associated with human disease. (4)
Paired-end sequencing – both ends of the DNA fragment are sequenced. Paired-end sequencing can be useful for detecting structural variants (deletions, insertions or inversions): read pairs mapping to different locations in the genome give information about the position of that sequence. This is not possible with single-end sequencing. Structural variants are a common cause of genetic variation and therefore genetic disease.
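One way to picture how read pairs flag a structural variant is a check for "discordant" pairs, where the two mates map to different chromosomes or much further apart (or closer together) than the library's fragment size predicts. The sketch below is a simplification: the expected insert size of 300 bp and the tolerance of 100 bp are hypothetical values, and real structural variant callers also use read orientation and split reads.

```python
def is_discordant(chrom1, chrom2, template_length,
                  expected_insert=300, tolerance=100):
    """Flag a read pair whose mates map unexpectedly relative to each other.

    expected_insert and tolerance are hypothetical library parameters.
    """
    if chrom1 != chrom2:
        return True  # mates on different chromosomes
    return abs(abs(template_length) - expected_insert) > tolerance


print(is_discordant("chr1", "chr1", 310))   # consistent with the library
print(is_discordant("chr1", "chr1", 5000))  # mates too far apart, e.g. a deletion
print(is_discordant("chr1", "chr9", 0))     # mates on different chromosomes
```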
Describe the underlying genetic cause of fragile X syndrome.
Fragile X syndrome (FRAX) is an X-linked dominant triplet repeat expansion disorder caused by a CGG repeat expansion within the 5’ UTR of the FMR1 gene on the X chromosome. When the triplet repeat expands beyond a threshold (>200 repeats), this causes hypermethylation of the FMR1 promoter and silencing of the gene.
Describe PCR for sizing.
The sizing PCR is a standard PCR with a forward and a reverse primer (one of which is fluorescently labelled). Products are separated by capillary electrophoresis and sized against a molecular ladder.
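The step from a sized PCR product to a repeat count is simple arithmetic: subtract the non-repeat flanking sequence amplified by the primers and divide by the 3 bp repeat unit. A minimal sketch follows; the flank length of 221 bp is a hypothetical value, since the real figure depends on the primer positions of the assay in use.

```python
def repeat_count(product_bp: float, flank_bp: int, unit_bp: int = 3) -> int:
    """Estimate the number of repeats from a sized PCR product.

    flank_bp is the assay-specific length of non-repeat sequence between
    the primers (a hypothetical value is used in the example below).
    """
    return round((product_bp - flank_bp) / unit_bp)


FLANK_BP = 221  # hypothetical, assay-specific
print(repeat_count(311, FLANK_BP))    # (311 - 221) / 3
print(repeat_count(310.6, FLANK_BP))  # sized peaks are rarely whole numbers
```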
Describe TP-PCR.
TP-PCR uses forward and reverse primers (again, one of which is fluorescently labelled) and also a third primer which is specific to the triplet repeat. The third primer is added in a limited amount so that it is exhausted in early PCR rounds. This is to avoid preferential amplification of smaller alleles. The products from the TP-PCR are also separated by capillary electrophoresis and sized. A full expansion allele gives a classic ‘ski-slope’ pattern which tails off towards the larger end of the repeat.
a) List three differences between the nuclear and mitochondrial genomes
The mitochondrial genome is a fraction of the size of the nuclear genome (~16.5 kb). The mitochondrial genome is a small circular molecule. Mitochondrial DNA is maternally inherited only. The mitochondrial genome has no introns and very few genes (~37).
Describe the inheritance patterns associated with mitochondrial disease
Mitochondrial disease can be caused by pathogenic variants in the mtDNA itself (maternally inherited only) or by pathogenic variants in nuclear genes involved in mitochondrial DNA maintenance, which can be autosomal dominant or recessive.
Define the term heteroplasmy and homoplasmy and mitochondrial bottleneck
Heteroplasmy – where two or more different variants of mtDNA exist within a cell. Homoplasmy – where all copies of the mtDNA are identical within a cell. Mitochondrial bottleneck – a random shift of mtDNA mutational load between generations (and even siblings) due to unequal transfer of mtDNA molecules during oogenesis.
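In sequencing data, the heteroplasmy level of an mtDNA variant is usually estimated as the fraction of reads at that position carrying the variant. The sketch below shows that calculation; the 0.97 homoplasmy cutoff and the 0.03 noise floor are hypothetical thresholds chosen for illustration, not values taken from a guideline.

```python
def heteroplasmy_level(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def describe_level(level: float, homoplasmy_cutoff: float = 0.97,
                   noise_floor: float = 0.03) -> str:
    """Label a variant fraction (both cutoffs are hypothetical)."""
    if level >= homoplasmy_cutoff:
        return "homoplasmic"
    if level <= noise_floor:
        return "not detected above noise"
    return "heteroplasmic"


for variant_reads, total_reads in [(450, 1000), (995, 1000), (5, 1000)]:
    level = heteroplasmy_level(variant_reads, total_reads)
    print(f"{level:.1%} -> {describe_level(level)}")
```

Comparing the level in the proband with the level in the mother uses the same calculation on each sample.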
Describe 3 considerations for interpretation of pathogenicity unique to mtDNA variants
There are currently no mitochondrial DNA specific guidelines for interpreting variants. Inheritance pattern (maternal or nuclear). Population databases used (Mitomap instead of gnomAD, for example). Check heteroplasmy/homoplasmy levels in the proband versus the mother: if a homoplasmic variant is inherited from a homoplasmic, unaffected mother, it is unlikely to be disease-causing.
Clinicians have referred an adult presenting with optic neuropathy to the highly specialised mitochondrial diagnostic service. Describe the appropriate testing pathway and any relevant candidate genes and variants for targeted analysis
Optic neuropathy is a generic term and can be caused by pathogenic variants in mtDNA (such as Leber’s hereditary optic neuropathy (LHON)) or nuclear DNA. There are common LHON variants which can be easily identified/excluded, such as m.11778G>A (MT-ND4), m.3460G>A (MT-ND1) and m.14484T>C (MT-ND6). If these are negative, full gene screens can commence for each of the three genes mentioned above. If full gene screens are negative, a nuclear-based eye panel may be appropriate.
Name the gene responsible for encoding mitochondrial DNA polymerase
POLG (polymerase gamma)
What is copy number variation?
A loss or gain of a region of the genome (could be single exon, multi-exon, whole gene or multiple genes).
What types of genetic/genome abnormalities can oligoarray NOT detect
Uniparental disomy; balanced translocations; triploidy.
Describe the differences between a SNP array and an oligo array.
An oligo array uses the patient sample and a sex-matched control sample, which compete for hybridisation to the probes on the array slide. The patient and control DNAs are labelled with different fluorescent dyes, and the captured image is converted to show whether the patient has a gain or loss compared to the control sample. SNP arrays use thousands of known SNP positions across the genome, and each SNP is genotyped as an AA homozygote, a BB homozygote or an AB/BA heterozygote. The patient is genotyped at each SNP position, and the ratio of AA, BB, AB and BA genotypes across positions is used to determine the copy number from the ratio of heterozygous to homozygous SNPs.
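The patient-versus-control comparison on an oligo array is commonly expressed as a log2 ratio of the two fluorescence signals. The sketch below shows the theoretical values for a diploid region and a naive gain/loss call; the cutoffs of -0.3 and +0.3 are hypothetical, since real thresholds depend on the platform and analysis software.

```python
import math


def expected_log2_ratio(patient_copies: int, control_copies: int = 2) -> float:
    """Theoretical log2(patient/control) signal for a given copy number."""
    return math.log2(patient_copies / control_copies)


def call_from_log2(log2_ratio: float, loss_cutoff: float = -0.3,
                   gain_cutoff: float = 0.3) -> str:
    """Naive gain/loss call; both cutoffs are hypothetical."""
    if log2_ratio <= loss_cutoff:
        return "loss"
    if log2_ratio >= gain_cutoff:
        return "gain"
    return "normal"


for copies in (1, 2, 3):
    ratio = expected_log2_ratio(copies)
    print(copies, round(ratio, 3), call_from_log2(ratio))
```

A heterozygous deletion (1 copy) gives a ratio of -1 and a single-copy gain (3 copies) gives about +0.585, so losses sit further from zero than gains and are generally easier to call.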
Briefly explain the use of the 3 resources/databases that you would use to aid interpretation of the clinical significance of a copy number change.
Database of Genomic Variants (DGV) – the DGV ‘gold standard’ track provides information on the frequency of your copy number variant in the population. For example, a CNV with a frequency of 0.80% in a population of 17,000 would be too high to be disease causing. ClinGen – this resource provides information on dosage pathogenicity and gives a haploinsufficiency score and a triplosensitivity score for each gene in the CNV call. For example, a haploinsufficiency score of 3 (sufficient evidence for dosage pathogenicity) for a gene within a deletion would strongly support classifying the CNV as pathogenic. DECIPHER – large database with an international patient cohort. This can be used to determine if your CNV has been seen before, the phenotype of the patient(s) with this CNV, the reporting laboratory and any overlapping features with similar patients.
Briefly describe a known microdeletion syndrome region involving chromosome 16; include location, key gene(s) involved and provide two clinical features.
16p11.2 microdeletion syndrome, which includes the TBX6 gene. Patients with this disorder have intellectual disability, developmental delay and some also have autistic behaviours.
Many newly described microdeletion or microduplication syndromes detected by microarray are subject to reduced penetrance and variable expressivity. Define these terms.
Reduced penetrance – not all people with the genetic change will display the features associated with that disorder. Variable expressivity – the phenotype of the disorder is variable amongst individuals, even those within the same family.
a) Give 3 clinical features of Prader Willi syndrome.
Intellectual disability; obesity; hypotonia in infancy; hyperphagia; short stature; strabismus.
Give 3 clinical features of Angelman syndrome.
Seizures; characteristic hand movements; inappropriate laughter.