NGS Flashcards

1
Q

what is pyrosequencing?

A

A “sequencing by synthesis” method in which a polymerase extends the DNA one dNTP at a time. When a dNTP is added to the open 3’ end of a DNA strand, pyrophosphate is released. A cocktail of enzymes couples this pyrophosphate release to light emission by luciferase, and the amount of light is proportional to the number of bases incorporated. e.g. mitochondrial point mutation analysis in MELAS, MERRF, NARP and Leber’s Hereditary Optic Neuropathy (LHON).

2
Q

advantages of pyrosequencing?

A

- Quick and cheap
- Detects low-level variants, quantifiable down to ~5% variant level
- Detects heteroplasmy
- Can be used to detect methylation status

3
Q

disadvantages of pyrosequencing?

A

- Only a short length of sequence is read
- Data can be complex to genotype, depending on the type of variant analysis required
- SNPs can affect primer binding sites

4
Q

what is NGS library preparation?

A

fragmenting starting material and ligating adapters and indices to allow sequencing.

5
Q

what is NGS enrichment?

A

Enrichment is needed to capture regions of interest for single genes, panels and exomes but not WGS. It may be amplicon (PCR) based or hybridisation based

6
Q

Describe amplicon (PCR-based) enrichment for NGS?

A

e.g. Illumina Nextera XT: transposomes randomly cleave dsDNA and ligate adaptor oligos with different sequences to the 5’ ends. A limited-cycle PCR then adds indexes and full adapter sequences to the fragmented DNA for sequencing.
e.g. Qiagen: reduces bias by integrating unique molecular indices (UMIs). Genomic DNA is fragmented and ligated with a UMI and adapter. Target enrichment is performed by targeted PCR with a gene-specific primer and a universal primer to the adapter, and a universal PCR then amplifies the library. After sequencing, reads with the same UMI are PCR duplicates and are removed, which helps to identify artefacts and CNVs.

7
Q

describe hybridisation-based enrichment for NGS?

A

e.g. Agilent SureSelect: fragment the DNA, tag with adaptors and barcodes, and capture libraries with RNA- or DNA-based oligos that anneal to specific regions of the genome. Hybridise the sample with biotinylated RNA library baits, select the target regions with magnetic streptavidin beads, then amplify and sequence.
e.g. Agilent HaloPlex: restriction digest, anneal ds-biotinylated oligos and capture with streptavidin-coated magnetic beads. PCR with common primers generates a library of enriched fragments.

8
Q

what are the advantages of amplicon (PCR) based enrichment for NGS?

A

- cheap
- low quantity needed
- faster
- useful for smaller regions
- suitable for FFPE

9
Q

what are the disadvantages of amplicon (PCR) based enrichment for NGS?

A

NAME?

10
Q

what are the advantages of hybridisation based enrichment for NGS?

A

NAME?

11
Q

what are the disadvantages of hybridisation based enrichment for NGS?

A

- high quantity required
- higher cost
- longer prep time
- difficult to distinguish pseudogenes

12
Q

what is the main difference between second and 3rd generation NGS platforms?

A

2nd generation platforms use an amplification step before sequencing the library molecules, unlike the single-molecule sequencing performed by 3rd generation platforms

13
Q

describe the general 2nd generation NGS process?

A

sequencing platform uses a series of automatically coordinated, repeating chemical reactions typically carried out in a flow cell or compartment which houses the immobilized templates and necessary reagents. Most platforms (with the exception of SOLiD) use ‘sequencing by synthesis’ - a repeated cyclical process which occurs within the flow cell and consists of nucleotide addition, washing and signal detection.

14
Q

what are advantages of WGS compared to WES?

A
- detects SNVs, indels, SVs and CNVs in coding and non-coding regions, ~3.5 million variants (WES omits promoters and enhancers and is limited to coding and splice variants, ~20 000 variants)
- more uniform coverage
- easier to capture low sequence complexity
- PCR amplification not required, reducing GC bias
- not limited by sequencing read length (WES needs smaller target probes)
- no reference bias (WES preferentially enriches reference alleles at het sites, producing false calls)
- WGS captures everything, whereas WES is limited to currently targeted genes
- WGS is suitable for complex trait gene identification as well as sporadic phenotypes caused by de novo variants (WES is suitable for highly penetrant Mendelian disease gene identification)
15
Q

what are advantages of WES compared to WGS?

A

NAME?

16
Q

what is targeted NGS?

A

used for disease-specific targeted tests for hereditary disorders and therapeutic decision making. Uses gene panels specific to certain disease types, e.g. clinical exome, mendeliome or custom-designed panels. Only known genes with an established phenotype are included. Can be used for tumour profiling, MRD (can see emergence of clones and allelic ratios), microbiology (disease outbreak, resistance, screening), NIPT and NIPD.

17
Q

what are the advantages of targeted NGS over WES/WGS?

A

NAME?

18
Q

what are the main disadvantages of targeted NGS over WES/WGS?

A

NAME?

19
Q

what is a virtual exome and what are the advantages?

A

sequencing an exome and masking all but the desired data. reduces incidental findings, gives flexible analysis and addition of genes at no extra cost. can analyse primary genes first, then broader analysis if negative

20
Q

what are the disadvantages of a virtual exome?

A

coverage depth is sacrificed for breadth

21
Q

how is NGS used for ct-DNA?

A

mutations in ctDNA can act as a cancer biomarker to identify cancer patients from a group of healthy individuals. more sensitive than tissue biopsy. e.g. SEPT9 methylation has been approved by the FDA as a blood-based screening test for CRC. NGS of ctDNA can also be used for treatment selection, prognosis and MRD monitoring

22
Q

how is NGS used for HLA typing?

A

knowledge of the polymorphisms in an individual’s HLA region is essential for organ and stem-cell transplantation

23
Q

What are the 4 NGS methods of detecting CNVs?

A
1. read-pair - able to identify almost all types of SVs but unable to detect the exact breakpoints. Accuracy of RP methods is largely dependent on the insert size. Poor performance for dups
2. split reads - detect the exact breakpoints of SVs >1 kb. Poor performance for dups
3. read depth - RD is more reliable for regions with deletions and duplications and can also count the number of copies, but it is difficult to identify the exact breakpoints with RD. Enriched in segmental duplications
4. assembly - poor performance for dups
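As a rough illustration of the read-depth approach, the sketch below compares a sample's normalised depth per region with a reference. The function names, the example depths and the 0.75/1.25 cut-offs are all assumptions made for illustration, not validated thresholds or any particular tool's method.

```python
def depth_ratios(sample_depths, reference_depths):
    """Return the normalised sample/reference depth ratio for each region.

    Both inputs map a region name to its mean read depth. Each set is first
    scaled by its own total so that a difference in overall sequencing yield
    is not mistaken for a copy-number change.
    """
    sample_total = sum(sample_depths.values())
    reference_total = sum(reference_depths.values())
    ratios = {}
    for region, depth in sample_depths.items():
        sample_fraction = depth / sample_total
        reference_fraction = reference_depths[region] / reference_total
        ratios[region] = sample_fraction / reference_fraction
    return ratios


def classify(ratio, loss_below=0.75, gain_above=1.25):
    """Label a ratio; the cut-offs are illustrative assumptions only."""
    if ratio < loss_below:
        return "possible deletion"
    if ratio > gain_above:
        return "possible duplication"
    return "normal"


# Hypothetical mean depths for four exons
sample = {"ex1": 100, "ex2": 50, "ex3": 100, "ex4": 100}
reference = {"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 100}
calls = {region: classify(r) for region, r in depth_ratios(sample, reference).items()}
```

Read depth flags which region has changed dosage but says nothing about where the breakpoints lie, which is why it is usually combined with the other approaches.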
24
Q

what is a benefit of using NGS for CNV detection instead of MLPA/array? what is the disadvantage?

A

MLPA/array is costly (array) and time-consuming, and only a subset of genes is tested (MLPA).
NGS is high resolution and genome wide, provides positional info, detects UPD and LOH, is high throughput, and detects balanced and unbalanced rearrangements.
Disadvantage: detection of large rearrangements such as copy-number variants (CNVs) from NGS data is still challenging due to issues intrinsic to the technology, including short read lengths and GC-content bias, so CNVs need to be confirmed. The challenge is to identify a tool able to detect CNVs from NGS panel data at single-exon resolution with sufficient sensitivity to be used as a screening step in a diagnostic setting.

25
Q

what is a phred score?

A

a measure of base-call quality. >30 is good
- 30 = 1/1000 error rate, so 99.9% accuracy
- 20 = 1/100 error rate, 99% accuracy
- 10 = 1/10 error rate, 90% accuracy
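The values above follow from the standard Phred definition, Q = -10 x log10(P), where P is the probability that the base call is wrong. A minimal sketch of the conversion in both directions (the function names are just illustrative):

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to an error probability p."""
    return -10 * math.log10(p)


# Reproduce the three rows of the card: score -> (error rate, accuracy)
table = {q: (phred_to_error_probability(q), 1 - phred_to_error_probability(q))
         for q in (10, 20, 30)}
```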

26
Q

what is a basic bioinformatic pipeline process?

A

quality control > alignment (data mapped to reference) > variant calling > annotation

27
Q

why is it important that reads are aligned correctly?

A

NAME?

28
Q

why is cluster density important?

A

• Low cluster density can give very high quality data but causes a lower depth of coverage. Higher cluster density gives a better depth of coverage but can lead to lower quality reads. If cluster density is too high, the clusters become difficult to read and data can be lost.

29
Q

what is a FASTQ file?

A

text-based format for storing both a nucleotide sequence and its corresponding quality scores. This is generally the input for most bioinformatic pipelines.
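To make the format concrete, here is a minimal sketch of a FASTQ reader. It assumes the common layout of four lines per record (header, sequence, "+" separator, quality string) and Phred+33 quality encoding; wrapped multi-line records and older encodings are not handled, and the example read is made up.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, qualities) from an iterable of FASTQ lines.

    Assumes exactly four lines per record and Phred+33 encoded qualities.
    """
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        # Each quality character encodes one Phred score as ASCII code - 33
        qualities = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, qualities


# Hypothetical single-read example
example = ["@read1", "GATTACA", "+", "IIIII?5"]
records = list(parse_fastq(example))
```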

30
Q

what is a BED file?

A

text file format used to store genomic regions as coordinates
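A point that often causes off-by-one errors: BED coordinates use a 0-based start and an exclusive end, so the first 100 bases of a chromosome are written as start 0, end 100. The sketch below parses a tab-separated BED line and checks whether a 1-based position (as used in HGVS or VCF) falls inside it; the helper names and the example region are illustrative assumptions.

```python
def parse_bed_line(line):
    """Parse the first three (plus optional name) columns of a BED line."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else None
    return chrom, start, end, name


def region_length(start, end):
    """Length in bases of a 0-based, half-open interval."""
    return end - start


def covers(start, end, position_1_based):
    """True if a 1-based position lies inside the BED interval."""
    return start < position_1_based <= end


# Hypothetical region of interest
chrom, start, end, name = parse_bed_line("chr17\t100\t200\texon1")
```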

31
Q

what is a BAM file?

A

aligned/mapped reads and associated quality information. A BAM file (Binary Alignment Map) is a binary format for storing aligned sequence data. Once a set of FASTQs has been aligned to a reference genome using an alignment algorithm, the result is stored as a BAM file. BAM files can be used in the analysis process to visualise variants or to check the quality/coverage of an area, e.g. by viewing them in IGV.

32
Q

what is a CRAM file?

A

NAME?

33
Q

what is a BCL file?

A

base calls per cycle, a binary file containing base call and quality for each tile in each cycle. The raw file produced by Illumina platforms (other than MiSeqs). These must be converted into FASTQs for bioinformatic analysis.

34
Q

what things affect NGS alignment quality?

A

NAME?

35
Q

give examples of annotation in the bioinformatics pipeline?

A

gene symbols, the transcript exon numbers, HGVS nomenclature and the variant consequence

36
Q

what are quality steps used for in bioinformatics?

A

NAME?

37
Q

why are paired-end reads better for repetitive regions and structural rearrangements eg. insertions, deletions and inversions?

A

the distance between each paired read is known and alignment algorithms can use this info to map the reads over repetitive regions more precisely

38
Q

why is read length important?

A

if reads are too short they will not align accurately. Long reads are good for structural variation, repetitive STRs and pseudogenes. Longer reads can provide more information about the relative locations of specific base pairs. However, long-read technology is expensive and is currently not commonplace in the NHS. Oxford Nanopore long-read technology is becoming more affordable but currently has an error rate that would be considered too high in most diagnostic settings.

39
Q

how can you validate a bioinformatic pipeline?

A
- assess sensitivity and specificity against Genome in a Bottle
- at least 10 individuals (not Genome in a Bottle alone)
- sensitivity > 0.95
- 3 independent runs for reproducibility
- all validation samples should be downsampled to test the limit of detection, e.g. 20x, 30x
- specificity > 0.95
- known Sanger-confirmed insertions, deletions and delins should be run through the pipeline to assess complex variants
40
Q

how would you select genes for a panel?

A

NAME?

41
Q

what different types of target enrichment are available for NGS?

A

NAME?

42
Q

how do you include transcripts in design of a new NGS panel?

A
- Alamut contains transcripts that encompass all required exons for a gene
- NGS validation should include justification for selecting a transcript. If 2 transcripts each have something unique (e.g. unique exons), both can be joined together in the BED file
- LRG is a universally accepted reference standard containing a fixed section and an updatable section where biological info can be updated
- the list of transcripts is fed into the design software using a BED file of ROIs, which are then tiled with RNA baits
- ROIs are checked in Alamut to ensure they span each exon +/- 50 bases
- pseudogenes may result in poorer tiling across some regions. During mapping of reads, more than one alignment usually results in both being discarded by the mapping software, so these regions may need to be Sanger-filled
43
Q

what are the main steps of designing an ngs panel?

A
1. target enrichment
2. gene selection
3. transcript selection
4. design - ROIs tiled with RNA baits
5. DNA quality checks
6. barcoding samples - allows multiplexing, which decreases cost
7. virtual panels? sub-panel analysis, e.g. HCM within CM panel
8. polymorphism list - gnomAD data can be excluded. Should be reviewed and updated
44
Q

describe validation for an NGS panel?

A
- required on all aspects of the testing process, including method, sequencing and analysis
- need to understand technical weaknesses, e.g. homopolymer tract errors
- need to assess reproducibility and robustness, e.g. horizontal coverage, 3 independent runs for validation samples (run-to-run comparisons help to determine the level of multiplexing for adequate coverage), include positive controls. Quality scores per base or read depth should be monitored
- sensitivity: <5% error rate at 95% confidence, which requires 60 unique variants compared in the new method in an independent blinded analysis
- validation should be documented in a laboratory-controlled document system
- UKGTN requires that new panels and addition of genes to existing panels should be validated using a ‘known normal control’ from the 1000 Genomes Project
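One plausible origin for the figure of 60 variants (an assumption about where the number comes from, not something stated on the card) is the binomial bound for zero observed errors: if n variants are all detected correctly, the true error rate is below 1 - 0.05^(1/n) with 95% confidence, which the "rule of three" approximates as 3/n. A short sketch:

```python
def upper_error_bound(n_concordant, confidence=0.95):
    """Upper confidence bound on the error rate when all n results agree.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n_concordant)


def rule_of_three(n_concordant):
    """Quick approximation to the 95% upper bound with zero errors."""
    return 3 / n_concordant


bound_60 = upper_error_bound(60)  # just under 0.05
bound_30 = upper_error_bound(30)  # noticeably above 0.05
```

With 60 concordant variants the bound falls just under 5%, whereas 30 variants would only support an error rate below roughly 10%.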
45
Q

describe IQC and EQA for panel validation?

A

NAME?

46
Q

do NGS variants require confirmation according to BPG?

A

NAME?

47
Q

what should be included in an NGS report according to BPG?

A
- ACGS standards
- sequence data and clinical info
- HGVS reporting
- diagnostic yield for negative reports
- panel, reference sequence, OMIM#, splice & promoter ROI, method used (including library prep, analyser, bioinformatics pipelines and software), coverage, VUS with clinical relevance, ?secondary findings according to local policy, dosage
48
Q

how can NGS costs be reduced?

A

NAME?

49
Q

what are challenges of NGS for counselling?

A
- clinical utility?
- VUS, incidental findings, variable penetrance, lack of literature
- lack of data sharing (however CVA is a good resource)
- ethical issues for relatives
- resources for resequencing, VUS follow-up, counselling and medical follow-up
- detection rates need to be weighed against risks of VUS (50-100 het variants per patient)
- more errors
- previously pathogenic variants will need to be downgraded as true variants are found
- risk estimates difficult for polygenic diseases
- negative reports - but still useful to rule out a diagnosis
50
Q

what should pre-test genetic counselling involve?

A

NAME?

51
Q

what should post-test genetic counselling involve?

A

NAME?

52
Q

what is 3rd generation sequencing? how does it compare to 2nd generation?

A

sequencing of single DNA molecules without PCR amplification. 3rd gen is higher resolution, generating reads of over 10 000 bp, and is better at detecting structural variants

53
Q

what are the advantages of 3rd generation sequencing?

A
- small amount of starting material
- higher throughput - hundreds to millions of reactions carried out
- lower cost per base
- longer read lengths (>10 000 bp) giving better mapping, phasing and detection of CNVs, insertions, dels and translocations, novel alternate splicing isoforms and chimeric transcripts
- de novo assembly (without ref sequence)
- better for repetitive sequence
- better for pseudogenes
- more uniform coverage and less sensitive to GC-content
- potential to detect epigenetic modifications such as methylation
54
Q

what are the 3 types of 3rd generation technologies?

A

NAME?

55
Q

describe 3rd generation sequencing by synthesis and give examples? eg. Pacific Biosciences SMRT

A
- directly reads the original DNA molecule rather than amplified copies of it
- e.g. single molecule real time (SMRT) sequencing: a single-molecule template per well; the polymerase incorporates fluorescent nucleotides, which are visualised with a laser and camera
ADVANTAGES: fast, template sequenced multiple times, can detect methylated bases
DISADVANTAGES: expensive and limited throughput
e.g. FRET sequencing (Life Technologies): fluorescence resonance energy transfer
56
Q

describe 3rd generation nanopore sequencing?

A
- a single DNA molecule is threaded through a nanopore (biological or synthetic) and individual bases are detected as they pass through the nanopore
- reads of up to 200 kb
- each base alters the current to a different degree
57
Q

what is Synthetic Long Read 3rd gen sequencing? eg. Illumina

A

NAME?

58
Q

what is 3rd generation mapping?

A

determines the large-scale sequence structure of DNA without sequencing every base. e.g. BioNano optical mapping system, which uses fluorescently tagged probes attached at “nicked” restriction digest sites to fingerprint long DNA molecules. Maps can be compared to a sequence assembly to construct scaffolds of how the sequences should be ordered and oriented along the chromosome, or compared to a reference genome to reveal structural changes, e.g. rearrangement/fusion of two chromosomes

59
Q

give examples of referrals for which a karyotype may be needed?

A

NAME?

60
Q

why might a balanced translocation carrier have a phenotype? how can you investigate this?

A
1. submicroscopic imbalance - can be investigated with FISH/array-CGH/optical genome mapping (OGM)
e.g. Miller-Dieker Syndrome (MDS) - 17p13.3 deletion - type 1 lissencephaly with facial dysmorphism. Patients with isolated lissencephaly had smaller deletions. The LIS1 gene was identified and is deleted in the disease
2. gene disruption, such as by an inversion
e.g. GOF - splicing exons together, creating a novel chimeric gene such as BCR-ABL1, which is translated into a tyrosine kinase
e.g. LOF - coding sequence disrupted in a haploinsufficient gene such as DMD in an X;autosome translocation
constitutional translocations may give cancer risk if a TSG is disabled or an oncogene is separated from its controlling region, e.g. RUNX1 disruption can give rise to Familial Platelet Disorder with predisposition to AML and myelodysplastic syndrome. Often a second RUNX1 hit leads to leukaemia progression.
- identified by sequencing, FISH, RNA sequencing, RNA aCGH
3. gene separated from cis-regulatory elements such as a promoter
61
Q

how can autozygosity mapping identify a disease gene?

A

NAME?

62
Q

what are potential issues with autozygosity mapping?

A

homozygous regions unrelated to the disease locus, and inflated LOD scores due to underestimating the extent of inbreeding

63
Q

what NGS methods can be used to identify disease genes?

A
- targeted panels - for clinically defined heterogeneous disease. Requires known candidate genes; 100% coverage
- WES - 2% of genome. Useful for genetically diverse cases or multiple inheritance patterns. Less biased approach than targeted NGS, cheaper than WGS and quicker to analyse. However, non-coding regions are not covered, coverage is rarely 100% due to poor enrichment and mapping issues, coverage of repetitive and GC-rich regions is poor, and it is not as good at detecting structural variation
- WGS - unbiased, includes non-coding regions, less GC and repetitive-region bias, detects balanced chromosomal rearrangements and mosaic variants. HOWEVER it is costly, has limited coverage of STRs, and raises storage, security and data-sharing issues.
64
Q

what should a pipeline take into account for filtering NGS variants?

A

NAME?

65
Q

what are limitations of NGS for gene discovery?

A

NAME?

66
Q

what future possibilities are there of using NGS for gene discovery?

A
- RNA sequencing - validate results
- NGS-based methylation profiling
- ChIP-seq - analyse protein interactions with DNA
- GTEx to look at gene expression in relevant tissues
- more understanding of regulatory non-coding RNA
- improved data sharing
- improved understanding of complex disease, e.g. later onset and reduced penetrance
67
Q

what is the calculation for posterior probability?

A

a / (a + b), where
a = prior probability (carrier) x conditional probability of the mutation not being detected by the test if a carrier
b = prior probability (not a carrier) x conditional probability of the mutation not being detected by the test if not a carrier (usually ~1)
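A worked sketch of this Bayesian calculation for a negative test result. The 1 in 25 prior and 90% detection rate are illustrative numbers only, and the function assumes the test never gives a false positive, so a non-carrier always tests negative.

```python
def residual_carrier_risk(prior_carrier, detection_rate):
    """Posterior probability of being a carrier after a negative test.

    a = prior (carrier) x P(negative | carrier)
    b = prior (not a carrier) x P(negative | not a carrier), taken as 1
    """
    a = prior_carrier * (1 - detection_rate)
    b = (1 - prior_carrier) * 1.0
    return a / (a + b)


# Hypothetical example: prior carrier risk 1/25, test detects 90% of mutations
risk = residual_carrier_risk(prior_carrier=1 / 25, detection_rate=0.90)
one_in = round(1 / risk)  # express the residual risk as "1 in N"
```

In this example a = 0.04 x 0.1 = 0.004 and b = 0.96 x 1 = 0.96, so the residual risk is 0.004 / 0.964, or about 1 in 241.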

68
Q

what is the confidence interval?

A

gives an indication of how uncertain we are about that measurement with regards to the true population value, usually 95%. if we were to repeat an experiment 100 times and calculate the 95% confidence interval each time, then 95% of the intervals would contain the population mean.

69
Q

what does it mean if the 95% confidence interval doesn’t span 1 for an odds ratio?

A

there is statistically significant association between exposure and outcome

70
Q

how do you calculate the odds ratio?

A

2x2 table of exposure status against outcome status:
                 outcome +    outcome -
exposed +            a            b
exposed -            c            d
Where:
a = number of exposed cases
b = number of exposed non-cases
c = number of unexposed cases
d = number of unexposed non-cases
Odds ratio (OR) = (a/c)/(b/d), which can be rewritten as ad/bc
An OR > 1 suggests that exposure is positively associated with the adverse outcome (the odds of exposure are higher in cases than in non-cases)
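A minimal sketch of the calculation, using the a, b, c, d cells defined in this card. The confidence interval uses the standard log (Woolf) method, which is one common choice rather than the only one, and the counts are made up for illustration.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate 95% confidence interval.

    a, b = exposed cases / exposed non-cases
    c, d = unexposed cases / unexposed non-cases
    The interval is built on the log scale (Woolf method).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts
odds_ratio, lower, upper = odds_ratio_with_ci(a=20, b=80, c=10, d=90)
# The association is statistically significant only if the interval excludes 1
significant = lower > 1 or upper < 1
```

Here the odds ratio is 2.25 but the interval still spans 1, so despite the raised point estimate the association would not be called statistically significant.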

71
Q

define test sensitivity? how do you calculate it

A

Sensitivity is the ability of a test to correctly identify individuals who are affected by a disease (the true positive rate).
true positives / (true positives + false negatives)

72
Q

define test specificity? how do you calculate it

A

Specificity is the ability of a test to correctly identify individuals who are not affected by a disease (the true negative rate).
true negatives / (true negatives + false positives)

73
Q

how do you calculate positive predictive value (PPV)?

A

true positives / (true positives + false positives)

74
Q

how do you calculate negative predictive value (NPV)?

A

true negatives / (true negatives + false negatives)
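The four measures on these cards all come from the same 2x2 table of test result against true status. A small sketch with made-up counts (the function name is illustrative):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # affected individuals correctly detected
        "specificity": tn / (tn + fp),  # unaffected individuals correctly cleared
        "ppv": tp / (tp + fp),          # positive results that are true
        "npv": tn / (tn + fn),          # negative results that are true
    }


# Hypothetical validation set: 100 affected and 100 unaffected individuals
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```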

75
Q

how does disorder prevalence affect PPV and NPV of a test?

A

higher prevalence means higher PPV and lower NPV
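The effect of prevalence can be seen by holding sensitivity and specificity fixed and varying only how common the disorder is. The 95% figures and the two prevalences below are arbitrary illustrative values.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    # Expected fraction of the tested population in each cell of the 2x2 table
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


rare_ppv, rare_npv = predictive_values(0.95, 0.95, prevalence=0.01)
common_ppv, common_npv = predictive_values(0.95, 0.95, prevalence=0.20)
```

With the same test, PPV rises from about 16% at 1% prevalence to about 83% at 20% prevalence, while NPV falls slightly.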

76
Q

what might a dosage quotient outside of defined range indicate? how can this be checked?

A

NAME?

77
Q

what is a polygenic score?

A

the sum of the number of trait-associated alleles in an individual, weighted by per-allele effect sizes from a discovery GWAS. It quantifies an individual’s genetic predisposition to a trait
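Written out, the score is simply a weighted sum over variants: effect size multiplied by the number of effect alleles carried (0, 1 or 2). A minimal sketch; the variant IDs and effect sizes are invented for illustration.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Sum of effect-allele counts weighted by per-allele GWAS effect sizes.

    Variants missing from the individual's genotype are counted as 0 copies.
    """
    return sum(effect_sizes[variant] * allele_counts.get(variant, 0)
               for variant in effect_sizes)


# Hypothetical per-allele effect sizes (e.g. log odds ratios) from a discovery GWAS
effect_sizes = {"rs0001": 0.10, "rs0002": -0.05, "rs0003": 0.20}
# Number of effect alleles this individual carries at each variant
genotype = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
score = polygenic_score(genotype, effect_sizes)
```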

78
Q

give an example of a risk prediction model

A

BOADICEA (Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm), which takes into account:
- family history (FH)
- lifestyle
- rare pathogenic variants
- polygenic risk score
- mammographic density

79
Q

what are limitations of polygenic risk scores?

A

NAME?

80
Q

how does CRISPR-Cas9 work?

A

REF!

81
Q

what are the three main delivery strategies that could be used for clinical genome-editing applications?

A

NAME?

82
Q

what are the limitations of CRISPR-Cas9 gene editing?

A
- accuracy - the ratio of on- versus off-target genetic changes
- precision - the fraction of on-target edits that produce the desired genetic outcome
- has the potential to create rearrangements that lead to cancer
- an immune response to bacterially derived editing proteins
- pre-existing antibodies against CRISPR components may cause inflammation
- unknown long-term safety and stability of genome-editing outcomes
83
Q

what are possible ethical controversies of germline gene-editing?

A

NAME?

84
Q

How does Sanger compare to NGS?

A
- Sanger can produce reads of 800-1000 bp (NGS is limited to 40-400 bp), meaning it can resolve repetitive regions, which can be problematic with NGS when trying to align a repeat larger than the read length of the NGS platform
- requires more template DNA than NGS
- requires fewer computational tools to analyse and store data
- better at indel detection and at excluding pseudogenes from analysis than NGS
- variants under primer binding sites can result in false negatives and allele dropout, so SNP checks are needed when designing primers, and a positive familial control should be used when looking for familial mutations
- NGS is better at detecting low-level mosaicism by increasing sequencing depth. In Sanger the limit of detection is 15-20%
85
Q

Why is library prep required for NGS?

A

All short-read sequencing requires preparation of DNA or RNA libraries: the DNA is fragmented and adaptors are ligated to enable sequencing. For single-gene sequencing, panels or WES there is also an enrichment step to select for the regions of interest. Enrichment is not required for WGS.

86
Q

What are the main enrichment techniques and what dictates the choice?

A

Hybridisation-based enrichment or amplicon-based enrichment - choice depends on cost, number of targeted regions, sample type (e.g. blood or FFPE) and TAT

87
Q

What is the basis of amplicon based enrichment?

A

Amplicon based enrichment uses long range PCR to amplify and enrich for target regions. The PCR primers are designed to be specific for the ROI. Sequencing ready libraries can be produced from these fragments using techniques such as Illumina’s Nextera, or random shearing methods, prior to ligation of adaptor oligonucleotides for sequencing.

88
Q

Describe the illumina method for amplicon based library prep

A

Illumina Nextera - provides library prep from PCR amplicons >300 bp. Transposomes are used to randomly fragment the amplified DNA whilst simultaneously ligating oligonucleotide adaptors to the ends of the fragments. A limited-cycle PCR is then used to add indexes and full adaptor sequences to the tagmented DNA, ready for sequencing

89
Q

Describe Qiaseq by Qiagen amplicon based library prep

A

This method is designed to reduce the bias and artefacts introduced by amplicon-based library prep. This is achieved by the use of UMIs (unique molecular indices): each DNA molecule has a different UMI.
Total genomic DNA is fragmented, end-repaired and A-tailed in a single reaction. The fragments then have a UMI, sequencing adaptor and index ligated. Target enrichment is performed using PCR primers specific for the regions of interest. A universal primer can then amplify the library and add further sample indices.
Post sequencing, reads with different UMIs represent different original molecules; those with the same UMI are PCR duplicates and can be removed. The number of reads with different UMIs for a region can also be used to determine the copy number
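The UMI logic can be sketched in a few lines: group reads by region and UMI, treat each group as one original molecule, and count groups per region. This is a simplified illustration, not Qiagen's actual software; the read tuples are invented, and real tools also tolerate sequencing errors within the UMI.

```python
from collections import defaultdict


def collapse_by_umi(reads):
    """Group reads that share a region and UMI (i.e. PCR duplicates).

    reads: iterable of (region, umi, sequence) tuples.
    Returns a dict mapping (region, umi) to the list of duplicate sequences.
    """
    families = defaultdict(list)
    for region, umi, sequence in reads:
        families[(region, umi)].append(sequence)
    return families


def unique_molecules_per_region(families):
    """Count distinct UMIs, and so distinct original molecules, per region."""
    counts = defaultdict(int)
    for region, _umi in families:
        counts[region] += 1
    return dict(counts)


# Hypothetical reads: the first two share a UMI, so they are PCR duplicates
reads = [
    ("exon1", "AACGT", "ACGTACGT"),
    ("exon1", "AACGT", "ACGTACGT"),
    ("exon1", "GGTCA", "ACGTACGT"),
    ("exon2", "TTAGC", "TTGGCCAA"),
]
families = collapse_by_umi(reads)
molecule_counts = unique_molecules_per_region(families)
```

Because the per-region count reflects original molecules rather than amplified copies, it is less distorted by uneven PCR amplification, which is what allows it to be used for copy-number estimation.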

90
Q

What are the advantages and disadvantages of amplicon based library prep?

A

Advantages:
- low cost
- requires less starting material
- useful for small regions of interest
- can be used for FFPE with appropriate primer design
Disadvantages:
- allele drop-out can result in false negatives. Regions with zero coverage are generally difficult to amplify because of high GC content or suboptimal PCR amplification conditions
- amplification may introduce bias and artefacts due to sequence content (GC)
- difficult to multiplex to study larger target regions
- read depth can’t be used for CNV detection (except QiaSeq)

91
Q

What is the basis of hybridisation/capture based enrichment?

A

DNA is fragmented -> fragments are tagged with sequencing-ready adaptors and barcodes -> regions of interest are targeted by hybridisation to sequence-specific probes. Oligo probes are specific to regions of the ROI, resulting in tiling of the complete ROI. Clean-up of the targeted DNA is aided by the use of oligonucleotide baits conjugated to biotin. These can then be selected for by incubating with streptavidin-coupled magnetic beads (biotin and streptavidin bind)

92
Q

Describe SureSelect by Agilent

A

Agilent SureSelect:
- gDNA is sheared randomly to produce small fragments
- barcodes and adaptors are ligated
- library is incubated with biotinylated RNA baits (RNA probes are 120 nt long to provide high specificity)
- targeted regions are enriched for using streptavidin-coupled paramagnetic beads
- target regions are amplified and loaded onto the sequencer

93
Q

Describe Agilent HaloPlex

A

Unique method using restriction digestion for the initial fragmentation stage. This means no specialist equipment is required and less DNA is needed. However, the use of restriction enzymes means that all fragmentation sites are identical, so duplicates can’t be removed from the data in post-sequencing processing. The method is also prone to allele dropout if there are variants in restriction sites, and small (1-10 bp) gaps are often found in the resulting data due to the positioning of the cut sites, which cannot be altered.
1) DNA is digested by a double restriction enzyme reaction to produce non-random fragments
2) fragments are captured by hybridisation with biotinylated probes and captured with streptavidin magnetic beads
3) fragments are ligated to form a closed circle including PCR primer binding sites, barcodes and sequencing adaptors
4) PCR is performed using a common primer to generate a linear library of enriched fragments. The primer binding sites are designed to be in opposing directions so that only closed molecules will be amplified, therefore increasing the specificity

94
Q

What are the advantages of hybridisation based library prep?

A

NAME?

95
Q

Describe the illumina method for sequencing

A

Illumina (NextSeq, MiSeq, HiSeq) uses sequencing by synthesis on a flow cell with 4-colour fluorophore chemistry. The flow cell is a flat glass slide with 8 microfluidic channels, each covered in covalently attached adaptors complementary to the library adaptors.
1) Library fragments are applied to the flow cell and bind to the adaptors.
2) Clonal cluster generation is performed using in situ bridge amplification: the attached fragment bends over and attaches to a second oligo, forming a bridge; polymerase synthesises a complementary strand; both strands are released and the process is repeated.
3) After cluster generation there is sequencing by synthesis. Nucleotides have a reversible 3’ blocker attached so only one can be added at a time by the polymerase. All 4 nucleotides are washed over the flow cell at the same time, each with a different fluorescent label; the base complementary to the template is added and the chain is terminated. After each round of synthesis fluorescence is detected by a camera. A de-blocking reaction is then carried out to remove the 3’ blocker and the process is repeated until the complete template has been sequenced.
Each cluster contains 1000s of copies of the same fragment, so the signal is strong enough to be detected, and sequencing of millions of clusters is carried out at once. During analysis, regions of overlap between reads are used to line up the sequences (contigs), which can then be compared to a reference genome for variant calling.

96
Q

Describe the ThermoFisher (Ion Torrent) NGS method

A

Based on detecting the release of H+ when a nucleotide is incorporated, via the resulting pH change. Uses a chip with a high density of micro-machined wells, each holding a different DNA template. Beneath the wells is an ion-sensitive layer to detect the H+ pH changes.
1) library fragments are clonally amplified onto sphere particles by emulsion PCR (droplet PCR)
2) template-containing beads are enriched using a magnetic bead based process
3) sequencing primer and DNA pol are bound to the templates and loaded into the sequencer
4) the 4 dNTPs (unlabelled) are added sequentially; if a dNTP is incorporated, H+ is released during formation of the DNA phosphodiester bond (more H+ is released for homopolymer tracts)
5) unincorporated dNTPs are washed off and the next is added, until the full fragment has been sequenced
6) H+ is detected by a semi-conductor and converted to a voltage

97
Q

What are the benefits and limitations of illumina sequencing?

A

Benefits:
- most commonly used platform
- high yield
- performs well in homopolymer regions
Limitations:
- expensive sequencer (compared to Ion Torrent)

98
Q

What are the advantages and disadvantages of ion torrent?

A

Advantages:
- fastest TAT on the market
- low cost
- flexible and scalable, with a range of chips
Disadvantages:
- relatively poor performance in homopolymer regions
- high sequencing error rate

99
Q

Describe Roche sequencing + advantages and limitations

A

Sequencing by synthesis (pyrosequencing): the 4 nucleotides are added sequentially and incorporated by DNA pol, which releases pyrophosphate that is converted into a chemiluminescent signal in a luciferase reaction.
Workflow: fragment genomic DNA; add forward and reverse adaptors; emulsion PCR to clonally amplify the library on beads.
Advantages: longer reads improve mapping of repetitive regions; good indel detection; fast run times.
Disadvantages: high reagent costs, high error rate in homopolymers and low capacity.

100
Q

Describe Life Technologies/ThermoFisher sequencing + advantages and limitations

A

SOLiD sequencing: sequencing by oligo ligation and detection (not sequencing by synthesis). Fluorescently labelled hybridisation probes are ligated to deduce the signal of 2 bases at a time. gDNA is fragmented, coupled to magnetic beads and amplified by emulsion PCR (clonal amplification). There is 3’ modification of the bead and covalent attachment to a glass slide. 2-base decoding provides inherent error correction, but there is under-representation of AT- and GC-rich regions.

101
Q

What is the difference between WGS and WES?

A

WGS is the whole genome, including introns and mtDNA.
WES is only the coding region of the genome: 1-2% of the whole genome, but it accounts for 85% of known disease-causing mutations.

102
Q

what are the advantages of WGS

A
  • Allows analysis of SNVs, CNVs and indels in the whole genome, including regulatory regions e.g. promoters and enhancers.
  • Enhanced detection of CNVs and structural variation due to more reliable and uniform sequence coverage. In targeted approaches, differences in PCR efficiency in different regions of the genome (amplicon), or in probe hybridisation efficiency and non-specific binding in repetitive regions (hybridisation), can result in regions with little or no coverage. Regions with low sequence complexity also have poor coverage as it is harder to design specific capture baits, which can result in poor coverage or off-target effects.
  • PCR amplification is not required, reducing library prep time and the possibility of GC bias.
  • Sequencing length is not a limitation. For WES most target probes are ~120nt, meaning it is pointless to sequence longer read lengths.
  • Lower average read depth is required to achieve the same breadth of coverage as WES.
  • Doesn’t suffer from reference bias: hybridisation probes/baits tend to preferentially enrich reference alleles at het sites, resulting in false SNV calls.
  • In WES the RefSeq is targeted, so only exons that have been identified so far will be captured. With improved WGS our understanding of the exome will improve.
103
Q

What are the advantages of WES?

A

NAME?

104
Q

how can NGS be applied as a tool for gene discovery?

A

Traditional methods involved positional mapping and then Sanger sequencing of the region to ID new genes and pathogenic variants (association studies, linkage analysis, karyotype). These methods struggle with rare diseases due to the small sample size of affected individuals.
GWAS has contributed to the discovery of disease loci involved in complex traits, where each variant only contributes a small fraction of the observed heritability. Very large sample sizes are needed for significance to be proven with low penetrance loci.
NGS allows a single step approach but presents a challenge in terms of variant interpretation:
- can sequence and filter variants to ID a shared gene in a cohort of patients with the same phenotype
- can sequence and filter for the same variant in a family with the same phenotype
- can sequence parent/child trios and filter for de novo or homozygous/compound het variants
An MDT and a good relationship with clinical genetics are required to determine how to deal with IFs and VUS.

105
Q

How are variants prioritized for suspected pathogenicity

A
  • Rare in a population
  • Located within a protein coding gene
  • Directly affecting the function of the protein e.g. frameshift, +1/2 splice variants
  • Follow the observed inheritance pattern in the family (AD, AR, XL)
  • Strategies also depend on the pedigree structure and the extent of locus heterogeneity
106
Q

When is exome sequencing or the use of targeted panels useful in diagnosis?

A

NAME?

107
Q

what are the ethical considerations of testing whole exomes compared to targeted testing

A

NAME?

108
Q

What are the different types of targeted panels and what are the benefits over WES

A
  • Custom target enrichment (many private companies offer specific target kits for cancer exomes or can offer custom designed target capture). These have a lower cost and allow greater coverage, including Sanger gap fill of regions that are hard to sequence by NGS. This also helps provide higher quality calls and reduces the need for Sanger confirmation, which is beneficial in a clinical service. Only genes with known disease association are included, reducing the interpretation load and potential for IFs.
  • Alternatively, can perform full exome sequencing or clinical exome sequencing (small gene subset only including those where the gene-disease association is known). Following sequencing, a virtual panel specific to the phenotype presentation is applied. This is beneficial as the panel can be updated when new disease associations are confirmed, and all samples can be treated in the same way in the lab set-up regardless of referral reason. But more data is generated which needs to be stored, and depth of coverage is sacrificed for greater breadth, so not all genes will be covered to a diagnostic level, which may result in false -ve results.
109
Q

what is the advantage of targeted panel over full WES/WGS?

A

NAME?

110
Q

how is WES used in cancer diagnostics?

A

WES can be used for tumour profiling:
- has a higher sensitivity than existing methods when using deep sequencing (500-1000x)
- allows parallel testing of multiple samples
- can detect any mutation (within coding region etc) in the targeted gene rather than just targeted mutations, as is the case for ddPCR and other tumour profiling techniques
- short TATs possible with small sequencers e.g. MiSeq
MRD monitoring:
- can ID multiple variants to study the emergence of clonal dominance and of new resistance mutations

111
Q

WES/WGS in prenatal testing

A

Now offered as a rapid service in prenatal testing for severe abnormalities detected on USS, following normal PCR. This is based on results of the PAGE study and shows a better diagnostic rate than PCR and array alone.
NGS techniques are also used for NIPT and NIPD: MPS provides the coverage to ID small increases in overall CN for chr 13, 18 and 21 for NIPT aneuploidy testing.

112
Q

What is ChIP?

A

Chromatin immunoprecipitation: a technique for genome-wide profiling of DNA-protein interactions e.g. binding sites of DNA binding proteins, chromatin modifiers, transcription start sites, epigenetic mechanisms. In ChIP, antibodies (Abs) are used to select for specific proteins, which are enriched from the mixture together with the associated DNA sequence. The DNA can then be sequenced to ID the specific sequence motifs associated with the protein interaction. Studies e.g. ENCODE and FANTOM5 have been used to reveal the genome-wide profiles and binding sites for a range of DNA binding proteins.

113
Q

What is RNA seq?

A

Generate a cDNA library by RT-PCR of total RNA, or select for a specific RNA species e.g. mRNA, which is polyA +ve. The sequencing data can then be mapped to a reference genome or assembled de novo. Used to detect novel frameshift and splicing variants, map 5’ and 3’ gene boundaries, and identify novel splicing isoforms and post-transcriptional modifications. Often used to ID differences between disease states, different tissues or developmental stages.

114
Q

What are the advantages of RNA seq?

A

NAME?

115
Q

what are the challenges of RNA seq?

A

NAME?

116
Q

Other uses of NGS sequencing in cancer

A

The Cancer Genome Atlas and the International Cancer Genome Consortium are using DNA and RNA seq to improve the molecular subtyping of cancer and identify novel fusion genes.
NGS has also been applied to testing circulating tumour DNA (ctDNA) to identify biomarkers for diagnosis and prognosis. This has an increased sensitivity compared to tumour biopsy and is less invasive e.g. the FDA has approved testing for SEPT9 methylation status in ctDNA for CRC.

117
Q

What is the impact of cluster density on NGS quality?

A

Low cluster density can give high quality data but low depth of coverage.
High cluster density gives a greater depth of coverage but lower quality data. If the clusters are too dense the data can be hard to read and information is lost.

118
Q

What should be looked at during bioinformatic validation?

A

Sensitivity, specificity and reproducibility. There is also a need to down-sample to test the limit of detection; if not, you cannot be sure that you will detect all the variants at the limit stated on the report e.g. 20x or 30x.

119
Q

What is BCL file?

A

Base call per cycle file. It is a binary file containing the base call and quality score for each tile in each cycle. This is produced by illumina and is converted to a FASTQ for bioinformatic analysis

120
Q

What is a CRAM file

A

An ultra-condensed version of a BAM file. CRAM files are likely to become more common in an effort to reduce the data storage requirements of NGS tests.

121
Q

What is a VCF file?

A

Variant call file: used to show variants compared to the reference genome. Produced by applying variant calling to a BAM file. Very basic and contains: chrom, pos, ID, ref, alt, qual, filter and info
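As an illustration of the fixed columns listed above, here is a minimal sketch that splits one tab-separated VCF data line into named fields. The function name and the example record are made up for illustration; only the first seven columns named in the answer are read, and a production parser would also handle header lines, INFO and per-sample columns.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its first seven fixed fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt = fields[:7]
    return {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position on the reference
        "id": var_id,                         # "." when no identifier is assigned
        "ref": ref,
        "alt": alt.split(","),                # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
    }

# Made-up record, for illustration only
record = parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50\tPASS\t.\n")
```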

122
Q

what are the typical steps of a NGS pipeline?

A

Starting from FASTQ:
1) QC check to ensure the base calls are of sufficient quality. Poor quality calls are removed from the data to prevent false results. In general a Phred score of >30 (1 error in 1000, or 99.9% accuracy) is considered sufficient
2) variant calling
3) annotation: adding additional information e.g. gene symbol
FASTQ files can be treated as the raw data input to be processed by the software.
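The relationship between a Phred score and base-call accuracy can be worked through in a few lines. This is a minimal sketch: the function names are illustrative, and the quality-string decoding assumes the common Phred+33 ASCII encoding used in current FASTQ files.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def decode_fastq_quality(qual_string, offset=33):
    """Turn a FASTQ quality string into Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in qual_string]

q30_error = phred_to_error_prob(30)      # 1 error in 1000 calls
scores = decode_fastq_quality("I?5")     # characters map to Q40, Q30, Q20
```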

123
Q

What is the role of quality control in an NGS pipeline?

A

To get an overview of the inherent quality of the sequencing data. It is necessary to improve alignment and to remove poor quality reads and artefacts. It looks at per base quality scores, per sequence quality scores, sequence duplication scores and over-represented sequences.

124
Q

How is alignment done?

A

Maps each short read against the reference genome to enable variant calling. Uses sequence alignment algorithms to align the sequencing data, with some deviation permitted to allow for variation. Previously BLAST was used, which finds high scoring segment pairs within the query and target sequences, but the algorithm can be inefficient when dealing with the large scale data NGS produces.

125
Q

what are the current methods for alignment

A

Currently used algorithms use filtering or indexing to reduce the time and memory required for alignment.
- filtering excludes the parts of the genome where no match can be found and can operate on a k-mer or pigeonhole principle
- indexing pre-processes the data, making scanning more efficient, and is used with a Burrows-Wheeler Transform
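Both ideas can be shown on toy strings. The sketch below is illustrative only: the function names are made up, the k-mer index is a plain dictionary, and the Burrows-Wheeler Transform is built by sorting every rotation, which is far too slow for a real genome (real aligners build a compressed index instead).

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Indexing: record every position at which each k-mer occurs in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def candidate_positions(read, index, k):
    """Filtering: keep only reference offsets supported by an exact k-mer seed.

    Pigeonhole principle: a read cut into two seeds that carries one mismatch
    still has at least one seed that matches exactly.
    """
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):   # non-overlapping seeds
        for hit in index.get(read[offset:offset + k], []):
            if hit - offset >= 0:
                candidates.add(hit - offset)
    return sorted(candidates)

def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

reference = "ACGTACGGTCA"
index = build_kmer_index(reference, 3)
exact_hit = candidate_positions("CGGTCA", index, 3)
one_mismatch_hit = candidate_positions("CGGTCT", index, 3)
```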

126
Q

what are the characteristic errors associated with illumina sequencing tech?

A

Illumina has a relatively low error rate and is more likely to have mismatches than indels. Indels are rare overall but have a high miscall rate in Illumina chemistry when called (0.1%). The main complication arises from the synthesis process becoming desynchronised between different copies of the DNA template in the same cluster.

127
Q

What are the most common error models for alignment and how does the sequencing technology affect the choice?

A

Different sequencing technologies are prone to different types of errors, so the error model used for aligning should be chosen with this in mind for maximum accuracy.
- Hamming distance: calculates the no. of positions where bases differ between the reference genome and the read
- edit distance: calculates the number of operations required to convert the short read into an exact match of the reference genome
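The two error models can be compared directly on short strings. This is a minimal sketch with illustrative function names: Hamming distance only counts substitutions, so it suits mismatch-dominated data, while edit (Levenshtein) distance also allows insertions and deletions, so it suits indel-prone data.

```python
def hamming_distance(read, ref):
    """Number of positions at which two equal-length sequences differ."""
    if len(read) != len(ref):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(read, ref))

def edit_distance(read, ref):
    """Minimum number of substitutions, insertions and deletions turning read into ref."""
    previous = list(range(len(ref) + 1))
    for i, a in enumerate(read, start=1):
        current = [i]
        for j, b in enumerate(ref, start=1):
            current.append(min(
                previous[j] + 1,             # delete a base from the read
                current[j - 1] + 1,          # insert a base into the read
                previous[j - 1] + (a != b),  # match or substitute
            ))
        previous = current
    return previous[-1]

substitution_only = hamming_distance("ACGT", "ACCT")
single_deletion = edit_distance("ACGTT", "AGTT")
```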

128
Q

what are the steps in variant calling?

A

1) Pre-processing: refine data, remove duplicates, recalibrate quality scores (can be part of, or separate to, the variant calling algorithm).
2) Variant calling: often split into SNVs, CNVs, SVs and indels, as each may require a different specific algorithm for maximum accuracy.
- alleles observed in 20-80% of reads are called as het
- alleles in >80% of reads are called as homozygous
This method is effective for bases with a high sequencing depth and Phred score >20.
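The read-fraction thresholds above can be written as a small rule. This is a sketch only: the function name, the returned labels and the minimum depth of 20 are assumptions for illustration, and each lab sets its own cut-offs during validation.

```python
def call_genotype(alt_reads, depth, min_depth=20, het_range=(0.20, 0.80)):
    """Threshold-based genotype call from the fraction of reads carrying the alternate allele."""
    if depth < min_depth:
        return "no_call"            # too few reads to trust a simple threshold
    fraction = alt_reads / depth
    low, high = het_range
    if fraction > high:
        return "hom_alt"
    if fraction >= low:
        return "het"
    return "hom_ref"                # below the het threshold: treated as reference

balanced = call_genotype(15, 30)    # 50% of reads carry the alternate allele
nearly_all = call_genotype(29, 30)  # ~97% of reads carry the alternate allele
```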

129
Q

How are variants called in poor quality sequencing?

A

Poor quality sequencing requires more stringent filtering and can miss het calls, so probabilistic models using Bayes are used to calculate genotype likelihoods and confidence scores, which improves accuracy.
Indels are harder to call and are very difficult to ID if they are larger than the sequencing length i.e. >~150bp.
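A toy version of the Bayesian approach is sketched below. It is not the model of any particular variant caller: it assumes a simple binomial model in which a read shows the alternate allele with probability equal to the error rate (hom ref), 0.5 (het) or one minus the error rate (hom alt), with flat priors unless others are supplied. All names and defaults are illustrative.

```python
from math import comb

def genotype_posteriors(alt_reads, depth, error_rate=0.01, priors=None):
    """Posterior probability of each diploid genotype under a binomial read model."""
    p_alt_read = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1 - error_rate}
    if priors is None:
        priors = {genotype: 1 / 3 for genotype in p_alt_read}   # flat prior (assumption)
    unnormalised = {
        genotype: priors[genotype]
        * comb(depth, alt_reads) * p ** alt_reads * (1 - p) ** (depth - alt_reads)
        for genotype, p in p_alt_read.items()
    }
    total = sum(unnormalised.values())
    return {genotype: value / total for genotype, value in unnormalised.items()}

# 3 of 8 reads carry the variant: too shallow for a fixed depth rule,
# but the model still favours a het call
posteriors = genotype_posteriors(3, 8)
best_call = max(posteriors, key=posteriors.get)
```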

130
Q

what is the process for annotation?

A

Annotation adds additional information to the variant call file: gene symbol, transcript no., HGVS nomenclature, variant consequence. Repeat regions can be identified to facilitate annotation and may be masked.

131
Q

what are the causes of NGS errors

A

Over- or under-clustering. Errors depend on the signal-to-noise ratio and can be affected by cross-talk with nearby clusters, homopolymer count, incomplete extension and position towards the end of the read.

132
Q

what are the characteristic errors in Ion torrent and ROCHE sequencing?

A

main problem is the variance in signal intensity for homopolymers and they can be called inaccurately if they are large. Also have a high error rate in insertion and deletion calls.

133
Q

what is the effect of read length on NGS quality?

A

Longer read lengths give more information on the relative position of a specific base pair. They are useful for repetitive or homologous regions as there is more chance that one end will be anchored in a unique region to facilitate mapping. Long-read platforms are more expensive and not high throughput, so are not available on the NHS. Nanopore technology is relatively affordable but the error rate is too high for diagnostic use.

134
Q

what is the difference between single and paired end sequencing?

A

Paired end sequencing sequences a fragment from both ends. Because the distance between each paired end is known, this information is used by alignment algorithms to increase the accuracy of alignment. It can also help ID deletions and duplications if the distance deviates from what is expected. It is also better at resolving SVs, indels and inversions. In single end reads the fragment is sequenced in one direction. The process is cheaper and less time consuming and is used when the accuracy of paired end sequencing is not required.
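The use of the expected distance between read pairs can be sketched as a simple check on the mapped insert size. The function name, labels and the three standard deviation cut-off are assumptions for illustration; real structural variant callers also use read orientation and split reads.

```python
def classify_pair(observed_insert, expected_mean, expected_sd, n_sd=3):
    """Flag a read pair whose mapped distance deviates from the library's insert size."""
    lower = expected_mean - n_sd * expected_sd
    upper = expected_mean + n_sd * expected_sd
    if observed_insert > upper:
        # ends map further apart on the reference than in the sample fragment
        return "possible_deletion"
    if observed_insert < lower:
        # ends map closer together on the reference than in the sample fragment
        return "possible_insertion"
    return "concordant"

# Library with a mean insert of 350bp and SD of 50bp (made-up values)
far_apart = classify_pair(900, 350, 50)
as_expected = classify_pair(360, 350, 50)
```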

135
Q

what limits the accuracy of a sequence alignment?

A

Limited in repetitive regions and regions of shared homology e.g. pseudogenes or highly related gene families. Sanger sequencing is better at alignment as the sequencing length is longer.

136
Q

What is the importance of sequencing depth?

A

Depth is the number of times a single base is sequenced. The data is used together to develop a consensus sequence, so the more times a base is covered the greater the accuracy. Generally diagnostic labs validate pipelines to detect variants at 20-30x. Inadequate coverage will result in false -ves.
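Depth per base, and the share of a region reaching a minimum depth, can be computed from read start and end positions. This is a minimal sketch with illustrative names, using 0-based half-open coordinates; real pipelines compute these figures from the BAM file with dedicated tools.

```python
def per_base_depth(reads, region_start, region_end):
    """Count how many reads cover each base of a region (0-based, half-open coordinates)."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth

def fraction_covered(depth, min_depth=20):
    """Proportion of bases in the region sequenced at least min_depth times."""
    return sum(d >= min_depth for d in depth) / len(depth)

# Two overlapping toy reads across an 8bp region
depth = per_base_depth([(0, 5), (3, 8)], 0, 8)
covered_twice = fraction_covered(depth, min_depth=2)
```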

137
Q

Why is bioinformatic validation of NGS required?

A

It is essential to assess a pipeline against a truth set, e.g. Genome in a Bottle, to assess the sensitivity and specificity of a panel. Samples with known variants can also be run to assess the ability to detect more complex variants e.g. indels and CNVs, or to detect clinically significant variants in complex regions e.g. GC rich regions. Previously tested samples are used as +ve controls (Sanger sequencing is the gold standard).

138
Q

what are the sensitivity requirements for NGS pipeline validation?

A

Sensitivity should be >95% with a CI of 0.95. It should be calculated from 3 independent runs to determine the reproducibility (BPG).
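One way to check a sensitivity figure against a confidence interval is the Wilson score interval for a proportion. This is a sketch under assumptions: the guidance quoted above does not prescribe this particular interval, and the function name and example counts are illustrative.

```python
from math import sqrt

def wilson_interval(detected, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = detected / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Detecting 60 of 60 known variants still leaves the lower bound just under 95%
lower_60, upper_60 = wilson_interval(60, 60)
# With 100 of 100 detected the lower bound clears 95%
lower_100, upper_100 = wilson_interval(100, 100)
```

The design point is that the number of known variants tested matters as much as the observed detection rate: a perfect result on a small truth set gives a wide interval.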

139
Q

What is the analytical sensitivity for NGS?

A

Sensitivity is down to 5-10% for most SNVs, which is OK for inherited disease but may miss some low level mosaicism and may not be sensitive enough for oncology samples where there is a high background of WT (hence tumour profiling is usually to a much higher depth, 500-1000x). Systematic errors increase with increased coverage. The use of overlapping paired end reads or unique identifier tags can increase sensitivity.
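Why low level variants need deeper sequencing can be shown with a binomial calculation: the chance of seeing at least a minimum number of variant reads at a given depth and variant allele fraction. This is a simplified sketch that ignores sequencing error and allele bias; the function name and the requirement of 5 supporting reads are assumptions.

```python
from math import comb

def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) when each read carries the variant with probability vaf."""
    p_miss = sum(
        comb(depth, i) * vaf ** i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1 - p_miss

# A 5% variant needing 5 supporting reads: 30x germline depth vs 1000x tumour depth
at_30x = detection_probability(30, 0.05, 5)
at_1000x = detection_probability(1000, 0.05, 5)
```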

140
Q

which areas of the genome are particularly difficult to sequence

A

- homologous regions: e.g. pseudogenes can result in mismapping and false -ves/+ves
- repetitive regions: unique flanking sequence is required for accurate mapping. Repeats longer than the insert size cannot be accurately mapped
- GC rich regions: not reliably interpreted as there is higher background noise, mostly due to the propensity to form secondary structures or to accumulate G/C fluorophores after washing. Illumina sequencing is prone to substitution errors in GC rich regions

141
Q

what are the stages of bioinformatics of NGS data?

A

base calls -> demultiplexed to form FASTQ files -> adaptor trimming to give analysis-ready FASTQ -> alignment to reference genome to give BAM files -> process BAM to get QC stats -> variant calling -> variant annotation

142
Q

What is involved in panel design for NGS service delivery?

A
  • Target enrichment
  • Gene selection
  • Transcript selection
  • Design
  • Poly checks
143
Q

Why are barcoding and DNA quality checks required in a diagnostic NGS service?

A

Barcoding allows samples to be pooled, which increases the throughput and reduces the cost per patient. This is possible with exomes and targeted panels as there is far less DNA per patient to be sequenced.
DNA quality checks are required for NGS as the test is expensive to run. Quality checks look to see if the library prep has enriched for DNA with the correct fragment size, and can be done by running samples on a gel to check for products or by using a TapeStation.

144
Q

What are the considerations for target enrichment for NGS panel design?

A

Amplicon based methods:
- can suffer allele dropout due to rare polys
- not scalable
- Agilent HaloPlex uses restriction digestion so all fragments have the same breakpoints. Fragments are tagged with biotin, sequencing primer motif, PCR primer and barcode, and are then circularised (only circularised fragments are amplified, to increase specificity). The circularised fragments are captured by streptavidin beads and PCR amplified. The inclusion of barcodes allows the removal of duplicates after sequencing, increases sensitivity for low frequency variants and allows CNV detection, which is not possible for other amplicon based techniques
Hybrid capture probes/baits e.g. Agilent SureSelect:
- difficulty in homologous or repetitive regions, which can result in off-target binding
- can be used for CNV detection
- easier to run at high throughput with multiple target regions e.g. whole exome
- more expensive

145
Q

what are the considerations of gene selection in panel design?

A

Clinical scientists/geneticists select genes with published evidence of association to disease. Dutch labs have developed a core gene list for consistency. In the UK PanelApp is used: a crowd-sourcing tool that allows virtual panels to be developed and shared within the genetic community. Gene reviews are added and genes can be rated from green (high evidence; variants in the gene should be reported to the patient) to red (low evidence; not suitable for reporting at this time).
The diagnostic yield of NGS should be as good as sequential Sanger sequencing. Therefore need to consider whether there are pseudogenes for genes in the panel. These can interfere with sequencing and variant calling in the natural gene, and Sanger gap fill may be required. Limiting the number of genes in a panel increases the coverage of targets, so there is higher sensitivity for mosaic/low frequency hets.
Most diagnostic labs opt for sequencing a full exome and applying a virtual panel, as there is flexibility in the genes that can be analysed and the data can be re-analysed in the future if new evidence linking a gene to disease is published. This technique is also more suitable for batching, as all patients are treated the same regardless of phenotype.

146
Q

What are the considerations of target selection in panel design?

A

Alamut is a useful tool for identifying transcripts: can see which contains all the exons in the gene and what alternative transcripts may be present.
NGS validation should include justification for the transcript selection. If there are 2 unique transcripts with different exons, both should be included.

147
Q

What is an LRG transcript?

A

The LRG project aims to provide stable transcripts as universally accepted standards for reporting.
Each LRG contains a stable transcript with the most up to date biological info (mapping info, annotation of all transcripts, legacy exon and a.a. numbers).

148
Q

What is required to design a panel?

A

Once the genes and transcripts of interest have been determined, these can be used to develop a BED file (can use Agilent SureDesign) of ROIs with their genomic coordinates. These can be tiled with RNA baits/probes. RNA bait coordinates should be checked in Alamut to confirm that they cover the full ROI e.g. exons +/- ROI.
Pseudogenes can sequester some baits, resulting in poor tiling, so this should be considered in design and some Sanger gap fill may be required for full coverage.
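A BED line for one target region can be sketched as below. BED files use 0-based, half-open coordinates, whereas exon positions are usually quoted 1-based and inclusive, so the start needs shifting. The function name, the padding parameter and the example exon are all hypothetical.

```python
def exon_to_bed(chrom, start_1based, end_1based, name, padding=0):
    """Format one region as a BED line, optionally padded with flanking bases."""
    bed_start = max(0, start_1based - 1 - padding)   # convert to 0-based start
    bed_end = end_1based + padding                   # BED end is exclusive
    return f"{chrom}\t{bed_start}\t{bed_end}\t{name}"

# Made-up exon at chr1:101-200 with 5bp of flanking sequence on each side
bed_line = exon_to_bed("chr1", 101, 200, "GENE1_exon1", padding=5)
```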

149
Q

What is the impact of polys when designing a panel?

A

Need to consider if they will result in allele dropout during library prep.
A list of common polys (>1%) can be obtained from gnomAD and used to filter data post sequencing. These are removed from the variant prioritisation list as unlikely to be causative (need to be careful of some high frequency AR variants which may be present at a level higher than this e.g. F508del in CF).
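The frequency filter with an exception list can be sketched as follows. The dictionary keys, variant names and allele frequencies are made up for illustration; a real pipeline would take frequencies from an annotated VCF.

```python
def filter_common_polys(variants, max_af=0.01, always_keep=()):
    """Drop variants above the population frequency cut-off, except listed known pathogenic ones.

    variants: list of dicts with hypothetical keys "name" and "af"
    (af is None when the variant is absent from the population database).
    """
    kept = []
    for variant in variants:
        if (variant["name"] in always_keep
                or variant["af"] is None
                or variant["af"] <= max_af):
            kept.append(variant)
    return kept

# Made-up frequencies, for illustration only
variants = [
    {"name": "common_poly", "af": 0.12},
    {"name": "rare_missense", "af": 0.0002},
    {"name": "novel_frameshift", "af": None},
    {"name": "known_AR_founder", "af": 0.015},
]
shortlist = filter_common_polys(variants, always_keep={"known_AR_founder"})
```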

150
Q

What is required for effective NGS service delivery according to BPG 2015

A
ACGS BPG outline the requirements for:
  • validation
  • EQA and QC
  • report format
  • pricing
  • patient pathways
151
Q

What is required for NGS validation

A

Validation is required for all new tests to provide evidence that the test will provide the correct result on the sample. It should be carried out for all aspects of the NGS workflow including library prep, sequencing and data analysis including variant calling. it is important to understand the technical limitations of the test so they can be addressed during validation and if required additional tests included in the service e.g. sanger gap fill of regions with pseudogenes. Validation covers the sensitivity (ideally 95% with CI of 0.95), specificity and reproducibility and should identify the optimum conditions for the test to be run at e.g. if DNA conc or volume of consumables used is being changed from manufacturer guidelines

152
Q

Why is EQA and QC required for an NGS service?

A

IQC is concerned with internal quality parameters. These are tested as an indicator that the test is meeting the required standards set out during validation:
- includes cluster density, no. of reads, coverage
- there should be robust reporting and checking of these metrics so that sub-optimal data is not reported, e.g. low coverage may risk het variants being missed
- technical data should be included on the technical report
EQA checks the performance of the lab against BPG and other labs. There is a GenQA scheme for germline and somatic testing and involvement is a requirement of accreditation:
- wet based: gDNA is distributed to the lab to test the technical processing as well as reporting
- dry based: FASTQ files are supplied to test the bioinformatic processing and results interpretation

153
Q

Why is sanger confirmation required?

A
  • BPG require any reported variant to be confirmed by Sanger sequencing on a new DNA dilution in labs that do not have a robust tube transfer checking system (barcodes or witnessed transfer) to confirm the ID of all samples at all stages. In practice most diagnostic labs have such a system in place.
  • Variants have nevertheless been Sanger confirmed because the sensitivity and specificity of some tests was not 100%.
  • As NGS has become more commonplace, many labs have now tested enough samples to prove that they do not require Sanger confirmation for simple SNVs (100% specificity); it may still be required for CNVs and indels. Sanger confirmation is therefore not always required, saving time and money.
  • However, not Sanger confirming may just mean that primers need to be designed at a later date for family testing.
154
Q

what is the guidance on the report format for NGS test?

A

The aim of a report is to clearly communicate the genetic results to clinicians. It is split into scientific and technical sections.
The scientific section should follow BPG and include:
- patient identifiers, reason for referral, headline result and result interpretation linked to the patient’s phenotype
- where appropriate, a request for parental samples or the offer of PND following counselling
- results given using HGVS nomenclature (including transcript); UK labs should be moving to using build 38 by the end of 2021
- for negative reports, the expected diagnostic yield of the test
The technical section should include:
- supporting information about the panel, including: genes tested with RefSeq ID and OMIM no., info on splice sites and promoter regions, and info on which regions have and have not been covered (especially if variants in regions not covered have been associated with disease)
- methods used (library prep, sequencing technology, bioinformatic pipeline and software)
- coverage achieved, highlighting regions of poor coverage
- details of whether dosage analysis has been included
VUS with clinical relevance may be reported depending on a lab’s individual guidelines. In general ‘hot’ class 3s are reported. IFs are reported depending on the lab’s policy e.g. ACMG59.

155
Q

what determines pricing of NGS?

A

The cost of NGS technology is constantly decreasing, however the cost of reagents and of technical and scientist time remains constant. Batching patients using barcodes allows more patients to be tested per run but reduces the coverage and is limited by the library prep and platform used. The cost of an NGS panel needs to be competitive. In general, if a patient has a heterogeneous disorder it is cheaper to do the NGS panel than sequential Sanger sequencing, and it can significantly reduce the diagnostic odyssey for the patient. For virtual panels there is the added benefit that further panels can be analysed if the first is negative.

156
Q

what is the translational utility of NGS?

A

NAME?

157
Q

What are the challenges of NGS?

A

Clinical utility and the interpretation of results: the ability to interpret genetic data lags behind the ability to obtain it, and not all genetic data is clinically useful, so should it be reported?
Detection of VUS, IFs and variably penetrant alleles requires expertise in their interpretation and reporting. This can be difficult for novel variants where there is a lack of clinical or functional research to confirm or refute a disease association. There is a fundamental need to share genetic information between labs to support the interpretation of variants; this is supported by platforms such as DECIPHER. Labs may also have internal variant databases that can be referred to if the same variant is detected again.

158
Q

what are the ethical issues associated with NGS?

A

Risk of detecting IFs and VUS: this is more likely the larger the panel.
Genetic results may also have an impact on family members who will not have been counselled or agreed to testing.

159
Q

what are the pros and cons of single gene testing vs panel testing

A

Single gene testing is cheaper and there is a lower risk of VUS, but it may become more expensive if lots of single gene tests are required to reach a diagnosis.
Panels offer an improved detection rate but there may not be sufficient data in the medical literature to report variants. The improved detection rate needs to be weighed against the increased risk of VUS.
Compounding this are genotyping errors, and some previously detected variants will have been misclassified. A study of HGMD showed that 27% of a wide selection of entries were actually benign polys or mis-annotated.

160
Q

What is the issue with variable penetrance?

A

usually ascribed to environmental factors or genetic modifiers- correct management may need to bear in mind a patients medical and family history

161
Q

what are the requirements for pre-test genetic counselling?

A

Needs to cover the information that can be derived from the test and its limitations, the benefits, and the possibility of IFs and VUS:
- genes included
- whether the report will be added to the patient’s medical records (an issue in the US where there is a fear of insurance discrimination)
- whether IFs will be reported
- if trio testing, whether the parents want IFs to be reported if not detected in the proband

162
Q

what are the requirements of post test genetic counselling?

A

the result of the test and the implications for the individual and their family- needs to be a mechanism for re-review of VUS

163
Q

Impact of 100K on genetic reporting for NGS in the UK

A

The 100K project, as well as trying to ID variants in cancer and DD, was used to pilot different reporting and testing techniques and to act as a proof of principle for the roll out of NGS testing in routine diagnostics. In the 100K project researchers did not have an obligation to search out IFs (as there is in the US) as this would distract from the aim of the project; however, if one was identified it would be reported if it was considered to be clinically valid (consistent clinical significance) and clinically useful (is knowledge of the variant likely to benefit the patient/family?)- these needed confirmation in an accredited lab prior to reporting. They were also carefully considered in conjunction with a clinical geneticist.

164
Q

What is single molecule sequencing?

A

Single molecule seq aims to sequence a single DNA molecule without PCR based amplification or the need to halt between read steps. Therefore there is no reliance on a clonal population of amplified reads or on chemical cycling

165
Q

what is the difference between 2nd and 3rd generation sequencing?

A

Second generation sequencing uses short read technology. In contrast, 3rd generation seq uses long read tech (>10,000bp), therefore there is improved detection of SVs, CNVs and repetitive regions- these are all important in development, evolution, adaptation and disease

166
Q

what are the benefits of 3rd generation sequencing?

A
  • smaller amount of starting material required
  • higher throughput
  • simplified template prep
  • lower projected per base cost
  • longer read lengths (10,000-100,000bp) resulting in enhanced de novo assembly, haplotype detection, CNV and SV detection, detection of insertions and inversions, detection of balanced translocations and novel chimeric fusion transcripts
  • enhanced sequencing of repetitive regions resulting in a more contiguous reconstruction of the genome
  • more uniform coverage
  • less sensitive to GC content
  • potential to detect epigenetic modifications e.g. methylation
167
Q

what are the different types of 3rd generation seq? + examples

A

1) sequencing by synthesis (SMRT- Pacific Biosciences and FRET seq- Life Technologies)
2) nanopore- biological or synthetic (Oxford Nanopore)
3) synthetic long read technology (Illumina and 10x Genomics)

168
Q

describe the technique for sequencing by synthesis for SMRT (Pacific Biosciences)

A

Continuously monitors the incorporation of differently labelled nt. In second generation seq there is a blocker, so the seq is stopped and visualised after each nt has been added. Here each nt carries a differently coloured fluorophore attached to the phosphate group. The fluorescence is read (a laser and camera read the fluorescence emitted) as the nt is incorporated, and the fluorescent group is naturally cleaved by the polymerase during incorporation, before the next nt is added. Sequencing is performed on SMRT cells which have 1000s of ZMWs (zero-mode waveguides)- a ZMW is a metal film with a small hole (30-70nm) in it. This means light only illuminates the bottom of the ZMW, so the activity of a single molecule can be detected against the background of 1000s of labelled nt

169
Q

what are the steps in a SMRT reaction?

A
  1. template molecule is circularised by attachment to hairpin adaptors
  2. a single circularised DNA molecule is attached to the base of each ZMW
  3. DNA is processed by the polymerase
  4. fluorescent nt are flooded on top of the ZMW
  5. when a complementary nt is incorporated the enzyme holds it in place fractionally longer than an unincorporated base, and during this time the fluorophore is excited and the fluorescence detected
  6. the fluorophore is attached to the phosphate so it is cleaved, and the next nt is added
170
Q

what are the advantages and disadvantages of SMRT seq?

A
  • can detect methylation as the pol pauses for slightly longer on modified bases
  • circularised molecule so it can be read multiple times to produce a consensus seq and remove random errors (up to 15%)
  • very fast
  • but limited input due to the number of ZMWs that can be read per SMRT cell
  • expensive
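The consensus idea above (reading the same circularised molecule several times so that random errors are outvoted) can be sketched as a per-position majority vote. This is a toy illustration only: it assumes the passes are already aligned and of equal length, whereas real consensus calling also has to cope with indels. The function name and the example passes are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Return the per-position majority base across pre-aligned, equal-length passes."""
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("toy example: all passes must have the same length")
    consensus = []
    # zip(*passes) yields one column of bases per template position
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical passes over the same template, each with at most one random error
example_passes = ["ACGTTAGC", "ACGTAAGC", "ACCTTAGC", "ACGTTAGC", "ACGTTAGG"]
print(majority_consensus(example_passes))
```

Because the errors are random, they rarely hit the same position in most passes, so the vote recovers the underlying sequence as long as enough passes are available.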
171
Q

Describe FRET seq (Life Technologies)

A

Sequencing by synthesis. The DNA pol is tagged with a donor fluorophore, which when brought into close proximity with an acceptor fluorophore results in a FRET signal. After incorporation the fluorophore attached to the nt is released, as it is bound to the phosphate group.
1. DNA is covalently attached to the coverslip of a TIRF based microscope
2. universal primers are added to the gDNA
3. labelled pol is added to the slide
Because the DNA pol is not attached to a solid substrate it can be exchanged mid sequence- therefore damaged pols can be replaced, enabling a high read length

172
Q

How does nanopore seq technology work?

A

ssDNA is electrophoretically driven through a nanopore as linear denatured DNA- an electrical potential is applied to a solution either side of the pore- DNA passing through the pore results in a measurable change in the ionic current- due to the length of the pore the change in the current is due to a string of bases called a k-mer, not a single nt

173
Q

describe the oxford nanopore

A

Biological nanopore- TM protein channels in a substrate (lipid bilayer, liposomes)- this gives a highly reproducible nanopore size and structure

174
Q

describe the Oxford MinION

A

MinION is a consumer nanopore sequencer; it is an inexpensive, portable, hand held device that can sequence up to 200kb- DNA to be sequenced is linked to an unzipping enzyme so it is threaded through the pore as ssDNA. The DNA is also altered so there is a hairpin structure at one end, so both strands are threaded through, one after the other. This provides a continuous read which is re-aligned to form a 2D consensus sequence

175
Q

what are the advantages, disadvantages and uses of the MinION?

A
  • fast
  • portable
  • as k-mers are measured there are 1000s of possible signals rather than just 4, therefore there is a high error rate, especially for indels and homopolymers longer than the k-mer length
  • has been used to genotype a Salmonella outbreak in a UK hospital in 6hrs and to seq Ebola in the field (monitor transmission history and viral evolution)
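The "1000s of possible signals" point follows from simple counting: with 4 bases, a k-mer sitting in the pore can take 4^k different forms. A minimal sketch, assuming k values of 5 or 6 purely for illustration (the card does not state the k-mer length); the function name is hypothetical.

```python
def kmer_states(k, alphabet_size=4):
    """Number of distinct k-mers that could occupy the pore at one time."""
    return alphabet_size ** k


# k values below are illustrative assumptions, not taken from the card
for k in (1, 5, 6):
    print(f"k={k}: {kmer_states(k)} possible k-mers")
```

With k=1 there would be only 4 signals to tell apart; at k=5 there are already 1024, which is why distinguishing them from the current trace is error prone.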
176
Q

what are synthetic nanopores?

A

Made of graphene with drilled pores; as DNA is passed through, the electrical signal reflects the size and conformation of the DNA molecule in the pore. More stable than biological nanopores, with more control over pore diameter and channel length- lower sensitivity to external parameters e.g. pH, temp, salt conc

177
Q

describe illumina synthetic long read sequencing

A

DNA is fragmented into ~10kbp molecules, clonally amplified and barcoded before sequencing with a short read instrument- long reads are then synthetically generated from the short read sequences. Very high accuracy (>99.9%), but read lengths are shorter than other 3rd generation tech as it requires long range amplification and the long reads are synthetically generated. Prone to bias in regions with high GC content or tandem repeats
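The barcoding step is what makes the synthetic reconstruction possible: short reads sharing a barcode came from the same ~10kbp molecule, so they can be pooled before being assembled. A minimal sketch of that pooling step only, assuming reads arrive as (barcode, sequence) pairs; the data layout and names are hypothetical, and the assembly of each pool into a long read is not shown.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Pool short reads by barcode so each pool corresponds to one long source molecule."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Hypothetical short reads tagged with the barcode of their source fragment
short_reads = [
    ("BC01", "ACGTAC"),
    ("BC02", "TTGACA"),
    ("BC01", "GTACGG"),
    ("BC02", "GACATT"),
]
for barcode, pool in group_reads_by_barcode(short_reads).items():
    print(barcode, pool)
```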

178
Q

what is 10x genomics synthetic long read seq

A

Similar to Illumina but uses an oil based emulsion and multiple displacement amplification to ligate short barcoded sequences across a longer molecule. Unlike Illumina it does not aim to generate gapless coverage from short reads, but provides gapless coverage by ensuring there are multiple long fragments originating from the same region in the library prep

179
Q

what are the challenges to 3rd generation seq?

A
  • Time- most of the current systems do not generate enough data fast enough for a rapid response, e.g. in a hospital setting for monitoring infections.
  • Storage and bioinformatic solutions are required to handle the wealth of data being generated.
  • Utility and ethics- the vast number of tests and the availability of direct to consumer testing raise questions of consumer understanding and personal impact of results, and problems of false positives and negatives.
180
Q

what information should be collected on the gene for variant interpretation?

A

NAME?

181
Q

what is considered for bioinformatics prioritisation?

A

NAME?

182
Q

What is the value in MDT?

A

MDT is useful, especially for rare diseases and NGS variants, as it allows the variant in question to be interpreted in the context of the patient’s phenotype. Usually involves clinical geneticists and scientists +/- the referring clinician and experts in the field

183
Q

what are the classes a variant can be?

A
class 5- >99% probability of being disease causing (actionable)
class 4- >90% (actionable)
class 3- VUS
class 2- <10%
class 1- <0.1%
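The five classes can be read as bands on the probability that a variant is disease causing. A minimal sketch, assuming the cut-offs on this card (99%, 90%, 10%, 0.1%) act as band boundaries; how exact boundary values are handled is an assumption made here for illustration, and the function name is hypothetical.

```python
def variant_class(p_pathogenic):
    """Map a probability of pathogenicity (0-1) to a class from 1 to 5."""
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_pathogenic > 0.99:
        return 5  # disease causing, actionable
    if p_pathogenic > 0.90:
        return 4  # likely disease causing, actionable
    if p_pathogenic >= 0.10:
        return 3  # VUS
    if p_pathogenic >= 0.001:
        return 2  # likely benign
    return 1      # benign


for p in (0.995, 0.95, 0.5, 0.05, 0.0005):
    print(p, "-> class", variant_class(p))
```

Only classes 4 and 5 are treated as actionable, so the wide class 3 band is where most of the interpretation effort goes.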
184
Q

what information on the variant should be considered?

A
  1. literature search
  2. mutation databases
  3. in silico predictions
  4. splicing
  5. functional studies
  6. inherited/de novo
  7. RNA studies
  8. IHC
  9. segregation
  10. co-occurrence with a known pathogenic mutation (different impact for AR and AD)
  11. LOH
  12. enzyme assays
185
Q

what tools can be used for literature search?

A

Google, PubMed- use HGVS and legacy nomenclature. Some software packages collate the information

186
Q

what databases can be used for variant interp?

A

Population databases e.g. gnomAD can rule out population polys.
DECIPHER, HGMD for variants- be aware of misinterpreted info; ~27% of HGMD variants listed as pathogenic are actually polys.
Disease specific databases e.g. LOVD

187
Q

what is the impact of co-occurrence with a pathogenic variant?

A

For AR variants this can support pathogenicity (need to prove the variants are in trans- parental studies). For AD variants this can refute the pathogenicity of the variant where 2 pathogenic variants could be lethal e.g. BRCA

188
Q

how is co-segregation data interpreted?

A

Use the Jarvik and Browning paper to determine the level of evidence that can be assigned- co-segregation with disease in affected family members supports pathogenicity, and there is more evidence if it is found in multiple unrelated families- no. of informative meioses can be summed- presence of the variant in family members without disease supports it being benign- need to consider phenocopies, variable penetrance, age of onset- apparent segregation does not confirm that the variant is pathogenic as it may be in linkage disequilibrium with the pathogenic variant
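The reason informative meioses can be summed is that, under a simple model, each one has a 1/2 chance of showing co-segregation purely by chance, so N meioses give a chance probability of (1/2)^N. A simplified sketch, assuming full penetrance and no phenocopies; the evidence thresholds from the paper are deliberately not reproduced here, and the function name is hypothetical.

```python
def chance_cosegregation_probability(informative_meioses_per_family):
    """Probability that the observed co-segregation arose by chance, summing meioses across families."""
    if any(n < 0 for n in informative_meioses_per_family):
        raise ValueError("meiosis counts cannot be negative")
    total_meioses = sum(informative_meioses_per_family)
    return 0.5 ** total_meioses


# Hypothetical example: two unrelated families with 3 and 2 informative meioses
print(chance_cosegregation_probability([3, 2]))
```

The smaller this probability, the stronger the segregation evidence, subject to the caveats above about phenocopies, penetrance, age of onset and linkage disequilibrium.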

189
Q

what is the impact of inheritance info?

A

For AD disorders, presence in unaffected family members supports benign whereas absence supports pathogenicity. Need to consider penetrance e.g. 22q11.2 can be inherited from an unaffected parent but may be pathogenic in a child- consider if it fits the phenotype

190
Q

what do in silico predictions consider?

A

Make predictions based on:- species conservation- the more variability at a position the less constraint, and so the more likely the variant will be tolerated. Need good quality alignments from multiple species, and gaps in the alignment can affect the result so these should be removed- Grantham distance- protein software e.g. Align-GVGD (combines Grantham distance with alignment) or PolyPhen and SIFT (seq alignment and 3D structure)- CADD score. Can only be used as supporting evidence (PP3)

191
Q

when should splicing be considered?

A

Variants affecting the +/-1/2 splice consensus sites will affect splicing. Splice predictions can be poor in the rest of the gene. However, an effect on splicing should be considered for missense and synonymous variants, especially when AG or GT dinucleotide sequences are formed. Tools: SpliceSiteFinder-like, MaxEntScan, GeneSplicer. RNA studies are required to confirm an effect on splicing

192
Q

RNA studies for splicing investigation?

A

To investigate the effect on splicing using cDNA, need to consider:- normal isoforms (may complicate interpretation of results)- expression of mRNA in blood- is the transcript of interest expressed in blood or does another tissue need to be sampled?- quality of RNA- degrades quickly- NMD- confirmation of biallelic expression of the variant can rule out NMD. If not, the variant isoform may not be present as it is degraded by NMD.

193
Q

When is LOH investigated?

A

LOH is important for investigating the pathogenicity of variants in TSGs. If a variant in a TSG is detected in constitutional DNA, LOH of the normal allele in tumour can support the pathogenicity of the variant, whereas LOH of the variant refutes this.- LOH studies assume the second hit is a large deletion- presence of normal tissue in the tumour sample may obscure the result so need precise dissection- if LOH of the normal allele is seen it does not confirm pathogenicity

194
Q

How are functional studies used for variant interpretation?

A

Can provide in vitro demonstration of molecular consequence e.g. disruption of expression or misfolding, or in vivo recapitulation of a human phenotype in a model organism- requires identification of a measurable property associated with function- useful info but not definitive. If the result is negative, need to consider if the assay was performed in the appropriate genomic, organismal and developmental context. A positive result does not confirm pathogenicity e.g. a mutation may be shown to influence transcriptional regulation but it is difficult to assess the significance of an increase in transcription.
Enzyme assays- useful for disorders that result from enzyme deficiencies e.g. non-ketotic hyperglycinemia due to deficiency in the glycine cleavage enzyme.
IHC- MMR genes, DMD- loss of expression

195
Q

finding disease related genes using cytogenetics- an apparently balanced rearrangement may result in a phenotype if?

A
  1. there is a submicroscopic imbalance at a breakpoint
  2. it results in disruption of a gene
  3. a gene is separated from its regulatory elements
196
Q

How do you investigate a submicroscopic imbalance at a breakpoint?

A
  • can be investigated by array CGH
  • cannot use FISH unless there is a suspicion of the gene involved, so FISH cannot be used to investigate novel genes
  • e.g. t(6;8) found to have an 8q12 deletion at the breakpoint- this included the CHD7 gene, resulting in CHARGE syndrome
197
Q

How to investigate disruption of a gene caused by a translocation?

A

GOF- splicing together exons from different genes resulting in a novel fusion gene- common in cancer but rare in inherited disease e.g. BCR-ABL1 in CML t(9;22)(q34;q11)- constitutively active tyr kinase.
LOF- if a gene falls across a breakpoint this can disrupt expression, resulting in haploinsufficiency.
e.g. X;autosome translocations- to avoid imbalance there is skewed X inactivation favouring inactivation of the normal X. This can result in expression of an XL phenotype if a gene is interrupted on the translocated X. This has been reported for DMD in a female.
e.g. Kleefstra syndrome- a patient with a t(X;9) was found to have a disrupted EHMT1 gene in 9q. Further studies found 3 patients with microdels incl EHMT1, which established that HI for EHMT1 is sufficient to cause Kleefstra syndrome

198
Q

how can a rearrangement separating a gene from its regulatory element result in a phenotype?

A
  1. positional effects- can separate a gene from its regulatory regions e.g. promoter- rearrangement may switch regulatory elements with another gene- changes in chromatin structure- euchromatin can be moved to a region of heterochromatin (active to inactive). e.g. PAX6 HI causes aniridia; aniridia has also been described in patients with translocation breakpoints 3’ to PAX6
199
Q

how can dels and dups ID genes by cyto

A

Large dels and dups >5-10Mb can be detected by karyotype, and smaller ones by array. Now evident that the majority of microdel or dup syndromes are due to the effects of a single gene. Some are due to contiguous gene dels e.g. PKD1 and TSC2 resulting in TSC with early onset chronic kidney disease. Miller-Dieker syndrome- type 1 lissencephaly with facial dysmorphism. LIS1 gene loss has been identified as the cause of the lissencephaly

200
Q

How can inversions assist gene identification?

A

breakpoints can directly interfere with a gene or have a positional effect similar to a translocation

201
Q

What are the potential problems of gene ID from cytogenetics?

A

Co-occurrence of a balanced rearrangement and phenotype may be coincidence and mask the presence of a second undetected genetic abnormality. Regions can be large and may contain many genes- difficult to decide which to investigate. Can look for other patients with the same phenotype to narrow down the candidate gene. Contiguous gene deletions or duplications may be due to HI or TS for more than one gene in the region

202
Q

what population studies have used WES for gene discovery?

A

DDD project for DD- used array CGH and WES.
100K- WGS rather than WES (mainly of trios)

203
Q

How can targeted panels be used for gene discovery? What are the considerations? Give an example.

A
  • used for genetically heterogeneous but clinically well defined disorders e.g. epilepsy
  • can be virtual or targeted to specific genes during library prep
  • Example- Hu et al used a panel targeted to the genes on the X chromosome to investigate the cause of X-linked ID in males and identified 7 novel genes including CLCN4
  • Considerations- requires a list of candidate genes- usually close to 100% coverage- requires knowledge of the biological systems involved in disease
204
Q

How can WES be used for gene discovery?

A

Sequencing of all the coding regions of human DNA, ~20,000 genes and 1-2% of the total genome- most commonly used method- suitable for genetically diverse cases and/or multiple patterns of inheritance- no bias from using a candidate gene list as all genes are sequenced- cheaper than WGS, lower interpretation burden and less data to store- may miss variants from incomplete coverage e.g. GC rich, homologous or repetitive regions- less able to detect SVs than WGS- no coverage of non-coding regions (although variants in these regions are still a challenge to interpret)

205
Q

How can WGS be used for gene discovery?

A

Sequencing of the whole genome including non-coding regions- use statistical analysis and bioinformatic filtering to identify likely variants or genes- no bias, coverage of non-coding regions, better coverage of repetitive and GC rich regions, better detection of CNVs and SVs, can detect mosaic variants with sufficient coverage- but a huge amount of data is generated, which is costly to analyse- still has limited coverage of tandem repeats e.g. triplet repeats

206
Q

what are the analysis strategies for gene discovery using NGS

A
  1. filter out poor quality calls or those with a low minimum read depth (although this may mean that mosaics are missed)
  2. filter out population polys (gnomAD)- need to be aware of common variants in mendelian or cancer genes e.g. F508del in CF
  3. does it fit with the inheritance pattern seen in the family?
  4. predicted consequence- LOF, missense, splice site- variants can be filtered based on this prediction. Usually investigate LOF variants first as they are the most likely to have an impact on gene function
  5. specialist statistical tests to ID gene association with disease e.g. CAST (cohort allelic sums test) compares the total amount of variation in a gene between cases and controls
  6. systems/biological pathway analysis- genes can be filtered for those in known disease related pathways or those that interact with other genes previously associated with a similar phenotype
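Steps 1, 2 and 4 above can be sketched as a small filter over a list of variant calls. This is an illustrative sketch only: the field names, the thresholds (quality 30, depth 20, population allele frequency 0.001) and the set of LOF consequence labels are all assumptions chosen for the example, not values from these cards or from any particular pipeline.

```python
# Hypothetical consequence labels treated as loss of function for this example
LOF_CONSEQUENCES = {"stop_gained", "frameshift", "splice_donor", "splice_acceptor"}


def prioritise_variants(variants, min_qual=30, min_depth=20, max_pop_af=0.001):
    """Drop low quality, low depth and common variants, then list LOF variants first."""
    kept = [
        v for v in variants
        if v["qual"] >= min_qual          # step 1: call quality
        and v["depth"] >= min_depth       # step 1: read depth (may lose mosaics)
        and v["pop_af"] <= max_pop_af     # step 2: population polys
    ]
    # step 4: sorted() is stable, so LOF variants (key False) come before the rest
    return sorted(kept, key=lambda v: v["consequence"] not in LOF_CONSEQUENCES)


calls = [
    {"id": "v1", "qual": 50, "depth": 40, "pop_af": 0.0, "consequence": "missense"},
    {"id": "v2", "qual": 10, "depth": 40, "pop_af": 0.0, "consequence": "stop_gained"},
    {"id": "v3", "qual": 60, "depth": 35, "pop_af": 0.2, "consequence": "missense"},
    {"id": "v4", "qual": 55, "depth": 30, "pop_af": 0.0001, "consequence": "frameshift"},
]
print([v["id"] for v in prioritise_variants(calls)])
```

A fixed frequency cut-off like this would also discard common pathogenic alleles, which is the caveat noted in step 2, so a real workflow would need a way to retain known disease variants.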
207
Q

How does inheritance pattern influence the NGS testing strategy?

A

AR- sequence siblings to ID variants and parents to confirm. Expect homozygous variants in consanguineous families and compound het in non-consanguineous.
AD- het variant in affected but not unaffected family members. Can start with a trio, and mapping can reduce the number of individuals to test.
Dominant de novo- trio. Variant present in proband but not parents.
X-linked recessive- test affected male. Should not be present in unaffected males but will be in carrier females. The more distantly related the 2 males it is found in, the greater the probability that it is disease related.
Mosaic variants- compare variants in affected and unaffected tissue- this is usually sufficient for dn variants. Tissue of interest may not be accessible. Possible in skin e.g. for NF1
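The dominant de novo case (variant present in the proband but in neither parent) is the easiest of these strategies to express as a filter. A minimal sketch, assuming genotypes are stored as alternate allele counts (0, 1 or 2) per trio member; the data layout and variant names are hypothetical, and real trio analysis also has to check parental read depth so that a poorly covered site is not mistaken for a de novo call.

```python
def de_novo_candidates(trio_genotypes):
    """Return IDs of variants carried by the proband and absent from both parents.

    trio_genotypes maps variant ID -> (proband, mother, father) alt allele counts.
    """
    return [
        variant_id
        for variant_id, (proband, mother, father) in trio_genotypes.items()
        if proband > 0 and mother == 0 and father == 0
    ]


# Hypothetical trio calls
trio = {
    "varA": (1, 0, 0),  # de novo candidate
    "varB": (1, 1, 0),  # inherited from mother
    "varC": (2, 1, 1),  # homozygous in proband, both parents carriers (AR pattern)
    "varD": (0, 0, 1),  # not present in proband
}
print(de_novo_candidates(trio))
```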

208
Q

what are the genetic and experimental methods for validating novel genes?

A

Genetic- likely pathogenic variant which segregates with disease. Experimental- protein interacts with a previously implicated gene- has biological function consistent with pheno- expressed in relevant tissue and developmental time- can reproduce phenotype in model organism or rescue phenotype with WT

209
Q

what are the limitations to NGS based gene discovery?

A

Interpretation- difficult in non-coding regions- functional studies can be complex and expensive or require patient tissue that is not always available. Cohort size- often very few patients or small families available with a rare phenotype- highly accurate phenotyping required to combine cohorts- risk of incidental findings (WES/WGS)