how does sanger sequencing work
take DNA, compartmentalize/target regions of interest in genome
generate sequence-specific primers
add DNA pol, normal dNTPs + small amount of modified terminator bases (ddNTPs)
these modified DNA bases lack 3'-OH on ribose moiety, so any replicating DNA chain into which they're added by DNA pol will be unable to be further extended
these terminator bases are added into individual elongating DNA molecules
produces ladder of DNA chains of specific lengths
each chain is tagged by terminator molecule w/ unique fluorescent molecule tag that's detected by reading device
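A minimal Python sketch of the ladder idea above. It assumes a toy template, ignores strand orientation, and uses made-up function names (not from any sequencing software): one chain stops at every position, and sorting chains by length while reading the terminal dye gives the sequence.

```python
# Toy model of Sanger chain termination (all names here are illustrative).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_ladder(template):
    """Return one terminated chain per template position.

    Each chain is (length, terminal_base); the terminal base stands for
    the dye-labelled terminator that stopped extension at that length.
    """
    # The new strand is the base-by-base complement of the template
    # (5'/3' orientation is ignored in this toy model).
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]

def read_ladder(ladder):
    """Sort chains by length (as size separation does) and read the dyes."""
    return "".join(base for _, base in sorted(ladder))

ladder = sanger_ladder("TACGGT")
print(read_ladder(ladder))
```

The order of chains in the tube does not matter; only length and dye color are used to recover the sequence.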
how does next gen DNA sequencing process work
physically shear DNA, produce fragments
size select fragments for enzymatic attachment of adaptor primers
denature the DNA into single strands, adaptor primers used to capture individual fragments onto a sequencing matrix
amplify these molecules to make library, by PCR
add DNA polymerase + modified bases that aren't extendable but are tagged w/ a fluorescent color, 1 for each base
generates the series of individual rxns that're located at a specific location on matrix and emit a specific color, allow ID of base added
what is massively parallel sequencing
once have done 1 round of next gen sequencing,
wash, removing all reagents
chemically modify, remove fluorescent tags, DNA can be elongated again
add second reagents cycle
generates a 2nd piece of sequence info for each position
repeat this cycle multiple times, generate sequence data in parallel at massive scale
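A rough Python sketch of the cycle described above, assuming a toy matrix of two clusters. The dye colors and all names are hypothetical; real chemistry and imaging are far more involved. Each cycle adds one tagged base per cluster, the color is read, and the wash step lets the next cycle run.

```python
# Toy sequencing-by-synthesis run; dye colors and names are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}

def run_cycles(clusters, n_cycles):
    """clusters maps a matrix position to the template fragment held there.

    Returns the read built up at every position after n_cycles cycles.
    """
    reads = {position: "" for position in clusters}
    for cycle in range(n_cycles):
        # One cycle: every cluster incorporates exactly one tagged base.
        for position, template in clusters.items():
            if cycle >= len(template):
                continue  # this fragment has already been read to its end
            added = COMPLEMENT[template[cycle]]
            color = DYE_COLOR[added]               # what the imager sees
            reads[position] += COLOR_TO_BASE[color]  # base call from color
        # Wash and tag-removal happen here; nothing to simulate in this toy.
    return reads

clusters = {(0, 0): "ACGT", (0, 1): "TTGA"}
print(run_cycles(clusters, 3))
```

All clusters advance together in each cycle, which is where the "massively parallel" part comes from.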
how are clusters visualized w/ next gen
measure dNTP-specific fluorescence at individual rxn centers b/c each piece of light represents an individual sequencing rxn
this is advanced microscopy
this requires lots of computational power
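One way to picture the computational step: for every rxn center and every cycle, pick the brightest of four color channels. A hedged Python sketch follows; the channel order, the `min_ratio` cutoff, and the function names are assumptions for illustration only.

```python
# Toy base caller: choose the brightest of four dye channels per cycle.
CHANNELS = ("A", "C", "G", "T")  # assumed channel order

def call_base(intensities, min_ratio=2.0):
    """Return the base with the strongest signal, or 'N' if ambiguous.

    intensities: four non-negative numbers in CHANNELS order.
    min_ratio: hypothetical cutoff; the top signal must be at least this
    many times the runner-up to be called.
    """
    ranked = sorted(zip(intensities, CHANNELS), reverse=True)
    (top, base), (second, _) = ranked[0], ranked[1]
    if top <= 0:
        return "N"  # no signal at all
    if second > 0 and top / second < min_ratio:
        return "N"  # two channels too close to separate
    return base

def call_cluster(cycles):
    """Call one base per cycle for a single cluster."""
    return "".join(call_base(intensities) for intensities in cycles)

signal = [(0.9, 0.1, 0.05, 0.02), (0.0, 0.1, 0.8, 0.1), (0.5, 0.45, 0.0, 0.0)]
print(call_cluster(signal))
```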
once use next gen sequencing to sequence, how do you apply that info
massively parallel sequence data analysis: probabilistic approach, align fragments against reference genome and find variants w/in that fragment of DNA
how can computer faster analyze next gen genome
1) split genome by chromosome, create many jobs
2) run jobs concurrently on diff cluster nodes
3) combine results into single output for further analysis
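The three steps above can be sketched in Python w/ the standard library. This is only an illustration: threads on one machine stand in for cluster nodes, and `analyze_chromosome` is a hypothetical placeholder job that just counts reads.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_chromosome(name, reads):
    """Placeholder per-chromosome job; here it only counts the reads."""
    return name, len(reads)

def analyze_genome(reads_by_chrom, max_workers=4):
    """Split by chromosome, run jobs concurrently, combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # 1) one job per chromosome, 2) submitted to run concurrently
        futures = [
            pool.submit(analyze_chromosome, name, reads)
            for name, reads in reads_by_chrom.items()
        ]
        # 3) gather every job's result into a single output
        return dict(future.result() for future in futures)

reads_by_chrom = {"chr1": ["ACGT", "TTGA", "GGCA"], "chr2": ["CCAT"]}
print(analyze_genome(reads_by_chrom))
```

On a real cluster the same split/run/combine pattern would typically go through a job scheduler rather than a thread pool.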
what is the computational challenge re: input of next gen sequencing
input n bp long sequences from sample, as short reads
map those back to reference genome to align them, map out genome
can map each read back to a specific range of the reference
but repetitive regions are a problem for this b/c hard to unambiguously assign to reference genome
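A small Python sketch of why repeats are a problem, assuming exact matching on a made-up reference string (real aligners tolerate mismatches and use indexes). A read that occurs once maps unambiguously; a read from a repeat has several equally good positions.

```python
def map_read(read, reference):
    """Return every 0-based position where read matches reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions

def describe(read, reference):
    """Classify a read as unique, ambiguous, or unmapped."""
    hits = map_read(read, reference)
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "ambiguous"

reference = "ACGTACGTTTGACCA"  # toy reference containing the repeat ACGT
for read in ("TTGACC", "ACGT", "GGGG"):
    print(read, map_read(read, reference), describe(read, reference))
```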
what are limitations to computational abilities of next gen
1) space needed is massive for storing image files and subsequent data
2) processing power needed for aligning huge number of relatively short sequence fragments (reads) that're generated in order to ID positions w/ sequence variants (polymorphisms, mutations)
need high performance computer arrays, sophisticated computational algorithms to minimize the processing time needed to accomplish these tasks
sequence alignment vs database mapping?
sequence alignment: comparing 2 sequences of DNA
database mapping: comparing many small sequences to one really big sequence
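For the "comparing 2 sequences" case, a common textbook approach is global alignment scoring (Needleman-Wunsch). The Python sketch below is a generic version w/ assumed scores (match +1, mismatch -1, gap -1), not the method of any particular tool; database mapping repeats this kind of comparison for many short reads against one big sequence, usually w/ an index to speed it up.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b.

    Dynamic programming over a grid, keeping only one row at a time.
    """
    # Row 0: aligning an empty prefix of a against b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against empty b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(alignment_score("ACGT", "ACGT"))
print(alignment_score("ACGT", "AGT"))
```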
what happens w/ DNA fragment sequence once generated in next gen seq
must align sequence unambiguously to a specific chromosomal position
use computer to generate best fits of each fragment to a genome region
highly repetitive regions are difficult to align well, aren't sequenced this way
also cannot align regions of genome which aren't efficiently amplified by PCR b/c of sequence composition (e.g. high GC content)
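GC content is simple to compute, so regions likely to amplify poorly can be flagged ahead of time. A hedged Python sketch follows; the window size and the 25%/75% cutoffs are arbitrary illustrative values, not established thresholds.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def flag_extreme_windows(seq, window=10, low=0.25, high=0.75):
    """Return (start, gc_fraction) for non-overlapping windows outside [low, high].

    window, low, and high are hypothetical values chosen for the example.
    """
    flagged = []
    for start in range(0, len(seq) - window + 1, window):
        fraction = gc_content(seq[start:start + window])
        if fraction < low or fraction > high:
            flagged.append((start, fraction))
    return flagged

toy = "GCGCGCGCGC" + "ATGCATGCAT" + "ATATATATAT"
print(flag_extreme_windows(toy))
```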
what is the significance of variants in next gen sequencing?
if sample fragments have variation from reference genome, could be incorrect seq assignment, poor alignment, experimental noise if in region w/ few seq reads
HOWEVER if true variation, hard to interpret b/c of:
1) incomplete knowledge of fxn of all genes
2) incomplete knowledge of range of tolerated variation in human populations
3) incomplete knowledge of effect of individual amino acid changes on protein fxn
4) incorrect assignments of pathogenicity in current mutation databases
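Before interpretation, the first question is whether a difference from reference is real or noise. A minimal Python sketch, assuming a toy pileup of bases seen at one position; the depth and allele-fraction cutoffs are hypothetical, and real callers use probabilistic models.

```python
from collections import Counter

def call_position(ref_base, observed, min_depth=10, min_alt_fraction=0.2):
    """Classify one genome position from the bases observed in its reads.

    min_depth and min_alt_fraction are illustrative cutoffs only.
    """
    depth = len(observed)
    if depth < min_depth:
        return "low coverage"  # too few reads to trust any difference
    alt_counts = Counter(base for base in observed if base != ref_base)
    if not alt_counts:
        return "reference"
    alt, count = alt_counts.most_common(1)[0]
    if count / depth >= min_alt_fraction:
        return f"variant {ref_base}>{alt}"
    return "likely noise"  # a few stray non-reference reads

print(call_position("A", list("AAAAAGGGGG")))
print(call_position("A", list("AAG")))
print(call_position("A", list("A" * 19 + "T")))
```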
what is the exome
protein coding portion of the genome
better understood than regulatory regions of genome that're noncoding
why might do exome sequencing?
to reduce amount of variation to be interpreted in clinical seq sample for next gen seq
can capture specific DNA fragments representing the coding part of the genome, using specially designed primers that incorporate tag molecules
how does exome sequencing work
what can it be used on?
primers confer specificity to your target
primers incorporate tag molecule and use its physical characteristics to yield enrichment in DNA fragments of interest
can capture: whole exome, subset of known disease-associated genes (medical exome), or panel of genes for a specific condition (e.g. epilepsy)
efficacy of exome sequencing?
v helpful for clinical test especially in undiagnosed, mendelian disease - good for rare disease detection
has been used at Baylor CoM, 25% hit rate on first 250 exomes done
what are the possible applications of next gen seq?
1) transcription profiling
2) undiagnosed diseases
3) cancer
4) infectious diseases
what info does NGS transcriptional profiling provide?
1) tissue-specific mRNA abundance, expression across tissues or pathologic states
2) alternative splicing events in normal and diseased tissue
describe process of NGS for transcriptional profiling
take total RNA
create random hexamer primed cDNA
map to reference gene
do gene function analysis
understanding tissue-specific gene expression in healthy & disease states;
observe rare splicing events, better quantitation of expression
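Quantitation of expression comes down to counting reads per gene and normalizing for total reads. A small Python sketch, assuming reads have already been mapped to genes; the gene names are made up, and counts-per-million is one common normalization among several.

```python
from collections import Counter

def counts_per_million(read_gene_assignments):
    """Turn a list of gene names (one per mapped read) into counts per million.

    Normalizing by total mapped reads makes samples of different
    sequencing depth comparable.
    """
    counts = Counter(read_gene_assignments)
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}

# Hypothetical assignments: 3 reads map to GENE_A, 1 read maps to GENE_B.
mapped = ["GENE_A", "GENE_A", "GENE_B", "GENE_A"]
print(counts_per_million(mapped))
```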
how can NGS help w/ undiagnosed disease?
patients who've exhausted medical testing and remain undiagnosed
if ID underlying genetic basis of disease, may be beneficial b/c prognostic, diagnostic (genetic counseling in the family), or therapeutic (rarely)
how many variants does WGS produce?
how do we analyze them, how does penetrance complicate analysis?
must define if they have small effect size, low penetrance, and thus are polymorphic in "normal" population - requires DB w/ large control pop
or if large effect size, high penetrance, and variant is de novo in proband, or inherited from a phenotypically normal parent - presumes variant's fully penetrant
how are variants studied?
filtration of variants by polymorphic frequency, false positives, inheritance models
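A hedged Python sketch of this kind of filtration, assuming each variant is a dict w/ hypothetical fields (`id`, `pop_freq`, `in_mother`, `in_father`) and a de novo inheritance model; the 1% frequency cutoff is illustrative only.

```python
def filter_de_novo(variants, max_pop_freq=0.01):
    """Keep rare variants that are absent from both parents.

    variants: list of dicts with hypothetical keys
      'id', 'pop_freq', 'in_mother', 'in_father'.
    max_pop_freq: illustrative cutoff for "polymorphic in normal population".
    """
    kept = []
    for variant in variants:
        if variant["pop_freq"] > max_pop_freq:
            continue  # common in controls, treated as a polymorphism
        if variant["in_mother"] or variant["in_father"]:
            continue  # inherited, so it does not fit a de novo model
        kept.append(variant["id"])
    return kept

trio = [
    {"id": "v1", "pop_freq": 0.20, "in_mother": True, "in_father": False},
    {"id": "v2", "pop_freq": 0.0001, "in_mother": False, "in_father": False},
    {"id": "v3", "pop_freq": 0.0005, "in_mother": False, "in_father": True},
]
print(filter_de_novo(trio))
```

A recessive or incompletely penetrant model would need different rules, which is exactly where penetrance complicates the analysis.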
what are limitations of NGS technology?
1) false-positive & false-negative variant calls increase w/ size of sequenced target
2) much variability among datasets in SNV and indel calls
3) sequence-specific limitations- highly GC or AT rich regions don't amplify well & extended repetitive sequence runs won't assemble, sequence well
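Point 1 is simple arithmetic: at a fixed per-base error rate, a bigger target means more wrong calls. A Python sketch w/ assumed numbers: the error rate of 1 per million bases is hypothetical, and the target sizes are rough figures (exome about 30 million bp, genome about 3 billion bp).

```python
def expected_false_calls(target_bp, per_base_error_rate):
    """Expected number of erroneous calls = target size x per-base error rate."""
    return target_bp * per_base_error_rate

ERROR_RATE = 1e-6  # hypothetical: 1 wrong call per million bases

# Approximate target sizes in base pairs (rough figures for illustration).
targets = {"gene panel": 500_000, "exome": 30_000_000, "genome": 3_000_000_000}

for name, size in targets.items():
    print(name, round(expected_false_calls(size, ERROR_RATE), 1))
```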
how is NGS used to study cancer
1) study underlying disease biology - demonstrate clonal evolution of relapsed cancer
2) make treatment decisions based on ID of specific driver mutations that might be amenable to targeted therapies
3) RNA seq and look at epigenome, which is helpful for therapy
how is NGS used to study ID
rapid ID of microbial species from epidemic outbreaks, e.g. Haitian cholera outbreak after the earthquake
reconstructed phylogenetic relationships among strains of pathogen
can sequence a CSF sample, identify organism
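Relationships among strains start from how many positions differ between their sequences. A toy Python sketch, assuming pre-aligned sequences of equal length; the strain names and sequences are invented, and real phylogenetics uses proper tree-building methods.

```python
from itertools import combinations

def snp_distance(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def closest_pair(strains):
    """Return the pair of strain names with the fewest differences."""
    return min(
        combinations(sorted(strains), 2),
        key=lambda pair: snp_distance(strains[pair[0]], strains[pair[1]]),
    )

# Hypothetical strains and sequences, for illustration only.
strains = {
    "outbreak_isolate": "ACGTTGCA",
    "strain_X": "ACGTTGCT",
    "strain_Y": "TCGATGGA",
}
print(closest_pair(strains))
```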
how can genomic data be integrated into clinical practice?
apply genetic data in clinical practice - use whole exome data to ID individuals who carry genomic variants that confer specific disease susceptibilities
more possible as prices for genomic sequencing decline
Does NGS provide a comprehensive look at the entire genome? Why or why not?
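Mostly yes: you fragment the entire genome for NGS whereas w/ Sanger technique, target a part of the genome for study
Here DNA adaptors are attached to all fragments so the entire genome can be captured and amplified for study
but highly repetitive and very GC- or AT-rich regions still aren't sequenced well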
Would NGS provide data on mitochondrial DNA mutations?
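Yes if sequencing total DNA: mitochondria have their own genome, but it is sheared and sequenced along w/ nuclear DNA and reads can be aligned to the mitochondrial reference sequence; exome capture only covers it if the capture design targets mtDNA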
How is quantitation of gene expression accomplished by NGS?
Align reads from cDNA (made from RNA) to the reference genome
Count reads mapping to each gene; read count reflects mRNA abundance, compare counts between tissues or disease states
Which types of mutations are most likely to be unambiguously associated w/ clinical disease states?
Changes in coding regions (the exome), e.g. mutations that alter the protein sequence or splicing changes seen on RNA analysis
List 4 challenges associated with classifying variants as pathogenic or benign
- the reference genomes we have are incomplete
- apparent variants may be sequencing or alignment errors (misreads by the computer) rather than true variants
- variants are not necessarily pathogenic, they could just be normal variation; we have more info on some populations and subpopulations than others, which makes this especially a challenge in understudied populations
- cannot know if variant is de novo in the proband or inherited from a phenotypically normal parent
- penetrance hard to classify w/ NGS