LEC48: Applications of Next Gen Sequencing Flashcards Preview



how does sanger sequencing work

take DNA, compartmentalize/target regions of interest in genome

generate sequence-specific primers 

add DNA pol + normal bases + a small amount of terminator bases (dideoxynucleotides, ddNTPs)

these modified DNA bases lack 3'-OH on ribose moiety, so any replicating DNA chain into which they're added by DNA pol will be unable to be further extended

these terminator bases are added into individual elongating DNA molecules 

produces ladder of DNA chains of specific lengths 

each chain is tagged by terminator molecule w/ unique fluorescent molecule tag that's detected by reading device


how does next gen DNA sequencing process work

physically shear DNA, produce fragments 

size select fragments for enzymatic attachment of adaptor primers 

denature the DNA into single strands, adaptor primers used to capture individual fragments onto a sequencing matrix

amplify these molecules to make library, by PCR 

add DNA polymerase + modified bases that aren't extendable but are tagged w/ a fluorescent color, 1 for each base

generates a series of individual rxns, each located at a specific location on matrix and emitting a specific color, allowing ID of base added



what is massively parallel sequencing

once have done 1 round of next gen sequencing, 

wash, removing all reagents 

chemically modify, remove fluorescent tags, DNA can be elongated again

add second cycle of reagents

generates a 2nd piece of sequence info for each position

repeat this cycle multiple times, generate sequence data in parallel at massive scale 
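
The cycle above can be sketched as a toy simulation. This is only an illustration, not how a real instrument works: the function name and the template strings are made up, strand orientation is ignored, and "reading the color" is reduced to looking up the complementary base.

```python
# Toy model of cyclic sequencing with removable terminators (illustrative only).
# Each "cluster" is one template fixed at its own spot on the matrix.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_clusters(templates, n_cycles):
    """Read one base per cluster per cycle, for all clusters in parallel."""
    reads = ["" for _ in templates]
    for cycle in range(n_cycles):
        for i, template in enumerate(templates):
            if cycle < len(template):
                # The polymerase adds the tagged base complementary to the
                # template; its color identifies the base at this position.
                reads[i] += COMPLEMENT[template[cycle]]
        # Wash, remove tags so chains can be extended, then start next cycle.
    return reads


if __name__ == "__main__":
    # Hypothetical templates; each cycle adds one base to every read.
    print(sequence_clusters(["ACGT", "TTGA", "GGCC"], 3))
```

Each pass of the outer loop yields one more base for every cluster at once, which is where the "massively parallel" scale comes from.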


how are clusters visualized w/ next gen

measure dNTP-specific fluorescence at individual rxn centers b/c each point of light represents an individual sequencing rxn

this is advanced microscopy

this requires lots of computational power


once use next gen sequencing to sequence, how do you apply that info

massively parallel sequence data analysis: probabilistic approach, align fragments against reference genome and find variants w/in that fragment of DNA
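
A minimal sketch of the "find variants" step, assuming the reads are already aligned without gaps. The function name, the `(start, sequence)` read format, and both thresholds are assumptions for illustration; real variant callers use proper probabilistic models rather than fixed cutoffs.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=3, min_fraction=0.3):
    """Report positions where enough reads disagree with the reference.

    aligned_reads: list of (start, sequence) pairs, 0-based, no indels.
    Returns tuples of (position, ref_base, alt_base, alt_count, depth).
    """
    # Stack the bases seen at each reference position (a "pileup").
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to tell a variant from noise
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_fraction:
                variants.append((pos, reference[pos], base, n, depth))
    return variants


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = [(0, "ACGA"), (0, "ACGA"), (1, "CGTA"), (2, "GAAC"), (4, "ACGT")]
    print(call_variants(reference, reads))
```

Positions covered by fewer than `min_depth` reads are skipped, which mirrors the idea that apparent variation in thinly covered regions may just be noise.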


how can computer faster analyze next gen genome

1) split genome by chromosome, create many jobs

2) run jobs concurrently on diff cluster nodes 

3) combine results into single output for further analysis
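
The three steps above can be sketched as follows. This is a sketch under stated assumptions: the per-chromosome job (counting mismatches) is a hypothetical stand-in for real analysis, and a local thread pool stands in for separate cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def count_mismatches(job):
    """Hypothetical per-chromosome job: count positions differing from reference."""
    chrom, sample, reference = job
    n = sum(1 for s, r in zip(sample, reference) if s != r)
    return chrom, n


def analyze_genome(sample_by_chrom, reference_by_chrom, max_workers=4):
    # 1) split genome by chromosome, create one job per chromosome
    jobs = [
        (chrom, sample_by_chrom[chrom], reference_by_chrom[chrom])
        for chrom in sorted(sample_by_chrom)
    ]
    # 2) run jobs concurrently (threads here stand in for cluster nodes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(count_mismatches, jobs))
    # 3) combine results into a single output for further analysis
    return dict(results)


if __name__ == "__main__":
    reference = {"chr1": "ACGTACGT", "chr2": "GGGCCC"}
    sample = {"chr1": "ACGTTCGT", "chr2": "GGGCCC"}
    print(analyze_genome(sample, reference))
```

Splitting by chromosome works because each chromosome's job is independent, so the results can simply be merged at the end.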


what is the computational challenge re: input of next gen sequencing

input n bp long sequences from sample, as short reads 

map those back to reference genome to align them, map out genome 

can map each read back to a specific range of the reference

but repetitive regions are a problem for this b/c hard to unambiguously assign to reference genome
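
A small sketch of why repeats are a problem, using exact string matching as a stand-in for real alignment. The reference string and reads are invented, and real aligners tolerate mismatches and use indexed data structures rather than a linear scan.

```python
def map_read(read, reference):
    """Return every 0-based start position where the read matches exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical reference containing the repeat GATTACA twice.
    reference = "GATTACAGATTACACCGT"
    for read in ["ACCGT", "GATTACA"]:
        hits = map_read(read, reference)
        status = "unique" if len(hits) == 1 else "ambiguous or unmapped"
        print(read, hits, status)
```

A read that falls entirely inside a repeat matches more than one position, so it cannot be assigned unambiguously to one place in the reference.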


what are limitations to computational abilities of next gen

1) space needed is massive for storing image files and subsequent data 

2) processing power needed for aligning huge number of relatively short sequence fragments (reads) that're generated in order to ID positions w/ sequence variants (polymorphisms, mutations)

need high performance computer arrays, sophisticated computational algorithms to minimize the processing time needed to accomplish these tasks


sequence alignment vs database mapping?

sequence alignment: comparing 2 sequences of DNA 

database mapping: comparing many small sequences to one really big sequence


what happens w/ DNA fragment sequence once generated in next gen seq

must align sequence unambiguously to a specific chromosomal position 

use computer to generate best fits of each fragment to a genome region

highly repetitive regions are difficult to align well, aren't sequenced this way

also cannot align regions of genome which aren't efficiently amplified by PCR b/c of sequence composition (e.g. high GC content)
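
As a small illustration of the GC-content point, the fraction of G and C bases in a fragment can be computed directly. The function name and example sequences are made up, and no specific cutoff for "too GC-rich to amplify" is implied.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Hypothetical fragments: one balanced, one extremely GC-rich.
    for fragment in ["ATGCATGC", "GGCCGCGC"]:
        print(fragment, gc_content(fragment))
```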


what is the significance of variants in next gen sequencing?

if sample fragments have variation from reference genome, could be incorrect seq assignment, poor alignment, experimental noise if in region w/ few seq reads

HOWEVER if true variation, hard to interpret b/c of:

1) incomplete knowledge of fxn of all genes 

2) incomplete knowledge of range of tolerated variation in human populations 

3) incomplete knowledge of effect of individual amino acid changes on protein fxn

4) incorrect assignments of pathogenicity in current mutation databases


what is the exome

protein coding portion of the genome

better understood than regulatory regions of genome that're noncoding


why might do exome sequencing?

to reduce amount of variation to be interpreted in clinical seq sample for next gen seq

can capture specific DNA fragments representing the coding part of the genome, using specially designed primers that incorporate tag molecules 



how does exome sequencing work

what can it be used on?

primers confer specificity to your target 

primers incorporate tag molecule and use its physical characteristics to yield enrichment in DNA fragments of interest

can capture: whole exome, subset of known disease-associated genes (medical exome), or panel of genes for a specific condition (e.g. epilepsy)


efficacy of exome sequencing?

v helpful for clinical test especially in undiagnosed, mendelian disease - good for rare disease detection

has been used at Baylor CoM, 25% hit rate on first 250 exomes done



what are the possible applications of next gen seq?

1) transcription profiling 

2) undiagnosed diseases 

3) cancer

4) infectious diseases 



what info does NGS transcriptional profiling provide?

1) tissue-specific mRNA abundance, expression across tissues or pathologic states

2) alternative splicing events in normal and diseased tissue


describe process of NGS for transcriptional profiling

take total RNA 

fragment it 

create random hexamer primed cDNA 

map to reference genome

do gene function analysis 

understanding tissue-specific gene expression in healthy & disease states;

observe rare splicing events, better quantitation of expression
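
The quantitation step can be sketched as counting the reads that map to each gene and normalizing the counts. The example below uses transcripts per million (TPM), one common normalization, as an assumption; the deck itself does not name a method, and the gene names, counts, and lengths are invented.

```python
def tpm(read_counts, gene_lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    read_counts and gene_lengths_bp are dicts keyed by gene name.
    """
    # Reads per base: longer genes collect more reads at equal abundance.
    rates = {gene: read_counts[gene] / gene_lengths_bp[gene] for gene in read_counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no mapped reads to normalize")
    # Scale so that the values across all genes sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 100}     # hypothetical mapped read counts
    lengths = {"geneA": 1000, "geneB": 2000}  # hypothetical gene lengths in bp
    print(tpm(counts, lengths))
```

Both genes have the same raw count here, but geneA is half as long, so after length normalization it comes out twice as abundant.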


how can NGS help w/ undiagnosed disease?

patients who've exhausted medical testing and remain undiagnosed 

if ID underlying genetic basis of disease, may be beneficial b/c prognostic, diagnostic (genetic counseling in the family), or therapeutic (rarely)


how many variants does WGS produce? 

how do we analyze them, how does penetrance complicate analysis?

~4.8 million 

must define if they have small effect size, low penetrance, and thus are polymorphic in "normal" population - requires DB w/ large control pop

or if large effect size, high penetrance, and variant is de novo in proband, or inherited from a phenotypically normal parent - presumes variant's fully penetrant


how are variants studied?

filtration of variants by polymorphic frequency, false positives, inheritance models
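
A minimal sketch of this kind of filtering. The field names (`pop_freq`, `quality`, `in_mother`, `in_father`) and the cutoffs are hypothetical, chosen only to show the three filters named above; they are not taken from any particular pipeline.

```python
def filter_variants(variants, max_pop_freq=0.01, min_quality=30, require_de_novo=False):
    """Keep variants that pass frequency, quality, and inheritance filters."""
    kept = []
    for v in variants:
        if v["pop_freq"] > max_pop_freq:
            continue  # common in control population: likely a polymorphism
        if v["quality"] < min_quality:
            continue  # low-confidence call: likely a false positive
        if require_de_novo and (v["in_mother"] or v["in_father"]):
            continue  # inherited, so it fails a de novo inheritance model
        kept.append(v)
    return kept


if __name__ == "__main__":
    # Hypothetical variant records for one proband.
    variants = [
        {"id": "v1", "pop_freq": 0.20, "quality": 50, "in_mother": False, "in_father": False},
        {"id": "v2", "pop_freq": 0.0, "quality": 10, "in_mother": False, "in_father": False},
        {"id": "v3", "pop_freq": 0.0, "quality": 60, "in_mother": True, "in_father": False},
        {"id": "v4", "pop_freq": 0.001, "quality": 45, "in_mother": False, "in_father": False},
    ]
    print([v["id"] for v in filter_variants(variants, require_de_novo=True)])
```

The inheritance filter shown assumes the variant is fully penetrant, the same presumption noted for de novo analysis in the WGS card.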


what are limitations of NGS technology?

1) false-positive & false-negative variant calls increase w/ size of sequenced target 

2) much variability among datasets in SNV and indel calls

3) sequence-specific limitations- highly GC or AT rich regions don't amplify well & extended repetitive sequence runs won't assemble, sequence well


how is NGS used to study cancer

1) study underlying disease biology - demonstrate clonal evolution of relapsed cancer 

2) make treatment decisions based on ID of specific driver mutations that might be amenable to targeted therapies

3) RNA seq and look at epigenome, which is helpful for therapy


how is NGS used to study ID

rapid ID of microbial species from epidemic outbreaks, e.g. Haitian cholera outbreak after the earthquake

reconstructed phylogenetic relationships among strains of pathogen 

can sequence a CSF sample, identify organism 


how can genomic data be integrated into clinical practice?

personalized medicine 

apply genetic data in clinical practice - use whole exome data to ID individuals who carry genomic variants that confer specific disease susceptibilities 

more possible as prices for genomic sequencing decline



Does NGS provide a comprehensive look at the entire genome? Why or why not? 

Yes, you fragment the entire genome for NGS whereas w/ Sanger technique, target a part of the genome for study

Here use special DNA adaptors, attached to all fragments, that allow the entire genome to be amplified for study


Would NGS provide data on mitochondrial DNA mutations? 

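Not by default w/ nuclear exome capture, because mitochondria have their own genome that isn't targeted; whole-genome NGS of total DNA does include mitochondrial DNA reads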


How is quantitation of gene expression accomplished by NGS? 

Massive computing

Align cDNA fragments (reads) to the reference genome

Count the reads mapping to each gene; the read frequency, compared between patient and reference samples, reflects mRNA abundance


Which types of mutations are most likely to be unambiguously associated w/ clinical disease states? 

Mutations in coding regions (the exome), and splicing changes seen on RNA analysis



List 4 challenges associated with classifying variants as pathogenic or benign 

  1. the reference genomes we have are incomplete
  2. apparent variants may be misreads or computing errors rather than true variants
  3. variants are not necessarily pathogenic, they could just be normal variation; we have more info on some populations and subpopulations than others, making this especially a challenge in understudied populations
  4. cannot know if variant is de novo in the proband or inherited from a phenotypically normal parent
  5. penetrance hard to classify w/ NGS

