LEC43: Intro to Genomics Flashcards Preview


Flashcards in LEC43: Intro to Genomics Deck (40):

what does every human cell contain?

complete genetic code


when are chromosomes visualizable?

during mitosis, when chromosomes condense 


what was first genome screen technology?

karyotyping by G-banding 



what was reference human genome, its use?

traditional sequencing method: took DNA from individual, arranged into pieces of chromosomes, chopped up, individually sequenced, stitched back together, compared to reference human genome 

= sanger sequencing

costly and slow, not method anymore


what did human genome project do?

sequence entirety of human genome 

took ~13 yrs (1990-2003), ~$3 billion

info acquired allowed cataloging of complete set of human genes 


what did human genome proj find?

complete set of human genes = similar in number to that observed in other model organisms like mouse, thale cress plant (Arabidopsis), roundworm

explanation: complexity in mammals due to alternative splicing, permitting increased number of potential proteins


how much of our DNA is protein coding genes?

only ~1.5%



what is difference between number of genes, gene density, in humans vs plants?

gene number is similar

but genome size is much greater in humans

thus humans' avg gene density is much lower; only 1.5% of our DNA encodes proteins

humans' greater complexity despite a similar gene count is attributed to alternative splicing


what is result of alternative splicing?

from 1 single gene, exons' arrangement can be different, get different resulting proteins!

only 1.5% of our DNA is protein coding, though
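A minimal sketch of the idea above, assuming a hypothetical gene with four made-up exons and a simplified model in which the first and last exons are always kept and internal exons may be skipped (real splicing is both more constrained and more varied):

```python
from itertools import combinations

def splice_isoforms(exons):
    """Return every exon-skipping isoform that keeps the first and last exon.

    Simplified model: exon order is preserved and only internal exons
    may be skipped. Needs at least two exons.
    """
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append("".join([first, *kept, last]))
    return isoforms

# Hypothetical exon sequences of one gene (not real sequence data)
exons = ["ATGGCC", "GATTAC", "CCGGAA", "TTTTAA"]
isoforms = splice_isoforms(exons)
print(len(isoforms), "different mRNAs from 1 gene")
for mrna in isoforms:
    print(mrna)
```

Each extra skippable exon doubles the number of possible transcripts in this toy model, which is one way a modest gene count can still yield many proteins.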


how much of non-gene DNA is conserved? 

2-5% of non-gene DNA is conserved through evolution 



if a piece of DNA is conserved, what does that suggest?

that it's important 

basis for idea that there's functionality among non-gene portions of our DNA that've been conserved through ages/across animals

suggests these regions have important regulatory role in genome function


HOXD gene cluster function?

basic body patterning control 

example of conserved region of essential proteins that regulate genome function


how much of our genome is repeat elements?

what are they relics of?


relics of retroviruses (viruses of the same type as HIV) and 'genomic parasites' that invaded our DNA in evolutionary history; often called 'junk DNA'


what causes finger webbing?

mutation in HOXD gene cluster, as HOXD genes encode basic body patterning



segmental duplications?

blocks of DNA 1-500 kb in length that occur at multiple sites in the genome, share a high level of sequence identity

~5% of our DNA 

can be intrachromosomal (same chromosome) duplications or interchromosomal (between chromosomes)


what role do segmental duplications play in genetic disease?

these large, highly identical repeats often flank certain regions of the genome that are thus prone to misalignment during meiosis, leading to improper recombination

if genes in the flanked region are dosage sensitive, the resulting genomic deletions and/or duplications are associated w/ a particular genetic disease


examples of recurrent genomic disorders? 

velocardiofacial syndrome 

angelman/prader-willi syndrome 

charcot-marie-tooth disease

x-linked hemophilia 

all caused by mechanism of recombination between large high-identity repeats


recurrent deletion on chromosome 15 causes what/example of what?

causes intellectual disability, dysmorphisms, epilepsy 

deletion = most common known genetic cause of epilepsy, present in ~1% of epilepsy patients 

example of recurrent genomic disorder caused by aberrant recombination between large high-identity repeats


how many bases of difference exist between 2 individuals?

avg ~6 million bases, ~0.1% of genome 

means 99.9% shared DNA among humans


what are the types of variation in the human genome? from smallest to largest

1) single base-pair changes - point mutations/SNVs/SNPs

2) small insertions/deletions & microsatellites

3) mobile elements - retroelement insertions (300 bp - 10kb in size) 

4) large-scale genomic variation (>1 kb) - deletions, duplications

5) chromosomal variation - translocations, inversions, trisomy


most common type of genetic variant?

SNVs = single nucleotide variants (also called SNPs, single nucleotide polymorphisms, or point mutations)

occurs 1x every 1,000 bp = 3-5 million SNVs in individual genome




where do SNVs usually occur?

most in non-coding regions - may have regulatory effects, but not well understood

however, 10,000 per genome (0.3%) are in coding regions, & cause changes in protein sequence 
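The numbers on these cards can be checked with quick arithmetic. A small sketch, assuming a haploid genome size of ~3 billion bp (a round figure not stated on the card):

```python
GENOME_BP = 3_000_000_000   # assumption: ~3 billion bp haploid genome
SNV_EVERY_BP = 1_000        # from the card: 1 SNV per 1,000 bp
CODING_SNVS = 10_000        # from the card: coding SNVs per genome

total_snvs = GENOME_BP // SNV_EVERY_BP
coding_fraction = CODING_SNVS / total_snvs

print(f"expected SNVs per genome: {total_snvs:,}")
print(f"fraction in coding regions: {coding_fraction:.1%}")
print(f"fraction outside coding regions: {1 - coding_fraction:.1%}")
```

This gives the low end of the 3-5 million range and the ~0.3% coding share quoted above.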



what do SNPs within coding regions cause?

sometimes, no change, since the genetic code is redundant (several codons encode the same a.a.)

sometimes, changes amino acid, different protein results
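Both outcomes can be illustrated with the standard genetic code. A minimal sketch; the function name and the example codons are chosen here for illustration, and a change to a stop codon ('*') is simply reported as a change:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def coding_snv_effect(codon, position, new_base):
    """Classify a single-base change within one codon (position is 0, 1, or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    label = "no change (synonymous)" if before == after else "amino acid change"
    return label, before, after

print(coding_snv_effect("GAA", 2, "G"))  # GAA -> GAG, both encode glutamate (E)
print(coding_snv_effect("GAG", 1, "T"))  # GAG -> GTG, glutamate (E) becomes valine (V)
```

Changes at the third codon position are the ones most often absorbed by the redundancy of the code.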


what do SNPs outside of coding regions cause?

how much of SNPs are outside of genes?

can influence disease by altering gene regulation 

e.g. if a ntd changes within a txn factor binding site, the txn factor may not recognize the site, may not bind to DNA, no activation occurs, gene may be OFF when should be ON

99.7% of SNPs are outside of genes


what does microarray on SNP chips show?

useable to genotype millions of SNPs in a single experiment 

can find identity of a base pair at an SNP

fluorescently labeled DNA is hybridized to an array of probes immobilized on a glass slide that bind either to normal or variant DNA



how does array CGH work?

label a control and patient DNA w/ different fluorescent dyes

cohybridize them together onto a slide that contains DNA corresponding to different parts of the genome

fluorescently labeled DNA hybridizes to the slide

scan it, get image 

YELLOW indicates equal signal from both samples: no gain (duplication) or loss (deletion) at that spot on the array

however, if one sample's dye color dominates at a spot, that sample has extra copies there, e.g. a duplication in the patient's DNA at that position
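The color readout is usually summarized as a log2 ratio of the two dye intensities. A rough sketch of that idea; the 0.3 threshold and the intensity values are illustrative assumptions, not values from the lecture:

```python
import math

def call_copy_number(patient_signal, control_signal, threshold=0.3):
    """Call gain/loss at one array spot from the two dye intensities.

    A log2 ratio near 0 means equal signal (the "yellow" spots).
    The threshold is an assumed cutoff for illustration only.
    """
    log_ratio = math.log2(patient_signal / control_signal)
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "no change"

# Hypothetical intensities, proportional to copy number (patient, control)
print(call_copy_number(2.0, 2.0))  # 2 vs 2 copies
print(call_copy_number(3.0, 2.0))  # 3 vs 2 copies: duplication in patient
print(call_copy_number(1.0, 2.0))  # 1 vs 2 copies: deletion in patient
```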


what does array CGH enable?

detection of copy number changes that're too small to be seen by karyotyping


what do different chip/microarray tests give info about?

statistical testing for association between diseases of interest and SNPs at specific chromosomal locations

DNA copy number across genome 

detection of sub-microscopic gains or losses of material for rare conditions and common conditions alike 

duplications of genomic regions that're associated w/ protection from disease



how can array-based technology be used to inform diagnosis and treatment of cancer?

take cancerous tumor DNA and match to control DNA from same person's blood or non-tumorous site 

compare the 2 to see what happened in tumor 

see where chromosomes have extra copies, see deletion of known tumor suppressor

see massive amplification of EGFR gene region (a growth-promoting gene) and deletion of tumor suppressor genes; can then develop drugs to inhibit the amplified gene, b/c amplification is clearly a key event in tumorigenesis



what are tandem repeats?

serial repetition of a short motif, e.g. 2 bases (acacacacac...)

inherently unstable highly repetitive sequences

are rich source of variation in genome b/c polymerase working on DNA at repeat site will add or delete copies of repeats 

highly variable regions btwn individuals


what are triplet repeat expansions associated w?

neurological diseases 



what is cause of fragile x?

CGG motif repeat has 5-50 copies in healthy individuals; in premutation carriers, ~50-200 copies; in patient w/ fragile X, hundreds/thousands of repeats

this switches off nearby gene, causes disease 

causes breakage of chromosome, making DNA polymerase unable to replicate 

causes mental retardation w/ distinct dysmorphic features, accompanied by a 'fragile site' on X chromosome (= original name)
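Counting the repeat copies in a sequence is straightforward. A small sketch using the rounded ranges from this card (5-50, 50-200, >200); the function names, range labels, and the example sequence are made up for illustration:

```python
import re

def longest_cgg_run(sequence):
    """Return the copy number of the longest uninterrupted CGG tandem repeat."""
    runs = re.findall(r"(?:CGG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_cgg_repeats(n_repeats):
    """Bin a repeat count using the card's rounded ranges."""
    if n_repeats > 200:
        return "full expansion (fragile X range)"
    if n_repeats > 50:
        return "expanded range (50-200 copies)"
    return "typical range"

# Hypothetical sequence: 30 CGG copies, a spacer, then 4 more copies
seq = "AT" + "CGG" * 30 + "TTA" + "CGG" * 4
n = longest_cgg_run(seq)
print(n, classify_cgg_repeats(n))
```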


what can a large tandem repeat contain?

entire genes 

may be true for genes present in multiple copies, e.g. salivary amylase


are genomic duplication regions ever protective from disease?


yes, e.g. chemokine CCL3L1, inflammatory signaling molecule

it's a binding partner (ligand) of CCR5, major receptor molecule for HIV cell entry

more copies of CCL3L1 gene is inversely correlated w/ susceptibility to HIV infection 


is complete personal genome sequencing expensive?

no! quick and cheap now


what is focus of next generation sequencing?

whereas old sanger sequencing focused on 1 gene at a time, 

next gen sequencing is massively parallel: many DNA fragments are sequenced at once, yielding far more data simultaneously



describe process of next generation genome sequencing

1) extract genomic DNA 

2) shear DNA into small 200-500 ntd pieces 

3) ligate adaptors to ends of fragments 

4) enrich and amplify library by PCR 

5) sequence on microscopic scale, from adaptor w/ platform

wash through w/ bases that fluoresce differently; each cluster of DNA will fluoresce

measure that fluorescence or electrochemical energy, determine which base was added during each step of DNA synthesis rxn


describe whole genome shotgun sequencing

can stitch back together fragments of DNA by mapping onto reference human genome

due to random nature of sequences, depth of coverage at any 1 place in genome is variable 

reads also contain errors (1%)

therefore need high redundancy to generate high-quality gap-free sequence (~20x coverage or more)
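Why redundancy helps can be sketched with the usual Poisson approximation for randomly placed reads. The read length (150 bp) and genome size (3 billion bp) are assumed round numbers, not values from the card:

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

def prob_depth_at_most(k, coverage):
    """P(depth <= k) at a random position, Poisson approximation."""
    return sum(
        math.exp(-coverage) * coverage**i / math.factorial(i)
        for i in range(k + 1)
    )

# Assumed run: 400 million reads of 150 bp over a 3 billion bp genome
cov = mean_coverage(400_000_000, 150, 3_000_000_000)
print(f"mean coverage: {cov:.0f}x")
print(f"fraction of positions with no reads: {prob_depth_at_most(0, cov):.1e}")
print(f"fraction of positions with depth < 10: {prob_depth_at_most(9, cov):.3f}")
```

Even at a healthy average depth, some positions end up with far fewer reads than the mean, which is the variability the card describes.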



what is imperfect about whole genome shotgun sequencing?

random errors in sequencing occur

thus, when a base is mismatched, cannot know for certain whether it is a true heterozygous SNV or a random sequencing error

so SNV calling in genome sequencing is a probabilistic exercise
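A toy version of this probabilistic reasoning, assuming a simple binomial read model, the ~1% per-base error rate quoted earlier, and equal prior odds for each genotype (real variant callers use more elaborate models and priors):

```python
from math import comb

def genotype_posteriors(depth, alt_reads, error_rate=0.01):
    """Posterior probability of each genotype given reads at one position.

    Assumptions for illustration: reads are independent, each genotype
    is equally likely a priori, and errors always produce the alt base.
    """
    # Probability that a single read shows the alternate base
    p_alt = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1 - error_rate}
    likelihoods = {
        genotype: comb(depth, alt_reads) * p**alt_reads * (1 - p) ** (depth - alt_reads)
        for genotype, p in p_alt.items()
    }
    total = sum(likelihoods.values())
    return {genotype: lik / total for genotype, lik in likelihoods.items()}

for depth, alt in [(20, 9), (20, 1), (4, 1)]:
    post = genotype_posteriors(depth, alt)
    best = max(post, key=post.get)
    print(f"depth={depth}, alt reads={alt}: best={best}, p={post[best]:.3f}")
```

At high depth the call is close to certain either way; at low depth a single mismatched read leaves real ambiguity, which is why deep coverage matters for SNV calling.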


what are barriers to personalized genomics being the be-all/end-all of medical treatment today?

cost is falling rapidly ($1500-2k now)

BUT knowledge of how to interpret consequences of majority of genetic variation is limited 

geneticists only know phenotypes caused by mutations in ~3200 of ~25,000 human genes (13%) 

each human has ~3 million SNVs, 1200 CNVs - what are effects of these on individual disease risk? 

even for the ~10,000 that change a.a. sequence of proteins, currently we can interpret effects of a minority of these, + these are 0.3% of each person's variation
