what does every human cell contain?
complete genetic code
when are chromosomes visualizable?
during mitosis, when chromosomes condense
what was first genome screen technology?
karyotyping by G-banding
what was reference human genome, its use?
traditional sequencing method: took DNA from individual, arranged into pieces of chromosomes, chopped up, individually sequenced, stitched back together, compared to reference human genome
= Sanger sequencing
costly and slow, no longer the standard method
what did human genome project do?
sequence entirety of human genome
took ~13 yrs (1990-2003), ~$3 billion
info acquired allowed cataloging of complete set of human genes
what did human genome proj find?
complete set of human genes = similar in number to that observed in other model organisms like mouse, thale cress plant (Arabidopsis), roundworm
explanation: complexity in mammals due to alternative splicing, permitting increased number of potential proteins
how much of our DNA is protein coding genes?
what is difference between number of genes, gene density, in humans vs plants?
gene number is similar
but genome size is much greater in humans
thus humans' avg gene density is much lower; only 1.5% of our DNA encodes proteins
human complexity comes instead from alternative splicing
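rough check of the gene-density comparison, as a sketch only. The genome sizes and the Arabidopsis gene count below are approximate outside figures (assumptions, not from these cards); the human gene count is the ~25,000 used elsewhere in the notes.

```python
# Approximate figures, assumed for illustration only (not from the cards).
GENOMES = {
    "human": {"size_mb": 3100, "genes": 25000},
    "Arabidopsis": {"size_mb": 135, "genes": 27000},
}

def gene_density(size_mb, genes):
    """Return the number of protein-coding genes per megabase."""
    return genes / size_mb

for name, genome in GENOMES.items():
    print(f"{name}: {gene_density(**genome):.1f} genes per Mb")
```

with these inputs the gene counts are close, but the human genome is far larger, so genes per Mb come out much lower in humans.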
what is result of alternative splicing?
from a single gene, exons can be combined in different arrangements, yielding different proteins
only 1.5% of our DNA is protein coding, though
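a minimal sketch of how one gene can yield several transcripts. It assumes a simplified model in which some exons are optional ("cassette" exons) and exon order never changes; the gene and exon names are hypothetical.

```python
from itertools import combinations

def isoforms(exons, optional):
    """List exon combinations when each optional exon is either kept
    or skipped; the order of the remaining exons is preserved."""
    skippable = [e for e in exons if e in optional]
    results = []
    for r in range(len(skippable) + 1):
        for skipped in combinations(skippable, r):
            results.append([e for e in exons if e not in skipped])
    return results

# Hypothetical gene: E1 and E4 are always included, E2 and E3 are optional.
for iso in isoforms(["E1", "E2", "E3", "E4"], optional={"E2", "E3"}):
    print("-".join(iso))
```

with n optional exons this simple model gives 2^n possible transcripts; real splicing is more constrained and also more varied than this.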
how much of non-gene DNA is conserved?
2-5% of non-gene DNA is conserved through evolution
if a piece of DNA is conserved, what does that suggest?
that it's important
basis for idea that there's functionality among non-gene portions of our DNA that've been conserved through ages/across animals
suggests these regions have important regulatory role in genome function
HOXD gene cluster function?
basic body patterning control
example of conserved region of essential proteins that regulate genome function
how much of our genome is repeat elements?
what are they relics of?
relics of retroviruses (HIV is a modern example of a retrovirus) and 'genomic parasites' that invaded our DNA in evolutionary history; often called 'junk DNA'
what causes finger webbing?
mutation in HOXD gene cluster, as HOXD genes control basic body patterning
what are segmental duplications?
blocks of DNA 1-500 kb in length that occur at multiple sites in the genome, share a high level of sequence identity
~5% of our DNA
can be intrachromosomal (same chromosome) duplications or interchromosomal (between chromosomes)
what role do segmental duplications play in genetic disease?
these large, highly identical repeats often flank certain regions of the genome, which are thus prone to misalignment during meiosis, leading to improper recombination
results in genomic deletions and/or duplications of the flanked region; if genes in that region are dosage sensitive, these are associated w/ a particular genetic disease
examples of recurrent genomic disorders?
Charcot-Marie-Tooth disease
all caused by mechanism of recombination between large high-identity repeats
recurrent deletion on chromosome 15 causes what/example of what?
causes intellectual disability, dysmorphisms, epilepsy
deletion = most common known genetic cause of epilepsy, present in ~1% of epilepsy patients
example of recurrent genomic disorder caused by aberrant recombination between large high-identity repeats
how many bases of difference exist between 2 individuals?
avg ~6 million bases, ~0.1% of genome
means 99.9% shared DNA among humans
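quick arithmetic check of these figures. The 6 billion bp denominator is an assumption (a diploid genome, i.e. two copies of ~3 billion bp); against a single 3 billion bp copy, 6 million bases would be 0.2%.

```python
DIPLOID_GENOME_BP = 6_000_000_000  # assumed: two ~3 billion bp haploid copies
DIFFERING_BP = 6_000_000           # figure from the card

fraction_different = DIFFERING_BP / DIPLOID_GENOME_BP
print(f"different: {fraction_different:.1%}")
print(f"shared:    {1 - fraction_different:.1%}")
```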
what are the types of variation in the human genome? from smallest to largest
1) single base-pair changes - point mutations/SNVs/SNPs
2) small insertions/deletions & microsatellites
3) mobile elements - retroelement insertions (300 bp - 10kb in size)
4) large-scale genomic variation (>1 kb) - deletions, duplications
5) chromosomal variation - translocations, inversions, trisomy
most common type of genetic variant?
SNVs (single nucleotide variants), also called SNPs (single nucleotide polymorphisms) or point mutations
occurs 1x every 1,000 bp = 3-5 million SNVs in individual genome
where do SNVs usually occur?
most in non-coding regions - may have regulatory effects, but not well understood
however, 10,000 per genome (0.3%) are in coding regions, & cause changes in protein sequence
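the numbers on these cards can be checked with a few lines. The 3 billion bp genome size is an assumed round figure; the other values are from the cards.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # assumed round figure
BP_PER_SNV = 1_000                 # "1x every 1,000 bp"
CODING_SNVS = 10_000               # SNVs in coding regions per genome

total_snvs = HAPLOID_GENOME_BP // BP_PER_SNV
coding_fraction = CODING_SNVS / total_snvs

print(f"expected SNVs: {total_snvs:,}")
print(f"coding share:  {coding_fraction:.1%}")
```

this gives the low end of the 3-5 million range and a coding share of about 0.3%.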
what do SNPs within coding regions cause?
sometimes, no change, since the genetic code is redundant (several codons encode the same a.a.)
sometimes, changes amino acid, different protein results
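a small sketch of how a coding SNP can be classified using the standard genetic code. The function name and the category labels are illustrative choices, not from the lecture.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_snp(ref_codon, alt_codon):
    """Compare the amino acids encoded before and after a single-base change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # redundancy: protein is unchanged
    if alt_aa == "*":
        return "nonsense"     # new stop codon truncates the protein
    if ref_aa == "*":
        return "stop-lost"
    return "missense"         # different amino acid

print(classify_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_snp("GAG", "GTG"))  # Glu -> Val
```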
what do SNPs outside of coding regions cause?
how much of SNPs are outside of genes?
can influence disease by altering gene regulation
e.g. if a ntd changes within a txn factor binding site, the txn factor may not recognize or bind the DNA, so no activation occurs; gene may be OFF when it should be ON
99.7% of SNPs are outside of genes
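a toy sketch of the binding-site idea. The motif and the promoter fragment are made up for illustration, and exact string matching is a simplification (real binding sites tolerate some variation and are usually scored with weight matrices).

```python
import re

# Illustrative motif for an imaginary transcription factor (an assumption).
MOTIF = re.compile("TGACTCA")

def has_binding_site(seq):
    """Return True if the motif occurs anywhere in the sequence."""
    return MOTIF.search(seq) is not None

promoter = "GGCATGACTCATCCG"                 # made-up promoter fragment
variant = promoter[:7] + "T" + promoter[8:]  # single-base change inside the motif

print(has_binding_site(promoter))  # motif present
print(has_binding_site(variant))   # motif disrupted by the SNP
```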
what does a microarray on SNP chips show?
can be used to genotype millions of SNPs in a single experiment
can find identity of a base pair at an SNP
fluorescently labeled DNA is hybridized to an array of probes immobilized on a glass slide that bind either to normal or variant DNA
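one way to picture the readout, as a sketch only: compare the signal from the normal-allele probe with the signal from the variant-allele probe. The function, the 0.2/0.8 cutoffs, and the intensity values are hypothetical; real genotyping software uses calibrated clustering rather than fixed thresholds.

```python
def call_genotype(ref_signal, var_signal, low=0.2, high=0.8):
    """Call a genotype from two probe intensities (hypothetical cutoffs)."""
    total = ref_signal + var_signal
    if total <= 0:
        return "no call"
    var_fraction = var_signal / total
    if var_fraction < low:
        return "ref/ref"   # almost all signal on the normal probe
    if var_fraction > high:
        return "var/var"   # almost all signal on the variant probe
    return "ref/var"       # both probes light up: heterozygous

print(call_genotype(1000, 50))
print(call_genotype(500, 520))
print(call_genotype(40, 900))
```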
how does array CGH work?
label control DNA and patient DNA w/ different fluorescent dyes
cohybridize them together onto a slide that contains DNA corresponding to different parts of the genome
fluorescently labeled DNA hybridizes to the slide
scan it, get image
YELLOW (equal mix of both dyes) indicates no gain or loss on the array
however, if a spot shows the color of the patient sample's dye, know there is a duplication in the patient's DNA at that position; the control's dye color indicates a deletion
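the color readout is usually summarized as a log2 ratio of the two dye intensities. A minimal sketch, assuming a hypothetical threshold of 0.3 and made-up intensities (real pipelines normalize the data and call changes over runs of neighboring probes):

```python
import math

def call_copy_number(patient, control, threshold=0.3):
    """Classify one array spot from patient vs control intensity."""
    log_ratio = math.log2(patient / control)
    if log_ratio > threshold:
        return "gain"        # more patient signal: duplication
    if log_ratio < -threshold:
        return "loss"        # less patient signal: deletion
    return "no change"       # balanced signal (the yellow spots)

print(call_copy_number(1000, 1000))
print(call_copy_number(1500, 1000))  # 3 copies vs 2
print(call_copy_number(500, 1000))   # 1 copy vs 2
```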
what does array CGH enable?
detection of copy number changes that're too small to be seen by karyotyping
what do different chip/microarray tests give info about?
statistical testing for association between diseases of interest and SNPs at specific chromosomal locations
DNA copy number across genome
detection of sub-microscopic gains or losses of material for rare conditions and common conditions alike
duplications of genomic regions that're associated w/ protection from disease
how can array-based technology be used to inform diagnosis and treatment of cancer?
take cancerous tumor DNA and match to control DNA from same person's blood or non-tumorous site
compare the 2 to see what happened in tumor
see where chromosomes have extra copies, see deletion of known tumor suppressor
e.g. see massive amplification of the EGFR gene region (a growth-promoting gene) and deletion of tumor suppressor genes; can then develop drugs to inhibit the amplified gene, b/c amplification is clearly a key event in tumorigenesis
what are tandem repeats?
serial repetition of a short motif, e.g. 2 bases (acacacacac...)
inherently unstable highly repetitive sequences
are rich source of variation in genome b/c polymerase working on DNA at repeat site will add or delete copies of repeats
highly variable regions between individuals
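a short sketch of how repeat copy number can be counted in a sequence, which is what differs between individuals at these sites. The two alleles are made-up examples.

```python
import re

def longest_repeat_run(seq, motif):
    """Return the copy number of the longest uninterrupted run of motif."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [m.group(0) for m in pattern.finditer(seq)]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(motif)

allele_a = "TTACACACGG"      # made-up allele with 3 copies of AC
allele_b = "TTACACACACACGG"  # made-up allele with 5 copies of AC

print(longest_repeat_run(allele_a, "AC"))
print(longest_repeat_run(allele_b, "AC"))
```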
what are triple repeat expansions associated w?
what is cause of fragile x?
CGG motif repeat has 5-50 copies in healthy individuals; premutation carriers have ~50-200 copies; patients w/ fragile X have >200 (hundreds/thousands of repeats)
this switches off nearby gene, causes disease
causes breakage of chromosome, making DNA polymerase unable to replicate
causes intellectual disability w/ distinct dysmorphic features, accompanied by a 'fragile site' on X chromosome (= origin of the name)
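the repeat-count ranges on this card can be written as a simple lookup. This is a sketch using the card's rounded boundaries (50 and 200); the category labels are informal, and clinical cutoffs differ slightly from these round numbers.

```python
def classify_cgg_repeats(n_copies):
    """Bin a CGG copy number using the rounded ranges from the card."""
    if n_copies <= 50:
        return "typical range"
    if n_copies <= 200:
        return "intermediate expansion"
    return "fragile X range"

for copies in (30, 120, 800):
    print(copies, classify_cgg_repeats(copies))
```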
what can a large tandem repeat contain?
entire genes, so a gene may be present in multiple copies, e.g. salivary amylase
are genomic duplication regions ever protective from disease?
e.g. chemokine CCL3L1, an inflammatory signaling molecule
it is a binding partner of CCR5, the major receptor molecule for HIV cell entry
having more copies of the CCL3L1 gene is correlated w/ lower susceptibility to HIV infection
is complete personal genome sequencing expensive?
no! quick and cheap now
what is focus of next generation sequencing?
whereas old Sanger sequencing focused on 1 gene at a time,
next gen sequencing is massively parallel: many fragments are sequenced simultaneously, giving far more data per run
describe process of next generation genome sequencing
1) extract genomic DNA
2) shear DNA into small 200-500 ntd pieces
3) ligate adaptors to ends of fragments
4) enrich and amplify library by PCR
5) sequence on a microscopic scale on the platform, starting from the adaptor
wash through w/ bases that fluoresce differently; each cluster of DNA will fluoresce
measure that fluorescence or electrochemical energy, determine which base was added during each step of DNA synthesis rxn
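the read-out step can be pictured as picking the brightest signal in each synthesis cycle. A toy sketch, assuming one intensity channel per base and made-up intensity values; real platforms use different dye schemes and correct for noise between cycles.

```python
CHANNELS = "ACGT"  # assumed: one fluorescence channel per base

def call_bases(cycles):
    """cycles: list of (A, C, G, T) intensities, one tuple per cycle.
    Returns the read formed by the brightest channel in each cycle."""
    read = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)

# Made-up intensities for one DNA cluster over three cycles.
cluster = [(0.9, 0.1, 0.0, 0.1), (0.1, 0.0, 0.8, 0.1), (0.0, 0.1, 0.1, 0.7)]
print(call_bases(cluster))
```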
describe whole genome shotgun sequencing
can stitch fragments of DNA back together by mapping them onto the reference human genome
due to random nature of sequences, depth of coverage at any 1 place in genome is variable
reads also contain errors (1%)
therefore need high redundancy to generate high-quality gap-free sequence (at least ~20x)
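why redundancy helps can be sketched with a simple model. This assumes reads land uniformly at random, so depth at a position is roughly Poisson distributed around the mean coverage (a common textbook approximation, not something stated in the lecture); the read count and read length are hypothetical.

```python
import math

def mean_coverage(n_reads, read_len, genome_len):
    """Average number of reads covering each base."""
    return n_reads * read_len / genome_len

def prob_depth_below(k, coverage):
    """P(depth < k) at a position, if depth is Poisson(coverage)."""
    return sum(
        math.exp(-coverage) * coverage**d / math.factorial(d)
        for d in range(k)
    )

# Hypothetical run: 600 million reads of 100 bases on a 3 billion bp genome.
cov = mean_coverage(600_000_000, 100, 3_000_000_000)
print(f"mean coverage: {cov:.0f}x")
print(f"P(position uncovered)     = {prob_depth_below(1, cov):.2e}")
print(f"P(fewer than 10 reads)    = {prob_depth_below(10, cov):.2e}")
print(f"P(fewer than 10) at 5x    = {prob_depth_below(10, 5):.2f}")
```

under this model, low average coverage leaves many positions with too few reads to trust, which is the motivation for sequencing well above 1x.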
what is imperfect about whole genome shotgun sequencing?
random errors in sequencing occur
thus, when a read base mismatches the reference, cannot know for certain whether it is a true (e.g. heterozygous) SNV or a sequencing error
so SNV calling in genome sequencing is a probabilistic exercise
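a minimal sketch of what "probabilistic" means here. It assumes a simple binomial model: each read at a position shows the variant base with probability 1% (the error rate) if the person is homozygous reference, 50% if heterozygous, and 99% if homozygous variant. Real variant callers also use base qualities, mapping qualities, and prior probabilities.

```python
from math import comb

def genotype_likelihoods(n_reads, n_alt, error=0.01):
    """Likelihood of seeing n_alt variant reads out of n_reads
    under each genotype (simple binomial model)."""
    p_alt = {"hom ref": error, "het": 0.5, "hom alt": 1 - error}
    return {
        genotype: comb(n_reads, n_alt) * p**n_alt * (1 - p)**(n_reads - n_alt)
        for genotype, p in p_alt.items()
    }

def most_likely_genotype(n_reads, n_alt, error=0.01):
    likelihoods = genotype_likelihoods(n_reads, n_alt, error)
    return max(likelihoods, key=likelihoods.get)

print(most_likely_genotype(30, 1))   # one mismatch in 30 reads
print(most_likely_genotype(30, 14))  # about half the reads mismatch
print(most_likely_genotype(30, 30))  # every read mismatches
```

with deep coverage a single mismatching read is best explained as an error, while a roughly even split points to a heterozygous SNV.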
what are barriers to personalized genomics being the be-all/end-all of medical treatment today?
cost is falling rapidly ($1500-2k now)
BUT knowledge of how to interpret consequences of majority of genetic variation is limited
geneticists only know phenotypes caused by mutations in ~3200 of ~25,000 human genes (13%)
each human has ~3 million SNVs, 1200 CNVs - what are effects of these on individual disease risk?
even for the ~10,000 that change the a.a. sequence of proteins, we can currently interpret the effects of only a minority, and these make up just 0.3% of each person's variation