1000 Genome Slidedeck Flashcards
(37 cards)
Long haplotypes= what type of frequency
low
Length of a haplotype that a mutation is present on is inversely related to
how old the mutation is (recombination shortens the haplotype over time)
Recent = long = low frequency
Why was wide and shallow coverage done in the 1000 Genomes Project?
Wide=more people and more data
having more people means more variation in data and allows for the identification of common variants
Why was exon sequencing used in the 1000 Genomes Project?
sequencing is expensive
exons are the coding regions, so to find meaningful variants it makes sense to sequence the coding regions
Average distance in nucleotides between variants
total space (sequence length) divided by number of variants
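A minimal sketch of that calculation, taking average distance as total sequence length divided by number of variants. The function name and the numbers are hypothetical, for illustration only, and are not 1000 Genomes results.

```python
def average_variant_spacing(region_length_bp: int, num_variants: int) -> float:
    """Average number of base pairs between neighboring variants."""
    if num_variants <= 0:
        raise ValueError("need at least one variant")
    return region_length_bp / num_variants


# Hypothetical example: 1,000 variants found in a 1,000,000 bp region.
print(average_variant_spacing(1_000_000, 1_000))  # 1000.0
```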
Which would produce more accurate variant calls, low coverage WGS or high coverage exome?
SNP chips are more accurate so variant calls
Pros of WGS
errors become big with little data
Pros of high coverage exomes
average of 80x
cons of high coverage exome
more expensive
Pros of SNP chips
high confidence and cheap
Why did the 1000 Genomes Project summarize variant sites with 0, 1, and 2?
Diploid = 2 chromosomes, so the number counts how many copies carry the B allele
AA=0
AB=1
BB=2
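The 0/1/2 coding above can be sketched as a count of B alleles in a two-letter genotype. The function name `encode_genotype` and the letters A and B are just the card's own labels reused for illustration.

```python
def encode_genotype(genotype: str, alt_allele: str = "B") -> int:
    """Return how many of the two chromosomes carry the alternate allele."""
    if len(genotype) != 2:
        raise ValueError("a diploid genotype has exactly two alleles")
    return genotype.count(alt_allele)


for g in ("AA", "AB", "BB"):
    print(g, encode_genotype(g))
```

Because it only counts copies, the order of the two alleles does not matter (AB and BA both give 1).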
Why is the evidence for a single genotype typically weak in low coverage regions?
can be a sequencing error
not enough data to confirm
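A rough sketch of why a few reads are not enough. It assumes each read samples either allele with probability 0.5 and ignores sequencing error entirely (both are simplifying assumptions). Under that model, a true heterozygote shows reads from only one allele with probability 2 × 0.5^depth.

```python
def prob_het_looks_homozygous(depth: int) -> float:
    """Chance that all reads at a heterozygous site come from one allele.

    Simplified model: reads are independent, each allele is sampled with
    probability 0.5, and there are no sequencing errors.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2 * 0.5 ** depth


for depth in (1, 2, 4, 30):
    print(depth, prob_het_looks_homozygous(depth))
```

At a depth of 4 reads the heterozygote is missed one time in eight under this model; at 30 reads it is essentially never missed.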
How can we address the problem of evidence being weak in low coverage regions?
genotyping using SNP chips
type of variation we didn’t talk about
structural
What is another name given to regions of low complexity?
repetitive regions
What technology is used to help with repetitive regions
longer reads that can map out more unique data
Accessible genome
the fraction of the reference genome in which short-read data can lead to reliable variant discovery
Accessible genome percentage went from 85% to
94% now
Why would individual calls be more accurate at common variants than at low frequency variants?
common variants have more data and are more likely to be true than to be a sequencing error
Variation among samples in genotype accuracy is primarily driven by sequencing depth. Why is this true?
more data = fewer sequencing errors mistaken for variants
allows you to determine which sites are true variants
Moderate to high frequency variants tend to be
old
low frequency variants tend to be
new
New mutation equation
1/(2N)
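A small sketch of this formula, assuming N is the number of diploid individuals, so the population carries 2N chromosome copies and a brand-new mutation sits on exactly one of them. The function name is hypothetical.

```python
def new_mutation_frequency(n_individuals: int) -> float:
    """Starting frequency of a new mutation: one copy out of 2N chromosomes."""
    if n_individuals < 1:
        raise ValueError("population must contain at least one individual")
    return 1 / (2 * n_individuals)


# Hypothetical population sizes, for illustration only.
for n in (10, 1_000, 10_000):
    print(n, new_mutation_frequency(n))
```

The larger the population, the lower the starting frequency, which fits the cards saying that new variants tend to be low frequency.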
Lower frequency variants are
population dependent: they show up in one population and have not spread to others