Week 11 (1000 Genomes Project) Flashcards
what is the 1000 Genomes Project?
The 1000 Genomes Project is an international research consortium that was set up in 2007 with the aim of sequencing the genomes of at least 1,000 volunteers from multiple populations worldwide in order to improve our understanding of the genetic contribution to human health and disease.
what was the first model for the 1000 genomes project? why?
humans! human research is better funded, so the money was there to do this.
what combination of sequencing tools did they use to complete the 1000 genome project?
- low coverage whole genome
- exome sequencing
the 1000 Genomes Project validated a haplotype map of ____ _____ single nucleotide polymorphisms (SNPs)
38 million
why do low frequency variants tend to be recent?
frequency is how often something shows up in the population. a new variant or mutation starts out in a single individual and has not had time to spread, so recent variants tend to have a low frequency
is it possible for mutations to occur over time? if so, how?
yes! mutations can arise during cell division (for example, from errors when DNA is copied)
what is the equation that you use to determine the frequency of a new mutation in a population?
1/(2N) (N = number of diploid individuals, so 2N = number of chromosome copies)
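A minimal sketch of the 1/(2N) idea, assuming a diploid population where the new mutation sits on exactly one chromosome copy. The function name `new_mutation_frequency` is made up for illustration.

```python
def new_mutation_frequency(n_individuals: int) -> float:
    """Starting frequency of a brand-new mutation.

    Assumes diploid individuals, so the population carries 2N chromosome
    copies and the new mutation is present on exactly one of them.
    """
    if n_individuals <= 0:
        raise ValueError("population size must be positive")
    return 1 / (2 * n_individuals)


# One copy among 2,000 chromosomes in a population of 1,000 people.
print(new_mutation_frequency(1000))  # 0.0005
```

The bigger the population, the lower the starting frequency of any new mutation.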
what is the chance of transmission from parent to offspring?
50/50 (to transmit or to not transmit)
in every generation recombination occurs, and over time this breaks down _______ __________
linkage disequilibrium
while doing the 1000 Genomes Project, they found 3.6 million SNPs per individual. on average, what fraction of the genome differs?
0.1%
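A rough check of where the 0.1% comes from. The genome size of about 3.2 billion base pairs is an assumption added here (it is not on the card), and the names are made up for illustration.

```python
# Assumption: approximate haploid human genome size in base pairs.
GENOME_SIZE_BP = 3.2e9
# From the card: SNPs found per individual.
SNPS_PER_INDIVIDUAL = 3.6e6


def percent_different(variant_sites: float, genome_size: float) -> float:
    """Percentage of genome positions that carry a variant."""
    return 100 * variant_sites / genome_size


pct = percent_different(SNPS_PER_INDIVIDUAL, GENOME_SIZE_BP)
print(f"{pct:.2f}% of positions differ")  # roughly 0.1%
```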
what is low coverage?
<5x
what is high coverage?
>20x
why did the 1000 genomes project use 5x coverage?
it was really expensive to do more than that! (it cost $5 million)
what is the typical amount of coverage that we use today?
30x
we transmit ________ NOT _______ to the next generation
chromosomes; alleles
what amount of coverage did the 1000 Genomes Project use?
low coverage (2-6x)
the 1000 Genomes Project used wide sampling and low coverage, why?
they wanted to characterize common variation, they were able to sample more individuals but sequence at a lower coverage to achieve this
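A small sketch of why wide sampling helps with common variation: the chance that a variant shows up at least once in a sample grows quickly with the number of people. This assumes each of the 2n sampled chromosomes carries the variant independently with probability equal to its frequency, and it ignores whether low coverage sequencing actually detects the copy. The function name is hypothetical.

```python
def prob_variant_in_sample(frequency: float, n_individuals: int) -> float:
    """Chance that at least one sampled chromosome carries the variant.

    Assumes 2 * n_individuals chromosomes, each carrying the variant
    independently with probability `frequency`.
    """
    if not 0 <= frequency <= 1:
        raise ValueError("frequency must be between 0 and 1")
    return 1 - (1 - frequency) ** (2 * n_individuals)


# A variant at 1% frequency: a small sample versus a wide sample.
for n in (10, 100, 1000):
    print(n, round(prob_variant_in_sample(0.01, n), 3))
```

With only 10 people a 1% variant is usually missed, while with 1,000 people it is almost certain to be present in the sample.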
how did the 1000 Genomes Project construct an integrated map of variation?
- primary data
- candidate variants and quality metrics
- variant calls and genotype likelihoods
- integrated haplotypes
which would produce more accurate variant calls, low coverage WGS or high coverage exome?
high coverage exome
pro and con of low coverage WGS?
- pro: cost effective, can conduct large scale studies
- con: less accurate variant calls
pro and con to high coverage exome?
- pro: more accurate variant calls
- con: only sequencing 2% of the genome
what is exome sequencing?
sequencing only the exons (the protein coding regions) and nothing else in the genome, so only about 2% of the genome is sequenced
why 0, 1, or 2 copies of a variant for an individual?
each individual has two copies of each chromosome, so the variant can be on neither, one, or both
why is the evidence for a single genotype typically weak in low coverage regions?
(low coverage = 5x) at each position we sequence only about 5 reads, so there are only about 5 reads available to support any genotype call
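A minimal sketch of why 5 reads is weak evidence. It assumes a truly heterozygous site, that each read comes from either chromosome with equal probability, and that there are no sequencing errors; real read depth also varies around the average. The function name is made up for illustration.

```python
def prob_het_looks_homozygous_ref(n_reads: int) -> float:
    """Chance that every read at a heterozygous site shows the reference allele.

    Assumes each read samples one of the two chromosomes with probability 0.5
    and that reads are error free.
    """
    if n_reads < 0:
        raise ValueError("number of reads cannot be negative")
    return 0.5 ** n_reads


# Compare low coverage (5 reads) with typical coverage today (30 reads).
print(prob_het_looks_homozygous_ref(5))   # 0.03125
print(prob_het_looks_homozygous_ref(30))
```

At 5 reads, about 3% of heterozygous sites would show only reference reads by chance (and the same again for only variant reads), so a single genotype call cannot be trusted much on its own. At 30 reads that chance is negligible.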