1000 Genome Slidedeck Flashcards

(37 cards)

1
Q

Long haplotypes = what type of frequency?

A

low

2
Q

The length of the haplotype that a mutation is present on is related to

A

how old the mutation is (inversely: younger mutations sit on longer haplotypes)
Recent = long = low frequency

3
Q

Why was wide and shallow coverage used in the 1000 Genomes Project?

A

Wide = more people and more data
Sampling more people captures more variation and allows common variants to be identified

4
Q

Why was exon sequencing used in the 1000 Genomes Project?

A

Sequencing is expensive
Exons are the coding regions, so targeting them is a cost-effective way to find meaningful variants

5
Q

Average distance (in nucleotides) between variants

A

total space (sequence length) divided by the number of variants
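
A minimal sketch of this calculation, for review purposes. Average spacing is the sequence length divided by the variant count (the reciprocal of variant density). The function name and the numbers are hypothetical and not taken from the slide deck.

```python
def average_spacing(region_length_bp, n_variants):
    """Average number of base pairs between neighbouring variants."""
    if n_variants <= 0:
        raise ValueError("need at least one variant")
    return region_length_bp / n_variants


# Hypothetical example: 1,000 variants found in a 1 Mb region
print(average_spacing(1_000_000, 1_000))  # 1000.0
```

Dividing the other way round (variants over length) gives variant density, that is, variants per base pair.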

6
Q

Which would produce more accurate variant calls: low coverage WGS or high coverage exome?

A

High coverage exome, because each site is read more times
SNP chips are more accurate than either for variant calls at the sites they cover

7
Q

Pros of WGS

A

covers the whole genome, not just the exons
(con at low coverage: errors become big with little data)

8
Q

Pros of high coverage exomes

A

average coverage of 80x

9
Q

cons of high coverage exome

A

more expensive

10
Q

Pros of SNP chips

A

high confidence and cheap

11
Q

Why did the 1000 Genomes Project summarize variant sites with 0, 1, and 2?

A

Diploid = 2 chromosomes, so each genotype is scored by how many copies of the B allele it carries
AA=0
AB=1
BB=2
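
The 0/1/2 coding can be sketched as a count of B alleles in a diploid genotype. This is only an illustration: the function name and the A/B labels as defaults are assumptions, not the project's actual pipeline.

```python
def alt_allele_count(genotype, ref="A", alt="B"):
    """Return 0, 1, or 2: the number of copies of the alt allele."""
    if len(genotype) != 2 or any(allele not in (ref, alt) for allele in genotype):
        raise ValueError(f"not a diploid {ref}/{alt} genotype: {genotype!r}")
    return genotype.count(alt)


for g in ("AA", "AB", "BA", "BB"):
    print(g, alt_allele_count(g))
# AA 0
# AB 1
# BA 1
# BB 2
```

Note that AB and BA both map to 1, so the coding records how many B alleles are present, not which chromosome carries them.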

12
Q

Why is the evidence for a single genotype typically weak in low coverage regions

A

with only a few reads, an apparent variant can be a sequencing error
not enough data to confirm the genotype
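
A rough sketch of why a few reads give weak evidence. It assumes a simple binomial read model, a flat prior over the three genotypes, and a hypothetical 1% per-read error rate. None of these assumptions come from the slide deck, and real genotype callers are more elaborate.

```python
from math import comb


def genotype_posteriors(n_ref, n_alt, error=0.01):
    """Posterior over genotypes 0/1/2 (alt allele count), flat prior assumed."""
    # Chance that a single read shows the alt allele under each genotype
    p_alt_read = {0: error, 1: 0.5, 2: 1 - error}
    n = n_ref + n_alt
    likelihood = {
        g: comb(n, n_alt) * p ** n_alt * (1 - p) ** n_ref
        for g, p in p_alt_read.items()
    }
    total = sum(likelihood.values())
    return {g: lik / total for g, lik in likelihood.items()}


shallow = genotype_posteriors(n_ref=0, n_alt=2)   # 2x coverage
deep = genotype_posteriors(n_ref=0, n_alt=20)     # 20x coverage
print(round(shallow[1], 2), round(shallow[2], 2))
print(round(deep[2], 4))
```

With two alt reads the heterozygous genotype still keeps a substantial share of the probability, while with twenty alt reads the homozygous alt genotype is close to certain under this model.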

13
Q

How can we address the problem of evidence being weak in low coverage regions

A

genotyping with SNP chips

14
Q

type of variation we didn’t talk about

A

structural

15
Q

What is another name given to regions of low complexity?

A

repetitive regions

16
Q

What technology is used to help with repetitive regions

A

longer reads, which span the repeat and reach unique sequence that can be mapped

17
Q

Accessible genome

A

the fraction of the reference genome in which short-read data can lead to reliable variant discovery

18
Q

Accessible genome percentage went from 85% to

19
Q

why would individual calls be more accurate at common variants than at low frequency variants

A

common variants have more data and are more likely to be true than to be a sequencing error

20
Q

Variation among samples in genotype accuracy is primarily driven by sequencing depth. Why is this true?

A

more data = fewer calls thrown off by sequencing errors
more reads per site let you determine which sites are true variants

21
Q

Moderate to high frequency variants tend to be

22
Q

low frequency variants tend to be

23
Q

New mutation equation

24
Q

Lower frequency variants are

A

population-specific: they show up in one population and have not spread to others

25
Q

Why would we expect many low frequency variants?

A

more people and different environments are what generate new variants

26
Q

What would you expect for a population that is contracting?

A

fewer new variants, more variants at a higher frequency

27
Q

Are all variants equally important?

A

No

28
Q

Wobble

A

the third base of the codon is changed, often without changing the amino acid

29
Q

Synonymous

A

same amino acid is coded for

30
Q

Nonsynonymous

A

different amino acid is coded for

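
The synonymous vs. nonsynonymous distinction can be sketched by translating the codon before and after a change. This sketch assumes the standard genetic code, and the `classify` helper is a hypothetical name for illustration. Nonsense changes (to a stop codon) are reported separately here, although they are a kind of nonsynonymous change.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(ref_codon, alt_codon):
    """Label a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"


print(classify("GGT", "GGC"))  # synonymous (third-base change, Gly -> Gly)
print(classify("GAG", "GTG"))  # nonsynonymous (Glu -> Val)
print(classify("TGG", "TGA"))  # nonsense (Trp -> stop)
```

The first example is a wobble-position change: only the third base differs and the amino acid stays the same.
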
31
Q

How do you know if an individual has more or fewer variants than expected?

A

Compare by placement and type of variation: intron vs. exon, wobble, nonsense, synonymous, nonsynonymous

32
Q

How is it that we can have an average of 150 broken genes but still be normal?

A

It depends on other genes or factors; environment also plays a role

33
Q

Everyone carries

A

bad variants

34
Q

Lots of variants in regulatory regions. Why?

35
Q

Why would regulatory sequence tolerate deleterious variations?

36
Q

What is the primary reason to do imputation?

A

fine mapping existing association signals and detecting new associations
it can fill in missing genotypes and find variants

37
Q

Rare variants need to be evaluated using

A

the correct null distribution