Week 8 (Variants & SNP Chips) Flashcards

1
Q

_____ _________ based on multiple metrics (need to be determined empirically) for variant filtering

A

hard filtering

2
Q

what are the two most important metrics in hard filtering?

A
  • QualByDepth (QD)
  • RMSMappingQuality (MQ)
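
As a rough illustration of hard filtering on these two metrics, the sketch below flags a variant when QD or MQ falls below a fixed cutoff. The cutoffs (QD < 2.0, MQ < 40.0) are commonly cited GATK starting points for SNPs and are assumptions here, since cutoffs need to be determined empirically. The function name `hard_filter` and the dictionary layout are hypothetical.

```python
# Assumed cutoffs; tune them empirically for each dataset.
QD_MIN = 2.0   # QualByDepth
MQ_MIN = 40.0  # RMSMappingQuality

def hard_filter(variant, qd_min=QD_MIN, mq_min=MQ_MIN):
    """Return the names of the filters a variant fails (empty list means it passes)."""
    failed = []
    if variant["QD"] < qd_min:
        failed.append("lowQD")
    if variant["MQ"] < mq_min:
        failed.append("lowMQ")
    return failed

# Made-up annotation values for three variants.
variants = [
    {"id": "snp1", "QD": 25.3, "MQ": 60.0},
    {"id": "snp2", "QD": 1.4, "MQ": 58.2},
    {"id": "snp3", "QD": 12.0, "MQ": 22.5},
]
results = {v["id"]: hard_filter(v) or ["PASS"] for v in variants}
print(results)
```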
3
Q

VCF

A

variant call format

4
Q

MQ (RMSMappingQuality) has a value of 40 associated with it. What does that mean?

A

it is the root mean square of the mapping qualities of the reads at the site, so it lets us evaluate how well we think the reads mapped to the genome
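
A minimal sketch of what the number means, assuming per-read mapping qualities are Phred-scaled (MAPQ = -10 * log10 of the probability that the read is mapped to the wrong place). The function names and the example read qualities are hypothetical.

```python
import math

def rms_mapping_quality(mapqs):
    """Root mean square of the mapping qualities of the reads covering a site."""
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))

def mapq_to_error_prob(mapq):
    """Convert a Phred-scaled mapping quality to the probability of a wrong placement."""
    return 10 ** (-mapq / 10)

# Three reads that all map with quality 40 give a site-level MQ of 40.
site_mq = rms_mapping_quality([40, 40, 40])
# A quality of 40 corresponds to roughly a 1 in 10,000 chance of mis-mapping.
error_prob = mapq_to_error_prob(40)
print(site_mq, error_prob)
```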

5
Q

where would you find a lower MQ (mapping quality)?

A

in repetitive sequences because there are multiple places it could go

6
Q

sensitivity

A

identifying true positives

7
Q

specificity

A

identifying true negatives

8
Q

all variant callers produce errors. these errors can be classified as false positives and false negatives. when performing a genomic analysis, or any similar analysis for that matter, one has to balance sensitivity and specificity. what do the terms sensitivity and specificity mean in the context of variant calling?

A

sensitivity: trying to discover all the real variants
specificity: trying to limit the false positives that creep in when filters get too lenient
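
The two terms can be written as ratios of confusion-matrix counts. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the counts are made up for illustration.

```python
def sensitivity(tp, fn):
    """Fraction of real variants that were called (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-variant sites that were correctly left uncalled (true negative rate)."""
    return tn / (tn + fp)

# Hypothetical counts from comparing a call set against known sites.
tp, fn, tn, fp = 90, 10, 980, 20
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, spec)
```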

9
Q

if you call a variant where one doesn’t exist this is a false __________

A

positive

10
Q

if you fail to identify where a variant exists it is a false __________

A

negative

11
Q

what type of error is considered the worst?

A

Type 1 error (we don’t want to say something is true if it is not)

12
Q

Variant Quality Score Recalibration

A

does not actually recalibrate QUAL but creates a new score

13
Q

what is the purpose of the variant quality score recalibration?

A

the purpose of this new score is to enable variant filtering in a way that allows analysts to balance sensitivity and specificity

14
Q

sensitivity in variant quality score recalibration

A

trying to discover all the real variants

15
Q

specificity in variant quality score recalibration

A

trying to limit the false positives that creep in when filters get too lenient

16
Q

________ _______ ________ ___________ uses machine learning algorithms to learn from each dataset what the annotation profile of good variants vs bad variants is

A

variant quality score recalibration

17
Q

VQSR

A

variant quality score recalibration

18
Q

key to VQSR is that you need a “________ _____” for training the model

A

truth set

19
Q

what is 100% sensitivity?

A

calling every difference a variant

20
Q

the recalibrated variant quality score provides a continuous estimate of the probability that each variant is true, allowing one to partition the call sets into quality ____________

A

tranches

21
Q

__________ are essentially slices of variants, ranked by VQSLOD

A

tranches

22
Q

high tranche

A

if you want more variants and are willing to accept false positives

23
Q

middle tranche

A

if you want to remove most false positives but are also willing to remove some true variants

24
Q

low tranche

A

if you only want highly accurate true variants with few false positives and willing to miss perhaps many true positives

25
what are tranches?
slices in the variant quality scores, where to set the threshold to identify the amount of true positives and accept a number of false positives
26
slices in the variant quality scores, where to set the threshold to identify the amount of true positives and accept a number of false positives
tranches
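
One way to picture tranches: rank variants by VQSLOD and pick the cutoff that still keeps a target fraction of the truth-set variants. The sketch below is a simplified illustration of that idea, not the actual VQSR implementation; the field names, scores, and function names are hypothetical.

```python
import math

def tranche_threshold(variants, target_sensitivity):
    """Lowest VQSLOD cutoff that still keeps the target fraction of truth-set variants."""
    truth_scores = sorted(
        (v["vqslod"] for v in variants if v["in_truth_set"]), reverse=True
    )
    # round() guards against floating-point noise before taking the ceiling.
    n_keep = max(1, math.ceil(round(target_sensitivity * len(truth_scores), 9)))
    return truth_scores[n_keep - 1]

def passing_ids(variants, threshold):
    """IDs of all variants (truth set or not) at or above the cutoff."""
    return [v["id"] for v in variants if v["vqslod"] >= threshold]

# Made-up scores; in_truth_set marks variants also present in the truth set.
variants = [
    {"id": "v1", "vqslod": 8.2, "in_truth_set": True},
    {"id": "v2", "vqslod": 5.1, "in_truth_set": True},
    {"id": "v3", "vqslod": 3.3, "in_truth_set": False},
    {"id": "v4", "vqslod": 1.7, "in_truth_set": True},
    {"id": "v5", "vqslod": 0.4, "in_truth_set": False},
    {"id": "v6", "vqslod": -1.2, "in_truth_set": True},
    {"id": "v7", "vqslod": -3.9, "in_truth_set": False},
    {"id": "v8", "vqslod": -6.5, "in_truth_set": True},
]

strict = tranche_threshold(variants, 0.6)   # fewer variants, fewer false positives
lenient = tranche_threshold(variants, 1.0)  # every truth variant, more false positives
print(strict, passing_ids(variants, strict))
print(lenient, passing_ids(variants, lenient))
```

A higher sensitivity target moves the cutoff down the ranking, so more true variants are kept at the price of more false positives.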
27
what is a genotyping model and software that google has released?
DeepVariant
28
what is the standard genotyping model and software used in humans?
GATK
29
WGS
whole genome sequencing
30
what is a whole genome sequence (WGS)?
the sequence library
31
what is a SNP Chip used for?
to build a (relatively) low cost assay to genotype a large number of individuals
32
sample size is statistical ________
power
33
what is deep coverage?
30x
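
Coverage can be estimated as total sequenced bases divided by genome size. A minimal sketch, assuming a genome of about 3 billion bases and 150 bp reads (both illustrative values, as is the function name):

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * read_length / genome_size

# 600 million reads of 150 bp over a 3 Gb genome.
cov = mean_coverage(600_000_000, 150, 3_000_000_000)
print(cov)
```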
34
approximately how much does it cost to run a whole genome sequencing on a mammalian genome at 30x coverage?
$1000
35
what is the difference between WGS and SNP chips related to variants?
WGS captures "all" variation; SNP chips have a lower number of variants but also a lower cost per sample
36
_______ is the largest genotype provider in the world
Neogen
37
what was the purpose of in-silico digest of reference genome with multiple restriction enzymes?
every time a sequence was seen it would cut it, from there you could compile the amount of reads you had from each segment and the repetitive elements (that you are not interested in) could be found because they had the most sections cut out
38
what is Illumina Infinium chemistry?
small beads carry unique barcodes; for each SNP, a 50-mer oligo that flanks it (the probe) is synthesized; these SNP-specific probes are attached to the beads; a chip is made with microwells for the beads to sit in; the beads are then deposited on the chip to produce an array
39
what are the basics of the beads used in Illumina Infinium chemistry?
the small beads have oligos hanging off of them that correspond to the section of the sequence that you want
40
for each SNP, synthesize a ____ mer oligo that flanks the SNP (probe)
50
41
what is a probe?
50 mer oligo that flanks the SNP
42
_____ base probe
50
43
Infinium I = ____ probe(s)
2
44
Infinium II = ____ probe(s)
1
45
what color bead is G and C in Illumina Infinium chemistry?
green
46
what color bead is A and T in Illumina Infinium chemistry?
red
47
G/C and A/T SNPs = ____________
Infinium I
48
in Illumina Infinium chemistry, what happens if your variant is As AND Ts? how do you solve this?
it would all show up as red, so you would need to use two probes
49
why do we cluster SNPs?
so we can determine genotype
50
is this a well clustered SNP or a poorly clustered SNP?
well clustered SNP
51
is this an example of a well clustered SNP or a poorly clustered SNP?
poorly clustered SNP
52
what are these clustered SNPs an example of?
improperly clustered SNPs, the automated system just got it wrong, so you should manually fix it
53
SNP chips are really accurate but things can go wrong. remember this when making decisions based on chip genotypes for any single SNP specifically. Why?
it depends on what error rate you are comfortable with, your willingness to be wrong
54
call rate per SNP
best indicator of genotype quality
55
call rate per individual
best indicator of sample DNA quality
56
2 key metrics for looking for errors:
1. call rate per SNP 2. call rate per individual
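
Both metrics can be read off a genotype matrix: call rate per SNP runs down a column, call rate per individual runs along a row. The sketch below assumes rows are individuals, columns are SNPs, and `None` marks a missing call; the data and function names are hypothetical.

```python
def call_rate_per_snp(genotypes):
    """Fraction of individuals with a called genotype at each SNP (column-wise)."""
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(row[j] is not None for row in genotypes) / n_individuals
        for j in range(n_snps)
    ]

def call_rate_per_individual(genotypes):
    """Fraction of SNPs with a called genotype for each individual (row-wise)."""
    return [sum(g is not None for g in row) / len(row) for row in genotypes]

# 4 individuals x 4 SNPs; None is a no-call.
genotypes = [
    ["AA", "AB", "BB", None],
    ["AB", "AB", "BB", "AA"],
    [None, None, "AB", None],
    ["AA", "BB", "BB", None],
]
snp_rates = call_rate_per_snp(genotypes)
individual_rates = call_rate_per_individual(genotypes)
print(snp_rates)
print(individual_rates)
```

In this toy data the last SNP and the third individual stand out with low call rates, which would point to a poorly performing SNP and a poor DNA sample respectively.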
57
why might an individual have a low call rate?
poor DNA (for example taking from a live cow vs a cow that has been dead for 1000 years)
58
what does it mean to impute?
inferring missing genotypes from data that you have already observed to fill in the gaps
59
what do the signs on the right symbolize?
whole genome sequence: it is high density and all the variants have been found
60
what do the signs on the left symbolize?
SNP Chip: low density