Week 8 (Variants & SNP Chips) Flashcards
_____ _________ filters variants using fixed thresholds on multiple metrics (the thresholds need to be determined empirically)
hard filtering
what are the two most important metrics in hard filtering?
- QualByDepth (QD)
- RMSMappingQuality (MQ)
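A minimal sketch of hard filtering on these two metrics. The cutoff values, the record layout, and the `hard_filter` helper are all assumptions made for illustration; real thresholds have to be worked out empirically for each dataset.

```python
# Illustrative cutoffs only (assumed values, not taken from the flashcards).
QD_MIN = 2.0   # minimum QualByDepth
MQ_MIN = 40.0  # minimum RMSMappingQuality


def hard_filter(variants, qd_min=QD_MIN, mq_min=MQ_MIN):
    """Split variants into (passed, failed) using fixed QD and MQ cutoffs."""
    passed, failed = [], []
    for v in variants:
        # A variant must clear every threshold to pass.
        if v["QD"] >= qd_min and v["MQ"] >= mq_min:
            passed.append(v)
        else:
            failed.append(v)
    return passed, failed


# Hypothetical toy records; the keys mirror the annotation names.
toy_variants = [
    {"id": "var1", "QD": 12.5, "MQ": 60.0},
    {"id": "var2", "QD": 1.2, "MQ": 58.0},   # fails on QD
    {"id": "var3", "QD": 15.0, "MQ": 22.0},  # fails on MQ
]

kept, dropped = hard_filter(toy_variants)
print("kept:", [v["id"] for v in kept])
print("dropped:", [v["id"] for v in dropped])
```

Raising either cutoff removes more false positives but also more real variants, which is why the thresholds cannot be chosen once and reused blindly.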
VCF
variant call format
MQ (RMSMappingQuality) has a value of 40 associated with it. What does that mean?
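it tells us how confidently the reads covering that site were mapped to the genome; mapping quality is Phred-scaled, so 40 corresponds to roughly a 1 in 10,000 chance that a read is misplaced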
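A small sketch of the Phred relationship behind mapping quality, where a quality Q corresponds to an error probability of 10^(-Q/10). The function name is hypothetical, and because MQ is a root-mean-square summary over the reads at a site, the result is best read as a rough guide and not as an exact per-site probability.

```python
def phred_to_error_probability(q):
    """Convert a Phred-scaled quality into the probability of being wrong."""
    return 10 ** (-q / 10)


# Higher mapping quality means a smaller chance that the placement is wrong.
for mq in (10, 20, 40, 60):
    print(f"MQ {mq}: error probability ~ {phred_to_error_probability(mq):.0e}")
```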
where would you find a lower MQ (mapping quality)?
in repetitive sequences, because a read could map to multiple places
sensitivity
identifying true positives
specificity
identifying true negatives
all variant callers produce errors, which can be classified as false positives and false negatives. when performing a genomic analysis, or any similar analysis for that matter, one has to balance sensitivity and specificity. what do the terms sensitivity and specificity mean in the context of variant calling?
sensitivity: trying to discover all the real variants
specificity: trying to limit the false positives that creep in when filters get too lenient
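A short sketch of how the two terms are computed from the four outcome counts, using the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The counts for the "lenient" and "strict" filters are made up purely to show the trade-off.

```python
def sensitivity(tp, fn):
    """Fraction of real variants that were called."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-variant sites that were correctly left uncalled."""
    return tn / (tn + fp)


# Hypothetical counts for the same 100 real variants and 900 non-variant sites.
filters = {
    "lenient": {"tp": 98, "fn": 2, "fp": 30, "tn": 870},
    "strict": {"tp": 85, "fn": 15, "fp": 3, "tn": 897},
}

for name, c in filters.items():
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    print(f"{name}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy example the lenient filter finds more of the real variants but lets in more false positives, while the strict filter does the reverse.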
if you call a variant where one doesn’t exist this is a false __________
positive
if you fail to identify a variant where one exists this is a false __________
negative
what type of error is considered the worst?
Type I error, i.e. a false positive (we don’t want to say something is true if it is not)
Variant Quality Score Recalibration
does not actually recalibrate QUAL but creates a new score (VQSLOD)
what is the purpose of the variant quality score recalibration?
the purpose of this new score is to enable variant filtering in a way that allows analysts to balance sensitivity and specificity
sensitivity in variant quality score recalibration
trying to discover all the real variants
specificity in variant quality score recalibration
trying to limit the false positives that creep in when filters get too lenient
________ _______ ________ ___________ uses machine learning algorithms to learn from each dataset what the annotation profile of good variants vs bad variants looks like
variant quality score recalibration
VQSR
variant quality score recalibration
key to VQSR is that you need a “________ _____” for training the model
truth set
what is 100% sensitivity?
calling every difference a variant (no real variant is missed, but false positives are at their maximum)
the recalibrated variant quality score provides a continuous estimate of the probability that each variant is true, allowing one to partition the call sets into quality ____________
tranches
__________ are essentially slices of variants, ranked by VQSLOD
tranches
high tranche
if you want more variants and are willing to accept false positives
middle tranche
if you want to remove most false positives but are also willing to remove some true variants
low tranche
if you only want highly accurate true variants with few false positives and are willing to miss perhaps many true variants
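A rough sketch of how tranches could be cut from a VQSLOD ranking: for a target sensitivity, take the lowest VQSLOD that still keeps that percentage of the truth-set variants, then keep every call at or above it. The scores, the `vqslod_cutoff` helper, and the 50/80/100 targets are toy assumptions chosen to fit ten example sites; this is not how any particular tool implements it.

```python
def vqslod_cutoff(truth_scores, target_percent):
    """Lowest VQSLOD that still retains target_percent of truth-set variants."""
    ranked = sorted(truth_scores, reverse=True)  # best scores first
    # Integer ceiling of target_percent% of the truth set, avoiding float rounding.
    n_keep = (target_percent * len(ranked) + 99) // 100
    return ranked[n_keep - 1]


# Hypothetical VQSLOD values for calls that overlap the truth set.
truth_scores = [9.1, 7.4, 6.8, 5.0, 3.3, 2.1, 0.4, -0.9, -2.5, -6.0]
# Hypothetical VQSLOD values for the remaining calls (truth status unknown).
other_scores = [4.2, 1.0, -1.5, -3.0, -4.4, -7.2]
all_scores = truth_scores + other_scores

for label, target in (("low", 50), ("middle", 80), ("high", 100)):
    cutoff = vqslod_cutoff(truth_scores, target)
    n_kept = sum(score >= cutoff for score in all_scores)
    print(f"{label} tranche ({target}% sensitivity): VQSLOD >= {cutoff}, {n_kept} calls kept")
```

Moving from the low to the high tranche lowers the cutoff, so more calls are kept and more of them are likely to be false positives.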