Week 8 (Variants & SNP Chips) Flashcards by Emma Sellers

_____ _________: filtering variants with fixed thresholds on multiple metrics (the thresholds need to be determined empirically)

hard filtering

what are the two most important metrics in hard filtering?

QualByDepth (QD)
RMSMappingQuality (MQ)

VCF

variant call format

MQ (RMSMappingQuality) has a value of 40 associated with it. What does that mean?

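it allows us to evaluate how well we think the reads covering the site mapped to the genome. MQ is the root mean square of the mapping qualities of those reads, on a Phred scale, so a value of 40 corresponds to roughly a 1 in 10,000 chance that a read is placed in the wrong location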
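To make the Phred scale concrete, here is a minimal sketch of the conversion from a Phred-scaled quality to an error probability. The function name `phred_to_error_prob` is made up for this card, and treating MQ (a root mean square across reads) as a single probability is only an approximation.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q into the probability of being wrong.

    Phred scale: Q = -10 * log10(p), so p = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


# A few example values; 40 is the MQ value from the card above.
for q in (20, 40, 60):
    print(f"Q={q}: about 1 wrong placement in {round(1 / phred_to_error_prob(q)):,}")
```

Each step of 10 on the scale means a tenfold drop in the chance of a wrong placement.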

where would you find a lower MQ (mapping quality)?

in repetitive sequences, because there are multiple places a read could align

sensitivity

correctly identifying true positives: TP / (TP + FN)

specificity

correctly identifying true negatives: TN / (TN + FP)

all variant callers produce errors, which can be classified as false positives and false negatives. when performing a genomic analysis, or any similar analysis for that matter, one has to balance sensitivity and specificity. what do the terms sensitivity and specificity mean in the context of variant calling?

sensitivity: trying to discover all the real variants
specificity: trying to limit the false positives that creep in when filters get too lenient
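As a quick worked example, the two terms can be computed from counts of true/false positives and negatives. This is only a sketch; the counts below are invented for illustration and do not come from any real call set.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-variant sites correctly left uncalled: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts: 90 real variants found, 10 missed,
# 950 non-variant sites left alone, 50 called by mistake.
print("sensitivity:", sensitivity(tp=90, fn=10))
print("specificity:", specificity(tn=950, fp=50))
```

Loosening a filter usually moves calls from FN to TP (sensitivity goes up) but also from TN to FP (specificity goes down), which is the balance the card describes.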

if you call a variant where one doesn’t exist this is a false __________

positive

if you fail to call a variant where one exists this is a false __________

negative

what type of error is considered the worst?

Type 1 error, i.e. a false positive (we don’t want to say something is true if it is not)

Variant Quality Score Recalibration

does not actually recalibrate QUAL but creates a new score (VQSLOD)

what is the purpose of the variant quality score recalibration?

the purpose of this new score is to enable variant filtering in a way that allows analysts to balance sensitivity and specificity

sensitivity in variant quality score recalibration

trying to discover all the real variants

specificity in variant quality score recalibration

trying to limit the false positives that creep in when filters get too lenient

________ _______ ________ ___________ uses machine learning algorithms to learn from each dataset what the annotation profile of good variants vs bad variants is

variant quality score recalibration

VQSR

variant quality score recalibration

key to VQSR is that you need a “________ _____” for training the model

truth set

what is 100% sensitivity?

calling every difference a variant

the recalibrated variant quality score provides a continuous estimate of the probability that each variant is true, allowing one to partition the call sets into quality ____________

tranches

__________ are essentially slices of variants, ranked by VQSLOD

tranches

high tranche

if you want more variants and are willing to accept false positives

middle tranche

if you want to remove most false positives but are also willing to remove some true variants

low tranche

if you only want highly accurate true variants with few false positives and are willing to miss perhaps many true positives

what are tranches?

slices of the call set ranked by variant quality score; choosing a tranche sets the threshold that determines how many true positives you keep and how many false positives you accept
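One way to picture a tranche cutoff is as the lowest score you have to accept in order to recover a target fraction of truth-set variants. The sketch below assumes that definition; the function name `tranche_cutoff` and all scores are hypothetical, and this is not how any particular tool implements it.

```python
import math


def tranche_cutoff(truth_scores, target_sensitivity):
    """Lowest score to accept so that at least `target_sensitivity`
    of the truth-set variants pass (scores: higher = more likely real)."""
    ranked = sorted(truth_scores, reverse=True)
    # Number of truth variants that must pass; the epsilon guards against float noise.
    needed = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    return ranked[needed - 1]


# Hypothetical scores for ten truth-set variants and for all calls in a dataset.
truth = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4]
all_calls = truth + [-2.5, -3.5, -5, -6]

for target in (0.5, 0.9, 1.0):
    cutoff = tranche_cutoff(truth, target)
    kept = sum(score >= cutoff for score in all_calls)
    print(f"target {target:.0%}: cutoff {cutoff}, {kept} calls kept")
```

A higher target keeps more calls (more true variants, more false positives); a lower target keeps fewer, cleaner calls.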

tranches

what is a genotyping model and software that google has released?

DeepVariant

what is the standard genotyping model and software used in humans?

DeepVariant

WGS

whole genome sequence

what is a whole genome sequence (WGS)?

the sequence of an individual's entire genome, obtained by sequencing a library prepared from its DNA

what is a SNP Chip used for?

to build a (relatively) low cost assay to genotype a large number of individuals

sample size is statistical ________

power

what is deep coverage?

30x

approximately how much does it cost to run a whole genome sequencing on a mammalian genome at 30x coverage?

$1000

what is the difference between WGS and SNP chips related to variants?

- WGS captures "all" variation
- SNP chips have a lower number of variants but also a lower cost per sample

_______ is the largest genotype provider in the world

Neogen

what was the purpose of the in-silico digest of the reference genome with multiple restriction enzymes?

every time an enzyme's recognition sequence was seen, the genome would be cut there. from there you could compile the number of reads you would get from each segment, and the repetitive elements (that you are not interested in) could be found because they had the most sections cut out
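A toy version of an in-silico digest can be sketched as below. It assumes, for simplicity, that the cut falls at the start of each recognition site (real enzymes cut at a specific position within or near the site), and the "genome" string is invented. The recognition sequences for EcoRI, PstI and MspI are the standard ones.

```python
def digest(sequence: str, site: str) -> list[str]:
    """Cut `sequence` at the start of every occurrence of `site`
    and return the resulting fragments in order."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        if pos > start:  # skip the empty fragment when the sequence begins with a site
            fragments.append(sequence[start:pos])
        start = pos
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Made-up reference sequence, digested with several enzymes in turn.
genome = "AAGAATTCTTCCGGAACTGCAGTTGAATTCGGCCGGTT"
enzymes = {"EcoRI": "GAATTC", "PstI": "CTGCAG", "MspI": "CCGG"}

for name, site in enzymes.items():
    frags = digest(genome, site)
    print(name, len(frags), "fragments, lengths:", [len(f) for f in frags])
```

Comparing fragment counts and lengths across enzymes is what lets you see which enzyme cuts where, and how often.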

what is Illumina Infinium chemistry?

small beads carry unique barcodes. for each SNP, a 50-mer oligo that flanks the SNP (the probe) is synthesized. these SNP-specific probes are attached to the beads, and a chip is made with a microwell for each bead to sit in. the beads are then deposited on the chip to produce an array

what are the basics of the beads used in Illumina Infinium chemistry?

the small beads have oligos hanging off of them that correspond to the section of the sequence that you want

for each SNP, synthesize a ____-mer oligo that flanks the SNP (probe)

50

what is a probe?

50 mer oligo that flanks the SNP

_____ base probe

Infinium I = ____ probe(s)

2

Infinium II = ____ probe(s)

1

what color bead is G and C in Illumina Infinium chemistry?

green

what color bead is A and T in Illumina Infinium chemistry?

red

G/C and A/T SNPs = ____________

Infinium I

in Illumina Infinium chemistry, what happens if the two alleles of your variant are A and T? How do you solve this?

it would all show up as red, so you would need to use two probes
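The color rule on the last few cards can be sketched as a small lookup: if both alleles give the same color, one probe cannot tell them apart, so two probes (Infinium I) are needed. The function name `infinium_design` is hypothetical, and the one-probe count for Infinium II is an added assumption not stated on the cards.

```python
# Signal color per base, as on the cards: A/T red, G/C green.
SIGNAL_COLOR = {"A": "red", "T": "red", "C": "green", "G": "green"}


def infinium_design(allele_1: str, allele_2: str) -> tuple[str, int]:
    """Return (assay type, number of probes) for a two-allele SNP."""
    same_color = SIGNAL_COLOR[allele_1] == SIGNAL_COLOR[allele_2]
    if same_color:
        # Alleles are indistinguishable by color, so use one probe per allele.
        return ("Infinium I", 2)
    # Alleles differ in color, so a single probe can separate them.
    return ("Infinium II", 1)


for snp in (("A", "T"), ("C", "G"), ("A", "G"), ("T", "C")):
    print("/".join(snp), "->", infinium_design(*snp))
```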

why do we cluster SNPs?

so we can determine genotype

is this a well clustered SNP or a poorly clustered SNP?

well clustered SNP

are these examples of a well clustered SNP or a poorly clustered SNP?

poorly clustered SNP

what are these clustered SNPs an example of?

improperly clustered SNPs: the automated system just got it wrong, so you should fix the clusters manually

SNP chips are really accurate but things can go wrong. remember this when making decisions based on chip genotypes for any single SNP specifically. Why?

it depends on what error rate you are comfortable with, your willingness to be wrong

call rate per SNP

best indicator of genotype quality

call rate per individual

best indicator of sample DNA quality

2 key metrics for looking for errors:

1. call rate per SNP
2. call rate per individual
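Both metrics are just the fraction of genotypes that were successfully called, counted along different axes of the genotype table. A minimal sketch, assuming rows are individuals, columns are SNPs, and `None` marks a failed call (the data and function names are invented):

```python
def call_rate_per_individual(genotypes):
    """Fraction of SNPs with a called genotype, for each individual (row)."""
    return [sum(g is not None for g in row) / len(row) for row in genotypes]


def call_rate_per_snp(genotypes):
    """Fraction of individuals with a called genotype, for each SNP (column)."""
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(row[j] is not None for row in genotypes) / n_individuals
        for j in range(n_snps)
    ]


# Hypothetical chip results: 3 individuals x 4 SNPs.
genotypes = [
    ["AA", "AB", "BB", None],
    ["AB", None, "BB", None],
    ["AA", "AB", "AB", "BB"],
]
print("per individual:", call_rate_per_individual(genotypes))
print("per SNP:", call_rate_per_snp(genotypes))
```

A low value in the first list points at a poor DNA sample; a low value in the second points at a SNP that genotypes badly.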

why might an individual have a low call rate?

poor DNA quality (for example, a sample taken from a cow that has been dead for 1000 years rather than from a live cow)

what does it mean to impute?

inferring missing data from data that you have already observed, in order to fill in the gaps

what do the signs on the right symbolize?

whole genome sequence: it is high density and all the variants have been found

what do the signs on the left symbolize?

SNP Chip: low density

(60 cards)