Week 6 (QC and Alignment) Flashcards
the ability to resolve a repetitive structure is dependent on the __________ of the molecules in your library
length
Sanger sequencing is accurate for reads up to ~_______ bp
1000
short read sequencers
Illumina and Element
short read sequencers produce reads of <____ bp and have a _______ error rate
<150 bp; low error rate (<<1%)
long read sequencers
- PacBio
- Oxford Nanopore
long read sequencers produce fewer reads, but the reads are much longer, ____ to _____ kb, with a _______ error rate of ~____%
10s to 100s of kb; higher error rate of ~1%
what are the standard file formats?
- FASTA
- FASTQ
FASTA has ___ parts
2
what are the parts of FASTA
- > sequence name (always starts with >)
- sequence
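A minimal sketch of reading the two FASTA parts in Python. The function name `parse_fasta` and the toy records are made up for illustration, and the sketch assumes a sequence may be wrapped over several lines.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new ">" header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequence lines are joined together
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Toy input (hypothetical data): two records, the first wrapped over two lines.
example = ">seq1 toy example\nACGT\nTTGA\n>seq2\nGGCC\n"
records = parse_fasta(example)
print(records)
```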
FASTQ has ___ parts
4
what are the parts of FASTQ
- @ sequence name (and other info)
- sequence
- + separator line (sometimes followed by other info)
- quality value
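A minimal sketch of splitting FASTQ text into its 4 parts in Python. The function name `parse_fastq` and the toy read are made up for illustration, and the sketch assumes every record takes exactly 4 lines (no wrapped sequences). Records are grouped by position because a quality line can itself begin with @ or +.

```python
def parse_fastq(text):
    """Return a list of (name, sequence, quality) tuples from 4-line FASTQ text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of 4 non-blank lines")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        records.append((header[1:], seq, qual))
    return records


# Toy input (hypothetical data): one read with one quality character per base.
example = "@read1 toy example\nACGTN\n+\nIIII#\n"
fastq_records = parse_fastq(example)
print(fastq_records)
```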
when is FASTA used?
when per base quality is not needed
what does FASTA present?
presents only the sequence itself
the FASTA sequence name always starts with ______ symbol
>
when is FASTQ used?
when per base quality is needed
what does FASTQ present?
presents the sequence and the estimated base quality
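A small sketch of turning a FASTQ quality line into numeric Q values. It assumes the common Phred+33 encoding (each quality character's ASCII code minus 33), which these cards do not state; older files may use a different offset. The name `decode_qualities` is made up for illustration.

```python
def decode_qualities(quality_string, offset=33):
    """Convert quality characters to integer Q values (assumes Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality_string]


# 'I' -> 40, '5' -> 20, '+' -> 10, '#' -> 2 under the Phred+33 assumption.
scores = decode_qualities("II5+#")
print(scores)
```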
FASTQ sequence name always starts with ____ symbol
@
when reading the sequence in FASTQ, what does the letter “N” symbolize?
any of the 4 nucleotides; the sequencer could not determine which base it was, so the more Ns, the worse the quality
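As a rough illustration, the share of N calls in a read can be computed as below. The name `n_fraction` is made up, and any cutoff for discarding a read would be a separate choice not covered in these cards.

```python
def n_fraction(sequence):
    """Return the fraction of bases in a read that were called as N."""
    if not sequence:
        return 0.0  # avoid dividing by zero on an empty read
    return sequence.upper().count("N") / len(sequence)


# Toy read (hypothetical data): 2 of the 8 bases are N.
frac = n_fraction("ACGTNNAC")
print(frac)
```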
the _______ the Q value the more accurate the sequence is
higher
for every 10-point increase in quality score, accuracy increases by a factor of _____
10
Qphred equation
Qphred = -10 * log10(P(error))
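A short sketch of the Qphred equation in both directions. The function names are made up for illustration; only the formula itself comes from the card above.

```python
import math


def phred_from_error(p_error):
    """Qphred = -10 * log10(P(error)), for 0 < P(error) <= 1."""
    if not 0 < p_error <= 1:
        raise ValueError("P(error) must be in (0, 1]")
    return -10 * math.log10(p_error)


def error_from_phred(q):
    """Invert the formula: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# An error probability of 1 in 100 corresponds to about Q20,
# and Q10 corresponds to an error probability of about 1 in 10.
q = phred_from_error(0.01)
p = error_from_phred(10)
print(round(q, 6), round(p, 6))
```

Comparing `error_from_phred(10)` with `error_from_phred(20)` shows the error probability dropping tenfold for a 10-point gain in Q.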
At any given position in a sequence, the base present is either A/C/T/G but we cannot _________ observe the base. The base that is produced from a DNA sequencer is an observation based on some biochemical/physical property that has an ________.
directly; error
Q20 is _____ times more accurate than Q10
10
when using fluorescence in Illumina sequencing, we notice a change in color distribution with each cycle. How does this affect our accuracy?
the signal becomes less clear and its intensity decreases as more cycles are run. This occurs because fluorophores from previous cycles may not have been cleaved from the nucleotides