Week 10 (Genome Assembly) Flashcards
all assembly relies on what simple assumption?
that highly similar DNA fragments originate from the same position within a genome
all ______ approaches rely on the simple assumption that highly similar DNA fragments originate from the same position within a genome
assembly
genome assembly assumes that highly similar DNA fragments originate from the same position within a genome. How does this apply to repetitive sequences in the genome? How do we resolve this?
repetitive sequences pose a challenge for genome assembly because they appear at multiple locations in the genome, so highly similar reads can come from different positions. we can resolve this by using longer reads that span the repeat and include the unique sequences flanking it.
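A minimal sketch of why read length matters for repeats, using a made-up toy genome (the sequences and the helper name `find_all` are illustrative assumptions, not course material). A read that lies entirely inside a repeat matches more than one position, while a read that also covers the unique flanks matches only one.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every start position where read matches genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions

# Toy genome: the same repeat occurs twice, surrounded by unique flanks.
REPEAT = "ACGTACGT"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

short_read = REPEAT                  # lies entirely inside the repeat
long_read = "GCA" + REPEAT + "GGA"   # spans the repeat plus unique flanks

short_hits = find_all(genome, short_read)  # ambiguous: two placements
long_hits = find_all(genome, long_read)    # unambiguous: one placement
print(short_hits)  # [5, 18]
print(long_hits)   # [2]
```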
what are 3 critical properties of sequencing reads?
- length
- accuracy
- evenness
compared with long read sequencing, ______ read genome sequencing is no longer relevant
short
contig
set of sequence reads that overlap to form a contiguous stretch of DNA sequence
a _______ is a set of sequence reads that overlap to form a contiguous stretch of DNA sequence
contig
_______ contig numbers are better (fewer contigs = bigger contigs)
lower
N50
shortest contig length N such that 50% of the assembled bases are contained in contigs of length N or longer
_____ is the shortest contig length N such that 50% of the assembled bases are contained in contigs of length N or longer
N50
for N50, is higher or lower better?
higher
L50
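smallest number of contigs whose lengths sum to at least 50% of the total assembly length (the number of contigs of length N50 or longer)
_____ is the smallest number of contigs whose lengths sum to at least 50% of the total assembly length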
L50
for L50, is higher or lower better?
lower
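A small worked sketch of how N50 and L50 can be computed from a list of contig lengths. The function name `n50_l50` and the example lengths are made up for illustration. Contigs are sorted from longest to shortest and summed until the running total reaches half of the assembly.

```python
def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths."""
    total = sum(contig_lengths)
    running = 0
    ordered = sorted(contig_lengths, reverse=True)  # longest first
    for count, length in enumerate(ordered, start=1):
        running += length
        # Stop at the first contig that brings the sum to >= 50% of all bases.
        if running * 2 >= total:
            return length, count
    raise ValueError("contig_lengths must be non-empty")

# Total = 300 bases, so the halfway point is 150.
# 80 + 70 = 150, so N50 is 70 and L50 is 2.
result = n50_l50([80, 70, 50, 40, 30, 20, 10])
print(result)  # (70, 2)
```

A more contiguous assembly reaches the halfway point with fewer, longer contigs, which is why a higher N50 and a lower L50 are better.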
De Bruijn graph
assembly method that uses smaller sub-sequences (k-mers) of sequence reads to find overlaps and build a graph
__________ ________ is an assembly method that uses smaller sub-sequences (k-mers) of sequence reads to find overlaps and build a graph
De Bruijn graph
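A minimal sketch of the k-mer idea, assuming error-free toy reads and a tiny k (the reads and the names `de_bruijn` and `walk` are illustrative). Each k-mer becomes an edge from its first k-1 bases to its last k-1 bases. Real assemblers also track k-mer multiplicity and handle branches, which this simplification ignores.

```python
from collections import defaultdict

def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

def walk(graph: dict[str, set[str]], start: str) -> str:
    """Follow unbranched edges from start and spell out the sequence."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        node = next(iter(graph[node]))
        if node in seen:  # stop on a cycle instead of looping forever
            break
        seen.add(node)
        seq += node[-1]
    return seq

reads = ["ACGTC", "GTCAA"]   # the two reads share the k-mer GTC
graph = de_bruijn(reads, k=3)
contig = walk(graph, "AC")
print(contig)  # ACGTCAA
```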
OLC assembly method
Overlap Layout Consensus
what does each letter mean in OLC assembly method?
- Overlap: find all pairwise overlaps between all reads
- Layout: use those overlaps to determine how the reads should be put together
- Consensus: produce a consensus based on the layout and overlap of reads
what does the O mean in OLC assembly method?
Overlap: find all pairwise overlaps between all reads
what does the L mean in OLC assembly method?
Layout: use those overlaps to determine how the reads should be put together
what does the C mean in OLC assembly method?
Consensus: produce a consensus based on the layout and overlap of reads
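A toy sketch of the overlap and layout steps, assuming error-free reads with no read contained inside another. Greedy merging of the best overlap is used here as a simple stand-in for the layout step and is not how production OLC assemblers work. The consensus step is trivial in this example because the reads contain no errors. The names `overlap` and `greedy_assemble` are illustrative.

```python
from itertools import permutations

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while True:
        best = (0, None, None)
        # Overlap: compare every ordered pair of reads.
        for a, b in permutations(reads, 2):
            n = overlap(a, b, min_len)
            if n > best[0]:
                best = (n, a, b)
        n, a, b = best
        if n == 0:
            return reads  # no overlaps left; what remains are the contigs
        # Layout: join the best pair, keeping the shared bases once.
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[n:])

contigs = greedy_assemble(["ACGTTG", "TTGCAA", "CAAGGT"])
print(contigs)  # ['ACGTTGCAAGGT']
```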
T2T assembly recipe
- error correction of accurate long reads
- assembly graph construction
- graph simplification with ultra long reads
- phasing and scaffolding
______ _______ is used to establish parental origin
k-mer counting
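A rough sketch of how k-mers could be used to establish parental origin, assuming sequence data from both parents is available. This is an assumption about the approach, not something stated on the card. The idea is to find k-mers seen in only one parent and count how many of each kind a child's read contains. All sequences and the names `kmers` and `assign_parent` are made up. Real tools also use k-mer counts to filter out sequencing errors, which this toy skips.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assign_parent(read: str, maternal: list[str], paternal: list[str], k: int) -> str:
    """Label a read by which parent-specific k-mers it contains more of."""
    mat = set().union(*(kmers(s, k) for s in maternal))
    pat = set().union(*(kmers(s, k) for s in paternal))
    mat_only, pat_only = mat - pat, pat - mat  # k-mers unique to one parent
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"

# Toy parental sequences that differ at a single base (A vs T).
maternal = ["ACGTACCA"]
paternal = ["ACGTTCCA"]

labels = [assign_parent(r, maternal, paternal, k=4)
          for r in ["CGTACC", "GTTCCA", "ACGT"]]
print(labels)  # ['maternal', 'paternal', 'unassigned']
```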
what makes full siblings genetically different from each other?
crossing over / recombination during meiosis