Alignment Flashcards

1
Q

Name some pitfalls when constructing alignments

A
Repeats in the genome
Poor reference quality
Read (sequencing) errors
Regions not present in the reference
Surprises in the sample (it is not what you expected, e.g. contamination)
2
Q

What is wrong with simply searching for each read in the reference?

A

It takes too much time/CPU: every read has to be compared against every position of the reference.
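A minimal Python sketch of the naive approach (the function name `naive_search` is hypothetical). Every start position of the genome is tried for every read, so the work grows with genome length × read length × number of reads.

```python
def naive_search(genome: str, read: str) -> list[int]:
    """Return every 0-based position where `read` occurs exactly in `genome`.

    Compares the read against the window at every start position, so the
    worst case is O(len(genome) * len(read)) work for a single read.
    """
    hits = []
    m = len(read)
    for start in range(len(genome) - m + 1):
        if genome[start:start + m] == read:
            hits.append(start)
    return hits


print(naive_search("ACGTACGTTACGA", "ACG"))  # [0, 4, 9]
```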

3
Q

What is wrong with using BLAST?

A

It gives a lot of unnecessary output and finds only local alignments. It is also too slow for mapping millions of short reads.

4
Q

What is the smart solution?

A

Find possible matches quickly (candidate positions).

Do a precise alignment only for these candidates.

5
Q

Describe the principle of hash-based algorithms

A

Index the reference: make a dictionary with k-mers as keys and their positions as values.
Look up the k-mers of your read (seeds) among the keys.
Align the read in the region around each seed hit and report the best alignment.
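The three steps can be sketched in Python as below. All function names are hypothetical, and step 3 is simplified to an ungapped comparison that only counts mismatches; real aligners run a gapped alignment (e.g. Smith-Waterman) around the seed instead.

```python
from collections import defaultdict
from typing import Optional


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Step 1: map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ (Hamming distance)."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int) -> Optional[tuple[int, int]]:
    """Steps 2 and 3: look up the read's k-mers (seeds), then check each hit.

    Returns (best_start, mismatch_count), or None if no seed hits.
    Simplification: ungapped comparison only, no insertions or deletions.
    """
    best = None
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for hit in index.get(seed, []):
            start = hit - offset  # where the whole read would start
            if start < 0 or start + len(read) > len(reference):
                continue
            mm = mismatches(read, reference[start:start + len(read)])
            if best is None or mm < best[1]:
                best = (start, mm)
    return best


reference = "TTGACCGTAGGCTAACGTTAGC"
index = build_kmer_index(reference, 4)
print(map_read("CGTAGGCTTA", reference, index, 4))  # (5, 1)
```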

6
Q

What are spaced seeds?

A

Using a longer k-mer but not requiring every position to match: a fixed pattern decides which positions must match and which are ignored.
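A small Python illustration, assuming a hypothetical pattern `1101011` where `1` means "must match" and `0` means "don't care". Two windows that differ only at don't-care positions get the same key, so they still hit each other in the index.

```python
def spaced_key(seq: str, pattern: str) -> str:
    """Keep only the characters at the '1' positions of the pattern."""
    if len(seq) != len(pattern):
        raise ValueError("sequence and pattern must have the same length")
    return "".join(base for base, p in zip(seq, pattern) if p == "1")


def spaced_kmers(seq: str, pattern: str) -> dict[str, list[int]]:
    """Index every window of seq by its spaced key (the hash-table keys)."""
    w = len(pattern)
    index: dict[str, list[int]] = {}
    for pos in range(len(seq) - w + 1):
        index.setdefault(spaced_key(seq[pos:pos + w], pattern), []).append(pos)
    return index


pattern = "1101011"      # hypothetical pattern: span 7, 5 positions must match
ref_window = "ACGTACG"
read_window = "ACTTGCG"  # differs at positions 2 and 4, both '0' positions
print(spaced_key(ref_window, pattern) == spaced_key(read_window, pattern))  # True
```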

7
Q

What are drawbacks of hash-based approaches?

A

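Memory! The index stores the position of every k-mer in the reference, which takes a lot of RAM for a large genome.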
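A back-of-the-envelope estimate, assuming roughly 3.1 billion bases for the human genome and one unsigned 32-bit (4-byte) integer per stored position. The keys and hash-table overhead come on top, so this is only a lower bound. The helper name is hypothetical.

```python
def index_size_gb(genome_length: int, bytes_per_position: int = 4) -> float:
    """Lower-bound size of a k-mer index that stores one position per base.

    Assumes each genome position is stored once as a fixed-size integer and
    ignores the k-mer keys and hash-table overhead.
    """
    return genome_length * bytes_per_position / 1e9


# ~3.1 billion bases, 4-byte positions (unsigned 32-bit fits values up to ~4.29e9)
print(f"{index_size_gb(3_100_000_000):.1f} GB")  # 12.4 GB
```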

8
Q

What is the BWT?

A

The Burrows-Wheeler transform: a reversible transformation of the genome that can be built from the suffix array. Add an end marker ($), form all rotations of the input, sort them lexicographically and output the last column. Repeats cluster together, which makes the result easier to compress.
It can be reversed using LF-mapping.
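A naive Python sketch of both directions (function names are hypothetical). Building all rotations is fine for a toy string but far too slow and memory-hungry for a real genome, where the BWT is derived from the suffix array instead. The inversion walks backwards through the text with LF-mapping: the k-th occurrence of a character in L is the same text position as the k-th occurrence of that character in F.

```python
def bwt(text: str) -> str:
    """BWT via sorted rotations (for teaching only)."""
    text = text + "$"  # unique end marker that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Undo the BWT with LF-mapping."""
    first = sorted(last)
    # rank of each character in L: how many identical characters come before it
    counts: dict[str, int] = {}
    ranks = []
    for ch in last:
        ranks.append(counts.get(ch, 0))
        counts[ch] = counts.get(ch, 0) + 1
    # first row in F where each character appears
    first_row: dict[str, int] = {}
    for row, ch in enumerate(first):
        first_row.setdefault(ch, row)
    # row 0 starts with '$', so L[0] is the last real character of the text
    row = 0
    out = []
    for _ in range(len(last) - 1):
        ch = last[row]
        out.append(ch)
        row = first_row[ch] + ranks[row]  # LF-mapping: jump to the preceding character
    return "".join(reversed(out))


print(bwt("ACAACG"))           # GC$AAAC
print(inverse_bwt("GC$AAAC"))  # ACAACG
```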

9
Q

How do you look up a read in the BWT?

A

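Recreate the F column (the sorted characters) and find the last base of the read there. Check whether the L column of those rows matches the second-to-last base, follow the matches back to F with LF-mapping, and so on through the read. Use the suffix array to find out where in the genome the match is.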
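A minimal Python sketch of this backward search in the FM-index style (function names are hypothetical). It keeps a range of sorted rows instead of single rows, uses a table `C` (where each character's block starts in F) and an `occ` count over L, and finally reads positions from the suffix array. Here `occ` simply counts characters; real aligners store sampled checkpoints instead.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of the sorted suffixes (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def backward_search(read: str, text: str) -> list[int]:
    """Find all exact occurrences of `read` in `text` via the BWT.

    Processes the read from its last base to its first, keeping the row
    range [top, bottom) of sorted rotations that start with the part matched so far.
    """
    text = text + "$"
    sa = suffix_array(text)
    last = "".join(text[i - 1] for i in sa)  # L column: character before each suffix
    # C[c] = number of characters smaller than c, i.e. start of c's block in F
    C: dict[str, int] = {}
    total = 0
    for ch in sorted(set(text)):
        C[ch] = total
        total += text.count(ch)

    def occ(ch: str, i: int) -> int:
        """Occurrences of ch in L[0:i] (counted naively here)."""
        return last[:i].count(ch)

    top, bottom = 0, len(text)
    for ch in reversed(read):
        if ch not in C:
            return []
        top = C[ch] + occ(ch, top)  # LF-mapping applied to the range boundaries
        bottom = C[ch] + occ(ch, bottom)
        if top >= bottom:
            return []  # no row starts with this part of the read
    return sorted(sa[row] for row in range(top, bottom))


print(backward_search("AC", "ACAACG"))  # [0, 3]
```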

10
Q

Why is BWT clever?

A

You store very little data (the compressible BWT plus small auxiliary tables) and calculate the missing parts when you need them.
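Two parts of this idea can be shown in a few lines of Python (function names are hypothetical): F never has to be stored because it is just L sorted, and on repetitive sequence L collapses into a few long runs, so run-length encoding shrinks it. The toy input is an assumption chosen to be very repetitive.

```python
from itertools import groupby


def bwt(text: str) -> str:
    """BWT via sorted rotations with '$' as end marker (naive, for illustration)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse runs of identical characters into (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def first_column(last: str) -> str:
    """F can be recomputed on demand: it is the characters of L in sorted order."""
    return "".join(sorted(last))


genome = "ACGTACGTACGTACGT"  # toy repetitive sequence
L = bwt(genome)
print(L)                                                     # TTTT$AAAACCCCGGGG
print(len(run_length_encode(genome)), len(run_length_encode(L)))  # 16 5
```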

11
Q

What does bwa mem do?

A

Uses multiple short seeds (exact matches) across each read, chains seeds that agree and extends them into a full alignment. Mostly for longer reads (the BWA manual recommends it for reads from about 70 bp).

12
Q

Explain the SAM/BAM formats

A

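SAM is a tab-separated text format and BAM is its compressed binary version. Header lines start with @; every other line is one alignment showing the read, information about the alignment in the FLAG field (e.g. paired or not), the mapping quality, where it maps (reference name and position), the edit distance (NM tag), etc.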
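A minimal Python sketch that splits one alignment line into the 11 mandatory SAM columns plus optional `TAG:TYPE:VALUE` fields and decodes three FLAG bits (0x1 paired, 0x4 unmapped, 0x10 reverse strand). The function name and the example line are hypothetical; header lines (starting with @) are rejected rather than parsed. For real work, a library such as pysam is the usual choice.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into its mandatory fields plus optional tags."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment")
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for name in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[name] = int(record[name])
    # optional fields look like TAG:TYPE:VALUE, e.g. NM:i:1 (edit distance)
    record["TAGS"] = {}
    for tag in cols[11:]:
        key, typ, value = tag.split(":", 2)
        record["TAGS"][key] = int(value) if typ == "i" else value
    flag = record["FLAG"]
    record["paired"] = bool(flag & 0x1)
    record["unmapped"] = bool(flag & 0x4)
    record["reverse_strand"] = bool(flag & 0x10)
    return record


# hypothetical alignment line (made-up read name and values)
line = "read1\t99\tchr1\t100\t60\t8M\t=\t300\t208\tACGTACGT\tIIIIIIII\tNM:i:1"
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], rec["MAPQ"], rec["paired"], rec["TAGS"]["NM"])
# chr1 100 60 True 1
```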
