Data basics Flashcards

1
Q

Describe the workflow from question to answer

A

Raw reads -> preprocessing -> assembly (de novo) or alignment to a reference -> application-specific steps, e.g. variant calling -> comparison between samples

2
Q

Describe fasta

A

A header line starting with ">", followed by one or more lines of sequence
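The header/sequence layout can be illustrated with a minimal parsing sketch. The function name `parse_fasta` and the toy records are made up for illustration; the sketch assumes well-formed input and is not a replacement for an established parser.

```python
def parse_fasta(text):
    """Yield (header, sequence) tuples from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence may be wrapped over several lines; collect them all.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy input (hypothetical records, for illustration only).
example = ">seq1 toy record\nACGT\nTTGA\n>seq2\nGGCC\n"
records = list(parse_fasta(example))
for name, seq in records:
    print(name, seq)
```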

3
Q

Describe fastq

A

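Four lines per read: header (starting with "@"), sequence, a "+" line (optionally followed by additional info, often the header repeated), and qualities encoded in ASCII, one character per base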
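The four-line record layout can be sketched as follows. The function name `parse_fastq` and the toy reads are hypothetical, and the sketch assumes unwrapped records (exactly four lines per read), which is the common case but not guaranteed for every file.

```python
def parse_fastq(text):
    """Yield (header, sequence, qualities) from unwrapped four-line FASTQ text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


# Toy input (hypothetical reads, for illustration only).
example = (
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nGGCA\n+read2\n!5?I\n"
)
reads = list(parse_fastq(example))
for name, seq, qual in reads:
    print(name, seq, qual)
```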

4
Q

What determines quality score?

A

Phasing and the intensity of the signal compared to noise. The machine and its software convert this information into a score.

5
Q

Explain the encoding

A

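Each ASCII character in the quality string is converted to a number (its character code minus an offset). That number is the Phred score, which in turn corresponds to a probability that the base is wrong.
Note that there are different ASCII offsets (33 and 64), so the offset must be known before decoding.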
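A minimal sketch of the decoding step, assuming the offset is already known. The function name `decode_qualities` and the example strings are made up for illustration.

```python
def decode_qualities(quality_string, offset=33):
    """Convert an ASCII quality string to a list of Phred scores.

    offset is 33 or 64 depending on the encoding; it is an input here,
    not something this sketch tries to detect.
    """
    return [ord(ch) - offset for ch in quality_string]


# '!' is ASCII 33, '5' is 53, '?' is 63, 'I' is 73.
phred33_scores = decode_qualities("!5?I", offset=33)
# '@' is ASCII 64, 'T' is 84, 'h' is 104.
phred64_scores = decode_qualities("@Th", offset=64)
print(phred33_scores, phred64_scores)
```

Decoding with the wrong offset shifts every score by 31, which is why the encoding has to be established first.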

6
Q

What does a Phred score of 20 mean?

A

A 1 % risk (probability 0.01) of the base being wrong, i.e. 99 % base-call accuracy
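The relationship behind this number is the standard Phred definition, Q = -10 * log10(P), where P is the probability that the base call is wrong. A small sketch of the conversion in both directions (the function names are made up for illustration):

```python
import math


def phred_to_error_probability(q):
    """Error probability P for a Phred score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score Q for an error probability P (0 < P <= 1)."""
    return -10 * math.log10(p)


# Q10, Q20, Q30 and Q40 correspond to roughly 10 %, 1 %, 0.1 % and 0.01 % error.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
```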

7
Q

Explain mate pair reads

A

Long-insert paired-end reads.
DNA is fragmented, the ends are repaired using labelled dNTPs, and the fragments are circularized. The circle is then fragmented, and the part containing the labelled bases is selected for cluster generation and Illumina sequencing. Note: this gives reverse-forward reads.
Good for scaffolding in de novo assembly.

8
Q

Advantages of paired ends?

A

Precise mapping/alignment/SNP calls
Detection of indels
Easier to build scaffolds
