4.2 Working with SAM/BAM files Flashcards

1
Q

What is the first part of any SAM file?

A

Header: lines beginning with @ that hold the reference sequence dictionary (@SQ) and the program(s) used to make the SAM/BAM file (@PG)

2
Q

What is in each of the 11 columns in SAM/BAM?

A
  1. QNAME: query template name
  2. FLAG: bitwise flag
  3. RNAME: reference sequence name
  4. POS: 1-based leftmost mapping position
  5. MAPQ: mapping quality
  6. CIGAR: CIGAR string
  7. RNEXT: reference name of the mate/next read
  8. PNEXT: position of the mate/next read
  9. TLEN: observed template length
  10. SEQ: segment sequence
  11. QUAL: ASCII of Phred-scaled base quality + 33
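
The column layout above can be sketched as a small parser. This is a minimal illustration, not a full SAM reader: the `SamRecord` and `parse_sam_line` names and the example line are made up for this card, and optional tag fields after column 11 are simply kept as raw strings.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns, in file order (hypothetical helper type)."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Tuple[str, ...]  # optional fields after column 11, left unparsed


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-separated alignment line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(fields)}")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tuple(fields[11:]),
    )


def phred_scores(qual: str) -> list:
    """Decode the QUAL column: each character is Phred quality + 33 in ASCII."""
    return [ord(ch) - 33 for ch in qual]


# Made-up alignment line used only for illustration.
example = "read1\t99\tchr1\t7\t30\t8M\t=\t37\t39\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, phred_scores(record.qual))
```

Header lines (those starting with @) would need to be skipped before calling the parser.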
3
Q

What is a bit flag?

A

An integer in which each bit encodes one property of the alignment (e.g. paired, unmapped, reverse strand); can be used to filter reads downstream
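
A rough sketch of how a FLAG value can be unpacked and used as a filter. The bit values follow the SAM specification; the short property names and the `decode_flag` / `is_primary_mapped` helpers are labels invented for this example.

```python
# Bit values from the SAM specification; the names are informal labels.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits that are set in the FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_primary_mapped(flag: int) -> bool:
    """True if the read is mapped and is neither secondary nor supplementary."""
    return not (flag & (0x4 | 0x100 | 0x800))


print(decode_flag(99))
print(is_primary_mapped(99), is_primary_mapped(256))
```

Testing bits with `&` is the same idea samtools applies when filtering on flags; here it is done by hand on plain integers.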

4
Q

What is secondary alignment?

A

An alternative alignment in which the same part of the read aligns to another location (the read maps to multiple places); marked with flag bit 0x100

5
Q

What is supplementary alignment?

A

An alignment in which (mostly) non-overlapping parts of one read align to different locations, as in a chimeric read; marked with flag bit 0x800

6
Q

What is a CIGAR string?

A
  • compresses information about the alignment, such as matches, insertions, and deletions, in the order they occur
  • encodes each alignment segment as a length followed by an operation letter
7
Q

What does this CIGAR string mean?

3M1I3M1D5M

A

3 alignment matches (M covers both matches and mismatches), 1 insertion, 3 matches, 1 deletion, 5 matches
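
A minimal sketch of reading a CIGAR string programmatically, using the string from this card with every operation carrying an explicit length. The `parse_cigar`, `query_length`, and `reference_length` helpers are hypothetical names; which operations consume query or reference bases follows the SAM specification.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the read (query) or along the reference.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def parse_cigar(cigar: str) -> list:
    """Return [(length, op), ...]; '*' means the CIGAR is unavailable."""
    if cigar == "*":
        return []
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings with stray characters, e.g. an operation without a length.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def query_length(ops: list) -> int:
    """Number of read bases covered by the CIGAR operations."""
    return sum(n for n, op in ops if op in CONSUMES_QUERY)


def reference_length(ops: list) -> int:
    """Number of reference bases spanned by the alignment."""
    return sum(n for n, op in ops if op in CONSUMES_REF)


ops = parse_cigar("3M1I3M1D5M")
print(ops, query_length(ops), reference_length(ops))
```

The insertion adds a base to the read only and the deletion adds a base to the reference only, so both lengths come out to 12 here.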

8
Q

What do the +/- signs in the observed template length indicate?

A

+: leftmost segment of the template (typically the forward read)

-: rightmost segment of the template (typically the reverse read)

9
Q

If all segments are mapped to the same reference, what does the unsigned observed template length equal?

A

the number of bases from the leftmost mapped base to the rightmost mapped base of the read pair

10
Q

What does an observed template length of 0 mean?

A

the template is a single segment, or the information is not available
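
The TLEN rules from the last three cards can be sketched for a simple read pair. This assumes both reads map to the same reference and that each read's 1-based start and end on the reference are already known; `template_lengths` is a hypothetical helper, and assigning + to read 1 when both reads start at the same position is an arbitrary choice made here.

```python
def template_lengths(start1: int, end1: int, start2: int, end2: int) -> tuple:
    """Return (TLEN of read 1, TLEN of read 2) for a pair on one reference.

    Coordinates are 1-based and inclusive. The unsigned value is the span
    from the leftmost mapped base to the rightmost mapped base of the pair.
    """
    leftmost = min(start1, start2)
    rightmost = max(end1, end2)
    span = rightmost - leftmost + 1
    # Leftmost segment gets +, rightmost gets -.
    # Tie-breaking on equal starts is an assumption of this sketch.
    if start1 <= start2:
        return span, -span
    return -span, span


# Read 1 covers 7..22 and read 2 covers 37..45 on the same reference.
print(template_lengths(7, 22, 37, 45))
```

A single-segment template, or a pair whose mate information is missing, would carry 0 instead.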

11
Q

What is Samtools?

A

A set of utilities for manipulating alignments in SAM/BAM format
Used for
- sorting, merging, indexing, read retrieval

When opening a remote BAM file (FTP/HTTP), it checks the working directory for the index file and downloads the index if it is absent

12
Q

What 3 steps are needed to work effectively with SAM files?

A
  1. convert to binary (BAM) format, which is faster for the computer to access
  2. sort by reference position
  3. generate an index to speed up the lookup process
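
The three steps above can be sketched as samtools calls driven from Python. This assumes a reasonably recent samtools (1.3 or later, where `sort` takes `-o`) on the PATH; the file names and the `build_pipeline` / `run_pipeline` helpers are placeholders invented for this example.

```python
import shutil
import subprocess


def build_pipeline(sam_path: str, prefix: str) -> list:
    """Return the convert, sort, and index commands as argument lists."""
    unsorted_bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # 1. SAM -> BAM (binary)
        ["samtools", "view", "-b", "-o", unsorted_bam, sam_path],
        # 2. sort by leftmost reference coordinate
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],
        # 3. write the .bai index next to the sorted BAM
        ["samtools", "index", sorted_bam],
    ]


def run_pipeline(commands: list) -> None:
    """Run each command in order, stopping at the first failure."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_pipeline("reads.sam", "reads")
for cmd in commands:
    print(" ".join(cmd))
# run_pipeline(commands)  # uncomment where samtools and reads.sam exist
```

Sorting has to come before indexing because the index relies on alignments being ordered by position.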
13
Q

What does the samtools sort command do?

A
  • sorts alignments by leftmost coordinate
  • may create temporary files when the whole-genome alignment cannot fit into memory

14
Q

What command produces a .bai file?

A

samtools index

15
Q

What 3 files are needed to view the alignments in IGV?

A
  1. reference sequence
  2. BAM file with the alignments
  3. index file (.bai)