Bioinformatics 8: advanced searching and multiple alignment Flashcards

1
Q

Why might you want to filter query sequences?

A

Statistical models of alignment assume that all matching residues are equally significant

In practice they are not: low-complexity regions (e.g. poly-A tracts), short-period repeats and generic protein secondary-structure features (e.g. coiled coils) produce matches that carry little information

Filtering is essential in repeat-rich genomes, e.g. human (45% repetitive)

2
Q

How could you filter query sequences?

A

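Use a ‘masked’ query sequence, in which less meaningful regions are replaced with a placeholder character (e.g. N for nucleotides, X for amino acids)

Masked queries are produced by filtering/masking programs (e.g. DUST, SEG, RepeatMasker)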
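
The masking idea can be sketched in a few lines of Python. This is a minimal illustration, not how any real masking program works: it assumes a sliding-window Shannon entropy as the complexity measure, and the function names, window size and entropy threshold are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq: str, window: int = 8, min_entropy: float = 1.0,
                        mask_char: str = "N") -> str:
    """Replace every residue covered by a low-entropy window with mask_char."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


# Toy query: two mixed-composition flanks around a poly-A tract.
query = "ACGTTGCA" + "A" * 12 + "GCTAGCAT"
print(mask_low_complexity(query))
```

With these settings the poly-A tract and one flanking residue on each side are replaced by N, while the mixed-composition flanks are left alone.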

3
Q

Which range of % identity could reflect a real relationship or just noise (as suggested by our good friend Doolittle in 1981)?

A

18-25% (Twilight zone)
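
As a rough illustration, % identity can be computed from two already-aligned sequences and checked against this range. This is a sketch under stated assumptions: identity is counted over columns where neither sequence has a gap (other definitions use a different denominator), the two sequences are made up, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def in_twilight_zone(identity: float, low: float = 18.0,
                     high: float = 25.0) -> bool:
    """True if the identity falls in the 18-25% range quoted on the card."""
    return low <= identity <= high


# Made-up aligned pair: 2 identical residues over 9 ungapped columns.
pid = percent_identity("MKTAYIAKQR", "MSLVH-GEQW")
print(round(pid, 1), in_twilight_zone(pid))
```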

4
Q

Explain iterative searching (e.g. in BLAST) and how it identifies distantly related sequences

A

Protein A (query) and Protein C (database) may be distantly related, but the relationship is not detected by a direct BLAST search

A third protein, Protein B, is initially detected in the database using Protein A as the query

Protein C is then detected by using Protein B as a query: an iteration

-> PSI-BLAST (Position-Specific Iterated BLAST) is the most widely used
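
The A -> B -> C chain can be sketched with a toy search. This is only an illustration of re-querying with intermediate hits, as described on this card: the score is a naive fraction of identical positions rather than a BLAST score, the sequences are made up, and all names and the threshold are hypothetical. PSI-BLAST itself builds a profile from the hits instead of re-querying with each one.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; a toy stand-in for an alignment score."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def search(query: str, database: dict[str, str], threshold: float) -> set[str]:
    """Names of database sequences scoring at or above the threshold."""
    return {name for name, seq in database.items()
            if identity(query, seq) >= threshold}


def iterative_search(query: str, database: dict[str, str],
                     threshold: float = 0.4,
                     max_rounds: int = 10) -> list[set[str]]:
    """Re-query with each round's new hits; return the new hits per round."""
    found: set[str] = set()
    queries = [query]
    rounds: list[set[str]] = []
    for _ in range(max_rounds):
        new_hits: set[str] = set()
        for q in queries:
            new_hits |= search(q, database, threshold) - found
        if not new_hits:  # converged: nothing new was found this round
            break
        found |= new_hits
        rounds.append(new_hits)
        queries = [database[name] for name in sorted(new_hits)]
    return rounds


protein_a = "MKVLAGHWTE"  # query: shares its first half with protein_B only
database = {
    "protein_B": "MKVLAPQRSD",  # intermediate: half like A, half like C
    "protein_C": "CNFIYPQRSD",  # shares no positions with A directly
    "protein_D": "WWWWWWWWWW",  # unrelated
}
print(iterative_search(protein_a, database))
```

In this toy setup the first round finds only protein_B, and the second round, using protein_B as the query, picks up protein_C.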

5
Q

Problems with iterative searching and provided solutions?

A

1) The number of BLAST searches increases significantly with each iteration
2) Erroneous hits in the first iteration can bias all later results

Solutions

1) A sequence profile stores the sequences matched so far -> iterate until no new matches are found
2) “Triage” of the hits after the first iteration is required (dubious matches are discarded before continuing)

6
Q

What is PHI-BLAST?

A

Pattern-Hit Initiated BLAST

  • an extension to PSI-BLAST that uses a pattern (e.g. an insulin family motif) to start the search
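
The pattern-seeding step can be sketched as a regular-expression scan. This is a minimal sketch, not PHI-BLAST's implementation: it assumes a simplified PROSITE-style pattern syntax, the pattern shown is a hypothetical example (not the real insulin family motif), and the sequences and function names are made up.

```python
from __future__ import annotations

import re

# One pattern element: x, a residue, [allowed] or {forbidden}, optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                # x means any residue
            core = "."
        elif core.startswith("{"):     # {P} means any residue except P
            core = "[^" + core[1:-1] + "]"
        if low is not None:            # x(2) or x(2,4) repeat counts
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def pattern_seeds(pattern: str, database: dict[str, str]) -> dict[str, list[int]]:
    """Map each sequence containing the pattern to its 0-based match starts.

    Uses finditer, so only non-overlapping matches are reported.
    """
    regex = re.compile(prosite_to_regex(pattern))
    seeds = {}
    for name, seq in database.items():
        starts = [m.start() for m in regex.finditer(seq)]
        if starts:
            seeds[name] = starts
    return seeds


toy_pattern = "C-x(2)-C-[ST]-{P}"  # hypothetical motif, for illustration only
toy_db = {"seq1": "MKACAACSLVE", "seq2": "MKACAACSPVE", "seq3": "GGGGGGG"}
print(prosite_to_regex(toy_pattern))
print(pattern_seeds(toy_pattern, toy_db))
```

Only sequences containing the pattern (here seq1) would go forward as starting points for the search.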
7
Q

Applications of MSA (Multiple sequence alignment)?

A

Finding new related sequences

Genome sequence assembly

Phylogeny (highly conserved sequences can help establish evolutionary tree)

Protein structure prediction (conserved domains, motifs etc.)

8
Q

Purpose of progressive alignment? Overview?

A

MSA is very computationally demanding at scale, so progressive alignment is used as a faster yet still effective approach

Related sequences are clustered by similarity (e.g. by programs like Clustal), creating a ‘guide’ tree

-> sequences are then progressively aligned in the order given by this guide tree
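
The guide-tree step can be sketched as a simple clustering of pairwise distances. This is an illustration under stated assumptions, not Clustal's actual procedure: the distance is a naive mismatch fraction between equal-length toy sequences, average-linkage (UPGMA-style) clustering is chosen for brevity, and all names are hypothetical. The alignment step itself is not shown.

```python
from __future__ import annotations

from itertools import combinations


def distance(a: str, b: str) -> float:
    """Toy distance: fraction of mismatched positions (equal-length inputs)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs: dict[str, str]):
    """Average-linkage clustering; returns a nested tuple of sequence names."""

    def avg_dist(pair) -> float:
        (_, members_a), (_, members_b) = pair
        total = sum(distance(seqs[x], seqs[y])
                    for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]
    while len(clusters) > 1:
        # Merge the two closest clusters into one new subtree.
        best_a, best_b = min(combinations(clusters, 2), key=avg_dist)
        clusters.remove(best_a)
        clusters.remove(best_b)
        clusters.append(((best_a[0], best_b[0]), best_a[1] + best_b[1]))
    return clusters[0][0]


toy_seqs = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGTTTTT",
    "s4": "TTTTTTTT",
}
print(guide_tree(toy_seqs))
```

For these toy sequences the tree groups s1 with s2 and s3 with s4, so a progressive aligner would align each pair first and then align the two resulting groups to each other.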
