Sequence Alignment and Searching Flashcards

Week 3 Lecture 1

1
Q

What are homologues?

A

Evolutionarily related proteins (descended from a common ancestor)
Two types: orthologs and paralogs

2
Q

How do protein sequences evolve?

A
  • Substitutions due to single-base mutations
  • Insertions or deletions of residues - usually in the connecting loops (not the secondary structures)
  • Indels make it harder to compare sequences (need to line up the equivalent regions and put gaps where there are indels)
3
Q

The formula for % sequence identity

A

(no. of identical residues / no. of residues in the shortest protein) x 100
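A minimal sketch of this formula in Python, assuming the two sequences have already been aligned to equal length and that gaps are written as "-". The function name and the gap character are illustrative choices, not from the lecture.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences, normalised by the shortest protein."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (gaps never count).
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Protein lengths are measured without gap characters.
    shortest = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shortest == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * identical / shortest


# Example: 5 identical residues, shortest protein has 6 residues.
identity = percent_identity("ACDEFGH", "ACD-FGK")
```

Dividing by the shortest protein means a short sequence that matches perfectly inside a longer one still scores 100%.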

4
Q

How do you search sequence databases?

A
  • Do fast scans using approximate methods (BLAST or PSI-BLAST)
  • Align proteins carefully using a dynamic programming method
  • Scan against sequence profiles/HMMs in secondary databases
  • Align query sequences against family relatives
5
Q

Tuple size

A

The length of the runs of identical residues (tuples) that are matched: at least 3 in a row

6
Q

Window (path matrix)

A

The region of the path matrix that lies within a certain distance of the centre diagonal, marked by the two red bars on either side of the matrix.

7
Q

Score (path matrix)

A

The score of the path (watch yt video)

8
Q

Types of residue substitution matrices

A
  • Identity matrix
  • Physicochemical properties matrix
  • Evolutionary matrix
9
Q

Physicochemical properties matrix

A

Score residue pairs according to similarities in their physicochemical properties

9
Q

Identity matrix

A

Simplest scoring scheme - amino acids are either identical (1) or non-identical (0)
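A small sketch contrasting the identity scheme with a physicochemical one. The property groups and the 2/1/0 scores below are hypothetical and chosen only for illustration; real physicochemical matrices use finer-grained properties.

```python
# Hypothetical property groups (illustration only, not a published matrix).
PROPERTY_GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def identity_score(a: str, b: str) -> int:
    """Identity matrix: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0


def property_score(a: str, b: str) -> int:
    """Assumed scheme: 2 if identical, 1 if in the same property group, else 0."""
    if a == b:
        return 2
    for members in PROPERTY_GROUPS.values():
        if a in members and b in members:
            return 1
    return 0


# Leu/Ile are different residues but share the (assumed) hydrophobic group.
leu_ile = (identity_score("L", "I"), property_score("L", "I"))
```

The identity scheme treats every substitution as equally bad, whereas the property scheme gives partial credit to conservative substitutions such as Leu to Ile.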

10
Q

Evolutionary matrices

A

Score residue pairs according to how frequently the mutation is observed to occur in evolution

11
Q

Dayhoff matrix

A
  • Based on evolutionary relationships: derived by analysing the substitutions observed in closely related sequences (>85% identity)
  • The method measures evolutionary distance in point accepted mutations (PAMs)
12
Q

BLOSUM substitution matrices

A
  • The matrix is derived from analysing substitution patterns in more distant relatives (<85% sequence identity)
  • For clusters of related sequences derive multiple alignments without gaps
  • For short regions of related sequences use the alignments to calculate residue substitution frequencies
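
The last step, turning substitution frequencies into scores, can be sketched as below. This assumes the usual log-odds formulation in half-bit units (score = 2 x log2(observed / expected), rounded); the pair counts are toy numbers and the function name is illustrative.

```python
import math
from collections import defaultdict


def log_odds_scores(pair_counts):
    """Map unordered residue pairs, e.g. ("A", "S"), to rounded half-bit log-odds scores."""
    total = sum(pair_counts.values())
    # Observed frequency q of each unordered pair.
    q = defaultdict(float)
    for (a, b), n in pair_counts.items():
        q[tuple(sorted((a, b)))] += n / total
    # Background frequency p of each residue: a mixed pair contributes half to each member.
    p = defaultdict(float)
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        # Expected frequency if residues paired at random.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        if freq > 0:
            scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Toy counts for a two-letter alphabet (not real data).
toy = log_odds_scores({("A", "A"): 6, ("A", "B"): 2, ("B", "B"): 2})
```

A positive score means the pair is seen more often than chance predicts; a negative score means it is seen less often.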
13
Q

How do we know which matrix to use?

A
  • Matrices derived from observed substitution data (e.g. BLOSUM) are better than identity matrices or those based on physical properties
  • In database searching it may be best to use PAM120 or BLOSUM62
  • Various studies suggest that PAM250 gives the best result when aligning distant proteins using dynamic programming algorithms
14
Q

Steps of the Needleman & Wunsch dynamic programming algorithm

A
  • Score the path matrix
  • Accumulate scores in the path matrix
  • Trace the highest-scoring path in the path matrix
15
Q

How do we accumulate scores in the path matrix?

A
  • Start at the bottom right
  • Move right to left, accumulating scores
  • Move up to the next row
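
A hedged sketch of the three steps (score, accumulate, trace) with the matrix filled from the bottom right as described above. The match/mismatch scores and the linear gap penalty are assumptions made for illustration; a substitution matrix and a different gap scheme could be swapped in.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # f[i][j] = best accumulated score for aligning the suffixes a[i:] and b[j:].
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        f[i][m] = f[i + 1][m] + gap
    for j in range(m - 1, -1, -1):
        f[n][j] = f[n][j + 1] + gap
    # Accumulate: start at the bottom right, go right to left, then move up a row.
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            s = match if a[i] == b[j] else mismatch
            f[i][j] = max(s + f[i + 1][j + 1], gap + f[i + 1][j], gap + f[i][j + 1])
    # Trace the highest-scoring path from the top left back to the bottom right.
    i = j = 0
    out_a, out_b = [], []
    while i < n and j < m:
        s = match if a[i] == b[j] else mismatch
        if f[i][j] == s + f[i + 1][j + 1]:
            out_a.append(a[i]); out_b.append(b[j]); i += 1; j += 1
        elif f[i][j] == gap + f[i + 1][j]:
            out_a.append(a[i]); out_b.append("-"); i += 1
        else:
            out_a.append("-"); out_b.append(b[j]); j += 1
    # Any leftover residues are aligned against gaps.
    while i < n:
        out_a.append(a[i]); out_b.append("-"); i += 1
    while j < m:
        out_a.append("-"); out_b.append(b[j]); j += 1
    return f[0][0], "".join(out_a), "".join(out_b)


result = needleman_wunsch("ACGT", "AGT")
```

Each cell only needs its right, lower, and lower-right neighbours, which is why the fill can proceed row by row from the bottom right.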
16
Q

How does BLAST work?

A
  • A high-scoring segment pair (HSP) is found between two sequences
  • The sequences may be related if the HSP score > cutoff
    1. List the significant words in the query
    2. Compare the word list to the database and identify exact matches
    3. For each word match, extend the alignment using a PAM matrix and dynamic programming
  • BLAST searches for two non-overlapping segments on the same diagonal. They must be within a certain distance of each other before the extension is invoked. It can also allow gaps, so that the method joins segments on different diagonals.
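
A much-simplified sketch of the word-match and extension steps, not BLAST itself. It assumes exact words of length 3, match/mismatch scoring instead of a substitution matrix, and ungapped extension that stops once the score falls more than `drop` below its best value. All function and parameter names are hypothetical.

```python
def find_word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) for every exact k-letter word shared by both."""
    index = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def _extend(query, subject, qi, sj, step, match, mismatch, drop):
    """Walk along the diagonal in one direction; return (best gain, residues kept)."""
    gain = best = 0
    kept = n = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        gain += match if query[qi] == subject[sj] else mismatch
        n += 1
        if gain > best:
            best, kept = gain, n
        elif best - gain > drop:
            break  # score has dropped too far below the best seen
        qi += step
        sj += step
    return best, kept


def extend_hit(query, subject, i, j, k=3, match=1, mismatch=-1, drop=2):
    """Extend a word hit both ways; return (score, query_start, subject_start, length)."""
    right, r_len = _extend(query, subject, i + k, j + k, 1, match, mismatch, drop)
    left, l_len = _extend(query, subject, i - 1, j - 1, -1, match, mismatch, drop)
    return k * match + left + right, i - l_len, j - l_len, k + l_len + r_len


query, subject = "MKVLAAGIT", "XXMKVLSAGITYY"
hits = find_word_hits(query, subject)
segment = extend_hit(query, subject, *hits[0])
```

Here all four word hits lie on the same diagonal (subject position minus query position is 2), and extending the first one recovers the whole 9-residue segment despite the single A/S mismatch.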
17
Q

How do we assess the significance of a sequence match?

A
  • Length - we can get artificially high scores between short sequences
  • Composition - if sequences are rich in particular amino acid residues we can get high scores for unrelated proteins
  • To assess the significance of a match it is necessary to compare the score with that returned by random or unrelated sequences
  • If the database is small or when considering a pair-wise comparison, the sequences can be shuffled to generate random sequences
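
The shuffling idea can be sketched as a Z-score: compare the real score with the scores obtained after shuffling one sequence, which keeps its length and composition but destroys its order. The scoring function, the number of shuffles, and the fixed seed are illustrative assumptions.

```python
import random
import statistics


def ungapped_identities(a, b):
    """Toy scoring function (assumption): identical positions in an ungapped overlay."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_z_score(seq_a, seq_b, score_fn, n_shuffles=100, seed=0):
    """Z-score of the real score against scores for shuffled versions of seq_b."""
    rng = random.Random(seed)
    real = score_fn(seq_a, seq_b)
    residues = list(seq_b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same length and composition, random order
        background.append(score_fn(seq_a, "".join(residues)))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; z-score undefined")
    return (real - mean) / sd


z = shuffle_z_score("MKVLAAGITWYCDEFGHNPQRS", "MKVLAAGITWYCDEFGHNPQRS", ungapped_identities)
```

In practice `score_fn` would be a proper alignment score; a large Z-score suggests the match is unlikely to be explained by length and composition alone.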
18
Q

S (BLAST)

A

Score for the pairwise alignment

19
Q

E-value (BLAST)

A

Number of hits expected by chance with score S or higher, given the size of the database and the length of the alignment
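
For reference, a sketch of the Karlin-Altschul relationship that is commonly used to compute this value, E = K x m x n x exp(-lambda x S). Here m is taken as the query length and n as the database length, which is an assumption about how "size" and "length" are measured; lambda and K depend on the scoring scheme and must be supplied by the caller.

```python
import math


def expected_hits(score, query_len, db_len, lam, k):
    """E = K * m * n * exp(-lambda * S) for a raw alignment score S."""
    return k * query_len * db_len * math.exp(-lam * score)


# Illustrative parameter values only (not taken from any real scoring scheme).
e_low = expected_hits(score=20, query_len=100, db_len=10_000, lam=0.3, k=0.1)
e_high = expected_hits(score=60, query_len=100, db_len=10_000, lam=0.3, k=0.1)
```

E falls exponentially as S rises and grows in proportion to the database size, so the same score is less significant in a larger database.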

20
Q

How do you conduct a multiple sequence alignment?

A
  • Align the most closely related pairs using dynamic programming (DP), then gradually align these groups together, keeping the gaps that appeared in earlier alignments fixed
  • (or) Add sequences one at a time to a growing multiple alignment