Lecture 2 DA Flashcards

1
Q

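What is a BLAST search?

A

A search that compares a query sequence against a database of sequences using local sequence alignment (BLAST = Basic Local Alignment Search Tool).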

2
Q

What is checked for in a protein when a sequence is found (4)?

A

Whether the protein is:

  • known, with known function
  • known, but with unknown function
  • novel, but with known similar sequences of known function
  • novel, with no known similar sequences
3
Q

Does high sequence similarity necessarily imply homology? Why/why not?

A

No. Two sequences can be similar because of convergent evolution rather than shared ancestry.

4
Q

What is homology?

A

Descent from a common ancestor.

5
Q

What is orthology?

A

Homology arising from a speciation event.

6
Q

What is paralogy?

A

Homology arising from a gene duplication event.

7
Q

What is xenology?

A

Homology arising from a horizontal gene transfer event.

8
Q

What percentage of identical amino acids is required for two proteins to have similar folding patterns? What is the most likely relation between them, and what does this depend on?

A

More than 25%. The proteins are most likely homologous, though this depends heavily on the e-value.

9
Q

What percentage of identical amino acids between two proteins is the twilight zone for determining similar folding patterns?

A

18-25%.
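
The identity thresholds on these cards can be applied to an existing alignment. The following is a minimal sketch, not a standard tool: the function names are hypothetical, gaps are assumed to be written as "-", and percent identity is assumed to be counted over gap-free columns only (other conventions exist). The "below twilight zone" label assumes that anything under 18% falls outside the range quoted above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap (assumed convention).
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def identity_zone(pct: float) -> str:
    """Classify a percent identity using the thresholds from the cards."""
    if pct > 25:
        return "likely similar fold"
    if pct >= 18:
        return "twilight zone"
    return "below twilight zone"


print(identity_zone(percent_identity("ACDE", "ACDF")))
```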

10
Q

At what percentage of identical amino acids between two proteins can similar folding patterns no longer be determined from a sequence alignment?

A

Below 18%, i.e. below the twilight zone.

11
Q

What does the e-value represent?

A

A measure of how likely the result is to have arisen by chance: the number of hits with a score this good that would be expected at random in the search.

12
Q

What is more informative, amino acid/primary sequence alignment, or a nucleic acid alignment?

A

The protein alignment, as the amino acid alphabet has 20 characters versus 4 for nucleotides.

15
Q

What is rarer: insertion of a gap, or the exchange of a nucleotide? What consequence does this have on the scoring system?

A

Insertion of a gap. Gaps therefore carry a bigger penalty than mismatches in the scoring system.

16
Q

What is the difference between local and global alignment?

A

Global - aligns all of two sequences, finding global/overall similarity.
Local - looks for regions of similarity, rather than a complete alignment.
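
The difference shows up clearly when two sequences share only a short region. The sketch below is illustrative only: the function name and the scoring values (+1 match, -1 mismatch, -2 per gap position, linear gaps) are assumptions chosen for the example. It computes the best global score (Needleman-Wunsch style) or the best local score (Smith-Waterman style) from the same recurrence.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score of a and b; global by default, local if requested."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the borders accumulate gap costs.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + sub,
                       score[i - 1][j] + gap,
                       score[i][j - 1] + gap)
            if local:
                # Local: never go below 0, and remember the best cell anywhere.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# The two sequences share only the region "ACGT".
print(alignment_score("GGGGACGT", "ACGTCCCC", local=True))
print(alignment_score("GGGGACGT", "ACGTCCCC", local=False))
```

The local score rewards the shared region alone, while the global score also pays for the unrelated flanking residues.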

17
Q

What are the three steps of dynamic programming?

A

Initialisation
Scoring the matrix
Traceback
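
The three steps can be sketched for a global alignment as follows. This is a minimal illustration rather than the lecture's own implementation: the function name and the scoring values (+1 match, -1 mismatch, -2 per gap position, linear gaps) are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 1: initialisation. Corner cell is 0; borders hold cumulative gap costs.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2: scoring the matrix. Each cell takes the best of three moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Step 3: traceback from the bottom-right corner back to the origin.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```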

18
Q

What does initialisation in dynamic programming involve?

A

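The first row and column of the matrix are filled in: a 0 goes in the corner cell before the start of both sequences, and the remaining border cells hold cumulative gap penalties (global alignment) or zeros (local alignment).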

19
Q

Describe the affine gap penalty in dynamic programming.

A

Large penalty for opening a gap, smaller penalty for extending one.
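
A small sketch of the idea, with assumed values: the function name, the defaults (5 to open, 2 to extend), and the convention that the first gap position pays only the opening cost are all illustrative choices, and other conventions exist.

```python
def affine_gap_penalty(length, gap_open=5, gap_extend=2):
    """Cost of one gap of the given length under an affine scheme."""
    if length <= 0:
        return 0
    # The first position pays the opening cost; each further position pays
    # the smaller extension cost (assumed convention).
    return gap_open + gap_extend * (length - 1)


one_long_gap = affine_gap_penalty(3)          # one gap of length 3
three_short_gaps = 3 * affine_gap_penalty(1)  # three separate gaps of length 1
print(one_long_gap, three_short_gaps)
```

One long gap is cheaper than several short gaps of the same total length, which reflects a single insertion or deletion event being more plausible than many independent ones.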

20
Q

What are substitution matrices developed from?

A

From direct observation of homologous sequences.

21
Q

What is a PAM? How regularly do they occur?

A

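Point accepted mutation: a substitution of one amino acid for another that has been accepted by natural selection.

One PAM unit (PAM1) corresponds to 1 accepted mutation per 100 amino acids.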

22
Q

What is a BLOSUM62 matrix derived from? What is its percentage identity?

A

Derived from BLOCKS protein families.

Derived from proteins that have no more than 62% identity.

23
Q

What is the use of a BLOSUM62 matrix?

A

It is the default substitution matrix for protein BLAST searches.

24
Q

Describe BLASTN scoring.

A

+1 match, -1 mismatch

-5 open gap, -2 extend gap
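
To see how these values combine, here is a minimal sketch that scores an existing gapped alignment with the numbers from this card. The function name is hypothetical, gaps are assumed to be written as "-", and the first position of a gap is assumed to pay only the opening cost. As a simplification, a gap in one sequence followed immediately by a gap in the other is treated as a single gap.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1,
                    gap_open=-5, gap_extend=-2):
    """Score two equal-length aligned strings with affine gap costs."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    in_gap = False
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            # The first gap column opens the gap; later columns extend it.
            total += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            total += match if a == b else mismatch
            in_gap = False
    return total


# 6 matches, then one gap of length 2: 6 - 5 - 2 = -1
print(score_alignment("ACGTACGT", "AC--ACGT"))
```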

25
Q

What is the formula for the e-value?

A

E = M / 2^S, where M = mn, m is the length of the query, n is the total number of residues in the database, and S is the score.
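
The formula is easy to check numerically. In this sketch the function and parameter names are hypothetical, and the score is assumed to be a bit score, which is what the 2^S form implies.

```python
def e_value(query_length, database_length, bit_score):
    """Expected number of chance hits: E = m * n / 2**S."""
    return query_length * database_length / 2 ** bit_score


# A 100-residue query against a database of one million residues.
for s in (20, 30, 40):
    print(s, e_value(100, 1_000_000, s))
```

Each extra bit of score halves the e-value, while doubling the query length or the database size doubles it.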

26
Q

What does the e-value formula imply about the e-value?

A

It is the expected frequency of score s in a database search, i.e. how often this score would be obtained by chance.

27
Q

What does it mean if the e-value is 1?

A

One false match with this score is expected by chance in the search.

28
Q

What e-value is preferred? What is an interesting e-value, and what do e-values depend on?

A

The closer to 0, the better.
An e-value of less than 1e-4 is interesting.
E-values are search dependent: they change with the query length and the database size.