Lecture 2 DA Flashcards Preview

Flashcards in Lecture 2 DA Deck (28):
1

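What is a BLAST search?

A database search that compares a query sequence against stored sequences using local sequence alignment (BLAST: Basic Local Alignment Search Tool).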

2

When a protein sequence is found, what four possibilities are checked for?

Whether the protein is:
- known, with known function
- known, but with unknown function
- novel, but with known similar sequences of known function
- novel, with no known similar sequences

3

Does high sequence similarity necessarily imply homology? Why/why not?

No, because two sequences can be similar as a result of convergent evolution rather than common descent.

4

What is homology?

Descent from a common ancestor.

5

What is orthology?

Descent from a speciation event.

6

What is paralogy?

Descent from a gene duplication event.

7

What is xenology?

Descent from a horizontal gene transfer event.

8

What percentage of identical amino acids is required for two proteins to have similar folding patterns? What is the most likely relation between them, and what does this depend on?

More than 25%. The proteins are then most likely homologous, though this depends heavily on the e-value.

9

What range of percentage identity between two proteins is the twilight zone for determining similar folding patterns?

18-25%.
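
As a rough illustration of the thresholds on these cards, the sketch below computes percent identity for a pairwise alignment and labels the result. The function names are hypothetical, and dividing by the number of columns without gaps is an assumption made here; other definitions of percent identity exist and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def identity_zone(pct: float) -> str:
    """Label a percent identity using the 18% and 25% thresholds from the cards."""
    if pct > 25.0:
        return "similar fold likely"
    if pct >= 18.0:
        return "twilight zone"
    return "below twilight zone"


if __name__ == "__main__":
    pct = percent_identity("ACDE-F", "ACDQGF")
    print(pct, identity_zone(pct))
```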

10

Below what percentage identity between two proteins can similar folding patterns not be determined from a sequence alignment?

Below 18%.

11

What does the e-value represent?

A measure of how likely the result is to have arisen by chance: the number of hits with this score expected at random.

12

What is more informative, amino acid/primary sequence alignment, or a nucleic acid alignment?

The protein alignment, because it uses an alphabet of 20 characters rather than 4.

15

Which is rarer: the insertion of a gap, or the substitution of a nucleotide? What consequence does this have for the scoring system?

The insertion of a gap. Gaps are therefore penalised more heavily than substitutions.

16

What is the difference between local and global alignment?

Global - aligns all of two sequences, finding global/overall similarity.
Local - looks for regions of similarity, rather than a complete alignment.
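
The difference can be seen by scoring the same pair of sequences both ways. The following is a minimal sketch, assuming a simple linear gap penalty and illustrative scores (+1 match, -1 mismatch, -2 per gap position); the function name and parameters are hypothetical. The global score is read from the final cell of the matrix, while the local score is the best cell anywhere, with negative running scores reset to 0.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global (local=False) or local (local=True) alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are charged, so the borders accumulate penalties.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local: an alignment may restart anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    a, b = "GGGGACGTGGGG", "TTACGTTT"
    print("global:", alignment_score(a, b, local=False))
    print("local: ", alignment_score(a, b, local=True))
```

Here the two sequences share only the short region ACGT, so the local score rewards that region while the global score is dragged down by the unrelated flanks.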

17

What are the three steps of dynamic programming?

Initialisation
Scoring the matrix
Traceback
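
The three steps can be sketched for a global alignment as follows. This is a textbook-style Needleman-Wunsch sketch rather than the lecture's own code: the function name, the linear gap penalty, and the scores (+1 match, -1 mismatch, -2 gap) are assumptions, and the initialisation shown (0 in the corner cell, cumulative gap penalties along the borders) is only one common variant.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 1: initialisation. Corner cell is 0; borders hold cumulative gap costs.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2: scoring the matrix. Each cell takes the best of three moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Step 3: traceback from the bottom-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("GATTACA", "GATCA")
    print(s)
    print(x)
    print(y)
```

When several alignments tie for the best score, the traceback returns only one of them; which one depends on the order in which the three moves are tested.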

18

What does initialisation in dynamic programming involve?

A 0 is placed at the end of each sequence.

19

Describe the affine gap penalty in dynamic programming.

Large penalty for opening a gap, smaller penalty for extending one.
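
A small sketch of the idea, compared with a linear penalty. The function names and the default values (-5 to open, -2 to extend) are illustrative assumptions. The convention assumed here charges the opening penalty once plus the extension penalty for every gap position; another common convention charges the extension penalty only from the second position onwards.

```python
def affine_gap_penalty(length: int, gap_open: int = -5, gap_extend: int = -2) -> int:
    """Penalty for one gap of the given length: open once, extend per position."""
    if length <= 0:
        return 0
    return gap_open + length * gap_extend


def linear_gap_penalty(length: int, per_position: int = -5) -> int:
    """Penalty when every gap position costs the same."""
    return max(length, 0) * per_position


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, affine_gap_penalty(k), linear_gap_penalty(k))
```

Under the affine scheme one long gap costs less than several short gaps covering the same number of positions, which reflects the idea that a single insertion or deletion event can span many residues.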

20

What are substitution matrices developed from?

From direct observation of homologous sequences.

21

What is a PAM? How frequently do they occur?

A point accepted mutation.
One PAM unit corresponds to 1 accepted mutation per 100 amino acids.

22

What is a BLOSUM62 matrix derived from? What is its percentage identity?

Derived from the BLOCKS database of aligned protein families.
Built from sequences that share no more than 62% identity.

23

What is the use of a BLOSUM62 matrix?

It is the default substitution matrix for protein BLAST searches.

24

Describe BLASTN scoring.

+1 match, -1 mismatch
-5 open gap, -2 extend gap
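
To show how these values combine, the sketch below scores an already-aligned pair of nucleotide sequences using the numbers on this card. The function name is hypothetical, and the gap convention (the opening penalty once per gap, plus the extension penalty for every gap position) is an assumption; real BLASTN defaults may also differ from the values on the card.

```python
def score_alignment(aligned_a: str, aligned_b: str, match: int = 1,
                    mismatch: int = -1, gap_open: int = -5,
                    gap_extend: int = -2) -> int:
    """Score a gapped pairwise alignment written with '-' for gap positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    prev_gap = None  # which sequence held a gap in the previous column, if any
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot have gaps in both sequences")
        if x == "-" or y == "-":
            gap_in = "a" if x == "-" else "b"
            if gap_in != prev_gap:
                total += gap_open      # a new gap starts in this column
            total += gap_extend        # every gap position pays the extension
            prev_gap = gap_in
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total


if __name__ == "__main__":
    print(score_alignment("ACGTACGT", "ACG--CGT"))
```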

25

What is the formula for the e-value?

e = M / 2^s'
M = mn, where m is the length of the query and n is the total number of residues in the database.
s' is the (bit) score.
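
The formula can be evaluated directly. This is a minimal sketch with a hypothetical function name and made-up input values, intended only to show how the e-value halves with each extra unit of score and scales with the size of the search space.

```python
def e_value(score: float, query_len: int, db_len: int) -> float:
    """e = M / 2^s', with M = m * n (query length times database residues)."""
    search_space = query_len * db_len
    return search_space / (2.0 ** score)


if __name__ == "__main__":
    # Same score and query, two database sizes (illustrative values only).
    print(e_value(40, 250, 1_000_000))
    print(e_value(40, 250, 2_000_000))
    # One extra unit of score halves the e-value.
    print(e_value(41, 250, 1_000_000))
```

Because M grows with the database, the same score yields a larger e-value in a larger database.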

26

What does the e-value formula imply about the e-value?

It is the expected frequency of score s' in a database search, i.e. how often this score would be obtained by chance.

27

What does it mean if the e-value is 1?

One match with this score is expected to occur purely by chance in the search, i.e. one false positive.

28

What e-value is preferred? What is an interesting e-value, and what do e-values depend on?

The closer to 0, the better.
An e-value below 1e-4 is interesting.
E-values are search dependent: they vary with the query length and the database size.