L3 Flashcards

1
Q

Retrieval of biological sequences in databases is based on what?

A

Similarity

2
Q

Searching biological sequence databases involves?

A

Submitting a query sequence and performing a pairwise comparison of it against every individual sequence in a database

3
Q

Requirements for implementing algorithms for sequence database searching include

A
  • sensitivity
  • selectivity
  • speed
4
Q

Sensitivity

A

Refers to the ability to find as many correct hits as possible. The correct hits are considered true positives

5
Q

Selectivity

A

Also called specificity; refers to the ability to exclude incorrect hits. These incorrect hits are considered “false positives.”
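As a rough illustration of the two measures, here is a minimal sketch using the standard statistical definitions of sensitivity and specificity. The function names and the hit counts are hypothetical and are not taken from the cards.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly related sequences that the search reports."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unrelated sequences that the search correctly excludes."""
    return true_neg / (true_neg + false_pos)


# Hypothetical search: 90 of 100 true homologs are found, and
# 50 of 10,000 unrelated sequences are wrongly reported as hits.
print(sensitivity(true_pos=90, false_neg=10))
print(specificity(true_neg=9950, false_pos=50))
```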

6
Q

Speed

A

The time it takes to get results from database searches

7
Q

An increase in sensitivity leads to

A

a decrease in selectivity

8
Q

an increase in speed leads to

A

a decrease in sensitivity and selectivity

9
Q

What are the types of algorithms in database searching

A
  • exhaustive
  • heuristic
10
Q

Exhaustive algorithm

A

makes use of a rigorous algorithm to find the best or exact solution for a particular problem by examining all mathematical combinations

11
Q

Heuristic algorithm

A

a computational strategy to find the near optimal solution

12
Q

How do heuristic algorithms take shortcuts

A

By reducing the search space according to some criteria

13
Q

what are the methods used to infer sequence similarity

A

Global and Local alignment

14
Q

Local alignment

A

Finds domains and short regions of similarity between a pair of sequences, e.g.
- looking for domains within proteins
- looking for regions of genomic DNA that contain introns
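A minimal sketch of local alignment scoring, assuming the Smith-Waterman dynamic-programming recurrence (an exhaustive method that the cards do not name). The match, mismatch, and gap values are illustrative assumptions, not values from the cards.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere,
            # which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because the score never drops below zero, a short shared region such as `ACG` inside two otherwise unrelated sequences still produces a positive score.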

15
Q

Global alignment

A

Finds the optimal alignment over the entire length of the two sequences under comparison, e.g.
- aligning genes whose sequences are of comparable length
- cases where the entire gene is homologous
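A minimal sketch of global alignment scoring, assuming the Needleman-Wunsch recurrence (not named in the cards) with a linear gap penalty. The scoring values are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal score for aligning a and b end to end (linear gap penalty)."""
    # Row 0: aligning a prefix of b against nothing costs one gap per letter.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

Unlike a local alignment, every residue of both sequences must be accounted for, so length differences are always paid for in gap penalties.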

16
Q

what does BLAST stand for

A

Basic Local Alignment Search Tool

17
Q

How does BLAST work

A

It uses heuristics to align a query sequence with all sequences in a database. Its objective is to find high-scoring segments among related sequences.

18
Q

How does BLAST perform sequence alignment

A
  1. Read in the query sequence.
  2. Create a list of words from the query sequence (seeding): 3 residues for protein, 11 for DNA sequences.
  3. Search a sequence database for occurrences of these words.
  4. Score the word matches with a given substitution matrix.
  5. Perform pairwise alignment.
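The seeding and word-search steps above can be sketched as follows. This is a simplified illustration, not BLAST's actual implementation: it only finds exact word matches, and the function names `query_words` and `find_seed_hits` are hypothetical.

```python
def query_words(query: str, word_size: int) -> list[tuple[int, str]]:
    """Return (offset, word) for every overlapping word in the query."""
    return [(i, query[i:i + word_size])
            for i in range(len(query) - word_size + 1)]


def find_seed_hits(query: str, subject: str,
                   word_size: int) -> list[tuple[int, int]]:
    """Exact word matches as (query_offset, subject_offset) pairs."""
    # Index the query words once so each subject word is a dictionary lookup.
    index: dict[str, list[int]] = {}
    for offset, word in query_words(query, word_size):
        index.setdefault(word, []).append(offset)
    hits = []
    for s_off in range(len(subject) - word_size + 1):
        for q_off in index.get(subject[s_off:s_off + word_size], []):
            hits.append((q_off, s_off))
    return hits


# Word size 3 for a protein query, as on the card.
print(query_words("MKVLA", 3))
print(find_seed_hits("MKVLA", "AAKVLAA", 3))
```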
19
Q

The resulting contiguous aligned segment pair without gaps is called what

A

High-scoring segment pair (HSP)
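A minimal sketch of how a word match might be extended into an ungapped segment pair. The drop-off rule (`x_drop`), the scoring values, and the function name `extend_seed` are assumptions made for illustration and are not taken from the cards.

```python
def extend_seed(query: str, subject: str, q_off: int, s_off: int,
                word_size: int, match: int = 1, mismatch: int = -1,
                x_drop: int = 3) -> tuple[int, int, int]:
    """Extend an exact seed without gaps.

    Returns (best_score, q_start, q_end), with q_end exclusive.
    """
    score = best = word_size * match
    q_start, q_end = q_off, q_off + word_size

    # Extend to the right until the score falls more than x_drop below the best.
    i, j = q_end, s_off + word_size
    while i < len(query) and j < len(subject) and best - score <= x_drop:
        score += match if query[i] == subject[j] else mismatch
        i += 1
        j += 1
        if score > best:
            best, q_end = score, i

    # Extend to the left, starting again from the best score so far.
    score = best
    i, j = q_off - 1, s_off - 1
    while i >= 0 and j >= 0 and best - score <= x_drop:
        score += match if query[i] == subject[j] else mismatch
        if score > best:
            best, q_start = score, i
        i -= 1
        j -= 1

    return best, q_start, q_end
```

The returned segment is contiguous and contains no gaps, which matches the definition on the card.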

20
Q

Database search programs such as BLAST use

A

scoring/substitution matrices

21
Q

Scoring matrices are what

A

empirical weighting schemes

22
Q

Possible identities and substitutions are assigned a score based on the?

A

observed frequencies of such occurrences in alignments of related proteins
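One common way to turn observed frequencies into a score is a log-odds ratio of the observed pair frequency to the frequency expected by chance. The sketch below assumes that form in a simplified version; the frequencies and the scaling factor are hypothetical illustration values, not entries from a real matrix.

```python
import math


def log_odds_score(observed_pair_freq: float, freq_a: float,
                   freq_b: float, scale: float = 2.0) -> int:
    """Rounded log-odds score for aligning residue a with residue b."""
    # Chance expectation if the two residues occurred independently.
    expected = freq_a * freq_b
    return round(scale * math.log2(observed_pair_freq / expected))


# Pair seen twice as often as chance predicts: positive score.
print(log_odds_score(0.02, 0.1, 0.1))
# Pair seen half as often as chance predicts: negative score.
print(log_odds_score(0.005, 0.1, 0.1))
```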

23
Q

What does BLASTN do

A

queries nucleotide sequences with a nucleotide sequence database

24
Q

How does BLASTP work

A

Uses protein sequences as queries to search against a protein sequence database. Default word size is 3

25
Q

How does BLASTX work

A

Uses nucleotide sequences, translated in all six reading frames, as queries to search a protein sequence database.
26
Q

How does TBLASTN work

A

Queries protein sequences against a nucleotide sequence database whose DNA sequences are translated in all six reading frames.
27
Q

How does TBLASTX work

A

Translates nucleotide query sequences in all six reading frames and searches them against a nucleotide sequence database whose sequences are also translated.
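The translated searches rely on converting a nucleotide sequence into protein in all six reading frames. A minimal sketch, assuming the standard genetic code and DNA input without ambiguity codes (unknown codons become `X`); the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame (0, 1, or 2); '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand, frame) for strand in strands for frame in range(3)]


print(six_frame_translations("ATGGCC"))
```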
28
Q

What is BLAST used for?

A
  • to detect similarity between sequences of interest
  • to determine whether there are other plausible alignments between query and target sequences
29
Q

What is the BLAST E-value

A

It indicates the likelihood that a given sequence match is purely due to chance. The lower the E-value, the less likely the database match is a result of random chance.
30
Q

BLAST determines the significance of HSPs using the Karlin-Altschul equation

A

$E = k m N e^{-\lambda S}$
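A minimal sketch of evaluating the Karlin-Altschul equation, where the score in the exponent is the alignment score of the HSP. The `k` and `lam` values in the usage example are illustrative placeholders; real values depend on the scoring system, and BLAST's own calculation includes length corrections that are not modelled here.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 k: float, lam: float) -> float:
    """E = k * m * N * exp(-lambda * S) for an HSP with score S."""
    return k * query_len * db_len * math.exp(-lam * score)


# Hypothetical search: 250-letter query against a 1,000,000-letter database.
for s in (30, 60, 120):
    print(s, expect_value(s, query_len=250, db_len=1_000_000, k=0.13, lam=0.32))
```

The loop shows the exponential dependence on score: each increase in score shrinks E sharply, while E grows only linearly with query and database size.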
31
Q

E stands for

A

the expectation value
32
Q

k and lambda are what?

A

Karlin-Altschul constants
33
Q

m stands for

A

the number of letters (amino acids/nucleotides) in the query
34
Q

N is

A

the total number of letters (amino acids/nucleotides) in the database
35
Q

If E < 1e-50 (or 1 × 10^-50),

A

there is extremely high confidence that the database match is a result of a homologous relationship.
36
Q

If E is between 1e-50 and 0.01,

A

the match can be considered a result of homology.
37
Q

If E is between 0.01 and 10,

A

the match is considered not significant, but may hint at a tentative remote homology relationship.
38
Q

If E > 10,

A

the sequences under consideration are either unrelated or related by extremely distant relationships that fall below the limit of detection with the current method.
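The four E-value ranges on the cards above can be collected into one small helper. This is a sketch only: the cards do not say whether the boundary values themselves are inclusive, so the choices at exactly 1e-50, 0.01, and 10 are assumptions, as is the function name.

```python
def interpret_e_value(e_value: float) -> str:
    """Map a BLAST E-value to the interpretation given on the cards."""
    if e_value < 1e-50:
        return "extremely high confidence of homology"
    if e_value < 0.01:
        return "homology"
    if e_value <= 10:
        return "not significant; possible remote homology"
    return "unrelated or too distant to detect"


for e in (1e-80, 1e-5, 0.5, 50):
    print(e, "->", interpret_e_value(e))
```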