L3 Flashcards
Retrieval of biological sequences in databases is based on what?
Similarity
Searching biological sequence databases involves?
Submitting a query sequence and performing a pairwise comparison of the query with every individual sequence in a database
Requirements for implementing algorithms for sequence database searching include
- sensitivity
- selectivity
- speed
Sensitivity
Refers to the ability to find as many correct hits as possible. The correct hits are considered “true positives.”
Selectivity
Also called specificity; refers to the ability to exclude incorrect hits. These incorrect hits are considered “false positives.”
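As a rough illustration of the two definitions above, the sketch below computes sensitivity and specificity from hit counts. It assumes the standard confusion-matrix formulas, and the counts (`tp`, `fn`, `fp`, `tn`) are made-up numbers for a hypothetical search, not results from any real database.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of the correct hits that the search actually found."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of the incorrect sequences that the search excluded."""
    return true_neg / (true_neg + false_pos)


# Hypothetical search: 100 true homologs and 900 unrelated sequences.
tp, fn = 80, 20    # homologs found / homologs missed
fp, tn = 45, 855   # unrelated sequences reported / correctly excluded

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.80
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.95
```

Loosening the reporting threshold in such a search would usually raise `tp` (more sensitivity) but also raise `fp` (less selectivity).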
Speed
The time it takes to get results from database searches
An increase in sensitivity leads to
a decrease in selectivity
An increase in speed leads to
a decrease in sensitivity and selectivity
What are the types of algorithms in database searching
- exhaustive
- heuristic
Exhaustive algorithm
makes use of a rigorous algorithm to find the best or exact solution for a particular problem by examining all mathematical combinations
Heuristic algorithm
A computational strategy to find a near-optimal solution
How do heuristic algorithms take shortcuts
By reducing the search space according to some criteria
What are the methods used to infer sequence similarity?
Global and Local alignment
Local alignment
Finds domains and short regions of similarity between a pair of sequences, e.g.
- looking for domains within proteins
- looking for regions of genomic DNA that contain introns
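One common way to compute a local alignment is the Smith-Waterman dynamic programming algorithm, which the card itself does not name. The sketch below is a minimal version that returns only the best local score; the function name and the match, mismatch and gap values are illustrative assumptions.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere, which is
            # what makes the result local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared region ACGTACG is found even though the flanks differ.
print(local_alignment_score("TTTTACGTACGTTTT", "GGGACGTACGGGG"))  # 14
```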
Global alignment
Finds the optimal alignment over the entire length of the two sequences under comparison, e.g.
- when aligning genes whose sequences are of comparable length
- when the entire gene is homologous
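Global alignment is commonly computed with the Needleman-Wunsch dynamic programming algorithm, which the card does not name. The sketch below is a minimal score-only version; the function name and the scoring values are illustrative assumptions. Unlike a local alignment, every residue of both sequences must be accounted for, so end gaps are penalized.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal end-to-end alignment score (Needleman-Wunsch, linear gaps)."""
    # First row: aligning a prefix of b against nothing costs one gap each.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATACA"))   # 4 (six matches, one gap)
```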
What does BLAST stand for?
Basic Local Alignment Search Tool
How does BLAST work
It uses heuristics to align a query sequence with all sequences in a database. Its objective is to find high-scoring segments among related sequences.
How does BLAST perform sequence alignment
- reads in query sequence
- Create a list of words from the query sequence (seeding): 3 residues for protein, 11 for DNA sequences
- Search a sequence database for the occurrence of these words.
- matching of the words is scored by a given substitution matrix
- Extend the matching words in both directions into a pairwise alignment
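The steps above can be sketched as a toy seed-and-extend search. This is not the actual BLAST implementation: it assumes exact word matches, a simple match/mismatch score in place of a substitution matrix, and a hypothetical `drop_off` cutoff for stopping the extension. All function names are made up for illustration, and a word size of 4 is used only to keep the example short.

```python
from collections import defaultdict


def build_word_index(query: str, word_size: int) -> dict:
    """Seeding: map each overlapping query word to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    return index


def find_seeds(index: dict, subject: str, word_size: int) -> list:
    """Scan one database sequence for exact occurrences of query words."""
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))  # (query position, subject position)
    return seeds


def extend_seed(query, subject, i, j, word_size,
                match=1, mismatch=-3, drop_off=5):
    """Ungapped extension of a seed; returns (score, query_start, query_end)."""
    score = best = word_size * match
    # Extend to the right until the score falls too far below the best seen.
    qi, sj, best_end = i + word_size, j + word_size, i + word_size
    while qi < len(query) and sj < len(subject) and best - score <= drop_off:
        score += match if query[qi] == subject[sj] else mismatch
        qi += 1
        sj += 1
        if score > best:
            best, best_end = score, qi
    # Extend to the left, starting again from the best score so far.
    score = best
    qi, sj, best_start = i - 1, j - 1, i
    while qi >= 0 and sj >= 0 and best - score <= drop_off:
        score += match if query[qi] == subject[sj] else mismatch
        if score > best:
            best, best_start = score, qi
        qi -= 1
        sj -= 1
    return best, best_start, best_end


query, subject, w = "GATTACAGGC", "TTGATTACAGGCTT", 4
seeds = find_seeds(build_word_index(query, w), subject, w)
print(seeds[0])                                  # (0, 2)
print(extend_seed(query, subject, *seeds[0], w)) # (10, 0, 10)
```

Here every seed lies on the same diagonal, so each one extends to the same ungapped segment covering the whole query.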
The resulting contiguous aligned segment pair without gaps is called what
high-scoring segment pair
Database search programs such as BLAST use
scoring/substitution matrices
Scoring matrices are what
empirical weighting schemes
Possible identities and substitutions are assigned a score based on the?
observed frequencies of such occurrences in alignments of related proteins
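Matrices such as BLOSUM typically turn these observed frequencies into log-odds scores: how much more often a pair is seen in related alignments than expected by chance. The sketch below assumes that log-odds form with half-bit scaling; the function name and the frequencies are hypothetical, not values from any published matrix.

```python
import math


def log_odds_score(observed: float, expected: float, scale: float = 2.0) -> int:
    """Integer score = scale * log2(observed / expected).

    observed: frequency of the residue pair in alignments of related proteins
    expected: frequency of the pair expected by chance
    scale:    2.0 gives half-bit units (an assumption of this sketch)
    """
    return round(scale * math.log2(observed / expected))


# Hypothetical pair seen 8 times more often than chance: positive score.
print(log_odds_score(0.02, 0.0025))   # 6
# Hypothetical pair seen 4 times less often than chance: negative score.
print(log_odds_score(0.001, 0.004))   # -4
```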
What does BLASTN do
Uses nucleotide sequences as queries to search against a nucleotide sequence database
How does BLASTP work
Uses protein sequences as queries to search against a protein sequence database. Default word size is 3