Bioinformatics and AI Flashcards
What the heck is Bioinformatics?
- An emerging interdisciplinary field in research & applied sciences
- Deals with the computational management and analysis of biological information: genes, genomes, proteins, cells, ecological systems, medical information, robots, artificial intelligence…
What are the 2 cores of Bioinformatics?
- Coding
- Algorithm
Bioinformatics is NOT simply using existing software to analyze biological data. You need to be able to create your own ____ and ____.
Code and algorithms
What are 5 Bioinformatics Applications?
- Sequence analysis
  – Geneticists/molecular biologists analyze genome sequence information to understand disease processes
- Molecular modeling
  – Crystallographers/biochemists design drugs using computer-aided tools
- Phylogeny/evolution
  – Geneticists obtain information about the evolution of organisms by looking for similarities in gene sequences
- Ecology and population studies
  – Bioinformatics is used to handle large amounts of data obtained in population studies
- Medical informatics
  – Personalized medicine
Explain KEGG
- Kyoto Encyclopedia of Genes and Genomes: a protein pathway database
- Search the database for metabolic and regulatory pathways
- Compute KEGG: Generate possible reaction pathways between two compounds
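To make "generate possible reaction pathways between two compounds" concrete, here is a minimal sketch that enumerates reaction chains with a breadth-first search. This is NOT how KEGG itself computes pathways: the network `REACTIONS`, the compound names "A" to "F", the function `reaction_paths` and its `max_steps` limit are all hypothetical, chosen only to show the idea of searching a reaction graph.

```python
from collections import deque

# Hypothetical toy reaction network (not real KEGG data): each compound maps
# to the compounds it can be converted into by a single reaction.
REACTIONS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def reaction_paths(network, start, goal, max_steps=5):
    """List every cycle-free chain of reactions from start to goal,
    shortest chains first (breadth-first search)."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            found.append(path)
            continue
        if len(path) - 1 >= max_steps:  # path already uses max_steps reactions
            continue
        for product in network.get(path[-1], []):
            if product not in path:  # skip compounds already on this path
                queue.append(path + [product])
    return found


for p in reaction_paths(REACTIONS, "A", "F"):
    print(" -> ".join(p))
```

Breadth-first order means the shortest routes between the two compounds are listed first, which is usually what you want to inspect.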
Define Phylogenetic Tree
A branching diagram of the evolutionary relationships among organisms or genes; analysis of sequences allows these relationships to be determined
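One simple way sequence analysis feeds a phylogenetic tree is through pairwise distances. The sketch below computes the p-distance (fraction of differing positions) between already-aligned sequences; the smaller the distance, the more closely related the pair. The sequences and species names in `SEQS` are hypothetical, and a distance-based tree method (for example UPGMA) would then join the closest pair first.

```python
from itertools import combinations

# Hypothetical, already-aligned toy sequences (same length, no gaps).
SEQS = {
    "species1": "AGTACGTATC",
    "species2": "AGTACGTTTC",
    "species3": "AGGACCTTAC",
}


def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


for (n1, s1), (n2, s2) in combinations(SEQS.items(), 2):
    print(f"{n1} vs {n2}: {p_distance(s1, s2):.2f}")
```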
Explain the Task of Sequence Alignment
- The draft human genome is available
- Automated gene finding is possible
- Gene: AGTACGTATCGTATAGCGTAA
- One approach: Is there a similar gene in another species?
1. Align sequences with known genes
2. Find the gene with the “best” match
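The two steps above can be sketched as: score the query against each known gene, then keep the top scorer. This minimal example uses Needleman-Wunsch global alignment for simplicity (a local alignment score would be another reasonable choice). The scoring values (+1 match, -1 mismatch, -2 gap) and the reference genes `geneX` and `geneY` are assumptions for illustration; only the query comes from the card.

```python
# Assumed scoring scheme for illustration: +1 match, -1 mismatch, -2 gap.
MATCH, MISMATCH, GAP = 1, -1, -2


def global_alignment_score(a, b):
    """Needleman-Wunsch dynamic programming: best global alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * GAP  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * GAP  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
    return score[-1][-1]


def best_match(query, known_genes):
    """Step 1: align the query with every known gene. Step 2: keep the best score."""
    scores = {name: global_alignment_score(query, seq) for name, seq in known_genes.items()}
    best = max(scores, key=scores.get)
    return best, scores


query = "AGTACGTATCGTATAGCGTAA"
known = {  # hypothetical reference genes from other species
    "geneX": "AGTACGTATCGTATAGCGTAT",
    "geneY": "TTGACCGATCGGATAGCTTAA",
}
print(best_match(query, known))
```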
Define Heuristic
A rule-of-thumb approach that uses general knowledge gained by experience to reach a good answer quickly, without guaranteeing the best possible one
What is BLAST?
- Basic Local Alignment Search Tool
- BLAST is by far the most frequently used database search program. It finds high-scoring local matches (segment pairs) between a query sequence and the sequences in a database.
- Example of a heuristic method for database search: much faster than an exhaustive search, but not guaranteed to find the optimal alignment
BLAST Key Terminologies: Word
a substring of a sequence of a given length
BLAST Key Terminologies: Segment
a substring of a sequence
BLAST Key Terminologies: Segment Pair
an ungapped alignment between 2 equal-length segments with an associated score
BLAST Key Terminologies: MSP (maximal segment pair)
the segment pair with the highest score in a given context
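The definitions of segment pair and MSP can be made concrete with a brute-force search. This minimal sketch assumes a toy score of +1 for identical letters and -1 otherwise (real BLAST uses substitution matrices or nucleotide reward/penalty values). It scans every diagonal (fixed offset between the two sequences) with Kadane's maximum-subarray method, so every possible ungapped segment pair is considered. The function names are hypothetical.

```python
def pair_score(seg1, seg2):
    """Score of an ungapped segment pair under the assumed +1/-1 scheme."""
    if len(seg1) != len(seg2):
        raise ValueError("a segment pair needs two equal-length segments")
    return sum(1 if x == y else -1 for x, y in zip(seg1, seg2))


def maximal_segment_pair(s, t):
    """Return (score, start in s, start in t, length) of the highest-scoring
    ungapped segment pair between s and t."""
    best = (0, 0, 0, 0)
    for offset in range(-(len(s) - 1), len(t)):
        i0, j0 = max(0, -offset), max(0, offset)  # first cell on this diagonal
        length = min(len(s) - i0, len(t) - j0)
        run, run_start = 0, 0
        for k in range(length):
            if run <= 0:  # a non-positive running score never helps; restart here
                run, run_start = 0, k
            run += 1 if s[i0 + k] == t[j0 + k] else -1
            if run > best[0]:
                best = (run, i0 + run_start, j0 + run_start, k - run_start + 1)
    return best


s, t = "TTACGTACC", "GGACGTAGG"
score, i, j, n = maximal_segment_pair(s, t)
print(score, s[i:i + n], t[j:j + n])  # prints: 5 ACGTA ACGTA
```

BLAST avoids this exhaustive scan, which is exactly why it is a heuristic: it only looks near short word hits.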
How does BLAST work?
Preprocessing –> Comparison –> Extension
Key idea: the further an MSP can be extended in both directions, the less likely it is that the match occurred by chance
Explain BLAST Preprocessing
Query sequence is broken down into a list of short, contiguous words with no repeats.
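A minimal sketch of this preprocessing step, assuming overlapping words of length `w = 3` for readability (the function name `query_words` is hypothetical). Each word is kept only once, in order of first appearance, so the list has no repeats.

```python
def query_words(query, w=3):
    """Break the query into overlapping words of length w, keeping only the
    first occurrence of each word (order preserved)."""
    all_words = (query[i:i + w] for i in range(len(query) - w + 1))
    return list(dict.fromkeys(all_words))  # dict keys drop duplicates, keep order


words = query_words("AGTACGTATCGTATAGCGTAA", w=3)
print(len(words), words)
```

The 21-letter query yields 19 overlapping words, but several ("GTA", "CGT", "TAT") occur more than once, so the de-duplicated list is shorter.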
Explain BLAST Comparison
- Words in the list are compared with a list of words from the database. Matches are recorded as scores.
- Only the high-scoring seeds are stored for later use
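A minimal sketch of the comparison step: every distinct query word (with the positions where it occurs) is compared with every database word, and only pairs scoring at least `threshold` are kept as seeds. The toy word score (+1 match, -1 mismatch), `w = 4`, `threshold = 2`, and the function names are assumptions; with these values, exact matches (score 4) and words differing at one letter (score 2) survive.

```python
from collections import defaultdict


def word_score(a, b):
    """Assumed toy scoring: +1 per identical letter, -1 per mismatch."""
    return sum(1 if x == y else -1 for x, y in zip(a, b))


def find_seeds(query, db_seq, w=4, threshold=2):
    """Return sorted (query_pos, db_pos, score) word hits with score >= threshold."""
    # Each distinct query word once, remembering where it occurs in the query.
    query_index = defaultdict(list)
    for i in range(len(query) - w + 1):
        query_index[query[i:i + w]].append(i)

    seeds = []
    for j in range(len(db_seq) - w + 1):
        db_word = db_seq[j:j + w]
        for word, positions in query_index.items():
            score = word_score(word, db_word)
            if score >= threshold:  # discard low-scoring word pairs
                seeds.extend((i, j, score) for i in positions)
    return sorted(seeds)


print(find_seeds("AGTACGTATCG", "CCAGTACGAA"))
```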
Explain BLAST Extension
Extend the matches between the query & database sequences by linking MSPs in both directions. The process continues … or significant matches.
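A minimal sketch of ungapped extension from a single seed, using an "X-drop" stopping rule: extension in each direction stops once the running score falls more than `x_drop` below the best score seen in that direction. The +1/-1 scores, `x_drop = 2`, the example sequences and the function name `extend_seed` are assumptions for illustration, not BLAST's actual parameters.

```python
MATCH, MISMATCH = 1, -1  # assumed toy scores


def extend_seed(query, db_seq, qi, dj, w, x_drop=2):
    """Ungapped extension of the word hit query[qi:qi+w] / db_seq[dj:dj+w].

    Returns (score, query_start, db_start, length) of the best segment pair found.
    """
    def s(a, b):
        return MATCH if a == b else MISMATCH

    seed_score = sum(s(query[qi + k], db_seq[dj + k]) for k in range(w))

    # Extend to the right until the running score falls more than x_drop below its best.
    best_right, run, right_len, k = 0, 0, 0, 0
    while qi + w + k < len(query) and dj + w + k < len(db_seq):
        run += s(query[qi + w + k], db_seq[dj + w + k])
        if run > best_right:
            best_right, right_len = run, k + 1
        elif best_right - run > x_drop:
            break
        k += 1

    # Extend to the left with the same rule.
    best_left, run, left_len, k = 0, 0, 0, 1
    while qi - k >= 0 and dj - k >= 0:
        run += s(query[qi - k], db_seq[dj - k])
        if run > best_left:
            best_left, left_len = run, k
        elif best_left - run > x_drop:
            break
        k += 1

    total = seed_score + best_left + best_right
    return total, qi - left_len, dj - left_len, left_len + w + right_len


q, d = "TTACGTACGGTT", "AAACGTACGGAA"
score, qs, ds, n = extend_seed(q, d, 2, 2, 4)
print(score, q[qs:qs + n])  # prints: 8 ACGTACGG
```

Keeping only the best-scoring end point in each direction trims trailing mismatches, so the reported segment pair is the highest-scoring stretch reachable from the seed.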
What does ktup mean?
- ktup factor is used to adjust the word size.
- Larger word size increases speed.
- ktup (Tuple) = n –> n letters are read as the basic “scan” unit
- Ktup ↑ = selectivity ↑ = sensitivity ↓
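The speed/sensitivity trade-off can be seen by counting how many exact word hits a k-tuple scan must follow up for different values of k. This minimal sketch counts (query position, database position) pairs with identical length-k words; the database sequence and the function name are hypothetical. Larger k gives fewer hits (faster, more selective), but a similar region whose identical runs are shorter than k produces no hit at all (lower sensitivity).

```python
def count_word_hits(query, db_seq, k):
    """Count (query position, database position) pairs whose length-k words are
    identical, i.e. the exact hits a k-tuple scan would have to follow up."""
    db_index = {}
    for j in range(len(db_seq) - k + 1):
        db_index.setdefault(db_seq[j:j + k], []).append(j)
    return sum(len(db_index.get(query[i:i + k], []))
               for i in range(len(query) - k + 1))


query = "AGTACGTATCGTATAGCGTAA"
database = "TTAGTACGTTTCGTATAGCAAGTA"  # hypothetical database sequence
for k in range(1, 7):
    print(k, count_word_hits(query, database, k))
```

The count can never grow as k increases, because every hit of length k + 1 starts with a hit of length k at the same positions.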
What are some challenges in Bioinformatics?
- Explosion of Information
- Lack of “Bioinformaticians”
Explain Challenge 1 in Bioinformatics: Explosion of Information
- Need for faster, automated analysis to process large amounts of data
- Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
- Need for “smarter” software to identify interesting relationships in very large data sets
Explain Challenge 2 in Bioinformatics: Lack of “Bioinformaticians”
- Software needs to be easier to access, use and understand
- Biologists need to learn about the software, its limitations, and how to interpret its results