Lecture 8 Flashcards
(10 cards)
What are 3 features of UniProt?
Integrates:
- Sequence
- Structure
- Function data
How do secondary databases enhance the PDB (Protein Data Bank)?
They add annotation about:
- Structure
- Function
- Evolution of protein domains
What is the main obstacle in Natural Language Processing (NLP)?
Natural language is ambiguous and context-dependent, so it requires syntactic analysis
What is the result of machine pairwise associations?
Pairwise associations can be linked together to form chains of reasoning
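A minimal sketch of how pairwise associations might be chained, assuming each association is a plain (A, B) pair and is treated as undirected. The terms and the function name `find_chain` are hypothetical placeholders, not from the lecture.

```python
from collections import defaultdict, deque

def find_chain(pairs, start, goal):
    """Breadth-first search for the shortest chain of associations from start to goal."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)  # assumption: associations are symmetric
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain links the two terms

# Hypothetical pairwise associations
pairs = [
    ("gene A", "protein B"),
    ("protein B", "pathway C"),
    ("pathway C", "disease D"),
]
chain = find_chain(pairs, "gene A", "disease D")
print(" -> ".join(chain))
```

No single pair links "gene A" to "disease D"; the link only appears once the pairs are chained.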
What is indexing?
Possible search terms are extracted and compiled into an index, which is searched in lieu of the full text because full-text searching is too slow
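One common way to implement this idea is an inverted index, sketched below under simple assumptions: terms are lowercase alphanumeric words, and a multi-word query must match every term. The documents and the names `build_index` and `search` are illustrative only.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Extract candidate search terms: lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term, without rescanning the text."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Illustrative documents
docs = {
    1: "BLAT is a rapid genomic sequence search algorithm",
    2: "UniProt integrates sequence, structure and function data",
    3: "Secondary databases annotate protein structure",
}
index = build_index(docs)
print(sorted(search(index, "sequence")))           # [1, 2]
print(sorted(search(index, "protein structure")))  # [3]
```

The index is built once; each query is then a few dictionary lookups rather than a scan of every document.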
What are 3 issues with indexing?
- English/American spelling
- Synonyms
- Contamination (such as authors’ names that are also place names)
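A small sketch of one way to handle the spelling issue: map known variants to a single form before indexing and before searching. The variant table is a hand-written illustration, not a complete list, and `normalise_spelling` is a hypothetical helper.

```python
# Assumption: a tiny hand-made table of British -> American spellings
SPELLING_VARIANTS = {
    "tumour": "tumor",
    "haemoglobin": "hemoglobin",
    "organisation": "organization",
}

def normalise_spelling(term):
    """Lowercase a term and map a known spelling variant to one canonical form."""
    term = term.lower()
    return SPELLING_VARIANTS.get(term, term)

# Both spellings now produce the same index term
print(normalise_spelling("Tumour") == normalise_spelling("tumor"))  # True
```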
What are 3 benefits of Electronic Lab Notebooks (ELNs)?
- Backed up
- Non-linear organisation
- Easily searchable
What are 3 approaches of machine learning?
- Statistical techniques: clustering, classification, principal component analysis, hidden Markov models
- Artificial Neural Networks: protein structure prediction, gene prediction
- Support Vector Machines: classification algorithms
What is BLAT?
BLAST-Like Alignment Tool: a very rapid genomic sequence search algorithm
How do you avoid the challenge of synonyms?
Use official, standardised names; e.g. HGNC (HUGO Gene Nomenclature Committee) gene names, whose use is increasing
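A minimal sketch of synonym handling with official names: every alias is mapped to one approved symbol before indexing or searching. The alias table is a small hand-written illustration (not pulled from the HGNC database), and `to_official` is a hypothetical helper.

```python
# Assumption: a hand-made alias table; a real one would come from HGNC data
ALIAS_TO_OFFICIAL = {
    "tp53": "TP53",
    "p53": "TP53",
    "erbb2": "ERBB2",
    "her2": "ERBB2",
    "neu": "ERBB2",
}

def to_official(name):
    """Return the official symbol for a known alias; leave unknown names unchanged."""
    cleaned = name.strip()
    return ALIAS_TO_OFFICIAL.get(cleaned.lower(), cleaned)

print(to_official("p53"))     # TP53
print(to_official(" HER2 "))  # ERBB2
print(to_official("BRCA1"))   # BRCA1 (not in the table, returned as given)
```

With this step in place, a search for "HER2" and a search for "ERBB2" hit the same index entry.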