Flashcards in Bioinformatics 4 Deck (21):
What are the benefits of predicting the protein fold?
It aids drug design in medicine and the design of novel enzymes in biotechnology.
What assesses protein fold prediction programs, and how?
CASP (Critical Assessment of Techniques for Protein Structure Prediction)
Each program is given the sequence of a protein whose structure has been solved experimentally but not yet released, and its predicted structure is then compared with the real one.
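A common way to quantify how close a prediction is to the experimental structure is the root-mean-square deviation (RMSD) of matched C-alpha atoms. The minimal sketch below assumes the two coordinate lists are already residue-matched and optimally superimposed (it performs no superposition itself), and `rmsd` is a hypothetical helper name; CASP's own headline measure is GDT_TS rather than plain RMSD.

```python
import math

def rmsd(predicted, experimental):
    """RMSD (same units as the input, e.g. angstroms) between matched coordinate lists.

    Assumes both structures are already superimposed and that predicted[i]
    and experimental[i] are the C-alpha atom of the same residue.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, experimental):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(predicted))

# Toy example: one residue predicted 5 angstroms away from its true position.
print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```

A lower RMSD means a closer match, but a single badly placed loop can inflate it, which is one reason CASP prefers GDT-style scores.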
How are the structures often predicted?
Through homology - similar sequences tend to fold in similar ways.
BLAST can only identify homologues with >40% identity; what other programs can be used to find homology?
PSI-BLAST and hidden Markov model (HMM) methods.
What program was developed by the Sternberg group at Imperial College London?
Phyre.
How does Phyre work?
It uses PSI-BLAST to search the 10 million known sequences for homologues of the query, capturing the mutational changes at each position in the protein to create an evolutionary fingerprint.
It then runs the sequence of every known protein structure (65,000) through PSI-BLAST and builds an HMM from each result, creating an HMM database of all sequences with a known structure.
Finally, because the query sequence has already been run through PSI-BLAST, an HMM is created for it too. The query HMM is compared with the HMM database of all known protein structures, and when a good match is found a 3D model is produced together with a confidence value.
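The "evolutionary fingerprint" is essentially a per-position record of which residues homologues tolerate. The toy sketch below (hypothetical `build_profile` helper) turns a set of already-aligned homologous sequences into per-column residue frequencies with simple pseudocounts. Real PSI-BLAST profiles also weight sequences and convert frequencies to log-odds scores, and profile HMMs additionally model insertions and deletions, so this only illustrates the idea rather than Phyre's method.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned_homologues, pseudocount=1.0):
    """Return a list with one {amino_acid: frequency} dict per alignment column."""
    profile = []
    for col in range(len(aligned_homologues[0])):
        # Count standard residues in this column, ignoring gaps ("-") and unknowns.
        counts = Counter(seq[col] for seq in aligned_homologues if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        # Pseudocounts keep unseen residues at a small non-zero frequency.
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile

profile = build_profile(["MKVL", "MRVL", "MKIL", "MKVF"])
print(max(profile[1], key=profile[1].get))  # K, the most common residue at position 2
```

Conserved columns end up dominated by one residue, while variable columns spread their frequency across several, which is the positional information an HMM built from the profile exploits.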
What is a phylogenetic tree?
A prediction of the evolutionary ancestry of a protein, showing how it is related to other proteins through common ancestors.
What are the 3 main tree building algorithms?
Neighbour Joining, Maximum Parsimony and Maximum Likelihood.
What do these trees identify?
Phylogenetic trees identify the protein most closely related to the one you are working with.
What is the first step to building a tree? (common to all algorithms)
The first step to building a tree is to produce a multiple sequence alignment (MSA).
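MSA programs (for example progressive aligners such as Clustal) are built on top of pairwise alignment. The sketch below shows only that pairwise building block: a basic Needleman-Wunsch global alignment with assumed toy scores (match +1, mismatch -1, gap -1). Real aligners use substitution matrices such as BLOSUM and affine gap penalties, and merging many pairwise alignments into an MSA is not shown.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)   # 2
print(top)     # ACGT
print(bottom)  # A-GT
```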
What are the 3 major categories of tree building methods and which algorithms do they include?
Distance-based methods - Neighbour Joining.
Character-based methods - Maximum Parsimony and Maximum Likelihood.
Bayesian methods - similar to Maximum Likelihood.
How does a distance based method work?
Distance methods use an MSA to calculate pairwise distances, i.e. the number of changes between each pair of sequences in a group.
This creates a distance matrix which can be used to produce a phylogenetic tree.
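As a sketch, the pairwise distance can be taken as the proportion of aligned, gap-free positions at which two sequences differ (the p-distance); the raw number of changes is the same count before dividing. The function names are illustrative, and real tools usually correct such distances for multiple substitutions at the same site using an evolutionary model.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions (ignoring gaps) at which two sequences differ."""
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no gap-free positions to compare")
    return sum(a != b for a, b in compared) / len(compared)

def distance_matrix(msa):
    """Symmetric matrix of pairwise p-distances for a list of aligned sequences."""
    return [[p_distance(x, y) for y in msa] for x in msa]

msa = ["ACGTACGT", "ACGTACGA", "TCGTACGA"]
for row in distance_matrix(msa):
    print(row)
# [0.0, 0.125, 0.25]
# [0.125, 0.0, 0.125]
# [0.25, 0.125, 0.0]
```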
What are the advantages of the Neighbour Joining method?
Fast and can handle many sequences.
Neighbour Joining does not assume an ultrametric tree; what is this?
An ultrametric tree is a special kind of additive tree in which the "tips" (terminal nodes) are all equidistant from the root. Ultrametric trees can thus depict evolutionary time.
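To make the definition concrete, the sketch below stores a rooted tree as a dictionary mapping each internal node to its (child, branch length) pairs, a hypothetical representation chosen for brevity, and checks whether every tip lies at the same distance from the root.

```python
def root_to_tip_distances(tree, node, depth=0.0):
    """Map each tip below `node` to its summed branch length from the root."""
    children = tree.get(node, [])
    if not children:  # no children recorded, so this node is a tip
        return {node: depth}
    distances = {}
    for child, branch_length in children:
        distances.update(root_to_tip_distances(tree, child, depth + branch_length))
    return distances

def is_ultrametric(tree, root="root", tolerance=1e-9):
    depths = root_to_tip_distances(tree, root).values()
    return max(depths) - min(depths) <= tolerance

# Every tip is 3.0 from the root, so this tree is ultrametric.
clock_like = {"root": [("X", 1.0), ("C", 3.0)], "X": [("A", 2.0), ("B", 2.0)]}
# Same topology, but B sits closer to the root: additive, not ultrametric.
additive_only = {"root": [("X", 1.0), ("C", 3.0)], "X": [("A", 2.0), ("B", 0.5)]}
print(is_ultrametric(clock_like), is_ultrametric(additive_only))  # True False
```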
What are the limitations of Neighbour Joining?
Lacks any sort of tree search and optimality criterion and so there is no guarantee that the tree produced is the best fit for the data.
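The greedy nature of Neighbour Joining shows up clearly in a minimal implementation of the standard Saitou-Nei algorithm: at each step it joins the single pair that minimises the Q criterion and never revisits that choice or compares whole alternative trees. The sketch is illustrative (hypothetical function name and output format) and returns the sequence of joins rather than a formatted tree; the example uses a small additive distance matrix.

```python
def neighbour_joining(names, dist):
    """Return (joins, final_edge) for taxon names and a square distance matrix.

    Each join is (new_node, child_a, branch_a, child_b, branch_b); the result is
    an unrooted tree, finished by one edge between the last two nodes.
    """
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Greedy step: pick the one pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node u.
        branch_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[a][b] - branch_a
        u = f"node{len(joins) + 1}"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        joins.append((u, a, branch_a, b, branch_b))
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    return joins, (a, b, d[a][b])

names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
joins, final_edge = neighbour_joining(names, dist)
print(joins[0])  # ('node1', 'a', 2.0, 'b', 3.0)
```

Each step costs only a scan of the current matrix, which is why the method is fast, but a poor early join is never undone.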
Explain the Maximum Parsimony method.
It builds the tree that needs the minimum total number of mutations, counted site by site, to get from one sequence to another.
To begin, it performs an MSA and identifies informative sites.
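For one candidate tree, the minimum number of changes can be counted with Fitch's algorithm, one standard way of doing this step; Maximum Parsimony then repeats the count for many candidate trees and keeps the lowest-scoring one. The sketch assumes a rooted binary tree written as nested tuples and a dictionary of aligned sequences, illustrative conventions rather than any particular program's input format.

```python
def fitch_site(tree, states):
    """Return (possible_ancestral_states, changes) for one alignment column.

    `tree` is a leaf name (str) or a pair (left_subtree, right_subtree);
    `states` maps each leaf name to its character in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_changes = fitch_site(tree[0], states)
    right_states, right_changes = fitch_site(tree[1], states)
    shared = left_states & right_states
    if shared:
        # The children can agree, so no extra mutation is needed here.
        return shared, left_changes + right_changes
    # No common state: one mutation must be counted at this node.
    return left_states | right_states, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Total minimum number of changes over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[col] for name, seq in alignment.items()})[1]
        for col in range(length)
    )

alignment = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "GGG"}
for tree in [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]:
    print(tree, parsimony_score(tree, alignment))  # scores 4, 6 and 5
```

Here the tree grouping A with B and C with D needs the fewest changes, so parsimony would prefer it.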
What is an informative site?
An informative site is one where there are at least two different kinds of nucleotides at the site, each of which is represented in at least two of the sequences under study.
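The definition translates directly into a column-by-column check. This sketch (hypothetical `informative_sites` helper) counts the character states in each alignment column, ignoring gaps, and keeps the columns where at least two different states each occur in at least two sequences; it works the same way for nucleotide or amino-acid alignments.

```python
from collections import Counter

def informative_sites(msa):
    """Return 0-based indices of parsimony-informative columns in an alignment."""
    informative = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] != "-")
        # States shared by at least two of the sequences.
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            informative.append(col)
    return informative

msa = ["ATGA", "ATGC", "ACGA", "ACTA"]
print(informative_sites(msa))  # [1]
```

Columns 2 and 3 vary, but each variant appears in only one sequence, so they cannot favour one tree over another.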
Explain the Maximum Likelihood method.
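Like Maximum Parsimony, it evaluates possible trees, but instead of simply counting changes it scores each tree with a model of evolution in which different types of mutation can occur at different rates.
It uses prior knowledge: for example, GAU --> UGU (Asp to Cys) looks like a single amino-acid change but is in fact 2 nucleotide changes, not one.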
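The core idea of likelihood scoring can be seen on the smallest possible tree: two sequences joined by one branch. This sketch uses the Jukes-Cantor (JC69) model only because it is the simplest substitution model (it treats all changes as equally likely, unlike models with different mutation rates) and finds the branch length with the highest log-likelihood by a simple grid search; the function names are hypothetical. Because the model allows for several changes at one site, the estimated branch length comes out longer than the raw fraction of differing sites.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned DNA sequences separated by branch length d.

    d is in expected substitutions per site; base frequencies are 1/4 under JC69.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site ends up unchanged
    p_diff = 0.25 - 0.25 * e   # probability it ends up as one particular other base
    return sum(
        math.log(0.25) + math.log(p_same if a == b else p_diff)
        for a, b in zip(seq_a, seq_b)
    )

def ml_branch_length(seq_a, seq_b, step=0.001, max_d=3.0):
    """Grid search for the branch length that maximises the JC69 likelihood."""
    grid = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))

seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "CCGTAAGTACATACGAACGT"   # differs from seq_a at 4 of 20 sites
print(ml_branch_length(seq_a, seq_b))
```

With 4 differences in 20 sites (raw proportion 0.2), the estimate lands close to the closed-form JC69 value of -3/4 ln(1 - 4p/3), about 0.23.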
Why does Maximum Likelihood give a more realistic tree estimate?
It does not assume equal mutation probabilities for all branches.
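One simple model that drops the equal-probability assumption is the Kimura two-parameter (K80) model, in which transitions (A<->G, C<->T) and transversions occur at different rates. The sketch below computes the standard K80 distance between two aligned sequences; it is a pairwise distance, not a full Maximum Likelihood tree search, and the helper name is hypothetical. U is treated as T so RNA sequences like those in the codon example can be used.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def k80_distance(seq_a, seq_b):
    """Kimura two-parameter distance (expected substitutions per site).

    Raises ValueError (math domain error) if the sequences are too divergent.
    """
    a_seq = seq_a.upper().replace("U", "T")
    b_seq = seq_b.upper().replace("U", "T")
    pairs = [(a, b) for a, b in zip(a_seq, b_seq) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable nucleotide positions")
    differences = [(a, b) for a, b in pairs if a != b]
    transitions = sum(frozenset(pair) in TRANSITIONS for pair in differences)
    P = transitions / len(pairs)                          # proportion of transitions
    Q = (len(differences) - transitions) / len(pairs)     # proportion of transversions
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

# One transition versus one transversion in 10 sites gives different distances.
print(k80_distance("AAAAAAAAAA", "GAAAAAAAAA"))
print(k80_distance("AAAAAAAAAA", "CAAAAAAAAA"))
```

Under this model the single transition yields a slightly larger distance than the single transversion, whereas a simple count of differences would score both identically.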
What are the only sequences suitable for Maximum Parsimony?
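Sequences which are very similar (closely related), because parsimony does not account for multiple mutations at the same site, which become common as sequences diverge.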