Protein structure prediction Flashcards
What is the protein folding problem?
- over 200 million sequences but only about 110,000 structures
- working hypothesis: the sequence of a protein in a given environment determines its structure
- aim: to develop a theoretical approach to predict structure from sequence
Why predict protein structure?
- the sequence-structure gap
- structure can inform about function
- to guide rational drug design
- to guide mutagenesis studies
- to help solve structures from experimental data
- to gain a fundamental understanding of the chemistry of protein structure
How do you calculate the similarity of two proteins?
- superpose the structures (often just the main chain) and quantify the average separation between equivalent positions
- quantified as the root mean square deviation (RMSD) of equivalent positions
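The RMSD step can be sketched in a few lines. This is a minimal illustration that assumes the two structures have already been superposed and that equivalent positions are paired by list index; the function name `rmsd` and the coordinate format are choices made for this sketch, and the superposition itself (for example by the Kabsch algorithm) is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superposed and that position i in
    coords_a is equivalent to position i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # Mean of the squared distances, then the square root.
    return math.sqrt(total / len(coords_a))
```

Because the distances are squared before averaging, a few badly placed residues dominate the value, which is why RMSD is most useful for structures that are already close.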
How do you quantify the accuracy of predicted models?
- superpose the predicted and X-ray structures
- RMSD is used for close structures
- a typical result: 70 out of 90 superposed residues have an RMSD of 2.6 Å
- involves arbitrary decisions, such as the choice of maximum distance allowed between equivalent residues
What is the TM score?
- the template modelling (TM) score removes arbitrary choices
- a score between 0 and 1 that includes all equivalences and is scaled for the number of residues in the protein
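As a rough sketch, the TM score for one given superposition can be computed with the standard published formula: each aligned pair contributes 1 / (1 + (d / d0)^2), and the sum is divided by the length of the reference protein, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The function name `tm_score` and the 0.5 Å floor on d0 for short chains are assumptions of this sketch. The full score is the maximum over all superpositions, and that search is not shown here.

```python
def tm_score(distances, target_length):
    """TM score for a single superposition.

    distances: distances in Å between aligned residue pairs.
    target_length: number of residues in the reference protein.
    """
    # Length-dependent distance scale; floored at 0.5 Å for short chains
    # (the floor is an assumption of this sketch).
    d0 = 0.5
    if target_length > 15:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    # Every aligned pair contributes between 0 and 1; unaligned residues
    # contribute nothing but still count in the denominator.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Dividing by the full reference length, not by the number of aligned pairs, is what removes the need for an arbitrary distance cut-off: distant pairs simply add very little to the sum.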
What TM score do you need to say that the fold of your protein is good?
- TM > 0.5 means the overall fold of the protein is good
- TM > 0.75 means a good predicted structure
How do we know that predictions work?
- evaluate on known structures
- but if you know the answer you have an advantage, even if you pretend that you don't
What is CASP?
- Critical Assessment of protein Structure Prediction
- a blind trial, required to evaluate the different approaches
- sequences are sent to predictors before the experimental coordinates are revealed
- held every two years, with manual evaluation of results
- covers both predictions with manual intervention and server-only predictions, which lets the community know which servers are good
What are ab initio energy calculations?
- original idea: describe the interactions between atoms and search for the conformation of lowest energy
- methods are energy minimisation and molecular dynamics
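Energy minimisation can be illustrated on a toy problem. The sketch below runs steepest descent on a single harmonic bond; the names `steepest_descent` and `bond_gradient`, and the values of the force constant, ideal length and step size, are all hypothetical choices for illustration. A real protein has thousands of coupled coordinates and many local minima, and molecular dynamics is not covered here.

```python
def bond_gradient(r, k=100.0, r0=1.5):
    """Derivative of the toy bond energy k * (r - r0)**2 with respect to r."""
    return 2.0 * k * (r - r0)


def steepest_descent(gradient, x0, step=0.001, tol=1e-8, max_iter=10000):
    """Follow the negative gradient until it is close to zero."""
    x = x0
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break  # gradient has vanished: a (local) minimum
        x -= step * g  # move downhill
    return x


# Starting from a stretched bond of 2.0 Å, the search relaxes to r0 = 1.5 Å.
relaxed = steepest_descent(bond_gradient, 2.0)
```

The step size has to be small relative to the stiffness of the energy surface; with this force constant a step of 0.01 would overshoot and oscillate between 1.0 and 2.0 without converging.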
What is the potential energy of a protein in a particular conformation?
Bond length + bond angle + bond dihedral rotation + van der Waals contacts + electrostatic interactions
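The five terms can be written out as simple functions. The functional forms below (harmonic bonds and angles, a cosine dihedral term, a Lennard-Jones 12-6 potential and Coulomb's law) are the forms commonly used in molecular mechanics force fields, not necessarily the ones intended by these notes; all function names, parameter names and the tuple layout are assumptions of this sketch.

```python
import math

# Approximate Coulomb constant in kcal * Å / (mol * e^2).
COULOMB_CONSTANT = 332.06


def bond_term(b, k_b, b0):
    """Harmonic penalty for a bond length b away from its ideal value b0."""
    return k_b * (b - b0) ** 2


def angle_term(theta, k_theta, theta0):
    """Harmonic penalty for a bond angle (radians) away from theta0."""
    return k_theta * (theta - theta0) ** 2


def dihedral_term(phi, k_phi, n, delta):
    """Periodic energy for rotation about a bond (phi, delta in radians)."""
    return k_phi * (1.0 + math.cos(n * phi - delta))


def van_der_waals_term(r, epsilon, sigma):
    """Lennard-Jones 12-6 energy for two atoms at separation r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def electrostatic_term(r, q_i, q_j, dielectric=1.0):
    """Coulomb energy for two point charges at separation r."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


def potential_energy(bonds, angles, dihedrals, vdw_pairs, charge_pairs):
    """Sum all five terms; each argument is a list of parameter tuples."""
    return (
        sum(bond_term(*t) for t in bonds)
        + sum(angle_term(*t) for t in angles)
        + sum(dihedral_term(*t) for t in dihedrals)
        + sum(van_der_waals_term(*t) for t in vdw_pairs)
        + sum(electrostatic_term(*t) for t in charge_pairs)
    )
```

For example, `potential_energy([(1.6, 300.0, 1.5)], [], [], [], [])` gives the energy of a single bond stretched 0.1 Å beyond its ideal length.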
Secondary structure predictions
- aim to identify local secondary structures
- theory is that to a large extent local sequence determines local structure
- current methods use multiply aligned sequences to provide extra information
What information about the structure can you get from the sequence?
What is the current state of secondary structure prediction?
- nearly every helix is identified
- most beta strands are identified, but short edge strands are still poorly predicted
- errors tend to be in defining the precise ends
- programs such as PSIPRED
What are the three major approaches to protein structure prediction?
- template based: reliable; protein fold space is limited; < 50% of a typical proteome covered
- template free: sometimes reliable; deep learning with multiply aligned sequences can sometimes, but not always, give good results
- hybrid: deep learning with templates produces excellent models, e.g. AlphaFold
How does template based modelling work?
- magenta protein: structure unknown
- cyan protein: structure known
- a sequence search finds that the magenta sequence is similar to the cyan sequence
- predict the structure of the magenta protein from the structure of the cyan protein
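How similar the query (magenta) and template (cyan) sequences are is usually summarised as a percentage sequence identity. Below is a minimal sketch that assumes the two sequences are already aligned, with `-` marking gaps. The function name `percent_identity` is hypothetical, and the choice of denominator (columns where both sequences have a residue) is one of several conventions in use.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip columns containing a gap
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0
```

For the alignment `ACD-F` against `ACE-F`, four columns are aligned and three match, so the function returns 75.0.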
Describe how Phyre2 works
How do you do loop modelling?
- fragment the PDB
- find sequences similar to the insertion or deletion
- check end-point distances
- check backbone geometry
- fit the fragment to the core structure
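The end-point distance check in the steps above can be sketched as a simple filter. This illustration assumes each library fragment is a list of C-alpha coordinates and that the two anchor residues flanking the gap in the core structure are known. The names `end_to_end` and `candidate_loops` and the 1.0 Å tolerance are hypothetical, and the sequence-similarity and backbone-geometry checks are left out.

```python
import math


def end_to_end(fragment):
    """Distance between the first and last point of a fragment."""
    return math.dist(fragment[0], fragment[-1])


def candidate_loops(fragments, anchor_start, anchor_end, tolerance=1.0):
    """Indices of fragments whose end-to-end distance fits the gap.

    Results are ordered from best to worst fit.
    """
    gap = math.dist(anchor_start, anchor_end)
    scored = [(abs(end_to_end(f) - gap), i) for i, f in enumerate(fragments)]
    return [i for diff, i in sorted(scored) if diff <= tolerance]
```

A fragment that spans a very different distance from the gap cannot be fitted to the core without distorting it, so it is discarded before any more expensive geometry checks.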
Loop modelling accuracy
- insertions and deletions relative to the template are modelled from a loop library, up to 15 amino acids in length
- short loops (under 5 residues) are good; longer loops are less trustworthy
- be wary of basing any interpretation of the structural effect of point mutations on the model
Side chain modelling
- fit the most probable rotamer at each position
- according to the given backbone angles
- whilst avoiding clashes
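One simple way to picture this is a greedy pass along the chain: at each position, take the most probable rotamer that does not clash with the rotamers already placed. This is only an illustrative sketch, not the method used by any particular program. It assumes the rotamer probabilities have already been looked up for the given backbone angles, and the function name `choose_rotamers`, the data layout and the `clashes` callback are hypothetical.

```python
def choose_rotamers(rotamer_library, clashes):
    """Greedy rotamer assignment.

    rotamer_library: one non-empty list per position of
        (rotamer_name, probability) pairs, already conditioned on the backbone.
    clashes: function (pos_a, rot_a, pos_b, rot_b) -> bool.
    """
    chosen = []
    for pos, options in enumerate(rotamer_library):
        # Try the rotamers from most to least probable.
        ranked = sorted(options, key=lambda o: o[1], reverse=True)
        pick = None
        for name, _prob in ranked:
            if not any(
                clashes(prev_pos, prev_name, pos, name)
                for prev_pos, prev_name in enumerate(chosen)
            ):
                pick = name
                break
        if pick is None:
            # Every option clashes: keep the most probable one anyway.
            pick = ranked[0][0]
        chosen.append(pick)
    return chosen
```

A greedy pass depends on the order in which positions are visited and can miss a better overall combination, so it should be read as a way to understand the three steps and nothing more.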
Side chain modelling - accuracy
- side chains will be modelled with ~80% accuracy (chi 1) IF the backbone is correct
- clashes will sometimes occur and, if frequent, probably indicate a wrong alignment or poor template
- analyse with Phyre Investigator
Interpreting results - sequence identity and model accuracy
- high confidence (> 90%) and high sequence identity (> 35%): almost always very accurate, TM score > 0.7, RMSD 1-3 Å
- high confidence (> 90%) and low sequence identity (< 30%): almost certainly the correct fold, accurate in the core (2-4 Å) but may show substantial deviations in loops and non-core regions
What is the structural coverage of the human proteome?
53% in total: 36% from Phyre and 17% from the PDB