A conservation and biophysics guided stochastic approach to refining docked multimeric proteins
© Akbal-Delibas and Haspel; licensee BioMed Central Ltd. 2013
Published: 8 November 2013
We introduce a protein docking refinement method that accepts complexes consisting of any number of monomeric units. The method uses a scoring function based on a tight coupling between evolutionary conservation, geometry and physico-chemical interactions. Understanding the role of protein complexes in the basic biology of organisms heavily relies on the detection of protein complexes and their structures. Different computational docking methods are developed for this purpose, however, these methods are often not accurate and their results need to be further refined to improve the geometry and the energy of the resulting complexes. Also, despite the fact that complexes in nature often have more than two monomers, most docking methods focus on dimers since the computational complexity increases exponentially due to the addition of monomeric units.
Our results show that the refinement scheme can efficiently handle complexes with more than two monomers by biasing the results towards complexes with native interactions, filtering out false positive results. Our refined complexes have better IRMSDs with respect to the known complexes and lower energies than those initial docked structures.
Evolutionary conservation information allows us to bias our results towards possible functional interfaces, and the probabilistic selection scheme helps us to escape local energy minima. We aim to incorporate our refinement method in a larger framework which also enables docking of multimeric complexes given only monomeric structures.
Protein binding and docking
Proteins often associate with other proteins to create complexes that function as a biological unit. These complexes play a central role in nearly every cellular process . Since the structure and function of proteins are closely related, detection of protein complexes and their structures helps us understand their role in various important biological processes.
Despite the advance in experimental structure detection methods, elucidating the three-dimensional arrangement of protein complexes is still a very challenging process. Computational methods have become very useful in complementing and helping experimental structure detection methods. Computational docking methods try to predict the way two or more proteins bind. They are typically made of two stages: The search stage uses structural and geometric techniques to detect native-like configurations of the complex, and the ranking stage uses a scoring function made of physico-chemical and geometric filters to estimate the binding affinity and rank computed structures according to energetic criteria. These functions typically focus on electrostatic, Van der Waals, and solvent interactions, similarity to experimental structures, or agreement with other experimental data [2–9].
In nature many proteins interact to generate multimers containing more than two monomeric units, but most docking and refinement methods only focus on dimeric structures due to the possible exponential increase in the already large search space, posed by the addition of monomers. Due to the additional increase in complexity, in the case of multimeric docking it is especially important to carefully select the search and ranking methods. Only a small number methods exist for docking more than two monomers. These methods attempt to make the search for the correct docking configuration tractable by focusing on symmetric complexes  or by extending pairwise solutions via combinatorially assembling monomers incrementally, using greedy heuristics to cut down the search space such as selecting only a subset of the complexes of size k and pass them to the next stage as candidates to search for a complex of size k + 1, or generating pairwise docking results and expanding them using a minimum spanning tree [11, 12].
The results generated by computational docking methods are expected to be low-energy structures that are similar to the native complex structures. However, computational docking methods are not complete. The energetic difference between the native structure and other non-native complexes may be small and the scoring function used by docking methods is often not sensitive enough to detect it. Additionally, the correct binding site is not always known experimentally and docking methods may miss the correct binding site completely. As a result, low-energy structures produced by docking programs often disagree with NMR data . Recent CAPRI (Critical Assessment of PRedicted Interactions) rounds show an important observation: even the most accurate methods predict only about 50% of the targets . A survey of various scoring functions showed that although some components in several scoring functions have meaningful individual components, none of these functions could predict the binding affinity reliably . Therefore, the results of computational docking methods need to be further refined in order to obtain native-like structures. Usage of refinement methods on protein complexes is not limited to computational docking methods; structures obtained by experimental methods can also be refined. Docking algorithms often produce a large number of putative complexes, ranked according to some scoring function. Docking refinement methods refine and re-rank these complexes in order to produce improved structures with lower energy and better interface packing. The goal is to improve both the RMSD and the ranking of the solution closest to the native structure. Refinement methods are often based on a combination of geometric and energetic optimization. Existing methods include rigid body transformations with side chain flexibility [15, 16], flexible fitting that accounts for the changes proteins undergo upon binding , normal-mode analysis [18, 19], Molecular Dynamics (MD) [3, 20], energy minimization , Monte Carlo (MC) , genetic algorithms  and more.
Refinement and re-ranking using conservation and electrostatics
We recently developed a docking refinement method that uses a scoring function based on evolutionary conservation [23, 24], in addition to the usual VdW energy term. It employs a novel Evolutionary Trace (ET)-based [25, 26] conservation scoring function. Evolutionary Traces are based on the idea that residues on functional interfaces are important for correct binding, and are therefore more likely to be conserved. We showed a strong correlation between conservation scores and the correct binding geometry when tested on dimeric protein structures. Our method biases the search towards conformations which have those conserved amino acids positioned close to each other on the binding interface. The scoring function iteratively detects top-scoring transformations at each stage of the refinement and passes them to the next stage for further refinement. We use a greedy selection approach to avoid exponential growth of the number of candidate complexes and speed up the computation time. We showed that the method can significantly improve docking results and also help distinguishing badly docked complexes from near-native complexes.
More recently we extended our refinement method to multimeric protein structures . Biasing the search towards functional interface greatly reduces the search space, which is especially important in the case of multimeric complexes. We also incorporated electrostatic interaction energy to improve the accuracy of our prediction and provide a greater diversity of the selected conformations. The search iteratively selects two monomers out of the complex, and they are refined with respect to each other. Out of the newly refined candidates, top ranking conformations with respect to energy are passed on to the next stage for further refinement. In that work we also introduced a new probabilistic search scheme, which allows a greater variety in the selection of complexes and enables the method to escape possible local minima. We showed that our refinement method significantly improved the geometry of the input complexes and achieved lower lRMSD with respect to the native complexes.
In the current work we introduce an improved scoring function which aims to eliminate the bias created by the conservation score towards large interfaces. As input, we use coarsely docked complexes resulting from a multimeric docking program, Multi-LZerD . We tested our refinement method on a large dataset of both dimeric and multimeric complexes. In most cases, there are several results among the top ranking complexes with better lRMSD than the input structure. This shows the potential of our method to serve as an efficient tool to improve the geometry and interface packing of coarsely docked complexes.
Our program takes as input a protein complex generated by any docking method. The refinement proceeds in cycles where each cycle seeks to improve the conformation of one unit (i.e., a chain or a list of chains) with respect to the other ones. For each input structure, we create 100 conformations using rigid-body rotations by a random angle within a predefined range around an arbitrary axis passing through the centroid of the unit. Each rotation results in a new conformation and these randomly generated conformations are first energy minimized for 200 steps using NAMD  to resolve local clashes without introducing drastic changes to the structure, then ranked using both a conservation scoring function and an electrostatic scoring function. After creating probability distributions based on conservation and electrostatic ranking, 10 conformations are selected according to the probabilistic selection scheme described below and provided as inputs for the following refinement cycle.
Creating multimeric protein structures
The coarsely docked multimers used in this paper were produced using Multi-LZerD  without the refinement module. We selected coarsely docked complexes whose distance to the native complexes was between 1 and 6Å, to allow effective refinement and not attempt to refine incorrectly docked complexes whose RMSD from the native structure was too big to refine.
The scoring function that we aim to optimize is computed for the set of interface atoms, which is defined, for each chain, as the atoms within at most 6 Å distance to an atom from an adjacent chain. In our previous work , we employed a scoring function consisting of the Van der Waals term taken from the AMBER ff03 force field  and the conservation term that we defined using ET scores of each interface residue.
For each interface atom, we defined the evolutionary conservation value, c i , as the relative importance of the residue that the atom belongs to. Relative importance of a residue is specified in the coverage column of the corresponding ET files for each protein chain. The coverage value ranges between 0 and 1, where low coverage implies evolutionary importance.
Probabilistic selection of conformations
We rank our refinement candidates using the above mentioned scoring function and select a subset of them as the refinement output. In our previous work , we ranked random conformations according to E TC values and selected the 10 top ranked conformations.
Deterministic selection increased the likelihood of false positives because we selected only top 1% (10 out of 2000) of conformations in a multi cycle refinement process. The scoring function rarely correlates perfectly with lRMSD values and is only a model of the "true" potential energy. Also, it increased our chances of getting trapped in a local minimum. In order to address this limitation, we employ a probabilistic selection approach detailed below, which we first introduced in .
Probability distribution table.
Selection Probability (100 conf.)
Selection Probability (2000 conf)
In order to test our multimeric refinement method, we used docked dimeric structures provided by Shehu et al.  with the following PDB IDs: 1BDJ, 1C1Y, 1CSE, 1DS6, 1OHZ, 1TX4 and 1WQ1. In addition to these dimers, we produced multimeric input structures by running the Multi-LZerD multimeric docking program without refinement  for protein complexes with the following PDB IDs: 1I3O, 1JYO, 1LOG, 1QGW, 1VCB, 1W88, 1WWW, 2BBK, 2PRG and 6RLX. Some of these proteins are trimers or tetramers that we used before as dimers only [23, 30], while others are popular test cases .
For each input docked structure, the refinement is performed iteratively in 2 steps. In the first step, 100 random conformations are generated from the input structure as described in Section. These 100 conformations are ranked using the two scoring functions and 20 conformations are selected according to our selection function (10 according to E TC values and 10 according to E TE values). In the second step, 100 new random conformations are created for each of the 20 conformations produced in the first step. Then, these 2000 new conformations are ranked using the scoring functions and 20 conformations are selected and output as refined candidate complexes.
Results and discussion
Dimeric protein refinement results.
Multimeric protein refinement results.
On the other hand, the refinement performance is not alike across different proteins. Even though our method yields better solutions than the input structure for all dimeric and some multimeric complexes, the magnitude of improvement varies from protein to protein. Indeed, there are some complexes, such as 1VCB, for which our solutions are not better than the input structure. We believe it is crucial to better understand what causes this performance difference in order to further improve our refinement method. As explained earlier, our method relies on the observation that residues on binding interfaces tend to be more conserved throughout the evolution due to their functional importance. Therefore, the conservation energy component of our scoring function is designed to favor complexes with more conserved residues on interfaces. Stated differently, structures with more clusters of conserved residues on interfaces are expected to have lower conservation energy and lower lRMSDs with respect to the native structure. On the other hand, the electrostatic energy component of the scoring function is devised to prefer complexes with lower electrostatic energy based on the assumption that native-like structures have better electrostatic interactions.
Correlation coefficients for the ratio of conserved atoms on interfaces (ICAR) vs.lRMSD, total conservation energy (E TC ) vs. lRMSD, total electrostatic energy (E TE ) vs. lRMSD, and ICAR vs. E TC .
ICAR vs. lRMSD
E TC vs. lRMSD
E TE vs. lRMSD
ICAR vs. E TC
Secondly, ICAR exhibits a strongly negative correlation with lRMSD correlation in most, but not all cases.
This suggests that there are cases, such as 1LOG and 6RLX, where structures with a large proportion of conserved interface atoms are less native-like, contrary to our underlying hypothesis. Whenever ICAR vs. lRMSD correlation is strong negative (e.g. 1C1Y and 1TX4), E TC shows a strong positive correlation with lRMSD as expected. In other words, structures that are closer to the native have lower conservation energy. On the other hand, when ICAR vs lRMSD is not a strong negative correlation, the conservation score is not able to favor low lRMSD structures, again as expected.
Lastly, there are certain cases where E TC does not show a positive correlation with lRMSD (e.g. 1WQ1 and 6RLX) but we are able to obtain better lRMSD structures. This is due to the positive E TE vs lRMSD correlation in these cases. This is the reason we intentionally did not mix E TC and E TE into a single energy function as also explained in our previous work . The results in this paper reaffirms that observation, which suggests that we may be able to group input structures into one of two categories and employ a scoring function (E TC or E TE ) selectively. This is the subject of ongoing research.
For input structures like 1LOG and 2BBK, we could not select better lRMSD structures even though E TC or E TE had relatively strong correlation with lRMSD. Analyzing them further uncovers that out of 2000 through small-scale random conformations produced for 2BBK only 7 had lower or same lRMSD as the input. In fact, our scoring function was able to select one of them. Similarly, out of 2000 random conformations produced for 1LOG only 9 had lower or same lRMSD as the input. Hence, this is either a statistical matter or generation of random conformations could have been improved to address this issue (possibly by taking symmetry that exists in some protein complexes into account), which can be considered in future work.
Proteins interact to create complexes as part of their cellular function. Modeling the structure of these complexes is highly important in order to understand these processes. Here we present a refinement and re-ranking algorithm to improve the structures of coarsely docked multimeric complexes. Many protein complexes contain more than two monomers, but the vast majority of docking and refinement algorithms can only handle dimers due to the increased computational cost which causes a potential exponential increase in the runtime. Our method uses a geometry-based local search and a scoring function that is based on evolutionary conservation and pairwise interactions, relying on the observation that amino acids on binding interfaces tend to be highly conserved due to their important role. This scoring function allows us to bias our refinement scheme towards potential functional interfaces, reducing the large search space and improving the geometry and energy of the input structures. We introduced a probabilistic search scheme that allows us to escape local energy minima and enhance the diversity of selected structures. Future work includes testing our method on a larger dataset and incorporate backbone and sidechain flexibility into the search. Additionally, we plan to further investigate the difference between complexes which give better conservation score and complexes with better electrostatic energy, in order to establish an automated way to distinguish between them during the refinement process. Finally, we aim to incorporate the refinement method in a larger framework which also includes docking of multimeric complexes given only monomeric structures.
We thank Dr. Amarda Shehu and Irina Hashmi for their help. Special thanks to Juan Esquivel Rodriguez for his extensive help with Multi-LZerD. Lastly, we thank Rhonald Lua for his help with the ET Server. The work is supported in part by NSF grant no. AF-1116060.
The publication costs for this article were funded by the corresponding author.
This article has been published as part of BMC Structural Biology Volume 13 Supplement 1, 2013: Selected articles from the Computational Structural Bioinformatics Workshop 2012. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcstructbiol/supplements/13/S1.
- Goodsell D, Olson A: Structural symmetry and protein function. Annual review of biophysics and biomolecular structure 2000, 29: 105–153. 10.1146/annurev.biophys.29.1.105View ArticlePubMedGoogle Scholar
- Vajda S, Kozakov D: Convergence and combination of methods in protein-protein docking. Curr Opin Struct Biol 2009, 19(2):164–170. 10.1016/j.sbi.2009.02.008PubMed CentralView ArticlePubMedGoogle Scholar
- Dominguez C, Boelens R, Bonvin A: Haddock: A protein-protein docking approach based on biochemical orbiophysical information. J Am Chem Soc 2003, 125: 1731–1737. 10.1021/ja026939xView ArticlePubMedGoogle Scholar
- Schneidman-Duchovny D, Inbar Y, Nussinov R, Wolfson HJ: Geometry based flexible and symmetric protein docking. Proteins 2005, 60(2):224–231. 10.1002/prot.20562View ArticleGoogle Scholar
- Wang C, Bradley P, Baker D: Protein-protein docking with backbone flexibility. J Mol Biol 2007, 373(2):503–519. 10.1016/j.jmb.2007.07.050View ArticlePubMedGoogle Scholar
- Camacho CJ, Vajda S: Protein-protein association kinetics and protein docking. Curr Opin Struct Biol 2002, 12: 36–40. 10.1016/S0959-440X(02)00286-5View ArticlePubMedGoogle Scholar
- Mandell JG, Roberts VA, Pique ME, Kotlovyi V, Mitchell JC, Nelson E, Tsigelny I, Eyck LFT: Protein docking using continuum electrostatic and geometric fit. Protein Eng 2001, 14(2):105–113. 10.1093/protein/14.2.105View ArticlePubMedGoogle Scholar
- Rahaman O, Estrada T, Doren D, Taufer M, Brooks C 3rd, Armen R: Evaluation of several two-step scoring functions based on linear interaction energy, effective ligand size, and empirical pair potentials for prediction of protein-ligand binding geometry and free energy. J Chem Inf Model 2011, 51: 2047–2065. 10.1021/ci1003009PubMed CentralView ArticlePubMedGoogle Scholar
- Ferrara P, Gohlke H, Price D, Klebe G, Brooks C III: Assessing scoring functions for protein-ligand interactions. Journal of medicinal chemistry 2004, 47(12):3032–3047. 10.1021/jm030489hView ArticlePubMedGoogle Scholar
- Pierce B, Tong W, Weng Z: M-ZDOCK: a grid-based approach for Cn symmetric multimer docking. Bioinformatics 2004, 21(8):1472–1478.View ArticlePubMedGoogle Scholar
- Esquivel-Rodríguez J, Yang YD, Kihara D: Multi-LZerD: multiple protein docking for asymmetric complexes. Proteins 2012, 80(7):1818–1833.PubMed CentralPubMedGoogle Scholar
- Inbar Y, Benyamini H, Nussinov R, Wolfson HJ: Combinatorial docking approach for structure prediction of large proteins and multi-molecular assemblies. Phys Biol 2005, 2: S156-S165. 10.1088/1478-3975/2/4/S10View ArticlePubMedGoogle Scholar
- Potluri S, Yan AK, Chou JJ, Donald BR, Bailey-Kellogg C: Structure determination of symmetric homo-oligomers by a complete search of symmetry configuration space, using NMR restraints and van der Waals packing. Proteins 2006, 65: 203–219. 10.1002/prot.21091View ArticlePubMedGoogle Scholar
- Kastritis PL, Bonvin AMJJ: Are Scoring Functions in Protein-Protein Docking Ready To Predict Interactomes? Clues from a Novel Binding Affinity Benchmark. Journal of Proteome Research 2010, 9(5):2216–2225. 10.1021/pr9009854View ArticlePubMedGoogle Scholar
- Andrusier N, Nussinov R, Wolfson HJ: FireDock: fast interaction refinement in molecular docking. Proteins 2007, 69: 139–159. 10.1002/prot.21495View ArticlePubMedGoogle Scholar
- Lyskov S, Gray JJ: The RosettaDock server for local protein-protein docking. Nucleic Acids Res 2008, 36(S2):W233-W238.PubMed CentralView ArticlePubMedGoogle Scholar
- Mashiach E, Nussinov R, Wolfson HJ: FiberDock: Flexible induced-fit backbone refinement in molecular docking. Proteins 2010, 78(6):1503–1519.PubMed CentralPubMedGoogle Scholar
- Lindahl E, Delarue M: Refinement of docked protein-ligand and protein-DNA structures using low frequency normal mode amplitude optimization. Nucleic Acids Research 2005, 33(14):4496–4506. 10.1093/nar/gki730PubMed CentralView ArticlePubMedGoogle Scholar
- May A, Zacharias M: Energy minimization in low-frequency normal modes to efficiently allow for global flexibility during systematic protein-protein docking. Proteins 2008, 70(3):794–809. [http://dx.doi.org/10.1002/prot.21579]View ArticlePubMedGoogle Scholar
- Krol M, Tournier AL, Bates PA: Flexible relaxation of rigid-body docking solutions. Proteins 2007, 68: 159–169. [http://dx.doi.org/10.1002/prot.21391] 10.1002/prot.21391View ArticlePubMedGoogle Scholar
- de Vries SJ, van Dijk ADJ, Krzeminski M, van Dijk M, Thureau A, Hsu V, Wassenaar T, Bonvin AM: HADDOCK versus HADDOCK: New features and performance of HADDOCK2.0 on the CAPRI targets. Proteins 2007, 69(4):726–733. [http://dx.doi.org/10.1002/prot.21723] 10.1002/prot.21723View ArticlePubMedGoogle Scholar
- Chaudhury S, Sircar A, Sivasubramanian A, Berrondo M, Gray JJ: Incorporating biochemical information and backbone flexibility in RosettaDock for CAPRI rounds 6–12. Proteins 2007, 69(4):793–800. [http://dx.doi.org/10.1002/prot.21731] 10.1002/prot.21731View ArticlePubMedGoogle Scholar
- Akbal-Delibas B, Hashmi I, Shehu A, Haspel N: An Evolutionary Conservation Based Method for Refining and Reranking Protein Complex Structures. J Bioinform Comput Biol 2012, 10(3):1242002. 10.1142/S0219720012420024View ArticlePubMedGoogle Scholar
- Hashmi I, Akbal-Delibas B, Haspel N, Shehu A: Guiding protein docking with geometric and evolutionary information. J Bioinform Comput Biol 2012, 10(3):1242008. 10.1142/S0219720012420085View ArticlePubMedGoogle Scholar
- Lichtarge O, Bourne H, Cohen F: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257(2):342–58. 10.1006/jmbi.1996.0167View ArticlePubMedGoogle Scholar
- Mihalek I, Res I, Lichtarge O: A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance. J Mol Biol 2004, 336(5):1265–82. 10.1016/j.jmb.2003.12.078View ArticlePubMedGoogle Scholar
- Akbal-Delibas B, Haspel N: Refining multimeric protein complexes using conservation, electrostatics and probabilistic selection. Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on: 4–7 October 2012 2012, 648–653. 10.1109/BIBMW.2012.6470272View ArticleGoogle Scholar
- Kalé L, Skeel R, Bhandarkar M, Brunner R, Gursoy A, Krawetz N, Phillips J, Shinozaki A, Varadarajan K, Schulten K: NAMD2: Greater scalability for parallel molecular dynamics. J Comp Phys 1999, 151: 283–312. 10.1006/jcph.1999.6201View ArticleGoogle Scholar
- Duan Y, Wu C, Chowdhury S, Lee M, Xiong G, Zhang W, Yang R, Cieplak P, Luo R, Lee T, Caldwell J, Wang J, Kollman P: A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J Comput Chem 2003, 24(16):1999–2012. 10.1002/jcc.10349View ArticlePubMedGoogle Scholar
- Akbal-Delibas B, Hashmi I, Shehu A, Haspel N: Refinement of docked protein complex structures using evolutionary traces. Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on 2011, 400–404. IEEEView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.