Fast dynamics perturbation analysis for prediction of protein functional sites
© Ming et al; licensee BioMed Central Ltd. 2008
Received: 18 October 2007
Accepted: 30 January 2008
Published: 30 January 2008
We present a fast version of the dynamics perturbation analysis (DPA) algorithm to predict functional sites in protein structures. The original DPA algorithm finds regions in proteins where interactions cause a large change in the protein conformational distribution, as measured using the relative entropy $D_x$. Such regions are associated with functional sites.
The Fast DPA algorithm, which accelerates DPA calculations, is motivated by an empirical observation that $D_x$ in a normal-modes model is highly correlated with an entropic term that depends only on the eigenvalues of the normal modes. The eigenvalues are accurately estimated using first-order perturbation theory, resulting in an N-fold reduction in the overall computational requirements of the algorithm, where N is the number of residues in the protein. The performance of the original and Fast DPA algorithms was compared using protein structures from a standard small-molecule docking test set. For nominal implementations of each algorithm, top-ranked Fast DPA predictions overlapped the true binding site 94% of the time, compared to 87% of the time for original DPA. In addition, per-protein recall statistics (fraction of binding-site residues that are among predicted residues) were slightly better for Fast DPA. On the other hand, per-protein precision statistics (fraction of predicted residues that are among binding-site residues) were slightly better using original DPA. Overall, the performance of Fast DPA in predicting ligand-binding-site residues was comparable to that of the original DPA algorithm.
Compared to the original DPA algorithm, the decreased run time with comparable performance makes Fast DPA well-suited for implementation on a web server and for high-throughput analysis.
Prediction of protein functional sites is a key aspect of protein function prediction, and it can be an important step in identifying small-molecule interactions for drug discovery. It can also potentially be used as a pre-processing step to reduce the search space in computational docking algorithms. There are many methods to predict functional sites; here we emphasize those that make use of analysis of protein structure and dynamics. Existing protein structure analysis methods are based on diverse principles, including: association of functional sites with surface clefts that have extreme values of volume [3–6] or other shape descriptors [7–11]; identification of spatial clusters of methyl probes that exhibit energetically favorable interactions with the protein; association of functional sites with charged surface residues either in unfavorable electrostatic environments or with anomalous predicted pH titration curves; identification of spatial clusters of residues whose diversity appears to be correlated with changes in protein function [15, 16]; definition of structural features (e.g., motifs) associated with functional sites [17–22]; identification of residues that are on average close to other residues in the protein (closeness centrality) [23–25]; and machine-learning prediction of functional sites/residues using sequence, structure, and chemical features from training sets [26–28]. Principles of methods that consider protein dynamics include association of functional sites with: hinge regions [29, 30]; regions where the harmonic vibrations are largely determined by high-frequency modes; intrinsically disordered regions that are highly mobile in the absence of a molecular interaction partner; and residues where mutations cause a large change in the coupling of local perturbations to remote, local changes in the distribution of folded vs. unfolded states of the protein. Information from complementary methods may be integrated for functional site prediction [34, 35].
We recently developed an additional approach to prediction of protein functional sites that is based on analysis of protein dynamics [36–39]. To help motivate the approach, we note that cellular functions are regulated by molecular interactions that alter protein activity. To enable such control, protein activity, and therefore protein conformational distributions, must be susceptible to alteration by molecular interactions at functional sites. In other words, protein activity should be controllable by allosteric effects (allostery).
Weber recognized the importance of considering changes in the full conformational distribution to understand allostery, as opposed to considering mechanistic changes among discrete, well-defined structural states in earlier models due to Monod, Wyman, and Changeux; and Koshland, Nemethy, and Filmer. Weber's perspective is well-aligned with more recent emphases on the need to consider allostery from a global thermodynamic/statistical perspective [43, 44, 36–39, 33, 45]. It is also well-aligned with modern rate theories based on the control of protein activity by dynamical transitions among conformational substates, as originally suggested by spectroscopic assays of ligand binding at low temperature [47, 48].
Given the above considerations, we hypothesized that protein functional sites might tend to evolve at control points where interactions cause a large change in the protein conformational distribution. To test this hypothesis, we developed a method called dynamics perturbation analysis (DPA) to quantify changes in protein conformational distributions due to molecular interactions [36, 37], examined 305 protein structures from the GOLD docking test set, and found that interactions at small-molecule binding sites cause a relatively large change in protein vibrations.
Motivated by these results, we developed a DPA-based algorithm that successfully predicts small-molecule binding sites at locations where interactions cause a large change in protein vibrations. This method was evaluated in Ref.  using 305 proteins in the GOLD docking test set of protein-ligand structures. For the test, only the top-ranked functional site was selected and used to predict the location of the ligand-binding site. This is a relatively strict requirement; for other published methods for predicting functional sites, performance is often evaluated by allowing any of several predicted functional sites to overlap a known ligand-binding site. The method produced at least one predicted functional site for 287 of the 305 proteins in the test set. In 87% of cases (250 proteins), at least one predicted residue was in the ligand-binding site. The recall of binding-site residues (percentage of binding-site residues found among the predicted residues) was at least 30% for 80% of the cases and at least 50% for 76% of the cases. The precision of the predicted residues (percentage of predicted residues found among the binding-site residues) was at least 30% for 68% of the cases and at least 50% for 44% of the cases. The statistical significance of the overlaps was assessed using a null model in which surface residues were randomly selected. Using the null model, a P-value was calculated to evaluate predictions for the 250 proteins in which at least one predicted residue was in the ligand-binding site. The P-value estimated the probability of obtaining a precision at least as high as the observed precision by randomly selecting surface residues. For 87% of the cases, the P-value was $10^{-3}$ or smaller, indicating a statistically significant overlap. The performance of the DPA method compared favorably to that of a cleft analysis method for predicting ligand-binding residues.
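The null-model P-value can be illustrated with a short sketch. It assumes (an assumption, not a detail stated in the text) that the random predictor draws as many residues as the prediction contains, uniformly and without replacement from the protein's surface residues; under that model, the probability of a precision at least as high as the observed one is a hypergeometric tail probability. The helper name `precision_p_value` is hypothetical, and the original analysis may have estimated the P-value differently, for example by simulation.

```python
from math import comb


def precision_p_value(n_surface: int, n_site_surface: int,
                      n_predicted: int, n_hits: int) -> float:
    """Hypothetical null-model P-value (assumed hypergeometric model).

    n_surface:       number of surface residues in the protein
    n_site_surface:  how many of those are binding-site residues
    n_predicted:     number of residues in the prediction
    n_hits:          predicted residues that fall in the binding site

    Returns P(at least n_hits binding-site residues among n_predicted
    surface residues drawn at random without replacement), i.e. the chance
    of a precision >= n_hits / n_predicted under the null model.
    """
    total = comb(n_surface, n_predicted)
    tail = sum(
        comb(n_site_surface, k) * comb(n_surface - n_site_surface, n_predicted - k)
        for k in range(n_hits, min(n_predicted, n_site_surface) + 1)
    )
    return tail / total


# Both predicted residues hit a 2-residue site among 4 surface residues.
p = precision_p_value(4, 2, 2, 2)  # 1/6
```

Summing the exact tail with integer binomial coefficients keeps the sketch dependency-free; for large proteins a statistics library's hypergeometric survival function would give the same number.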
The original DPA algorithm is a highly innovative approach that performs well. However, the computational requirements limit the utility of the original method. For example, it takes about an hour to analyze a 150-residue protein domain using DPA, and the method doesn't scale well to larger systems. Here, we report an improved algorithm based on use of first-order perturbation theory that will facilitate the use of DPA in high-throughput scenarios and increase its utility, e.g., for web server applications. The algorithm, called Fast DPA, enables a dramatic decrease in the time required to predict protein functional sites, with performance that is comparable to the original DPA algorithm.
Dynamics perturbation analysis
In the present case (unlike in other useful biological applications [52–56]), the relative entropy is not just an ad hoc measure; rather, it has real biophysical significance [39, 57]: $k_B T D_x$, where $T$ is the temperature and $k_B$ is Boltzmann's constant, is the free energy required to change the protein conformational distribution from an equilibrium distribution $P(x)$ to a non-equilibrium distribution $P^{(m)}(x)$.
The first six modes involve zero eigenvalues and are ignored in the sums. Equation (5) is the central equation that enables DPA.
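Under the harmonic (normal-modes) approximation, the unperturbed and perturbed distributions are both Gaussian, which is what allows $D_x$ to be evaluated in closed form. The sketch below uses the standard relative entropy between two zero-mean Gaussians with covariances $k_B T H^{-1}$ and $k_B T (H^{(m)})^{-1}$, namely $D_x = \tfrac{1}{2}\left[\operatorname{tr}\big((H^{(m)})^{-1} H\big) - n + \ln\det H^{(m)} - \ln\det H\right]$, taking $D_x = \int P^{(m)} \ln(P^{(m)}/P)\,dx$ in line with the free-energy interpretation above. It is a generic illustration on small positive-definite matrices (zero modes assumed already removed), not a transcription of Eq. (5), which is written in terms of sums over the normal modes; the function name is hypothetical.

```python
import numpy as np


def gaussian_relative_entropy(H: np.ndarray, H_m: np.ndarray) -> float:
    """D(P_m || P) for zero-mean Gaussians with covariances k_B T * inv(H_m)
    and k_B T * inv(H).

    Assumes H and H_m are symmetric positive definite, i.e. the six
    zero-eigenvalue rigid-body modes have already been projected out.
    The factor k_B T cancels, so it never appears.
    """
    n = H.shape[0]
    # tr(inv(H_m) @ H) via a linear solve rather than an explicit inverse
    trace_term = np.trace(np.linalg.solve(H_m, H))
    _, logdet_H = np.linalg.slogdet(H)
    _, logdet_Hm = np.linalg.slogdet(H_m)
    return 0.5 * float(trace_term - n + logdet_Hm - logdet_H)


# Toy 3-mode "protein": a perturbation adds a rank-1 spring along u.
H = np.diag([1.0, 2.0, 3.0])
u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
H_m = H + 0.5 * np.outer(u, u)
d = gaussian_relative_entropy(H, H_m)  # relative entropy caused by the added spring
```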
First, DPA is performed on a protein, and the distribution of DPA values (one per surface test point) is modeled using Eq. (6). Points whose values lie above the 96th percentile of the modeled distribution are selected and spatially clustered. The clusters are ranked according to the mean DPA value within each cluster, and every cluster is considered to be potentially associated with a functional site. Finally, residues in the neighborhood of the clusters are selected and form the basis for functional site predictions.
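The selection and ranking steps described above can be sketched in a few lines. As a simplification, this sketch takes the 0.96 quantile of the observed scores directly instead of fitting the model distribution of Eq. (6), and it takes the clusters as given. Both function names are hypothetical.

```python
from statistics import mean


def select_high_points(scores: list[float], q: float = 0.96) -> list[int]:
    """Indices of test points whose score is at or above the empirical
    q-quantile of all scores (a stand-in for the fitted distribution)."""
    ordered = sorted(scores)
    cutoff = ordered[min(int(q * len(ordered)), len(ordered) - 1)]
    return [i for i, s in enumerate(scores) if s >= cutoff]


def rank_clusters(clusters: list[list[int]], scores: list[float]) -> list[list[int]]:
    """Order clusters (lists of point indices) by mean score, highest first."""
    return sorted(clusters, key=lambda c: mean(scores[i] for i in c), reverse=True)
```

Only the first cluster returned by `rank_clusters` would correspond to the top-ranked prediction used in the evaluation.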
Fast dynamics perturbation analysis
where $\lambda_i$ is the $i$th eigenvalue of H.
The Fast DPA algorithm is the same as the original DPA algorithm, except that instead of using values of $D_x$, the analysis is based on values of the eigenvalue-dependent entropic term, with the perturbed eigenvalues estimated using first-order perturbation theory. (It is possible to evaluate all terms in Eq. (5) using first-order perturbation theory, but doing so would not accelerate the method, because the computational cost would be comparable to that of solving the full eigenvalue problem in original DPA.)
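The eigenvalue estimate at the heart of Fast DPA can be sketched as follows. For $H^{(m)} = H + \Delta H$ with non-degenerate eigenvalues and unit eigenvectors $v_i$ of H, first-order perturbation theory gives $\lambda_i^{(m)} \approx \lambda_i + v_i^{\mathsf T}\,\Delta H\,v_i$, so H is diagonalized once per protein and each test point needs only these diagonal matrix elements. Because the entropic term's exact expression is not reproduced in this text, `entropic_term` assumes the classical harmonic form $\tfrac{1}{2}\sum_i \ln(\lambda_i^{(m)}/\lambda_i)$ over the nonzero modes; treat that form, and both function names, as illustrative assumptions.

```python
import numpy as np


def first_order_eigenvalues(eigvals: np.ndarray, eigvecs: np.ndarray,
                            dH: np.ndarray) -> np.ndarray:
    """Estimate eigenvalues of H + dH as lambda_i + v_i^T dH v_i.

    eigvals, eigvecs: output of np.linalg.eigh(H) (unit eigenvectors in
    columns), computed once per protein. dH: Hessian change caused by one
    perturbation. Valid for small dH and non-degenerate eigenvalues.
    """
    # Diagonal of V^T dH V, without forming the full product
    return eigvals + np.einsum("ji,jk,ki->i", eigvecs, dH, eigvecs)


def entropic_term(eigvals: np.ndarray, perturbed: np.ndarray, n_zero: int = 6) -> float:
    """Assumed harmonic form 0.5 * sum_i ln(lambda_i^(m) / lambda_i),
    skipping the n_zero lowest (rigid-body, zero-eigenvalue) modes."""
    lam = np.asarray(eigvals)[n_zero:]
    lam_m = np.asarray(perturbed)[n_zero:]
    return 0.5 * float(np.sum(np.log(lam_m / lam)))


# Toy positive-definite "Hessian" with no zero modes, so n_zero=0 below.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
H = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) @ Q.T
A = rng.normal(size=(6, 6))
dH = 1e-5 * (A + A.T)                    # small symmetric perturbation
lam, V = np.linalg.eigh(H)               # the single diagonalization
estimate = first_order_eigenvalues(lam, V, dH)
exact = np.linalg.eigvalsh(H + dH)       # what a full re-diagonalization gives
s = entropic_term(lam, estimate, n_zero=0)
```

Comparing `estimate` with `exact` is a quick check that the perturbation is small enough for the first-order approximation to hold.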
Implementation of Fast DPA
Our implementation of DPA and Fast DPA here follows our previous implementation of DPA for functional site prediction. Given an input PDB structure, MSMS was run with a 1.5 Å probe radius and a triangulation density of 1 vertex per Å² to generate test points on the surface of the protein. As when using original DPA to predict functional sites, perturbations were calculated using every other point in the MSMS output (we also tried using every point, but this led to decreased performance in the precision measures). The cutoff $r_c$ for interactions between protein Cα atoms was 8.5 Å. For some proteins, this cutoff yielded more than six zero-frequency modes, indicating that the network of springs was too sparse (for example, if only one spring connects two domains, then free rotations about the spring yield two additional zero-frequency modes). In these cases, the connectivity of the elastic network model was increased by incrementing $r_c$ in 1 Å steps until the additional zero-frequency modes were eliminated. The cutoff $r_s$ for interactions between a test point and the protein was 14 Å, and the interaction strength between a test point and protein atoms was $\gamma_s = 12\gamma$, or 12 times the strength of the interaction between two protein atoms. Results are independent of the value of γ.
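The network-construction step above can be sketched as follows, assuming an anisotropic elastic network (one Hookean spring of strength γ between every pair of Cα atoms within $r_c$; γ = 1 here, since results do not depend on it). The function names, the 1e-6 tolerance used to call an eigenvalue zero, and the 30 Å upper limit on $r_c$ are assumptions added for illustration; the loop follows the text by raising $r_c$ in 1 Å steps until only the six rigid-body zero modes remain. Test-point springs ($r_s$, $\gamma_s$) are not modeled here.

```python
import numpy as np


def anm_hessian(coords: np.ndarray, cutoff: float, gamma: float = 1.0) -> np.ndarray:
    """3N x 3N Hessian of an elastic network on C-alpha coordinates (N x 3):
    a spring of strength gamma joins every pair within `cutoff` Angstrom."""
    n = len(coords)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            H[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            H[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            H[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block  # diagonal blocks balance rows
            H[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return H


def count_zero_modes(H: np.ndarray, tol: float = 1e-6) -> int:
    """Number of eigenvalues whose magnitude is below `tol` (assumed threshold)."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(H)) < tol))


def connected_hessian(coords: np.ndarray, r_c: float = 8.5, step: float = 1.0,
                      max_cutoff: float = 30.0) -> tuple[np.ndarray, float]:
    """Raise r_c in `step` increments until only six zero modes remain."""
    while r_c <= max_cutoff:
        H = anm_hessian(coords, r_c)
        if count_zero_modes(H) <= 6:
            return H, r_c
        r_c += step
    raise ValueError("network is still under-connected at max_cutoff")
```

For example, `connected_hessian(ca_coords)` with `ca_coords` an N × 3 array returns the Hessian together with the cutoff actually used.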
Implementation of functional site prediction using DPA
To predict functional sites, the distribution of y = … values was fit using Eq. (6). Points with values above the 96th percentile of the fitted distribution were selected and spatially clustered using the OPTICS algorithm with a distance threshold of 6 Å and a minimum of 3 points per cluster. Cα atoms within 6 Å of any point in a cluster were selected and used to define predicted functional sites. The sites were ranked according to the mean value of … within the corresponding cluster of points. Only the top-ranked predicted site was used for the evaluation of performance described below.
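A plain-Python sketch of the clustering and residue-selection step follows. It swaps in a DBSCAN-style density clustering for OPTICS, reusing the two stated parameters (6 Å neighborhood, at least 3 points); extracting clusters from an OPTICS ordering at a single distance threshold gives a closely related result, but the exact OPTICS settings of the original implementation are not reproduced. Function names are hypothetical, and points and Cα atoms are plain (x, y, z) tuples in Å.

```python
from math import dist

Point = tuple[float, float, float]


def density_clusters(points: list[Point], eps: float = 6.0, min_pts: int = 3) -> list[list[int]]:
    """DBSCAN-style clustering (stand-in for OPTICS). A point is a core point
    if at least `min_pts` points, itself included, lie within `eps` of it.
    Returns clusters as sorted lists of point indices; noise is dropped."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps] for i in range(n)]
    labels = [None] * n
    clusters: list[list[int]] = []
    for i in range(n):
        if labels[i] is not None or len(neighbors[i]) < min_pts:
            continue  # already assigned, or not a core point
        cid = len(clusters)
        labels[i] = cid
        members, stack = [], [i]
        while stack:
            p = stack.pop()
            members.append(p)
            if len(neighbors[p]) < min_pts:
                continue  # border point: joins the cluster but does not expand it
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cid
                    stack.append(q)
        clusters.append(sorted(members))
    return clusters


def residues_near(ca_coords: list[Point], cluster_points: list[Point],
                  cutoff: float = 6.0) -> list[int]:
    """Indices of C-alpha atoms within `cutoff` Angstrom of any cluster point."""
    return [i for i, ca in enumerate(ca_coords)
            if any(dist(ca, p) <= cutoff for p in cluster_points)]
```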
Results and Discussion
Results that motivate Fast DPA
Evaluation of Fast DPA for prediction of functional sites
Because $D_x$ calculated using original DPA and the entropic term calculated using Fast DPA are highly correlated (Fig. 5), we expected the performance of Fast DPA in predicting functional-site residues to be comparable to that of original DPA. We analyzed the performance of the algorithm on the 305-protein GOLD test set, which was used to evaluate the original DPA algorithm. Each prediction has an associated recall (fraction of binding-site residues that are among those in the rank-1 prediction) and precision (fraction of rank-1 predicted residues that are among those in the binding site). To evaluate performance statistically, we use (1) the fraction of binding sites for which the recall is greater than or equal to a minimum value, and (2) the fraction of rank-1 predictions for which the precision is greater than or equal to a minimum value.
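The two summary statistics can be written out as a short sketch. Residues are represented as hashable identifiers (for example, residue numbers). Proteins for which no site was predicted are counted with zero recall and excluded from the precision statistic; this is a reading of the two denominators described above rather than a detail stated in the text, and all function names are hypothetical.

```python
def recall_precision(predicted: set, binding_site: set) -> tuple[float, float]:
    """Recall = |P ∩ B| / |B|; precision = |P ∩ B| / |P| (0.0 for an empty set)."""
    hits = len(predicted & binding_site)
    recall = hits / len(binding_site) if binding_site else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


def fraction_at_least(values: list[float], threshold: float) -> float:
    """Fraction of values that are >= threshold."""
    return sum(v >= threshold for v in values) / len(values)


def summarize(cases: list[tuple[set, set]], threshold: float) -> tuple[float, float]:
    """cases: (rank-1 predicted residues, binding-site residues) per protein.
    The recall statistic counts every binding site; the precision statistic
    counts only proteins that received a non-empty prediction."""
    recalls = [recall_precision(p, b)[0] for p, b in cases]
    precisions = [recall_precision(p, b)[1] for p, b in cases if p]
    return fraction_at_least(recalls, threshold), fraction_at_least(precisions, threshold)
```

Calling `summarize(cases, 0.3)` and `summarize(cases, 0.5)` yields the four quantities tabulated for each algorithm.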
[Table: Performance statistics for Fast DPA and original DPA using a threshold of 0.96. Columns: Recall ≥ 0.3; Precision ≥ 0.3; Recall ≥ 0.5; Precision ≥ 0.5.]
Use of Fast DPA enables functional site predictions to be performed N-fold faster than original DPA, with comparable performance in predicting residues in functional sites. The acceleration will facilitate optimization of Fast DPA for functional site predictions. Calculations that once took hours using DPA now may be performed in a matter of minutes, making practical the use of DPA via a web server. Indeed, high-throughput analysis using Fast DPA has already produced over 60,000 predicted functional sites for about 50,000 protein domains in the SCOP database (J.D. Cohn, D. Ming, and M.E. Wall, in preparation). These predictions will provide a rich source of information for developing hypotheses concerning mechanisms of protein function.
Supported by the US Department of Energy through contract DE-AC52-06NA25396. We thank James Faeder for reading the manuscript.
- Ofran Y, Punta M, Schneider R, Rost B: Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discov Today 2005, 10(21):1475–1482.
- Campbell SJ, Gold ND, Jackson RM, Westhead DR: Ligand binding: functional site location, similarity and docking. Curr Opin Struct Biol 2003, 13(3):389–395.
- Dundas J, Ouyang Z, Tseng J, Binkowski A, Turpaz Y, Liang J: CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res 2006, 34(Web Server issue):W116–8.
- Hendlich M, Rippmann F, Barnickel G: LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model 1997, 15(6):359–63, 389.
- Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM: A method for localizing ligand binding pockets in protein structures. Proteins 2006, 62(2):479–488.
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph 1995, 13(5):323–30, 307–8.
- Coleman RG, Burr MA, Souvaine DL, Cheng AC: An intuitive approach to measuring protein surface curvature. Proteins 2005, 61(4):1068–1074.
- Coleman RG, Sharp KA: Travel depth, a new shape descriptor for macromolecules: application to ligand binding. J Mol Biol 2006, 362(3):441–458.
- Hendrix DK, Kuntz ID: Surface solid angle-based site points for molecular docking. Pac Symp Biocomput 1998, 317–326.
- Nayal M, Honig B: On the nature of cavities on protein surfaces: application to the identification of drug-binding sites. Proteins 2006, 63(4):892–906.
- Xie L, Bourne PE: A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics 2007, 8 Suppl 4: S9.
- Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21(9):1908–1916.
- Elcock AH: Prediction of functionally important residues based solely on the computed energetics of protein structure. J Mol Biol 2001, 312(4):885–896.
- Ondrechen MJ, Clifton JG, Ringe D: THEMATICS: a simple computational predictor of enzyme function from structure. Proc Natl Acad Sci U S A 2001, 98(22):12473–12478.
- Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257(2):342–358.
- Yao H, Mihalek I, Lichtarge O: Rank information: a structure-independent measure of evolutionary trace quality that improves identification of protein functional sites. Proteins 2006, 65(1):111–123.
- Wallace AC, Borkakoti N, Thornton JM: TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci 1997, 6(11):2308–2323.
- Shulman-Peleg A, Nussinov R, Wolfson HJ: Recognition of functional sites in protein structures. J Mol Biol 2004, 339(3):607–633.
- Stark A, Russell RB: Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Res 2003, 31(13):3341–3344.
- Stark A, Shkumatov A, Russell RB: Finding functional sites in structural genomics proteins. Structure 2004, 12(8):1405–1412.
- Liang MP, Brutlag DL, Altman RB: Automated construction of structural motifs for predicting functional sites on protein structures. Pac Symp Biocomput 2003, 204–215.
- Barker JA, Thornton JM: An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis. Bioinformatics 2003, 19(13):1644–1649.
- Amitai G, Shemesh A, Sitbon E, Shklar M, Netanely D, Venger I, Pietrokovski S: Network analysis of protein structures identifies functional residues. J Mol Biol 2004, 344(4):1135–1146.
- Thibert B, Bredesen DE, del Rio G: Improved prediction of critical residues for protein function based on network and phylogenetic analyses. BMC Bioinformatics 2005, 6: 213.
- Chea E, Livesay DR: How accurate and statistically robust are catalytic site predictions based on closeness centrality? BMC Bioinformatics 2007, 8: 153.
- Ofran Y, Rost B: ISIS: interaction sites identified from sequence. Bioinformatics 2007, 23(2):e13–6.
- Gutteridge A, Bartlett GJ, Thornton JM: Using a neural network and spatial clustering to predict the location of active sites in enzymes. J Mol Biol 2003, 330(4):719–734.
- Wei L, Altman RB: Recognizing complex, asymmetric functional sites in protein structures using a Bayesian scoring function. J Bioinform Comput Biol 2003, 1(1):119–138.
- Ma B, Wolfson HJ, Nussinov R: Protein functional epitopes: hot spots, dynamics and combinatorial libraries. Curr Opin Struct Biol 2001, 11(3):364–369.
- Yang LW, Bahar I: Coupling between catalytic site and collective dynamics: a requirement for mechanochemical activity of enzymes. Structure 2005, 13(6):893–904.
- Haliloglu T, Keskin O, Ma B, Nussinov R: How similar are protein folding and protein binding nuclei? Examination of vibrational motions of energy hot spots and conserved residues. Biophys J 2005, 88(3):1552–1559.
- Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK: Intrinsic disorder and functional proteomics. Biophys J 2007, 92(5):1439–1456.
- Liu T, Whitten ST, Hilser VJ: Functional residues serve a dominant role in mediating the cooperativity of the protein ensemble. Proc Natl Acad Sci U S A 2007, 104(11):4347–4352.
- Rossi A, Marti-Renom MA, Sali A: Localization of binding sites in protein structures by optimization of a composite scoring function. Protein Sci 2006, 15(10):2366–2380.
- Petrova NV, Wu CH: Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties. BMC Bioinformatics 2006, 7: 312.
- Ming D, Wall ME: Quantifying allosteric effects in proteins. Proteins 2005, 59(4):697–707.
- Ming D, Wall ME: Allostery in a coarse-grained model of protein dynamics. Phys Rev Lett 2005, 95: 198103.
- Ming D, Wall ME: Interactions in native binding sites cause a large change in protein dynamics. J Mol Biol 2006, 358: 213–223.
- Wall ME: Ligand binding, protein fluctuations, and allosteric free energy. AIP Conf Proc 2006, 851: 16–33.
- Weber G: Ligand binding and internal equilibria in proteins. Biochemistry 1972, 11(5):864–878.
- Monod J, Wyman J, Changeux JP: On the nature of allosteric transitions: a plausible model. J Mol Biol 1965, 12: 88–118.
- Koshland DE Jr., Nemethy G, Filmer D: Comparison of experimental binding data and theoretical models in proteins containing subunits. Biochemistry 1966, 5(1):365–385.
- Pan H, Lee JC, Hilser VJ: Binding sites in Escherichia coli dihydrofolate reductase communicate by modulating the conformational ensemble. Proc Natl Acad Sci U S A 2000, 97(22):12020–12025.
- Gunasekaran K, Ma B, Nussinov R: Is allostery an intrinsic property of all dynamic proteins? Proteins 2004, 57: 433–443.
- Hilser VJ, Thompson EB: Intrinsic disorder as a mechanism to optimize allosteric coupling in proteins. Proc Natl Acad Sci U S A 2007, 104(20):8311–8315.
- Frauenfelder H, Wolynes PG: Rate theories and puzzles of hemeprotein kinetics. Science 1985, 229(4711):337–345.
- Austin RH, Beeson K, Eisenstein L, Frauenfelder H, Gunsalus IC, Marshall VP: Dynamics of carbon monoxide binding by heme proteins. Science 1973, 181(99):541–543.
- Austin RH, Beeson KW, Eisenstein L, Frauenfelder H, Gunsalus IC: Dynamics of ligand binding to myoglobin. Biochemistry 1975, 14(24):5355–5373.
- Jones G, Willett P, Glen RC, Leach AR, Taylor R: Development and validation of a genetic algorithm for flexible docking. J Mol Biol 1997, 267(3):727–748.
- Harata K, Muraki M: X-ray structure of turkey-egg lysozyme complex with tri-N-acetylchitotriose. Lack of binding ability at subsite A. Acta Crystallogr D Biol Crystallogr 1997, 53(Pt 6):650–657.
- Kullback S, Leibler RA: On information and sufficiency. Annals of Math Stats 1951, 22: 79–86.
- del Sol Mesa A, Pazos F, Valencia A: Automatic methods for predicting functionally important residues. J Mol Biol 2003, 326(4):1289–1302.
- Liu X, Zhang LM, Guan S, Zheng WM: Distances and classification of amino acids for different protein secondary structures. Phys Rev E Stat Nonlin Soft Matter Phys 2003, 67(5 Pt 1):51927.
- Igarashi Y, Aoki KF, Mamitsuka H, Kuma K, Kanehisa M: The evolutionary repertoires of the eukaryotic-type ABC transporters in terms of the phylogeny of ATP-binding domains in eukaryotes and prokaryotes. Mol Biol Evol 2004, 21(11):2149–2160.
- Bhasi K, Zhang L, Brazeau D, Zhang A, Ramanathan M: Information-theoretic identification of predictive SNPs and supervised visualization of genome-wide association studies. Nucleic Acids Res 2006, 34(14):e101.
- Sterner B, Singh R, Berger B: Predicting and Annotating Catalytic Residues: An Information Theoretic Approach. J Comput Biol 2007, 14: 1058–1073.
- Qian H: Relative entropy: free energy associated with equilibrium fluctuations and nonequilibrium deviations. Phys Rev E 2001, 63(4 Pt 1):42103.
- Tirion MM: Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Physical Review Letters 1996, 77(9):1905–1908.
- Bahar I, Atilgan AR, Erman B: Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Des 1997, 2(3):173–181.
- Hinsen K: Analysis of domain motions by approximate normal mode calculations. Proteins 1998, 33(3):417–429.
- Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I: Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J 2001, 80(1):505–515.
- Sanner MF, Olson AJ, Spehner JC: Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 1996, 38(3):305–320.
- Ankerst M, Breunig MM, Kriegel HP, Sander J: OPTICS: ordering points to identify the clustering structure. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Volume 28. Philadelphia, PA; 1999:49–60.
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247(4):536–540.
- Varughese KI, Su Y, Cromwell D, Hasnain S, Xuong NH: Crystal structure of an actinidin-E-64 complex. Biochemistry 1992, 31(22):5172–5176.
- Holt DA, Luengo JI, Yamashita DS, Oh HJ, Konialian AL, Yen HK, Rozamus LW, Brandt M, Bossard MJ, Levy MA, Eggleston DS, Liang J, Schultz LW, Stout TJ, Clardy J: Design, synthesis, and kinetic evaluation of high-affinity FKBP ligands and the X-ray crystal-structures of their complexes with FKBP12. J Am Chem Soc 1993, 115: 9925–9938.
- Weber PC, Ohlendorf DH, Wendoloski JJ, Salemme FR: Structural origins of high-affinity biotin binding to streptavidin. Science 1989, 243(4887):85–88.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.