Prediction of functionally important residues in globular proteins from unusual central distances of amino acids
© Kochańczyk; licensee BioMed Central Ltd. 2011
Received: 22 May 2011
Accepted: 18 September 2011
Published: 18 September 2011
Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Besides constructing a better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues.
Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in an information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested the capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites, achieving success rates comparable to those of several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi.
Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. Such residues turn out to be often directly involved in binding ligands or interfacing with other proteins.
The task of assigning a function to each new protein structure resulting from high-throughput structural genomics experiments requires reliable computational annotation methods. Identified functionally important amino acids can provide preliminary clues about the co-evolution and molecular workings of proteins. Such information is crucial for site-directed mutational engineering and de novo protein design. The integration of knowledge of the locations of binding sites with ligand screening or docking protocols improves the initial stages of rational drug design. Also, when putative residues responsible for complex formation are identified, protein-protein interaction interfaces can be characterized in silico.
Currently, due to the availability of 3D data, the exploration of properties embedded in the structure of proteins prevails over the traditional motif recognition and sequence comparison (which may turn out to be surprisingly ambiguous). For close homologs, the knowledge-based approaches transfer functional annotations from proteins with already known structure and function [4–8]. Their average effectiveness is inherently limited by the availability of solved and annotated structures, so more generic methods are still desirable. Numerous purely geometry-based methods search locally for clefts and pockets in the molecular surface by employing computational geometry algorithms [9–16]. The spatial neighborhood of residues is used to characterize local environments in methods that take into account additional factors such as the flexibility of residues, electrostatic potential [18, 19] or overall interaction energy, excess or deficiency of the hydrophobicity, hydrophobic potential around a protein or a multitude of other, predominantly physicochemical, residue properties [23–27].
Interestingly, indications based on diverse descriptions are usually not correlated; nor can they be used for the prediction of both protein-ligand and protein-protein interaction sites. As a consequence, well-performing present-day approaches use combinations of complementary characteristics, for example the electrostatics and geometric properties or the geometry and conservation [31–33]. Metaservers offer combinations of several independent fully-fledged methods in order to compensate for the shortcomings of some methods with the capabilities of others [34, 35]. As combinations of distinct binding site prediction methods achieve better success rates than the constituent techniques applied alone, it is still valuable not only to provide fine-tuned variations of heterogeneous approaches, but also to search for assorted methods that could complement existing ones by the exploration of specific orthogonal features.
Contrary to the majority of approaches that characterize fragments of proteins locally and with a considerable degree of detail, Brylinski et al. [21, 36] showed that a rough analysis of the global spatial distribution of amino acids with respect to their hydrophobicity is capable of localizing ligation sites. They did not follow usual hydrophobicity quantifications such as the average solvent-accessible surface area or number of contacts, but rather measured the discrepancy between idealized and observed hydrophobicity within the fuzzy oil drop model, where the trivariate Gaussian distribution is used to express the idealized protein hydrophobicity (maximum value in the protein core, smoothly approaching 0 about and beyond the perimeter). It turned out that amino acids of high discrepancy (unexpectedly high hydrophobicity in relation to their peripheral position) often occur in function-related areas of proteins.
This observation is fundamental to the current work, where we devised and validated a method for the identification of function-related residues based on the probabilistic description of atomic burials originating from the conceptual framework of Gomes et al. We collected necessary statistics from a selection of globular proteins and, as opposed to the original application of the framework, we used a radial probability density function to describe preferred central distances of individual atoms of types defined within amino acids. In this view, proteins are treated as mixtures of amino acids where restraints resulting from their covalent connectivity are ignored (except for cysteines). Any deviations from the spherical shape of the macromolecule, intrinsic rigidity imposed by the presence of secondary structures and local interactions are neglected: proteins are treated as compact solid-like bodies of atoms, where the isotropic hydrophobic segregation and packing are considered to be the dominant driving forces conferring spatial organization of residues [40–42].
The classic analysis of just several protein structures suggested that the orientational preferences of side chains alone can be a criterion for the hydrophobic or hydrophilic character. Therefore, although a multitude of hydrophobicity scales or burial indices are available for (whole) amino acids and many knowledge-based pair-potentials are constructed for (united) residue side chains, we decided to act on a per-atom rather than per-residue basis in order to account for (radial) orientational preferences of residues. The actual amino acid composition of a protein influences its native structure topology [45, 46], folding type [47, 48] and interactions. In our statistical model, for a protein with a known amino acid abundance we assume that the relative probabilities are directly proportional to the stoichiometry. In our approach to function prediction, every heavy atom in every amino acid of the protein considered has the measure of its unexpectedness estimated with respect to all possible atom types in a given point of space. The measure depends solely on the distance from the geometric center of the polymer. Typically, residues that place their atoms at the least probable central distances appear to contribute to the creation of ligand binding sites (including active sites of enzymes) or protein-protein binding interfaces.
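The idea of composition-weighted unexpectedness can be illustrated with a minimal sketch. It assumes that the measure is the self-information (in bits) of the relative probability of finding an atom of a given type at a given normalized central distance, with every type weighted by its count in the protein. The Gaussian density, the function names and the dictionary layout are hypothetical stand-ins chosen for illustration; they are not the fitted distribution functions of this work.

```python
import math

def gaussian_density(mu, sigma):
    """Hypothetical stand-in for a fitted radial density of one atom type."""
    def density(R):
        z = (R - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density

def unexpectedness(atom_type, R, densities, counts):
    """Self-information (bits) of seeing `atom_type` at normalized distance R.

    densities: dict atom type -> callable radial density (assumed form)
    counts:    dict atom type -> number of such atoms in the protein
    """
    # Mixture of all atom types, weighted by their stoichiometry
    total = sum(counts[t] * densities[t](R) for t in counts)
    if total == 0.0:
        raise ValueError("no atom type has non-zero density at this distance")
    p = counts[atom_type] * densities[atom_type](R) / total
    return math.inf if p == 0.0 else -math.log2(p)
```

With two atom types of equal abundance and identical densities, each type has a relative probability of one half at every distance, so the sketch returns 1 bit; a type that is rare at a given distance receives a higher value.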
Extraction of a non-redundant set of globular proteins
We examined a total of 172 265 protein chains as deposited in RCSB PDB in January 2011 and excluded structures that were highly asymmetric or otherwise irregular. Two geometric descriptors were used discriminatively: asphericity, calculated as the normalized sum of squared differences of the eigenvalues of the gyration tensor, was required to be smaller than 0.1 and compactness to be at least 0.5; the latter value was calculated as the ratio of the solvent accessible surface area of the (ideal) sphere of the volume of a considered protein to its actual solvent accessible surface area (this is a more intuitive inverse of the fraction introduced by Galzitskaya et al.). Chains of sequence lengths smaller than 100 amino acids were excluded due to strong geometric constraints. Proteins that fulfill all the aforementioned conditions are denoted as globular in this paper.
Furthermore, it was required that every solved structure should contain no discontinuities, be determined with an experimental method to a resolution better than 2 Å, contain only a single domain (according to both SCOP and CATH classifications) and must not create multi-chain complexes, even transiently (determined on the basis of biological unit assemblies available from PDB). A total of 2953 proteins were extracted for further consideration (1.71% of the whole PDB).
In the last step, in order to reduce sequence redundancy, precomputed clustering results available from the PDB, generated by the Cd-hit program, which grouped sequences of at least 90% sequence identity into clusters, were used to select a single protein per cluster. Finally, the learning data set comprised 775 high-resolution single-domain globular chains (26.2% of previously selected chains). The full list of PDB ids is available in Additional file 1 Table S1.
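The two geometric descriptors can be sketched as follows. The normalization of the asphericity (division by twice the squared trace of the gyration tensor, which gives 0 for a sphere and 1 for a rod) is an assumption, as is the use of the plain area of the equal-volume sphere without a probe-radius correction; the function names are illustrative only. The sum of squared eigenvalue differences is obtained from tensor invariants, so no eigen-decomposition is needed.

```python
import math

def gyration_tensor(coords):
    """Return the 3x3 gyration tensor of a list of (x, y, z) points."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    tensor = [[0.0] * 3 for _ in range(3)]
    for p in coords:
        d = [p[k] - center[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                tensor[i][j] += d[i] * d[j] / n
    return tensor

def asphericity(coords):
    """Assumed normalization: sum_{i<j}(l_i - l_j)^2 / (2 (l_1 + l_2 + l_3)^2)."""
    t = gyration_tensor(coords)
    trace = t[0][0] + t[1][1] + t[2][2]
    # tr(T^2) equals the sum of squared eigenvalues of a symmetric tensor
    trace_sq = sum(t[i][j] * t[j][i] for i in range(3) for j in range(3))
    # Identity: sum_{i<j}(l_i - l_j)^2 = 3 tr(T^2) - (tr T)^2
    return (3.0 * trace_sq - trace * trace) / (2.0 * trace * trace)

def compactness(volume, surface_area):
    """Area of the sphere of the given volume divided by the actual area."""
    sphere_area = (36.0 * math.pi * volume ** 2) ** (1.0 / 3.0)
    return sphere_area / surface_area

def is_globular(asph, comp, n_residues):
    """Thresholds quoted in the text: asphericity < 0.1, compactness >= 0.5,
    and at least 100 residues."""
    return asph < 0.1 and comp >= 0.5 and n_residues >= 100
```

The volume and the solvent accessible surface area are taken here as precomputed inputs, since their calculation depends on an external surface program.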
Compactness and asphericity of proteins in the set turned out to be only weakly interdependent (correlation coefficient, CC, -0.14). Longer chains were characterized by lower compactness (CC = -0.45) but not necessarily higher asphericity (CC = -0.06). Distributions and dependencies of geometric descriptors are presented in the Additional file 2 Figure S1.
Probabilistic description of atomic burials
Geometric centers and radii of gyration were calculated for every chain in the learning set. Distances to the geometric center of a chain of every heavy atom, $r$, were divided by the radius of gyration of the whole chain, $r_g$, enabling a uniform view of globular proteins of various sizes. Histograms of such normalized distances, $R = r/r_g$, were collected for every amino acid-dependent atom type denoted by $\tau$. Three types of cysteines were considered separately: generic Cys (irrespective of the presence or absence of SS bonding), Cys creating (intra-chain) disulfide bridges (denoted CSS, nearly 40% of all Cys) and Cys reduced and not involved in SS bridging (CSH). A total of 170 histograms for different $\tau$ were obtained.
After applying the direct least-squares method for fitting individual histograms, obtained fits yielded unsatisfactory sums of the squared residuals (SSR) for atoms in hydrophilic residues, where the expression overestimated their propensity to occur in the protein core. To account for this observation, the assumption of the strictly quadratic increase was abandoned and an additional tunable parameter, $\gamma_\tau$, was introduced while $\alpha_\tau$ was set to 1 (see Additional file 3 Figure S2). The following form was finally used:
for fitting. Parameter $A_\tau$ provides normalization, $\mu_\tau$ principally determines location, $\beta_\tau$ influences the width of the distribution and $\gamma_\tau$ controls convexity of the left ridge. The goodness-of-fit of distributions of the latter form was better for 124 of 170 fits (in terms of SSR) in comparison to the original distribution function with variable $\alpha$ (Equation 1) and for 130 of 170 fits (F-test with p-value < 0.000001) in comparison to the original distribution function with $\alpha = 1$.
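The collection of normalized central distances can be sketched as below. The unweighted radius of gyration, the bin width, the upper limit of the histogram and the input layout (one list of residue name, atom name and coordinates per chain) are assumptions made for illustration; each chain is expected to contain at least two distinct atom positions.

```python
import math
from collections import defaultdict

def normalized_central_distances(coords):
    """Return R = r / r_g for every point in a list of (x, y, z) tuples."""
    n = len(coords)
    center = tuple(sum(p[k] for p in coords) / n for k in range(3))
    dists = [math.dist(p, center) for p in coords]
    # Unweighted radius of gyration (assumption: all heavy atoms count equally)
    r_g = math.sqrt(sum(d * d for d in dists) / n)
    return [d / r_g for d in dists]

def collect_histograms(chains, bin_width=0.1, r_max=3.0):
    """Histogram R per amino acid-dependent atom type.

    chains: iterable of lists of (res_name, atom_name, (x, y, z)) tuples,
            one list per chain (hypothetical input layout).
    """
    n_bins = int(round(r_max / bin_width))
    hists = defaultdict(lambda: [0] * n_bins)
    for atoms in chains:
        ratios = normalized_central_distances([xyz for _, _, xyz in atoms])
        for (res_name, atom_name, _), ratio in zip(atoms, ratios):
            b = int(ratio / bin_width)
            if b < n_bins:  # distances beyond r_max are ignored
                hists[(res_name, atom_name)][b] += 1
    return hists
```

Dividing by the radius of gyration of each chain before pooling is what allows chains of different sizes to contribute to the same histogram.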
Expected atomic burials in proteins
Prediction of functionally important residues
which gives estimates in bits.
Prediction of ligand binding sites
As for compact structures it holds that $r_g$ is roughly proportional to (sequence length)$^{1/3}$, and as in the task of binding site recognition one is interested primarily in non-buried residues on the surface, the area of which is proportional to $r_g^2$, as a rule of thumb, residues containing the most unexpected atoms are initially selected. (However, assuming the general spatial character of the statistical model, no additional factors such as estimates of solvent accessibility are taken into account.) Selected residues are weighted proportionally to the maximum value of unexpectedness among values assigned to constituent atoms and then clustered hierarchically using the pairwise average-linkage method. In the search for ligand binding sites, the hierarchy of residues is partitioned into clusters separated by more than 7 Å (average Euclidean distance) that indicate (possibly multiple) putative sites. Positions of cluster centroids are computed in a weighted manner and are thus located closer to the most unexpected atoms. Putative sites are ranked according to the proximity of their predicted centroids to the geometric center of the whole protein.
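The clustering and ranking steps can be sketched in a few lines. The sketch assumes that every selected residue is represented by a single point (for instance the position of its most unexpected atom) and by one weight; this representation, the naive cubic-time merging loop and the function names are illustrative choices rather than a description of the actual implementation.

```python
import math

def average_linkage_clusters(points, cutoff=7.0):
    """Agglomerative average-linkage clustering; merging stops once the two
    closest clusters are separated by more than `cutoff` on average."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > cutoff:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def weighted_centroid(points, weights, members):
    """Centroid of the member points, pulled towards high-weight points."""
    total = sum(weights[i] for i in members)
    return tuple(sum(weights[i] * points[i][k] for i in members) / total
                 for k in range(3))

def rank_sites(points, weights, protein_center, cutoff=7.0):
    """Return site centroids, the one closest to the protein center first."""
    clusters = average_linkage_clusters(points, cutoff)
    sites = [weighted_centroid(points, weights, c) for c in clusters]
    return sorted(sites, key=lambda c: math.dist(c, protein_center))
```

Because average-linkage merge distances never decrease, stopping the merging at the cutoff is equivalent to cutting the complete hierarchy at that height.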
Prediction of protein-protein interfaces
Contrary to the development of the complete algorithm for the prediction of binding sites of (small) ligands, we do not attempt to create a new protein-protein docking method but rather to provide a simple unexpectedness-based scoring function for the ranking of docking predictions. Heavy atoms of one protein located within a distance of 10 Å from the other have their unexpectedness calculated and a maximum value of unexpectedness is found in this way for both macromolecules of a docked assembly. A docking prediction is then scored using the average of the highest values of unexpectedness in two interfaces.
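The scoring of a docked pose described above can be sketched as follows, assuming that per-atom unexpectedness values have already been computed for both proteins. The treatment of poses without any atoms in contact (a score of zero) and the function names are assumptions introduced for illustration.

```python
import math

def interface_max(atoms_a, unexp_a, atoms_b, cutoff=10.0):
    """Highest unexpectedness among atoms of A lying within `cutoff`
    of any atom of B; None when no atom of A is that close."""
    values = [u for p, u in zip(atoms_a, unexp_a)
              if any(math.dist(p, q) <= cutoff for q in atoms_b)]
    return max(values) if values else None

def docking_score(atoms_a, unexp_a, atoms_b, unexp_b, cutoff=10.0):
    """Average of the two interface maxima of a docked assembly."""
    max_a = interface_max(atoms_a, unexp_a, atoms_b, cutoff)
    max_b = interface_max(atoms_b, unexp_b, atoms_a, cutoff)
    if max_a is None or max_b is None:
        return 0.0  # assumption: poses without contact get the lowest score
    return 0.5 * (max_a + max_b)
```

Ranking a set of docking poses then amounts to sorting them by this score in descending order.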
Evaluation of predictions
The evaluation of the method based on the introduced characteristics was performed separately for the task of predicting binding sites of small ligands and for the prediction of regions creating interfaces to other proteins. In both cases, if a test data set allowed, predictions were made for unbound structures; after the assignment, the apo form was superimposed onto the holo form so that intermolecular distances were measured between the unbound structure and ligand/another macromolecule as located in the structure of the complex.
For the prediction of ligand binding sites, a set of 48 pairs of unbound/bound structures and a set of 210 bound structures, which were already employed for the benchmarking of other methods (LigSitecsc and IBIS), were used for the comparison with already measured success rates of the state-of-the-art geometry-based methods: SURFNET, PASS and LigSite. The former set, further referred to as the LB48 test set, includes 38 enzymes that cover 39 diverse enzymatic activities according to the EC annotations from the Catalytic Sites Atlas version 2.2.12 and 10 proteins that bind compounds in their non-active sites. The latter set, referred to as the LB210 test set, enabled large-scale benchmarking.
In order to juxtapose the results of our approach and of the similar fuzzy oil drop-based method (FOD), which assign prediction scores to clusters of atoms, with pocket identification methods, which indicate geometric centers of pockets located over the molecular surface, we used MSMS and projected coordinates of centroids of putative binding sites onto the solvent-excluded molecular surface. Then, in order to apply the cut-off value of 4 Å used in pocket prediction benchmarks, we displaced surface-projected coordinates by 1 Å in the direction of the vector normal to the surface and 1 Å outwards from the geometric center of the protein. As the points do not always lie in the space of the pocket, we additionally used the cut-off of 6 Å. We examined whether any atom of the ligand is located within the cut-off distance and reported success rates for the best ranked (Top 1) and 3 highest ranked (Top 3) candidate sites.
In order to show, preliminarily, that the unexpectedness is a property of protein-protein interfaces, we used the latest and most extensive docking benchmark (version 4.0), further referred to as the PPI176 test set. Residues of two macromolecules were considered as interfacing if they were separated by at most 4 Å. In the case of protein-protein binding interfaces, unexpected residues are usually isolated, so we did not cluster them, but rather reported the average unexpectedness in binding/non-binding protein regions.
Finally, the capability of appropriate ranking of protein-protein docking predictions was compared to that of one of the best performing docking algorithms, ZDock, optionally amended with ZRank, and two other methods, the recent ASP-Dock and the older FTDock. The methods have their success rates already measured over the complete protein docking benchmark version 3.0, so this set (referred to as the PPI124 test set) was used to estimate the capacity of our approach. The unexpectedness-based score assessed 54,000 docking poses of a decoy generated by ZDock 3.0 operating at the rotational scanning interval of 6°. A successful prediction was defined as a docking solution of ligand Cα RMSD < 10 Å.
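The success criterion described above can be sketched as follows. The surface projection itself (performed with MSMS in this work) is not reproduced; the sketch assumes that a surface point and its normal vector are already available, that neither the normal nor the radial vector has zero length, and that the function names are illustrative.

```python
import math

def displace(point, normal, center, step=1.0):
    """Shift a surface point by `step` along the surface normal and by
    `step` radially away from the geometric center of the protein."""
    n_len = math.sqrt(sum(c * c for c in normal))
    radial = [point[k] - center[k] for k in range(3)]
    r_len = math.sqrt(sum(c * c for c in radial))
    return tuple(point[k] + step * normal[k] / n_len + step * radial[k] / r_len
                 for k in range(3))

def is_hit(site, ligand_atoms, cutoff=4.0):
    """True if any ligand atom lies within `cutoff` of the predicted point."""
    return any(math.dist(site, a) <= cutoff for a in ligand_atoms)

def top_n_success(ranked_sites, ligand_atoms, n, cutoff=4.0):
    """True if any of the n best ranked sites is a hit."""
    return any(is_hit(s, ligand_atoms, cutoff) for s in ranked_sites[:n])

def success_rate(cases, n, cutoff=4.0):
    """Fraction of (ranked_sites, ligand_atoms) cases with a Top-n hit."""
    hits = sum(top_n_success(sites, lig, n, cutoff) for sites, lig in cases)
    return hits / len(cases)
```

Calling `success_rate` with `n=1` and `n=3`, and with cut-offs of 4 Å and 6 Å, gives the four figures of the kind reported for each test set.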
Comparison with other characteristics
A direct evaluation of the current method was performed in parallel with the fuzzy oil drop (FOD) method using the LB48 test set. The same clustering and ranking methods were used for residues with the highest unexpectedness and for residues of the highest observed vs. theoretical hydrophobicity discrepancy (FOD). For the detailed comparison with other explorable characteristics, useful for the prediction of (small) ligand binding sites, the evolutionary conservation scores were assigned to residues according to the multiple-sequence alignment-based ConSurf-DB; only residues of the highest conservation score (i.e. 9) are indicated in this paper. Independently, the clusters of ionisable residues with anomalous predicted titration behaviour, identified with the finite difference Poisson-Boltzmann-based technique, Thematics, were included in the comparison.
Orientational preferences of amino acids
Parameters of probability distribution functions given by Equation 2, $A_\tau$, $\mu_\tau$, $\beta_\tau$ and $\gamma_\tau$, were determined independently for every amino acid-dependent atom type, $\tau$, which allowed us to capture the specific radial orientational propensities of amino acids. The full list of 170 sets of parameters for atomic distribution functions derived from the obtained learning set can be found in the Additional file 4 Table S2. Since the structure of side chains makes it possible to single out the atom most distant from the Cα atom, preferred orientations can be captured and demonstrated using a less redundant description. We decided to evaluate the unexpectedness of every atom uniformly, motivated by the fact that among 83 distributions of all side chain heavy atom types as many as 58 were statistically significantly different from the distributions of the relevant Cα atoms (Kolmogorov-Smirnov tests with p-value < 0.000001; see Additional file 4 Table S2 for details).
Although side chains determine the hydrophobic/hydrophilic character of amino acids, they also considerably influence the probabilities of spatial occurrence of Cα atoms (which are chemically equivalent across amino acid types). In the synthetic picture of atomic densities (Figure 1 and Additional file 5 Figure S3), hydrophobic propensities of amino acids in the body of a protein are modulated by their sizes: broad distributions of Gly and Ala atoms are shifted from those of other hydrophobic types; distributions of large amino acids, such as Trp or Arg, are less dispersed around their maxima; the broad distribution of His can be explained by diverse possible protonation states and the ambivalent distribution of Tyr by the mixed aromatic/polar character of its side chain.
The analysis of the intriguing case of Cys reveals that, although their orientation does not depend on the possible disulfide bonding, the non-bridging cysteines prevail as the most buried residues, while those constituting cystines occur more often on the protein surface (Figure 1; Additional file 6 Figure S4). Cysteines are relatively frequently found in active sites; supposedly, evolution may easily redefine the function of a protein by tailoring the state of cysteines and adjusting their positions.
Distribution of unexpectedness
[Table: Correlations of mean values of distal side chain atom distributions to other characteristics. Characteristics compared: mean fractional area loss upon folding; solvent accessibility based on self-information (16% accessibility); information value for accessibility (average fraction 35%); normalized eigenvector of the Sweet & Eisenberg scale; mean combined polarity calculated from distributions of residues in proteins; hydrophobicity coefficient in RP-HPLC (C4 with 0.1% TFA/MeCN/H2O).]
Prediction of ligand binding sites
[Table: Benchmarks of several ligand binding site prediction methods on the LB48 and LB210 test sets.]
[Table: Residues in correctly predicted 3 top-ranked clusters. Listed proteins include: protease (HIV-2 retropepsin); fatty acid binding; azobenzoic acid binding; (folic acid) reductase.]
Among the proteins annotated with EC numbers in the LB48 test set, 35 out of 38 enzymes have their active sites recognized in Top 3 clusters (31/38 in Top 1). Notwithstanding, out of 10 proteins that exhibit no enzymatic activity and bind ligands in their non-active sites, binding sites are properly recognized in only 5 cases, mainly because of their eccentric locations (see Additional file 7 Table S3 for details).
Ranking of protein-protein docking results
Comparison to the fuzzy oil drop model
We developed a web server, SurpResi, for the prediction of functionally important sites based on the unusual central distances of atoms. The input of the SurpResi server is a Protein Data Bank (PDB) file or a user file in the PDB format. The output is a downloadable PDB file where the column of beta factors is replaced by the unexpectedness and the occupancy is replaced by the same value normalized to the range [0,1] over all protein atoms. In the header section, the file contains detailed information about clustering and ranking of clusters. The web server and source code are freely available at http://www.bioinformatics.org/surpresi.
The presented approach quantifies polar and directional propensities of amino acids using the partition in the knowledge-based continuous gradient of hydrophobicity generated by the protein itself. It yields a middle level of description of hydrophobic preferences between (coarse-grained) scales of hydrophobicity and (fine-grained) residue-residue contact matrices, where more specific local effects such as homophilic, counterion or phenyl ring interactions can be expressed explicitly. It has been already demonstrated that reduced representations and global geometric potentials are capable of a quantitative description of protein-ligand binding sites [75, 76].
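The described output format can be illustrated with a short sketch that rewrites the occupancy and temperature factor fields of ATOM records (columns 55-60 and 61-66 of the PDB format). It assumes min-max scaling for the normalization to [0,1], one unexpectedness value per ATOM record in file order, and input lines without trailing newlines; it is an illustration of the file layout, not the code of the SurpResi server.

```python
def normalize(values):
    """Min-max scaling to [0, 1]; a constant list maps to zeros (assumption)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]

def annotate_pdb(lines, unexpectedness):
    """Put unexpectedness into the B-factor field (columns 61-66) and its
    normalized value into the occupancy field (columns 55-60) of every
    ATOM record; all other lines are passed through unchanged."""
    scaled = normalize(unexpectedness)
    out, k = [], 0
    for line in lines:
        if line.startswith("ATOM"):
            line = line.ljust(66)  # pad short records up to the B-factor field
            fields = "%6.2f%6.2f" % (scaled[k], unexpectedness[k])
            line = line[:54] + fields + line[66:]
            k += 1
        out.append(line)
    return out
```

Storing the values in these two columns lets standard molecular viewers color the structure by unexpectedness without any custom file parser.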
The adopted view concentrates on the characterization of proteins not assuming any specific chemical properties of ligands. Although based on a statistical model parametrized assuming spherical shapes of proteins (resembling the assumption behind the generalized Born solvation model), the method works well for moderately aspherical macromolecules, allowing for not only descriptive but also predictive applications. We do not incorporate into the identification method any additional features, such as the solvent accessible area or evolutionary conservation; the direct distance to the centroid was used only for the ranking in order to enable fair comparison with the FOD method; our measure is assigned homogeneously and isotropically in the whole protein volume, thus allowing for the examination of the predictive potential of the sole unexpectedness.
Favorable outcomes of our approach, especially when applied to enzymatic active sites, can be explained by analyzing the consequences of the requirement of the precise and resolute positioning of a ligand (as the prerequisite for chemical specificity), which can be best fulfilled by the creation of a binding pocket. The burial of (still accessible) charged amino acids or the exposure of (partially unburied) conjugated aromatic ones, which are essential from the point of view of the mechanisms of the catalytic reactions, are not commensurate with their general expected radial positions in the bulk protein body. Frequently, despite their indented locations, pocket residues cannot be predominantly apolar either, because of the need for the presence of bound water molecules assisting the catalysis (involved in, e.g., nucleophilic attack).
The most unexpected atoms are usually found in the deep-set parts of the pockets. The atomic depth has been found to be correlated with residue conservation [78, 79] (more conserved amino acids create more contacts), which provides the explanation for the overlap between the sets of unexpected and conserved residues. It has been found, based on electrostatics, that functional sites comprise the most destabilizing residues. Similarly, the unexpected amino acids are those introducing a local hydrophobic mismatch, plausibly counterbalanced by the formation of salt bridges and hydrogen bonding. The relation of the unexpectedness to the electrostatics is not, however, as simple as in the case of the conservation: buried charged residues can be encountered occasionally. It has been also demonstrated that electrostatic and hydrophobic interactions may compete. This interplay is important with respect to the desolvation energy. The ease of desolvation is strongly predictive of protein-binding interfaces and intricately influences ligand binding affinities. As the hydrophobic interactions are dominant at protein interfaces, the indicated scattered residues at the surface likely coincide with the view of the small fraction of hot-spots, which account for the majority of the binding energy.
Our approach yielded sets of parameters for every atom in an amino acid of a given type, which is similar to the construction of a hydrophobicity scale, because the amount of information needed to characterize a protein is linearly proportional to the length of its sequence. The introduction of an information-theoretic interpretation of hydrophobicity distributions may lead to valuable insights. One result of the meeting of hydrophobicity and information theory, especially noteworthy in this context, supports our approach by demonstrating improvements in contact potentials tailored to the compositional properties of the sequences of interest.
The "mixture model" used in Equation 3 may be tuned via the expectation-maximization procedure to better fit the idealized distribution of the mass in individual proteins. However, we observed no improvement in the performance of the predictions for tuned forms, probably due to the already balanced composition of hydrophobic and polar amino acids in proteins selected by nature. In this view, it would be interesting to check whether sequences of disordered or unfoldable structures give "mixture models" that deviate significantly from compact atomic distributions. It seems to be possible to apply the method from the smoothed surface towards the protein interior to some depth, and in this way cover proteins of more irregular shapes, consequently overcoming the most severe limitation of the approach. The attempt would require, however, an inquiry into the structure of hydrophobic cores in elongated or bent proteins.
The method is expected to be applicable for the functional annotation of low resolution structures, e.g., those resulting from mature homology modeling pipelines. Crude estimates of unexpectedness may be advantageous over computational geometry-based methods requiring precise atomic coordinates of active sites, where residues or even whole loops undergo significant displacements, not obeying the classic lock-and-key model.
We present an approach that captures orientational propensities of amino acids in globular proteins and offers a balanced description of their hydrophobic preferences. The description is created at the granularity of individual (amino acid-dependent types of) atoms but does not enumerate explicitly all possible interactions between them.
The approach is useful for the construction of a generic method that quantifies the unexpectedness of occurrences of individual atoms at a given distance from the geometric center of a protein. It turns out that this characteristic can be applied to the recognition of binding sites of both small ligands (enzymatic active sites) and other proteins (protein-protein interfaces).
The author would like to thank Prof. I. Roterman for reading a preliminary version of the manuscript and Dr. K. Prymula for discussions. A computational grant from the Academic Computer Center (ACK) CYFRONET AGH (MNiSW/IBM_BC_HS21/UJ/049/2009) is acknowledged.
- Li YY, Hou TJ, Goddard WA: Computational modeling of structure-function of G protein-coupled receptors with applications for drug design. Curr Med Chem 2010, 17(12):1167–80. 10.2174/092986710790827807
- Fiorucci S, Zacharias M: Binding site prediction and improved scoring during flexible protein-protein docking with ATTRACT. Proteins 2010, 78(15):3131–9. 10.1002/prot.22808
- Seffernick JL, de Souza ML, Sadowsky MJ, Wackett LP: Melamine deaminase and atrazine chlorohydrolase: 98 percent identical but functionally different. J Bacteriol 2001, 183(8):2405–10. 10.1128/JB.183.8.2405-2410.2001
- Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA: PDBSiteScan: a program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins. Nucleic Acids Res 2004, W549–54.
- Jambon M, Imberty A, Deléage G, Geourjon C: A new bioinformatic approach to detect common 3D sites in protein structures. Proteins 2003, 52(2):137–45. 10.1002/prot.10339
- Doppelt-Azeroual O, Delfaud F, Moriaud F, de Brevern AG: Fast and automated functional classification with MED-SuMo: an application on purine-binding proteins. Protein Sci 2010, 19(4):847–67. 10.1002/pro.364
- Brylinski M, Skolnick J: A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc Natl Acad Sci USA 2008, 105: 129–34. 10.1073/pnas.0707684105
- Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T: Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010, 11: 365. 10.1186/1471-2105-11-365
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph 1995, 13(5):323–30, 307–8. 10.1016/0263-7855(95)00073-9
- Brady GP Jr, Stouten PF: Fast prediction and visualization of protein binding pockets with PASS. J Comput Aided Mol Des 2000, 14(4):383–401. 10.1023/A:1008124202956
- Levitt DG, Banaszak LJ: POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J Mol Graph 1992, 10(4):229–34. 10.1016/0263-7855(92)80074-N
- Hendlich M, Rippmann F, Barnickel G: LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model 1997, 15(6):359–63, 389. 10.1016/S1093-3263(98)00002-3
- Weisel M, Proschak E, Schneider G: PocketPicker: analysis of ligand binding-sites with shape descriptors. Chem Cent J 2007, 1: 7. 10.1186/1752-153X-1-7
- Liang J, Edelsbrunner H, Woodward C: Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design. Protein Sci 1998, 7(9):1884–97. 10.1002/pro.5560070905
- Le Guilloux V, Schmidtke P, Tuffery P: Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics 2009, 10: 168. 10.1186/1471-2105-10-168
- Coleman RG, Sharp KA: Protein pockets: inventory, shape, and comparison. J Chem Inf Model 2010, 50(4):589–603. 10.1021/ci900397t
- Yuan Z, Zhao J, Wang ZX: Flexibility analysis of enzyme active sites by crystallographic temperature factors. Protein Eng 2003, 16(2):109–14. 10.1093/proeng/gzg014
- Elcock AH: Prediction of functionally important residues based solely on the computed energetics of protein structure. J Mol Biol 2001, 312(4):885–96. 10.1006/jmbi.2001.5009
- Bate P, Warwicker J: Enzyme/non-enzyme discrimination and prediction of enzyme active site location using charge-based methods. J Mol Biol 2004, 340(2):263–76. 10.1016/j.jmb.2004.04.070
- Laurie ATR, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21(9):1908–16. 10.1093/bioinformatics/bti315
- Brylinski M, Prymula K, Jurkowski W, Kochańczyk M, Stawowczyk E, Konieczny L, Roterman I: Prediction of functional sites based on the fuzzy oil drop model. PLoS Comput Biol 2007, 3(5):e94. 10.1371/journal.pcbi.0030094
- Oda A, Yamaotsu N, Hirono S: Evaluation of the searching abilities of HBOP and HBSITE for binding pocket detection. J Comput Chem 2009, 30(16):2728–37. 10.1002/jcc.21299View ArticlePubMedGoogle Scholar
- Bagley SC, Altman RB: Characterizing the microenvironment surrounding protein sites. Protein Sci 1995, 4(4):622–35.PubMed CentralView ArticlePubMedGoogle Scholar
- Jones S, Thornton JM: Prediction of protein-protein interaction sites using patch analysis. J Mol Biol 1997, 272: 133–43. 10.1006/jmbi.1997.1233View ArticlePubMedGoogle Scholar
- Ondrechen MJ, Clifton JG, Ringe D: THEMATICS: a simple computational predictor of enzyme function from structure. Proc Natl Acad Sci USA 2001, 98(22):12473–8. 10.1073/pnas.211436698PubMed CentralView ArticlePubMedGoogle Scholar
- Bordner AJ: Predicting small ligand binding sites in proteins using backbone structure. Bioinformatics 2008, 24(24):2865–71. 10.1093/bioinformatics/btn543PubMed CentralView ArticlePubMedGoogle Scholar
- Cilia E, Passerini A: Automatic prediction of catalytic residues by modeling residue structural neighborhood. BMC Bioinformatics 2010, 11: 115. 10.1186/1471-2105-11-115PubMed CentralView ArticlePubMedGoogle Scholar
- Panjkovich A, Daura X: Assessing the structural conservation of protein pockets to study functional and allosteric sites: implications for drug discovery. BMC Struct Biol 2010, 10: 9. 10.1186/1472-6807-10-9PubMed CentralView ArticlePubMedGoogle Scholar
- Burgoyne NJ, Jackson RM: Predicting protein interaction sites: binding hot-spots in protein-protein and protein-ligand interfaces. Bioinformatics 2006, 22(11):1335–42. 10.1093/bioinformatics/btl079View ArticlePubMedGoogle Scholar
- Tong W, Wei Y, Murga LF, Ondrechen MJ, Williams RJ: Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D structure and sequence properties. PLoS Comput Biol 2009, 5: e1000266. 10.1371/journal.pcbi.1000266PubMed CentralView ArticlePubMedGoogle Scholar
- Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA: Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 2009, 5(12):e1000585. 10.1371/journal.pcbi.1000585PubMed CentralView ArticlePubMedGoogle Scholar
- Huang B, Schroeder M: LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol 2006, 6: 19. 10.1186/1472-6807-6-19PubMed CentralView ArticlePubMedGoogle Scholar
- Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J: SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009, 10: 379. 10.1186/1471-2105-10-379PubMed CentralView ArticlePubMedGoogle Scholar
- Laskowski RA, Watson JD, Thornton JM: ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res 2005, W89–93.Google Scholar
- Huang B: MetaPocket: a meta approach to improve protein ligand binding site prediction. OMICS 2009, 13(4):325–30. 10.1089/omi.2009.0045View ArticlePubMedGoogle Scholar
- Brylinski M, Kochańczyk M, Konieczny L, Roterman I: Sequence-structure-function relation characterized in silico. In Silico Biol 2006, 6(6):589–600.PubMedGoogle Scholar
- Jones S, Thornton JM: Analysis of protein-protein interaction sites using surface patches. J Mol Biol 1997, 272: 121–32. 10.1006/jmbi.1997.1234View ArticlePubMedGoogle Scholar
- Konieczny L, Brylinski M, Roterman I: Gauss-function-based model of hydrophobicity density in proteins. In Silico Biol 2006, 6(1–2):15–22.PubMedGoogle Scholar
- Gomes ALC, de Rezende JR, Pereira de Araújo AF, Shakhnovich EI: Description of atomic burials in compact globular proteins by Fermi-Dirac probability distributions. Proteins 2007, 66(2):304–20.View ArticlePubMedGoogle Scholar
- Kauzmann W: Some factors in the interpretation of protein denaturation. Adv Protein Chem 1959, 14: 1–63.View ArticlePubMedGoogle Scholar
- Richards FM, Lim WA: An analysis of packing in the protein folding problem. Q Rev Biophys 1993, 26(4):423–98. 10.1017/S0033583500002845View ArticlePubMedGoogle Scholar
- Dill KA: Dominant forces in protein folding. Biochemistry 1990, 29(31):7133–55. 10.1021/bi00483a001View ArticlePubMedGoogle Scholar
- Rackovsky S, Scheraga HA: Hydrophobicity, hydrophilicity, and the radial and orientational distributions of residues in native proteins. Proc Natl Acad Sci USA 1977, 74(12):5248–51. 10.1073/pnas.74.12.5248PubMed CentralView ArticlePubMedGoogle Scholar
- Jha AN, Vishveshwara S, Banavar JR: Amino acid interaction preferences in proteins. Protein Sci 2010, 19(3):603–16. 10.1002/pro.339View ArticlePubMedGoogle Scholar
- Nishikawa K, Ooi T: Correlation of the amino acid composition of a protein to its structural and biological characters. J Biochem 1982, 91(5):1821–4.PubMedGoogle Scholar
- Taguchi Yh, Gromiha MM: Application of amino acid occurrence for discriminating different folding types of globular proteins. BMC Bioinformatics 2007, 8: 404. 10.1186/1471-2105-8-404PubMed CentralView ArticlePubMedGoogle Scholar
- Ma BG, Chen LL, Zhang HY: What determines protein folding type? An investigation of intrinsic structural properties and its implications for understanding folding mechanisms. J Mol Biol 2007, 370(3):439–48. 10.1016/j.jmb.2007.04.051View ArticlePubMedGoogle Scholar
- Rackovsky S: Global characteristics of protein sequences and their implications. Proc Natl Acad Sci USA 2010, 107(19):8623–6. 10.1073/pnas.1001299107PubMed CentralView ArticlePubMedGoogle Scholar
- Roy S, Martinez D, Platero H, Lane T, Werner-Washburne M: Exploiting amino acid composition for predicting protein-protein interactions. PLoS One 2009, 4(11):e7813. 10.1371/journal.pone.0007813PubMed CentralView ArticlePubMedGoogle Scholar
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–42. 10.1093/nar/28.1.235PubMed CentralView ArticlePubMedGoogle Scholar
- Baumgärtner A: Shapes of flexible vesicles at constant volume. J Chem Phys 1993, 98: 7496–7501. 10.1063/1.464689View ArticleGoogle Scholar
- Galzitskaya OV, Bogatyreva NS, Ivankov DN: Compactness determines protein folding type. J Bioinform Comput Biol 2008, 6(4):667–80. 10.1142/S0219720008003618View ArticlePubMedGoogle Scholar
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247(4):536–40.PubMedGoogle Scholar
- Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH - a hierarchic classification of protein domain structures. Structure 1997, 5(8):1093–108. 10.1016/S0969-2126(97)00260-8View ArticlePubMedGoogle Scholar
- Li W, Jaroszewski L, Godzik A: Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 2001, 17(3):282–3. 10.1093/bioinformatics/17.3.282View ArticlePubMedGoogle Scholar
- Brylinski M, Kochanczyk M, Broniatowska E, Roterman I: Localization of ligand binding site in proteins identified in silico. J Mol Model 2007, 13(6–7):665–75. 10.1007/s00894-007-0191-xView ArticlePubMedGoogle Scholar
- Arteca GA: Scaling behavior of some molecular shape descriptors of polymer chains and protein backbones. Phys Rev E 1994, 49(3):2417–2428. 10.1103/PhysRevE.49.2417View ArticleGoogle Scholar
- Porter CT, Bartlett GJ, Thornton JM: The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 2004, D129–33.Google Scholar
- Sanner MF, Olson AJ, Spehner JC: Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 1996, 38(3):305–20. 10.1002/(SICI)1097-0282(199603)38:3<305::AID-BIP4>3.0.CO;2-YView ArticlePubMedGoogle Scholar
- Hwang H, Vreven T, Janin J, Weng Z: Protein-protein docking benchmark version 4.0. Proteins 2010, 78(15):3111–4. 10.1002/prot.22830PubMed CentralView ArticlePubMedGoogle Scholar
- Mintseris J, Pierce B, Wiehe K, Anderson R, Chen R, Weng Z: Integrating statistical pair potentials into protein complex prediction. Proteins 2007, 69(3):511–20. 10.1002/prot.21502View ArticlePubMedGoogle Scholar
- Pierce B, Weng Z: ZRANK: reranking protein docking predictions with an optimized energy function. Proteins 2007, 67(4):1078–86. 10.1002/prot.21373View ArticlePubMedGoogle Scholar
- Li L, Guo D, Huang Y, Liu S, Xiao Y: ASPDock: protein-protein docking algorithm using atomic solvation parameters model. BMC Bioinformatics 2011, 12: 36. 10.1186/1471-2105-12-36PubMed CentralView ArticlePubMedGoogle Scholar
- Gabb HA, Jackson RM, Sternberg MJ: Modelling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol 1997, 272: 106–20. 10.1006/jmbi.1997.1203View ArticlePubMedGoogle Scholar
- Hwang H, Pierce B, Mintseris J, Janin J, Weng Z: Protein-protein docking benchmark version 3.0. Proteins 2008, 73(3):705–9. 10.1002/prot.22106PubMed CentralView ArticlePubMedGoogle Scholar
- Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E, Ben-Tal N: ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 2003, 19::163–4. 10.1093/bioinformatics/19.1.163View ArticleGoogle Scholar
- Eisenberg D, Weiss RM, Terwilliger TC: The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci USA 1984, 81: 140–4. 10.1073/pnas.81.1.140PubMed CentralView ArticlePubMedGoogle Scholar
- Wu S, Liu T, Altman RB: Identification of recurring protein structure microenvironments and discovery of novel functional sites around CYS residues. BMC Struct Biol 2010, 10: 4. 10.1186/1472-6807-10-4PubMed CentralView ArticlePubMedGoogle Scholar
- Marino SM, Gladyshev VN: Cysteine function governs its conservation and degeneration and restricts its utilization on protein surfaces. J Mol Biol 2010, 404(5):902–16. 10.1016/j.jmb.2010.09.027PubMed CentralView ArticlePubMedGoogle Scholar
- Klotz IM: Comparison of molecular structures of proteins: helix content; distribution of apolar residues. Arch Biochem Biophys 1970, 138(2):704–6. 10.1016/0003-9861(70)90401-7View ArticlePubMedGoogle Scholar
- Lins L, Thomas A, Brasseur R: Analysis of accessible surface of residues in proteins. Protein Sci 2003, 12(7):1406–17. 10.1110/ps.0304803PubMed CentralView ArticlePubMedGoogle Scholar
- Meirovitch H, Rackovsky S, Scheraga HA: Empirical studies of hydrophobicity. 1. Effect of protein size on the hydrophobic behavior of amino acids. Macromolecules 1980, 13(6):1398–1405. 10.1021/ma60078a013View ArticleGoogle Scholar
- Ben-Shimon A, Eisenstein M: Looking at enzymes from the inside out: the proximity of catalytic residues to the molecular centroid can be used for detection of active sites and enzyme-ligand interfaces. J Mol Biol 2005, 351(2):309–26. 10.1016/j.jmb.2005.06.047View ArticlePubMedGoogle Scholar
- Singer MS, Vriend G, Bywater RP: Prediction of protein residue contacts with a PDB-derived likelihood matrix. Protein Eng 2002, 15(9):721–5. 10.1093/protein/15.9.721View ArticlePubMedGoogle Scholar
- Xie L, Bourne PE: A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics 2007, 8(Suppl 4):S9. 10.1186/1471-2105-8-S4-S9PubMed CentralView ArticlePubMedGoogle Scholar
- Feldman HJ, Labute P: Pocket similarity: are alpha carbons enough? J Chem Inf Model 2010, 50(8):1466–75. 10.1021/ci100210cView ArticlePubMedGoogle Scholar
- Campbell SJ, Gold ND, Jackson RM, Westhead DR: Ligand binding: functional site location, similarity and docking. Curr Opin Struct Biol 2003, 13(3):389–95. 10.1016/S0959-440X(03)00075-7View ArticlePubMedGoogle Scholar
- Godzik A, Sander C: Conservation of residue interactions in a family of Ca-binding proteins. Protein Eng 1989, 2(8):589–96. 10.1093/protein/2.8.589View ArticlePubMedGoogle Scholar
- Pintar A, Carugo O, Pongor S: Atom depth in protein structure and function. Trends Biochem Sci 2003, 28(11):593–7. 10.1016/j.tibs.2003.09.004View ArticlePubMedGoogle Scholar
- Wang L, Friesner RA, Berne BJ: Competition of electrostatic and hydrophobic interactions between small hydrophobes and model enclosures. J Phys Chem B 2010, 114(21):7294–301. 10.1021/jp100772wPubMed CentralView ArticlePubMedGoogle Scholar
- Wang L, Berne BJ, Friesner RA: Ligand binding to protein-binding pockets with wet and dry regions. Proc Natl Acad Sci USA 2011, 108(4):1326–30. 10.1073/pnas.1016793108PubMed CentralView ArticlePubMedGoogle Scholar
- Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci USA 1996, 93: 13–20. 10.1073/pnas.93.1.13PubMed CentralView ArticlePubMedGoogle Scholar
- Tuncbag N, Gursoy A, Keskin O: Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy. Bioinformatics 2009, 25(12):1513–20. 10.1093/bioinformatics/btp240View ArticlePubMedGoogle Scholar
- Pereira de Araujo AF, Onuchic JN: A sequence-compatible amount of native burial information is sufficient for determining the structure of small globular proteins. Proc Natl Acad Sci USA 2009, 106(45):19001–4. 10.1073/pnas.0910851106PubMed CentralView ArticlePubMedGoogle Scholar
- Solis AD, Rackovsky SR: Information-theoretic analysis of the reference state in contact potentials used for protein structure prediction. Proteins 2010, 78(6):1382–97.PubMed CentralPubMedGoogle Scholar
- Bastolla U, Porto M, Roman HE, Vendruscolo M: Principal eigenvector of contact matrices and hydrophobicity profiles in proteins. Proteins 2005, 58: 22–30.View ArticlePubMedGoogle Scholar
- Schmidt A, Lamzin VS: Internal motion in protein crystal structures. Protein Sci 2010, 19(5):944–53.PubMed CentralPubMedGoogle Scholar
- Rose GD, Geselowitz AR, Lesser GJ, Lee RH, Zehfus MH: Hydrophobicity of amino acid residues in globular proteins. Science 1985, 229(4716):834–8. 10.1126/science.4023714View ArticlePubMedGoogle Scholar
- Naderi-Manesh H, Sadeghi M, Arab S, Moosavi Movahedi AA: Prediction of protein surface accessibility with information theory. Proteins 2001, 42(4):452–9. 10.1002/1097-0134(20010301)42:4<452::AID-PROT40>3.0.CO;2-QView ArticlePubMedGoogle Scholar
- Biou V, Gibrat JF, Levin JM, Robson B, Garnier J: Secondary structure prediction: combination of three different methods. Protein Eng 1988, 2(3):185–91. 10.1093/protein/2.3.185View ArticlePubMedGoogle Scholar
- Cornette JL, Cease KB, Margalit H, Spouge JL, Berzofsky JA, DeLisi C: Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J Mol Biol 1987, 195(3):659–85. 10.1016/0022-2836(87)90189-6View ArticlePubMedGoogle Scholar
- Guy HR: Amino acid side-chain partition energies and distribution of residues in soluble proteins. Biophys J 1985, 47: 61–70. 10.1016/S0006-3495(85)83877-7PubMed CentralView ArticlePubMedGoogle Scholar
- Wilce MCJ, Aguilar MI, Hearn MTW: Physicochemical basis of amino acid hydrophobicity scales: evaluation of four new scales of amino acid hydrophobicity coefficients derived from RP-HPLC of peptides. Analytical Chemistry 1995, 67(7):1210–1219. 10.1021/ac00103a012View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.