Atomic hydration potentials using a Monte Carlo Reference State (MCRS) for protein solvation modeling
© Rakhmanov and Makeev; licensee BioMed Central Ltd. 2007
Received: 15 September 2006
Accepted: 30 March 2007
Published: 30 March 2007
Accurate description of protein interaction with aqueous solvent is crucial for modeling of protein folding, protein-protein interaction, and drug design. Efforts to build a working description of solvation, both by continuous models and by molecular dynamics, yield controversial results. Specifically constructed knowledge-based potentials appear to be promising for accounting for the solvation at the molecular level, yet have not been used for this purpose.
We developed original knowledge-based potentials to study protein hydration at the level of atom contacts. The potentials were obtained using a new Monte Carlo reference state (MCRS), which simulates the expected probability density of atom-atom contacts via exhaustive sampling of structure space with random probes. Using the MCRS allowed us to calculate the expected atom contact densities with high resolution over a broad distance range including very short distances. Knowledge-based potentials for hydration of protein atoms of different types were obtained based on frequencies of their contacts at different distances with protein-bound water molecules, in a non-redundant training data base of 1776 proteins with known 3D structures. Protein hydration sites were predicted in a test set of 12 proteins with experimentally determined water locations. The MCRS greatly improves prediction of water locations over existing methods. In addition, the contribution of the energy of macromolecular solvation into total folding free energy was estimated, and tested in fold recognition experiments. The correct folds were preferred over all the misfolded decoys for the majority of proteins from the improved Rosetta decoy set based on the structure hydration energy alone.
MCRS atomic hydration potentials provide a detailed distance-dependent description of hydropathies of individual protein atoms. This allows placement of water molecules on the surface of proteins and in protein interfaces with much higher precision. The potentials provide a means to estimate the total solvation energy for a protein structure, in many cases achieving a successful fold recognition. Possible applications of atomic hydration potentials to structure verification, protein folding and stability, and protein-protein interactions are discussed.
Most biochemical and biophysical processes take place in aqueous solutions. Interaction with water is the dominant force driving protein folding, providing approximately 90% of the total structure stability [1, 2]. In many cases biological functions of macromolecules crucially depend on specific bound water molecules [3, 4]. Water molecules bound in macromolecular interfaces significantly contribute to recognition of proteins by other proteins or DNA . Successful ligand design also requires consideration of bound water molecules [6, 7].
Protein solvation at the molecular level has been studied with different approaches varying from first principle modeling [1, 8–10] to evolutionary considerations . Here, we present a new approach to this important problem, which is based upon knowledge-based potentials (KBP) that proved to be efficient for modeling atomic interactions in biopolymers [12, 13].
KBP are heuristic constructions  measuring the tendency of particular atoms and residues to form close contacts or to avoid each other in a macromolecular 3D structure. Statistical preferences of structure variables such as atom-atom contact distances in the conformation space are measured by log likelihood ratio . These preferences can also be measured in energy units, when multiplied by a kT factor, the absolute temperature times Boltzmann constant . The quasi-energy (the statistical preference) then takes the form of the Boltzmann equilibrium energy distribution:
Here f obs (d) is the observed frequency of contacts between atoms of two considered types at distance d in the database of macromolecular structures. Preferable atomic contact distances correspond to higher frequencies of atom contacts observed at this distance. It should be noted, that physical meaning of KBPs is not clearly defined, for instance, incorrect is a direct interpretation of (1) as Boltzmann distribution in some energy field . In this study, atom types were defined as both residue- and atom-specific, e.g., a CA_Val atom type was assigned for the Cα-atom of valine. The expected frequency of atom pair contacts f exp (d) is evaluated for a virtual state where the atoms do not interact, the so-called reference state. Calculation of the observed frequency f obs is relatively straightforward, although it depends on selection of the training data set, the binning procedure, and other technicalities. The principal difference between methods for construction of knowledge-based potentials lies in the definition of the reference state for calculation of f exp (d) [15, 18–21].
Historically, KBPs were developed in parallel with construction of more complex and elaborate reference states. In the pioneering work of Tanaka and Scheraga , equations for complex formation at low concentrations were used as an implicit reference state. The quasi-chemical approximation reference state  employed an equilibrium mixture of unconnected residues to derive distance-independent inter-residue contact energies. Sippl  introduced distance-dependent potentials, and explicit accounting for chain connectivity. The "uniform density" reference state in  was based on the distribution of distances between all residue pairs of different types separated by k residues in the protein backbone averaged over a large training set of known structures. Subsequently, Samudrala and Moult  used conditional probability formalism, evaluating the probability for two atoms a and b to be found within a particular distance shell or bin in a correctly folded structure. Skolnick et al  introduced their composition-corrected reference state, with a direct dependence on the amino acid composition of individual structures from the training set.
In all the examples above, reference states were constructed via averaging of observed distances over pairs of residues of all possible types. Such atom type-averaged reference states tend to underestimate the interactions common to all types of atoms, such as atom repulsion at close distances or protein compaction due to solvent expulsion. However, inclusion of additional terms can compensate the effects of protein packing and compactness .
At small distances KBP are also plagued by small statistics of observed interactions. Therefore, the atom-atom contacts at distances closer than 2.0–3.0 angstrom are usually treated separately with the interaction potential at such distances set to some arbitrary prohibitive value. In the "ideal gas" reference state  which is not atom type averaged, the potential is also calculated separately for distances greater than 3.0 Å. This approach is based on uniformly distributed points in finite spheres with a complicated empirical dependence of a sphere radius based on the length of the protein backbone.
In this work, we present a novel method for construction of the reference state, which we have called Monte Carlo Reference State (MCRS). This method utilizes random 3D points in the structure volume as (by definition) non-interacting probes for calculation of the expected contact probability density distance distribution. These random probes are evenly distributed in the structure space regardless of the structure elements, thus providing a true zero interaction energy reference for atom-atom interaction. Monte Carlo methods have been long and successfully used in protein folding modeling ([22, 25, 26]) and atom interaction studies . However, to our knowledge, no attempts have been made to use these methods for construction of the reference state.
Although MCRS technique can also be applied to produce knowledge-based potentials for interaction of protein atoms with each other, we used it to obtain KBP for hydration of protein atoms (atom hydration potentials, or AHP), based on the statistics of their contacts with structure-bound water molecules. Any atom locally distorts the interaction-free hydration density, obtained using MCRS. The direction and magnitude of this influence depends on the atom-water contact distance and on the hydropathy of the atoms of this type. Our aim in obtaining the AHP was to estimate the atomic hydropathy, quantitatively and in a distance dependent manner, for all types of atoms found in proteins.
Since water molecules are small and are not connected to the protein backbone, their location is primarily determined by intermolecular interactions of non-covalent nature. There are also comparatively many structure water molecules in the PDB database. Thus, the defined set of structural water contacts with different amino acids is relatively large, providing statistical power for our study. This suggests that KPBs for protein atom – water interaction can be obtained with greater detail and with less distortion than KPBs for residue-residue interaction used in protein folding.
Water distribution in structural space
The probability curve depends substantially on the size and the shape of the protein molecule, and on water content and distribution. Bell-shaped hydration distance distribution curves are typical for roughly globular structures. In contrast, a bimodal distribution is obtained with the hourglass-shaped FE hydrogenase (1HFE) subunit S, and the low-value, long-range distribution of 1JSU, subunit C reflects an elongated 3D structure. Because of this variability, we believe that it is not possible to generate a reliable reference state by measuring atom-water distances in the aggregate for the entire training set, or by using common geometric parameters for some generalized average structure. In our approach, the distance distribution of atom-water contacts observed in a given structure is normalized by the expected distance distribution from a random point to water for that structure. This procedure automatically takes into account variations in structure size, shape, bound water content, and pattern of water distribution.
Interaction of protein bound water molecules with each other
Although the structure of the liquid water and the distribution of the bound water molecules in protein structures are not directly related, the Fobs/Fexp ratio we obtained for protein crystal structure water agrees remarkably well with the g OO (r) plot for liquid water. The higher g OO (r) levels observed for protein crystals may be due to the excluded volume taken by protein atoms, which biases the expectation, and the fact that water molecules group around charged protein atoms, increasing the chances of observing one water molecule in the vicinity of another. The effective repulsion in the medium range in the g OO (r) for interacting structure water molecules is probably non-physical. In bulk water it appears to be a collective effect most likely related to formation of clusters of coordinated water molecules .
The three peaks at 2.75 Å, 4.5 Å and 7 Å reflect the layers of structured water molecules that are also present in water distance distribution around hydrophilic oxygen atoms within different protein groups as it can be seen from Figure 3(a).
Interaction of bound water molecules with protein atoms of different types
Comparison of the profiles shown in Figure 4(a),(b) and 4(c) reveals that the location of the first maximum of preferable hydration distance is determined by the van der Waals radii of the water oxygen and the corresponding protein atom. The optimal hydration distance of carbon atoms (3.7 Å, Figure 4(c)) is considerably larger than that of nitrogen and oxygen atoms (2.8 Å, Figure 4(a) and 4(b)). For oxygen and nitrogen, the secondary maximum is also present at about 4.9 Å and 5 Å respectively, with an additional maximum at 7.2 Å. These additional maxima for oxygen are much sharper than the corresponding maxima in liquid water. This likely reflects stabilization of the structural water sites by the protein crystal, which causes better molecular resolution. In addition, the protein hydration shell itself may be more structured than the liquid water . The first peak for oxygen and nitrogen atoms is generally narrower than for carbon atoms. This is probably related to the ability of hydrophilic oxygen and nitrogen atoms to form hydrogen bonds with water.
Atoms of similar types, but from different amino acids, may have different AHP, which reflects preferences for amino acid location in the protein hydrophobic core or at the surface, and the average solvent accessibility. For example, Ca atom of isoleucine has a very different hydration distance plot than the Ca atom of aspartate (Figure 4(c)). However, even the most hydrophobic residues and atoms, such as Cδ of isoleucine or Cε of tyrosine (Fig. 4(d,e)), are occasionally exposed in solvent even in folded proteins, and thus exhibit the first peak at the van der Waals distance. One can see from Figure 4(b), that Nz atom of lysine typically attracts significantly more water molecules than the backbone nitrogen of histidine. This happens because the Nz of lysine has a substantial partial charge, and because this atom at the tip of a long side chain is more solvent-accessible. Figure 4(c) also shows how a partial charge at a carbon atom (Cγ atom of aspartate) shifts the first hydration maximum to a smaller distance, most likely because the electrostatic forces of ion solvation attract water molecules. This effect is less pronounced for nitrogen or oxygen atoms.
The behavior of AHP at large distances also reflects the formation of hydrophobic core of the protein globule. For example, atoms within hydrophilic residues have a sharp peak at short distances, but a long basin in the range of 8–25 angstroms (Figure 4(e)). In contrast, atoms within hydrophobic residues have a much lower peak at short distances, and a very broad elevation in the 8–25Å range. This elevation is probably attributable to the typical distance from a residue buried in a protein hydrophobic core to the hydrated surface of the protein.
Residues with symmetric side chains and atoms, such as symmetric carbons in the aromatic rings of tyrosine or phenylalanine, provide a sort of a natural test for statistical atom contact potentials. These atoms are far enough from each other to have independent statistics of water contact distances, yet there is no physical reason why these chemically identical atoms in undistinguishable positions should have different potentials. Figure 4(d)) shows that indeed they are very similar. It also demonstrates the magnitude of the statistical error in the AHP obtained using MCRS.
Prediction of hydration sites in protein structures
Prediction of hydration sites in 16 structures
Number of Amino Acids
Structural waters (experimental)
Hydration sites predicted
Water prediction average RMSD
As can be seen from the Table 1, the number of predicted hydration sites is equal to twice the number of the experimental water positions in that structure. In cases where the number of the lowest energy putative hydration sites equaled the number of reported waters for the structure, usually approximately 50 per cent of the experimental hydration sites were closely reproduced (with the distance from the structure water to the nearest predicted hydration site less than 1 Å). The percentage of correctly predicted structure waters was about 75–90% depending on the structure with the number of predicted hydration sites equal to twice of the number of real waters. This over-prediction is inevitable due to the fact that the number of structure-bound water molecules, reported by the structure depositors, may vary very much for similar structures. This problem is also addressed in 'the Discussion' section below. Routinely, it takes the number of putative hydration sites 3 to 10 times more than the number of reported waters, to reproduce most water locations . A level of over-prediction as high as 40 times may be necessary to reproduce bound water well . Predictions using AHP for the same structures have higher precision, a level of over-prediction reduced 2- to 5- fold, and are less demanding computationally.
Decoy selectivity experiments
It is well-known  that hydrophobic interactions are decisive factors in protein folding. Thus, we tested the power of the AHP we had generated to select correct protein folds exclusively on the basis of hydration energy, using a decoy selectivity test. We used Decoys'R'Us database  used in  for evaluation of protein models by atomic solvation preference, and in [38, 39] for fold recognition tests of an effective energy function using a Gaussian model for the solvation free energy.
To evaluate the differences in the hydration energy between the native and the decoy conformations, we introduced a geometric hydration shell (HS), defined as the 3D area in which water molecules can closely approach a heavy protein atom. Monte Carlo methods were used both to construct the HS, and to evaluate the number of water molecules within the HS (please see Methods section for the details).
Decoy recognition based on total structure hydration energy estimates
Hydration Shell Energy Average, kT
Native to Decoy ΔG, kT\Z-Score
Inspired by the results of AHP application in this relatively simple decoy set, we tried a more demanding test. We used the improved Rosetta decoy set , which has an increased frequency of near native models and is considered to be a well-constructed decoy set obtained by large-scale comparative modeling . Each of 41 native structures in the Rosetta set has between 1724 and 1900 decoys. Results are given in Table 2, lines 24–64. For 26 proteins, the native structure had the lowest hydration energy, often separated from those of all the decoys with a considerable energy gap amounting up to several standard deviations. Several of the structures tested could not be satisfactorily distinguished from their decoys using AHP. After close inspection we found that 'good' structures were mostly globular proteins with distinct hydrophobic cores and hydrophilic surfaces. Among unrecognized structures, e.g. 1ptq has a large hydrophobic patch at its surface and lacks a clearly defined hydrophobic core, whereas 1a32 has a loose packing with a number of hydrophobic groups accessible to the solvent. An interesting example is 1utg, which is a shell-shaped structure, with practically all hydrophobic residues at its concave side, thus providing a large hydrophobic momentum. The structure as it is given in the test set has all these hydrophobic residues accessible to the solvent, whereas in the native state, the protein forms a dimer, with its concave sides pressed against each other, forming a small hydrophobic cave. In our opinion, this test reveals the limits of applicability of our potentials in the field of the fold recognition, as mostly to soluble globular proteins.
Comparison of atomic hydration potentials with other methods
Commonly used knowledge based approaches to modeling protein solvation are based on amino acid hydrophobicity or solvation preference, and solvation accessible surface area (SASA) calculations [42, 43]. It has been noted that in protein-structure selections, all-atom based potentials perform better than residue-based potentials, and distance-dependent potentials better than distance-independent ones [, and the references therein]. In contrast to SASA-based methods, in AHP approach, no rigid atomic radii are set. Distance-dependent potentials allow considering spatial hydration shell rather than just the surface of the molecule. This may have an advantage in certain conformational scenarios, not distinguishable in terms of surface area.
A recent comparison  of five different protein solvation models, including a grid-based finite difference Poisson-Boltzmann procedure, demonstrated that the empirical atomic solvation model performed better than all the other models. At the same time  reports that protein design presents a particularly challenging test for implicit solvation models because it requires accurate estimates of the solvation contribution of individual residues. AHP introduce elaborate, distance-dependent representation of amphiphilic properties of individual protein atoms, which we believe may have important role in many practical applications.
First-principle physical models yield good approximations for water structure around solvated ions in solution , and have been applied to prediction of structural waters around protein globules [32, 9]. The MCRS approach we have developed may have a number of advantages in addressing this problem. First, it allows calculation of the expected contact densities for any atom pair, to any desired precision, reducing the statistical error inevitable for the atom-type-averaged reference states. Second, it uses an individualized approach for each structure, taking into account protein shape, atom molar fractions and spatial distribution of atoms. Third, the MCRS scheme also generates detailed empirical short range atom interaction potentials. Figure 4(c) illustrates that AHP can characterize atomic interaction even at sub-van der Waalse distances, thus providing a unique capacity for quantitative description of atom contacts in this important distance range.
In principle, the MCRS allows one to obtain the expected atom-atom contact probability density distance distribution starting from zero distance, and for any individual macromolecular structure. The accuracy of the expected probability density estimates depends on the number of the random probes, which in turn is limited only by the computational power.
Oxygen-oxygen distribution function in structural and bulk water
One particularly interesting result of this study is the oxygen-oxygen radial distribution function of the structural water. As Figure 3 shows, this function exhibits three peaks at similar distances to those observed in the oxygen-oxygen radial distribution function gOO(r) obtained experimentally for the liquid water . The protein structure is likely to stabilize locations of bound water molecules , which agrees with the taller and narrower first peak in the distribution. However, water molecules in bulk liquid probably also form large clusters with quasi-stable structure [46, 47]. The coincidence of main maxima in oxygen-oxygen distance distribution agrees with the hypotheses that the water around the protein basically retains its non-trivial structure found in the liquid [1, 29, 30].
The remarkable agreement demonstrated by Figure 3 not only indirectly justifies the accuracy of our technique, but also allows us to compare the packing of water molecules in liquid water and in protein crystals. Interestingly, while Molecular Dynamics simulation of pure liquid water reproduces well the experimental radial distribution function (and other physical properties) in water  and simple solutions , the same function of radial oxygen-oxygen contact frequency, obtained during MD simulation of solvated protein crystals, may have maxima locations and amplitudes different from those obtained for pure water .
The quality of training data of structure water location and prediction of protein hydration sites
Evaluation of the quality of the prediction of structural waters reported in a PDB structure is not trivial. Favorable hydration locations can produce a diffuse electron density and be excluded from the structure . Thus, only a fraction of strongly bound water molecules in protein crystals is usually reported. We observed that the number of waters detected for similar structures may vary by two orders of magnitude (for instance, compare two structures of the same HIV-1 protease-inhibitor complexes: 2BPY with 115 reported water molecules, and 1HVK with just one). Second, a large fraction of the structural waters reported for X-ray structures are probably stabilized not only by the protein molecule, but by the crystal unit cell. Indeed,  reported that only 17 structural water molecules were found at the same sites in nine crystals of ribonuclease A resolved from five different space groups and containing from 88 to 188 water molecules. Similar results have been published for other proteins [51, 52] and discussed in . The extremely variable water content reported for different proteins most likely reflects the fact that not all suitable sites are occupied, and this obviously hinders verification of location of hydration sites. Nevertheless, we believe that despite a considerable noise in the input data, our procedure still affords to capture the essential aspects of the interaction potentials at the atomic level provided enough statistical data, as illustrated by Figure 3(d), and allows us to provide a valuable prediction of putative structural water sites.
KBP for analysis of known structures
Use of the AHP allows calculation of the probability that a water molecule will occupy a certain site, or, equivalently, the proportion of time that a molecule of water is present within a certain volume in the protein environment. Moreover, individual inputs from atoms and residues into, for instance, transmembrane channel permeability or a fold solvational stability can be estimated.
MCRS KBP at short and large distances
AHP at short distances offer a high resolution water assignment. At large distances (see Figure 4(f)), the potentials allow applying them to model cooperative processes like protein folding. For decoy recognition and similar applications, the behavior of KBP at large distances (Figure 4e,f) is important. These distances are comparable with the average radius of a protein globule, thus global structures as a hydrophobic core may be discerned. At these distances, the knowledge-based potentials work as statistical models, rather than provide a description of pairwise physical interactions.
Possible applications of the KBP for protein folding and interaction
Solvation term is the single most important factor, usually providing about 90% of the total structure stability . Models for protein folding were even proposed which included only the hydrophobic interactions of amino acids .
In the decoy recognition tests we evaluated the change in the hydration energy of the entire structure that accompanies changes in protein conformation. These tests ignored any intra-molecular interaction, i.e. between different protein atoms and groups. As such, evaluation of hydration energy with AHP can be complementary to other potential sets and atomic force fields in estimation of structure stability, by adding an explicit accounting of the solvation forces.
Accounting for hydration changes at the atomic level is crucial for interaction of macromolecules. Even indirect account for water mediated interactions improve significantly the quality of analysis of interaction of supersecondary structural elements in protein folding, which probably is true also for protein binding .
AHP also allow calculating the desolvation energies accompanying mutual occlusion of different parts of hydration shells during protein-protein or protein-DNA interaction and ranking of different binding variants with regard to desolvation.
We present a novel method for constructing the reference state for knowledge-based inter-atomic potentials, based on a Monte Carlo technique (MCRS). Using it, we have developed original potentials for protein atom hydration. The new potentials provide a detailed quantitative description of atom hydropathies in a distance-dependent manner. Using the new potentials allows calculated placement of individual water molecules in a protein environment, and estimation of the (de)solvation energy changes accompanying protein conformational changes and interaction.
The training database contained 1776 3D structure of proteins and protein subunits, with sequence identity between any pair of proteins less than 25% . For each structure, all explicit water molecules located within 4.5Å from any structural atom were collected. Figure 1 illustrates this selection of the cutoff distance. Coordinates of only the electron-dense oxygen atoms were used for water molecules.
Crystallographically symmetrical positions were calculated for all water molecules, using the unit cell data and symmetry transformation operators provided in the PDB file. If such transformation yielded a symmetrical location, which was at sterically acceptable but closer distance to any of the protein atoms, the new position was adopted for the corresponding water molecule.
Distances from each protein atom to each water molecule were calculated and stored for each atom type. Our definition of atom type takes into account the specific amino acid context and the elemental identity, e.g., a Cα-atom of valine is different from Cα-atom of leucine, different from Cβ-atom of leucine. No effort was made to group atom types on the basis of their hydropathy properties, or to cluster them depending on their determined hydration potentials.
Constructing the Monte Carlo Reference State
For each structure in the training set, probes with random coordinates were generated in the rectangular box encompassing the structure plus 2.5 Å in every direction. The number of probes was sufficient to produce a stable atom-probe distance distribution, and typically was 100-fold greater than the number of atoms in the structure. We discarded isolated probes separated by more than 2.5 Å from any heavy protein atom. For each of the remaining probes, a distance distribution to all the explicit water molecules was computed and stored. This quantity was aggregated for all probes, and divided by the number of probes, to obtain the structure-dependent empirical distribution of the probability to find a water molecule at the given distance from a random point in the structure volume. From this empirical distribution the smoothed probability distribution, or hydration probability density distance distribution for structure-specific MCRS was calculated using Parzen kernels . This approach is known to give a consistent estimation of probability density distribution and has been successfully used in a study of statistical preferences in protein structures . Several examples of such distributions are given in Figure 2.
Calculating knowledge-based atom hydration potentials
For each structure in the training set and for every atom type we calculated distribution of atom-water contact distances. This was done in a manner similar a MCRS calculation, but with structural atoms of a particular type used instead of random probes. For each test atom type and for each structure, the observed atom-water distance distribution was divided by the expected atom-water distance distribution derived for that structure using MCRS. The resulting likelihood ratios were summed for all structures in the training set. This aggregate was divided by the overall number of the test atoms in the training set and smoothed using Parzen kernels . Examples of such empirical normalized distance distributions are given in Figure 4(a–e) and are also provided in the Additional files [see Additional file 1]. A logarithm of this value multiplied by minus kT factor produces the KBP for hydration of the test atom type (Figure 4(f)). The distance-dependent hydration potentials thus obtained quantitatively describe atomic hydropathies of all atom types found in proteins, DNA and some hetero groups, such as ions, hem groups etc. Characteristic features of the potentials were robust in spite of variations in the training database composition; to establish this point, the training protein structure dataset was split into various subsets, and the potentials were recalculated for each subset separately.
Prediction of location of structural water molecules
For this test, we selected 16 diverse protein structures based on criteria including the published computational predictions of water locations; a small to medium structure size; overall structure quality, and the number of structure water molecules. Water molecules were pre-filtered to remove those with any heavy protein atom closer than 2.0Å, or with no protein atoms within 4.5Å; such waters in the 16 structures were few, or none. Virtual probes were placed at the nodes of a 0.2 Å cubic grid encompassing complete structure volume. Nodes with any heavy protein atom closer than 1.8Å, or with no protein atoms within 4.5Å, were discarded. For all the remaining probes, the local hydration energy was estimated as the sum of hydration potentials contributed by all protein atoms within 15 Å. The grid step of 0.2 Å ensured that the estimated aggregate hydration energy was comparable at adjacent grid nodes.
In the next step, the probe with the hydration energy minimum for the entire structure was selected and all nodes within 2.6 Å around it were removed. This was repeated for the next best minimal energy probe, and continued until only the probes located at local minima of hydration energy and not closer to each other than 2.6 Å remained. The minima positions were further corrected via iterative steepest descent energy minimization algorithm. This rapidly converged to very small displacements of probes per one step. Inter-probe water-water interactions (Figure 3) were added during this stage. The resulting set of predicted structural water positions proved robust to variations in the grid step and to the initial placement of probes. For instance, a very close set of predicted structure water locations was obtained when a comparable number of probes were initially located at random points in the structure space.
The number of predicted hydration sites (lowest hydration energy probes) for each of the 16 structures (Table 1) was set to twice the number of the experimentally determined structural water molecules for the corresponding structure (please see Discussion section for the explanation of the increased number of probes in the prediction). The positions of predicted probes were compared to those of the experimental structural water molecules.
The prediction quality was assessed using a Z-score calculated as:
z = (RMSDrandom - RMSDpredicted)/srandom,
where RMSDpredicted is the root mean square deviation of predicted hydration sites from the crystal structure water locations, and RMSDrandom and srandom are respectively the mean similar quantity calculated for the same number of random probes in the structure volume, and its standard deviation. The latter is obtained in a computational experiment with the same steric limitations imposed on the probe – protein atom distances as used during prediction.
Decoy selectivity experiments
Randomly placed probes were generated in a rectangular box containing the structure with an extra 4.5Å added from all sides. Probes with any heavy protein atom closer than 2.5Å or with no protein atoms within 4.5Å were discarded. The remaining probes were considered as belonging to the protein hydration shell (HS). For simplicity, the volume of this shell was estimated via the ratio of the number of probes assigned to the HS to the total number of probes created in the box. To discriminate between the native fold and the decoys, per-volume average hydration energy of probes in the hydration shell was selected as a criterion. Table 2 summarizes the results.
We are grateful to Dr. Pavel Novichkov, Prof. Vladimir Tumanyan and Dr. Natalia Esipova for many fruitful discussions. We are also grateful to Ivan Kulakovsky and Svetlana Lobacheva for their help with the website. Many thanks to Prof. Erica Golemis for careful reading of the manuscript and many remarks. This study was partially supported by Russian Academy of Sciences Program in Molecular and Cellular Biology.
- Makarov VA, Andrews BK, Smith PE, Pettitt BM: Residence times of water molecules in the hydration sites of myoglobin. Biophys J 2000, 79(6):2966–2974.PubMed CentralView ArticlePubMedGoogle Scholar
- Li X, Liang J: Knowledge-based energy functions for computational studies of proteins. In Computational Methods for Protein Structure Prediction and Modeling. Edited by: Xu Y, Xu D, Liang J. Springer-Verlag ; 2006.Google Scholar
- Garczarek F, Gerwert K: Functional waters in intraprotein proton transfer monitored by FTIR difference spectroscopy. Nature 2006, 439(7072):109–112. 10.1038/nature04231View ArticlePubMedGoogle Scholar
- Schneider B, Berman HM: Hydration of the DNA bases is local. Biophys J 1995, 69(6):2661–2669.PubMed CentralView ArticlePubMedGoogle Scholar
- Jayaram B, Jain T: The role of water in protein-DNA recognition. Annu Rev Biophys Biomol Struct 2004, 33: 343–361. 10.1146/annurev.biophys.33.110502.140414View ArticlePubMedGoogle Scholar
- Garcia-Sosa AT, Mancera RL: The effect of a tightly bound water molecule on scaffold diversity in the computer-aided de novo ligand design of CDK2 inhibitors. J Mol Model (Online) 2006, 12(4):422–431. 10.1007/s00894-005-0063-1View ArticleGoogle Scholar
- Raymer ML, Sanschagrin PC, Punch WF, Venkataraman S, Goodman ED, Kuhn LA: Predicting conserved water-mediated and polar ligand interactions in proteins using a K-nearest-neighbors genetic algorithm. J Mol Biol 1997, 265(4):445–464. 10.1006/jmbi.1996.0746View ArticlePubMedGoogle Scholar
- Makarov V, Pettitt BM, Feig M: Solvation and hydration of proteins and nucleic acids: a theoretical view of simulation and experiment. Acc Chem Res 2002, 35(6):376–384. 10.1021/ar0100273View ArticlePubMedGoogle Scholar
- Halle B: Protein hydration dynamics in solution: a critical survey. Philos Trans R Soc Lond B Biol Sci 2004, 359(1448):1207–23; discussion 1223–4, 1323–8. 10.1098/rstb.2004.1499PubMed CentralView ArticlePubMedGoogle Scholar
- Petukhov M, Rychkov G, Firsov L, Serrano L: H-bonding in protein hydration revisited. Protein Sci 2004, 13(8):2120–2129. 10.1110/ps.04748404PubMed CentralView ArticlePubMedGoogle Scholar
- Wang JY, Lee HM, Ahmad S: Prediction and evolutionary information analysis of protein solvent accessibility using multiple linear regression. Proteins 2005, 61(3):481–491. 10.1002/prot.20620View ArticlePubMedGoogle Scholar
- Tanaka S, Scheraga HA: Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins. Macromolecules 1976, 9(6):945–950. 10.1021/ma60054a013View ArticlePubMedGoogle Scholar
- Russ WP, Ranganathan R: Knowledge-based potential functions in protein design. Curr Opin Struct Biol 2002, 12(4):447–452. 10.1016/S0959-440X(02)00346-9View ArticlePubMedGoogle Scholar
- Koppensteiner WA, Sippl MJ: Knowledge-based potentials--back to the roots. Biochemistry (Mosc) 1998, 63(3):247–252.Google Scholar
- Godzik A: Knowledge-based potentials for protein folding: what can we learn from known protein structures? Structure 1996, 4(4):363–366. 10.1016/S0969-2126(96)00041-XView ArticlePubMedGoogle Scholar
- Finkelstein AV, Gutin AM, Badretdinov AY: Perfect temperature for protein structure prediction and folding. Proteins 1995, 23(2):151–162. 10.1002/prot.340230205View ArticlePubMedGoogle Scholar
- Ben-Naim A: Statistical potentials extracted from protein structures: Are thesemeaningful potentials? J Chem Phys 1997, 107(9):3698–3706. 10.1063/1.474725View ArticleGoogle Scholar
- Miyazawa S, Jernigan RL: Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 1985, 18: 534–552. 10.1021/ma00145a039View ArticleGoogle Scholar
- Sippl MJ: Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol 1990, 213(4):859–883. 10.1016/S0022-2836(05)80269-4View ArticlePubMedGoogle Scholar
- Skolnick J, Kolinski A, Ortiz A: Derivation of protein-specific pair potentials based on weak sequence fragment similarity. Proteins 2000, 38(1):3–16. 10.1002/(SICI)1097-0134(20000101)38:1<3::AID-PROT2>3.0.CO;2-SView ArticlePubMedGoogle Scholar
- Zhou H, Zhou Y: Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 2002, 11(11):2714–2726. 10.1110/ps.0217002PubMed CentralView ArticlePubMedGoogle Scholar
- Tanaka S, Scheraga HA: Model of protein folding: inclusion of short-, medium-, and long-range interactions. Proc Natl Acad Sci U S A 1975, 72(10):3802–3806. 10.1073/pnas.72.10.3802PubMed CentralView ArticlePubMedGoogle Scholar
- Samudrala R, Moult J: An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J Mol Biol 1998, 275(5):895–916. 10.1006/jmbi.1997.1479View ArticlePubMedGoogle Scholar
- Miyazawa S, Jernigan RL: Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 1996, 256(3):623–644. 10.1006/jmbi.1996.0114View ArticlePubMedGoogle Scholar
- Dinner AR, Sali A, Smith LJ, Dobson CM, Karplus M: Understanding protein folding via free-energy surfaces from theory and experiment. Trends Biochem Sci 2000, 25(7):331–339. 10.1016/S0968-0004(00)01610-8View ArticlePubMedGoogle Scholar
- Shimada J, Shakhnovich EI: The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Proc Natl Acad Sci U S A 2002, 99(17):11175–11180. 10.1073/pnas.162268099PubMed CentralView ArticlePubMedGoogle Scholar
- McGreevy RL Pusztai, L.: Reverse Monte Carlo simulation: a new technique for the determination of disordered structures. Molec Simul 1988, 1: 359–367.View ArticleGoogle Scholar
- Head-Gordon T, Hura G: Water structure from scattering experiments and simulation. Chem Rev 2002, 102(8):2651–2670. 10.1021/cr0006831View ArticlePubMedGoogle Scholar
- Chaplin M: Water structure and behavior [http://www.lsbu.ac.uk/water/index.html]. 2006.Google Scholar
- Robinson GW, Cho CH: Role of hydration water in protein unfolding. Biophys J 1999, 77(6):3311–3318.PubMed CentralView ArticlePubMedGoogle Scholar
- van Gunsteren WF, Berendsen HJ, Hermans J, Hol WG, Postma JP: Computer simulation of the dynamics of hydrated protein crystals and its comparison with x-ray data. Proc Natl Acad Sci U S A 1983, 80(14):4315–4319. 10.1073/pnas.80.14.4315PubMed CentralView ArticlePubMedGoogle Scholar
- Henchman RH, McCammon JA: Extracting hydration sites around proteins from explicit water simulations. J Comput Chem 2002, 23(9):861–869. 10.1002/jcc.10074View ArticlePubMedGoogle Scholar
- Gelpi JL, Kalko SG, Barril X, Cirera J, de La Cruz X, Luque FJ, Orozco M: Classical molecular interaction potentials: improved setup procedure in molecular dynamics simulations of proteins. Proteins 2001, 45(4):428–437. 10.1002/prot.1159View ArticlePubMedGoogle Scholar
- Marrone TJ, Resat H, Hodge CN, Chang CH, McCammon JA: Solvation studies of DMP323 and A76928 bound to HIV protease: analysis of water sites using grand canonical Monte Carlo simulations. Protein Sci 1998, 7(3):573–579.PubMed CentralView ArticlePubMedGoogle Scholar
- Finkelstein AV, Ptitsyn O: Protein Physics: A Course of Lectures (Soft Condensed Matter, Complex Fluids and Biomaterials). London, San Diego , Academic Press; 2002.Google Scholar
- Samudrala R, Levitt M: Decoys 'R' Us: a database of incorrect conformations to improve protein structure prediction. Protein Sci 2000, 9(7):1399–1401.PubMed CentralView ArticlePubMedGoogle Scholar
- Holm L, Sander C: Evaluation of protein models by atomic solvation preference. J Mol Biol 1992, 225(1):93–105. 10.1016/0022-2836(92)91028-NView ArticlePubMedGoogle Scholar
- Lazaridis T, Karplus M: Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J Mol Biol 1999, 288(3):477–487. 10.1006/jmbi.1999.2685View ArticlePubMedGoogle Scholar
- Lazaridis T, Karplus M: Effective energy function for proteins in solution. Proteins 1999, 35(2):133–152. 10.1002/(SICI)1097-0134(19990501)35:2<133::AID-PROT1>3.0.CO;2-NView ArticlePubMedGoogle Scholar
- Tsai J, Bonneau R, Morozov AV, Kuhlman B, Rohl CA, Baker D: An improved protein decoy set for testing energy functions for protein structure prediction. Proteins 2003, 53(1):76–87. 10.1002/prot.10454View ArticlePubMedGoogle Scholar
- Dong Q, Wang X, Lin L: Novel knowledge-based mean force potential at the profile level. BMC Bioinformatics 2006, 7: 324. 10.1186/1471-2105-7-324PubMed CentralView ArticlePubMedGoogle Scholar
- Fleming PJ, Fitzkee NC, Mezei M, Srinivasan R, Rose GD: A novel method reveals that solvent water favors polyproline II over beta-strand conformation in peptides and unfolded proteins: conditional hydrophobic accessible surface area (CHASA). Protein Sci 2005, 14(1):111–118. 10.1110/ps.041047005PubMed CentralView ArticlePubMedGoogle Scholar
- Eisenberg D, McLachlan AD: Solvation energy in protein folding and binding. Nature 1986, 319(6050):199–203. 10.1038/319199a0View ArticlePubMedGoogle Scholar
- Jaramillo A, Wodak SJ: Computational protein design is a challenge for implicit solvation models. Biophys J 2005, 88(1):156–171. 10.1529/biophysj.104.042044PubMed CentralView ArticlePubMedGoogle Scholar
- Tan ML, Lucan L, Ichiye T: Study of multipole contributions to the structure of water around ions in solution using the soft sticky dipole-quadrupole-octupole (SSDQO) model of water. J Chem Phys 2006, 124(17):174505. 10.1063/1.2177240View ArticlePubMedGoogle Scholar
- Watterson JG: The role of water in cell architecture. Mol Cell Biochem 1988, 79(2):101–105. 10.1007/BF02424550View ArticlePubMedGoogle Scholar
- Doye JP, Wales DJ: Polytetrahedral clusters. Phys Rev Lett 2001, 86(25):5719–5722. 10.1103/PhysRevLett.86.5719View ArticlePubMedGoogle Scholar
- Yu H, Gunsteren WF: Charge-on-spring polarizable water models revisited: From water clustersto liquid water to ice. J Chem Phys 2004, 121(19):9549–9564. 10.1063/1.1805516View ArticlePubMedGoogle Scholar
- Wana S, Stote RH, Karplus M: Calculation of the aqueous solvation energy and entropy, as well as freeenergy, of simple polar solutes. J Chem Phys 2004, 121(19):9539–9548. 10.1063/1.1789935View ArticleGoogle Scholar
- Zegers I, Maes D, Dao-Thi MH, Poortmans F, Palmer R, Wyns L: The structures of RNase A complexed with 3'-CMP and d(CpA): active site conformation and conserved water molecules. Protein Sci 1994, 3(12):2322–2339.PubMed CentralView ArticlePubMedGoogle Scholar
- Wlodawer A, Deisenhofer J, Huber R: Comparison of two highly refined structures of bovine pancreatic trypsin inhibitor. J Mol Biol 1987, 193(1):145–156. 10.1016/0022-2836(87)90633-4View ArticlePubMedGoogle Scholar
- Finer-Moore JS, Kossiakoff AA, Hurley JH, Earnest T, Stroud RM: Solvent structure in crystals of trypsin determined by X-ray and neutron diffraction. Proteins 1992, 12(3):203–222. 10.1002/prot.340120302View ArticlePubMedGoogle Scholar
- Gerstein M, Chothia C: Packing at the protein-water interface. Proc Natl Acad Sci U S A 1996, 93(19):10167–10172. 10.1073/pnas.93.19.10167PubMed CentralView ArticlePubMedGoogle Scholar
- Ernst JA, Clubb RT, Zhou HX, Gronenborn AM, Clore GM: Demonstration of positionally disordered water within a protein hydrophobic cavity by NMR. Science 1995, 267(5205):1813–1817. 10.1126/science.7892604View ArticlePubMedGoogle Scholar
- Vaitheeswaran S, Yin H, Rasaiah JC, Hummer G: Water clusters in nonpolar cavities. Proc Natl Acad Sci U S A 2004, 101(49):17002–17005. 10.1073/pnas.0407968101PubMed CentralView ArticlePubMedGoogle Scholar
- Yu B, Blaber M, Gronenborn AM, Clore GM, Caspar DL: Disordered water within a hydrophobic protein cavity visualized by x-ray crystallography. Proc Natl Acad Sci U S A 1999, 96(1):103–108. 10.1073/pnas.96.1.103PubMed CentralView ArticlePubMedGoogle Scholar
- Harano Y, Kinoshita M: Translational-entropy gain of solvent upon protein folding. Biophys J 2005, 89(4):2701–2710. 10.1529/biophysj.104.057604PubMed CentralView ArticlePubMedGoogle Scholar
- Chan HS, Dill KA: Origins of structure in globular proteins. Proc Natl Acad Sci U S A 1990, 87(16):6388–6392. 10.1073/pnas.87.16.6388PubMed CentralView ArticlePubMedGoogle Scholar
- Papoian GA, Ulander J, Eastwood MP, Luthey-Schulten Z, Wolynes PG: Water in protein structure prediction. Proc Natl Acad Sci U S A 2004, 101(10):3352–3357. 10.1073/pnas.0307851100PubMed CentralView ArticlePubMedGoogle Scholar
- Hobohm U, Sander C: Enlarged representative set of protein structures. Protein Sci 1994, 3(3):522–524.PubMed CentralView ArticlePubMedGoogle Scholar
- Fukunaga K: Introduction to Statistical Pattern Recognition. New York , Academic Press; 1972.Google Scholar
- Sunyaev S, Kuznetsov E, Rodchenkov I, Tumanyan V: Protein sequence-structure compatibility criteria in terms of statistical hypothesis testing. Protein Eng 1997, 10(6):635–646. 10.1093/protein/10.6.635View ArticlePubMedGoogle Scholar