Energetics of the protein-DNA-water interaction
© Spyrakis et al; licensee BioMed Central Ltd. 2007
Received: 14 September 2006
Accepted: 10 January 2007
Published: 10 January 2007
To understand the energetics of the interaction between protein and DNA we analyzed 39 crystallographically characterized complexes with the HINT (Hydropathic INTeractions) computational model. HINT is an empirical free energy force field based on solvent partitioning of small molecules between water and 1-octanol. Our previous studies on protein-ligand complexes demonstrated that free energy predictions were significantly improved by taking into account the energetic contribution of water molecules that form at least one hydrogen bond with each interacting species.
An initial correlation between the calculated HINT scores and the experimentally determined binding free energies in the protein-DNA system exhibited a relatively poor r² of 0.21 and a standard error of ± 1.71 kcal mol-1. However, including the 261 waters that bridge protein and DNA improved the HINT score-free energy correlation to an r² of 0.56 and a standard error of ± 1.28 kcal mol-1. Analysis of the roles and energy contributions of water indicates that 46% of the bridging waters act as linkers between amino acids and nucleotide bases at the protein-DNA interface, while the remaining 54% are largely involved in screening unfavorable electrostatic contacts.
This study quantifies the key energetic role of bridging waters in protein-DNA associations. In addition, the relevant role of hydrophobic interactions and entropy in driving protein-DNA association is indicated by analyses of interaction character showing that, together, the favorable polar and unfavorable polar/hydrophobic-polar interactions (i.e., desolvation) mostly cancel.
Macromolecular recognition requires both geometric and chemical complementarity, eventually leading to the formation of a thermodynamically stable and specific complex between the interacting molecules. These aspects are key elements for understanding the function of biological systems: enzymes that bind substrates and effectors, proteins that mediate signal transduction via networks of alternative or specific protein-protein pairs, and nucleic acids that regulate protein expression via the binding of transcription factors, repressors and co-activators. In particular, site-specific associations between DNA and proteins regulate most biological events, with key involvement in transcription, replication and recombination. Matthews stated "the full appreciation of the complexity and individuality of each complex will be discouraging to anyone hoping to find simple answers to the recognition problem". A few years later, Draper was still asking "...how does a protein select a specific DNA site out of the many available, when all potential binding sites share such a high degree of structural similarity? Thermodynamic, as well as structural, approaches must be used to answer this question... ". Now, more than a decade later, no simple model for recognition between amino acids and nucleotides has been found [2, 4–7]. From the analysis of the first protein-DNA crystal structures it was evident that several distinct contributions lead to formation of the complex [8–10]: hydrogen bonds, electrostatic interactions, direct and indirect contacts between amino acids and the phosphates, sugars and bases, water-mediated contacts, hydrophobic effects, ion release, mutual conformational rearrangement, and bending and distortion.
Amongst the enthalpic contributions, hydrogen bonds are the most easily recognized and may energetically represent the bulk of the interactions between nucleic acids and proteins, involving protein backbone and side-chain groups that contact the edges of the bases and the polynucleotide backbone. About half of the hydrogen bonds found in known protein-DNA complexes involve phosphodiester oxygens, initially mediating indirect recognition between DNA and protein and favoring the subsequent localization of the protein at a specific site. In direct recognition, the foundation of sequence specificity, hydrogen bonds are formed between amino acid side-chains and DNA bases. Even earlier in the binding process, entropy plays a significant role in recognition, as non-specific (low affinity) interactions driven by long-range electrostatic forces bring the DNA and protein into proximity and cause the release of counter-ions from the free DNA. Water molecules in free and protein-bound DNA have also been thoroughly investigated, both experimentally and theoretically, and different roles have been proposed for them in interaction and recognition (see [14–16] and references therein).
While enthalpy is associated with the molecular interactions resulting from complex formation, entropy is associated with the multiple conformations of protein and DNA, changes in the structure of water and counterions, and other factors. This complexity, and the interplay between the multiple chemical and physical mechanisms needed to achieve the required level of specificity, are extremely difficult to describe quantitatively [17, 18]. Recent investigations using osmotic pressure have led to a determination of the differential role and number of water molecules released in specific and nonspecific binding of protein to DNA sequences [20, 21]. Some results from these studies do not appear to be supported by x-ray crystallographic data on specific and nonspecific protein-bound DNA complexes. Interestingly, osmotic pressure experiments suggest that in vitro studies in dilute solutions are likely to be less informative about in vivo processes than expected, owing to crowding and confinement effects [22, 23]. This, in turn, implies that the biological environment is relatively similar to that experienced by macromolecules within a crystal lattice [15, 24, 25]. Computational methods, which rely heavily on x-ray crystallographic data and are widely and successfully used to evaluate the energetics of ligand-protein interactions [26–29], should therefore also be applicable to understanding protein-DNA complex formation. The earliest attempts [3, 30] tried to estimate the contribution of each amino acid residue/nucleotide base pair to the total protein-DNA binding affinity. A different approach, proposed by Mandel-Gutfreund and Margalit, assumed that a global score reflecting the complementarity between a protein and its DNA target can be calculated from statistical analyses of the frequency of interactions between specific amino acid residue and nucleotide base types, thus implying additivity in binding energetics.
Other attempts to qualitatively, semi-quantitatively and/or quantitatively describe the interaction between protein and DNA [6, 7, 11, 14, 31–37] have taken advantage of the available three-dimensional crystallographic structures of proteins bound to DNA, a field pioneered by Matthews.
A wealth of information on the rules that govern biomolecular recognition is derived from structural data, predominantly x-ray crystallography and nuclear magnetic resonance. However, the analysis of the three-dimensional structure of a complex can only provide a geometric framework that ultimately needs quantitative evaluation of the binding energetics to enable assessment of codes, rules, and/or mechanisms. To date, pursuit of this goal has primarily focused on ligand-protein interactions due to the intense interest in designing compounds that bind selectively and with high affinity to therapeutically relevant enzyme or protein targets. Consequently, a variety of computational modeling approaches have been developed to obtain quantitative descriptions of ligand-protein encounters. Usually, the process is simplified by: i) considering only the volume specified by the active site; ii) assuming no or reduced conformational flexibility; and iii) neglecting the energetic contributions of water molecules (both with respect to their contribution to enthalpy and entropy).
To overcome some of these limitations, the HINT (Hydropathic INTeractions) force field was developed. HINT is based on LogPo/w, the partition coefficient of a species between 1-octanol and water, these solvents being models for the apolar and polar protein milieus, respectively. Because LogPo/w is a free energy parameter, its measurement takes into account both the enthalpic and entropic contributions originating from all molecules, including water, that participate in a biomolecular association, and solvent partitioning data are unique experimental measurements of intermolecular and interatomic interactions. The total interaction score for a complex (HTOTAL) is calculated with the following equation:
$$H_{TOTAL} = \sum_i \sum_j b_{ij} = \sum_i \sum_j \left( a_i S_i \, a_j S_j \, T_{ij} R_{ij} + r_{ij} \right), \qquad (1)$$
where $b_{ij}$ is the interaction score between atoms $i$ and $j$, $a$ is the atomic hydrophobic constant, $S$ the atomic solvent accessible surface area, $T_{ij}$ a logic function assuming +1 or -1 values depending on the polar nature of the interacting atoms, and $R_{ij}$ and $r_{ij}$ are functions of the distance between atoms $i$ and $j$. $R_{ij}$ is usually a simple exponential function, while $r_{ij}$ is an adaptation of the Lennard-Jones function. The key parameters $a$ are calculated by a procedure adapted from the CLOG-P method. Because the sum of all $a_i$ values is the LogPo/w for a molecule, each $a_i$ is a partial logP that can be considered a δg for solute transfer. If the "receptor" is changed from the 1-octanol/water solvent pair to a biomacromolecule with hydrophobic and polar regions, then, in a sense, the $a_i$ values represent atomic free energies of association, each encoding all aspects of free energy, both enthalpic and entropic. In HINT, favorable $b_{ij}$ interactions (hydrogen bonds, acid-base, hydrophobic-hydrophobic) are scored positively, while unfavorable contacts (acid-acid, base-base, hydrophobic-polar) are scored negatively. $H_{TOTAL}$, the sum of all $b_{ij}$ terms, describes the total interaction between the two species. In this way, the ligand-protein interaction is not separated into multiple factors by interaction type (e.g., hydrogen bond, hydrophobic, etc.), but is considered a concerted event, as it occurs in nature. Because the HINT analysis is carried out on biomolecular systems of known three-dimensional structure, geometric information is embedded in the procedure. We have applied this approach to the energetics of protein-ligand complexes, both in the absence and in the presence of water molecules that bridge protein and ligand, at constant pH and as a function of the ionization state of interacting groups [29, 42–44], to protein-protein interfaces [45, 46], and to ligand-DNA recognition [47–49].
Results from these diverse studies indicate that HINT is a powerful tool to quantitatively investigate and describe the energetics and specificity of biomolecular processes. It must be noted that HINT analysis evaluates the interactions between pre-formed molecules, and does not include terms for evaluating the internal energies of these molecules. These internal energies are certainly important components of overall binding free energy, but may be relatively invariant within a particular data set as we have reported [29, 42–49].
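To make the structure of eq. (1) concrete, here is a minimal sketch of the pairwise double sum. It is not HINT's actual parameterization: the `Atom` class, the three-way atom typing ("hydrophobic", "acid", "base"), the rule used for $T_{ij}$, the choice $R_{ij} = e^{-d}$ and the default of zero for the Lennard-Jones-like $r_{ij}$ term are all illustrative assumptions. Following the logP convention, hydrophobic atoms are given positive $a_i$ and polar atoms negative $a_i$.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class Atom:
    a: float                         # atomic hydrophobic constant a_i (partial logP)
    s: float                         # solvent accessible surface area S_i (Å^2)
    kind: str                        # simplified typing: "hydrophobic", "acid" or "base"
    xyz: Tuple[float, float, float]  # Cartesian coordinates (Å)

def t_ij(ai: Atom, aj: Atom) -> int:
    """Assumed logic function: -1 for acid-acid or base-base pairs, +1 otherwise."""
    if ai.kind == aj.kind and ai.kind in ("acid", "base"):
        return -1
    return 1

def b_ij(ai: Atom, aj: Atom,
         r_term: Callable[[float], float] = lambda d: 0.0) -> float:
    """One term of eq. (1), with R_ij = exp(-d) and a pluggable r_ij (zero by default)."""
    d = math.dist(ai.xyz, aj.xyz)
    return ai.a * ai.s * aj.a * aj.s * t_ij(ai, aj) * math.exp(-d) + r_term(d)

def h_total(mol1: Sequence[Atom], mol2: Sequence[Atom],
            r_term: Callable[[float], float] = lambda d: 0.0) -> float:
    """H_TOTAL: sum of b_ij over atoms i of one species and atoms j of the other."""
    return sum(b_ij(ai, aj, r_term) for ai in mol1 for aj in mol2)
```

With these sign conventions the product $a_i a_j T_{ij}$ is positive for hydrophobic-hydrophobic and acid-base pairs and negative for hydrophobic-polar, acid-acid and base-base pairs, reproducing the favorable/unfavorable scoring described above.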
In the present work, HINT analysis was used to evaluate the strength of interaction in protein-DNA complexes, explicitly taking into account the energetic contribution of water molecules found at the interface between protein and DNA. This analysis was performed on 39 protein-DNA complexes determined at a resolution of 2.8 Å or better and for which experimental equilibrium constants are available. Correlating HINT scores with experimental free energies yields a predictive model with a standard error of ± 1.28 kcal mol-1. These results provide a quantitative basis for ultimately dissecting the amino acid residue-nucleotide base interaction to understand the amino acid-base "recognition code", a topic we are currently investigating.
Results and discussion
Being able to accurately model the energetics of protein-DNA association will help us to understand the machinery of life more completely and to uncover a wealth of new opportunities for the therapeutic treatment of many diseases. While direct interactions, i.e., recognition, between the two macromolecules are important for specificity, the water molecules at the protein-DNA interface also contribute to complex formation and potentially play a role in mediating specific interactions (see [14–16] and references therein). In fact, Janin reports that protein-protein and protein-DNA interfaces contain at least as many water-mediated interactions as direct hydrogen bonds or salt bridges. Water molecules mediating biological interactions have been the subject of intense recent study [16, 43, 44, 51–53]. The importance of water in regulating recognition, complex formation and, more generally, interactions among biomolecules is widely accepted, but experimental and computational tools for quantifying these effects are still somewhat lacking. Even high-resolution x-ray crystallography likely underestimates the number of solvent molecules, and can misrepresent other ions, precipitant molecules or artifacts as water. One approach we have previously used to validate crystallographic water sets is the GRID program, which evaluates empty regions of space in terms of how favorably water (or another probe) would bind there. We found that crystallographic water molecules with high specificity virtually always exhibit favorable GRID energies, and thus should be considered well placed.
Both protein and DNA molecules in solution, when uncomplexed, are surrounded by a variable number of water molecules interacting through hydrogen bonds with exposed polar groups. While the protein solvation pattern is extremely variable, a consequence of the protein's nature and folding [18, 56], DNA presents largely the same (conserved) hydration pattern, with minor sequence-dependent local variation. An ordered spine of hydration occupies the minor groove, whereas the major groove is too wide to retain the same water network and is filled with ordered water molecules interacting singly or in pairs with the nucleotide bases. In addition, the phosphate groups are usually surrounded by six hydration sites, whose positions vary with the conformation and nucleotide type. Overall, these conserved water patterns contribute to stabilization of the DNA conformation.
Protein-DNA interaction energetics
Classification data for the 39 protein-DNA complexes used in this study.
NF kB p52
Rel homology region
Zif268 zinc finger
Zinc coordinating group
AraC transcriptional activator
Retinoic acid receptor rxr-alpha
Zinc coordinating group
T7 RNA polymerase
T7 RNA polymerase
Orphan nuclear receptor NGFI-B
Zinc coordinating group
Engrailed homeodomain (q50a)
Replication terminator protein
Replication terminator protein
Homing endonuclease I-CreI
Homing endonuclease I-CreI
Myb proto-oncogene protein
Homing endonuclease I-TevI
DNA-binding domain of intron endonuclease
DNA-binding domain of intron endonuclease
Homeotic protein Msx-1
Zif268 d20a mutant
Zinc coordinating group
Zif268 d20a mutant
Zinc coordinating group
3A Mating-type protein a-1
Smad Mh1 Domain
Smad Mh1 Domain
Transcription factor skn-1
Transcription factor skn-1
Other α-helix group
Homing endonuclease I-TevI
DNA-binding domain of intron endonuclease
DNA-binding domain of intron endonuclease
Homing endonuclease I-CreI (d20n)
DNA-binding domain of intron endonuclease
DNA-binding domain of intron endonuclease
Homing endonuclease I-CreI (q47e)
DNA-binding domain of intron endonuclease
DNA-binding domain of intron endonuclease
Homing endonuclease I-CreI (y33c)
DNA-binding domain of intron endonuclease
DNA-binding domain of intron endonuclease
Mating-type protein a-1
TATA box binding protein
TATA box binding protein
Bovine papillomavirus-I E2
Bovine papillomavirus-I E2
Other α-helix group
Engrailed homeodomain (q50k)
Structural, experimental dissociation and calculated HINT score data for the 39 protein-DNA complexes.
HINT scores for protein-DNA complexes
PDB code (resolution, Å)
Total water count
pKd (M)
Kd ref
ΔG° (kcal mol-1)
HP-D-W a (water count)
HP-D-W (R > 0) b (water count)
HP-D-W (R ≥ 4) c (water count)
1by4 (2.10) d
1c9b (2.65) d
1pue (2.10) d
$$\Delta G^\circ = -0.000198\,H_{TOTAL} - 9.98, \qquad (2)$$
with a relatively poor r² = 0.21 and a standard error of ± 1.71 kcal mol-1. However, several outliers (open symbols) are evident, negatively affecting the correlation. All outlier complexes contain the same protein, homing endonuclease I-CreI, complexed in the native form with either the DNA product (1g9z) or the DNA substrate (1g9y), or as enzyme mutants (1t9j and 1u0c). Although the data point for the endonuclease I-CreI substrate complex 1t9i is well placed in this correlation, it is also treated as an outlier in this discussion (vide infra). Excluding these five complexes produces a significantly improved correlation (Figure 2, dashed line):
$$\Delta G^\circ = -0.000409\,H_{TOTAL} - 7.77, \qquad (3)$$
with an improved r² of 0.51 and a decreased standard error of ± 1.41 kcal mol-1.
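The correlations in eqs. (2) and (3) are linear fits of ΔG° against HTOTAL. A minimal least-squares sketch follows; it assumes that the reported "standard error" is the standard error of the estimate, sqrt(SSE/(n - 2)), which the text does not state explicitly. The helper names and the synthetic points in the test (chosen to lie exactly on eq. (3), plus one artificial outlier) are illustrative only and are not the data of Table 2.

```python
import math
from typing import Iterable, Sequence, Tuple

def fit_line(h: Sequence[float], dg: Sequence[float]) -> Tuple[float, float, float, float]:
    """Ordinary least squares dG = slope * H + intercept.

    Returns (slope, intercept, r2, se), where se is the standard error of the
    estimate, sqrt(SSE / (n - 2)).
    """
    n = len(h)
    if n < 3 or n != len(dg):
        raise ValueError("need at least three paired points")
    mh, mg = sum(h) / n, sum(dg) / n
    sxx = sum((x - mh) ** 2 for x in h)
    sxy = sum((x - mh) * (y - mg) for x, y in zip(h, dg))
    syy = sum((y - mg) ** 2 for y in dg)
    slope = sxy / sxx
    intercept = mg - slope * mh
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(h, dg))
    return slope, intercept, 1.0 - sse / syy, math.sqrt(sse / (n - 2))

def fit_excluding(h: Sequence[float], dg: Sequence[float],
                  labels: Sequence[str], excluded: Iterable[str]):
    """Refit after dropping points whose label (e.g. a PDB code) is in `excluded`."""
    drop = set(excluded)
    keep = [i for i, lab in enumerate(labels) if lab not in drop]
    return fit_line([h[i] for i in keep], [dg[i] for i in keep])
```

Passing a set of PDB codes to `fit_excluding` mirrors the step from eq. (2) to eq. (3), where the five I-CreI complexes are removed before refitting.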
The count of solvent molecules is extremely variable in the analyzed structures (Table 2), ranging from 2 in 1jkr to 857 in 1g9z, with a mean value of 200. We have shown previously that water molecules, in particular those that bridge between interacting species, play a significant energetic role in biomolecular associations. Notably, the average number of crystallographically detected waters in the endonuclease I-CreI-DNA complexes (1g9z, 1g9y, 1t9i, 1t9j, 1u0c) is much higher, 454. Complexes with an overall high number of crystallographic waters would also be expected to have a correspondingly high number of potentially bridging and energetically relevant waters at the protein-DNA interface. A high water count in crystallographic models usually reflects a more accurate x-ray structure, in which a larger fraction of the bound waters is revealed; however, the crystallographic resolution of the five endonuclease I-CreI-DNA complexes (between 1.6 and 2.5 Å, with an average of 1.99 Å) is only a partial cause of this difference. It should also be noted that water molecules may be introduced during crystallographic refinement merely to account for electron density of unknown origin, which improves the apparent refinement statistics.
Water role in protein-DNA interaction energetics
Number, mean HINT scores and mean Ranks of waters found at the protein-DNA interface.
Mean HINT score
All within 4 Å
global Rank > 0, partial Ranks > 0
global Rank ≥ 4, partial Ranks > 0
$$\Delta G^\circ = -0.000118\,H_{TOTAL} - 9.38, \qquad (4)$$
with r² of 0.43 and a standard error of ± 1.45 kcal mol-1. Complexes previously identified as outliers (Figure 2) are now consistent with the correlation, supporting the fundamental contribution of water molecules to the free energy of binding between protein and DNA. Only 1t9i, the endonuclease I-CreI-DNA complex that was not an obvious outlier in Figure 2 (but was nevertheless removed), is an obvious outlier in Figure 3a.
Previous analyses of protein-ligand systems indicated that only "bridging" water molecules are relevant for complex formation, and these highly constrained waters should be located in crystallographic experiments of even moderate resolution. We used the Rank algorithm, which has been validated with a wide set of protein and protein-ligand structures, to identify bridging waters and to predict the weighted number of hydrogen bonds potentially formed by each with both the protein and the DNA. Applying the filter that only waters with Rank greater than 0 toward both macromolecules (i.e., forming at least one hydrogen bond with each) are included in HTOTAL reduces the number of significant waters at the protein-DNA interface from 1244 to 996 (Table 3) over all complexes. Correlating this HTOTAL with free energy (not shown) yielded a model with r² of 0.47 and a standard error of ± 1.41 kcal mol-1. Visual inspection suggested, however, that some members of this water set are not truly bridging, possibly because the Rank algorithm does not distinguish between distance and angular contributions to Rank. The implication is that a Rank only modestly greater than zero may correspond to unstable and very weak contacts.
Our previous studies of water molecules in proteins and protein-ligand complexes demonstrated that water molecules with a total Rank of at least 4 and non-zero partial Ranks had more impact on the formation of protein-ligand complexes. Waters with Rank ≥ 4 should form at least two hydrogen bonds with very favorable geometry, and should therefore be more firmly locked at the protein-DNA interface and more readily detected by x-ray diffraction analysis. The number of waters that satisfy these criteria is 261. The more "fixed" position of these waters is confirmed by their lower mean B factor (32.7 Å²) compared with the mean value calculated for all 7394 crystallographic water molecules (42.8 Å²). Correlating the HTOTAL calculated with this set of waters with free energy (Table 2 and Figure 3b) yields:
$$\Delta G^\circ = -0.000302\,H_{TOTAL} - 8.22, \qquad (5)$$
with r² = 0.56 and a standard error of ± 1.28 kcal mol-1. The improvement in the correlation is clearly due to the contributions of a smaller, but more significant, set of water molecules. The 261 waters, in fact, correspond to just 3.5% of all crystallographically detected water molecules in the 39 complexes. This value is close to the 5.5% identified by Reddy and co-workers, in a set of 17,963 analyzed crystallographic water molecules, as the percentage of waters simultaneously contacting both the protein and the DNA and thus directly mediating recognition. This model has no outliers: reducing the count of potential bridging waters in the 1t9i complex from 98 to 31 considerably decreased its HTOTAL, from 43,140 to 18,371, positioning this point within about 1.5 kcal mol-1 of the correlation line. The water contribution to the total interaction energy for all complexes is 28%. However, removing the endonuclease I-CreI-DNA native and mutant complexes from the data set, where the solvent contribution to the overall binding process is seemingly anomalously high, reduces the water contribution to just 16% of HTOTAL for the remaining 34 complexes. This value is similar to the fraction of water-mediated bonds (14.9%) estimated by Luscombe and Thornton after a geometry-based analysis of all protein-DNA interactions.
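The two water filters used above (nonzero partial Rank toward both partners, optionally combined with a global Rank of at least 4) can be sketched as follows. This is an illustration under stated assumptions: the Rank values are taken as precomputed inputs (the Rank algorithm itself is not reimplemented), the `InterfaceWater` record and its fields are hypothetical, and the global Rank is assumed to be the sum of the two partial Ranks.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class InterfaceWater:
    wid: str              # hypothetical identifier, e.g. "1t9i:HOH:123"
    rank_protein: float   # partial Rank toward the protein (precomputed elsewhere)
    rank_dna: float       # partial Rank toward the DNA (precomputed elsewhere)
    b_factor: float       # crystallographic B factor (Å^2)

    @property
    def rank(self) -> float:
        # Assumption: the global Rank is the sum of the two partial Ranks.
        return self.rank_protein + self.rank_dna

def bridging_waters(waters: Iterable[InterfaceWater],
                    min_rank: float = 0.0) -> List[InterfaceWater]:
    """Waters with nonzero partial Rank toward both partners and global Rank >= min_rank."""
    return [w for w in waters
            if w.rank_protein > 0 and w.rank_dna > 0 and w.rank >= min_rank]

def mean_b_factor(waters: Iterable[InterfaceWater]) -> float:
    """Mean B factor of a water set (NaN for an empty set)."""
    values = [w.b_factor for w in waters]
    return sum(values) / len(values) if values else float("nan")
```

With the default `min_rank=0.0` the function applies the Rank > 0 filter (both partial Ranks positive already implies a positive global Rank); `min_rank=4` gives the stricter set, whose mean B factor can then be compared with that of all waters.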
It is important to emphasize that the results presented here explain only part of the protein-DNA-water interaction, and that the tools we have used illuminate the process only through examination of the bound endpoint. For example, the energetic contribution of the internal conformations, i.e., conformational entropy, of the interacting biopolymers is not treated explicitly, and is only one of several components of the additive constant in our correlations (eqs. 2–5). However, the low standard errors of our models suggest that these contributions are more or less constant across the data series. The magnitude of the additive constant can be rationalized by the fact that these complexes share many structural and chemical similarities, the most important of which is that they all form crystals analyzable by x-ray diffraction. Note (eqs. 2, 4 and 5) that as we incrementally improved the models by explicitly including more appropriate sets of water molecules, the additive constant decreased in magnitude as the standard error improved, indicating that this particular contribution to ΔG° is now being treated more explicitly.
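As a quick arithmetic check of the numbers quoted for eqs. (2), (4) and (5), the sketch below stores their reported coefficients, confirms that the magnitude of the additive constant shrinks together with the standard error, and reproduces the 3.5% figure for the Rank ≥ 4 waters. The last lines evaluate eq. (5) at both HTOTAL values reported for 1t9i purely to illustrate how strongly the water filter moves that complex's predicted ΔG°; the model labels are our own.

```python
# Reported coefficients: (slope, intercept in kcal/mol, standard error in kcal/mol)
MODELS = {
    "eq2_no_water": (-0.000198, -9.98, 1.71),
    "eq4_all_interface_water": (-0.000118, -9.38, 1.45),
    "eq5_rank_ge_4_water": (-0.000302, -8.22, 1.28),
}

def predict(model: str, h_total: float) -> float:
    """Predicted dG (kcal/mol) for a HINT score H_TOTAL under the named model."""
    slope, intercept, _ = MODELS[model]
    return slope * h_total + intercept

# As the water treatment becomes more explicit, |intercept| and the standard error both shrink.
ordered = ["eq2_no_water", "eq4_all_interface_water", "eq5_rank_ge_4_water"]
intercepts = [abs(MODELS[m][1]) for m in ordered]
errors = [MODELS[m][2] for m in ordered]
assert intercepts == sorted(intercepts, reverse=True)
assert errors == sorted(errors, reverse=True)

# 261 Rank >= 4 waters out of 7394 crystallographic waters
print(f"{261 / 7394:.1%}")  # 3.5%

# Under eq. (5), lowering 1t9i's H_TOTAL from 43,140 to 18,371 makes its predicted dG less negative
shift = predict("eq5_rank_ge_4_water", 18371) - predict("eq5_rank_ge_4_water", 43140)
print(f"{shift:+.2f} kcal/mol")  # +7.48 kcal/mol
```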
Energy contributions of the DNA base, phosphate and ribose to complex formation
Formation of a protein-DNA complex usually involves a two-step process: initial binding via non-specific interactions, followed by translocation of the protein to the specific binding site [62, 63]. The first step is regulated by electrostatic contacts between the protein side-chains and the DNA backbone phosphates, while binding specificity is achieved by interactions with the nucleotide bases themselves. However, the DNA backbone (ribose and phosphates) may play a less dramatic but fundamental role in specificity, either by holding the protein in a defined orientation, thus decreasing the energetic cost of complex formation, or because the phosphate orientations are partly determined by the base sequence. From a geometry-based analysis, which considered two atoms to be in contact if their centers were 1–5 Å apart, Lejeune and co-workers reported that on average 47% of protein-DNA interactions involve the phosphate group, while 24% can be attributed to the base.
Both direct (including hydrophobic) and indirect (water-mediated) interactions between protein and DNA are relevant [12, 64]. Figure 4b illustrates the contribution of the 261 interfacial (bridging) water molecules (both water-protein and water-DNA partial Ranks > 0, total Rank ≥ 4, as in eq. 5) to the protein-DNA interaction, with the water-DNAphosphate, water-DNAribose, water-DNAbase and water-protein terms shown individually. The favorable DNA-water interaction is generally attributable to both water-DNAphosphate and water-DNAbase contacts, reinforcing the notion that in most complexes water facilitates binding by screening unfavorable electrostatic contacts and by acting as a linker at the protein-DNA interface. The water-DNAphosphate HINT score ranges from near zero to 9800 with a mean value of 2400, while the water-DNAbase contribution ranges from near zero to 6700 HINT score units, with a mean value of 1600. Only in a few cases, e.g., the 1azp, 1bl0, 1pue and 1qpz complexes, is a positive DNA-water HINT score completely attributable to the water-DNAbase interaction; in these complexes water mediation is necessary to achieve specific recognition between the two macromolecules. The water-DNAribose interaction always negatively affects the global HINT score because of unfavorable hydrophobic-polar contacts between water and the hydrophobic moieties of ribose. Finally, the score contributions from protein-water contacts range from -1340 to 2120, with an average of only 140 units. The discrepancies between protein-water and DNA-water HINT scores will be discussed later, but are generally attributable to the different chemical natures of the interacting groups. Comparing Figure 4a and 4b shows that in the cases where the overall protein-DNA score is negative (i.e., the DNA/I-CreI endonuclease complexes), the water terms are able to compensate.
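The per-moiety breakdown of the water-DNA score can be sketched as a simple grouping of pairwise scores by the DNA atom involved. Everything here is an illustrative assumption rather than the authors' procedure: the partition of standard PDB nucleotide atom names into phosphate, sugar ("ribose", matching the text's terminology) and base, in particular the placement of the ester oxygens O3' and O5' with the phosphate, and the input format of (atom name, b_ij) pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical partition of standard PDB nucleotide heavy-atom names into moieties;
# assigning the ester oxygens O3'/O5' to the phosphate is a modeling choice.
PHOSPHATE_ATOMS = {"P", "OP1", "OP2", "OP3", "O3'", "O5'"}
SUGAR_ATOMS = {"C1'", "C2'", "C3'", "C4'", "C5'", "O4'"}

def moiety(atom_name: str) -> str:
    """Classify a DNA heavy atom as 'phosphate', 'ribose' or 'base' (the default)."""
    if atom_name in PHOSPHATE_ATOMS:
        return "phosphate"
    if atom_name in SUGAR_ATOMS:
        return "ribose"
    return "base"

def water_dna_by_moiety(contacts: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum water-DNA pair scores by DNA moiety; each contact is (dna_atom_name, b_ij)."""
    totals: Dict[str, float] = defaultdict(float)
    for atom_name, b in contacts:
        totals[moiety(atom_name)] += b
    return dict(totals)
```

Applied to all water-DNA atom pairs of one complex, the three totals correspond to the water-DNAphosphate, water-DNAribose and water-DNAbase terms plotted per complex.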
Water molecules in protein-DNA interaction specificity identified by role
Number, mean HINT scores and mean Ranks of waters initially classified by role.
Mean HINT score
All H2O interacting with DNA backbone
All H2O interacting with nucleotide base
All non-bridging H2O
All H2O bridging nucleotide base and protein
H2O bridging nucleotide base and protein's backbone & side-chain
The remaining 461 waters of the set interact only with the bases of the polynucleotide. Each was then categorized (Table 4) as either non-bridging (when it is positioned such that it cannot link protein and DNA base, or when it unnecessarily mediates an already favorable interaction between protein and DNA base) or bridging (mediating specific protein-DNA recognition and association). This analysis identified 212 waters as non-bridging, with an average Rank of 2.8; in fact, only 23 of these non-bridging waters (10%) have Rank ≥ 4. Among the 249 nucleotide base-to-protein bridging waters, 218 are found between bases and amino acid side-chains, 20 between bases and the protein backbone, and 11 connect the bases to both the side-chain and the backbone of the protein. The average Rank of these bridging waters is 3.7, with those linking to both the protein side-chain and backbone having, unsurprisingly, a larger average Rank of 4.6. One-third (82) of the bridging waters have Rank ≥ 4. HINT scores and Rank statistics for the set of waters interacting with both protein and DNA bases are summarized in Table 4. The mean interaction scores for waters bridging protein side-chains to DNA bases are 94 and 360 for Hprotein-water and HDNA-water, respectively, while the corresponding partial Ranks are 1.9 and 1.8.
The previous analyses of bridging waters in protein-ligand systems revealed a mean global Rank of 4.5, less evenly divided between protein and ligand: the mean partial protein-water Rank was 3.0, while the mean partial ligand-water Rank was 1.5. This difference is probably attributable to the different natures of protein-ligand and protein-DNA interfaces. Proteins, with a more extended and heterogeneous surface characterized by clefts and cavities, usually envelop small ligands, whereas formation of a protein-DNA complex likely involves winding the two objects together, yielding two more or less comparable surface areas. The HINT score values are also distributed differently in protein-ligand systems compared with protein-DNA systems. In the protein-ligand system, Hprotein-water and Hligand-water were 307 and 277 HINT score units, respectively, i.e., nearly equal. Here, even in the case of protein side-chain to DNA base, waters interact notably more strongly with the DNA (360) than with the protein (94). This is, at first, somewhat surprising, given that the bases are structurally constrained to be planar, while the protein side-chains possess more flexibility and would presumably adopt the conformation most conducive to binding. However, the aromatic groups present in both pyrimidine and purine bases are capable of forming weak hydrogen bonds with water, either by water hydrogen atoms donating to aromatic electron clouds or by water oxygen atoms accepting from polarized aromatic hydrogens. Thus, nearly all contacts between the nucleic acid bases and the surrounding water molecules are potentially favorable. In contrast, hydrophobic protein side-chains would produce numerous unfavorable (negatively scored) hydrophobic-polar contacts with water, regardless of the water geometry. Structural differences between the two types of interfaces are also relevant.
The cavities and shallow depressions that bind waters at interfaces in protein-ligand complexes are usually formed by backbone or, more frequently, by charged and polar groups; however, the surface of a protein interacting with a polynucleotide can also be formed by apolar moieties. Thus, even though the number of hydrogen bonds to waters is more equally distributed between the two macromolecules in the protein-DNA case, these waters cannot be enveloped by either the protein or the DNA.
A most interesting consequence of the above results is that water molecules contributing to protein-DNA recognition specificity meet a somewhat different set of criteria than those contributing energetically to complex stability. Visual evaluation indicates that: 1) 54% of the waters with nonzero Rank toward both macromolecules interact with the DNA backbone and thus play a minor role in specificity but are energetically critical for the association; 2) 46% of the waters interact with the nucleotide bases; of these, waters amounting to 21% of the total are actually non-bridging, while the remainder (25% of the total) bridge between the base and various features of the protein. Interestingly, only 2–3% of all these waters are base-bridging waters that contact the protein backbone (20 contact only the backbone, 31 if the 11 that also contact a side-chain are included), so the vast majority of base-bridging waters interact with protein side-chains and potentially govern binding specificity. It is likely that only the waters forming hydrogen bonds with amino acid side-chains are involved in recognition of specific nucleic acid sequences, but these account for more than 90% of the waters bridging between the protein and the DNA bases.
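The percentages above can be recovered from the counts reported in this section, taking the 996 waters with nonzero partial Rank toward both macromolecules as the denominator. In the sketch below the DNA-backbone count is derived as 996 - 461 rather than quoted directly, and the category labels are our own.

```python
# Counts reported in the text for the 996 waters with nonzero partial Rank toward
# both protein and DNA; the DNA-backbone count is derived as 996 - 461.
COUNTS = {
    "dna_backbone": 996 - 461,
    "base_non_bridging": 212,
    "base_bridging_side_chain": 218,
    "base_bridging_backbone_only": 20,
    "base_bridging_side_chain_and_backbone": 11,
}
TOTAL = sum(COUNTS.values())

def share(n: int) -> float:
    """Percentage of all interface waters with nonzero Rank toward both partners."""
    return 100.0 * n / TOTAL

base_bridging = 218 + 20 + 11        # 249 base-to-protein bridging waters
side_chain_contacts = 218 + 11       # bridging waters touching at least one side-chain

print(round(share(COUNTS["dna_backbone"])))            # 54
print(round(share(212 + base_bridging)))               # 46
print(round(share(212)), round(share(base_bridging)))  # 21 25
print(round(share(20)), round(share(20 + 11)))         # 2 3
print(f"{side_chain_contacts / base_bridging:.0%}")    # 92%
```

The check also shows that the 21% and 25% figures are fractions of all 996 waters, not of the 461 base-contacting waters.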
Character of protein-DNA interactions
Hydrophobic effects have been proposed to be the major driving force of protein-DNA association [35, 37, 66], arising from the burial of non-polar protein surfaces in the DNA binding site. The predominant role of hydrophobicity (i.e., entropy) is supported by calorimetric analyses that report a negative heat capacity change upon complex formation [67, 68]. On the enthalpic side, the electrostatic term of the free energy opposes binding because favorable charge-charge interactions are often counterbalanced by the highly unfavorable cost of desolvating the polar groups. Jayaram's computational analysis of binding also showed that packing and hydrophobic effects favor binding, whereas electrostatic interactions energetically oppose it. However, the negative heat capacity change associated with the formation of specific protein-DNA complexes cannot be completely explained by hydration effects alone [14, 17, 18]. Other contributions have been considered, such as conformational changes of both proteins and nucleic acids, which account for 20% of the total ΔCp [69–72], changes in the protonation state of the interacting residues, and counterion release. In particular, although ion release is generally considered favorable for complex formation, several studies have shown that the negative contribution from ion-molecule electrostatics, rather than the positive entropy from ion reorganization, dominates the salt-dependent solvation effects [36, 37]. Furthermore, the interaction of ions with water molecules increases water ordering, producing a large negative heat capacity change [14, 74].
The HINT analysis in this work allowed us to examine the character of the interactions contributing to association without parsing them into separate energetic terms, because all atom-atom interactions are evaluated with the same protocol. HINT evaluates not only electrostatic and van der Waals contributions but also hydrophobic contacts, and should therefore be able to test the observation of Mandel-Gutfreund and Margalit that amino acid-nucleotide base recognition is governed by both hydrogen bonds and hydrophobic interactions. Stabilizing hydrophobic contacts, mainly between sugar methylenes and aliphatic or aromatic amino acid side-chains, were estimated to account for 63% of protein-DNA ribose contacts. Note that the free energy-based analysis illustrated in Figure 5 covers the entire protein-DNA interaction set (not just protein-ribose contacts). Nevertheless, the hydrophobic/hydrophobic interactions (Figure 5) always contribute favorably to protein-DNA binding, although apparently only to a moderate extent. These contributions are not affected by unfavorable effects or by the presence or absence of bridging waters, and in some cases they are the dominant factor in binding once the other terms cancel out. In this sense, hydrophobic contacts and the related hydrophobic effects may represent the main driving force of protein-DNA association, while electrostatic interactions seem to increase specificity but not affinity. It must be reiterated, however, that this computational tool probes only the (relatively short-range) energetics between the pre-formed DNA and protein components of the final, end-state complex. As such, it does not account for the internal energies of the protein and DNA molecules or for the energy of their conformational changes between the unbound and bound states. The quality of the resulting models (eq. 5) suggests that these and other terms are largely invariant over the data set.
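The "character of interaction" analysis amounts to binning every scored atom-atom pair by the hydropathic nature of the two atoms and summing the scores per bin. The sketch below illustrates that bookkeeping under a deliberately simplified, hypothetical labelling (each atom is just "polar" or "hydrophobic"); it is not the HINT implementation, and the category names are assumptions chosen to mirror the discussion above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One scored atom-atom pair: (class of atom i, class of atom j, HINT score).
# The two-class "polar"/"hydrophobic" labelling is a hypothetical simplification.
Pair = Tuple[str, str, float]


def interaction_character(pairs: Iterable[Pair]) -> Dict[str, float]:
    """Sum pairwise HINT scores into interaction-character categories."""
    totals: Dict[str, float] = defaultdict(float)
    for class_i, class_j, score in pairs:
        if class_i == "hydrophobic" and class_j == "hydrophobic":
            key = "hydrophobic/hydrophobic"
        elif class_i == "polar" and class_j == "polar":
            # Polar pairs are split by sign of their score.
            key = "favorable polar" if score > 0 else "unfavorable polar"
        else:
            key = "hydrophobic/polar"  # desolvation-type penalty
        totals[key] += score
    return dict(totals)


if __name__ == "__main__":
    # Illustrative values only, not data from this study.
    demo = [("polar", "polar", 120.0), ("polar", "polar", -80.0),
            ("hydrophobic", "polar", -30.0), ("hydrophobic", "hydrophobic", 15.0)]
    character = interaction_character(demo)
    print(character)
    # Net of the polar-related bins, which the text reports largely cancel.
    print(sum(v for k, v in character.items() if k != "hydrophobic/hydrophobic"))
```

With this kind of tally, the observation that favorable polar and unfavorable polar/hydrophobic-polar terms mostly cancel becomes a direct comparison of bin totals, leaving the hydrophobic/hydrophobic bin as the net contributor.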
Water contributes to protein-DNA complex formation in two principal ways. First, it contributes directly to the binding energetics: without water, some of the complexes would be scored as energetically unfavorable, and including the energetic contribution of waters at the protein-DNA interface significantly improved the quality of our computational free energy predictions, particularly when only "true" bridging waters were included. Our criterion, based on the previous analysis of 15 protein-ligand complexes, is that only waters with nonzero partial Ranks with respect to each interacting molecule and a total Rank of at least 4 are energetically relevant. In effect, a bridging solvent molecule should form at least two strong, well-located hydrogen bonds plus at least one additional favorable contact. Second, water shapes recognition: there is an apparent, and interesting, disconnect in that water molecules significant for protein-DNA recognition have a lower Rank threshold than those critical for accurate free energy calculations. Waters with lower Rank, especially between 3 and 4, remain significant in mapping the energetic landscape for interaction by altering the shape, polarity and surface charge of the DNA or protein, even if they do not directly contribute to the free energy of binding.
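The Rank criterion above can be written as a small decision rule. The sketch assumes that the total Rank is the sum of the two partial Ranks (consistent with the protein-ligand figures quoted earlier, 3.0 + 1.5 = 4.5); the class and label names are hypothetical, and the thresholds of 4 and 3 are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Water:
    wid: str
    rank_protein: float  # partial Rank with respect to the protein
    rank_dna: float      # partial Rank with respect to the DNA

    @property
    def rank_total(self) -> float:
        # Assumption: total Rank = sum of the two partial Ranks.
        return self.rank_protein + self.rank_dna


def classify_water(w: Water, energetic_threshold: float = 4.0,
                   landscape_threshold: float = 3.0) -> str:
    """Label a water using the bridging and Rank thresholds from the text."""
    if w.rank_protein <= 0 or w.rank_dna <= 0:
        return "solvation"  # does not bridge the two macromolecules
    if w.rank_total >= energetic_threshold:
        return "energetically relevant bridging"
    if w.rank_total >= landscape_threshold:
        return "landscape-shaping bridging"  # Rank between 3 and 4
    return "weak bridging"


if __name__ == "__main__":
    for w in (Water("w1", 2.5, 2.0), Water("w2", 2.0, 1.5), Water("w3", 3.0, 0.0)):
        print(w.wid, classify_water(w))
```

Keeping the thresholds as parameters makes it easy to check how sensitive a free energy correlation is to the Rank cutoff.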
This report is the first part of an effort to decode the molecular features that lead to protein-DNA recognition, an interaction between two biomacromolecules that is an essential component of the machinery of life. Here we have shown that our modeling experiments, using the empirical HINT free energy force field with a selective incorporation of critical water molecules, give more than acceptable estimates (± 1.28 kcal mol-1) of the free energy of binding. In addition, we have identified a set of Rank-based traits for water molecules that affect binding specificity. The count, orientation and binding strength of these water molecules depend far more on the chemical nature of the protein amino acid side-chains than on features of the DNA bases. In forthcoming work, we will explore the specific pairings of protein amino acid residues and DNA nucleotide bases by type, confident that our computational approach is representative of actual binding free energies and guided by these criteria for including relevant water molecules in our models.
Protein-DNA data set
The protein-DNA data set was selected from the structures available in the Protein Data Bank. While there are 123 unique structures in the PDB, many lack reliable protein-DNA dissociation constants for exactly the same complex, are of poor resolution, and/or have missing residues or bases due to disorder or other experimental factors. The remaining thirty-nine protein-DNA complexes, all solved at a resolution better than 2.90 Å (28 of them better than 2.50 Å), were retrieved from the PDB and are listed in Table 1. Twenty-one structures are monomeric proteins interacting with double-stranded DNA, while eighteen are homodimeric or heterodimeric proteins complexed with palindromic double-stranded DNA. When, because of crystallographic symmetry, only a monomeric protein bound to single-stranded DNA was deposited in the PDB, the actual biological complex (i.e., dimeric protein and double-stranded DNA) was obtained from the Nucleic Acid Database http://ndbserver.rutgers.edu. The 1jkr and 1jko structures are protein mutants of the native 1hcr protein-DNA complex. Analogously, 1jk1 and 1jk2 are mutants of the native 1aay complex, and 1t9j, 1t9i and 1u0c are mutants of 1g9y. Only non-covalent complexes with four or more base pairs in the polynucleotide were included in the data set. PDB files with anomalous DNA structure, non-classical bases or anomalous base pairing were excluded. Moreover, only complexes with published experimental dissociation constants (Kd) were retained.
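The structural selection rules above can be expressed as a single filter predicate. This is a minimal sketch: the `Complex` record and its field names are hypothetical, and the thresholds (resolution better than 2.90 Å, at least four base pairs, non-covalent, no anomalous DNA, published Kd) are those stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Complex:
    pdb_id: str
    resolution: float          # Å
    base_pairs: int
    covalent: bool
    anomalous_dna: bool        # anomalous structure, non-classical bases or pairing
    kd_molar: Optional[float]  # published Kd, None if unavailable


def passes_selection(c: Complex, max_resolution: float = 2.90,
                     min_base_pairs: int = 4) -> bool:
    """Apply the data set inclusion criteria described in the text."""
    return (c.resolution < max_resolution   # "better than 2.90 Å"
            and c.base_pairs >= min_base_pairs
            and not c.covalent
            and not c.anomalous_dna
            and c.kd_molar is not None)


if __name__ == "__main__":
    # Hypothetical entries, not real PDB records.
    candidates = [Complex("demo1", 2.1, 10, False, False, 1e-9),
                  Complex("demo2", 3.1, 12, False, False, 5e-8)]
    print([c.pdb_id for c in candidates if passes_selection(c)])
```

The sequence-matching requirement between assay and crystal DNA is a separate check and is not covered by this predicate.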
In particular, to avoid misleading correlations between experimental and computational results, a protein-DNA complex was included in the data set only when the DNA sequence used in the experimental assay was identical to the sequence in the crystallized complex, and when at least the same protein domain involved in DNA recognition was used in both the binding and the crystallographic experiments. When small differences between the DNA sequences used in the Kd determination and in crystallization were observed, the complex was included only if the divergent bases were not directly involved in protein-DNA recognition and association.
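The sequence-consistency rule can be sketched as a comparison of the assay and crystal DNA sequences against the set of base positions in contact with the protein. The sketch assumes both sequences are already aligned and of equal length, and that the contact positions (0-based indices) have been determined separately; the function name is hypothetical.

```python
from typing import Set


def sequences_compatible(assay_seq: str, crystal_seq: str,
                         contact_positions: Set[int]) -> bool:
    """True if the sequences match or differ only at non-contact positions.

    Assumes pre-aligned, equal-length sequences; unequal lengths are rejected.
    """
    if len(assay_seq) != len(crystal_seq):
        return False
    mismatches = {i for i, (a, b) in enumerate(zip(assay_seq, crystal_seq))
                  if a != b}
    # Divergent bases must not be directly involved in recognition.
    return mismatches.isdisjoint(contact_positions)


if __name__ == "__main__":
    contacts = {3, 4, 5}
    print(sequences_compatible("GCGATCGC", "ACGATCGC", contacts))  # flank change
    print(sequences_compatible("GCGATCGC", "GCGAACGC", contacts))  # contact change
```

Real assay oligonucleotides often carry different flanking sequences, so a production version would need an alignment step before this comparison.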
All complexes were modeled with Sybyl version 7.0. The structures were carefully checked and corrected for chemically consistent atom and bond type assignments. Hydrogen atoms, which are not normally detected by conventional X-ray diffraction, were added computationally using the Sybyl Biopolymer and Build/Edit menu tools. To relieve steric clashes, the added hydrogen atoms were then energy minimized with the Powell algorithm (convergence gradient 0.5 kcal mol-1 Å-1, 1500 cycles) while all heavy atom positions were held fixed.
Hydropathic analyses were carried out with the HINT software, using a locally modified version 3.09Sβ, as previously reported [29, 42–44]. All partition calculations (in which atomic HINT constants are assigned on the basis of LogPo/w) were performed using the dictionary option for both protein and nucleic acid sequences. The ionization states of protein residues and DNA nucleotides were not modified; i.e., the default Sybyl protonation models (ca. pH 7) were kept. Because the interactions between proteins and nucleic acids are mainly electrostatic and hydrogen-bond based, the 'essential' partition mode, which treats only polar hydrogen atoms explicitly, was chosen. A new HINT option that corrects the Si terms for backbone amide nitrogens and hydrogens by adding 20 Å² was also used in this study. This correction improves the relative energetics of inter- and intramolecular hydrogen bonds involving backbone amides.
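The backbone amide correction is a simple per-atom adjustment of the surface-area term. The sketch below is an illustration only, not the HINT option itself: the `Atom` record is hypothetical, and identifying backbone amide atoms by the PDB-style names "N" and "H" on protein residues is an assumption (some force fields name the amide hydrogen "HN"). The 20 Å² increment comes from the text.

```python
from dataclasses import dataclass, replace
from typing import List

# Assumed PDB-style names for backbone amide nitrogen and hydrogen.
BACKBONE_AMIDE_ATOMS = {"N", "H"}
SI_CORRECTION = 20.0  # Å², value quoted in the text


@dataclass(frozen=True)
class Atom:
    name: str
    residue: str
    is_protein: bool
    si: float  # solvent-accessible surface area term, Å²


def apply_backbone_amide_correction(atoms: List[Atom]) -> List[Atom]:
    """Return new Atom records with Si increased for backbone amide N and H."""
    return [replace(a, si=a.si + SI_CORRECTION)
            if a.is_protein and a.name in BACKBONE_AMIDE_ATOMS else a
            for a in atoms]


if __name__ == "__main__":
    atoms = [Atom("N", "ALA", True, 5.0), Atom("CA", "ALA", True, 10.0),
             Atom("N1", "DA", False, 3.0)]
    for a in apply_backbone_amide_correction(atoms):
        print(a.name, a.residue, a.si)
```

Returning new immutable records rather than editing in place keeps the uncorrected values available for comparison.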
Energetic contribution of water molecules
Water molecules crystallographically placed at the protein-DNA interface within 4 Å were automatically optimized and scored using the "optimize bridging waters" and "water accounting" options implemented in HINT version 3.09Sβ. In all subsequent calculations, each water was treated as an individual static molecule; no statistical-mechanical averaging over molecular dynamics trajectories was performed. During HINT optimization, the crystallographically determined oxygen atom is allowed to move at most 0.1 Å from its original position. HINT scores involving water are calculated as if each water molecule were a "ligand" interacting with the surrounding biomolecules, which act in concert as a "receptor". Next, the "optimize water network" option was applied to crystallographic waters within 4 Å of both protein and DNA atoms, using the geometry-based Rank algorithm [44, 60]. Rank predicts the weighted number of potential hydrogen bonds formed by each water molecule with the protein and with the DNA. During optimization, the water hydrogen atoms are allowed to adopt all possible positions in order to maximize hydrogen bonds and acid/base interactions and to minimize unfavorable hydrophobic/polar or acid/acid contacts; i.e., the search is exhaustive. Only waters with Rank values greater than 0 with respect to both the protein and the nucleic acid are considered bridging water molecules. Waters forming hydrogen bonds with only the protein, only the DNA, or neither are considered waters of solvation that are not involved in the binding event and are presumed not to be essential to the energetics of complex formation. Therefore, for each complex, the contribution of the bridging waters (Rank > 0 with both partners) was calculated and added to the protein-DNA HINT score, i.e., HTOT = Hprotein-DNA + Hprotein-water + HDNA-water.
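The accounting HTOT = Hprotein-DNA + Hprotein-water + HDNA-water can be sketched as follows, where the water terms are summed only over bridging waters (Rank > 0 with both partners). The per-water record and its field names are hypothetical; the HINT scores themselves would come from the HINT calculations described above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WaterScore:
    rank_protein: float
    rank_dna: float
    h_protein_water: float  # HINT score of this water with the protein
    h_dna_water: float      # HINT score of this water with the DNA


def total_hint_score(h_protein_dna: float, waters: Iterable[WaterScore]) -> float:
    """HTOT = Hprotein-DNA + sum over bridging waters of their water terms."""
    h_tot = h_protein_dna
    for w in waters:
        # Bridging waters only: nonzero Rank with both protein and DNA.
        if w.rank_protein > 0 and w.rank_dna > 0:
            h_tot += w.h_protein_water + w.h_dna_water
    return h_tot


if __name__ == "__main__":
    # Illustrative numbers only.
    waters = [WaterScore(2.0, 1.5, 100.0, 200.0),  # bridging: counted
              WaterScore(3.0, 0.0, 400.0, 0.0)]    # solvation: ignored
    print(total_hint_score(500.0, waters))
```

Swapping the bridging test for a stricter rule (for example, total Rank of at least 4) is a one-line change, which is how the effect of the water-selection criterion on the correlation could be explored.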
Even though the Rank algorithm allows each water molecule to act as donor with at most two hydrogen bond acceptors and as acceptor with at most two hydrogen bond donors, Rank should be interpreted only loosely as a count of hydrogen bonds. In previous analyses performed on protein-ligand complexes , Rank greater than four was associated with very locked and stable water molecules. Thus, in this work, bridging waters with total Rank ≥ 4 were identified for special consideration (see Results and Discussion).
Identification of water molecules mediating specific protein-DNA recognition
Some water molecules are specific mediators of recognition between protein and DNA. To isolate specific interactions between protein and base atoms, the phosphate and ribose groups were excluded from the HINT partition. Again, water molecules found within 4 Å of the protein-DNA interface with Rank > 0 were optimized, scored and ranked, this time only with respect to protein residues and DNA bases. These waters, potentially significant for specific recognition and association, were classified as bridging or non-bridging. An additional constraint is that bridging waters must mediate interactions between groups that are otherwise too far apart to contact each other directly. The bridging waters were divided into three classes: (I) waters bridging DNA bases and protein side-chains, (II) waters bridging DNA bases and the protein backbone, and (III) waters bridging DNA bases and both protein side-chain and backbone atoms. Mean HINT scores and Ranks were determined for each class, with particular attention to side-chain bridging waters, the only ones expected to mediate specific recognition. These class-diagnostic HINT score and Rank values were calculated to help identify essential water molecules in new protein-DNA complexes.
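The three-class scheme for base-bridging waters can be sketched as a small classifier. The inputs are assumptions for illustration: the sets of protein and DNA group types each water hydrogen-bonds to (using the hypothetical labels "side_chain", "backbone" and "base"), plus a flag saying whether the two partner groups are already in direct contact, which disqualifies the water as a bridge under the constraint stated above.

```python
from typing import Optional, Set

PROTEIN_PARTS = {"side_chain", "backbone"}


def base_bridge_class(protein_parts: Set[str], dna_parts: Set[str],
                      partners_in_direct_contact: bool) -> Optional[str]:
    """Return "I", "II" or "III" for a base-bridging water, else None."""
    parts = protein_parts & PROTEIN_PARTS
    if "base" not in dna_parts or not parts or partners_in_direct_contact:
        return None  # not a base-bridging water
    if parts == PROTEIN_PARTS:
        return "III"  # side-chain and backbone
    if "side_chain" in parts:
        return "I"    # side-chain only
    return "II"       # backbone only


if __name__ == "__main__":
    print(base_bridge_class({"side_chain"}, {"base"}, False))
    print(base_bridge_class({"backbone", "side_chain"}, {"base"}, False))
```

Classes I and III are the ones that could carry sequence information, since only they involve protein side-chains.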
This work was partially supported by funds from the Italian Ministry of Instruction, University and Research within an Internationalization project (Mozzarelli), FIRB RBNE0157EH (Marabotti), and U.S. NIH grant GM71894 (Kellogg). We acknowledge the guidance and support of Donald J. Abraham in all of these studies.
- Harrington RE: DNA curving and bending in protein-DNA recognition. Mol Microbiol 1992, 6: 2549–2555. 10.1111/j.1365-2958.1992.tb01431.x
- Matthews BW: Protein-DNA interaction. No code for recognition. Nature 1988, 335: 294–295. 10.1038/335294a0
- Draper DE: Protein-DNA complexes: the cost of recognition. Proc Natl Acad Sci U S A 1993, 90: 7429–7430. 10.1073/pnas.90.16.7429
- Pabo CO, Sauer RT: Protein-DNA recognition. Annu Rev Biochem 1984, 53: 293–321. 10.1146/annurev.bi.53.070184.001453
- Mandel-Gutfreund Y, Margalit H: Quantitative parameters for amino acid-base interaction: implications for prediction of protein-DNA binding sites. Nucleic Acids Res 1998, 26: 2306–2312. 10.1093/nar/26.10.2306
- Pabo CO, Nekludova L: Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J Mol Biol 2000, 301: 597–624. 10.1006/jmbi.2000.3918
- Benos PV, Lapedes AS, Stormo GD: Is there a code for protein-DNA recognition? Probab(ilistical)ly. Bioessays 2002, 24: 466–475. 10.1002/bies.10073
- Jordan SR, Pabo CO: Structure of the lambda complex at 2.5 A resolution: details of the repressor-operator interactions. Science 1988, 242: 893–899. 10.1126/science.3187530
- Brennan RG, Roderick SL, Takeda Y, Matthews BW: Protein-DNA conformational changes in the crystal structure of a lambda Cro-operator complex. Proc Natl Acad Sci U S A 1990, 87: 8165–8169. 10.1073/pnas.87.20.8165
- Schultz SC, Shields GC, Steitz TA: Crystal structure of a CAP-DNA complex: the DNA is bent by 90 degrees. Science 1991, 253: 1001–1007. 10.1126/science.1653449
- Mandel-Gutfreund Y, Schueler O, Margalit H: Comprehensive analysis of hydrogen bonds in regulatory protein DNA-complexes: in search of common principles. J Mol Biol 1995, 253: 370–382. 10.1006/jmbi.1995.0559
- Pabo CO, Sauer RT: Transcription factors: structural families and principles of DNA recognition. Annu Rev Biochem 1992, 61: 1053–1095. 10.1146/annurev.bi.61.070192.005201
- Gurlie R, Duong TH, Zakrzewska K: The role of DNA-protein salt bridges in molecular recognition: a model study. Biopolymers 1999, 49: 313–327. 10.1002/(SICI)1097-0282(19990405)49:4<313::AID-BIP6>3.0.CO;2-0
- Oda M, Nakamura H: Thermodynamic and kinetic analyses for understanding sequence-specific DNA recognition. Genes Cells 2000, 5: 319–326. 10.1046/j.1365-2443.2000.00335.x
- Schwabe JW: The role of water in protein-DNA interactions. Curr Opin Struct Biol 1997, 7: 126–134. 10.1016/S0959-440X(97)80016-4
- Jayaram B, Jain T: The role of water in protein-DNA recognition. Annu Rev Biophys Biomol Struct 2004, 33: 343–361. 10.1146/annurev.biophys.33.110502.140414
- Cooper A, Johnson CM, Lakey JH, Nollmann M: Heat does not come in different colours: entropy-enthalpy compensation, free energy windows, quantum confinement, pressure perturbation calorimetry, solvation and the multiple causes of heat capacity effects in biomolecular interactions. Biophys Chem 2001, 93: 215–230. 10.1016/S0301-4622(01)00222-8
- Cooper A: Heat capacity effects in protein folding and ligand binding: a re-evaluation of the role of water in biomolecular thermodynamics. Biophys Chem 2005, 115: 89–97. 10.1016/j.bpc.2004.12.011
- Parsegian VA, Rand RP, Rau DC: Macromolecules and water: probing with osmotic stress. Methods Enzymol 1995, 259: 43–94.
- Garner MM, Rau DC: Water release associated with specific binding of gal repressor. Embo J 1995, 14: 1257–1263.
- Sidorova NY, Rau DC: Linkage of EcoRI dissociation from its specific DNA recognition site to water activity, salt concentration, and pH: separating their roles in specific and non-specific binding. J Mol Biol 2001, 310: 801–816. 10.1006/jmbi.2001.4781
- Garner MM, Burg MB: Macromolecular crowding and confinement in cells exposed to hypertonicity. Am J Physiol 1994, 266: C877–892.
- Minton AP: Molecular crowding: analysis of effects of high concentrations of inert cosolutes on biochemical equilibria and rates in terms of volume exclusion. Methods Enzymol 1998, 295: 127–149.
- Mozzarelli A, Rossi GL: Protein function in the crystal. Annu Rev Biophys Biomol Struct 1996, 25: 343–365. 10.1146/annurev.bb.25.060196.002015
- Mozzarelli A, Bettati S: Functional properties of immobilized proteins. In Advanced Functional Molecules and Polymers. Volume 4. Edited by: Nalwa HS. Tokyo, Gordon and Breach Science Publishers; 2001:55–97.
- Bohm HJ: The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J Comput Aided Mol Des 1994, 8: 243–256. 10.1007/BF00126743PubMedView ArticleGoogle Scholar
- Aqvist J: Calculation of absolute binding free energies for charged ligands and effects of long-range electrostatic interactions. J Comput Chem 1996, 17: 1587–1597. Publisher Full Text 10.1002/(SICI)1096-987X(19961115)17:14<1587::AID-JCC1>3.0.CO;2-HView ArticleGoogle Scholar
- Eldridge MD, Murray CW, Auton TR, Paolini GV, Mee RP: Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aided Mol Des 1997, 11: 425–445. 10.1023/A:1007996124545PubMedView ArticleGoogle Scholar
- Cozzini P, Fornabaio M, Marabotti A, Abraham DJ, Kellogg GE, Mozzarelli A: Simple, intuitive calculations of free energy of binding for protein-ligand complexes. 1. Models without explicit constrained water. J Med Chem 2002, 45: 2469–2483. 10.1021/jm0200299PubMedView ArticleGoogle Scholar
- Lesser DR, Kurpiewski MR, Jen-Jacobson L: The energetic basis of specificity in the Eco RI endonuclease--DNA interaction. Science 1990, 250: 776–786. 10.1126/science.2237428PubMedView ArticleGoogle Scholar
- Benos PV, Lapedes AS, Stormo GD: Probabilistic code for DNA recognition by proteins of the EGR family. J Mol Biol 2002, 323: 701–727. 10.1016/S0022-2836(02)00917-8PubMedView ArticleGoogle Scholar
- Luscombe NM, Thornton JM: Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity. J Mol Biol 2002, 320: 991–1009. 10.1016/S0022-2836(02)00571-5PubMedView ArticleGoogle Scholar
- Jones S, van Heyningen P, Berman HM, Thornton JM: Protein-DNA interactions: A structural analysis. J Mol Biol 1999, 287: 877–896. 10.1006/jmbi.1999.2659PubMedView ArticleGoogle Scholar
- Choo Y, Klug A: Physical basis of a protein-DNA recognition code. Curr Opin Struct Biol 1997, 7: 117–125. 10.1016/S0959-440X(97)80015-2PubMedView ArticleGoogle Scholar
- Gorfe AA, Jelesarov I: Energetics of sequence-specific protein-DNA association: computational analysis of integrase Tn916 binding to its target DNA. Biochemistry 2003, 42: 11568–11576. 10.1021/bi026937pPubMedView ArticleGoogle Scholar
- Jayaram B, McConnell KJ, Dixit SB, Beveridge DL: Free Energy Analysis of Protein-DNA Binding: The EcoRI Endonuclease-DNA Complex. J Comput Phys 1999, 151: 333–357. 10.1006/jcph.1998.6173View ArticleGoogle Scholar
- Jayaram B, McConnell K, Dixit SB, Das A, Beveridge DL: Free-energy component analysis of 40 protein-DNA complexes: a consensus view on the thermodynamics of binding at the molecular level. J Comput Chem 2002, 23: 1–14. 10.1002/jcc.10009PubMedView ArticleGoogle Scholar
- Anderson WF, Ohlendorf DH, Takeda Y, Matthews BW: Structure of the cro repressor from bacteriophage lambda and its interaction with DNA. Nature 1981, 290: 754–758. 10.1038/290754a0PubMedView ArticleGoogle Scholar
- Kellogg GE, Abraham DJ: Hydrophobicity: is LogP(o/w) more than the sum of its parts? Eur J Med Chem 2000, 35: 651–661. 10.1016/S0223-5234(00)00167-7View ArticleGoogle Scholar
- Hansch C, Leo AJ: Substituent constants for correlation analysis in chemistry and biology. New York, John Wiley and Sons; 1979.Google Scholar
- Dill KA: Additivity principles in biochemistry. J Biol Chem 1997, 272: 701–704.PubMedView ArticleGoogle Scholar
- Fornabaio M, Cozzini P, Mozzarelli A, Abraham DJ, Kellogg GE: Simple, intuitive calculations of free energy of binding for protein-ligand complexes. 2. Computational titration and pH effects in molecular models of neuraminidase-inhibitor complexes. J Med Chem 2003, 46: 4487–4500. 10.1021/jm0302593PubMedView ArticleGoogle Scholar
- Fornabaio M, Spyrakis F, Mozzarelli A, Cozzini P, Abraham DJ, Kellogg GE: Simple, intuitive calculations of free energy of binding for protein-ligand complexes. 3. The free energy contribution of structural water molecules in HIV-1 protease complexes. J Med Chem 2004, 47: 4507–4516. 10.1021/jm030596bPubMedView ArticleGoogle Scholar
- Amadasi A, Spyrakis F, Cozzini P, Abraham DJ, Kellogg GE, Mozzarelli A: Mapping the energetics of water-protein and water-ligand interactions with the "natural" HINT forcefield: predictive tools for characterizing the roles of water in biomolecules. J Mol Biol 2006, 358: 289–309. 10.1016/j.jmb.2006.01.053PubMedView ArticleGoogle Scholar
- Burnett JC, Kellogg GE, Abraham DJ: Computational methodology for estimating changes in free energies of biomolecular association upon mutation. The importance of bound water in dimer-tetramer assembly for beta 37 mutant hemoglobins. Biochemistry 2000, 39: 1622–1633. 10.1021/bi991724uPubMedView ArticleGoogle Scholar
- Burnett JC, Botti P, Abraham DJ, Kellogg GE: Computationally accessible method for estimating free energy changes resulting from site-specific mutations of biomolecules: systematic model building and structural/hydropathic analysis of deoxy and oxy hemoglobins. Proteins 2001, 42: 355–377. 10.1002/1097-0134(20010215)42:3<355::AID-PROT60>3.0.CO;2-FPubMedView ArticleGoogle Scholar
- Kellogg GE, Scarsdale JN, Fornari FA Jr.: Identification and hydropathic characterization of structural features affecting sequence specificity for doxorubicin intercalation into DNA double-stranded polynucleotides. Nucleic Acids Res 1998, 26: 4721–4732. 10.1093/nar/26.20.4721PubMed CentralPubMedView ArticleGoogle Scholar
- Cashman DJ, Scarsdale JN, Kellogg GE: Hydropathic analysis of the free energy differences in anthracycline antibiotic binding to DNA. Nucleic Acids Res 2003, 31: 4410–4416. 10.1093/nar/gkg645PubMed CentralPubMedView ArticleGoogle Scholar
- Cashman DJ, Kellogg GE: A computational model for anthracycline binding to DNA: tuning groove-binding intercalators for specific sequences. J Med Chem 2004, 47: 1360–1374. 10.1021/jm030529hPubMedView ArticleGoogle Scholar
- Janin J: Wet and dry interfaces: the role of solvent in protein-protein and protein-DNA recognition. Structure 1999, 7: R277–279. 10.1016/S0969-2126(00)88333-1PubMedView ArticleGoogle Scholar
- Reddy CK, Das A, Jayaram B: Do water molecules mediate protein-DNA recognition? J Mol Biol 2001, 314: 619–632. 10.1006/jmbi.2001.5154PubMedView ArticleGoogle Scholar
- Papoian GA, Ulander J, Wolynes PG: Role of water mediated interactions in protein-protein recognition landscapes. J Am Chem Soc 2003, 125: 9170–9178. 10.1021/ja034729uPubMedView ArticleGoogle Scholar
- Monecke P, Borosch T, Brickmann J, Kast SM: Determination of the interfacial water content in protein-protein complexes from free energy simulations. Biophys J 2006, 90: 841–850. 10.1529/biophysj.105.065524PubMed CentralPubMedView ArticleGoogle Scholar
- Cozzini P, Fornabaio M, Marabotti A, Abraham DJ, Kellogg GE, Mozzarelli A: Free energy of ligand binding to protein: evaluation of the contribution of water molecules by computational methods. Curr Med Chem 2004, 11: 3093–3118.PubMedView ArticleGoogle Scholar
- Goodford PJ: A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J Med Chem 1985, 28: 849–857. 10.1021/jm00145a002PubMedView ArticleGoogle Scholar
- Cooper A: Heat capacity of hydrogen-bonded networks: an alternative view of protein folding thermodynamics. Biophys Chem 2000, 85: 25–39. 10.1016/S0301-4622(00)00136-8PubMedView ArticleGoogle Scholar
- Schneider B, Patel K, Berman HM: Hydration of the phosphate group in double-helical DNA. Biophys J 1998, 75: 2422–2434.PubMed CentralPubMedView ArticleGoogle Scholar
- The Protein Data Bank [http://www.rcsb.org]
- The Nucleic Acid Database, [http://ndbserver.rutgers.edu]
- Chen DL, Kellogg GE: A computational tool to optimize ligand selectivity between two similar biomacromolecular targets. J Comput Aided Mol Des 2005, 19: 69–82. 10.1007/s10822-005-1485-7PubMedView ArticleGoogle Scholar
- Luscombe NM, Laskowski RA, Thornton JM: Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res 2001, 29: 2860–2874. 10.1093/nar/29.13.2860PubMed CentralPubMedView ArticleGoogle Scholar
- Halford SE, Marko JF: How do site-specific DNA-binding proteins find their targets? Nucleic Acids Res 2004, 32: 3040–3052. 10.1093/nar/gkh624PubMed CentralPubMedView ArticleGoogle Scholar
- Kalodimos CG, Biris N, Bonvin AM, Levandoski MM, Guennuegues M, Boelens R, Kaptein R: Structure and flexibility adaptation in nonspecific and specific protein-DNA complexes. Science 2004, 305: 386–389. 10.1126/science.1097064PubMedView ArticleGoogle Scholar
- Lejeune D, Delsaux N, Charloteaux B, Thomas A, Brasseur R: Protein-nucleic acid recognition: statistical analysis of atomic interactions and influence of DNA structure. Proteins 2005, 61: 258–271. 10.1002/prot.20607PubMedView ArticleGoogle Scholar
- Jen-Jacobson L: Protein-DNA recognition complexes: conservation of structure and binding energy in the transition state. Biopolymers 1997, 44: 153–180. 10.1002/(SICI)1097-0282(1997)44:2<153::AID-BIP4>3.0.CO;2-UPubMedView ArticleGoogle Scholar
- Ha JH, Spolar RS, Record MT Jr.: Role of the hydrophobic effect in stability of site-specific protein-DNA complexes. J Mol Biol 1989, 209: 801–816. 10.1016/0022-2836(89)90608-6PubMedView ArticleGoogle Scholar
- Lundback T, Hansson H, Knapp S, Ladenstein R, Hard T: Thermodynamic characterization of non-sequence-specific DNA-binding by the Sso7d protein from Sulfolobus solfataricus. J Mol Biol 1998, 276: 775–786. 10.1006/jmbi.1997.1558PubMedView ArticleGoogle Scholar
- Milev S, Gorfe AA, Karshikoff A, Clubb RT, Bosshard HR, Jelesarov I: Energetics of sequence-specific protein-DNA association: binding of integrase Tn916 to its target DNA. Biochemistry 2003, 42: 3481–3491. 10.1021/bi0269355PubMedView ArticleGoogle Scholar
- Spolar RS, Record MT Jr.: Coupling of local folding to site-specific binding of proteins to DNA. Science 1994, 263: 777–784. 10.1126/science.8303294PubMedView ArticleGoogle Scholar
- Künne AGE, Sieber M, Meierhans D, Allemann RK: Thermodynamics of the DNA binding reaction of transcription factor MASH-1. Biochemistry 1998, 37: 4217–4223. 10.1021/bi9725374View ArticleGoogle Scholar
- Kozlov AG, Lohman TM: Adenine base unstacking dominates the observed enthalpy and heat capacity changes for the Escherichia coli SSB tetramer binding to single-stranded oligoadenylates. Biochemistry 1999, 38: 7388–7397. 10.1021/bi990309zPubMedView ArticleGoogle Scholar
- Kozlov AG, Lohman TM: Effects of monovalent anions on a temperature-dependent heat capacity change for Escherichia coli SSB tetramer binding to single-stranded DNA. Biochemistry 2006, 45: 5190–5205. 10.1021/bi052543xPubMed CentralPubMedView ArticleGoogle Scholar
- Fukada H, Takahashi K: Enthalpy and heat capacity changes for the proton dissociation of various buffer components in 0.1 M potassium chloride. Proteins 1998, 33: 159–166. 10.1002/(SICI)1097-0134(19981101)33:2<159::AID-PROT2>3.0.CO;2-EPubMedView ArticleGoogle Scholar
- Oda M, Furukawa K, Ogata K, Sarai A, Nakamura H: Thermodynamics of specific and non-specific DNA binding by the c-Myb DNA-binding domain. J Mol Biol 1998, 276: 571–590. 10.1006/jmbi.1997.1564PubMedView ArticleGoogle Scholar
- Tripos Inc., [http://www.tripos.com]
- eduSoft, LC, [http://www.edusoft-lc.com]
- Kellogg GE, Joshi GS, Abraham DJ: New tools for modeling and understanding hydrophobicity and hydrophobic interactions. Med Chem Res 1992, 1: 444–453.
- Porotto M, Fornabaio M, Greengard O, Murrell MT, Kellogg GE, Moscona A: Paramyxovirus receptor-binding molecules: engagement of one site on the hemagglutinin-neuraminidase protein modulates activity at the second site. J Virol 2006, 80: 1204–1213. 10.1128/JVI.80.3.1204-1213.2006
- Cramer P, Larson CJ, Verdine GL, Muller CW: Structure of the human NF-kappaB p52 homodimer-DNA complex at 2.1 A resolution. Embo J 1997, 16: 7078–7090. 10.1093/emboj/16.23.7078
- Elrod-Erickson M, Rould MA, Nekludova L, Pabo CO: Zif268 protein-DNA complex refined at 1.6 A: a model system for understanding zinc finger-DNA interactions. Structure 1996, 4: 1171–1180. 10.1016/S0969-2126(96)00125-6
- Robinson H, Gao YG, McCrary BS, Edmondson SP, Shriver JW, Wang AH: The hyperthermophile chromosomal protein Sac7d sharply kinks DNA. Nature 1998, 392: 202–205. 10.1038/32455
- Rhee S, Martin RG, Rosner JL, Davies DR: A novel DNA-binding motif in MarA: the first structure for an AraC family transcriptional activator. Proc Natl Acad Sci U S A 1998, 95: 10413–10418. 10.1073/pnas.95.18.10413
- Zhao Q, Chasse SA, Devarakonda S, Sierk ML, Ahvazi B, Rastinejad F: Structural basis of RXR-DNA interactions. J Mol Biol 2000, 296: 509–520. 10.1006/jmbi.1999.3457
- Tsai FT, Sigler PB: Structural basis of preinitiation complex assembly on human pol II promoters. Embo J 2000, 19: 25–36. 10.1093/emboj/19.1.25
- Cheetham GM, Jeruzalmi D, Steitz TA: Structural basis for initiation of transcription from an RNA polymerase-promoter complex. Nature 1999, 399: 80–83. 10.1038/19999
- Meinke G, Sigler PB: DNA-binding mechanism of the monomeric orphan nuclear receptor NGFI-B. Nat Struct Biol 1999, 6: 471–477. 10.1038/8276
- Grant RA, Rould MA, Klemm JD, Pabo CO: Exploring the role of glutamine 50 in the homeodomain-DNA interface: crystal structure of engrailed (Gln50 --> ala) complex at 2.0 A. Biochemistry 2000, 39: 8187–8192. 10.1021/bi000071a
- Mo Y, Vaessen B, Johnston K, Marmorstein R: Structure of the elk-1-DNA complex reveals how DNA-distal residues affect ETS domain recognition of DNA. Nat Struct Biol 2000, 7: 292–297. 10.1038/74055
- Wilce JA, Vivian JP, Hastings AF, Otting G, Folmer RH, Duggin IG, Wake RG, Wilce MC: Structure of the RTP-DNA complex and the mechanism of polar replication fork arrest. Nat Struct Biol 2001, 8: 206–210. 10.1038/84934
- Chevalier BS, Monnat RJ Jr., Stoddard BL: The homing endonuclease I-CreI uses three metals, one of which is shared between the two active sites. Nat Struct Biol 2001, 8: 312–316. 10.1038/86181
- Tahirov TH, Sato K, Ichikawa-Iwata E, Sasaki M, Inoue-Bungo T, Shiina M, Kimura K, Takata S, Fujikawa A, Morii H, Kumasaka T, Yamamoto M, Ishii S, Ogata K: Mechanism of c-Myb-C/EBP beta cooperation from separated sites on a promoter. Cell 2002, 108: 57–70. 10.1016/S0092-8674(01)00636-5
- Feng JA, Johnson RC, Dickerson RE: Hin recombinase bound to DNA: the origin of specificity in major and minor groove interactions. Science 1994, 263: 348–355. 10.1126/science.8278807
- Van Roey P, Waddling CA, Fox KM, Belfort M, Derbyshire V: Intertwined structure of the DNA-binding domain of intron endonuclease I-TevI with its substrate. Embo J 2001, 20: 3631–3637. 10.1093/emboj/20.14.3631
- Hovde S, Abate-Shen C, Geiger JH: Crystal structure of the Msx-1 homeodomain/DNA complex. Biochemistry 2001, 40: 12013–12021. 10.1021/bi0108148
- Miller JC, Pabo CO: Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J Mol Biol 2001, 313: 309–315. 10.1006/jmbi.2001.4975
- Chiu TK, Sohn C, Dickerson RE, Johnson RC: Testing water-mediated DNA recognition by the Hin recombinase. Embo J 2002, 21: 801–814. 10.1093/emboj/21.4.801
- Ke A, Mathias JR, Vershon AK, Wolberger C: Structural and thermodynamic characterization of the DNA binding properties of a triple alanine mutant of MATalpha2. Structure 2002, 10: 961–971. 10.1016/S0969-2126(02)00790-6
- Beamer LJ, Pabo CO: Refined 1.8 A crystal structure of the lambda repressor-operator complex. J Mol Biol 1992, 227: 177–196. 10.1016/0022-2836(92)90690-L
- Shi Y, Wang YF, Jayaraman L, Yang H, Massague J, Pavletich NP: Crystal structure of a Smad MH1 domain bound to DNA: insights on DNA binding in TGF-beta signaling. Cell 1998, 94: 585–594. 10.1016/S0092-8674(00)81600-1
- Schumacher MA, Choi KY, Zalkin H, Brennan RG: Crystal structure of LacI member, PurR, bound to DNA: minor groove binding by alpha helices. Science 1994, 266: 763–770. 10.1126/science.7973627
- Kodandapani R, Pio F, Ni CZ, Piccialli G, Klemsz M, McKercher S, Maki RA, Ely KR: A new pattern for helix-turn-helix recognition revealed by the PU.1 ETS-domain-DNA complex. Nature 1996, 380: 456–460. 10.1038/380456a0
- Glasfeld A, Koehler AN, Schumacher MA, Brennan RG: The role of lysine 55 in determining the specificity of the purine repressor for its operators through minor groove interactions. J Mol Biol 1999, 291: 347–361. 10.1006/jmbi.1999.2946
- Rupert PB, Daughdrill GW, Bowerman B, Matthews BW: A new DNA-binding motif in the Skn-1 binding domain-DNA complex. Nat Struct Biol 1998, 5: 484–491. 10.1038/nsb0698-484
- Edgell DR, Derbyshire V, Van Roey P, LaBonne S, Stanger MJ, Li Z, Boyd TM, Shub DA, Belfort M: Intron-encoded homing endonuclease I-TevI also functions as a transcriptional autorepressor. Nat Struct Mol Biol 2004, 11: 936–944. 10.1038/nsmb823
- Chevalier B, Sussman D, Otis C, Noel AJ, Turmel M, Lemieux C, Stephens K, Monnat RJ Jr., Stoddard BL: Metal-dependent DNA cleavage mechanism of the I-CreI LAGLIDADG homing endonuclease. Biochemistry 2004, 43: 14015–14026. 10.1021/bi048970c
- van Pouderoyen G, Ketting RF, Perrakis A, Plasterk RH, Sixma TK: Crystal structure of the specific DNA-binding domain of Tc3 transposase of C.elegans in complex with transposon DNA. Embo J 1997, 16: 6044–6054. 10.1093/emboj/16.19.6044
- Otwinowski Z, Schevitz RW, Zhang RG, Lawson CL, Joachimiak A, Marmorstein RQ, Luisi BF, Sigler PB: Crystal structure of trp repressor/operator complex at atomic resolution. Nature 1988, 335: 321–329. 10.1038/335321a0
- Sussman D, Chadsey M, Fauce S, Engel A, Bruett A, Monnat R Jr., Stoddard BL, Seligman LM: Isolation and characterization of new homing endonuclease specificities at individual target site positions. J Mol Biol 2004, 342: 31–41. 10.1016/j.jmb.2004.07.031
- Li T, Stark MR, Johnson AD, Wolberger C: Crystal structure of the MATa1/MAT alpha 2 homeodomain heterodimer bound to DNA. Science 1995, 270: 262–269. 10.1126/science.270.5234.262
- Kim Y, Geiger JH, Hahn S, Sigler PB: Crystal structure of a yeast TBP/TATA-box complex. Nature 1993, 365: 512–520. 10.1038/365512a0
- Hegde RS, Grossman SR, Laimins LA, Sigler PB: Crystal structure at 1.7 A of the bovine papillomavirus-1 E2 DNA-binding domain bound to its DNA target. Nature 1992, 359: 505–512. 10.1038/359505a0
- Tucker-Kellogg L, Rould MA, Chambers KA, Ades SE, Sauer RT, Pabo CO: Engrailed (Gln50-->Lys) homeodomain-DNA complex at 1.9 A resolution: structural basis for enhanced affinity and altered specificity. Structure 1997, 5: 1047–1054. 10.1016/S0969-2126(97)00256-6
- Fraenkel E, Pabo CO: Comparison of X-ray and NMR structures for the Antennapedia homeodomain-DNA complex. Nat Struct Biol 1998, 5: 692–697.
- Swirnoff AH, Milbrandt J: DNA-binding specificity of NGFI-A and related zinc finger transcription factors. Mol Cell Biol 1995, 15: 2275–2287.
- McAfee JG, Edmondson SP, Zegar I, Shriver JW: Equilibrium DNA binding of Sac7d protein from the hyperthermophile Sulfolobus acidocaldarius: fluorescence and circular dichroism studies. Biochemistry 1996, 35: 4034–4045. 10.1021/bi952555q
- Martin RG, Jair KW, Wolf RE Jr., Rosner JL: Autoactivation of the marRAB multiple antibiotic resistance operon by the MarA transcriptional activator in Escherichia coli. J Bacteriol 1996, 178: 2216–2223.
- Lagrange T, Kapanidis AN, Tang H, Reinberg D, Ebright RH: New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB. Genes Dev 1998, 12: 34–44.
- Bandwar RP, Jia Y, Stano NM, Patel SS: Kinetic and thermodynamic basis of promoter strength: multiple steps of transcription initiation by T7 RNA polymerase are modulated by the promoter sequence. Biochemistry 2002, 41: 3586–3595. 10.1021/bi0158472
- Wilson TE, Paulsen RE, Padgett KA, Milbrandt J: Participation of non-zinc finger residues in DNA binding by two nuclear orphan receptors. Science 1992, 256: 107–110. 10.1126/science.1314418
- Ades SE, Sauer RT: Differential DNA-binding specificity of the engrailed homeodomain: the role of residue 50. Biochemistry 1994, 33: 9187–9194. 10.1021/bi00197a022
- Shore P, Bisset L, Lakey J, Waltho JP, Virden R, Sharrocks AD: Characterization of the Elk-1 ETS DNA-binding domain. J Biol Chem 1995, 270: 5805–5811. 10.1074/jbc.270.11.5805
- Heath PJ, Stephens KM, Monnat RJ Jr., Stoddard BL: The structure of I-Crel, a group I intron-encoded homing endonuclease. Nat Struct Biol 1997, 4: 468–476. 10.1038/nsb0697-468
- Derbyshire V, Kowalski JC, Dansereau JT, Hauer CR, Belfort M: Two-domain structure of the td intron-encoded endonuclease I-TevI correlates with the two-domain configuration of the homing site. J Mol Biol 1997, 265: 494–506. 10.1006/jmbi.1996.0754
- Catron KM, Iler N, Abate C: Nucleotides flanking a conserved TAAT core dictate the DNA binding specificity of three murine homeodomain proteins. Mol Cell Biol 1993, 13: 2354–2365.
- Rolfes RJ, Zalkin H: Purification of the Escherichia coli purine regulon repressor and identification of corepressors. J Bacteriol 1990, 172: 5637–5642.
- Pio F, Assa-Munt N, Yguerabide J, Maki RA: Mutants of ETS domain PU.1 and GGAA/T recognition: free energies and kinetics. Protein Sci 1999, 8: 2098–2109.
- Daniel DC, Thompson M, Woodbury NW: DNA-binding interactions and conformational fluctuations of Tc3 transposase DNA binding domain examined with single molecule fluorescence spectroscopy. Biophys J 2002, 82: 1654–1666.
- Carey J: Gel retardation at low pH resolves trp repressor-DNA complexes for quantitative study. Proc Natl Acad Sci U S A 1988, 85: 975–979. 10.1073/pnas.85.4.975
- Phillips CL, Stark MR, Johnson AD, Dahlquist FW: Heterodimerization of the yeast homeodomain transcriptional regulators alpha 2 and a1 induces an interfacial helix in alpha 2. Biochemistry 1994, 33: 9294–9302. 10.1021/bi00197a033
- Ornstein RL, Rein R, Breen DL, MacElroy RD: An optimized potential function for the calculation of nucleic acid interaction energies. I. Base stacking. Biopolymers 1978, 17: 2341–2360. 10.1002/bip.1978.360171005
- Monini P, Grossman SR, Pepinsky B, Androphy EJ, Laimins LA: Cooperative binding of the E2 protein of bovine papillomavirus to adjacent E2-responsive sequences. J Virol 1991, 65: 2124–2130.
- Affolter M, Percival-Smith A, Muller M, Leupin W, Gehring WJ: DNA binding properties of the purified Antennapedia homeodomain. Proc Natl Acad Sci U S A 1990, 87: 4093–4097. 10.1073/pnas.87.11.4093
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.