An in silico study of the molecular basis of B-RAF activation and conformational stability

Background: B-RAF kinase plays an important role in both tumour induction and tumour maintenance in several cancers, and it is an attractive new drug target. However, the structural basis of B-RAF activation is still not well understood. Results: In this study we suggest a novel molecular basis for B-RAF activation based on molecular dynamics (MD) simulations of wild-type B-RAF (B-RAF WT) and the B-RAF V600E, B-RAF K601E and B-RAF D594V mutants. A strong hydrogen-bond network was identified in B-RAF WT, in which the interactions between Lys601 and the well-known catalytic residues Lys483, Glu501 and Asp594 play an important role. Several mutations that directly or indirectly destabilized the interactions between these residues within this network were found to contribute to the changes in B-RAF activity. Conclusion: Our results show that these mechanisms lead to the disruption of the electrostatic interactions between the A-loop and the αC-helix in the activating mutants, which presumably contributes to the flipping of the activation segment into an active form. Conversely, in B-RAF WT and in the B-RAF D594V mutant, which has impaired kinase activity, these interactions were strong and stabilized the inactive form of the kinase.


Background
Watson and Crick proposed the double-helical structure of DNA in 1953, and almost simultaneously two forms were postulated on the basis of fibre diffraction analysis: the B-form DNA corresponded to the Watson and Crick structure and was found to occur in conditions of high humidity and low salt concentration, while the A-form occurred in conditions of lower humidity and higher salt concentration. Gross structural features of these and other polymorphic forms of DNA were refined over the next 25 years using fibre diffraction data [1,2]. Apart from the helical parameters rise and twist, the two forms of DNA were mainly characterised in terms of features such as sugar pucker [3][4][5], glycosyl torsion angle [6], base pair orientation and groove widths [7]. However, it was only in the 1980s that the atomic details of the two forms were characterised. The first crystal structure of a B-form DNA was solved in 1981 [8] and was found to have significant sequence-dependent variability, with an average roll per dinucleotide step of 0.5 ± 5.2°, an average local helical twist of 35.6 ± 4.4° and an average slide of 0.2 ± 0.5 Å. Subsequent analyses of other crystal structures confirmed these sequence-dependent effects [9][10][11][12]. A-DNA, which was first crystallised by McCall [13], was found to have an average roll of 6.8 ± 2.6°, an average local helical twist of 30.8 ± 1.2° and an average slide of -1.5 ± 0.4 Å [10,12,14]. However, since the overall features of the two crystal structures were close to the fibre models of the B and A forms, it was assumed that the two forms correspond to two stable minima accessible to DNA, and that a transition from one form to the other would involve an energetic cost. As crystallographic methods improved and the number and variety of X-ray crystal structures of DNA increased, this idea began to lose ground.
While most oligonucleotide structures solved during 1980-2000 had roll and twist values that were either exclusively A-like or exclusively B-like, a few appeared to show features intermediate between A-DNA and B-DNA, or a mixture of both types [15][16][17][18]. Thus it appeared that A- and B-form DNA were not well-separated stable minima, and that the dinucleotide steps in oligomeric DNA could assume conformations ranging from B-like through intermediate to A-like [2]. In addition, several other forms of synthetic DNA that did not fit the canonical A-like or B-like conformation were also solved [2]. Against this wide-ranging polymorphism of the double-helical DNA molecule, particularly at the dinucleotide step level, the RNA duplex crystal structures solved around the same time [19][20][21][22][23][24] stood out for their rigidity and for their conformational proximity to the A-RNA fibre model, independent of sequence. In this study, we have analysed a large dataset of free RNA oligomers to verify the conformational rigidity of Watson-Crick basepaired RNA duplexes, and then used it as a template against which to measure the A-like character of each dinucleotide step, as well as of the overall structure, of both free and protein-bound DNA.
Several studies in the late 1990s also suggested that not only the classical B-form of DNA but also the A-form had biological relevance. Based on a comparison of free B-DNA oligomers and protein-bound DNA, it was suggested [25,26] that protein binding causes DNA to assume an A-like or an A-B intermediate conformation in terms of roll and twist. Subsequently, it was shown that a new parameter, Zp, could discriminate between A-like and B-like dinucleotide steps more reliably than roll or twist, and that entire structures could be classified as A-like or B-like in terms of their Zp values, irrespective of local variations in their roll and twist values [14,27]. Lu et al. [14] highlighted the fact that in DNA structures bound to a few prominent protein families, the protein-bound region was induced to take up an A-like conformation as defined by Zp. However, the above-mentioned studies that compared free and bound DNA [25,26] used the overall B-form of the free oligomers as a reference, and did not consider the inherent 'A-philicity' [28][29][30][31][32] of the dinucleotide steps in the bound region. Given that, at least in a few cases, the putative binding region is known to assume an A-like conformation in its free form [33,34], also including A-DNA oligomer structures in the analysis might provide better insight into the intrinsic preferences of a DNA sequence and help distinguish these from protein-induced structural effects. Only one study has compared the free and protein-bound forms of DNA while taking the A-form of DNA into consideration [35]. Several other studies have implicated variations in roll, especially at pyrimidine-purine steps, in DNA bending and curvature [25][36][37][38][39][40][41], in ways critical for protein binding.
While the DNA dinucleotide steps were under scrutiny for their role in specific protein binding, the DNA backbone was also shown to be involved in more than 50% of all the contacts between amino acids and DNA in regulatory protein-DNA complexes [42]. Hence several studies have also focused on how variations in the DNA backbone might act as an indirect readout signal for protein recognition and binding [43][44][45][46][47][48][49]. In DNA oligomers, the sugar-phosphate backbone was believed to be rigid compared with the variation in local step geometry defined by two neighbouring basepairs. The sugar ring assumed the C3'-endo conformation in A-DNA [3] and C2'-endo in B-DNA [9]. The related backbone torsion angle δ was found to assume values of about 84° in A-DNA [3] and about 128° in B-DNA [4,5]. The torsion angles ε and ζ were observed to assume two conformations, BI and BII, in B-DNA [9], but only the BI conformation in A-DNA [9]. α and γ were found to show anticorrelated variation in A-form duplexes [3,24,50,51], but generally took up the g-, g+ conformation in B-DNA [9]. However, recent studies have shown that, unlike in oligomers, the backbone of a significant proportion of nucleotides in bound DNA assumes non-classical conformations [52]. There have also been attempts to analyse the backbone torsion angles taking into account the correlation between more than two torsion angles, and to group them into seven distinct states [53,54]. In this study, we have adapted this methodology [53] and analysed the variation in backbone parameters with respect to the variation in dinucleotide step parameters across different datasets.
A crucial question of biological relevance is how variations in DNA structure at the basepair, base-step and backbone levels contribute to the overall structure of the molecule, and what this implies for protein binding. A related question is how changes caused by protein binding at the local structural level affect the overall DNA structure. There have been efforts to go beyond dinucleotide steps and analyse the properties of all possible tetranucleotide, hexanucleotide and octanucleotide fragments using molecular dynamics simulations [53,55,56]. However, most of the high-resolution DNA double-helical crystal structures, especially those of free DNA, are too short to allow a meaningful statistical analysis of all possible trinucleotide or higher-order steps. The other approach is to quantify the overall DNA structure in terms of parameters such as curvature, bendability or stability. The importance of DNA curvature was first realised when it was observed that even unbound genomic DNA could have a well-defined, inherent curvature [57,58]. Since most of the curved DNA sequences identified in the early days contained stretches of adenines, the initial models of DNA curvature, such as the 'wedge model' [59] and the 'junction model' [60], traced the origin of curvature to the presence of A-tracts in phase with the DNA helical repeat. However, these models had to be abandoned when it was shown that sequences lacking AA dinucleotides also adopted a curved structure [61]. New models that took into account variations in the geometries of all ten dinucleotide steps were therefore proposed [62][63][64]. However, owing to the difficulty of tracing a uniform path for the DNA axis in three dimensions, there is no standard methodology for calculating DNA curvature, despite its obvious importance in biological function.
Various measures for quantifying DNA curvature have been proposed and implemented, such as the radius of a circle fitted to the basepair centres projected onto a plane [63,65], the ratio of the end-to-end distance of the DNA molecule to the length of the actual path traced by the DNA axis [63][65][66][67][68][69], the ratio of the moments of inertia of an ellipsoid fitted to the molecule [63][65][66][67][68][69][70][71], and the angles between the local helix axis vectors of two successive dinucleotide steps [64,69,72]. Each of these methods has its advantages and limitations, and no single method can unambiguously quantify all the possible curved conformations adopted by DNA molecules. Hence a combination of several of these methods, together with close inspection of local distortions, is required to fully understand the curvature of any given structure.
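Two of the measures listed above, the ratio of end-to-end distance to contour length and the angle between successive local helix axes, can be sketched in Python as follows. This is a minimal illustration rather than the procedure used in this study: it assumes that helix-axis points and local helix-axis vectors have already been extracted from a structure, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(a: Vec) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def end_to_end_ratio(axis_points: Sequence[Vec]) -> float:
    """End-to-end distance divided by the contour length of the axis path.

    Returns 1.0 for a perfectly straight axis and smaller values for a
    curved one. axis_points are assumed to be helix-axis positions ordered
    along the duplex (hypothetical input).
    """
    if len(axis_points) < 2:
        raise ValueError("need at least two axis points")
    contour = sum(_norm(_sub(b, a)) for a, b in zip(axis_points, axis_points[1:]))
    return _norm(_sub(axis_points[-1], axis_points[0])) / contour


def axis_angle_deg(u: Vec, v: Vec) -> float:
    """Angle in degrees between two local helix-axis vectors."""
    dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    cos_t = max(-1.0, min(1.0, dot / (_norm(u) * _norm(v))))  # clamp rounding error
    return math.degrees(math.acos(cos_t))


def successive_axis_angles(axes: Sequence[Vec]) -> List[float]:
    """Angles between the local axis vectors of consecutive dinucleotide steps."""
    return [axis_angle_deg(u, v) for u, v in zip(axes, axes[1:])]
```

For an axis that runs 1 Å along z and then 1 Å along x, `end_to_end_ratio([(0, 0, 0), (0, 0, 1), (1, 0, 1)])` is √2/2 (about 0.707), whereas a straight axis gives 1.0. As the text notes, neither number alone captures every kind of curvature.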
In this study, we have analysed an exhaustive dataset of protein-DNA complexes, and compared it with a complete, high resolution dataset of free DNA oligomers, without pre-classifying them as A-DNA or B-DNA. We have also separately analysed a dataset of DNA bound by proteins via a Helix-Turn-Helix (HTH) motif. The HTH motif is not only the most well-characterised, but also the most commonly occurring DNA-binding motif, and is present in a wide range of transcription factors. The HTH motif consists of two alpha helices linked by a turn region that protrudes out of the surface of the protein [73][74][75]. The second helix, usually referred to as the 'recognition helix', fits into the major groove of the DNA, and is involved in direct or indirect interactions with the DNA [74][75][76]. While the HTH motif has been studied extensively, the structural features of the DNA to which it binds have not been analysed in detail. The present analysis provides some interesting insights into the conformational flexibility of the DNA molecule, and reveals that many of the conformations observed in bound DNA, both at the local dinucleotide step level, and the gross structural level, are also accessible to unbound DNA, while a few conformations are solely induced by protein binding.

Results
The structural parameters of three datasets of DNA (free oligomers, protein-bound DNA excluding HTH motif-bound DNA, and HTH motif-bound DNA) and one dataset of RNA oligomers were analysed in order to gain a complete perspective of the features of DNA, both within each set and across the sets. As RNA is known to assume only the A-like conformation, the RNA dataset was used as a reference point for the A-like conformation and also to distinguish basepair effects from those due to the ribose sugar ring in RNA. The RNA dataset was observed to be rich in steps containing the G:C basepair, and had a remarkably low percentage of steps containing only the A:U basepair (table 1). The free DNA dataset contains a large proportion of GG (23.9%) and CG (18.0%) steps (table 1). A significant number of these steps were found to occur in structures that were classified by the Nucleic Acid Database [77] as "A-DNA", and a large number of these steps are indeed found to have high Zp values in the present analysis, matching our criterion for an A-like dinucleotide step, as defined in the next section. The free dataset also contains 5 structures with the Drew-Dickerson sequence d(CGCGAATTCGCG). These and other A-tract-containing sequences account for most of the high occurrence of AA steps (11.1%) in the free dataset. The HTH dataset consists of DNA bound by a wide variety of proteins spanning 22 SCOP [78] classes, and includes 3 ternary TATA Binding Protein-Transcription Factor IIB-TATA-box DNA complexes (hereafter TBP-TFIIB-TA-box DNA) and 6 Catabolite Activator Protein-DNA complexes (hereafter CAP-DNA) (additional file 1). In the TBP-TFIIB-TA-box DNA ternary complexes, the HTH motif is present in the transcription factor TFIIB, which binds to the DNA immediately upstream of the TATA-box region.
The complex dataset contains 8 TATA Binding Protein-DNA (hereafter referred to as TBP-DNA) complexes, which lack TFIIB, and hence have been excluded from the HTH dataset (additional file 1). Interestingly, the protein-bound datasets also have a significant proportion of CA/TG steps, which have been implicated in the kinks observed in several structures [79].
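The step percentages in table 1 come from counting dinucleotide steps, with each step pooled with its complementary step on the other strand (GG with CC, CA with TG, and so on). A minimal Python sketch of that bookkeeping for a single DNA duplex follows. It assumes the 5'→3' sequence of one strand is given and counts every step once per duplex; whether terminal steps were excluded, and how structures were pooled in this study, are not stated here, so those choices are assumptions.

```python
from collections import Counter
from typing import Dict

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# The ten unique DNA dinucleotide steps, named as in the text
# (GG stands for GG/CC, CA for CA/TG, and so on).
STEP_NAMES = {"AA", "AC", "AG", "AT", "CA", "CG", "GA", "GC", "GG", "TA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_step(step: str) -> str:
    """Map a step to its name in STEP_NAMES (e.g. 'TT' -> 'AA', 'TG' -> 'CA')."""
    return step if step in STEP_NAMES else reverse_complement(step)


def step_composition(strand: str) -> Dict[str, float]:
    """Percentage of each unique dinucleotide step in a duplex, given one strand 5'->3'."""
    strand = strand.upper()
    counts = Counter(canonical_step(strand[i:i + 2]) for i in range(len(strand) - 1))
    total = sum(counts.values())
    return {step: 100.0 * n / total for step, n in sorted(counts.items())}
```

For the Drew-Dickerson dodecamer d(CGCGAATTCGCG), for instance, 4 of the 11 steps are CG and 2 are AA (the AA/TT steps).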

Variations of the dinucleotide step parameters
Among the six dinucleotide step parameters that measure the relative rotational and translational motions between adjacent basepairs about and along the x-, y- and z-axes (see 'Methods'), tilt, shift and rise showed very little variation within and across the three DNA datasets, and so are not reported here. On the other hand, in agreement with earlier studies [25][26][27], the parameters roll, twist and slide, as well as the parameter Zp (described in the 'Methods' section), were found to be excellent indicators of the sequence-dependent conformational flexibility of a DNA molecule. To highlight the characteristic features of each dinucleotide step in free as well as bound DNA, the dinucleotide step parameters Zp and slide are listed in tables 2 and 3, and figure 1 shows the variation of Zp versus slide. The corresponding values for roll and twist are listed in tables 4 and 5, and the variation of Zp versus roll is shown in figure 2.
The RNA oligomer dataset assumes mean values of high positive Zp (2.2 ± 0.3 Å), negative slide (-1.5 ± 0.4 Å), high roll (9.0 ± 4.0°) and low twist (31.1 ± 3.4°), all close to the values observed for the fibre models of A-form DNA helices [80]. The low standard deviations of all four parameters, for individual steps as well as for the entire dataset, confirm the conformational rigidity of the RNA structures. The glycosidic torsion angle χ, the backbone torsion angle δ and the sugar phase angle P were also observed to assume A-DNA fibre model-like values for the entire dataset. Even steps that have previously been reported to be A-phobic in DNA are observed to be entirely A-like in RNA. This confirms the observation that the presence of even a single ribose sugar causes the entire structure to assume an A-like conformation [24], while the presence of uracil in place of thymine also facilitates the A-form structure, particularly for AA/UU, GA/UC and AG/CU steps. Thus the RNA dataset, with its well-defined and rigid boundaries, stands in sharp contrast to the free and protein-bound DNA datasets, and its parameters can be used as a criterion to define the A-like conformation in DNA.
For the free DNA dataset, as seen in figure 1, a distinct bimodal distribution is observed for Zp and slide. The two clusters for this dataset arise primarily because Zp assumes two distinctly different values with a clear separation between them. Using the RNA dataset as a template, we assigned as A-like those DNA steps that lie within three standard deviations of the mean Zp value for the RNA dataset, viz. Zp > 1.3 Å (2.2 Å - 3 × 0.3 Å). The boundary for B-like conformation [...] [14]. For all four datasets, slide was observed to correlate well with Zp for the overall data (figure 1), as well as for individual dinucleotide sequences. In contrast, roll does not show a significant correlation with Zp (figure 2), and neither roll nor twist (additional file 2, figure 1) shows any bimodal character.

Figure 1: Zp versus slide for the RNA, free DNA oligomers, protein-bound DNA (not containing the HTH motif) and HTH-bound DNA datasets.
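The A-like cutoff is simple arithmetic on the RNA statistics: 2.2 Å minus three standard deviations of 0.3 Å gives 1.3 Å. A minimal Python sketch of the resulting step classification follows. The B-like boundary is passed in as a parameter (`b_max`) because its value is not given in this section, and using `statistics.stdev` (the sample standard deviation) is an assumption about how the RNA spread was computed.

```python
import statistics
from typing import Iterable


def a_like_cutoff(mean_zp: float, sd_zp: float, k: float = 3.0) -> float:
    """Lower Zp bound (Å) for an A-like step: k standard deviations below the RNA mean."""
    return mean_zp - k * sd_zp


def a_like_cutoff_from_rna(rna_zp: Iterable[float], k: float = 3.0) -> float:
    """Same cutoff computed directly from per-step RNA Zp values (sample SD assumed)."""
    values = list(rna_zp)
    return a_like_cutoff(statistics.mean(values), statistics.stdev(values), k)


def classify_step(zp: float, a_min: float, b_max: float) -> str:
    """Label a dinucleotide step from its Zp value (Å).

    a_min: A-like cutoff, e.g. from a_like_cutoff.
    b_max: upper Zp bound for B-like steps; a hypothetical parameter here,
    since its value is not given in this section.
    """
    if b_max >= a_min:
        raise ValueError("b_max must be below a_min")
    if zp > a_min:
        return "A-like"
    if zp < b_max:
        return "B-like"
    return "intermediate"
```

For example, `classify_step(1.8, a_like_cutoff(2.2, 0.3), 0.5)` returns `'A-like'`. The 0.5 Å passed as `b_max` is an illustrative placeholder, not the boundary used in the study.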

A-philicity of dinucleotides in DNA structures
Efforts were made several decades ago to characterise individual dinucleotide steps as A-phobic (or B-philic) (AA, CA and GA) or A-philic (GG, AG and AC) [28][29][30][31] on the basis of their ability to induce a B-to-A transition in solution. A more recent study on a larger dataset [32] reclassified the GA step as neutral and the AG step as [...]

Thus it appears that in free DNA oligomers, the overall structure assumes an A-like or B-like conformation depending on its sequence, particularly the proportion of AA/TT and GG/CC steps. Only AA steps, and to a lesser extent GA steps, show a strong preference for the B-form, while GG is truly A-philic. None of the other dinucleotide steps appears to have a strong intrinsic preference for the A-like or B-like conformation; instead, they assume a particular conformation depending on the conformation of neighbouring steps, as suggested by recent solution studies [32].
In the protein-bound DNA datasets, most of the structures were found to have exclusively B-like values of Zp when the above-mentioned criteria for A-like and B-like DNA are used. Unlike in the free dataset, no structure from the complex or HTH dataset was observed to have an entirely A-like conformation. Even for an A-philic step such as GG, for which 92.8% of the steps in the free dataset take up an A-like geometry, about 90.0% of the datapoints in the complex dataset and about 71.6% of the datapoints in the HTH dataset were observed to have B-like values of Zp, with only 5.0% and 12.7% of datapoints, respectively, showing an A-like value. Only a few steps in the DNA-binding region of some structures were observed to have A-like or near A-like characteristics. These complexes belong to a few specific families, such as the polymerases, endonucleases and transposases, and the structural features of these duplexes are described in the 'Discussion' section.

Table footnote: † TEH, structures containing TBP, endonucleases and the hyperthermophile chromosomal protein SAC7D excluded; ‡ TC, structures containing TBP and CAP excluded. The other specifications are as detailed in the caption to table 1.

Roll and twist are not good discriminators of A-form versus B-form
Roll and twist span a very wide range of values in the three DNA datasets, as is evident from tables 4 and 5. Unlike Zp and slide, roll and twist show no clear bimodal distribution in the free dataset; instead, their values vary continuously and are negatively correlated, ranging from large negative roll and very large twist to positive roll and low twist (additional file 2, figure 1). In the free DNA dataset, steps classified as A-like or B-like on the basis of their Zp values are listed separately in tables 4 and 5.
As mentioned above, CA steps show three types of conformation: one in which Zp, slide, roll and twist have typical A-like values, and two different conformations in which Zp is B-like. CA steps with B-like Zp values assume either normal slide and twist with positive roll, or high positive slide and large twist with negative roll. This bimodal distribution of the B-like CA steps has been observed in several previous studies [10,65,81,82]. These steps also show a correlated variation in the backbone torsion angles ε and ζ in both strands: the low-twist, positive-roll steps have ε and ζ in the t, g- (BI) conformation, while the large-twist, negative-roll steps have ε and ζ in the g-, t (BII) conformation [9,83]. When a CA step in the BII conformation occurs adjacent to an AG step such that the two form a CAG triplet, the AG step is often observed to have a high roll and a very low twist. This feature is observed in several DNA structures irrespective of whether the steps have bound ions [84,85] or are free [16]. These CA steps and the adjacent AG steps do not show any correlated variation in Zp and slide, which have B-like values, the CA steps being characterised by large positive slide values. AG steps occurring adjacent to other steps do not assume this conformation. The high roll and low twist values of these AG steps, which are B-like in terms of Zp, skew the averages for roll (3.5 ± 4.7°) and twist (32.5 ± 7.0°) towards A-like values.
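The BI/BII distinction used above can be made from ε and ζ alone. A common shortcut, assumed here rather than taken from this study's Methods, is the sign of ε − ζ: about −90° for BI (ε trans, ζ g-) and about +90° for BII (ε g-, ζ trans). A minimal Python sketch:

```python
def wrap180(angle: float) -> float:
    """Map an angle in degrees onto the interval (-180, 180]."""
    a = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if a == -180.0 else a


def backbone_substate(epsilon: float, zeta: float) -> str:
    """Return 'BI' when epsilon - zeta is negative and 'BII' when it is positive.

    Angles are in degrees and may be given in either the 0-360 or the
    -180-180 convention; the difference is wrapped before testing its sign.
    """
    return "BI" if wrap180(epsilon - zeta) < 0.0 else "BII"
```

Wrapping the difference first lets ε and ζ be given in either convention, so a BI step with ε = 185° and ζ = 265° is classified the same way as one reported as ε = -175° and ζ = -95°.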
In addition to the CA steps, 8 of the 13 'B-like' GC steps also assume the BII conformation for one or both of the guanine backbone torsion angles, and have a correspondingly negative roll and large twist. As a result, 'B-like' GC steps have a negative average roll (-3.7 ± 5.7°) and a large average twist (38.2 ± 2.8°). AA steps, which are exclusively B-like in terms of Zp, have mean roll and twist values of 0.2 ± 4.0° and 36.1 ± 3.9° respectively, indicating that these steps are also B-like in terms of roll and twist. Among the other dinucleotide steps, the GG and CG steps have A-like mean values for both roll and twist, irrespective of whether their Zp value is A-like or B-like. For the remaining steps, the mean values of roll and twist follow the trend set by Zp. However, the large standard deviations for all the steps, including B-philic steps such as AA and GA and an A-philic step such as GG, indicate that a significant number of steps have intermediate conformations in terms of roll and twist. This is also illustrated by the Zp versus roll plot in figure 2, which does not show any clear demarcation between A-like and B-like steps.
The large, continuous variation in roll and twist has been observed earlier [35] and is also evident in the twist versus roll plot for the bound-DNA datasets (additional file 2, figure 1), where many of the mean roll and twist values are intermediate between those of the A- and B-DNA fibre models (tables 4, 5). The higher standard deviations of all the parameters for most of the steps in the bound datasets, compared with the free DNA dataset, prompted us to examine individually the structures responsible for these high standard deviations. For the complex dataset, nearly all the datapoints deviating by more than 3σ from the mean roll or twist values of the free B-like DNA oligomer dataset were found to occur in structures belonging to three families: TBP-bound DNA, endonuclease-bound DNA and DNA bound to the hyperthermophile protein SAC7D. DNA bound to the integration host factor also undergoes significant distortions in roll and twist. For the HTH dataset, nearly all the datapoints deviating by more than 3σ from the mean roll or twist values of the free B-like DNA oligomer dataset are contributed by the TATA-box-TFIIB and CAP-bound DNA structures. On excluding these structures, the mean values are much closer to the B-DNA fibre model values, with low standard deviations, and are comparable to those obtained for B-like steps in the free dataset. Notably, excluding the above-mentioned structural families made no significant difference to the mean values of Zp and slide for any of the steps (tables 2 and 3), indicating that the B-like DNA structure can accommodate large variations in roll and twist with no corresponding change in Zp and slide. This is further corroborated by the low correlation of either roll or twist with Zp or slide for all the steps across the three DNA datasets. The low correlation between Zp and roll is clearly evident in figure 2.
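The claim that roll and twist correlate poorly with Zp and slide can be checked for each step type with a correlation coefficient. The text does not say which coefficient was used. The sketch below assumes Pearson's r and takes hypothetical paired lists of per-step values.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a sample with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

In use, `pearson_r(roll_values, zp_values)` would be evaluated over, say, all GG steps of one dataset, where both lists are hypothetical inputs. A value of |r| near zero corresponds to the weak roll/Zp coupling described above.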
Interestingly, several CAG triplets in the nucleosome structures [86] show the same unusual combination of parameters observed for the CAG triplets in some oligomers: the CA step is in the BII conformation while the AG step has high roll and low twist, so that the combined roll and twist of the two steps are similar to those of canonical B-DNA.

Free DNA oligomers can be classified as A-DNA or B-DNA in terms of Zp
At the overall structural level, most of the DNA duplexes in the free dataset can be classified in their entirety as A-like or B-like in terms of Zp, with the exception of 5 structures (196D, 1P4Z, 1ZFA, 399D and 441D), in which one or two steps show Zp values that differ significantly from that of the overall structure. Even the crystal structure of the G:C-rich sequence d(CATGGGCCCATG) (1DC0), reported as an A ↔ B intermediate [87], assumes A-like Zp and slide values for all its steps and hence can be described as A-like, although its roll and twist values show considerable variation.
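Assigning a whole duplex to A-DNA or B-DNA from its per-step Zp labels, while flagging exceptional steps such as those in the five structures listed above, can be sketched as a majority rule. The majority rule and the function name are assumptions made for illustration. The text states only that entire structures could be classified, and names the exceptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def classify_structure(step_labels: Sequence[str]) -> Tuple[str, List[int]]:
    """Assign a duplex to its most common per-step class and list the outlier steps.

    step_labels: per-step labels such as 'A-like', 'B-like' or 'intermediate',
    ordered 5'->3' along one strand. Returns (majority label, indices of steps
    whose label differs from the majority).
    """
    if not step_labels:
        raise ValueError("no steps given")
    majority, _ = Counter(step_labels).most_common(1)[0]
    outliers = [i for i, label in enumerate(step_labels) if label != majority]
    return majority, outliers
```

For example, `classify_structure(['A-like'] * 9 + ['B-like'])` returns `('A-like', [9])`, which flags the single deviating step.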
The global x-displacement, helical rise, inclination and helical twist, as well as the major and minor groove widths (described in the 'Methods' section), are also considered indicators of the overall A-like or B-like nature of a DNA structure, and we compared the average values of these parameters across the datasets (additional file 2, table 1). Since entire structures in the free dataset could be assigned as A-DNA or B-DNA on the basis of Zp, the averages of the global x-displacement, helical rise, inclination and helical twist over all the non-terminal basepairs of all the A-DNA structures were taken as the 'A-DNA' values of the respective parameters. A similar procedure was applied to the basepair orientation parameters of all the B-DNA structures to obtain the 'B-DNA' values. As expected, the RNA dataset assumes A-like values for all the parameters, while the values for the A-like and B-like free DNA datasets are very close to the corresponding fibre model values, which reaffirms that the overall free DNA oligomer structures can be classified as A-like or B-like.
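The 'A-DNA' and 'B-DNA' reference values above are pooled averages over non-terminal basepairs. A minimal Python sketch of that pooling follows. The `Structure` record and its fields are hypothetical, and one parameter (for example, global inclination) is averaged at a time.

```python
import statistics
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Structure:
    """Hypothetical record: a duplex, its Zp-based class and one per-basepair parameter."""
    pdb_id: str
    zp_class: str        # 'A-like' or 'B-like'
    values: List[float]  # e.g. global inclination, one value per basepair, in order


def pooled_reference(structures: Sequence[Structure], zp_class: str) -> Tuple[float, float]:
    """Mean and sample SD of a parameter over the non-terminal basepairs
    of every structure assigned to zp_class."""
    pooled: List[float] = []
    for s in structures:
        if s.zp_class == zp_class:
            pooled.extend(s.values[1:-1])  # drop the first and last basepair
    if len(pooled) < 2:
        raise ValueError("need at least two non-terminal values")
    return statistics.mean(pooled), statistics.stdev(pooled)
```

Dropping the terminal basepairs before pooling keeps end-fraying effects out of the reference values, which is consistent with the non-terminal restriction described above.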
For both bound DNA datasets, while the global helical rise is strongly B-DNA-like with very little variation, the global x-displacement, inclination and helical twist take up values between those of the 'A-DNA' and 'B-DNA' datasets, but closer to B-DNA. The major and minor groove widths of the bound DNA datasets are 'B-DNA'-like. The rather large standard deviations of inclination and helical twist for the free 'B-DNA' dataset imply that B-DNA, in its free form, might be able to access the conformations observed in bound DNA.

Variations of the DNA backbone
The backbone torsion angle δ, defined by C5'-C4'-C3'-O3', the pseudorotation phase angle P [1], which characterises the sugar ring pucker, and the glycosidic torsion angle χ, defined by O4'-C1'-N1-C2 in pyrimidines and O4'-C1'-N9-C4 in purines, have the most characteristically distinct values in A- and B-DNA [4,5]. Figure 3 shows the variation of Zp with respect to the angle P; the torsion angles χ (additional file 2, figure 2) and δ (additional file 2, figure 3) show similar behaviour. Note that each dinucleotide step, described by a single Zp value, encompasses four values of sugar pucker and glycosidic torsion, corresponding to the four nucleotides that constitute a dinucleotide step. As expected, the entire RNA dataset shows the A-like conformation. Free DNA shows two clusters that correspond to the A-like and B-like regions described in previous studies [4,5].
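The phase angle P is obtained from the five endocyclic torsion angles ν0-ν4 of the sugar ring. The sketch below uses the standard Altona-Sundaralingam expression, tan P = [(ν4 + ν1) - (ν3 + ν0)] / [2ν2(sin 36° + sin 72°)], and names the pucker by 36° sectors (C3'-endo for 0-36°, C2'-endo for 144-180°, and so on). It is illustrative only and is not claimed to be the program used in this analysis. The `ideal_torsions` helper exists only to check the formula, and its 38° default amplitude is a typical value chosen for illustration.

```python
import math
from typing import List

# Pucker names for successive 36-degree sectors of P, starting at 0 degrees.
PUCKER_NAMES = [
    "C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
    "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo",
]

_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def phase_angle(nu0: float, nu1: float, nu2: float, nu3: float, nu4: float) -> float:
    """Pseudorotation phase angle P in degrees (0-360) from torsions nu0..nu4 in degrees.

    nu0 = C4'-O4'-C1'-C2', nu1 = O4'-C1'-C2'-C3', nu2 = C1'-C2'-C3'-C4',
    nu3 = C2'-C3'-C4'-O4', nu4 = C3'-C4'-O4'-C1'.
    """
    num = (nu4 + nu1) - (nu3 + nu0)
    den = 2.0 * nu2 * _S
    return math.degrees(math.atan2(num, den)) % 360.0


def pucker_name(p: float) -> str:
    """Name of the 36-degree pucker sector containing P."""
    return PUCKER_NAMES[int((p % 360.0) // 36.0) % 10]


def ideal_torsions(p: float, amplitude: float = 38.0) -> List[float]:
    """Torsions nu0..nu4 of an ideal pucker: nu_j = amplitude * cos(P + 144 * (j - 2))."""
    return [amplitude * math.cos(math.radians(p + 144.0 * (j - 2))) for j in range(5)]
```

Feeding `ideal_torsions(18.0)` back into `phase_angle` recovers P = 18° (C3'-endo, the A-DNA pucker), and `ideal_torsions(162.0)` recovers 162° (C2'-endo, the B-DNA pucker). Using `atan2` rather than a plain arctangent places P in the correct quadrant.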
An inspection of the four values of χ, δ and P that constitute each step in the free DNA dataset reveals that for a step with an A-like Zp, all four values of all three angles were A-like, and for a B-like step, all four values were B-like. A few exceptions were observed in some structures: in a single step with an A-like Zp value, one of the four P angles was B-like (1ZEY, 1ZF6, 1ZF87, 1ZFA), and vice versa (1EHV, 1DUO, 1ENN, 1IKK, 1SK5, 1ZFA, 307D, 423D, 463D, 477D, 7BNA) (see additional file 1 for detailed references for all the PDB ids). The B-like nature of the backbone parameters also holds for B-like steps in the bound datasets, where all four values of all three angles are usually B-like. Exceptions occur in the structures that displayed unusual behaviour in the local step parameters, and these are described in detail in the relevant section.
We have also analysed the conformationally flexible torsion angles α, γ and ε-ζ using a modified version of the algorithm of Dixit et al. [53], adapted to apply to torsion angles across a step. Table 6 and figure 4 show the distribution of the seven states defined by this algorithm across all dinucleotide steps. The RNA dataset displays classical behaviour, with an overwhelming majority of the steps assuming the canonical values for α, γ, ε-ζ, viz. g-, g+, BI (state 1). The three DNA datasets show much greater conformational flexibility, with α, γ, ε-ζ = g-, g+, BII (state 7) being the predominant non-canonical conformation. However, state 7 occurs significantly less often in the bound datasets. Protein binding seems to induce a few B-like steps to assume the α, γ = t, t conformation (state 3 or state 5), which is not preferred by free B-form DNA.
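The state labels above rest on coarse rotamer assignments for α and γ (g+ near 60°, t near 180°, g- near 300°) combined with the BI/BII substate of ε-ζ. The Python sketch below reproduces only the two states spelled out in the text: state 1 (g-, g+, BI) and state 7 (g-, g+, BII). For any other combination it returns the raw rotamer labels. It is not the Dixit et al. algorithm, and the 120° bin boundaries are an assumption.

```python
def rotamer(angle: float) -> str:
    """Coarse staggered label for a torsion angle in degrees.

    Assumed 120-degree bins: g+ covers [0, 120), t covers [120, 240),
    g- covers [240, 360).
    """
    a = angle % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"


def alpha_gamma_state(alpha: float, gamma: float, substate: str) -> str:
    """Label a step by its (alpha, gamma) rotamers and its BI/BII substate.

    Only the canonical g-, g+ combination is mapped to a state number
    (1 for BI, 7 for BII); any other combination is returned descriptively.
    """
    if substate not in ("BI", "BII"):
        raise ValueError("substate must be 'BI' or 'BII'")
    combo = (rotamer(alpha), rotamer(gamma))
    if combo == ("g-", "g+"):
        return "state 1" if substate == "BI" else "state 7"
    return f"non-canonical ({combo[0]}, {combo[1]}, {substate})"
```

A step with α = -65°, γ = 50° and a BI backbone is labelled `'state 1'`. The protein-induced α, γ = t, t steps described above come out as `'non-canonical (t, t, BI)'` or the BII equivalent.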
The most noteworthy difference between the free and bound datasets was observed for state 6, for which the allowed ranges of α and γ are 0-220° and 240-270° respectively [53]. However, all the datapoints belonging to state 6 in our four datasets occupy a much narrower range, close to the α, γ = g+, g- conformation, which has been reported to be energetically unfavourable [52]. While this state is negligible in the free dataset, a substantial proportion of the steps in the HTH dataset assume this conformation (table 6). A significant proportion (22.4%) of the HTH steps in this conformation were AT steps, with one or both of the thymines taking up the unusual α and γ values. Although no restriction was placed on the ε-ζ value for state 6, almost all the datapoints in this state had the BI conformation. Steps in the state 6 conformation occurred with equal frequency in the bound and unbound regions of the DNA, and did not assume unusual values for any other structural parameter.

Variations at the trinucleotide level
The absence of large protein-induced DNA distortion is also apparent from the successive bending angles (figure 5). The successive bending angle is directly proportional to the difference between successive roll values, and can be considered a measure of local bending at the trinucleotide level. The RNA dataset generally shows small successive bending angles (96.9% of the values < 20°), as would be expected from a dataset with nearly uniform roll values. Of the protein-bound datasets, the HTH dataset shows surprising results: 55.7% of the triplets in the HTH dataset have bending angles between 0 and 10°, compared with 48.5% in the free dataset, indicating that a majority of the HTH-bound triplets tend to be less distorted than even the free triplets. In the complex dataset, 46.9% of the triplets occur in this range. The trend is reversed for the 20-30° range, which can be considered to indicate moderately distorted triplets: 14.8% of the free triplets occur in this range, compared with only 6.8% of the HTH triplets and 10.6% of the triplets in the complex dataset. However, as noted before, binding by proteins belonging to a few specific families appears to cause large distortions in the roll and twist values of a few dinucleotide steps in both bound datasets. For example, the bending angles for the steps distorted by TBP and CAP in both protein-bound datasets range from 50° to 80°. An inspection of the regions of free DNA oligomers with high bending angles revealed that dinucleotides with very large roll and very large or very small twist are almost completely absent; instead, a series of successive near-normal roll and twist values frequently gives rise to reasonably high bending angles at the triplet level.

Figure 3: Zp versus sugar pseudorotation phase angle P for the four datasets.
Protein binding, and especially HTH binding, does not appear to distort the DNA anymore than when it is in the free state, except in the case of a few special families.
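The binned percentages quoted above (for example, the share of triplets with successive bending angles of 0-10°) amount to a histogram normalised by the total number of triplets. A minimal sketch, assuming half-open bins [lo, hi), since the text does not state how bin edges were treated; the function name is hypothetical.

```python
def bin_percentages(angles, edges=(0.0, 10.0, 20.0, 30.0)):
    """Percentage of angles falling in each half-open bin [edges[i], edges[i+1]).

    Angles outside [edges[0], edges[-1]) are counted in the total but in no bin,
    so every percentage refers to the whole set of triplets.
    """
    n = len(angles)
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        count = sum(1 for a in angles if lo <= a < hi)
        result[(lo, hi)] = 100.0 * count / n if n else 0.0
    return result
```

For five toy angles of 5°, 8°, 15°, 25° and 60°, this yields 40% in 0-10°, 20% in 10-20° and 20% in 20-30°, with the 60° triplet contributing only to the denominator.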
Figure 4 The correlation between the backbone torsion angles γ and α for all four datasets, with different colours indicating the seven states as defined in [53].

Protein-induced distortions in DNA structure
Dinucleotide step level
Most of the significant distortions in the protein-bound DNA datasets were observed in terms of unusual roll and twist values, which occur in DNA bound to a small group of protein families. Most of these protein-DNA complexes have been studied exhaustively because of their biological relevance, leading to the perception that protein-bound DNA structure very often differs significantly from free DNA structure. The protein-bound DNA structures that are observed to be distorted can be classified into three classes: the first, where the DNA structure is distorted but the distortions do not lead to strand break or strand separation; the second, where the distortion leads to a nick in the DNA backbone; and the third, where the distortion leads to strand separation. The first class consists of DNA bound by proteins belonging to the hyperthermophile and integration host factor families in the complex dataset, and to the CAP and lac repressor families in the HTH dataset. The second class consists of DNA bound to proteins belonging to the endonuclease family in the complex dataset, and to the transposase and recombinase families in the HTH dataset. The third class consists of DNA bound to polymerases and TBPs, and occurs in both the bound DNA datasets. In the following two paragraphs, the structural features of these distorted DNA structures are described briefly.
We have classified a step as distorted if its roll or twist value deviates by more than 3σ from the mean roll or twist value, respectively, of the free B-like DNA oligomer dataset. Additional file 3 gives the base-step parameters and Zp values for the distorted steps in the DNA structures bound to different protein families. It is clear that there is a wide variety of distortions in DNA structure, depending on the specific family to which the bound protein belongs. Most of the kinks lead to significant bending of the overall structure.
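The 3σ criterion for a distorted step can be written as a short check. This sketch assumes the population standard deviation of the free B-like steps (the text does not specify population versus sample σ); the helper names and the toy numbers in the usage note are illustrative only, not data from this study.

```python
from statistics import mean, pstdev


def reference_stats(values):
    """Mean and population standard deviation of a reference parameter set."""
    return mean(values), pstdev(values)


def is_distorted(roll, twist, roll_ref, twist_ref, k=3.0):
    """Flag a step whose roll or twist lies more than k sigma from the reference mean.

    roll_ref and twist_ref are (mean, sigma) tuples computed from the free
    B-like oligomer steps.
    """
    roll_mean, roll_sd = roll_ref
    twist_mean, twist_sd = twist_ref
    return abs(roll - roll_mean) > k * roll_sd or abs(twist - twist_mean) > k * twist_sd
```

For example, with toy reference rolls of 0, 2 and 4° (mean 2°, σ ≈ 1.63°) and twists of 34, 36 and 38°, a step with roll 10° is flagged, whereas a step with roll 3° and twist 36° is not.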
In the CAP-DNA complexes (additional file 4, figure 1a) and the integration host factor-DNA complex (additional file 4, figure 1b), there is a nick in the DNA backbone that seems essential for successful crystallization. It is quite likely that the nick facilitates the curvature of the DNA duplex, especially since other complexes with modes of protein binding similar to that of CAP-bound DNA (an ARAC family transcriptional activator-DNA complex (1BL0) and a CENP-B protein-DNA complex (1HLV)) display a lesser degree of bending, as estimated by the end-to-end bending angle as well as d/l local values (table 7). It might also be speculated that the nick allows the protein to distort the DNA at the local level to a greater extent, causing a few steps to assume unusually distorted roll and twist values, though there is no direct evidence that this occurs.

Gross structural level
Since the free DNA molecules are relatively short, it is difficult to ascertain whether the distortions observed at the local level add up to a smooth global curvature. However, both protein-bound datasets contain several longer structures; hence we analysed the overall curvature of DNA structures from the complex and HTH datasets that consist of at least 20 contiguous basepairs. We used the measures d/l local [63,65-68], the RMSD from circle fit and the ratio of the RMSD from circle fit to the RMSD from line fit to characterise DNA curvature. […] The CAP-DNA complexes (1CGP, 1J59, 1RUN), an ARAC family transcriptional activator-DNA complex (1BL0) and a CENP-B protein-DNA complex (1HLV) all consist of a dimeric protein that binds to two successive major grooves of the DNA, approximately one helix turn apart, and the DNA is essentially curved due to two major in-phase kinks (additional file 4, figure 1a). All these duplexes have a negative out-of-plane bending angle.
Among the 21 DNA duplexes that have been classified as unassigned, more complex types of protein binding are observed, indicating that there are several different modes of curvature for DNA bound to the HTH motif. In some of these structures (Cre recombinase protein (4CRX), γδ resolvase-DNA complex (1GDT)), bending appears to be a result of large kinks at one or two steps in the duplex, as […] figure 6, while cartoon diagrams of a few DNA-protein crystal structure complexes are shown in additional file 4, figure 1. They clearly illustrate the different extents of curvature (or lack of it) adopted by protein-bound DNA molecules.

Table note: The calculation of successive bending angles, end-to-end bending angle, d/l local, radius of curvature (ROC), RMSD for circle fit (Cfit) and line fit (Lfit) and torsion angle for the out-of-plane component of bending is described in the 'Methods' section. 'MAX.' denotes the position and value of the maximum successive bending angle within the particular structure. The criteria used to assign DNA molecule geometry as curved (C), linear (L) or unassigned (U) are also described in the 'Methods' section. The ROC and the out-of-plane component of bending are reported only when a DNA molecule's geometry is assigned as curved.
No correlation was observed between the curvature and occurrence of the various backbone geometries (even the energetically unfavourable state 6 conformation) in these structures.

Discussion
From our analysis, it is clear that for the individual dinucleotide steps in the free oligomer dataset, the dinucleotide step parameters Zp and, to a lesser extent, slide, as well as the pseudorotation phase angle P of the sugar ring, the backbone torsion angle δ and the glycosidic torsion angle χ of the individual bases in a step, are better indicators of A-like or B-like conformation than the traditionally used parameters roll and twist, confirming the findings of earlier studies [14,27]. A few of the dinucleotide steps seem to have a distinct preference for a particular conformation: AA and GA steps are strongly B-philic, while only the GG step is strongly A-philic.
In an earlier study to characterise how the DNA sequence defines conformation, Hays et al [16] reported crystal structures of all the permutations of the inverted repeat d(CCnnnN6N7N8GG) under well-defined crystallographic conditions; these take up A-form, B-form and Holliday junction structures. Several of the structures reported in that work fit our selection criteria and are also part of the free dataset in the present study. The authors observed that the A-DNA crystal structures reported in their study are conformationally more uniform than the B-DNA structures. This also seems to be the case, in terms of Zp, slide, roll and twist, for the larger and more sequence-heterogeneous dataset analysed in this study, not only for the entire free dataset but also for the individual dinucleotide steps. With the exception of the slide parameter for the AT step and the roll parameter for the GG step, the standard deviation is always higher for the B-like free DNA steps than for the A-like steps, for all four parameters. However, for a step like CA, one must consider that it is trimodal, with the B-like steps further subdivided into BI and BII conformations, which makes it difficult to compare the variation between A-like and B-like conformations. It must also be noted that, with the exception of slide for A-like steps, the standard deviations obtained in our study are lower than those observed by Hays et al [16] for the slide, roll and twist parameters examined in both studies. Hays et al [16] observed that the trinucleotide motifs GGN, NGG and CC(C/G) favour a transition to the A-DNA conformation. Our analysis supports this conclusion, with most of the GC-rich structures taking up an A-DNA conformation. The only exception to this rule was 1ZFB, for the sequence d(CCGCCGGCGG).
However, an earlier structure of the same sequence (382D [88]), not included in this study owing to the resolution cutoff, was observed to have an A-like conformation. Hence A-DNA definitely seems to be favoured by GC-rich DNA, especially sequences with oligo-G tracts; since the GG step is observed to be the most A-philic, this is to be expected. However, a GC-rich sequence that lacks an oligo-G tract does not necessarily favour an A-DNA conformation, since GC and CG steps seem to assume A-like and B-like conformations with nearly equal ease. In addition, with the exception of AC, which seems to favour the A and B forms equally, and CA, which is trimodal, all other steps in which one or both basepairs are A:T seem B-philic in the free dataset.
For B-DNA structures, Gorin et al [11] have correlated the extent of B-DNA twisting with the basepair morphology and clash between the exo-cyclic groups in the four bases.
The average values for slide, roll and twist obtained by Gorin et al [11] for the dinucleotide steps in their dataset, comprising B-DNA structures with a resolution cutoff of 3.0 Å, are quite similar to the 'B-like' average values for the different dinucleotide steps in the present study (tables 3, 4, 5). The overall average values in the two studies are also similar. Interestingly, however, the low-twist CA (BI) and high-twist TA steps of the Gorin et al dataset converge to similar values in our high-resolution dataset (36.5° and 34.7°, respectively). This places the CA step in a favoured conformation with minimal clash, as predicted by the clash strength function designed by these authors, while the TA step is placed in a less favourable conformation.

How different is bound DNA from free B-DNA?
In the case of bound DNA, the duplexes are almost entirely B-like in conformation in terms of Zp and slide, while roll and twist predominantly show variation similar to that of free DNA. The average values for slide, roll and twist obtained in an earlier study by Olson et al [25] for different dinucleotide steps in a dataset of protein-DNA crystal structure complexes are comparable to those obtained in this study for the complex dataset (excl. TEH, tables 3, 4, 5) and the HTH dataset (excl. TC, tables 3, 4, 5). This is expected, since these authors also considered only step parameter values within 3σ of the mean for their 'B-like' protein-DNA dataset, essentially excluding the distorted steps. The overall average values for slide, roll and twist reported for the 'B-like' protein-DNA dataset are also similar to those obtained for the complex (excl. TEH) and HTH (excl. TC) datasets in this study.
Only a few DNA structures, bound to proteins belonging to a small group of families, have highly unusual structural parameters, principally roll and twist. Apart from those structures, the structures in both protein-bound datasets take up free B-DNA-like values for all the dinucleotide step parameters. The nucleosome structure [86], considered a classic case of a highly curved structure, does not have highly unusual Zp, slide, roll or twist values, with only 5 out of 146 roll values lying just outside the 3σ range of the B-DNA-like oligomer dataset. The lack of sharp kinks gives the nucleosome structure a smooth curvature, with a very small out-of-plane component for ~30 basepair fragments. For any randomly selected 30 basepair fragment of this structure, the RMSD from a plane fit was always < 0.15 Å, as against 0.65 Å for the highly distorted CAP-bound DNA structure 1J59. Even for a randomly selected fragment of 76 basepairs (which nearly completes a full circle), the RMSD from a plane fit was only 0.26 Å. Similarly, the out-of-plane torsion angle values for several random fragments of ~30 basepairs were always < 10°, another indicator of the smooth curvature and gentle, regular pitch of the superhelix.

Figure 6 The 3-dimensional path traced by basepair centres of the DNA helix in some protein-DNA crystal structure complexes. The basepair centres of the DNA molecules are indicated by hollow circles for 'curved' geometry, by hollow squares for 'linear' geometry and by stars for 'unassigned' geometry. The criteria for assigning geometry are described in the 'Methods' section. The PDB ids correspond to the following biological molecules: 1KX5 - Nucleosome core particle, […]
It is also interesting to note that the ROCs calculated for the 76 basepair fragments in 1KX3 and 1KX5 are 39.8 Å and 39.4 Å, respectively, while that for a 2.8 Å resolution structure (1AOI [89]) is 41.5 Å, indicating that the DNA in different nucleosome structures shows small variations in curvature.
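The plane-fit RMSD used above to quantify the out-of-plane component of nucleosomal DNA can be computed with an orthogonal least-squares plane fit to the basepair centres. The sketch below uses NumPy's SVD for the fit and assumes the basepair centres are already available as Cartesian coordinates; it is an illustration, not necessarily the fitting procedure used by the authors, and the function names are hypothetical.

```python
import numpy as np


def plane_fit_rmsd(points):
    """RMSD of 3D points from their least-squares (orthogonal-distance) plane."""
    p = np.asarray(points, dtype=float)
    centred = p - p.mean(axis=0)
    # The right-singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    normal = vt[-1]
    distances = centred @ normal
    return float(np.sqrt(np.mean(distances ** 2)))


def window_plane_rmsds(centres, window=30):
    """Plane-fit RMSD for every contiguous window of basepair centres."""
    c = np.asarray(centres, dtype=float)
    return [plane_fit_rmsd(c[i:i + window]) for i in range(len(c) - window + 1)]
```

Sliding a 30-basepair window along a nucleosomal duplex and taking the maximum of `window_plane_rmsds` would correspond to the "any randomly selected 30 basepair fragment" check described above.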

A-like steps are limited to DNA bound to proteins from a few specific families
Protein-bound DNA structures, apart from being perceived as distorted, have also been characterised as being predominantly A-like [90]. Our analysis clearly refutes this characterisation. The only protein-bound DNA structures having a few steps with A-like values of Zp are those bound to some of the endonucleases, DNA polymerases, transposases and homeodomains. Of the endonuclease-bound DNA structures, the I-PpoI endonuclease-bound DNA structures 1A73 and 1CYQ have two separate A-like half turns [14], leading to a non-linear geometry, as explained above. The PvuII endonuclease-bound DNA structure 3PVI has an entirely A-like stretch, with only a few bases at one end having a B-like geometry in terms of Zp. Transposase-bound DNA (1TC3, 1U78) has a 4-7 basepair long G:C-rich region at one end of the duplex that assumes an A-DNA-like conformation in terms of all the parameters. The other end of the DNA duplex is A:T-rich, with a narrow minor groove [91] and the curvature characteristic of DNA containing oligo A-tracts [58,59]. The transposase protein binds to these two regions, the G:C-rich A-like region and the A:T-rich region, via two HTH motifs connected by a long linker [91]. However, the features observed for the G:C-rich region and the oligo A-tract in the transposase-bound DNA are similar to those observed for free G:C-rich oligomers and free oligo A-tracts; hence these features can be said to be intrinsic to the DNA sequence.
There are 18 homeodomain-bound DNA structures in the protein-bound datasets. Of these, only five steps (3 GG and 2 CG steps), occurring in four different structures, are observed to take up A-like values of Zp.
Another class of DNA often cited as an example of a protein-induced B ↔ A transition is the Zn-finger-bound DNA structures. Nekludova et al [92] have shown that a variety of protein-bound DNA molecules, including Zn-finger-bound DNA, adopt a distinctive conformation with a major groove enlarged relative to B-DNA. In our study, all the Zn-finger-bound DNA structures (1A1G, 1A1H […]
It is also interesting that, while the 7 nucleotide long runt-domain binding site DNA sequences in the free form are reported to assume an A-DNA-like conformation (1XJX, 1XJY) [34] as well as a near-B-DNA-like conformation [34,93], depending on the flanking bases, the same sequences bound to the runt-domain protein assume B-like values of Zp, slide, the backbone torsion angles and the groove widths. X-displacement, inclination and helical twist take up intermediate values that are closer to B-DNA, a behaviour similar to that observed for the bound DNA datasets in this study. Thus, in this case, while the free DNA sequences assume both the A and B forms, the protein-bound DNA takes up the B-form.
It has been suggested that the TBP-bound DNA conformation is closer to A-DNA and that the inherent A-philicity of the TATA sequence might facilitate the transition to the near-A-like bound conformation [39]. Our analysis indicates that these assertions are not always valid. For example, the oligomer structure 1VJ4 [50] for the sequence d(GGTATACC) takes up an A-DNA conformation, but the free DNA structures 1D56 and 1D57 [81], for the decamer d(CGATATATCG), both take up a B-DNA conformation despite encompassing the TATA stretch. The TATA stretch in the TBP-bound DNA structures also takes up entirely B-like Zp values. Though some of the other parameters, such as roll, twist and rise, do not have classical B-DNA values, this indicates a distortion of the B-form rather than necessarily a transition to an A-like conformation. The B-like nature of TBP-bound DNA in terms of Zp and slide is also observed for a couple of hexamer sequences that occur both in the free DNA dataset and in some of the TBP-bound DNA structures. While the sequence TTTAAA takes up B-like Zp and slide values in the free (1IKK [94], 1SK5) as well as the TBP-bound DNA (1D3U [95], 1QNA [96]), the hexamer stretch GGCGCC takes up an A-DNA conformation in the free DNA structure 414D [97], as expected, but takes up B-like Zp and slide values in the TBP-TFIIB-DNA complex 1C9B [98]. It is also noteworthy that, unlike the TBP-bound DNA from the complex dataset, the backbone parameters P and δ, as well as χ, take up entirely B-like values for the TBP-TFIIB-bound DNA from the HTH dataset, consistent with our observation throughout this study that HTH-bound DNA tends to be more B-like than other protein-bound DNA molecules.
There have been studies of protein-DNA complexes using backbone conformational parameters such as sugar pucker [99] or the χ and δ torsion angles [90]. The authors of [99] find only 12% of the protein-interacting nucleotides with a C3'-endo sugar pucker. On the other hand, Lejeune et al [90] conclude that "A-DNA is more frequently implicated in protein-DNA interactions than the classical B-DNA conformation". We do not find this claim to be valid using any of the backbone parameters for 'A versus B' discrimination.

HTH-bound DNA, while remaining B-like, occasionally takes up an unfavourable backbone conformation
The only effect that can be unambiguously ascribed to protein binding in the predominantly B-DNA-like protein-bound duplexes occurs in the DNA backbone. The DNA backbone in the free dataset is quite uniform, with the angles α and γ almost completely in the canonical (g−, g+) conformation in B-DNA, and (α, γ, ε-ζ) ranging from (g−, g+, BI) to (t, t, BI) in A-DNA. In contrast, backbone torsion angles in protein-bound DNA are observed to be considerably distorted. Steps that are B-like in terms of Zp and slide assume a wide variety of highly unusual backbone conformations, some of them energetically unfavourable. In particular, HTH binding causes the α and γ angles in DNA to assume the energetically unfavourable (g+, g−) conformation in a much higher proportion (11.7%) than in unbound DNA. As described in the 'Results' section, the steps taking up this conformation occur with equal frequency in the bound and unbound regions of the DNA, and do not assume unusual values for any other structural parameter. Overall, 57 of the 97 HTH-motif-bound DNA structures adopt this unfavourable backbone conformation for some of their steps, and 24 of these have 5 or more occurrences of it. Thus a large number of structures have at least a few steps in this conformation. These structures have been solved in a variety of space groups, and the proteins binding to the 24 HTH-bound DNA structures with 5 or more occurrences of state 6 fall into 15 different SCOP classes. It therefore appears that binding by the HTH motif allows the DNA backbone to assume this energetically unfavourable conformation, even when there is no direct contact between the protein and the backbone. At the trinucleotide level, bound DNA, and especially HTH-bound DNA, appears to have less distortion than free DNA.
At the gross structural level, nearly half of the HTH-bound DNA structures of length ≥ 20 basepairs were observed to have moderate curvature. In several of these cases, the DNA was bound by a dimer of two HTH motifs, with the two monomers binding to DNA at regions one helix turn apart and bending it in the same direction, so that there was a net overall curvature. However, DNA bound to the HTH motif was also observed to be curved by other modes, as in the MAT alpha2-MCM1-DNA ternary complex 1MNM and the Cre recombinase-DNA complex (4CRX). Yet other modes of curvature of protein-bound DNA are revealed in the complex dataset. Thus it is not possible to determine a uniform mode and mechanism for the DNA curvature observed in the bound datasets. With the exception of a few structures where it was difficult to identify three uniformly undistorted regions separated by large kinks, all the curved DNA structures have a negative out-of-plane component. With no long free DNA oligomers in the dataset, it is difficult to conclude whether free DNA by itself can attain such conformations, with the protein merely 'locking' it in that conformation, or whether the protein actually bends it to that state. Most of these curved structures, however, do not have highly unusual step parameter values, and hence it is possible that longer free DNA oligomers with similar sequences might achieve such curved conformations without the aid of proteins. Even a few steps with unusual parameters might occur in long free DNA oligomers, as indicated by the spontaneous development of one or two sharp kinks in molecular dynamics simulations of 94 basepair free DNA minicircles [100].
This has interesting implications, especially for HTH-binding DNA, since a majority of the proteins in this dataset are transcription activators or repressors, whose function on binding to DNA is to cause structural changes that allow or prevent the binding of other proteins of the transcription machinery, and thereby transcription. It is tempting to speculate that these proteins merely increase the 'lifetime' of such conformations rather than inducing unfavourable conformations, which would involve a much higher energetic cost. However, this needs to be verified using experimental and theoretical methods that trace the dynamic evolution of DNA structures under different conditions.

Conclusion
The free DNA oligomers, even in the crystalline state, sample a large conformational space, but each molecule is found to be entirely in the A or B form, depending primarily on its sequence. In the case of protein-bound DNA, both the claim that protein binding generally favours the A-form of DNA [90] and the perception that it induces an energetically unfavourable conformation are invalid. We find that the role of the A-form is limited to DNA structures bound to a few specific protein families, such as transposases and DNA polymerases. Protein-induced distortion in DNA can occur via one of several modes, such as a few steps taking up high positive roll and smaller twist; a BII-like transition of the backbone, leading to negative roll and large twist; or, in some cases, the two strands of the helix being pulled apart. However, these large induced deviations from the free B-form are observed only in DNA structures bound to proteins such as CAP, TBP, integration host factor and Cre recombinase. Even in these structures, the distortions are limited to a few steps, and the remainder of the duplex shows B-DNA-like features. In a large number of cases of HTH motif-bound DNA, protein binding does not induce any distortion in the dinucleotide step geometry, but the duplex takes up an energetically unfavourable backbone conformation, even when there are no contacts between the protein and the DNA backbone. Barring these exceptions, the average parameters of protein-bound DNA structures at the dinucleotide step, trinucleotide and backbone levels, across a large and diverse set of protein families, are quite close to the free B-DNA oligomer values. Interestingly, this is observed even though very few hexamer or longer sequence motifs are common to the free and bound datasets, and the free DNA dataset is significantly smaller than the bound DNA datasets.
It is also striking that even a duplex as far from 'straight' DNA as the 147 basepair long nucleosomal DNA has very few (≤ 5) steps with highly distorted local parameters, indicating that 'normal' B-like parameters at the local level can cumulatively give rise to double-helical structures with a wide range of geometries. These observations highlight the remarkable adaptability of this structural form, and may explain why it has evolved to be the most biologically relevant design for double-helical DNA.

Crystallographic dataset generation
The four X-ray crystallographic datasets used in the analysis are (i) an RNA oligomer dataset (hereafter the RNA dataset), (ii) a DNA oligomer dataset (hereafter the free dataset), (iii) a DNA-protein complex dataset excluding DNA bound by HTH proteins (hereafter the complex dataset), and (iv) a DNA-HTH protein complex dataset (hereafter the HTH dataset). The RNA, free and complex datasets were extracted from the Protein Data Bank (PDB) [101]; all three contain structures with a resolution of 2.0 Å or better. Structures in the PDB that have the DNA-binding HTH motif were identified using the tool PredictDNAHTH, developed by McLaughlin et al [102]. Since only 33 DNA-HTH protein complexes with a resolution of 2.0 Å or better were identified, the resolution cutoff for the HTH dataset was relaxed to 3.0 Å. There was no significant difference between the results obtained with the 2.0 Å and 3.0 Å cutoffs; therefore the larger dataset, with a cutoff of 3.0 Å, was used. In the three DNA datasets, only DNA fragments consisting of at least 8 contiguous Watson-Crick basepairs were considered. The RNA dataset had much shorter structures, hence the length cutoff was reduced to five contiguous basepairs. Steps with non-Watson-Crick basepairs, present in significant numbers in the RNA dataset, were also excluded from this analysis. In the free dataset, structures with any ligands other than ions or water were excluded. Identical basepairs from structures with two-fold symmetry were considered only once. The RNA dataset consists of 52 structures (additional file 1), comprising 75 individual duplexes with 276 Watson-Crick basepaired dinucleotide steps. The free dataset consists of 76 structures (additional file 1), comprising 77 individual duplexes with 406 Watson-Crick basepaired dinucleotide steps.
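The fragment-length filter described above (at least 8 contiguous Watson-Crick basepairs for DNA, 5 for RNA) can be sketched as a run-length scan. This assumes the basepairs have already been identified, for example by a structure-analysis program; the identity-based Watson-Crick test and the function name are hypothetical simplifications, since real pairing assignment also depends on geometry and hydrogen bonding.

```python
# Identity-based Watson-Crick pairs for DNA and RNA (a simplification).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def contiguous_wc_fragments(pairs, min_len=8):
    """Return (start, end) index ranges, end exclusive, of runs of at least
    min_len consecutive Watson-Crick pairs.

    pairs is a list of (strand I base, strand II base) tuples for paired
    positions, already identified by some other means.
    """
    fragments = []
    start = None
    for i, pair in enumerate(pairs):
        if pair in WATSON_CRICK:
            if start is None:
                start = i
        else:
            # A non-Watson-Crick pair terminates the current run.
            if start is not None and i - start >= min_len:
                fragments.append((start, i))
            start = None
    if start is not None and len(pairs) - start >= min_len:
        fragments.append((start, len(pairs)))
    return fragments
```

A run of n contiguous basepairs yields n − 1 dinucleotide steps; calling the function with `min_len=5` reproduces the relaxed cutoff used for the RNA dataset.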

Evaluation of dinucleotide step parameters and global helical parameters
The structural parameters of the duplexes, i.e. the basepair parameters propeller twist, buckle, opening angle, shear, stretch and stagger, as well as the dinucleotide step parameters tilt, roll, twist, shift, slide and rise, were determined with the NUPARM program [103-105] for all four datasets. The parameter Zp [14], defined as the mean z-coordinate of the backbone phosphate atoms of the basepair dimer with respect to the dimer reference frame, was also calculated using the revised NUPARM program [105].
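Zp can be sketched as the projection of the two phosphorus positions of a step onto the z-axis of the dimer reference frame. Construction of the dimer frame itself is not reproduced: its origin and z-axis are assumed inputs. We also assume that the strand II value is sign-inverted before averaging, because the two strands run antiparallel and a plain average would cancel by symmetry; this detail is not stated in the text, and NUPARM's exact convention may differ. Function names are hypothetical.

```python
import math


def z_in_frame(point, origin, z_axis):
    """z-coordinate of point in a frame with the given origin and z-axis direction."""
    norm = math.sqrt(sum(c * c for c in z_axis))
    return sum((p - o) * z for p, o, z in zip(point, origin, z_axis)) / norm


def zp(p_strand1, p_strand2, origin, z_axis):
    """Mean phosphorus z-coordinate for one step in the dimer frame.

    Assumption: the strand II value is sign-inverted before averaging,
    since the two strands run antiparallel along z.
    """
    z1 = z_in_frame(p_strand1, origin, z_axis)
    z2 = z_in_frame(p_strand2, origin, z_axis)
    return 0.5 * (z1 - z2)
```

For phosphorus atoms at z = +2 Å (strand I) and z = −2 Å (strand II) in the dimer frame, this sketch gives Zp = 2 Å; the z-axis vector need not be normalised beforehand.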
The dinucleotide step parameters tilt, roll and twist measure the relative rotational motion between adjacent basepairs about the x, y and z-axis respectively of a local basepair doublet coordinate system, whereas the dinucleotide step parameters shift, slide and rise measure relative translational motion between adjacent basepairs along the local doublet x, y and z-directions respectively.
The global helical parameters, viz. the rotational parameters inclination, tip and helical twist, and the translational parameters x-displacement, y-displacement and z-displacement, were also calculated using the NUPARM program. Inclination denotes rotation of the basepair about the x-axis, tip denotes rotation about the y-axis and helical twist denotes rotation about the helical axis; the translational parameters similarly denote displacements along the three axes. The means of the global x-displacement, helical rise, inclination and helical twist over all non-terminal basepairs of all structures in a dataset were taken as the average values for that dataset. Protein-bound DNA structures in which the roll or twist value of at least one step deviated by more than 3σ from the mean roll or twist of the free B-like DNA oligomer dataset, as well as structures that were curved or whose curvature geometry could not be assigned (tables 7, 8), were excluded from the calculation of mean global helical parameters, since fitting a single linear helical axis would be untenable in these cases. Overall, 49 structures from the complex dataset and 62 structures from the HTH dataset were included in these calculations.

Evaluation of groove widths
The minor groove width and the major groove width were calculated as the smallest interstrand phosphate separations along the two grooves, using the NUPARM program. Please note that the groove widths as defined here also include the phosphate diameter value.
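With the groove-width definition above, each width reduces to a minimum over interstrand phosphorus-phosphorus distances. The hypothetical helper below takes the two candidate phosphate sets lining a groove (their selection, which NUPARM performs, is not reproduced) and returns the smallest separation without subtracting any van der Waals radius, so the value includes the phosphate diameter as stated above.

```python
import math


def smallest_interstrand_separation(strand1_p, strand2_p):
    """Smallest phosphorus-phosphorus distance between two sets of 3D coordinates.

    No van der Waals radius is subtracted, so the result includes the
    phosphate diameter.
    """
    return min(math.dist(a, b) for a in strand1_p for b in strand2_p)
```

Because all pairs are compared, the cost grows with the product of the two set sizes, which is negligible for the handful of phosphates lining a groove segment.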

Calculation of bending/curvature
The radius of curvature, using a least-squares circle fit, and the ratio of end-to-end distance to contour length (d/l local or d/l max) were calculated as described previously [69]. The measure d/l max is reasonably independent of the length of the DNA sequence (data not shown), except for highly curved long DNA molecules such as nucleosomal DNA, but it does not distinguish between different types of bending for sequences with fewer than 30 basepairs (data not shown). The radius of curvature (ROC) is calculated by fitting a circle to the basepair centres of the DNA molecule; the smaller the radius of this circle, the more curved the DNA. However, the quality of the circle fit is affected to a large extent by distortions at the local level in the duplex, i.e. the successive bending angles. The presence of several triplets that are distorted, even to a small degree, leads to a poor circle fit and consequently an inaccurate ROC. The ROC is therefore reported only when the RMSD for the circle fit is ≤ 1.0 Å and the ratio of the RMSD for the circle fit to that for the line fit is ≤ 0.6.
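The curvature measures described above can be sketched with NumPy. Here d/l is the end-to-end distance divided by the contour length summed over successive basepair centres; the line fit uses the principal axis of the centres; and the circle fit projects the centres onto their best-fit plane and applies an algebraic (Kåsa) least-squares circle fit. The Kåsa fit is our choice for illustration and may differ from the method of [69]; the circle RMSD combines in-plane radial and out-of-plane deviations, which is also an assumption. The ROC reporting rule follows the thresholds stated above. Function names are hypothetical.

```python
import numpy as np


def d_over_l(centres):
    """End-to-end distance divided by the contour length through successive centres."""
    c = np.asarray(centres, dtype=float)
    contour = np.sum(np.linalg.norm(np.diff(c, axis=0), axis=1))
    return float(np.linalg.norm(c[-1] - c[0]) / contour)


def line_fit_rmsd(centres):
    """RMSD of the centres from their principal-axis (least-squares) line."""
    c = np.asarray(centres, dtype=float)
    centred = c - c.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    axis = vt[0]  # direction of largest spread
    perp = centred - np.outer(centred @ axis, axis)
    return float(np.sqrt(np.mean(np.sum(perp ** 2, axis=1))))


def circle_fit(centres):
    """Return (radius, rmsd) of an algebraic (Kasa) circle fit in the best-fit plane.

    Needs at least three non-collinear centres.
    """
    c = np.asarray(centres, dtype=float)
    centred = c - c.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    u, v, n = vt[0], vt[1], vt[2]
    x, y, h = centred @ u, centred @ v, centred @ n  # in-plane and out-of-plane coordinates
    # Solve x^2 + y^2 + D x + E y + F = 0 in the least-squares sense.
    a = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (d, e, f), *_ = np.linalg.lstsq(a, b, rcond=None)
    xc, yc = -d / 2.0, -e / 2.0
    radius = float(np.sqrt(xc ** 2 + yc ** 2 - f))
    radial = np.hypot(x - xc, y - yc) - radius
    rmsd = float(np.sqrt(np.mean(radial ** 2 + h ** 2)))
    return radius, rmsd


def reported_roc(centres):
    """ROC if circle RMSD <= 1.0 A and circle/line RMSD ratio <= 0.6, else None."""
    radius, rmsd_c = circle_fit(centres)
    rmsd_l = line_fit_rmsd(centres)
    if rmsd_c <= 1.0 and rmsd_l > 0 and rmsd_c / rmsd_l <= 0.6:
        return radius
    return None
```

For noiseless centres on a 40 Å half-circle tilted out of the xy-plane, the sketch recovers a radius of 40 Å with near-zero circle RMSD and a d/l of about 0.64.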
When the d/l local value is ≤ 0.98, the RMSD for a circle fit is ≤ 1.0 Å, and the ratio of RMSD for a circle fit to that for a line fit is ≤ 0.6, we have assigned the DNA molecule geometry to be curved. When the d/l local value is > 0.98, the RMSD for a line fit is ≤ 1.0 Å, and the ratio of RMSD for a circle fit to that for a line fit is > 1.6, we have assigned the DNA molecule geometry to be linear. When neither the 'curved' nor 'linear' criteria are satisfied, the geometry of the DNA duplex is considered as 'unassigned'. For the DNA duplex that is curved, the out-of-plane component of DNA curvature was calculated as the torsion angle between the global helix axes vectors fitted to three relatively straight sections of the DNA molecule, separated by large kinks.
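The assignment rules above translate directly into a small decision function. The handling of a zero line-fit RMSD (treated as an infinite circle/line ratio) is our assumption, since the text does not cover that edge case; the function name is hypothetical.

```python
def classify_geometry(dl_local, rmsd_circle, rmsd_line):
    """Assign 'C' (curved), 'L' (linear) or 'U' (unassigned) from the stated criteria."""
    # Assumption: a perfect line fit (zero RMSD) gives an infinite ratio.
    ratio = rmsd_circle / rmsd_line if rmsd_line > 0 else float("inf")
    if dl_local <= 0.98 and rmsd_circle <= 1.0 and ratio <= 0.6:
        return "C"
    if dl_local > 0.98 and rmsd_line <= 1.0 and ratio > 1.6:
        return "L"
    return "U"
```

Note that ratios between 0.6 and 1.6 can never satisfy either set of criteria, so such duplexes always end up 'unassigned', whatever their d/l local value.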
A local helix axis vector corresponding to each dinucleotide step is defined as the vector pointing in the direction of the cross-product of the differences of the x and y-vectors of the constituent basepair planes. The angle between two local helix axes vectors corresponding to overlapping dinucleotide steps, described as the successive bending angle, as well as the angle between the vectors corresponding to the dinucleotide steps at the two ends of the molecule, and described as the end-to-end bending angle, were also calculated using NUPARM and used as measures of curvature.
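The local helix axis and successive bending angle definitions above can be illustrated in plain Python. The basepair x- and y-axis vectors are assumed inputs (in practice they come from the basepair reference frames computed by NUPARM), and all helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("degenerate step: the cross product of the differences is zero")
    return tuple(c / n for c in v)


def local_helix_axis(x1, y1, x2, y2):
    """Unit vector along (x2 - x1) x (y2 - y1) for one dinucleotide step."""
    return _unit(_cross(_sub(x2, x1), _sub(y2, y1)))


def angle_between(u, v):
    """Angle in degrees between two unit vectors (dot product clamped for safety)."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def successive_bending_angles(axes):
    """Angles between the local helix axes of consecutive (overlapping) steps."""
    return [angle_between(a, b) for a, b in zip(axes, axes[1:])]
```

For two ideal basepairs related by a pure 36° twist about z, the local helix axis comes out as (0, 0, 1). Given a list of axes with the terminal steps already dropped, the end-to-end bending angle is `angle_between(axes[0], axes[-1])`.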
The entire analysis of the dinucleotide step parameters, backbone torsion angle parameters, the successive bending angles, the radius of curvature, d/l local and out-of-plane components of DNA curvature has been carried out excluding the terminal basepairs to eliminate end effects. The end-to-end bending angle has also been measured as the angle between the local helix axes vectors corresponding to the penultimate dinucleotide steps.
All the plots were generated using the MATLAB-7.4 package.
The values of the basepair parameters, base-step parameters as well as the backbone torsion angles obtained using the NUPARM package were compared to those obtained by the X3DNA package [27]. The general trend of the parameters was observed to be similar. The parameters calculated by the two programs were different for the distorted regions of a few protein-bound DNA structures.