- Research article
- Open Access
An ensemble of B-DNA dinucleotide geometries lead to characteristic nucleosomal DNA structure and provide plasticity required for gene expression
BMC Structural Biology volume 11, Article number: 1 (2011)
A nucleosome is the fundamental repeating unit of the eukaryotic chromosome. It has been shown that the positioning of a majority of nucleosomes is primarily controlled by factors other than the intrinsic preference of the DNA sequence. One of the key questions in this context is the role, if any, that can be played by the variability of nucleosomal DNA structure.
In this study, we have addressed this question by analysing the variability at the dinucleotide and trinucleotide as well as longer length scales in a dataset of nucleosome X-ray crystal structures. We observe that the nucleosome structure displays remarkable local level structural versatility within the B-DNA family. The nucleosomal DNA also incorporates a large number of kinks.
Based on our results, we propose that the local and global level versatility of B-DNA structure may be a significant factor modulating the formation of nucleosomes in the vicinity of high-plasticity genes, and in varying the probability of binding by regulatory proteins. Hence, these factors should be incorporated in the prediction algorithms and there may not be a unique 'template' for predicting putative nucleosome sequences. In addition, the multimodal distribution of dinucleotide parameters for some steps and the presence of a large number of kinks in the nucleosomal DNA structure indicate that the linear elastic model, used by several algorithms to predict the energetic cost of nucleosome formation, may lead to incorrect results.
The nucleosome is the fundamental repeating unit of the eukaryotic chromosome [1, 2]. Nucleosome positioning is controlled by several factors such as intrinsic preference of the DNA sequence to assume a nucleosome-like structure, higher order chromatin organisation, DNA methylation and presence of DNA binding proteins such as transcription factors . A large number of studies have tried to characterise the intrinsic preferences of the DNA sequence and derive the complete sequence pattern of nucleosomal DNA by analysing the in-phase and out-of-phase occurrences of various dinucleotides such as AA and TT, GG and CC, AT, TA, and CA and TG [4–11]. However, formation of a large number of nucleosomes is primarily controlled by factors other than the nucleosome sequence [12–14]. In this context, one needs to ask what role, if any, is played by the structural variability of nucleosomal DNA in the formation of these nucleosomes.
The high resolution crystal structures of the nucleosome core particle  comprise of a left-handed superhelix covering ~1.7 turns, with a pseudo two-fold axis of symmetry passing through one of the central basepairs, where the major groove faces the histone octamer. Since the interactions of the histone core with DNA are primarily non-specific, the ability of the DNA sequence to assume this structure determines the stability of the nucleosome core. The nucleosome structure at the local level is described in terms of dinucleotide step parameters tilt, roll, twist, shift, slide and rise that quantify the motion between adjacent basepair planes . Studies of the best resolved nucleosome structure (PDB id: 1KX5, ) have shown that the parameter roll is primarily responsible for curvature of the nucleosome , while twist and slide contribute predominantly to the pitch of the nucleosome [17–19]. Analysis of data from nucleosome crystal structures along with that from molecular dynamics simulations of the nucleosome structure have indicated that the distributions of roll, twist and slide are conserved, while tilt, shift and rise are relatively free .
In the present study, we have analysed the variability at the dinucleotide as well as longer length scales in a dataset of nucleosome X-ray crystal structures. Our analysis shows that even for identical sequences, there is significant local level structural variation and some amount of variation at longer length scales, indicating that there is a thermodynamic ensemble of local level structures that can give rise to the core nucleosome structure. These results also raise questions about the rationale for using only the best resolved X-ray crystal structure (PDB id: 1KX5, ) of nucleosome as the prototype on which different potential nucleosome sequences are threaded, for calculating the energetic cost of nucleosome formation [18, 20–22]. Further, they indicate that use of the simple harmonic approximation to calculate the energetic cost of nucleosome formation [18, 20–24] may lead to incorrect results, owing to the multimodal distribution of dinucleotide parameters for some steps [25, 26] and the presence of a large number of kinks in the nucleosome structure.
The structural parameters of the nucleosome dataset were analysed to gain a perspective on the intrinsic variability of the nucleosomal DNA structure. The dataset comprised of twenty-nine structures corresponding to only six unique sequences (see 'Methods' section for details). However, even for structures with identical sequences, the local structural parameters were observed to vary considerably. Hence all the structures have been retained for the analysis.
Table 1 shows the distribution of the ten unique dinucleotide steps for all the six sequences. The last column in Table 1 gives the total number and proportion of each dinucleotide step in the entire dataset of twenty-nine crystal structures. It is seen from Table 1 that all the sequences have large proportions of AA/TT and CA/TG (> 15% for five sequences) dinucleotide steps. There are also significant proportions of AG/CT, GA/TC and AT/AT steps (> 10% for sequences 2-5). Compared to other sequences, sequence 1 has a larger proportion of AA/TT and GA/TC steps, and smaller proportions of CA/TG and AT/AT steps. The proportion of CG/CG and TA/TA steps is very small for all the sequences.
A detailed analysis of all the protein-DNA contacts for all the structures was carried out. Most of the specific protein contacts with DNA base atoms as well as non-specific contacts with the DNA backbone were conserved in terms of distance from SHL 0. Exceptions were observed in a few cases which were the result of mutations in the histones. However, the pattern of variation in structural paramaters observed in this study seems to be independent of protein contacts, and more a result of position dependence and the intrinsic properties of the dinucleotide steps. Hence interactions of DNA with the protein have not been focussed upon in this study.
An analysis of all the inter-particle contacts in the crystal lattice was also carried out for all the structures. The only noteworthy observation is the loose twisting around SHL ±2 and tighter twisting around ±5, of the structure 2NZD of the human NCP, as compared to the NCP structures of Xenopus laevis, and also those of other organisms. This can be attributed to the DNA-DNA contacts in the human NCP, as commented by Tsunaka et al. . Apart from this variation, none of the observed differences between the nucleosomal DNA structures can be related to differences in the inter-particle contacts in the crystal lattice. Hence the crystal packing effects will not be discussed further.
Position-specific variation in the structure of the backbone, and of the dinucleotide steps
The backbone torsion angles α, β, γ, δ and χ do not show any significant variation from the standard B-DNA values. The major backbone flexibility is observed in the occurrence of BI and BII conformations.
Table 2 shows the overall proportions of the backbone torsion angle states as defined in the 'Methods' section. 68.3% of the nucleotides are found to take up state 1, with canonical values for α, γ and ϵ-ζ , while 20.4% of the nucleotides are observed to assume state 7 with canonical values for α and γ but with the BII conformation for ϵ-ζ . The remaining ~10% of the nucleotides are divided between the rest of the states, with state 6 occurring in ~5% of them.
A comparison with the distribution of backbone torsion angles for the free and protein-bound DNA reported by Marathe et al.  shows that the occurrence of the classical state 1 is lower than that observed in free oligomers or either of the protein-bound datasets. On the other hand, the proportion of state 7 is larger for the nucleosome dataset as compared to any of the datasets reported by Marathe et al., and is closest to the proportion for the free B-DNA dataset. It must be noted here that the 'Complex' dataset analysed by Marathe et al. also included two representative nucleosome structures 1KX5 and 1KX3. When these two structures are excluded from the 'Complex' dataset, the proportion of state 1 increases from 78.8% to 80.5% and the proportion of state 7 decreases from 10.0% to 7.6%. Thus DNA in the nucleosome structure has lower preference for the canonical backbone conformation as compared to protein-free and other protein-bound DNA, and the proportion of state 7 among the non-canonical conformations is larger than the corresponding proportions in protein-free and other protein-bound DNA structures.
In case of dinucleotide step parameters, an important question is whether the nucleosomal DNA is primarily A-like or B-like at the local structural level. It has been shown earlier [25, 28] that the parameter Z p best indicates whether a given dinucleotide step has an A-like or B-like structure. A value of Z p > 1.3 Å is considered an A-like conformation while Z p ≤ 0.8 Å is considered as B-like conformation, with values in between the two cutoffs signifying an intermediate conformation . A survey of the Z p values for all the dinucleotide steps in this study indicates that only 10 out of 4146 steps have a Z p value > 0.8 Å, indicating that almost the entire nucleosome dataset is B-DNA like. Of these, only one step in the structure 1KX4 has an A-like Z p value of 1.3 Å, while the other nine steps have Z p values signifying an intermediate conformation. The plots of Z p versus slide and Z p versus roll for the nucleosomal DNA dataset, shown in Figure 1 and 2 respectively, indicate that slide is correlated with Z p , while roll shows no correlation with Z p . Thus the observation in case of high-resolution free and protein-bound DNA crystal structures that slide, in addition to Z p , can discriminate between A and B-forms, while roll does not discriminate between the two forms , is also valid in case of nucleosomal DNA.
The rotational and translational motion between Watson-Crick basepairs constituting a dinucleotide step is observed to vary depending on their orientation with respect to the histone octamer and on the backbone conformation. The most significant variations are observed in case of the rotational parameters roll and twist, and the translational parameter slide, and the mean and standard deviation values for these parameters are shown in Tables 3, 4 and 5 respectively, for the ten Watson-Crick dinucleotide steps. The corrresponding values for the rotational parameter tilt, and the translational parameters shift and rise, are reported in tables S1, S2 and S3 of additional file 1.
Henceforth, we will refer to the regions where the minor groove of the DNA faces the histone octamer as region I, the regions where the DNA backbone faces the histone octamer as region II, and the regions where the DNA major groove faces the histone octamer as region III.
Owing to the strong bias in the dataset in favour of purine-purine steps, they also dominate the population of each subset, namely, regions I, II and III. However, their proportion in region I is significantly larger (63.8%) as compared to their proportion in the entire dataset (53.0%), while their corresponding proportions are smaller in regions II (42.0%) and III (48.1%). The proportion of purine-pyrimidine steps is significantly smaller in region I (16.3%) as compared to their proportion in the entire dataset (23.2%), while their corresponding proportions are larger in regions II (28.4%) and III (27.2%). In case of pyrimidine-purine steps, their proportion in region I (19.9%) is smaller as compared to their proportion in the entire dataset (23.8%), while their proportion in region II is significantly larger (30.0%) and proportion in region III (24.8%) is similar.
In all three regions, BI/BI conformation is present in the largest proportion, and hence most of the steps take up this conformation most often. However, in region I, GA/TC, GG/CC, GC/GC and CG/CG take up the BI/BII conformation more often than the BI/BI conformation. The CA/TG step is the most uniformly distributed with a significant presence in all three subgroups of all three regions. It takes up the largest share of BII/BII conformation in all three regions.
Roll, twist and slide show correlated variation for the three subgroups in all three regions. Steps with the BII/BII state assume more negative roll, larger twist and large positive slide as compared to steps with BI/BI or BI/BII conformation. Remarkably, the GG/CC step takes up an extremely large and negative mean value of roll for the BII/BII conformation in region I, and correspondingly, a large positive mean slide and high mean rise indicative of stretching, unlike its structure in free DNA as well as other protein-bound DNA.
The nucleosome is expected to assume negative roll values corresponding to a narrow and deep minor groove in region I, and positive roll values corresponding to a wide and shallow minor groove in region III. While this trend holds true overall, the presence of the remarkably flexible CA/TG step in all regions, and the occurrence of BI/BI as well as BII/BII conformation in both regions I and III, leads to some individual steps bucking the trend. Thus the CA/TG step with BI/BI conformation in region I takes up large positive mean roll of 6.6°, while the one with BII/BII conformation in region III takes up large negative mean roll of -6.1°, both values being beyond 1σ of the mean value for free B-DNA.
The tilt values also show significant variation, though no obvious trend is seen. The variation in shift is very small, while that for rise is negligible. The exception is the GG/CC step, which in several cases, shows significant variation in shift and rise as well as tilt.
Kinks in nucleosomal DNA
An important question of biological relevance is the contribution of the intrinsic, sequence-dependent flexibility of the DNA structure in nucleosome formation, vis-a-vis the role of intrinsic or protein-induced kinks. By definition, a kink should be beyond the elastic limit of dinucleotide step flexibility. The deviation of the various dinucleotide steps from their structure in protein-free B-DNA should provide clues about the number of kinked steps, as compared to the number of steps with deformations within the elastic limit. We observe that the dinucleotide steps in nucleosomal DNA undergo large deviations from their corresponding values in free B-DNA primarily in terms of tilt and roll, which also happen to be the parameters responsible for imparting curvature to a DNA fragment. We have assumed that the structure of a dinucleotide step with both tilt and roll deviating by less than 3σ from their mean values in free B-DNA, is within the elastic limits. Hence the structure of a dinucleotide step with tilt or roll deviating beyond 3σ of their mean values in free B-DNA qualifies as a kink, formed with or without the assistance of proteins.
Table 6 shows the number of datapoints for each dinucleotide step that deviate by more than 3σ from the mean values for either tilt or roll in the free B-DNA dataset . The population has been divided on the basis of orientation with respect to the histone octamer, and further subdivided on the basis of backbone conformation. The total number of kinked steps in the dataset is 421. This amounts to an average of 421/29 ~ 14.5 kinks per structure, or a kink every 144/14.5 ~ 10 steps of the nucleosomal DNA. The largest number (26) of kinks are observed in the structure 1U35, while the smallest number (6) of kinks are observed in the structure 1KX5. Sixteen out of the twenty-nine structures in the dataset have > 14 kinks.
Of the 421 steps assuming a kinked structure, 235 have only the roll value deviating by 3σ from its free B-DNA mean value, while 153 steps have only the tilt value deviating by 3σ from its free B-DNA mean value. Nearly 40% of the steps in region I with backbone conformation BII/BII, and 25.0% of the steps in region II with backbone conformation BII/BII, are kinked, with CA/TG contributing a large share in both cases. In region III, significant proportions of CA/TG and GG/CC steps with BI/BI backbone conformation, and TA/TA and GG/CC steps with BI/BII backbone conformation, are kinked.
In region I, 150 of the 172 kinked steps take up negative roll value as expected. Of the 22 steps with positive roll, 3 AG/CT steps and 1 CA/TG step take up positive roll beyond 3σ of its free B-DNA mean value, accompanied by a BI/BI backbone conformation. Of these, the CA/TG kink in the vicinity of SHL -5.5 in the structure 1M1A, and the AG/CT kink in the vicinity of SHL -1.5 in the structure 1U35, are compensated by larger and opposite kinks at the neighbouring steps. However, the AG/CT kink in the vicinity of SHL 6.5 in the structure 1M1A, and the AG/CT kink in the vicinity of SHL 4.5 in the structure 1S32, are not compensated for.
In region III, 162 of the 182 kinked steps take up positive roll. Of the remaining 20 steps, 3 CA/TG steps take up negative roll beyond 3σ of its free B-DNA mean value, accompanied by a BII/BII backbone conformation. All the 3 steps occur at the edge of the major groove region centered at SHL -6, in the structures 1P3F, 1P3O and 2F8N. These kinks are compensated by large positive roll angle values at the neighbouring step, as well as at the step adjacent to it.
In region II, 41 of the 67 kinked steps have only the tilt value deviating by 3σ from its free B-DNA mean value, while 22 kinked steps have only the roll value deviating by 3σ from its free B-DNA mean value. Thus tilt contributes nearly twice the share of kinks in this region as compared to roll.
The intra-basepair parameters propeller twist, buckle and open angle of the basepairs constituting the dinucleotide steps with kinks are far more likely to take up values more than 1σ away from the mean nucleosome dataset values for these parameters, as compared to the proportion over the entire nucleosome dataset. Over the entire dataset, only 25-30% of basepairs take up propeller twist, buckle or open angle values beyond 1σ of the mean value for these parameters, but within the kinked steps, 60-70% of steps have one or both the constituent basepairs assuming values beyond 1σ of the mean value. In case of other basepair parameters such as the two glycosidic angles, and the C1'-C1' and C8-C6 separations, the proportions are similar to those for propeller twist, buckle and open angle. The basepair parameters seldom assume values beyond the 2σ limit.
Variation within and across structures
At dinucleotide level
It must be emphasized that while there are similarities within and across structures in the distribution of parameters at the dinucleotide step level, there are also substantial differences. The trajectories for the dinucleotide step parameters, for the six structures, namely, 1KX4, 2NZD, 2F8N, 1KX3, 1P3I and 1KX5, corresponding to sequences 1, 2, 3, 4, 5 and 6 respectively, are shown in figures S1-S6 of additional file 1. We have represented only six structures for clarity, however, the observations remain valid for the entire dataset of twenty-nine structures. The differences within the same structure become evident when one looks at the autocorrelation values for different parameters in a structure. Figure 3 shows the autocorrelation values for the parameters roll, twist and slide, for the six representative structures. The autocorrelation plots for tilt, shift and rise are shown in figure S7 of additional file 1. It is immediately apparent that roll and slide display strong periodicity throughout each structure with a period just greater than 10. However, the peaks for strong correlation or anticorrelation are not always at the same position with respect to SHL 0, for different structures, and in many cases, are distributed over two or three positions. Twist shows a weak periodicity, and the other three parameters display no periodicity. This validates the theoretical prediction by Bishop  that tilt, shift and rise are allowed relatively greater freedom in the nucleosome structure.
The differences across structures become evident when one looks at the crosscorrelation plots for different dinucleotide step parameters. Figure 4 shows the crosscorrelation values for the parameters roll, twist and slide, for the five structures 1KX4, 2NZD, 2F8N, 1KX3 and 1P3I, corresponding to sequences 1, 2, 3, 4 and 5, with respect to the parameters for the best resolved crystal structrure 1KX5, corresponding to sequence 6. The corresponding plots for tilt, shift and rise are shown in figure S8 of additional file 1. The crosscorrelation peaks for each structure seem to be displaced by few positions as compared to the peaks for any other structure. In case of both autocorrelation and crosscorrelation plots, the distribution for slide seems more periodic than that for roll.
At trinucleotide level
Successive bending angles are proportional to the difference of roll angles of consecutive, overlapping dinucleotide steps (data not shown) and as such, are indicators of trinucleotide level structural variation. Figure S9 in additional file 1 shows the trajectories for the successive bending angles for the six representative structures, while Figure 5 (upper panel) shows the crosscorrelation values for the successive bending angles, for the five structures 1KX4, 2NZD, 2F8N, 1KX3 and 1P3I, with respect to the corresponding parameters for the structrure 1KX5. It is clearly seen that even at zero lag, the plots show a very small peak, and there is no correlation at non-zero lag. This clearly indicates that the pattern of local bending is very distinct across different structures, despite very similar sequences for all the structures, except 1KX4. The plots for autocorrelation values of successive bending angles in the six structures mentioned above (upper panel of figure S11, additional file 1) also display an irregular pattern, implying that even different regions within the same structure show different bending profiles.
The minor groove width calculation spans three basepairs and in that sense, is a trinucleotide parameter. Figure S10 in additional file 1 shows the minor groove width trajectories for the six representative structures, while Figure 5 (lower panel) shows the crosscorrelation plots for minor groove width values. In contrast to successive bending angles, the variation in minor groove width is seen to be uniform, strongly periodic, and shows very similar pattern across all structures.
At octanucleotide and decanucleotide levels
Figure 6 (upper panel) shows the crosscorrelation plots for the angles between the global helix axes fitted to successive, non-overlapping tetranucleotide fragments. It is clearly seen that at the octanucleotide level, the bending between successive, non-overlapping tetranucleotide fragments shows a periodic variation across all structures, independent of differences in sequence. This uniformity of variation of the bending angle across structures with different sequences becomes even more pronounced when one considers crosscorrelation values for the angles between global helix axes fitted to successive, non-overlapping pentanucleotides (as seen by the sharper peaks of larger magnitude in the lower panel of Figure 6), since this region corresponds to approximately one turn of the helix. This observation validates the theoretical prediction by Bishop  regarding this length scale. Thus within one turn, the longer length scales from octanucleotide to decanucleotide display the same trend of bending across all structures. It is possible that this trend carries over to even longer length scales.
The trajectories for the angles between tetranucleotide fragments, and the angles between pentanucleotide fragments, for the six representative structures, as well as the autocorrelation plots for the angles are shown in figures S12, S13 and S14 respectively, in additional file 1. The periodicity for these two parameters within each structure is clearly evident.
Curvature at a length scale of half a superhelical circle
DNA curvature at longer length scales gives an idea as to whether the similarity of bending at a length scale of one turn (~ten nucleotides) adds up at longer length scales. Table 7 shows the mean and standard deviation values of d/l local , I max /I min , (I max +I mid )/I min and radius of curvature for successive, overlapping 36-mer fragments of each structure. A 36-mer fragment is just short of half the superhelical circle. It is clearly seen that the mean values for d/l local , I max /I min , (I max +Imid)/I min and radius of curvature for all structures are within 1σ deviation of each other indicating that most of the fragments have similar curvature. However, the standard deviation values being ~4-5% of the mean ROC indicates that the variation in curvature of individual fragments of the same structure or between different structures is not negligible.
Nucleosomal DNA is B-like at the dinucleotide step level
An analysis of the twenty-nine nucleosome X-ray crystal structures of better than 3 Å resolution reveals significant dinucleotide level structural variability in nucleosomal DNA, despite limited variation in sequence, with only six unique sequences, of which only one differs significantly from the other five. A survey of the dinucleotide step parameters indicates that all the dinucleotide steps in the dataset assume values characteristic of B-form DNA. Specifically, the parameter Z p , which has been shown to be the most reliable indicator of A-versus-B discrimination at the dinucleotide step level [25, 28], indicates that with the exception of ten steps, the entire nucleosomal dataset is B-like. This observation is in contrast to the study of all nucleosomal DNA structures reported by Xu and Olson . Xu and Olson have classified nucleosomal dinucleotide steps with large positive roll (> 7°) and negative slide (< -1 Å) (as calculated by the 3DNA program ) as exhibiting an "A-type kink-and-slide geometry", and observe 15% kink-and-slide steps assuming this geometry. However, it has been shown that roll is not a reliable indicator of an A-like geometry . Further, of the 421 nucleosomal DNA steps in our dataset with at least one of tilt or roll assuming values beyond 3σ of the free B-DNA mean values, only 16 steps (proportion 3.8%) assume roll value > 7° and slide value < -1 Å.
The presence of the non-canonical BII conformation in nucleosomal DNA in proportions larger than those observed in other protein-bound DNA highlights its role in modulating the nucleosomal DNA structure by facilitating a bend into the minor groove. In the free B-DNA dataset, only the CA/TG step has significant proportion of datapoints with one or both strands in BII conformation, which might explain why the CA/TG step is often strategically placed at positions where the DNA bends into the minor groove [18, 20]. This also explains why the CA/TG step constitutes the largest share of steps with at least one strand in BII conformation, in all regions of the nucleosome.
A comparison of the mean values for the three parameters, namely, roll, twist and slide, which contribute predominantly to the DNA curvature and superhelical pitch, for the nucleosome dataset with those for the free B-DNA dataset  sheds some light on the role played by the histone proteins. The mean values for BI/BI conformation are within 1σ of the mean values for free B-DNA, while most of the mean values for BII/BII and few of the mean values for BI/BII conformation are beyond 1σ of the corresponding mean values for free B-DNA. As a result, the overall mean values for roll, twist and slide for the BII/BII conformation in all three regions are beyond 1σ of the free B-DNA mean values, and reinforce the importance of the BII/BII conformation in modulating the nucleosome structure.
The kinks into the minor groove at GG/CC steps may be influenced by proteins and environmental factors
A minor groove kink is generally characterised by large negative roll, large twist and large positive slide, along with a state 7 backbone conformation i.e. a BII/BII conformation for ϵ-ζ and canonical B-DNA values for the other backbone torsion angles. The negative mean roll angle value with extremely large magnitude for the GG/CC steps occurring in region I and taking up a BII/BII backbone conformation, and the significant proportion of kinks among these steps prompted us to individually examine all the steps. We observe at least one kinked GG/CC step in all structures with the exception of 1KX5, 1M18, 1M19, 1M1A, 2CV5 and 2PYO. However, it must be pointed out that not all these extreme values belong to a step with state 7 conformation, in many cases, the step assumes one of the other six conformations in one or both of the strands. In most of the cases, these GG/CC steps occur at SHL -1.5 and/or +1.5 with respect to the pseudo two-fold axis of the superhelix. While many of these kinks into the minor groove are also accompanied by a stretching of the DNA as described earlier , in some cases the kinked conformation is observed in the absence of any stretching. Several GG/CC steps with a sharp kink also have an accompanying large positive slide. Thus at SHL -1.5 in the structure 1F66, the roll value is -68.1° with corresponding tilt, twist, shift, slide and rise values of -4.3°, 43.0°, 0.3 Å, 1.2 Å and 4.7 Å respectively, while the GG/CC steps in 1KX4 at SHL ±1.5 provide examples of distortion without stretching, but with positive slide values.
In about half the reported structures, a GC/GC step adjacent to the GG/CC step is also kinked. For example, while the GG/CC step at location -1.5 in the structure 1KX3 assumes tilt, roll, twist, shift, slide and rise values of 7.5°, -26.1°, 43.3°, -0.4 Å, 0.4 Å and 4.0 Å respectively, the neighbouring GC/GC step assumes values of -15.5°, -17.0°, 37.3°, 0.6 Å, 0.5 Å and 4.6 Å respectively. In the other structures, the neighbouring GC/GC step tends to compensate for the unusual conformation of the GG/CC step. On the other hand, the GG/CC step at SHL -1.5 has no significant distortion in some structures, while the neighbouring GC/GC step is extremely distorted. This is further proof of the variability observed even across structures with identical sequences. It must be noted that the nucleotides in these GG/CC steps with extreme values of roll, twist and slide were not hydrogen bonded to any amino acids.
Similarly, several of the AA/TT steps also assume large negative roll values at locations where the minor groove faces the histone octamer. This is observed in eleven of the structures at one or two locations, the most prominent being SHL ±4.5, and in few cases at SHL ±3.5 and ±1.5. Many of these AA/TT steps are observed to be part of A-tracts. Some of these are again accompanied by large twist and large postive slide values. For example, in the structure 2F8N, the roll, twist and slide values of the AA/TT step at SHL -3.5 are -14.2°, 44.6° and 1.7 Å respectively, while those for the step at SHL 1.5 are -19.5°, 44.0° and 1.8 Å respectively. While these steps also have an abnormal tilt, most of them do not display a large rise characteristic of stretching.
In addition to the above steps, few of the AG/CT steps are also observed to assume large negative roll, large twist and large positive slide at SHL 2.5 and in some cases, at SHL -1.5. For example, in the structures 1M1A and 2CV5, the tilt, roll, twist, shift, slide and rise values are 7.4°, -16.6°, 45.0°, 0.6 Å, 1.8 Å and 3.6 Å, and 4.7°, -18.5°, 40.6°, 1.1 Å, 1.6 Å and 3.5 Å respectively at SHL -1.5.
A comparison of the minor groove kinks in the nucleosome dataset with those observed in the DNA bound to other proteins shows that similar kinks are also associated with CA/TG steps in the Cre recombinase-bound DNA and the I-Cre I homing endonuclease-bound DNA structures. A dinucleotide step with large positive slide in the non-nucleosomal protein-bound DNA is most likely to be CA/TG, as noted by Tolstorukov et al. . The only non-CA/TG dinucleotide steps with a slide value > 2.0 Å are an AA/TT step in the hyper-thermophile SAC 7D-DNA complex structure with PDB id 1WTQ  and a CG/CG step in the catabolite activator protein-bound DNA structure with PDB id 1O3R [33, 34]. While both these steps assume large positive slide, they do not assume large negative roll and hence are not equivalent to a kink into the minor groove. The absence of a minor groove kink at the TA/TA steps in the crystal structure datasets of either the nucleosomal DNA or the non-nucleosomal protein-bound DNA is surprising, since it has been suggested earlier  that a minor groove kink at the TA/TA step will be energetically less costly as compared to an equivalent kink at the CA/TG step.
However, while the contribution by the CA/TG step to curvature and superhelical pitch remains largest, it is not exclusively confined to it. Given that the GG/CC step favours positive roll, small twist and negative slide  in free DNA, the frequency with which it is observed kinking into the minor groove in the nucleosome dataset is intriguing. The observation that most of the distorted GG/CC steps are observed at SHL ±1.5 indicates that this position might be of special relevance  and any dinucleotide step around this region might be vulnerable to stretching and distortion. However, exactly which step gets kinked and/or stretched might depend on a combination of factors such as the position, the dinucleotide sequence and the differing context of the nucleosome within chromatin. The kinks into the minor groove at GG/CC steps might also point to a more general tendency to have a mixture of favourable and unfavourable sequences, which results in only marginally stable nucleosomes , so that the nucleosome can be disrupted during transcription and replication while simultaneously preventing inappropriate access.
Extreme kinks into the major groove are less likely as compared to extreme kinks into the minor groove
The kinks into the major groove do not have tilt or roll values deviating too far from 3σ of the free B-DNA mean values. Only 4 out of 182 kinks in region III take up positive roll with a value deviating by > 5σ from the mean free B-DNA value, while the corresponding number for kinks in region I is 27 out of 172. The four steps are: an AT/AT step at SHL -1 in the structure 1EQZ (similar to the GG/CC kink into the minor groove, this is an unlikely conformation, since the AT/AT step favours nearly zero roll and slide in free B-DNA and other protein-bound DNA ), a GG/CC step at SHL -6 in the structure 1P3B, a CA/TG step at SHL -2 in the structure 2F8N and a CA/TG step at SHL -1 in the structure 2NZD.
A comparison of the major groove kinks in the nucleosome dataset with those observed in the DNA bound to other proteins shows that similar kinks are also observed at a variety of steps in different structures such as hyperthermophile SAC 7D-bound DNA (CG/CG, TA/TA, AA/TT), LAC-repressor-bound DNA (CG/CG), catabolite activator protein-bound DNA (CA/TG, CG/CG, GA/TC), integration host factor-bound DNA (AA/TT), Eco RV endonuclease-bound DNA (TA/TA), γδ resolvase-bound DNA (TA/TA) and TATA binding protein-bound DNA (TA/TA, AA/TT and AG/CT). Unlike most of the kinks in the nucleosome structures, these kinks are extremely sharp (roll ~50-60°) and often accompanied by a large rise of > 4.0 Å and sometimes by a small twist. However, similar to the conformation observed in nucleosomes, most of these steps are also B-DNA like in terms of Z p , slide and the backbone conformational parameters.
The linear elastic model may not be applicable to nucleosomal DNA
A number of algorithms, based on the simple harmonic approximation, have been developed to predict the energetic cost of nucleosome formation [18, 20–24]. These algorithms typically assign the parameter mean values in the X-ray crystal structure dataset of protein-DNA complexes (or values obtained by minimising the conformational energy ) as 'zero energy' values, and use a quadratic term as energy penalty for deviations from these mean values. However, analysis of high-resolution free and protein-bound DNA crystal structures  indicates that even within the B-DNA family, steps such as CA/TG assume a trimodal distribution in case of several dinucleotide parameters. Multimodal distribution of twist and slide values for several steps has been observed in molecular dynamics simulations carried out by the Ascona B-DNA consortium . In case of such steps, use of distribution mean values is invalid. In this context, it must also be pointed out that our definition of a kink in terms of mean and standard deviation values for tilt and roll over the entire free B-DNA dataset is meaningful only because the values for both parameters assume a single Gaussian distribution over the entire dataset.
The presence of a kink, on average, over every turn of the nucleosomal DNA helix poses an additional problem for algorithms based on the linear elastic model. Molecular dynamics studies  have shown that kinks similar to the ones observed in the nucleosome are stiff. Hence the simple harmonic approximation may lead to an incorrect value for energy of formation of a kink, and consequently, for energy of nucleosome formation. This is in agreement with the observation by Sereda and Bishop  that "removal of the largest amplitude deformations in the nucleosome had a significant [positive] effect on all elastic rod models" and therefore "a simple linear approximation does not properly capture the material properties of DNA".
Distribution of trinucleotide parameters indicates a more uniform variation in slide as compared to that in roll
The correlation values for successive bending angles within and across structures indicate that the bending profile fluctuates significantly not only for structures with different sequences but also for those with identical sequences, as well as for different regions within a structure. This variation in successive bending angle values can be most clearly explained in terms of the pattern of roll angle values. In all the structures, we observe blocks of two or three steps with negative roll values in the regions around SHL ±i.5 (i = 1, 2, 3, 4, 5, 6), followed by a junction step with nearly zero roll and a block of two or three steps with positive roll in the regions around SHL ±i (i = 1, 2, 3, 4, 5, 6, 7). This has been discussed by Richmond and Davey  for the structure 1KX5. However, within the block of negative roll, any of the two or three steps can have the highest magnitude of roll, and this step is observed to be different even for different structures corresponding to the same sequence. The observation also holds true for blocks of positive roll. This explains the difference in pattern of successive bending angles across structures corresponding to different as well as identical sequences, and highlights the structural versatility of B-form DNA.
It was shown by Richmond and Davey  in the structure 1KX5 that the regions between SHL -3 and SHL +3 display smooth bending when the minor groove faces the histone octamer while the minor groove blocks facing the histone octamer and farther away from the dyad are kinked with large negative roll angle values. However, this observation does not hold true in general, as seen by a survey of the successive bending angle values in all structures. While the structures in the 1M1 series display smooth bending in the region between SHL -3 and +3 and kinks outside those blocks, most of the other structures have sharp kinks throughout in blocks of both positive and negative roll. Even in 1M1A, there are sharp kinks at SHL -1.5 and -2. Several structures have a kink at -1.5 or +1.5 (1EQZ) as described earlier while some of them have kinks at -2 or +2. It must be noted that the structure 1KX5 also has a sharp kink at +2. As a result of variation in kinks versus smooth bending, the variation in the shift parameter is also not uniformly different for the regions binding H3 and H4 as against the regions binding H2A and H2B, as noted for 1KX5 .
A survey of the successive bending angle values for all structures also indicates that for almost all of the 146-basepair structures, the kinks in the shorter half (please refer 'Methods' section for definitions of the shorter and longer halves) of the structures are larger in magnitude than the corresponding kinks in the longer half. This trend is seen most consistently in structures of the 1P3 series but it is also observed in all other 146-basepair structures. In addition, most of the large kinks into the minor and major grooves, commented upon in the previous section, occur in the shorter half of the DNA structure. These two observations together imply that stretching in one half of the structure to cover the same distance with one basepair less as compared to the longer half leads to sharper kinks throughout the shorter half. Of the three 145-basepair structures, only 2F8N is observed to assume sharper kinks in the first half compared to the second half of the structure. The 147-basepair structures do not display this behaviour.
In contrast to successive bending angles, the variation in minor groove width is seen to be far more consistent across all structures. Minor groove width has been shown to be proportional to the mean of the slide of the two dinucleotide steps constituting the trinucleotide . The consistency of the minor groove width variation across different nucleosome structures implies that unlike roll, the slide parameter adds up in a regular fashion at the trinucleotide level, independent of sequence. This observation supports the earlier suggestion that slide is the most important parameter in determining DNA superhelical structure .
Interplay of slide, roll and twist causes variation of gross structural parameters at different length scales
We observe that at a length scale of thirty-six basepairs, curvature values have large fluctuations, as indicated by the standard deviation values being ~4-5% of the mean ROC. This is the cumulative effect of large variation in roll angles at the same position with respect to SHL 0. While roll is primarily responsible for curvature, twist and slide are responsible for superhelical rise. However, within and across structures, slide displays more regular variation as compared to roll and twist. Thus the pattern of interplay between these parameters is different at varying length scales, for long fragments within the same structure and also across structures, leading to variations in the core nucleosome structure. This observation is in agreement with results from Bishop's analysis of crystal structures and molecular dynamics simulation data of the nucleosomal DNA .
Structural versatility of B-form nucleosomal DNA may contribute to the plasticity of gene expression
Nucleosome positioning is known to be controlled by various factors such as preference of the DNA sequence to assume a nucleosome like structure, DNA methylation, higher order chromatin structure and presence of DNA binding proteins such as transcription factors . Of these factors, the intrinsic preferences of the DNA sequence have been shown to play a key role in determining the organisation of nucleosomes in vivo[5–7]. There have been a host of studies which have attempted to derive the complete sequence pattern characteristic of nucleosomal DNA by analysing the in-phase and out-of-phase occurrences of various dinucleotides such as AA and TT, GG and CC, AT, TA, and CA and TG [4, 6, 8–11]. In this study, we have not looked at sequence variation in the nucleosome crystal structure dataset, as it has only two widely differing sequences. However, it must be noted that the statistical enrichment of preferred dinucleotide and longer motifs is observed to occur only modestly above a random distribution and is limited to nucleosomes immediately upstream and down-stream of a transcription start site (TSS) [12–14]. In other words, formation of a majority of in vivo nucleosomes is largely controlled by factors other than the DNA sequence. This is especially true for nucleosomes in the vicinity of genes, which display a higher plasticity in terms of variation in their expression, with such nucleosomes displaying a more homogeneous and dynamic occupancy across promoters, and a particularly high occupancy close to the TSS . We would like to propose that the large range of permissible variation in structure of B-form DNA  acts as an important factor in the formation of nucleosomes in such regions. There has not been any focus on this factor, because we do not have the structure of nucleosomal DNA in in vivo conditions, and it is difficult to comment on its variability. However, our analysis of all the available nucleosome crystal structures shows that even within this limited dataset, and even for the same sequence, there is an ensemble of dinucleotide and trinucleotide level B-form structures, that can lead to similar core nucleosome structure. We also hypothesise that the structural versatility of nucleosomal DNA might act as an important facilitator of expression plasticity by changing the volume of periodically exposed grooves and thereby, varying the probability of recognition by regulatory proteins that bind to these grooves [14, 40, 41].
The best resolved nucleosome crystal structure may not be the 'ideal' template
Several studies have focussed on developing algorithms to predict the energetic cost of nucleosome formation [18, 20–22] by using as a template, the best resolved X-ray crystal structure of nucleosome with PDB id 1KX5 . We see major drawbacks in this approach, since our analysis clearly points at significant variation in local nucleosomal structure, and hence it seems unlikely that the single static structure represented in 1KX5 is the structure for nucleosomal DNA. This is in agreement with the observation by Xu and Olson  that "Nucleosomal DNA can also take slightly different conformational routes in the course of its packaging" and "the different nucleosomal pathways accommodate the deformations of a common sequence ... in different ways". Given that DNA curvature is essentially statistical [42–44], and considering that statistical and static averages are often different [45, 46], the structure 1KX5 is unlikely to represent the statistical mean of such an ensemble. Hence calculation of the energetic cost for a given genomic sequence to take up a nucleosome structure, assuming the structure of 1KX5 as the template, may not lead to biologically meaningful results.
The nucleosomal DNA structure, despite very limited sequence variation in available experimental data, displays significant variation at the dinucleotide step level for parameters such as roll, twist and slide, but remains within the B-DNA family. We also observe a large number of kinks in the nucleosomal DNA structure. Extreme kinking into the minor groove is more frequent than extreme kinking into the major groove. Particularly at SHL ±1.5, the GG/CC step, which favours a conformation with positive roll, small twist and negative slide in free B-DNA , is frequently observed to assume large negative roll, leading to a kink into the minor groove. This indicates a possible indirect role for proteins and the environment. The multimodal distribution of dinucleotide parameters for some steps [25, 26] and the presence of a large number of kinks in the nucleosomal DNA structure may lead to an incorrect value of nucleosome formation energy, as predicted by algorithms which use the linear elastic model [18, 20–24] for this calculation.
At the trinucleotide level, while roll does not add up to give a consistent pattern for local bending angle, slide adds up consistently in a position specific fashion even across widely differing sequences to give similar variation in minor groove width. This highlights the role of slide as being the principal dinucleotide parameter in characterising nucleosomal DNA structure. At longer length scales of octanucleotide and decanucleotide, the pattern of bending is consistent across all sequences. At a length scale of thirty-six basepairs, the curvature is generally uniform with small variation, within and across structures.
Overall, our results indicate that there is an ensemble of dinucleotide and trinucleotide level parameters that can give rise to similar core nucleosome structure. This structural versatility may act as a significant factor influencing the formation of nucleosomes in the vicinity of high-plasticity genes, and in varying the probability of binding by regulatory proteins. In view of this structural versatility, use of the best resolved X-ray crystal structure of the nucleosome  as the template for predicting the energetic cost of nucleosome formation by potential nucleosome sequences [18, 20–22], appears questionable.
Note: 2.5 Å resolution X-ray crystal structures of two distinct 145-basepair sequences, belonging to the so-called '601' fragment, were reported by Vasudevan et al  while this manuscript was under review. Hence we now have crystal structures of the nucleosome corresponding to four widely differing sequences. A quick survey of the parameters for the two new structures further highlights the large variation at dinucleotide and trinucleotide levels.
Crystal structure dataset selection
All X-ray crystal structures of the nucleosome core particle with a resolution better than 3.0 Å, released on or before 1 st November, 2009, were downloaded from the Protein Data Bank . The dataset comprises of twenty-nine structures, which correspond to only six unique sequences and have been labelled with numbers 1 to 6 in our analysis (Table 1).
Sequences 1 and 2 had only one corresponding structure, namely 1KX4  and 2NZD respectively, sequences 3 and 6 had two corresponding structures, namely 1U35  and 2F8N , and 1KX5  and 2PYO  respectively, sequence 4 had three corresponding structures, namely 1AOI , 1KX3  and 2CV5 , while the remaining twenty structures (1EQZ , 1F66 , 1M18, 1M19, 1M1A , 1P34, 1P3A, 1P3B, 1P3F, 1P3G, 1P3I, 1P3K, 1P3L, 1P3M, 1P3O, 1P3P , 1S32 , 1ZLA , 2NQB , 3C1B ) correspond to sequence 5.
Figure S15 in additional file 1 shows superhelical location (abbreviated as SHL) 0, which denotes the position of the pseudo two-fold axis of symmetry for the superhelix. At SHL 0, the major groove of the DNA faces the histone octamer. Before we proceed further, a few words about nomenclature are in order. As shown in figure S15, additional file 1 we will denote positions on the nucleosomal DNA sequence in terms of their displacement from SHL 0. The basepairs separated by factors of 10 from SHL 0 would be denoted by SHL ±i, where i is the relevant multiple of 10. The negative sign indicates a basepair on the 5'-side of SHL 0 along strand 1, while the positive sign indicates a basepair on the 3'-side of SHL 0 along strand 1. The basepairs between the integral super helical locations would be denoted by appropriate fractions. Thus, -4.4 would indicate a position four basepairs away from SHL 4, towards the 5'-end along strand 1, while 4.4 would indicate a position four basepairs away from SHL 4, towards the 3'-end along strand 1.
Sequence 4 is the 146-basepair long α satellite DNA. The 146-basepair long sequence 5 differs from it in two locations, with G:C and C:G basepairs replacing the T:A and A:T basepairs at positions -0.3 and 0.4 respectively. In sequence 6, the T:A and A:T basepairs at positions -5.2 and 5.3 in sequence 4 (the 146-basepair long α satellite DNA) are replaced by A:T and T:A basepairs respectively. In addition, a G:C basepair is present between positions -0.1 and -0.2 of sequence 4 to make it a 147-basepair long DNA. Finally, the T:A basepair at position 0.2 of sequence 4 is replaced by a C:G basepair to maintain basepair complementarity. Sequences 2 and 3 are 145-basepair long, with the T:A basepair at position 0.2 of sequence 4 missing. Further, the T:A and A:T basepairs at positions -5.2 and 5.3 in sequence 4 are replaced by A:T and T:A basepairs in case of sequence 2, and by G:C and C:G basepairs in case of sequence 3. 146-basepair long sequence 1 is very distinct from all the others, with differences at thirty-six positions in comparison to sequence 4.
In case of the structures corresponding to the 145-basepair long sequences 2 and 3, as well as the 147-basepair long sequence 6, SHL 0 passes through the central basepair, effectively dividing the structure into two equal halves. In case of the 146-basepair long sequences corresponding to sequences 1, 4 and 5, SHL 0 passes through the 73 rd basepair, counting from the 5'-end of strand 1. Thus SHL 0 divides the structure into two unequal halves, a shorter half of 72-basepairs on the 5'-side along strand 1, and a longer half of 73-basepairs on the 3'-side along strand 1.
All the structures corresponding to sequences 1 (PDB id 1KX4), 2 (PDB id 2NZD), 4 (PDB id's 1AOI, 1KX3 and 2CV5) and 6 (PDB id's 1KX5 and 2PYO), as well as one structure corresponding to sequence 3 (PDB id 2F8N) and two structures corresponding to sequence 5 (PDB id's 1EQZ and 2NQB) comprise of wild-type histones, though the histone sequence might vary at a few amino acid positions depending on the organism from which it was derived. None of these structures have any additional ligands attached to the DNA or the protein. The second structure corresponding to sequence 3 (PDB id 1U35) contains a histone variant macroH2A. The remaining eighteen structures corresponding to sequence 5 display a variety of histone mutations and binding by different ligands: mutations in histones H3 and H4 (PDB id's 1P34, 1P3A, 1P3B, 1P3F, 1P3G, 1P3I, 1P3K, 1P3L, 1P3M, 1P3O, 1P3P, 3C1B), mutation in H2A to give H2A.Z (PDB id 1F66), presence of a linker joining the two DNA disks (PDB id 1S32), presence of groove binding ligands (PDB id's 1M18, 1M19, 1M1A) and presence of an antigen bound to DNA (PDB id 1ZLA).
There are two structures of the nucleosome core particle (NCP) from Drosophila melanogaster (PDB id's 2NQB and 2PYO), and one structure each of the NCP from chicken (PDB id 1EQZ), mouse (PDB id 1U35) and human (PDB id 2NZD). All the other structures are of the NCP from Xenopus laevis.
Classification of dinucleotide steps depending on their orientation with respect to the histone octamer
For each integer i, (-7 ≤ i ≤ 7), SHL i denotes the centre of the region where the DNA major groove faces the histone octamer. On the other hand, each half-integer value i.5 denotes the centre of the region where the DNA minor groove faces the histone octamer. Hence, for each SHL i, the four dinucleotide steps incorporating the five basepairs at positions (i - 1) + 0.8, (i - 1) + 0.9, i, i + 0.1 and i + 0.2 have been classified as belonging to the region where the major groove faces the histone octamer. Similarly the four dinucleotide steps incorporating the five basepairs at positions i + 0.3, i + 0.4, i + 0.5, i + 0.6 and i + 0.7 have been classified as belonging to the region where the minor groove faces the histone octamer. The steps intermediate between these two regions, i. e. incorporating basepairs i + 0.2 and i + 0.3, and i + 0.7 and i + 0.8, have been classified as belonging to the region where the DNA backbone faces the histone octamer.
Calculation and classification of backbone torsion angles
Backbone torsion angles α, β, γ, δ, ϵ, ζ ; the glycosidic torsion angle χ and the pseudo rotation angle P  were calculated using the NUPARM program [60–62]. Backbone torsion angles for a basepaired dinucleotide step, i.e., across the phosphodiester bond, were clustered and analysed. ϵ: C4' n -C3' n -O3' n -Pn, ζ: C3' n -O3' n -Pn+1-O5'n+1, α: O3' n -Pn+1-O5'n+1-C5'n+1and γ: O5'n+1-C5'n+1-C4'n+1-C3'n+1are classified into seven states as per the algorithm proposed by Dixit et al. . Since ϵ and ζ assume two related conformations, ϵ, ζ = t, g− being the canonical conformation, known as BI, and ϵ, ζ = g−, t being the non-canonical conformation, known as BII, a value of ϵ - ζ ≤ 0 has been classified as the BI conformation, and a value of ϵ - ζ > 0 has been classified as the BII conformation.
Evaluation of basepair and dinucleotide step parameters
The structural parameters of the duplexes i.e. the basepair parameters buckle, propeller twist and open angle, the dinucleotide step parameters tilt, roll, twist, shift, slide, rise and Z p , and the bending angles between all dinucleotide steps were calculated using the NUPARM program [60–62].
The basepair parameters buckle, propeller twist and opening angle describe relative rotation between paired bases about their common x, y and z-axes respectively. All these parameters essentially quantify the intrinsic or induced basepair non-planarity.
The dinucleotide step parameters tilt, roll and twist measure the relative rotational motion between adjacent basepairs, while shift, slide and rise measure relative translational motion between adjacent basepairs along the local doublet x, y and z-directions respectively. The parameter Z p  is defined as the mean z-coordinate of the backbone phosphate atoms of the basepair with respect to the basepair dimer reference frame.
A local helix axis corresponding to each dinucleotide step is defined as the vector pointing in the direction of the cross-product of the differences of the x and y-vectors of the constituent basepair planes. The angle between two local helix axes vectors corresponding to overlapping dinucleotide steps is classified as the successive bending angle between those two steps.
Calculation of minor groove width
For the fibre model of a N-basepair DNA, if the phosphates in strand 1 are numbered from 2 to N in the 5'-3' direction, and the phosphates in strand 2 are numbered from 1' to (N-1)' in the 3'-5' direction, the minimum interstrand phosphate-phosphate (P..P) separation on the minor groove side is between Pi+4 and Pi', and spans 2 dinucleotide steps or 3 basepairs, and is in that sense, a trinucleotide parameter. We have classified the minimum P..P separation on the minor groove side as the minor groove width.
Calculation of angle between successive, non-overlapping tetranucleotide and pentanucleotide fragments
A global helix axis for a given fragment of DNA has been defined as the best-fit line fitted to the backbone C1' atoms of all the nucleotides constituting that fragment. The angles between global helix axes corresponding to successive, non-overlapping tetranucleotide and pentanucleotide fragments were calculated. The angles were assigned the sign of their dot product with the average of vectors in the x-directions of the two central basepairs.
Calculation of parameters quantifying curvature
The calculation of the radius of curvature using a least square circle fit method, the ratio of end-to-end distance to the contour length (d/l local or d/I max ) and the three moments of inertia I max , I mid and I min were done as described previously in . The radius of curvature (ROC) is calculated by fitting a circle to the basepair centres of the DNA molecules. Smaller the radius of this circle, the more curved is the DNA. However, the quality of the fit to a circle is affected by distortions at the local level in the duplex i.e. the successive bending angles. Thus the presence of several trinucleotides that are distorted, even to a small degree, would lead to a poor circle fit and consequently an inaccurate value of ROC. After a careful consideration of the RMSD values for a circle fit and a line fit for all the nucleosome fragments under consideration, only those fragments for which the RMSD for a circle fit is ≤ 1.5 Å and the RMSD for a line fit is ≥ 10.8 Å have been included, for the calculation of mean and standard deviation values for all parameters quantifying curvature. Most of the fragments for all the structures in our dataset were observed to satisfy these criteria, with the minimum number of selected fragments being 99 out of 109 for the structure with PDB id 1F66.
Calculation of autocorrelation and crosscorrelation values
The calculation of autocorrelation values for parameters of the same structure, and cross-correlation values for parameters of different structures were carried out using MATLAB. For calculation of crosscorrelation values, the parameter sets for 146 and 147-basepair sequences were truncated at the ends, so that the parameter sets for 145, 146 and 147-basepair sequences were of equal size. Care was taken to ensure that at zero lag, the parameter sets were aligned with respect to values corresponding to SHL 0 position.
All the analysis has been carried out excluding terminal basepairs to eliminate end effects. All the plots were generated using MATLAB.
Kornberg RD: Chromatin structure: a repeating unit of histones and DNA. Science 1974, 184: 868–871. 10.1126/science.184.4139.868
Kornberg RD, Lorch Y: Twenty-five years of the nucleosome, the fundamental particle of the eukaryote chromosome. Cell 1999, 98: 285–294. 10.1016/S0092-8674(00)81958-3
Segal E, Widom J: What controls nucleosome positions? Trends Genet 2009, 25: 335–343. 10.1016/j.tig.2009.06.002
Thastrom A, Lowary PT, Widlund HR, Cao H, Kubista M, Widom J: Sequence motifs and free energies of selected natural and non-natural nucleosome positioning DNA sequences. J Mol Biol 1999, 288: 213–229. 10.1006/jmbi.1999.2686
Ioshikhes IP, Albert I, Zanton SJ, Pugh BF: Nucleosome positions predicted through comparative genomics. Nat Genet 2006, 38: 1210–1215. 10.1038/ng1878
Segal E, Fondufe-Mittendorf Y, Chen L, Thastrom A, Field Y, Moore IK, Wang JP, Widom J: A genomic code for nucleosome positioning. Nature 2006, 442: 772–778. 10.1038/nature04979
Kaplan N, Moore IK, Fondufe-Mittendorf Y, Gossett AJ, Tillo D, Field Y, LeProust EM, Hughes TR, Lieb JD, Widom J, Segal E: The DNA-encoded nucleosome organization of a eukaryotic genome. Nature 2009, 458: 362–366. 10.1038/nature07667
Cohanim AB, Kashi Y, Trifonov EN: Three sequence rules for chromatin. J Biomol Struct Dyn 2006, 23: 559–566.
Kogan SB, Kato M, Kiyama R, Trifonov EN: Sequence structure of human nucleosome DNA. J Biomol Struct Dyn 2006, 24: 43–48.
Salih F, Salih B, Trifonov EN: Sequence structure of hidden 10.4-base repeat in the nucleosomes of C. elegans. J Biomol Struct Dyn 2008, 26: 273–282.
Gabdank I, Barash D, Trifonov EN: Nucleosome DNA bendability matrix (C. elegans). J Biomol Struct Dyn 2009, 26: 403–411.
Valouev A, Ichikawa J, Tonthat T, Stuart J, Ranade S, Peckham H, Zeng K, Malek JA, Costa G, McKernan K, Sidow A, Fire A, Johnson SM: A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res 2008, 18: 1051–1063. 10.1101/gr.076463.108
Mavrich TN, Ioshikhes IP, Venters BJ, Jiang C, Tomsho LP, Qi J, Schuster SC, Albert I, Pugh BF: A barrier nucleosome model for statistical positioning of nucleosomes throughout the yeast genome. Genome Res 2008, 18: 1073–1083. 10.1101/gr.078261.108
Jiang C, Pugh BF: Nucleosome positioning and gene regulation: advances through genomics. Nat Rev Genet 2009, 10: 161–172. 10.1038/nrg2522
Davey CA, Sargent DF, Luger K, Maeder AW, Richmond TJ: Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J Mol Biol 2002, 319: 1097–1113. 10.1016/S0022-2836(02)00386-8
Dickerson RE, Bansal M, Calladine CR, Diekmann S, Hunter WN, K O, et al.: Definitions and nomenclature of nucleic acid srtucture parameters. EMBO J 1989, 8: 1–4.
Richmond TJ, Davey CA: The structure of DNA in the nucleosome core. Nature 2003, 423: 145–150. 10.1038/nature01595
Tolstorukov MY, Colasanti AV, McCandish MM, Olson WK, Zhurkin VB: A novel roll-and-slide mechanism of DNA folding in chromatin: Implications for nucleosome positioning. J Mol Biol 2007, 371: 725–738. 10.1016/j.jmb.2007.05.048
Bishop TC: Geometry of the nucleosomal DNA superhelix. Biophys J 2008, 95: 1007–1017. 10.1529/biophysj.107.122853
Balasubramanian S, Xu F, Olson W: DNA sequence-directed organization of chromatin: structure-based computational analysis of nucleosome-binding sequences. Biophys J 2009, 96: 2245–2260. 10.1016/j.bpj.2008.11.040
Chung HR, Vingron M: Sequence-dependent nucleosome positioning. J Mol Biol 2009, 386: 1411–1422. 10.1016/j.jmb.2008.11.049
Rohs R, West SM, Sosinsky A, Liu P, Mann RS, Honig B: The role of DNA shape in protein-DNA recognition. Nature 2009, 461: 1248–1253. 10.1038/nature08473
Morozov AV, Fortney K, Gaykalova DA, Studitsky VM, Widom J, Siggia ED: Using DNA mechanics to predict in vitro nucleosome positions and formation energies. Nucleic Acids Res 2009, 37: 4707–4722. 10.1093/nar/gkp475
DeSantis P, Morosetti S, Scipioni A: Prediction of nucleosome positioning in genomes: limits and perspectives of physical and bioinformatic approaches. J Biomol Struct Dyn 2010, 27: 747–764.
Marathe A, Karandur D, Bansal M: Small local variations in B-form DNA lead to a large variety of global geometries which can accommodate most DNA-binding protein motifs. BMC Struct Biol 2009, 9: 24. 10.1186/1472-6807-9-24
Lavery R, Zakrzewska K, Beveridge D, Bishop TC, Case DA, Cheatham TE III, Dixit S, Jayaram B, Lankas F, Laughton C, Maddocks JH, Michon A, Osman R, Orozco M, Perez A, Singh T, Spackova N, Sponer J: A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA. Nucleic Acids Res 2009, 38: 299–313. 10.1093/nar/gkp834
Tsunaka Y, Kajimura N, Tate S, Morikawa K: Alteration of the nucleosomal DNA path in the crystal structure of a human nucleosome core particle. Nucleic Acids Res 2005, 33: 3424–3434. 10.1093/nar/gki663
Lu X, Shakked Z, Olson WK: A-form conformational motifs in ligand-bound DNA structures. J Mol Biol 2000, 300: 819–840. 10.1006/jmbi.2000.3690
Xu F, Olson WK: DNA architecture, deformability, and nucleosome positioning. J Biomol Struct Dyn 2010, 27: 725–739.
Lu X, Olson WK: 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res 2003, 31: 5108–5121. 10.1093/nar/gkg680
Ong MS, Richmond TJ, Davey CA: DNA stretching and extreme kinking in the nucleosome core. J Mol Biol 2007, 368: 1067–1074. 10.1016/j.jmb.2007.02.062
Chen CY, Ko TP, Lin TW, Chou CC, Chen CJ, Wang AH: Probing the DNA kink structure induced by the hyperthermophilic chromosomal protein Sac7d. Nucleic Acids Res 2005, 33: 430–438. 10.1093/nar/gki191
Chen S, Vojtechovsky J, Parkinson GN, Ebright RH, Berman HM: Indirect readout of DNA sequence at the primary-kink site in the CAP-DNA complex: DNA binding specificity based on energetics of DNA kinking. J Mol Biol 2001, 314: 63–74. 10.1006/jmbi.2001.5089
Chen S, Gunasekera A, Zhang X, Kunkel TA, Ebright RH, Berman HM: Indirect read-out of DNA sequence at the primary-kink site in the CAP-DNA complex: alteration of DNA binding specificity through alteration of DNA kinking. J Mol Biol 2001, 314: 75–82. 10.1006/jmbi.2001.5090
Wang D, Ulyanov NB, Zhurkin VB: Sequence-dependent Kink-and-Slide deformations of nucleosomal DNA facilitated by histone arginines bound in the minor groove. J Biomol Struct Dyn 2010, 27: 843–859.
Lankas F, Lavery R, Maddocks JH: Kinking occurs during molecular dynamics simulations of small DNA minicircles. Structure 2006, 14: 1527–1534. 10.1016/j.str.2006.08.004
Sereda YV, Bishop TC: Evaluation of elastic rod models with long range interactions for predicting nucleosome stability. J Biomol Struct Dyn 2010, 27: 867–887.
Bhattacharyya D, Bansal M: Groove width and depth of B-DNA structures depend on local variation in slide. J Biomol Struct Dyn 1992, 10: 213–226.
Tirosh I, Barkal N: Two strategies for gene regulation by promoter nucleosomes. Genome Res 2008, 18: 1084–1091. 10.1101/gr.076059.108
Polach KJ, Widom J: Mechanism of protein access to specific DNA sequences in chromatin: A dynamic equilibrium model for gene regulation. J Mol Biol 1995, 254: 130–149. 10.1006/jmbi.1995.0606
Polach KJ, Widom J: A model for the cooperative binding of eukaryotic regulatory proteins to nucleosomal target sites. J Mol Biol 1996, 258: 800–812. 10.1006/jmbi.1996.0288
Ulyanov NB, Zhurkin VB: Sequence-dependent anisotropic flexibility of B-DNA. A conformational study. J Biomol Struct Dyn 1984, 2: 361–385.
Zhurkin VB, Ulyanov NB, Ivanov VI: Mechanisms of DNA bending in the free state and in the nucleosome. In DNA Bending and Curvature, Adenine Press, New York 1988, 169–190.
Hagerman PJ: Sequence-directed curvature of DNA. Annu Rev Biochem 1990, 59: 755–781. 10.1146/annurev.bi.59.070190.003543
Zhurkin VB, Ulyanov NB, Gorin AA, Jernigan RL: Static and statistical bending of DNA evaluated by Monte Carlo simulations. Proc Natl Acad Sci 1991, 88: 7046–7050. 10.1073/pnas.88.16.7046
Olson WK, Marky NL, Jernigan RL, Zhurkin VB: Influence of fluctuations on DNA curvature. A comparison of flexible and static wedge models of intrinsically bent DNA. J Mol Biol 1993, 232: 530–554. 10.1006/jmbi.1993.1409
Vasudevan D, Chua EY, Davey CA: Crystal structures of nucleosome core particles containing the '601' strong positioning sequence. J Mol Biol 2010, 403: 1–10. 10.1016/j.jmb.2010.08.039
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235
Chakravarthy S, Gundimella SK, Caron C, Perche PY, Pehrson JR, Khochbin S, Luger K: Structural characterization of the histone variant macroH2A. Mol Cell Biol 2005, 25: 7616–7624. 10.1128/MCB.25.17.7616-7624.2005
Clapier CR, Chakravarthy S, Petosa C, Fernández-Tornero C, Luger K, Müller CW: Structure of the Drosophila nucleosome core particle highlights evolutionary constraints on the H2A-H2B histone dimer. Proteins 2008, 71: 1–7. 10.1002/prot.21720
Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ: Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 1997, 389: 251–260. 10.1038/38444
Harp JM, Hanson BL, Timm DE, Bunick GJ: Asymmetries in the nucleosome core particle at 2.5 Å resolution. Acta Cryst 2000, D56: 1513–1534.
Suto RK, Clarkson MJ, Tremethick DJ, Luger K: Crystal structure of a nucleosome core particle containing the variant histone H2A.Z. Nat Struct Biol 2000, 7: 1121–1124. 10.1038/81971
Suto RK, Edayathumangalam RS, White CL, Melander C, Gottesfeld JM, Dervan PB, Luger K: Crystal structures of nucleosome core particles in complex with minor groove DNA-binding ligands. J Mol Biol 2003, 326: 371–380. 10.1016/S0022-2836(02)01407-9
Muthurajan UM, Bao Y, Forsberg LJ, Edayathumangalam RS, Dyer PN, White CL, Luger K: Crystal structures of histone Sin mutant nucleosomes reveal altered protein-DNA interactions. EMBO J 2004, 23: 260–271. 10.1038/sj.emboj.7600046
Edayathumangalam RS, Weyermann P, Gottesfeld JM, Dervan PB, Luger K: Molecular recognition of the nucleosomal "supergroove". Proc Natl Acad Sci 2004, 101: 6864–6869. 10.1073/pnas.0401743101
Barbera AJ, Chodaparambil JV, Kelley-Clarke B, Joukov V, Walter JC, Luger K, Kaye KM: The nucleosomal surface as a docking station for Kaposi's sarcoma herpesvirus LANA. Science 2006, 311: 856–861. 10.1126/science.1120541
Lu X, Simon MD, Chodaparambil JV, Hansen JC, Shokat KM, Luger K: The effect of H3K79 dimethylation and H4K20 trimethylation on nucleosome and chromatin structure. Nat Struct Mol Biol 2008, 15: 1122–1124. 10.1038/nsmb.1489
Saenger W: Principles of nucleic acid structure. Springer; 1984.
Bhattacharyya D, Bansal M: A self-consistent formulation for the analysis and generation of non-uniform DNA structures. J Biomol Struct Dyn 1989, 6: 635–653.
Bansal M, Bhattacharyya D, Ravi B: NUPARM and NUCGEN: software for analysis and generation of sequence dependent nucleic acid structures. Comput Appl Biosci 1995, 11: 281–287.
Mukherjee S, Bansal M, Bhattacharyya D: Conformational specificity of non-canonical base pairs and higher order structures in nucleic acids: Crystal structure database analysis. J Comput Aided Mol Des 2006, 20: 629–645. 10.1007/s10822-006-9083-x
Dixit SB, Beveridge DL, Case DA, Cheatham TE III, Giudice E, Lankas F, Lavery R, Maddocks JH, Osman R, Sklenar H, Thayer KM, Varnai P: Molecular dynamics simulations of the 136 unique tetranucleotide sequences of DNA oligonucleotides. II: sequence context effects on the dynamical structures of the 10 unique dinucleotide steps. Biophys J 2005, 89: 3721–3740. 10.1529/biophysj.105.067397
Kanhere A, Bansal M: An assessment of three dinucleotide parameters to predict DNA curvature by quantitative comparison with experimental data. Nucleic Acids Res 2003, 31: 2647–2658. 10.1093/nar/gkg362
We are grateful to the anonymous reviewer 1 for making very pertinent suggestions. AM would like to thank Dr. Venu Govindu for valuable comments, and the Department of Biotechnology, Government of India, for financial support.
AM and MB designed the project. AM carried out the analysis and drafted the manuscript. The project was supervised by MB. Both the authors read and approved the manuscript.
Electronic supplementary material
Additional file 1: Structural variation at different length scales in nucleosomal DNA. This file contains mean and standard deviation values for three of the dinucleotide parameters, the trajectories for dinucleotide, trinucleotide, octanucleotide and decanucleotide parameters for six representative structures, and the autocorrelation and cross-correlation values between these parameters for the same six structures. (PDF 9 MB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.