Nucleosomal DNA is B-like at the dinucleotide step level
An analysis of the twenty-nine nucleosome X-ray crystal structures of better than 3 Å resolution reveals significant dinucleotide-level structural variability in nucleosomal DNA, despite limited variation in sequence: there are only six unique sequences, of which only one differs significantly from the other five. A survey of the dinucleotide step parameters indicates that all the dinucleotide steps in the dataset assume values characteristic of B-form DNA. Specifically, the parameter Zp, which has been shown to be the most reliable indicator of A-versus-B discrimination at the dinucleotide step level [25, 28], indicates that with the exception of ten steps, the entire nucleosomal dataset is B-like. This observation is in contrast to the study of all nucleosomal DNA structures reported by Xu and Olson. Xu and Olson have classified nucleosomal dinucleotide steps with large positive roll (> 7°) and negative slide (< -1 Å) (as calculated by the 3DNA program) as exhibiting an "A-type kink-and-slide geometry", and observe 15% of steps assuming this geometry. However, it has been shown that roll is not a reliable indicator of an A-like geometry. Further, of the 421 nucleosomal DNA steps in our dataset with at least one of tilt or roll assuming values beyond 3σ of the free B-DNA mean values, only 16 steps (3.8%) assume a roll value > 7° and a slide value < -1 Å.
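The roll and slide thresholds quoted above can be applied mechanically to a table of step parameters. The following is a minimal sketch, not the procedure used in this study or in the 3DNA program: the `Step` class, the function names and the example values are all hypothetical, and only the two thresholds (roll > 7°, slide < -1 Å) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One dinucleotide step (hypothetical container for illustration)."""
    roll: float   # degrees
    slide: float  # angstroms


def is_kink_and_slide(step, roll_min=7.0, slide_max=-1.0):
    """Criterion quoted in the text: roll > 7 degrees and slide < -1 angstrom."""
    return step.roll > roll_min and step.slide < slide_max


def kink_and_slide_fraction(steps):
    """Fraction of steps that satisfy the criterion (0.0 for an empty input)."""
    steps = list(steps)
    if not steps:
        return 0.0
    return sum(is_kink_and_slide(s) for s in steps) / len(steps)


# Hypothetical values, chosen only to exercise the function.
example_steps = [
    Step(roll=12.0, slide=-1.4),  # satisfies both thresholds
    Step(roll=9.0, slide=0.2),    # large roll, but slide is positive
    Step(roll=-15.0, slide=1.1),  # negative roll, positive slide
    Step(roll=3.0, slide=-1.2),   # negative slide, but roll is too small
]
fraction = kink_and_slide_fraction(example_steps)
```

Applied to the counts reported above, the same ratio is 16/421, which rounds to 3.8%.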
The presence of the non-canonical BII conformation in nucleosomal DNA in proportions larger than those observed in other protein-bound DNA highlights its role in modulating the nucleosomal DNA structure by facilitating a bend into the minor groove. In the free B-DNA dataset, only the CA/TG step has a significant proportion of datapoints with one or both strands in BII conformation, which might explain why the CA/TG step is often strategically placed at positions where the DNA bends into the minor groove [18, 20]. This also explains why the CA/TG step constitutes the largest share of steps with at least one strand in BII conformation, in all regions of the nucleosome.
A comparison of the mean values for the three parameters, namely, roll, twist and slide, which contribute predominantly to the DNA curvature and superhelical pitch, for the nucleosome dataset with those for the free B-DNA dataset sheds some light on the role played by the histone proteins. The mean values for BI/BI conformation are within 1σ of the mean values for free B-DNA, while most of the mean values for BII/BII and a few of the mean values for BI/BII conformation are beyond 1σ of the corresponding mean values for free B-DNA. As a result, the overall mean values for roll, twist and slide for the BII/BII conformation in all three regions are beyond 1σ of the free B-DNA mean values, and reinforce the importance of the BII/BII conformation in modulating the nucleosome structure.
The kinks into the minor groove at GG/CC steps may be influenced by proteins and environmental factors
A minor groove kink is generally characterised by large negative roll, large twist and large positive slide, along with a state 7 backbone conformation, i.e. a BII/BII conformation for ϵ-ζ and canonical B-DNA values for the other backbone torsion angles. The negative mean roll angle value with extremely large magnitude for the GG/CC steps occurring in region I and taking up a BII/BII backbone conformation, and the significant proportion of kinks among these steps, prompted us to individually examine all the steps. We observe at least one kinked GG/CC step in all structures with the exception of 1KX5, 1M18, 1M19, 1M1A, 2CV5 and 2PYO. However, it must be pointed out that not all these extreme values belong to a step with state 7 conformation; in many cases, the step assumes one of the other six conformations in one or both of the strands. In most of the cases, these GG/CC steps occur at SHL -1.5 and/or +1.5 with respect to the pseudo two-fold axis of the superhelix. While many of these kinks into the minor groove are also accompanied by a stretching of the DNA as described earlier, in some cases the kinked conformation is observed in the absence of any stretching. Several GG/CC steps with a sharp kink also have an accompanying large positive slide. Thus at SHL -1.5 in the structure 1F66, the roll value is -68.1° with corresponding tilt, twist, shift, slide and rise values of -4.3°, 43.0°, 0.3 Å, 1.2 Å and 4.7 Å respectively, while the GG/CC steps in 1KX4 at SHL ±1.5 provide examples of distortion without stretching, but with positive slide values.
In about half the reported structures, a GC/GC step adjacent to the GG/CC step is also kinked. For example, while the GG/CC step at location -1.5 in the structure 1KX3 assumes tilt, roll, twist, shift, slide and rise values of 7.5°, -26.1°, 43.3°, -0.4 Å, 0.4 Å and 4.0 Å respectively, the neighbouring GC/GC step assumes values of -15.5°, -17.0°, 37.3°, 0.6 Å, 0.5 Å and 4.6 Å respectively. In the other structures, the neighbouring GC/GC step tends to compensate for the unusual conformation of the GG/CC step. On the other hand, the GG/CC step at SHL -1.5 has no significant distortion in some structures, while the neighbouring GC/GC step is extremely distorted. This is further proof of the variability observed even across structures with identical sequences. It must be noted that the nucleotides in these GG/CC steps with extreme values of roll, twist and slide were not hydrogen bonded to any amino acids.
Similarly, several of the AA/TT steps also assume large negative roll values at locations where the minor groove faces the histone octamer. This is observed in eleven of the structures at one or two locations, the most prominent being SHL ±4.5, and in a few cases at SHL ±3.5 and ±1.5. Many of these AA/TT steps are observed to be part of A-tracts. Some of these are again accompanied by large twist and large positive slide values. For example, in the structure 2F8N, the roll, twist and slide values of the AA/TT step at SHL -3.5 are -14.2°, 44.6° and 1.7 Å respectively, while those for the step at SHL 1.5 are -19.5°, 44.0° and 1.8 Å respectively. While these steps also have an abnormal tilt, most of them do not display a large rise characteristic of stretching.
In addition to the above steps, a few of the AG/CT steps are also observed to assume large negative roll, large twist and large positive slide at SHL 2.5 and in some cases, at SHL -1.5. For example, in the structures 1M1A and 2CV5, the tilt, roll, twist, shift, slide and rise values are 7.4°, -16.6°, 45.0°, 0.6 Å, 1.8 Å and 3.6 Å, and 4.7°, -18.5°, 40.6°, 1.1 Å, 1.6 Å and 3.5 Å respectively at SHL -1.5.
A comparison of the minor groove kinks in the nucleosome dataset with those observed in the DNA bound to other proteins shows that similar kinks are also associated with CA/TG steps in the Cre recombinase-bound DNA and the I-Cre I homing endonuclease-bound DNA structures. A dinucleotide step with large positive slide in the non-nucleosomal protein-bound DNA is most likely to be CA/TG, as noted by Tolstorukov et al. . The only non-CA/TG dinucleotide steps with a slide value > 2.0 Å are an AA/TT step in the hyper-thermophile SAC 7D-DNA complex structure with PDB id 1WTQ  and a CG/CG step in the catabolite activator protein-bound DNA structure with PDB id 1O3R [33, 34]. While both these steps assume large positive slide, they do not assume large negative roll and hence are not equivalent to a kink into the minor groove. The absence of a minor groove kink at the TA/TA steps in the crystal structure datasets of either the nucleosomal DNA or the non-nucleosomal protein-bound DNA is surprising, since it has been suggested earlier  that a minor groove kink at the TA/TA step will be energetically less costly as compared to an equivalent kink at the CA/TG step.
However, while the contribution by the CA/TG step to curvature and superhelical pitch remains largest, it is not exclusively confined to it. Given that the GG/CC step favours positive roll, small twist and negative slide  in free DNA, the frequency with which it is observed kinking into the minor groove in the nucleosome dataset is intriguing. The observation that most of the distorted GG/CC steps are observed at SHL ±1.5 indicates that this position might be of special relevance  and any dinucleotide step around this region might be vulnerable to stretching and distortion. However, exactly which step gets kinked and/or stretched might depend on a combination of factors such as the position, the dinucleotide sequence and the differing context of the nucleosome within chromatin. The kinks into the minor groove at GG/CC steps might also point to a more general tendency to have a mixture of favourable and unfavourable sequences, which results in only marginally stable nucleosomes , so that the nucleosome can be disrupted during transcription and replication while simultaneously preventing inappropriate access.
Extreme kinks into the major groove are less likely as compared to extreme kinks into the minor groove
The kinks into the major groove do not have tilt or roll values deviating too far from 3σ of the free B-DNA mean values. Only 4 out of 182 kinks in region III take up positive roll with a value deviating by > 5σ from the mean free B-DNA value, while the corresponding number for kinks in region I is 27 out of 172. The four steps are: an AT/AT step at SHL -1 in the structure 1EQZ (similar to the GG/CC kink into the minor groove, this is an unlikely conformation, since the AT/AT step favours nearly zero roll and slide in free B-DNA and other protein-bound DNA ), a GG/CC step at SHL -6 in the structure 1P3B, a CA/TG step at SHL -2 in the structure 2F8N and a CA/TG step at SHL -1 in the structure 2NZD.
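The σ-based kink criterion and the two proportions quoted above can be sketched as follows. This is only an illustration: the free B-DNA reference means and standard deviations are not given in this section, so `TILT_REF` and `ROLL_REF` below are hypothetical placeholders, as are the function names. The counts (4 of 182 and 27 of 172) are taken from the text.

```python
def sigma_deviation(value, free_mean, free_sigma):
    """Signed deviation of a parameter from the free B-DNA mean, in units of sigma."""
    return (value - free_mean) / free_sigma


def is_kink(tilt, roll, tilt_ref, roll_ref, n_sigma=3.0):
    """True if tilt or roll lies more than n_sigma from its reference mean.

    tilt_ref and roll_ref are (mean, sigma) pairs for free B-DNA.
    """
    return (abs(sigma_deviation(tilt, *tilt_ref)) > n_sigma
            or abs(sigma_deviation(roll, *roll_ref)) > n_sigma)


def percent(count, total):
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Hypothetical (mean, sigma) reference values in degrees; NOT from the paper.
TILT_REF = (0.0, 3.0)
ROLL_REF = (1.0, 5.0)

kinked = is_kink(tilt=1.0, roll=20.0, tilt_ref=TILT_REF, roll_ref=ROLL_REF)
smooth = is_kink(tilt=2.0, roll=10.0, tilt_ref=TILT_REF, roll_ref=ROLL_REF)

# Counts reported in the text for kinks deviating by more than 5 sigma.
extreme_region_iii = percent(4, 182)
extreme_region_i = percent(27, 172)
```

With the counts from the text, extreme kinks make up about 2.2% of the kinks in region III and about 15.7% of the kinks in region I, which is the asymmetry the heading of this subsection refers to.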
A comparison of the major groove kinks in the nucleosome dataset with those observed in the DNA bound to other proteins shows that similar kinks are also observed at a variety of steps in different structures such as hyperthermophile SAC 7D-bound DNA (CG/CG, TA/TA, AA/TT), LAC-repressor-bound DNA (CG/CG), catabolite activator protein-bound DNA (CA/TG, CG/CG, GA/TC), integration host factor-bound DNA (AA/TT), Eco RV endonuclease-bound DNA (TA/TA), γδ resolvase-bound DNA (TA/TA) and TATA binding protein-bound DNA (TA/TA, AA/TT and AG/CT). Unlike most of the kinks in the nucleosome structures, these kinks are extremely sharp (roll ~50-60°) and often accompanied by a large rise of > 4.0 Å and sometimes by a small twist. However, similar to the conformation observed in nucleosomes, most of these steps are also B-DNA like in terms of Zp, slide and the backbone conformational parameters.
The linear elastic model may not be applicable to nucleosomal DNA
A number of algorithms, based on the simple harmonic approximation, have been developed to predict the energetic cost of nucleosome formation [18, 20–24]. These algorithms typically assign the parameter mean values in the X-ray crystal structure dataset of protein-DNA complexes (or values obtained by minimising the conformational energy ) as 'zero energy' values, and use a quadratic term as energy penalty for deviations from these mean values. However, analysis of high-resolution free and protein-bound DNA crystal structures  indicates that even within the B-DNA family, steps such as CA/TG assume a trimodal distribution in case of several dinucleotide parameters. Multimodal distribution of twist and slide values for several steps has been observed in molecular dynamics simulations carried out by the Ascona B-DNA consortium . In case of such steps, use of distribution mean values is invalid. In this context, it must also be pointed out that our definition of a kink in terms of mean and standard deviation values for tilt and roll over the entire free B-DNA dataset is meaningful only because the values for both parameters assume a single Gaussian distribution over the entire dataset.
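The problem with using a distribution mean as the 'zero energy' value can be illustrated with a small numerical sketch. It assumes the usual harmonic form E = ½k(x − x₀)²; the stiffness value, the function name and the sample values are hypothetical, and two modes are used instead of three purely for brevity.

```python
from statistics import mean


def harmonic_penalty(value, rest_value, stiffness):
    """Quadratic energy penalty E = 0.5 * k * (x - x0)**2 for one parameter."""
    return 0.5 * stiffness * (value - rest_value) ** 2


# Hypothetical slide values (angstroms) clustered around two modes, near -1 and +2.
slide_samples = [-1.1, -1.0, -0.9, 1.9, 2.0, 2.1]

# The distribution mean falls between the two modes, where no sample lies.
rest = mean(slide_samples)

# Distance from the mean to the closest observed value.
nearest_gap = min(abs(v - rest) for v in slide_samples)

# A step sitting exactly on the upper mode is still charged a penalty.
penalty_at_mode = harmonic_penalty(2.0, rest, stiffness=1.0)
```

In this toy case the mean is 0.5 Å, no sample lies within 1.4 Å of it, and a step at the well-populated mode of 2.0 Å is assigned a non-zero penalty (1.125 in the arbitrary units of the sketch), even though that conformation is frequently observed.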
The presence of a kink, on average, over every turn of the nucleosomal DNA helix poses an additional problem for algorithms based on the linear elastic model. Molecular dynamics studies  have shown that kinks similar to the ones observed in the nucleosome are stiff. Hence the simple harmonic approximation may lead to an incorrect value for energy of formation of a kink, and consequently, for energy of nucleosome formation. This is in agreement with the observation by Sereda and Bishop  that "removal of the largest amplitude deformations in the nucleosome had a significant [positive] effect on all elastic rod models" and therefore "a simple linear approximation does not properly capture the material properties of DNA".
Distribution of trinucleotide parameters indicates a more uniform variation in slide as compared to that in roll
The correlation values for successive bending angles within and across structures indicate that the bending profile fluctuates significantly not only for structures with different sequences but also for those with identical sequences, as well as for different regions within a structure. This variation in successive bending angle values can be most clearly explained in terms of the pattern of roll angle values. In all the structures, we observe blocks of two or three steps with negative roll values in the regions around SHL ±i.5 (i = 1, 2, 3, 4, 5, 6), followed by a junction step with nearly zero roll and a block of two or three steps with positive roll in the regions around SHL ±i (i = 1, 2, 3, 4, 5, 6, 7). This has been discussed by Richmond and Davey  for the structure 1KX5. However, within the block of negative roll, any of the two or three steps can have the highest magnitude of roll, and this step is observed to be different even for different structures corresponding to the same sequence. The observation also holds true for blocks of positive roll. This explains the difference in pattern of successive bending angles across structures corresponding to different as well as identical sequences, and highlights the structural versatility of B-form DNA.
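The block pattern described above (runs of negative roll, a near-zero junction step, runs of positive roll) can be extracted from a roll profile with a simple run-length grouping. This is a sketch under stated assumptions: the ±2° band used to call a roll value "nearly zero" is an arbitrary illustrative choice, and the function names and roll values are hypothetical.

```python
from itertools import groupby


def roll_sign(roll, zero_band=2.0):
    """Classify roll as -1, 0 or +1; zero_band (degrees) is an assumed cutoff."""
    if roll > zero_band:
        return 1
    if roll < -zero_band:
        return -1
    return 0


def roll_blocks(rolls, zero_band=2.0):
    """Return (sign, start_index, length) for each run of same-sign roll values."""
    blocks = []
    start = 0
    for sign, run in groupby(rolls, key=lambda r: roll_sign(r, zero_band)):
        length = len(list(run))
        blocks.append((sign, start, length))
        start += length
    return blocks


def peak_index(rolls, start, length):
    """Index of the step with the largest roll magnitude inside one block."""
    return max(range(start, start + length), key=lambda i: abs(rolls[i]))


# Hypothetical roll profile (degrees) spanning one negative and one positive block.
rolls = [-8.0, -14.0, -5.0, 0.5, 6.0, 11.0, 4.0]
blocks = roll_blocks(rolls)
peaks = [peak_index(rolls, start, length)
         for sign, start, length in blocks if sign != 0]
```

Comparing the `peaks` positions for the same block across different structures is one way to express the observation that the step carrying the largest roll magnitude is not fixed, even for identical sequences.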
It was shown by Richmond and Davey  in the structure 1KX5 that the regions between SHL -3 and SHL +3 display smooth bending when the minor groove faces the histone octamer while the minor groove blocks facing the histone octamer and farther away from the dyad are kinked with large negative roll angle values. However, this observation does not hold true in general, as seen by a survey of the successive bending angle values in all structures. While the structures in the 1M1 series display smooth bending in the region between SHL -3 and +3 and kinks outside those blocks, most of the other structures have sharp kinks throughout in blocks of both positive and negative roll. Even in 1M1A, there are sharp kinks at SHL -1.5 and -2. Several structures have a kink at -1.5 or +1.5 (1EQZ) as described earlier while some of them have kinks at -2 or +2. It must be noted that the structure 1KX5 also has a sharp kink at +2. As a result of variation in kinks versus smooth bending, the variation in the shift parameter is also not uniformly different for the regions binding H3 and H4 as against the regions binding H2A and H2B, as noted for 1KX5 .
A survey of the successive bending angle values for all structures also indicates that for almost all of the 146-basepair structures, the kinks in the shorter half (see the 'Methods' section for definitions of the shorter and longer halves) of the structures are larger in magnitude than the corresponding kinks in the longer half. This trend is seen most consistently in structures of the 1P3 series but it is also observed in all other 146-basepair structures. In addition, most of the large kinks into the minor and major grooves, commented upon in the previous section, occur in the shorter half of the DNA structure. These two observations together imply that stretching in one half of the structure, to cover the same distance with one basepair fewer than the longer half, leads to sharper kinks throughout the shorter half. Of the three 145-basepair structures, only 2F8N is observed to assume sharper kinks in the first half compared to the second half of the structure. The 147-basepair structures do not display this behaviour.
In contrast to successive bending angles, the variation in minor groove width is seen to be far more consistent across all structures. Minor groove width has been shown to be proportional to the mean of the slide of the two dinucleotide steps constituting the trinucleotide . The consistency of the minor groove width variation across different nucleosome structures implies that unlike roll, the slide parameter adds up in a regular fashion at the trinucleotide level, independent of sequence. This observation supports the earlier suggestion that slide is the most important parameter in determining DNA superhelical structure .
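The trinucleotide-level quantity referred to above is the mean of the slide values of the two consecutive dinucleotide steps that make up each trinucleotide. A minimal sketch of that averaging is given below; the function name and the slide values are hypothetical, and no proportionality constant relating this mean to minor groove width is assumed, since none is given in the text.

```python
def trinucleotide_mean_slide(slides):
    """Mean slide of each pair of consecutive dinucleotide steps.

    For n dinucleotide steps this returns n - 1 values, one per trinucleotide.
    """
    return [(first + second) / 2.0 for first, second in zip(slides, slides[1:])]


# Hypothetical slide values (angstroms) for four consecutive dinucleotide steps.
slides = [0.5, 1.5, -0.5, -0.25]
mean_slides = trinucleotide_mean_slide(slides)
```

Plotting such a profile against position for several structures would be one way to check how regularly slide accumulates at the trinucleotide level.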
Interplay of slide, roll and twist causes variation of gross structural parameters at different length scales
We observe that at a length scale of thirty-six basepairs, curvature values have large fluctuations, as indicated by the standard deviation values being ~4-5% of the mean ROC. This is the cumulative effect of large variation in roll angles at the same position with respect to SHL 0. While roll is primarily responsible for curvature, twist and slide are responsible for superhelical rise. However, within and across structures, slide displays more regular variation as compared to roll and twist. Thus the pattern of interplay between these parameters is different at varying length scales, for long fragments within the same structure and also across structures, leading to variations in the core nucleosome structure. This observation is in agreement with results from Bishop's analysis of crystal structures and molecular dynamics simulation data of the nucleosomal DNA .
Structural versatility of B-form nucleosomal DNA may contribute to the plasticity of gene expression
Nucleosome positioning is known to be controlled by various factors such as the preference of the DNA sequence to assume a nucleosome-like structure, DNA methylation, higher order chromatin structure and the presence of DNA binding proteins such as transcription factors. Of these factors, the intrinsic preferences of the DNA sequence have been shown to play a key role in determining the organisation of nucleosomes in vivo [5–7]. There have been a host of studies which have attempted to derive the complete sequence pattern characteristic of nucleosomal DNA by analysing the in-phase and out-of-phase occurrences of various dinucleotides such as AA and TT, GG and CC, AT, TA, and CA and TG [4, 6, 8–11]. In this study, we have not looked at sequence variation in the nucleosome crystal structure dataset, as it has only two widely differing sequences. However, it must be noted that the statistical enrichment of preferred dinucleotide and longer motifs is observed to occur only modestly above a random distribution and is limited to nucleosomes immediately upstream and downstream of a transcription start site (TSS) [12–14]. In other words, formation of a majority of in vivo nucleosomes is largely controlled by factors other than the DNA sequence. This is especially true for nucleosomes in the vicinity of genes, which display a higher plasticity in terms of variation in their expression, with such nucleosomes displaying a more homogeneous and dynamic occupancy across promoters, and a particularly high occupancy close to the TSS. We would like to propose that the large range of permissible variation in structure of B-form DNA acts as an important factor in the formation of nucleosomes in such regions. There has not been any focus on this factor, because we do not have the structure of nucleosomal DNA in in vivo conditions, and it is difficult to comment on its variability.
However, our analysis of all the available nucleosome crystal structures shows that even within this limited dataset, and even for the same sequence, there is an ensemble of dinucleotide and trinucleotide level B-form structures that can lead to a similar core nucleosome structure. We also hypothesise that the structural versatility of nucleosomal DNA might act as an important facilitator of expression plasticity by changing the volume of periodically exposed grooves and thereby varying the probability of recognition by regulatory proteins that bind to these grooves [14, 40, 41].
The best resolved nucleosome crystal structure may not be the 'ideal' template
Several studies have focussed on developing algorithms to predict the energetic cost of nucleosome formation [18, 20–22] by using as a template, the best resolved X-ray crystal structure of nucleosome with PDB id 1KX5 . We see major drawbacks in this approach, since our analysis clearly points at significant variation in local nucleosomal structure, and hence it seems unlikely that the single static structure represented in 1KX5 is the structure for nucleosomal DNA. This is in agreement with the observation by Xu and Olson  that "Nucleosomal DNA can also take slightly different conformational routes in the course of its packaging" and "the different nucleosomal pathways accommodate the deformations of a common sequence ... in different ways". Given that DNA curvature is essentially statistical [42–44], and considering that statistical and static averages are often different [45, 46], the structure 1KX5 is unlikely to represent the statistical mean of such an ensemble. Hence calculation of the energetic cost for a given genomic sequence to take up a nucleosome structure, assuming the structure of 1KX5 as the template, may not lead to biologically meaningful results.