Search for allosteric disulfide bonds in NMR structures

Background Allosteric disulfide bonds regulate protein function when they break and/or form. They typically have a -RHStaple configuration, which is defined by the sign of the five chi angles that make up the disulfide bond. Results All disulfides in NMR and X-ray protein structures as well as in refined structure datasets were compared and contrasted for configuration and strain energy. Conclusion The mean dihedral strain energy of 55,005 NMR structure disulfides was twice that of 42,690 X-ray structure disulfides. Moreover, the energies of all twenty types of disulfide bond were higher in NMR structures than in X-ray structures, where there was an exponential decrease in the mean strain energy as the incidence of the disulfide type increased. Evaluation of protein structures for which there are X-ray and NMR models shows that the same disulfide bond can exist in different configurations in different models. A disulfide bond configuration that is rare in X-ray structures is the -LHStaple. In NMR structures, this disulfide is characterised by a particularly high potential energy and very short α-carbon distance. The HIV envelope glycoprotein gp120, for example, is regulated by thiol/disulfide exchange and contains allosteric -RHStaple bonds that can exist in the -LHStaple configuration. It is an open question which form of the disulfide is the functional configuration.


Background
It appears that introduction of disulfide bonds into proteins is an important mechanism by which they have evolved and are evolving [1][2][3]. A recent analysis of the trend in amino acid gain and loss in protein evolution showed that Cys residues have accrued in all 15 taxa studied [3]. In fact, Cys was the most frequently acquired amino acid in 8 of the 15 taxa. Considering that disulfide bonds will only form between optimally placed Cys residues in the tertiary structure, it follows that these bonds are a relatively recent addition to proteins.
Most protein disulfide bonds are motifs that stabilise the tertiary and quaternary protein structure. These bonds are also thought to assist protein folding by decreasing the entropy of the unfolded form [4]. A minor population of disulfide bonds serves a functional role. There are two types of functional disulfides: the catalytic and the allosteric bonds.
The catalytic bonds are typically at the active sites of enzymes that mediate thiol/disulfide exchange in other proteins. These enzymes are the oxidoreductases [5,6]. The allosteric bonds, in contrast, control the function of the protein in which they reside by mediating a change when they break and/or form [7,8]. The type of change depends on the protein. It may be conformational, as described for the HIV receptor, CD4 [9,10], or the resulting unpaired thiols of the cleaved allosteric bond may act as sites of alkylation by thiol modifiers, as described for the blood clotting initiator, tissue factor [11,12]. The actions of the two functional disulfides are linked in that the redox state of the known allosteric disulfides is controlled by catalytic disulfides [9,12,13]. In an attempt to identify a common structural motif for allosteric disulfides, the geometry and strain of 6,874 unique disulfide bonds in X-ray structures were recently examined [8].
A disulfide bond is made up of six atoms linking the two α-carbon atoms of the cysteine residues: Cα-Cβ-Sγ-Sγ'-Cβ'-Cα'. These six atoms define five chi angles, which are the rotations about the five bonds linking the atoms. Each chi angle can be either positive or negative, which gives 32 sign combinations; because a sign pattern and its reverse describe the same disulfide, this equates to 20 possible disulfide bond configurations. The three basic types of disulfide are the spirals, hooks and staples, and depending on the sign of the χ3 angle they are either right- or left-handed [14]. We expanded these standard definitions to reflect the sign of the χ1 and χ1' torsional angles [8]. For instance, a disulfide is a minus right-handed spiral (-RHSpiral) if the χ1, χ2, χ3, χ2' and χ1' angles are -, +, +, + and -, respectively. The disulfides are treated as symmetrical. For example, a disulfide is a +/-RHSpiral if the χ1, χ2, χ3, χ2', χ1' angles are +, +, +, +, - or -, +, +, +, +.
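The count of twenty configurations can be checked by enumerating the sign patterns directly. The following Python sketch is illustrative only: the function names are hypothetical, and the rules used to tell staples from hooks (the relative signs of χ2, χ3 and χ2') are an assumption based on the usual convention, since only the spiral pattern and the role of χ3 are spelled out above.

```python
from collections import Counter
from itertools import product


def canonical(signs):
    """Treat a sign pattern and its reverse as the same disulfide."""
    return min(signs, signs[::-1])


def basic_type(signs):
    """Handedness and basic type from the signs of (chi1, chi2, chi3, chi2', chi1')."""
    _, chi2, chi3, chi2p, _ = signs
    hand = "RH" if chi3 > 0 else "LH"  # handedness follows the sign of chi3
    if chi2 == chi3 == chi2p:
        kind = "Spiral"  # matches the -RHSpiral example (-, +, +, +, -)
    elif chi2 == chi2p:
        kind = "Staple"  # ASSUMPTION: chi2 and chi2' agree, chi3 is opposite
    else:
        kind = "Hook"  # ASSUMPTION: chi2 and chi2' differ in sign
    return hand + kind


patterns = list(product((-1, 1), repeat=5))  # all 32 sign patterns
configurations = {canonical(p) for p in patterns}  # 20 remain after symmetry
per_type = Counter(basic_type(c) for c in configurations)

if __name__ == "__main__":
    print(len(patterns), len(configurations))
    for name, n in sorted(per_type.items()):
        print(name, n)
```

Under these assumptions each handedness contributes three spirals, three staples and four hooks. The hooks have one extra member because the two halves of a hook are not equivalent, so mixed χ1 signs can be arranged in two distinct ways.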
The spirals are the main structural disulfides. With one or two exceptions all the catalytic disulfides are +/-RHHooks, while the known allosteric disulfides are -RHStaples [8]. The allosteric bonds are also defined by closely spaced α-carbon atoms of the two cysteine residues. The -RHStaple bonds have a mean α-carbon atom distance of 4.3 Å, compared to a mean of 5.6 Å for all disulfides [8]. This is because of their position in protein structures. These bonds often link adjacent strands in the same β-sheet secondary structure [7,15]. The strands are usually so close in the β-sheet that they need to pucker to accommodate the disulfide bond [15].
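Both geometric quantities used here, the chi angles and the α-carbon distance, can be computed from atomic coordinates. The sketch below is a minimal pure-Python illustration with hypothetical function names. It assumes the standard side-chain definition in which χ1 also involves the backbone nitrogen (N-Cα-Cβ-Sγ), which is not stated in the text above, and it takes coordinates as plain (x, y, z) tuples in Å.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for rotation about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def distance(a, b):
    """Euclidean distance, e.g. between the two alpha-carbon atoms."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def chi_angles(n, ca, cb, sg, sg_p, cb_p, ca_p, n_p):
    """Return (chi1, chi2, chi3, chi2', chi1') for one disulfide.

    ASSUMPTION: chi1 and chi1' use the backbone N atoms of each cysteine.
    """
    return (dihedral(n, ca, cb, sg),
            dihedral(ca, cb, sg, sg_p),
            dihedral(cb, sg, sg_p, cb_p),
            dihedral(ca_p, cb_p, sg_p, sg),
            dihedral(n_p, ca_p, cb_p, sg_p))
```

Only the sign of each returned angle is needed to assign a configuration, while the magnitudes feed into the strain energy estimate.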
While most protein structures have been solved by X-ray crystallography, a growing number of NMR structures are becoming available. There are also some proteins whose structure has been determined by both methods. A recent analysis of 78 protein structures determined by both X-ray and NMR methods showed that 18 of the 78 structures are significantly different, while the other 60 structures are very similar [16]. The large scale differences likely reflect crystal versus solution structures.
The primary limitation in determining protein structure by NMR is the size of the protein. The size limitation for complete atomic-resolution structure determination by NMR is currently ~30 kDa, though backbone assignments and general folds have been described for proteins up to 100 kDa. X-ray crystallography does not suffer from the size restrictions of NMR, with protein size having no direct bearing on the solvability of the protein or protein complex. This is at least partly why most protein structures have been determined by X-ray crystallography rather than NMR. The limitation of X-ray crystallography is its static nature. This means that only a single structure can be determined and any protein movement during data collection results in decreased resolution. Indeed, in many structures there are segments of the protein that are so disordered they are not contained in the structure. With the advent of time-resolved crystallography some dynamic data can be obtained. However, each individual snapshot is still limited by the requirement of an unmoving structure.
In this study, we compare and contrast the disulfide configurations and energies of all NMR and X-ray protein structures. Analysis of the points of contrast between the datasets has led to the identification of a new potential allosteric disulfide defined by the -LHStaple configuration.

Results and discussion
As of June 20, 2006, there were 37,141 structure files available in the protein databank. Of these, 31,611 were determined by X-ray crystallography, 5,476 were determined by NMR and 54 were determined by cryo-electron microscopy or powder diffraction. There was a mean of 15 structural models in each NMR file deposited, resulting in 84,584 total NMR structural models. There were 97,741 disulfides in all files, as determined by the presence of an SSBOND line in the PDB file. Of these disulfides, 42,690 were found in X-ray structures, 55,005 in the separate NMR structures, and 46 were from structures determined by the other methods.
There is a mean of 1.4 disulfide bonds listed per X-ray structure file in the PDB. This is higher than the mean of 0.6 disulfide bonds per NMR structure and 0.9 disulfide bonds per structure determined by other methods. The prototypical structural disulfide configuration, the -LHSpiral [8], accounts for nearly 30% of all disulfides in X-ray structures (Table 1) and 20% of the disulfides in NMR structures (Table 2).
The five chi angles of the disulfide bond were used to estimate the potential energy of each bond, or dihedral strain energy [8,17,18]. This energy measurement is approximate but has been shown to be a useful measure of disulfide strain [19][20][21][22]. A striking feature is the disparity in dihedral strain energy between NMR and X-ray disulfides. The mean dihedral strain energy of all NMR disulfides (26.5 kJ.mol⁻¹, Table 2) is twice that of X-ray disulfides (13.1 kJ.mol⁻¹, Table 1). The ordering of the mean strain energies between the different dihedral configurations, though, is nearly the same between NMR and X-ray structures. This supports the validity of the analysis and highlights the difference in tolerance for highly strained disulfides in NMR versus X-ray structures. This is demonstrated graphically in Fig. 1A, where the dihedral strain energies of disulfides in NMR structures have a much broader distribution across the energy range. In NMR structures there is only a modest linear decrease in the mean strain energy as a function of the incidence of each disulfide configuration. In X-ray structures, however, there is an exponential decrease in the mean strain energy as the incidence of the configuration increases (Fig. 1B).
Table legend: The disulfide bonds were separated into twenty configurations based on the sign of the χ1, χ2, χ3, χ2' and χ1' angles [8]. The dihedral strain energy (DSE) and distance between the two α-carbon atoms were calculated for each disulfide, and the mean and 95% confidence intervals are shown for each group.
The overall spread of values is similar, however, with the strain energies ranging from 2.1 to 79.1 kJ.mol⁻¹ in NMR structures and from 2.1 to 75.6 kJ.mol⁻¹ in X-ray structures.
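For orientation, a dihedral strain energy of this kind can be sketched as a sum of cosine torsional terms over the five chi angles. The functional form and coefficients below are an assumption: they follow the empirical relation commonly used for disulfide strain, which is not written out in this section, and the function name is hypothetical.

```python
import math


def dihedral_strain_energy(chi1, chi2, chi3, chi2p, chi1p):
    """Approximate dihedral strain energy in kJ/mol from chi angles in degrees.

    ASSUMPTION: coefficients are those of the commonly used empirical relation;
    they are not given in the text this sketch accompanies.
    """
    def term(k, n, chi):
        # n-fold torsional term with barrier coefficient k
        return k * (1.0 + math.cos(math.radians(n * chi)))

    return (term(8.37, 3, chi1) + term(8.37, 3, chi1p)
            + term(4.18, 3, chi2) + term(4.18, 3, chi2p)
            + term(14.64, 2, chi3) + term(2.51, 3, chi3))


if __name__ == "__main__":
    # Staggered side-chain angles with chi3 near -90 degrees: a low-strain case
    print(round(dihedral_strain_energy(-60, -60, -90, -60, -60), 2))
    # All angles eclipsed: the upper bound of this functional form
    print(round(dihedral_strain_energy(0, 0, 0, 0, 0), 2))
```

With this form the estimate is bounded between a few kJ.mol⁻¹ and roughly 85 kJ.mol⁻¹, which is in line with the range of values reported above.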
There are several possible explanations for the higher average strain energy of disulfide bonds in NMR-determined structures. One possibility is a higher degree of error in defining disulfide bond structures in NMR compared to X-ray structures. To test this notion, the disulfide bonds in a dataset of uniformly refined NMR structures [23,24] were analysed.
The lower tolerance for disulfide strain energy in X-ray structures is also apparent when comparing the data for all X-ray structures in Table 1 with the data we reported earlier for a set of unique X-ray disulfides [8] and the disulfides of a culled set of X-ray structures described by Wang and Dunbrack [25] (Table 3, Fig. 1C). The Wang and Dunbrack structures represent non-redundant sequences across all PDB files and were selected based on the highest resolution structure available and then the best R-values. The overall trend in relative strain energies of the different configurations and their incidence is the same for the non-culled and culled datasets. This finding indicates that the analysis of the non-culled dataset has not been unduly biased by those proteins for which there are numerous X-ray structures, such as serine proteinases like trypsin.
Figure 1: Distribution of disulfide strain energies in NMR and X-ray structures. A. Number of disulfide bonds for each dihedral strain energy (in 2.5 kJ.mol⁻¹ increments) for structures determined by NMR (total of 55,005 disulfides, Table 2) and X-ray (total of 42,690 disulfides, Table 1). B. Plot of the mean strain energy and 95% confidence intervals of each disulfide configuration versus the incidence of that configuration. The dotted lines are the linear least-squares fit to the NMR data (top line; Table 2) or single exponential least-squares fit to the X-ray data (bottom line; Table 1). C. Plot of the mean strain energy and 95% confidence intervals of each disulfide configuration versus the incidence of that configuration for all X-ray disulfides (42,690 disulfides; see part B), a unique set of 6,874 X-ray disulfides described by Schmidt et al. [8] (data set 1) and the 16,225 disulfides of a culled set of X-ray structures described by Guoli Wang and Roland Dunbrack, Jr. [25] (data set 2).
Direct comparison of disulfide bond characteristics in NMR and X-ray structures can be made for proteins whose structures have been determined by both methods. The disulfide bond configurations in 10 proteins that have very similar X-ray and NMR structures (MaxSub ≥ 0.77) have been determined (Table 4). The differences between the X-ray and NMR models of these proteins are comparable to the differences between various X-ray or various NMR structures of a given protein [16]. It is apparent that a given disulfide can exist in different configurations in NMR models. Most often, the configuration found in the X-ray structure is also found in one or more of the NMR models. For example, the Cys26-Cys84 disulfide in ribonuclease A is a -LHSpiral in the X-ray structure and in 16 of the 32 NMR models. In the other 16 models it is a -RHHook (13) or -RHSpiral (3). There are some notable exceptions, however. The Cys11-Cys27 disulfide in tendamistat is a -/+RHHook in the X-ray structure and a +/-LHStaple in all 9 NMR models. Also, the Cys25-Cys80 disulfide in β2-microglobulin is a -LHStaple in the X-ray structure but a -LHSpiral (10), -RHSpiral (7) or -RHHook (3) in the 20 NMR models. These findings indicate that the structures of some disulfides are particularly malleable.
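The per-model comparison described above amounts to tallying the configuration assigned to one disulfide in every NMR model and comparing the tally with the X-ray configuration. A minimal Python sketch follows, using the ribonuclease A and tendamistat counts quoted in the text; the function name and the layout of the returned dictionary are hypothetical.

```python
from collections import Counter


def summarise(xray_config, nmr_configs):
    """Compare one disulfide's X-ray configuration with its NMR model configurations."""
    counts = Counter(nmr_configs)
    predominant, n = counts.most_common(1)[0]
    return {
        "counts": dict(counts),
        "predominant": predominant,
        "fraction": n / len(nmr_configs),
        "xray_seen_in_nmr": xray_config in counts,
        "xray_is_predominant": predominant == xray_config,
    }


# Cys26-Cys84 in ribonuclease A: 32 NMR models, counts as quoted in the text
rnase_a = ["-LHSpiral"] * 16 + ["-RHHook"] * 13 + ["-RHSpiral"] * 3
# Cys11-Cys27 in tendamistat: all 9 NMR models share one configuration
tendamistat = ["+/-LHStaple"] * 9

if __name__ == "__main__":
    print(summarise("-LHSpiral", rnase_a))
    print(summarise("-/+RHHook", tendamistat))
```

The first case reports the X-ray configuration as the predominant NMR configuration, while the second flags a disulfide whose X-ray configuration is absent from every NMR model.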
There are 10 disulfides in this dataset of comparable structures where the X-ray configuration is also the predominant NMR configuration. Notably, nine of the ten dihedral strain energies for the matching disulfide configurations are significantly higher in NMR structures (Table 4). This finding supports the notion that the propensity for a protein to crystallize relates, at least in part, to the amount of strain in its disulfide bonds.
The mean distance between the α-carbon atoms of the disulfide bond is the same in NMR and X-ray structures, at 5.6 Å (Tables 1 and 2). The -RHStaple configuration is the standout for α-carbon distance, with mean distances of 4.5 Å and 4.2 Å in NMR and X-ray structures, respectively (Fig. 2). As discussed previously [8,15], this is because -RHStaples are often found linking adjacent strands in the same antiparallel β-sheet. The -RHStaple configuration is favoured by allosteric disulfides [8]. The finding that -RHStaples have the same features in NMR and X-ray structures further supports this motif as a hallmark of allosteric bonds. The catalytic disulfides in X-ray structures are nearly always +/-RHHooks [8]. They are also predominantly +/-RHHooks in NMR structures of oxidoreductases (data not shown), but can exist in subsets of the RHHook configuration. The catalytic disulfide in one NMR structure of thioredoxin (PDB ID 1xoa), for example, is a -RHHook in 15 of the 20 models and a +/-RHHook in the other 5 (Table 4).

Table 4: Comparison of the disulfide bond configurations in proteins that have very similar X-ray and NMR structures.
1 Numbers in brackets are the number of disulfides with that configuration.
2 The mean dihedral strain energy (DSE) and 95% confidence intervals.
3 The root-mean-square deviation (RMSD) value was calculated between all Cα atoms of the X-ray structure and the first NMR model [16].
4 MaxSub is a measure of structural similarity of the X-ray and NMR structures [16]. A score of 1.0 means that all Cα atoms are matched between the X-ray and NMR structures, while a score towards zero indicates very different structures. All the structures listed in the table have only small-scale differences (MaxSub values from 0.77 to 0.93).
Given the high strain energies of these short -LHStaples, it is understandable that they would be rare in X-ray structures, which generally have a low tolerance for high-energy bonds. In NMR and X-ray structures that contain -RHStaple disulfides, it is apparent that these bonds can often exist in the -LHStaple configuration and vice versa. Moreover, the disulfides that can exist in both -RHStaple and -LHStaple configurations almost invariably have high strain energy and a short α-carbon separation in both the right-handed and left-handed configurations (data not shown). These findings suggest that the -LHStaple should be considered a potential allosteric bond. Indeed, it remains an open question whether it is the -RHStaple or the strained -LHStaple that is the functional form of allosteric disulfide bonds. Two proteins in which this switching occurs, fibronectin and HIV gp120, will be discussed in more detail.
Fibronectin is a major component of extracellular matrices where it influences a variety of cellular functions by binding to surface integrin receptors [26]. Following secretion from cells it assembles into a fibrillar network that, once formed, is resistant to all denaturants except reducing agents [27]. The mechanism of fibril formation is not well understood but it may involve domain swapping [28,29]. The five N-terminal type 1 repeats of fibronectin are essential for fibril formation [26]. Type 1 domains are ~40 residues in length and contain two [...] (Table 5). Given the apparent necessity for a -RHStaple or -LHStaple in the 2-4 disulfides, we suggest that these are allosteric disulfides that might regulate fibril formation. The fact that the -LHStaple configuration of these bonds uniformly has a higher DSE and shorter α-carbon separation than the -RHStaple configuration can be interpreted to suggest either that there is some uniform defect in the modelling of this configuration or that the -LHStaple is the functional configuration.
Figure 2: Mean distance between the α-carbons of each of the 20 disulfide configurations in NMR and X-ray structures. The mean distance between the α-carbons of all disulfides is 5.6 Å for both NMR (part A) and X-ray (part B) structures. The outliers with a short α-carbon distance are the allosteric -RHStaple bonds in both NMR and X-ray structures and the -LHStaple bonds in NMR structures. The dotted lines are the linear least-squares fit to the data.
The HIV envelope glycoprotein consists of the surface glycoprotein gp120 bound non-covalently to transmembrane gp41 that is anchored in the viral membrane [30]. The two proteins dissociate when gp120 binds to CD4 and a chemokine receptor. This allows the gp41 fusion peptide to be inserted into the target membrane, which drives the membrane merger [31]. Cleavage of two of the nine disulfide bonds in gp120 appears to be important in this process [32,33]. It has been proposed that cleavage of the gp120 bonds facilitates unmasking of the gp41 fusion peptide and its insertion into the target cell membrane [32,33]. Seven of the nine disulfide bonds are present in the eight core structures of gp120 in the protein databank, and five of these bonds can exist in either -RHStaple or -LHStaple configurations in the different structures (Table 6). Considering that the V3 domain binds the chemokine receptor and that cleavage of gp120 disulfides ablates this interaction [32], the Cys296-Cys331 bond that tethers the ends of V3 is most likely one of the two disulfides cleaved in gp120. There are currently no experimental data to suggest which other disulfide is cleaved. Our analysis leads us to propose that the Cys385-Cys418 disulfide is the other bond cleaved.
The Cys126-Cys196 bond is found in the -RHStaple configuration in seven of the eight structures and has strain energies ranging from 20 to 40 kJ.mol⁻¹ (Table 5). However, the distance between α-carbons for this bond is longer than for the other -RHStaples in this protein. The Cys218-Cys247 bond is also found in the -RHStaple configuration in the solved structures and the α-carbon separation is less than 4 Å. The strain energies for this bond are modest, though, ranging from 12 to 20 kJ.mol⁻¹. By comparison, the Cys385-Cys418 bond is found as a -RHStaple in two of the reported structures and as a -LHStaple in one structure. In the remaining structures, it is found as a -LHHook. The strain energies are around 30 kJ.mol⁻¹, however, with the -LHStaple configuration having a strain of 43 kJ.mol⁻¹. Additionally, the α-carbon separation is short, ranging from 3.7 to 3.9 Å in all of the structures. While the predominant configuration of this bond, -LHHook, has not been associated with allosteric disulfides, the high strain of this bond predisposes it to cleavage. However, given the preference for lower energy bond configurations in X-ray structures, it is possible that the predominance of the -LHHook configuration in this structure is a by-product of crystal packing. We suggest that it is the -LHStaple configuration of this bond that is most susceptible to cleavage and is the second disulfide cleaved during viral entry. The Cys385-Cys418 bond is in the same β-barrel as the Cys296-Cys331 disulfide. It is plausible that accessibility of one of these bonds to the reductant leads to the accessibility of the other bond as well. The cleavage of two strained, cross-strand disulfides in one structural motif should allow for a large conformational change in the domain.
Figure 3: Distribution of strain energies and α-carbon distances for the -LHStaple disulfides in NMR and X-ray structures. A major fraction of the 1,805 -LHStaple bonds in NMR structures (part A) have a high strain energy (~50 kJ.mol⁻¹) and short α-carbon distance (~4 Å). The majority of the 599 -LHStaple bonds in X-ray structures (part B) have a low strain energy (~10 kJ.mol⁻¹) and long α-carbon distance (~6.5 Å). Examples of a short, high-energy -LHStaple (the Cys45-Cys56 bond in fibronectin, PDB ID 1o9a) and a long, low-energy -LHStaple (the Cys133-Cys193 bond in urokinase plasminogen activator, PDB ID 2fd6) are shown in part C. The fibronectin disulfide is from an NMR structure (Table 4), while the urokinase plasminogen activator disulfide is from an X-ray structure with a resolution of 1.9 Å, a DSE of 2.9 kJ.mol⁻¹ and an α-carbon distance of 6.5 Å. The structures are viewed from the side of the S-S bond, which is shown in the horizontal position. They were generated using PyMol [35].

Conclusion
Comparison of the same disulfide bonds in very similar X-ray and NMR structures indicates that the bonds often exist in different configurations in different NMR models and usually with a higher potential energy than found in X-ray structures. One bond configuration that is scarce in X-ray structures is the -LHStaple. In NMR structures, this disulfide is characterised by a particularly high potential energy and very short α-carbon distance. Moreover, allosteric -RHStaple disulfides often exist in the -LHStaple configuration in different NMR models. The rarity of -LHStaple disulfides in X-ray structures is consistent with the finding that disulfides in crystallized proteins generally have lower strain energy than those found in solution structures. We suggest that the -LHStaple is an allosteric configuration.

Methods
All structures released in the protein databank [34] as of June 20, 2006 were analyzed. Disulfide bonds in structures were determined by the presence of an SSBOND line in the PDB file. NMR structures were analyzed once, using the first model listed as the representative structure. The files were then separated into each individual model and analyzed.
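The selection step described here, finding disulfides through SSBOND lines and counting the models in an NMR file, can be sketched in a few lines of Python. This is an illustration rather than the procedure actually used: the function names are hypothetical, the parser assumes the fixed-column layout of SSBOND records in the PDB file format (chain identifiers in columns 16 and 30, residue numbers in columns 18-21 and 32-35), and the helper that builds records exists only to produce sample input.

```python
def format_ssbond(serial, chain1, res1, chain2, res2):
    """Build a minimal SSBOND record (helper for sample input only)."""
    return f"SSBOND {serial:>3} CYS {chain1} {res1:>4}    CYS {chain2} {res2:>4}"


def parse_ssbonds(pdb_text):
    """Return [((chain, resnum), (chain, resnum)), ...] for every SSBOND line."""
    bonds = []
    for line in pdb_text.splitlines():
        if line.startswith("SSBOND"):
            first = (line[15], int(line[17:21]))   # columns 16 and 18-21
            second = (line[29], int(line[31:35]))  # columns 30 and 32-35
            bonds.append((first, second))
    return bonds


def count_models(pdb_text):
    """Count MODEL records; a file without any is treated as a single model."""
    n = sum(1 for line in pdb_text.splitlines() if line.startswith("MODEL"))
    return max(n, 1)


if __name__ == "__main__":
    sample = "\n".join([
        format_ssbond(1, "A", 26, "A", 84),
        format_ssbond(2, "A", 40, "A", 95),
        "MODEL        1",
        "ENDMDL",
        "MODEL        2",
        "ENDMDL",
    ])
    print(parse_ssbonds(sample))
    print(count_models(sample))
```

Multiplying the number of SSBOND records by the number of models gives the count of disulfides contributed by one NMR file when every model is analysed separately.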