 Methodology article
 Open Access
A general method for the unbiased improvement of solution NMR structures by the use of related X-Ray data, the AUREMOL-ISIC algorithm
BMC Structural Biology volume 6, Article number: 14 (2006)
Abstract
Background
Rapid and accurate three-dimensional structure determination of biological macromolecules is mandatory to keep up with the vast progress made in the identification of primary sequence information. During the last few years the amount of data deposited in the Protein Data Bank has substantially increased, providing additional information for novel structure determination projects. The key question is how to combine the available database information with the experimental data of the current project while ensuring that only relevant information is used and a correct structural bias is produced. For this purpose a novel, fully automated algorithm based on Bayesian reasoning has been developed. It allows the combination of structural information from different sources in a consistent way to obtain high quality structures with a limited set of experimental data. The new ISIC (Intelligent Structural Information Combination) algorithm is part of the larger AUREMOL software package.
Results
Our new approach was successfully tested on the improvement of the solution NMR structures of the Ras-binding domain of Byr2 from Schizosaccharomyces pombe, the Ras-binding domain of human RalGDS calculated from a limited set of NMR data, and the immunoglobulin binding domain of protein G from Streptococcus by their corresponding X-ray structures. In all test cases clearly improved structures were obtained. The largest danger in using data from other sources is a possible bias towards the added structure. In the worst case, instead of a refined target structure, the structure from the additional source is essentially reproduced. We could clearly show that the ISIC algorithm treats these difficulties properly.
Conclusion
In summary, we present a novel fully automated method to combine strongly coupled knowledge from different sources. The combination with validation tools such as the calculation of NMR R-factors strengthens the impact of the method considerably since the improvement of the structures can be assessed quantitatively. The ISIC method can be applied to a large number of similar problems where the quality of the obtained three-dimensional structures is limited by the available experimental data, such as the improvement of large NMR structures calculated from sparse experimental data or the refinement of low resolution X-ray structures. Structures may also be refined using other available structural information such as homology models.
Background
In any structure determination process of a biological macromolecule the general goal is to obtain from the available data a structure that is as accurate as possible. For all high throughput procedures as used in structural genomics projects the structure determination process has to be as fast as possible, demanding that only a minimal set of experimental data is recorded. One way to speed up the NMR structure determination process is to reduce the required number of experimental restraints and/or to use only restraints that are relatively easy to obtain, e.g. backbone dihedral angles, chemical shifts, residual dipolar couplings, hydrogen bonds, or H^{N}-H^{N} NOEs. When the amount of available experimental data is limited, the use of additional information such as structural data from homologous proteins is advisable. Most fast methods previously described in the literature are mainly aimed at determining the global fold of a protein [1–9]. Another set of methods directly uses information from different sources, namely NMR and X-ray, for joint structure refinement to obtain refined structures. It is common to these approaches that discrepancies between NMR and X-ray data are manually corrected, for example by removing violated NOEs, reassigning NOEs or hydrogen bonds, and taking spin-diffusion effects on NMR restraints into account [10–15].
From the conceptual point of view, in any structural prediction or calculation from a set of mixed data one has to decide beforehand what kind of structure is the target of the procedure, since there is nothing like "the structure". This question is inherently answered in purely experimental structure determination, since solution NMR spectroscopy determines the structure in solution and X-ray crystallography determines the structure in the crystal. More importantly, the selected experimental conditions such as the buffer and the absence or presence of ligands select the target structural set.
Here, we present a novel general and fully automated approach called ISIC (Intelligent Structural Information Combination) for the combination of structural information from different sources. It allows the predefinition and selection of the target structural set and properly treats discrepancies inherent in the input structural data, thereby ensuring that the additional input data are properly biased toward the target structural set. Using the combined information, high resolution structures are calculated and results are automatically verified on experimental data. One possible application of the ISIC algorithm for rapid structure determination would include the use of experimental solution NMR data that are relatively easy to obtain, such as backbone dihedral angles, chemical shifts, residual dipolar couplings, hydrogen bonds, or H^{N}-H^{N} NOEs that alone allow the calculation of a low to medium resolution NMR structure, supplemented with, for example, data from homology modeling or from a homologous X-ray structure.
In this paper, ISIC was tested for three applications that may occur in "real life": first, the refinement of a solution structure of a protein with an X-ray structure of the same protein determined under slightly different conditions (proper choice); second, the refinement of a structure calculated from a limited set of NMR data with an X-ray structure of the same protein also determined under slightly different conditions; and last, the refinement of a known NMR structure with a known X-ray structure of the same protein that is largely different (wrong choice). For the first case we selected the Ras-binding domain of Byr2 (Byr2-RBD) from Schizosaccharomyces pombe (residues 71–165, referred to here as residues 1–95) for which both a solution structure of the free protein [16] and a crystal structure of Byr2-RBD in complex with Ras [17] are available. Both structures are of medium quality of about 3 Å resolution (X-ray) or equivalent resolution (NMR), making the protein an ideal target for structure refinement. In addition, it is expected that the two structures are not identical since complex formation with Ras leads to small but significant conformational changes in the structure of Byr2. The aim of the second test was to refine a structure that was obtained using only readily available NMR data. For this case the Ras-binding domain of RalGDS (RalGDS-RBD) from human was used. The solution structure (residues 1–97, corresponding to residues 788–884 of the full length protein, Swiss-Prot accession code: Q12967) has been published previously [18]. For the current tests the low resolution structure of a shorter construct (amino acids 11 to 97) was obtained by using only relatively easily available NMR data such as H-bonds, dihedral angles, and backbone NOEs. In addition, a medium quality (3.4 Å resolution) X-ray structure of RalGDS in complex with Ras is available [19].
Similar to the first test case, small but significant conformational changes between RalGDS in its free solution form and its crystal form in complex with Ras are expected. As a third example we used the NMR [20] [PDB ID: 1Q10] and the crystal structure [21] [PDB ID: 1PGX] of the immunoglobulin binding domain of protein G from Streptococcus, species Lancefield group G. In this case large global structural differences were observed since in solution dimerization introduced by core mutations induces a domain swapping of a β-pleated sheet.
Results
Theoretical considerations
General considerations
In the improvement of structures by including information from other sources two main cases have to be distinguished: In the first case the additional information describes the same set of structures (e.g. a solution structure of a protein at given pH, temperature and sample composition). Here the proper weighting of the additional information is the main point when the "true" structure should be optimally approximated. In the second case the additional information is taken from structures that are supposed to be similar but are different nevertheless (e.g. a solution structure and a crystal structure of a different complex). Here an additional difficulty arises since one has to estimate how well the additional structure will apply to the structure in question; otherwise a properly biased solution will not be obtained. The problem can be formulated as the aim to obtain the most probable structure or the most probable set of structures S_{0} with a conditional probability P(S_{0} | A, I_{i}, i = 1,...,N) higher than a threshold value P_{t}. The combination of information from N different sources I_{i} is a problem often encountered in structural biology. When S_{0} is a set of purely NMR derived protein structures, A would be the general knowledge about the system, that is, the physical model including the covalent structure and the interaction potentials as they enter a typical molecular dynamics calculation. The NMR derived information I_{1} is usually expressed as a set of experimental restraints R_{1} = {R_{1}^{1},...., R_{1}^{M}} containing M restraints that essentially reduce the accessible conformational space of the probable solutions. The experimental restraints are rather inhomogeneous since they include information such as distance restraints from NOESY spectra, dihedral angle information from J-couplings or chemical shifts, as well as intramolecular orientational restraints from residual dipolar couplings.
An elegant semi-quantitative way to find the most probable structures S_{i} is the simulated annealing protocol [22], where the information A is an intrinsic part of the molecular dynamics routines used.
In case two the situation becomes much more complex since structural information that does not correspond exactly to the conditions used in the actual experiment is added from other sources. When this information is expressed again in the form of sets of restraints R_{i}, structures S_{0}^{p} (p = 1,...,L_{0}, with L_{0} being the total number of structures in set S_{0}) have to be found with high probabilities P(S_{0}^{p} | A, R_{i}, i = 1,...,N). When a restrained simulated annealing approach is used, the physical model is again an implicit feature, that is, P(S_{0}^{p} | A, R_{i}, i = 1,...,N) can be replaced by P(S_{0}^{p} | R_{i}, i = 1,...,N). With the exception of the restraint set R_{1} corresponding to the leading set of structures S_{1}, the primary restraints R_{i}* (i = 2,...,N) that are derived from the other sources in general do not directly apply to the conditions of the leading set of structures. This can for example occur due to different experimental conditions. As a consequence, new restraints R_{i} have to be calculated, which directly apply to the true set of structures S_{0}. This means that for R_{1} one can define R_{1} = R_{1}*, but for the other restraint sets R_{i}* we have to determine to what extent their individual restraints apply to the true structures S_{0}, as explained below.
P(S_{0} | R_{i}, i = 1,...,N) = P(S_{i} | R_{1}* = R_{1}, R_{i}*, i = 2,...,N) (1)
In general, the complete description of the sets of restraints R_{i} has to be given as a multidimensional probability distribution p(R_{i}, i = 1,...,N). The different sets of restraints and the restraints themselves are coupled since they are derived from related structures and coupled by the physical model. The probability P and thus the probability distribution p of a set of restraints R_{i} in the leading structures can be calculated from the known R_{i}* by
P(R_{i}) = P(R_{i} | R_{i}*, i = 1,...,N) P(R_{i}*, i = 1,...,N) (2)
Equation 2 shows that R_{i} depends again on a multidimensional probability distribution and a simplification of the problem is mandatory.
In the standard simulated annealing approach the individual restraints R_{i}^{k} are primarily assumed to be independent; their coupling is performed indirectly by the algorithm itself, which selects consistent solutions. As long as the same restraints R_{i}^{k} are considered (and the restraints in a given structure can be considered to be uncoupled) one can calculate the probability that a newly created restraint R_{0}^{k} that corresponds to the "true" solution structures S_{0} has a given value in the set S_{0}. The restraints R_{0}^{k} are used later on for calculating the set of true solution structures S_{0}.
P(R_{0}^{k}) = P(R_{0}^{k} | R_{i}^{k*}, i = 1,...,N) P(R_{i}^{k*}, i = 1,...,N) (3)
The indices i and k specify the data set used and the specific restraint, respectively. Here, it is assumed that in first order the individual restraints R_{0}^{k} and R_{0}^{l} are independent for k ≠ l. For the calculation of P(R_{0}^{k}) it would be useful to have information about the same restraints in the structures derived from the different data sets. Below it will be shown how a reasonable estimate can be obtained by using an MD-sampling procedure.
Equation 3 can be used in two different ways: When a good estimate of the conditional probability is known it can be directly applied. If this is not the case, one can test the hypothesis that P(R_{0}^{k} | R_{i}^{k}*) is close to 1 for a data set i. Since we assume that the experimental data set 1 represents the "true" ensemble, one can test whether a restraint R_{i}^{k} is part of the same ensemble as R_{1}^{k} and simply discard all restraints R_{i}^{k} in the calculation that do not fulfill the condition. P(R_{i}^{k*}, i = 1,...,N) in eq. 3 describes the probability that a substitute restraint R_{i}^{k*} has a given value in the set of structures S_{i}, and clearly this probability depends on factors such as the corresponding second moments σ of the restraints in the set of structures S_{i}.
Main features of the algorithm
The general features of the ISIC algorithm based on the above considerations are described in Figure 1 for the important application in which an NMR solution structure is improved by an X-ray structure. In ISIC the structural information from a set of different sources i consisting of members S_{i} (with i = 1,...,N and the number of used sources N ≥ 2) is used to improve the structures of the set S_{1}. For instance, NMR structures in S_{1} are refined by an appropriate X-ray structure S_{2}. In this approach the different structural sources S_{i} are usually not identical, as is evident in the case of solution and crystal structures, but they may also differ in other aspects such as the amino acid sequence or the absence or presence of interacting molecules.
One important concept is that the available structural information from different sources is first converted into a dense network of derived substitute restraints R_{i}^{k*} that can directly be compared (eq. 3). They are calculated from a structural bundle and are coded as main chain and side chain dihedral angle restraints, as well as distance restraints between selected sets of atoms. The expectation values and standard deviations s of the sample are directly calculated from the given structural bundle by the PERMOL algorithm [23, 24]. In case the leading structural set S_{1} consists of a set of NMR structures, such a bundle is already available. When no structural bundle is available, it first has to be created in a well-defined manner (see below). The restraints R_{1}^{k}* = R_{1}^{k} (k = 1,..., M) are then combined with the sets of restraints R_{i}^{k}* (i = 2,...,N; k = 1,...,M_{i}, M_{i} ≤ M) to obtain a final set of restraints R_{0}^{k} (k = 1,..., M) and a new bundle of structures S_{0} is calculated. The quality of the new structural bundle can be validated against the original experimental data, a step which increases the confidence in the result and can be used to assess the improvement of the structures but is not required by the algorithm.
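The derivation of one substitute distance restraint from a structural bundle can be illustrated with a minimal sketch. It is not the PERMOL implementation; the function name `substitute_distance_restraint`, the atom labels, and the toy coordinates are assumptions made for illustration only. The sketch simply computes the expectation value and the sample standard deviation of one interatomic distance over all members of a bundle.

```python
import math
from statistics import mean, stdev


def substitute_distance_restraint(bundle, atom_a, atom_b):
    """Return (expectation value, sample standard deviation) of one distance.

    bundle: list of structures; each structure is a dict that maps a
    hypothetical atom label to its (x, y, z) coordinates. All structures
    are assumed to share the same labels.
    """
    distances = [math.dist(s[atom_a], s[atom_b]) for s in bundle]
    return mean(distances), stdev(distances)


# Toy bundle of three structures (coordinates in nm, invented for illustration).
bundle = [
    {"A1.CA": (0.0, 0.0, 0.0), "A2.CA": (0.38, 0.0, 0.0)},
    {"A1.CA": (0.0, 0.0, 0.0), "A2.CA": (0.40, 0.0, 0.0)},
    {"A1.CA": (0.0, 0.0, 0.0), "A2.CA": (0.36, 0.0, 0.0)},
]
d_mean, d_sd = substitute_distance_restraint(bundle, "A1.CA", "A2.CA")
print(f"distance restraint: {d_mean:.3f} +/- {d_sd:.3f} nm")
```

Dihedral angle restraints would need circular statistics instead of a plain mean, which this sketch deliberately leaves out.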
Structure improvement of the Ras-binding domain of Byr2
As a first example, the AUREMOL-ISIC algorithm was tested on the structure improvement of the Ras-binding domain of Byr2 for which both a set of 10 solution NMR structures [16] and a single X-ray structure of Byr2 in complex with Ras [17] are available. The X-ray structure was used as source structure to improve the NMR structure S_{1}.
As described above and using the parameters given in Table 1, distance and dihedral angle restraints were created that represent the X-ray data. In total 5248 distance restraints and 321 dihedral angle restraints were obtained, defining the restraint set R_{2}^{x}*. Please note that for residues 57–69 no restraints were obtained since these residues were invisible in the original X-ray structure. Employing these restraints and DYANA v.1.5, 1000 structures were calculated. The 10 best in terms of the DYANA target function were selected to define the set of structures S_{2}^{x} that represents the X-ray data. For this purpose a standard DYANA simulated annealing protocol was used, which includes 4000 TAD (torsion angle dynamics) steps. One fifth of these are performed at an initial high temperature, followed by slow cooling during the rest of the schedule. Figures 2B and 2C show a comparison between the original X-ray structure and the corresponding set of structures S_{2}^{x}, respectively. As described above, the set of restraints R_{2}* was generated from the set S_{2}^{x}. It consisted of 5600 distance restraints, 396 dihedral angle restraints and 53 hydrogen bond restraints. The corresponding parameters used for restraint generation are given in Table 2. The set of 10 submitted solution NMR structures defines the set of structures S_{1} (Fig. 2A), from which 6642 distance restraints, 453 dihedral angle restraints, and 106 hydrogen bond restraints were generated that define the leading restraint set R_{1} = R_{1}*. Please note that 106 is the sum of all hydrogen bond restraints identified in the individual structures of the selected bundle. The corresponding parameters are given in Table 2. No separate structures were calculated using the restraint set R_{1} alone. In the next step the restraints from sets R_{1}* and R_{2}* were combined as described in the Materials and Methods section using the parameters given in Table 3.
In the case of mismatching restraints only the restraint corresponding to the NMR structure was further used. After the restraint combination 6642 distance restraints, 338 dihedral angle restraints and 26 hydrogen bond restraints were obtained, defining the restraint set R_{0}. Using the set R_{0}, 1000 structures were calculated with DYANA and the ten best in terms of the DYANA target function were selected for further analysis, defining the set S_{0} (Fig. 2D). The structures were refined in explicit solvent (water) [25, 26]. As a result, a set (S_{0_WR}) of 10 structures of Byr2-RBD (Fig. 2E) was obtained.
All secondary structure elements are well defined in these structures. In particular, the C-terminal α-helix that was poorly characterized in the original NMR structures is now very well defined. In addition, the quality of the resulting structures was compared to the original NMR and X-ray structures (Table 4) employing rmsd calculations, Ramachandran plots, and NMR R-factor calculations. The results clearly show that the refined structures have improved values in all categories. The rmsd values of the newly calculated structures are drastically reduced compared to the original NMR structures, with values of 0.033 nm (refined) and 0.144 nm (original) for the backbone N atoms. The percentage of residues in the most favored and allowed regions of the Ramachandran plot increased for the refined structures compared to both sets of input structures (S_{1} and S_{2}). Since the goal was to obtain refined solution structures, the resulting structures were analyzed to determine whether they really explain the experimental data better than the original structures. A suitable check for this purpose is the calculation of NMR R-factors [27] that directly compare an experimental NMR NOESY spectrum with the corresponding spectrum back-calculated from a single test structure or a set of test structures. For the calculations shown in Table 4 we used the structurally most discriminating R-factor R_{5} as described by us previously [27]. The R-factors also show a significant improvement for the refined structures, indicating that we were really able to obtain refined solution structures by the use of external data.
Structure improvement of the Ras-binding domain of RalGDS-RBD
As a second test case the Ras-binding domain of RalGDS was chosen using a set of low resolution solution NMR structures as input together with a single X-ray structure of RalGDS in complex with Ras [19]. As in the first test case the X-ray structure was used to improve the NMR structure.
Low resolution NMR structures for RalGDS-RBD (residues 11–97) were newly calculated using easily available NMR data such as 25 H-bonds, 102 Φ and Ψ dihedral angles, and 232 backbone NOEs involving H_{N} and H_{α} atoms. Employing these restraints and DYANA v.1.5, 300 structures were calculated as described above, of which the 10 best in terms of the DYANA target function were selected to define the set of NMR input structures S_{1} (Fig. 3A). As described above and using the parameters given in Table 5, distance and dihedral angle restraints were created that represent the X-ray data. In total 2001 distance restraints and 263 dihedral angle restraints were obtained, defining the restraint set R_{2}^{x}*. Please note that for residues 1, 50–55, 78–89, and 97 no restraints were obtained since these residues were invisible in the original X-ray structure. Employing these restraints and DYANA v.1.5, 1000 structures were calculated, of which the 10 best in terms of the DYANA target function were selected to define the set of structures S_{2}^{x} that represents the X-ray data. The original input X-ray structure of RalGDS obtained in complex with Ras is shown in Figure 3B. As described above, the set of restraints R_{2}* was generated from the set S_{2}^{x}; it consisted of 1784 distance restraints, 326 dihedral angle restraints and 13 hydrogen bond restraints. The corresponding parameters used for restraint generation are given in Table 6. The set of 10 low resolution NMR structures defines the set of structures S_{1} (Fig. 3A), from which 2344 distance restraints, 417 dihedral angle restraints, and 70 hydrogen bond restraints were generated that define the leading restraint set R_{1} = R_{1}*. The corresponding parameters are given in Table 6. In the next step the restraints from sets R_{1}* and R_{2}* were combined as described in the Materials and Methods section using the parameters given in Table 7.
In the case of mismatching restraints only the restraint corresponding to the NMR structure was further used. After restraint combination we obtained 2344 distance restraints, 285 dihedral angle restraints and 27 hydrogen bond restraints, defining the restraint set R_{0}. Using the set R_{0}, 300 structures were calculated with DYANA and the ten best in terms of the DYANA target function were selected for further analysis, defining the set S_{0} (Fig. 3C). All secondary structure elements are well defined in these structures. In particular, the locations of the two α-helices that were poorly defined in the input NMR structures are now substantially better defined. In addition, the quality of the resulting structures was compared to the original NMR structure (Fig. 3D and Table 8) employing rmsd calculations, Ramachandran plots, and NMR R-factor calculations. The rmsd values of the newly calculated structures are drastically reduced compared to the input NMR structures, with values of 0.07 nm (refined) and 0.21 nm (input) for the rmsd to the mean structure of the backbone N atoms. The corresponding average pairwise rmsd values for the backbone atoms show a similar trend with values of 0.11 nm and 0.33 nm, respectively (Table 8). This clearly shows the influence of the increased number of well-defined restraints on the refined structures. The average pairwise rmsd difference between the low resolution NMR input structures and the refined structures amounts to 0.32 nm, indicating on the one hand the influence of the second source (X-ray data) on the refinement and on the other hand that the refined structures are within the conformational space occupied by the low resolution NMR input structures. The percentage of residues in the most favored regions of the Ramachandran plot did not change for the refined structures compared to the low resolution input NMR structures (S_{1}). The calculation of NMR R-factors was performed as described for Byr2-RBD.
The R-factors also show a significant improvement for the refined structures, indicating that we were able to obtain refined solution structures by the use of external data.
Structure improvement of the B2 Immunoglobulin-Binding Domain of Streptococcal protein G
The highest risk in using data from other sources to improve a target structure is a possible bias towards the added structure. In the worst case, instead of a refined target structure, the structure from the additional source is essentially reproduced. To investigate a possible bias introduced by an additional source on the ISIC algorithm, two structures were selected that clearly show different structural details. The solution structure of the B2 Immunoglobulin-Binding Domain of Streptococcal protein G [20] differs clearly from the X-ray structure [21]. The NMR structure was obtained from a dimeric form of the protein, where 4 core mutations lead to dimerization of the protein and a domain swapping of a β-pleated sheet. Figure 4A shows one half of the dimeric NMR structure compared to the monomeric X-ray structure of the B2 domain (Fig. 4B). As can clearly be seen, the orientation of the last two β-strands is considerably different between the two structures. A simple averaging process between these two sets of structures leads to substantially incorrect structures and not to any improvements (data not shown). The ISIC algorithm, however, takes these structural differences automatically into account. We used the ISIC algorithm as described above with the same parameters as described for Byr2-RBD; details of the calculations are given in the caption of Figure 4. In the first step a bundle of structures representing the X-ray information (Fig. 4C) was generated. From this set and the NMR structures, restraints were generated and combined with ISIC, and new improved structures were calculated (Fig. 4D). As can be seen from Figure 4D, the resulting structures look very similar to the original NMR structure but the rmsd values and the Ramachandran quality have slightly improved (Fig. 4). Note that the original NMR structures were in this example already very well defined.
We also performed the inverse experiment, using the NMR structure to improve the X-ray structure, and again obtained an unbiased structure with all characteristics of the original structure (data not shown).
Discussion and conclusion
Any determination of solution structures from experimental data is not (as sometimes automatically assumed) the direct calculation of the only existing solution but the search for a set of structures consistent with the experimental data and additional knowledge of the system (in this regard see also the paper by Rieping et al. [28]). The use of substitute restraints as introduced here with a simulated annealing protocol for restrained molecular dynamics is an efficient method to combine strongly coupled knowledge from different sources. A proper bias toward the selected target set of structures can be achieved by Bayesian reasoning, thus using the additional information only to increase the probability to find the "true" ground state set of structures corresponding to the experimental conditions selected. The combination with validation tools such as the calculation of NMR R-factors strengthens the impact of the method considerably since the improvement of the structures can be assessed quantitatively. This is clearly visible for the example of Byr2-RBD where our improved structures also better explain the experimental data. Even the choice of largely inappropriate additional knowledge does not lead to distortion of the original structure as shown for the immunoglobulin binding domain.
In the present paper the automated ISIC algorithm was used to improve a solution structure by related X-ray data. The quality of both the originally submitted Byr2 NMR structures and the corresponding X-ray structure was limited, making them an excellent example for testing the ISIC algorithm. The same is true for the RalGDS-RBD test case where both the set of low resolution NMR structures of RalGDS that were calculated only from easily available experimental data and the corresponding X-ray data are of medium quality. This last test case in particular is a good example of how the inclusion of additional data can speed up the NMR structure determination process, for example in structural genomics efforts. However, ISIC can also be used for other applications such as the improvement of an NMR structure of a given protein by NMR structures of homologous proteins or pure homology models. The same would be true for the improvement of X-ray structures by NMR data when some parts of the electron density map are ill-defined.
Here, the X-ray R-factor would provide the validation tool. A similar application that one may encounter more often in the future is the calculation of NMR structures of very large proteins using only a limited set of experimental data. Other scenarios for the application of ISIC are conceivable. When no X-ray structure of the protein is available, homology models from related proteins may be used.
Methods
Details of the algorithm
Calculation of the network of substitute restraints
The calculation of a dense network of dihedral angle and distance restraints with the PERMOL algorithm from bundles of structures has been described earlier [23, 24] and is implemented in AUREMOL [29]. Here, the expectation values and standard deviations are calculated. Error ranges are approximated from the standard deviations on the basis of the t-test. In case the original set contains only one structure, the corresponding structural bundle has to be calculated first. In the following we will discuss only the most important case of crystal structures that are usually represented as distinct single structures S_{i}^{p} (p = 1), but the principle can be applied to other data.
Depending on the unit cell and the refinement method used, sometimes more than one structure is deposited in the data base (p > 1). However, even then the statistical ensemble is too small. The solution to this problem is that, in analogy to the calculation of NMR structures, the inherent coordinate uncertainties can be used to calculate structural bundles, and from those a set of substitute restraints R_{i}* is obtained. Therefore, we first determine a set of restraints R_{i}^{x}* that represent the original X-ray structure(s) from interatomic distances and dihedral angles in the crystal structure(s) together with the corresponding coordinate uncertainties. Using these restraints a set of structures S_{i}^{x} is created, from which the set of substitute restraints R_{i}* is created using PERMOL. For generating the set R_{i}^{x}*, two factors that are usually published together with the structure can be used for a conservative estimate of the structural variations. First, the expected average error in atomic positions σ(r_{0}) is approximately 1/3 of the resolution R [30]. In a more involved analysis, σ(r_{m}) of the atoms m possessing low B-factors is often estimated from Luzzati plots. Second, the local B-factors can be used to introduce additional errors for specific atoms possessing significant B-values. Static and thermal disorder can effectively spread out the electron density of a given atom m, and this increases its B-factor. The B-factor is related to the rms error in the position of an atom by the equation:
$\sigma ({r}_{m})=\sqrt{\frac{{B}_{m}}{8\cdot {\pi}^{2}}}\qquad (4)$
B_{m} denotes the B-factor of a given atom m and σ(r_{m}) is the corresponding average error in the atom position.
Since a conservative estimate of distance ranges is most useful for the calculations, the square of the standard deviation σ^{2}(d_{m,n}) of the distance d_{m,n} between two atoms m and n (m ≠ n) is approximated by
σ^{2}(d_{m,n}) = σ(r_{ m })^{2} + σ(r_{ n })^{2} + 2σ(r_{0})^{2} (5)
For a more detailed description of the precision of protein structures see the article by Cruickshank [31]. When more than one structure of the same crystal is contained in the database, they can be considered as separate structural sets S_{i} and handled in an analogous way. As mentioned above, using this preliminary set of restraints R_{i}^{x}* a bundle of structures S_{i}^{x} is calculated by employing programs such as DYANA [32], XPLOR-NIH [33] or CNS [34]. From this bundle a set of restraints R_{i}* is calculated in the same way as was done for the restraint set R_{1} of the leading structure S_{1}.
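The error estimate of eq. 4 and eq. 5 can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of AUREMOL; B-factors are assumed to be given in Å² and the resolution in Å, so that all returned errors are in Å.

```python
import math


def sigma_from_bfactor(b_factor):
    """Eq. 4: rms positional error of an atom from its B-factor (B in Å^2)."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def sigma_from_resolution(resolution):
    """First approximation from the text: about 1/3 of the resolution R."""
    return resolution / 3.0


def distance_sigma(b_m, b_n, resolution):
    """Eq. 5: conservative standard deviation of the distance d_{m,n}."""
    s_m = sigma_from_bfactor(b_m)
    s_n = sigma_from_bfactor(b_n)
    s_0 = sigma_from_resolution(resolution)
    variance = s_m ** 2 + s_n ** 2 + 2.0 * s_0 ** 2
    return math.sqrt(variance)


if __name__ == "__main__":
    # Illustrative input values only, not taken from the paper.
    print(sigma_from_bfactor(20.0))
    print(distance_sigma(20.0, 35.0, 2.0))
```

For example, a B-factor of 20 Å² corresponds to a positional error of about 0.50 Å according to eq. 4.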
Restraint combination
As derived above (eq. 2 and eq. 3), from the sets of restraints R_{1} (R_{1} = R_{1}*) and R_{i}* (i = 2,...,N) a new set R_{0} has to be calculated, which then enters the final structure calculation. Although the algorithm produces restraint sets R_{i}* that are matched to the leading set R_{1} for all data sets, in some cases no restraint R_{i}^{k*} matching a restraint R_{1}^{k} can be created for data set i. Such a case can occur when an atom or an amino acid of set R_{1} does not exist in the data used to generate set R_{i}*. In this case R_{0}^{k} is set to R_{1}^{k}. In all other cases the final restraint R_{0}^{k} has to be calculated according to eq. 3. Since P(R_{0}^{k}|R_{i}^{k}*, i > 1) is difficult to determine for distances and angles, we apply a pairwise null hypothesis test P(R_{1}^{k}|R_{i}^{k}*, i > 1) that the corresponding two restraints of the two data sets describe the same ensemble. If the hypothesis is accepted, a new probability distribution for the restraint is calculated; if it is rejected, the restraint R_{i}^{k}* is discarded and only R_{1}^{k} is used. If errors are also expected in the leading restraint set R_{1}, it is possible to discard the restraint R_{1}^{k} as well. However, this special option was not used in the current tests. When large structural bundles are created (as one of the possible options), the probability distributions can be obtained directly from the bundle. Since we have no a priori knowledge about the distribution type of the individual restraints, we can apply known statistical tests such as the rank dispersion test according to Siegel and Tukey [35] or the comparison of two independent samples according to Kolmogoroff and Smirnoff [35]. If the investigated restraints possess the same or nearly the same type of distribution, the so-called U test according to Wilcoxon, Mann and Whitney [35] can be applied. It is the distribution-free counterpart to the parametric Student t-test, which strictly can only be applied to normally distributed data.
On a variety of data sets we tested according to Kolmogoroff and Smirnoff, whether our data can be assumed to follow a normal distribution. As a result it was found that for all our test cases the data are normally distributed within a small degree of error. Therefore, for practical reasons it is sufficient to assume that the distribution can be approximated sufficiently well by a Gaussian distribution.
As a consequence, we can test the null hypothesis with a pairwise two-sided t-test that compares the individual distance and angle restraints of all restraint sets R_{i}* (i > 1) with the corresponding restraints of set R_{1}*. The average distances <${d}_{i}^{k*}$> and dihedral angles <${a}_{i}^{k*}$> together with the corresponding standard deviations s(d_{i}^{k*}) and s(a_{i}^{k*}) have been calculated from the structural bundles, and the t-values t_{1}^{k} (i > 1) are now calculated for the distances and angles by:
${t}_{1}^{k}=\frac{\left|\langle {R}_{1}^{k}\rangle -\langle {R}_{i}^{k*}\rangle \right|}{\sqrt{\frac{{s}^{2}({R}_{1}^{k})}{{L}_{1}}+\frac{{s}^{2}({R}_{i}^{k*})}{{L}_{i}}}}\qquad (6)$
After that the individual t-values ${t}_{1}^{k}$ are compared to the critical t-value t_{c}. The critical t-value at a given significance level and known degrees of freedom f (with f = L_{1}  L_{ i } 1) can be calculated or looked up in a table of t-values.
If the calculated t-value t_{1}^{k} is greater than the critical t-value t_{c}, the null hypothesis has to be rejected and the restraint R_{i}^{k*} is not used. Restraints with t_{1}^{k} ≤ t_{c} are retained and the weighted average value <R_{0}^{k}> of the restraint R_{0}^{k} is calculated together with the corresponding weighted total standard deviation σ(R_{0}^{k}).
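The test and combination step described above can be sketched as follows. The text does not specify the weighting scheme, so the inverse-variance weighting used here is an assumption of this sketch, as are the function names; the critical value t_{c} is passed in by the caller rather than computed, and all standard deviations are assumed to be greater than zero.

```python
import math


def t_value(mean_1, sd_1, n_1, mean_i, sd_i, n_i):
    """Eq. 6: absolute difference of the means over the combined standard error.

    n_1 and n_i are the numbers of structures (L_1, L_i) in the two bundles.
    """
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_i ** 2 / n_i)
    return abs(mean_1 - mean_i) / standard_error


def combine_restraint(mean_1, sd_1, n_1, mean_i, sd_i, n_i, t_crit):
    """Return (mean, sd) for the combined restraint R_0^k.

    If t exceeds t_crit the null hypothesis is rejected and only the
    leading restraint R_1^k is used. Otherwise both restraints are merged.
    ASSUMPTION: inverse-variance weighting; the paper only states that a
    weighted average and a weighted total standard deviation are computed.
    """
    if t_value(mean_1, sd_1, n_1, mean_i, sd_i, n_i) > t_crit:
        return mean_1, sd_1
    w_1 = 1.0 / sd_1 ** 2
    w_i = 1.0 / sd_i ** 2
    mean_0 = (w_1 * mean_1 + w_i * mean_i) / (w_1 + w_i)
    sd_0 = math.sqrt(1.0 / (w_1 + w_i))
    return mean_0, sd_0
```

For dihedral angles the difference of the means would additionally have to be taken on the circle (wrapped into the range ±180°), which this sketch omits for brevity.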
Hydrogen bond restraints
In addition to the combined dihedral angle and distance restraints, the ISIC algorithm also uses backbone hydrogen bond restraints R_{i}^{k}. For the sake of clarity they are denoted H_{i}^{k} in the following. In principle, hydrogen bonds could be handled in a similar way as described above for distance restraints by using the distributions of hydrogen bonding energies as parameters, where the hydrogen bond energies are calculated according to Freund [36]. Since rapid calculations are required within ISIC, a somewhat faster method is actually used for hydrogen bond definition, accepting a maximum NHO distance of 0.24 nm and a hydrogen bond angle a_{NHO} of 180° ± 35°. In ISIC the frequencies X_{i}^{k*} of the hydrogen bonds in the different structural bundles S_{i} are determined and used as hydrogen bond probabilities P(H_{i}^{k*}). From these, the conditional probabilities P(H_{0}^{k}|H_{1}^{k}, H_{i}^{k*}, i = 2,...,N) that a hydrogen bond exists in the solution structure are obtained:
$P({H}_{0}^{k}\mid {H}_{1}^{k},{H}_{i}^{k}*,i=2,\dots ,N)=\frac{P(H)\,P({H}_{1}^{k},{H}_{i}^{k}*,i=2,\dots ,N)}{P(H)\,P({H}_{1}^{k},{H}_{i}^{k}*,i=2,\dots ,N)+(1-P(H))\,(1-P({H}_{1}^{k},{H}_{i}^{k}*,i=2,\dots ,N))}\qquad (7)$
Assuming that the restraints from different structural sets can be considered statistically independent and that with eq. 2 the probability P(H_{i}^{k}) that a hydrogen bond exists also under the conditions of true solution structures can be written as
P(H_{i}^{k}) = P(H_{i}^{k}|H_{i}^{k}*, i = 1,...,N) P(H_{i}^{k}*, i = 1,...,N) (8)
one obtains from eq. 7 and eq. 8
$\begin{array}{l}P({H}_{0}^{k}\mid {H}_{1}^{k},{H}_{i}^{k}*,i=2,\dots ,N)=\hfill \\ \frac{P(H)\,P({H}_{1}^{k})\cdot {\displaystyle \prod _{i=2}^{N}P({H}_{i}^{k}\mid {H}_{i}^{k}*)P({H}_{i}^{k}*)}}{P(H)\,P({H}_{1}^{k})\cdot {\displaystyle \prod _{i=2}^{N}P({H}_{i}^{k}\mid {H}_{i}^{k}*)P({H}_{i}^{k}*)}+(1-P(H))\left(1-P({H}_{1}^{k})\cdot {\displaystyle \prod _{i=2}^{N}P({H}_{i}^{k}\mid {H}_{i}^{k}*)P({H}_{i}^{k}*)}\right)}\hfill \end{array}\qquad (9)$
For the conditional probability P(H_{0}^{k}|H_{i}^{k}*) that a hydrogen bond also exists in solution when it exists in the crystal structure, a plausible value of 0.9 has been assumed in this paper. More accurate values for P(H_{0}^{k}|H_{i}^{k}*) could be obtained by a statistical analysis of the existing structural database. The a priori probability P(H) that a hydrogen bond between a given pair of atoms exists is rather small; a plausible value would be 1/Q, with Q the number of residues of the protein under consideration.
If P(${H}_{0}^{k}\mid {H}_{1}^{k},{H}_{i}^{k}*$, i = 2,..., N) exceeds a given user-defined threshold, for example 0.75, the corresponding hydrogen bond restraint is accepted and transformed into appropriate distance restraints, as is usually done in MD calculations.
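The geometric hydrogen bond criterion and the bundle frequencies used as P(H_{i}^{k*}) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, coordinates are taken to be in nm, the 0.24 nm limit is interpreted here as the H···O distance, and the angle is measured at the hydrogen atom between N and O.

```python
import math

MAX_HO_DISTANCE_NM = 0.24   # distance limit quoted in the text
MIN_NHO_ANGLE_DEG = 145.0   # 180 degrees minus the 35 degree tolerance


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))


def is_hydrogen_bond(n, h, o):
    """Geometric test for one N-H...O triple (coordinates in nm).

    ASSUMPTION: the distance criterion applies to the H...O distance.
    """
    h_to_n = _sub(n, h)
    h_to_o = _sub(o, h)
    d_hn = _norm(h_to_n)
    d_ho = _norm(h_to_o)
    if d_hn == 0.0 or d_ho == 0.0 or d_ho > MAX_HO_DISTANCE_NM:
        return False
    cos_angle = sum(a * b for a, b in zip(h_to_n, h_to_o)) / (d_hn * d_ho)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) >= MIN_NHO_ANGLE_DEG


def hydrogen_bond_frequency(bundle):
    """Fraction of structures in a bundle in which the bond is present.

    bundle: list of (n, h, o) coordinate triples, one per structure.
    """
    hits = sum(1 for n, h, o in bundle if is_hydrogen_bond(n, h, o))
    return hits / len(bundle)
```

The returned frequency plays the role of X_{i}^{k*} for one donor-acceptor pair k in one bundle S_{i}.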
Filtering of angle restraints
When dihedral angles are combined and averaged it is possible that the calculated average values are located in disallowed regions of the Ramachandran plot. A filter is implemented that allows the user to disregard backbone and side chain dihedral angles as a function of their presence in unfavorable regions of the Ramachandran plot.
NMR spectroscopy and structures
The sequential assignments of the NMR signals of Byr2 and the experimental parameters have been described in [37]. A 2D ^{1}H NOESY spectrum obtained with a mixing time of 100 ms was used for structure validation. The following input data were selected: the NMR structure of the free Ras-binding domain of Byr2 (Byr2RBD) from Schizosaccharomyces pombe (residues 71–165, here referred to as residues 1–95) [16] [PDB ID: 1I35], the crystal structure of Byr2RBD in complex with Ras [17] [PDB ID: 1K8R], and the NMR structure [20] [PDB ID: 1Q10] and the crystal structure [21] [PDB ID: 1PGX] of the immunoglobulin binding domain of protein G from Streptococcus, species Lancefield group G.
Programs and structure validation
NMR data evaluation was performed with the program AUREMOL (V 2.2.1). Expectation values and standard deviations of cyclic quantities were calculated according to Döker et al. [38]. Sequence alignment was performed with a module for pairwise sequence alignment based on the Needleman-Wunsch algorithm and the BLOSUM62 matrix that we recently included in the AUREMOL module PERMOL [23, 24]. The resulting refined solution structures were validated against the experimental NMR data by the calculation of NMR R-factors [27]. PROCHECK-NMR was employed for investigating the stereochemical quality [39], and rmsd values were calculated using MOLMOL [40].
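Why cyclic quantities such as dihedral angles need special treatment can be seen in a small Python sketch. It uses the standard estimators of directional statistics (mean direction of the unit vectors and the circular standard deviation derived from the mean resultant length), which is an assumption of this sketch and not necessarily identical to the procedure of Döker et al. [38]; the function name is hypothetical.

```python
import math


def circular_mean_and_sd(angles_deg):
    """Mean and standard deviation of angles given in degrees.

    ASSUMPTION: textbook circular statistics, used here for illustration.
    The mean is returned in the range -180..180 degrees.
    """
    n = len(angles_deg)
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(mean_sin, mean_cos))
    # Mean resultant length: 1 for identical angles, near 0 for a uniform spread.
    r = min(math.hypot(mean_sin, mean_cos), 1.0)
    if r == 0.0:
        return mean, math.inf  # direction undefined, spread unbounded
    sd = math.degrees(math.sqrt(-2.0 * math.log(r)))
    return mean, sd
```

For the two angles 170° and −170° the arithmetic mean would be 0°, which points in the opposite direction, whereas the circular mean lies at ±180° as expected.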
Molecular dynamics calculations
Structure calculations were performed using the torsion angle molecular dynamics program DYANA v1.5 [32]. Details of the standard simulated annealing protocol used are given in the corresponding publication. From the resulting structures, the best in terms of the DYANA target function were selected for refinement in explicit solvent [25, 26].
Implementation
ISIC is written in ANSI C and is fully incorporated in the software package AUREMOL (http://www.auremol.de).
Abbreviations
NMR: nuclear magnetic resonance
rmsd: root mean square deviation
RBD: Ras binding domain
References
1. Annila A, Aito H, Thulin E, Drakenberg T: Recognition of protein folds via dipolar couplings. J Biomol NMR 1999, 14: 223–230. 10.1023/A:1008330519680
2. Bowers PM, Strauss CEM, Baker D: De novo protein structure determination using sparse NMR data. J Biomol NMR 2000, 18: 311–318. 10.1023/A:1026744431105
3. Simons KT, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions. J Mol Biol 1997, 268: 209–225. 10.1006/jmbi.1997.0959
4. Simons KT, Ruczinski I, Kooperberg C, Fox BA, Bystroff C, Baker D: Improved Recognition of Native-Like protein Structures Using a Combination of Sequence-Dependent and Sequence-Independent Features of Proteins. Proteins 1999, 34: 82–95. 10.1002/(SICI)1097-0134(19990101)34:1<82::AID-PROT7>3.0.CO;2-A
5. Delaglio F, Kontaxis G, Bax A: Protein Structure Determination Using Molecular Fragment Replacement and NMR Dipolar Couplings. J Am Chem Soc 2000, 122: 2142–2143. 10.1021/ja993603n
6. Andrec M, Harano Y, Jacobson MP, Friesner RA, Levy RM: Complete protein structure determination using backbone residual dipolar couplings and sidechain rotamer prediction. J Struct Funct Genomics 2002, 2: 103–111. 10.1023/A:1020435630054
7. Haliloglu T, Kolinski A, Skolnick J: Use of Residual Dipolar Couplings as Restraints in Ab Initio Protein Structure Prediction. Biopolymers 2003, 70: 548–562. 10.1002/bip.10511
8. Albrecht M, Hanisch D, Zimmer R, Lengauer T: Improving fold recognition of protein threading by experimental distance constraints. In Silico Biology 2002, 2: 1–12.
9. Li W, Zhang Y, Kihara D, Huang YJ, Zheng D, Montelione G, Kolinski A, Skolnick J: TOUCHSTONEX: Protein Structure Prediction With Sparse NMR Data. Proteins 2003, 53: 290–306. 10.1002/prot.10499
10. Shaanan B, Gronenborn AM, Cohen GH, Gilliland GL, Veerapandian B, Davies DR, Clore GM: Combining Experimental Information from Crystal and Solution Studies: Joint X-ray and NMR refinement. Science 1992, 257: 961–964.
11. Schiffer CA, Huber R, Wüthrich K, Gunsteren WF: Simultaneous Refinement of the Structure of BPTI Against NMR Data Measured in Solution and X-ray Diffraction Data Measured in Single Crystals. J Mol Biol 1994, 241: 588–599. 10.1006/jmbi.1994.1533
12. Hoffman DW, Cameron CS, Davies C, White SW, Ramakrishnan V: Ribosomal Protein L9: A Structure Determination by the Combined Use of X-ray Crystallography and NMR Spectroscopy. J Mol Biol 1996, 264: 1058–1071. 10.1006/jmbi.1996.0696
13. Miller M, Lubkowski J, Rao KKM, Danishefsky AT, Omichinski JG, Sakaguchi K, Sakamoto H, Apella E, Gronenborn AM, Clore GM: The Oligomerization Domain of p53: Crystal Structure of the Trigonal Form. FEBS Lett 1996, 399: 166–170. 10.1016/S0014-5793(96)01231-8
14. Raves ML, Doreleijers JF, Vis H, Vorgias CE, Wilson KS, Kaptein R: Joint refinement as a tool for thorough comparison between NMR and X-ray data and structures of HU protein. J Biomol NMR 2001, 21: 235–248. 10.1023/A:1012927325963
15. Chao J, Williamson JR: Joint X-Ray and NMR Refinement of the Yeast L30e-mRNA Complex. Structure 2004, 12: 1165–1176. 10.1016/j.str.2004.04.023
16. Gronwald W, Huber F, Grünewald P, Spörner M, Wohlgemuth S, Herrmann C, Kalbitzer HR: Solution Structure of the Ras binding Domain of the Protein Kinase Byr2 from Schizosaccharomyces pombe. Structure 2001, 9: 1029–1041. 10.1016/S0969-2126(01)00671-2
17. Scheffzek K, Grünewald P, Wohlgemuth S, Kabsch W, Tu H, Wigler M, Wittinghofer A, Herrmann C: The Ras-Byr2RBD Complex: Structural Basis for Ras Effector Recognition in Yeast. Structure 2001, 9: 1043–1050. 10.1016/S0969-2126(01)00674-8
18. Geyer M, Herrmann C, Wohlgemuth S, Wittinghofer A, Kalbitzer HR: Structure of the Ras-binding domain of RalGEF and implications for Ras binding and signalling. Nat Struc Biol 1997, 4: 694–699. 10.1038/nsb0997-694
19. Vetter IR, Linnemann T, Wohlgemuth S, Geyer M, Kalbitzer HR, Herrmann C, Wittinghofer A: Structural and Biochemical Analysis of Ras-Effector signaling via RalGDS. FEBS Lett 1999, 451: 175–180. 10.1016/S0014-5793(99)00555-4
20. Byeon IL, Louis JM, Gronenborn AM: A protein Contortionist: Core mutations of GB1 that Induce Dimerization and Domain Swapping. J Mol Biol 2003, 333: 141–152. 10.1016/S0022-2836(03)00928-8
21. Achari A, Hale SP, Howard AJ, Clore GM, Gronenborn AM, Hardman KD, Whitlow M: 1.67-Å X-ray Structure of the B2 Immunoglobulin-Binding Domain of Streptococcal Protein G and Comparison to the NMR Structure of the B1 Domain. Biochemistry 1992, 31: 10449–10457. 10.1021/bi00158a006
22. Kirkpatrick S, Gelatt CD, Vecchi MP: Optimization by Simulated Annealing. Science 1983, 220: 671–680.
23. Möglich A, Weinfurtner D, Maurer T, Gronwald W, Kalbitzer HR: A Restraint Molecular Dynamics and Simulated Annealing Approach for Protein Homology Modeling Utilizing Mean Angles. BMC Bioinformatics 2005, 6: 91. 10.1186/1471-2105-6-91
24. Möglich A, Weinfurtner D, Gronwald W, Maurer T, Kalbitzer HR: PERMOL: Restraint-Based Protein Homology Modeling Using DYANA or CNS. Bioinformatics 2005, 21: 2110–2111. 10.1093/bioinformatics/bti276
25. Nabuurs SB, Nederveen AJ, Vranken W, Doreleijers JF, Bonvin AMJJ, Vuister GW, Vriend G, Spronk CAEM: DRESS: a Database of REfined Solution NMR Structures. Proteins 2004, 55: 483–486. 10.1002/prot.20118
26. Linge JP, Williams MA, Spronk CAEM, Bonvin AMJJ, Nilges M: Refinement of protein structures in explicit solvent. Proteins 2003, 50: 496–506. 10.1002/prot.10299
27. Gronwald W, Kirchhofer R, Gorler A, Kremer W, Ganslmeier B, Neidig KP, Kalbitzer HR: RFAC, a program for automated NMR R-factor estimation. J Biomol NMR 2000, 17: 137–151. 10.1023/A:1008360715569
28. Rieping W, Habeck M, Nilges M: Inferential Structure Determination. Science 2005, 309: 303–306. 10.1126/science.1110428
29. Gronwald W, Kalbitzer HR: Automated structure determination of proteins by NMR spectroscopy. Prog NMR Spectrosc 2004, 44: 33–96. 10.1016/j.pnmrs.2003.12.002
30. Holton J, Alber T: Automated Protein Crystal Structure Determination using ELVES. Proc Natl Acad Sci USA 2004, 101: 1537–1542. 10.1073/pnas.0306241101
31. Cruickshank DWJ: Remarks About Protein Structure Precision. Acta Cryst D 1999, 55: 583–601. 10.1107/S0907444998012645
32. Güntert P, Mumenthaler C, Wüthrich K: Torsion Angle Dynamics for NMR Structure Calculation with the New Program DYANA. J Mol Biol 1997, 273: 283–298. 10.1006/jmbi.1997.1284
33. Schwieters CD, Kuszewski J, Tjandra NL, Clore GM: The Xplor-NIH NMR molecular structure determination package. J Magn Reson 2003, 160: 65–73. 10.1016/S1090-7807(02)00014-9
34. Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL: Crystallography & NMR System: A New Software Suite for Macromolecular Structure Determination. Acta Cryst 1998, D54: 905–921.
35. Sachs L: Angewandte Statistik. Berlin: Springer Verlag; 1997.
36. Freund J University of Heidelberg; 1994.
37. Huber F, Gronwald W, Wohlgemuth S, Herrmann C, Geyer M, Wittinghofer A, Kalbitzer HR: Letter to the Editor: Sequential NMR Assignment of the Ras-Binding Domain of Byr2. J Biomol NMR 2000, 16: 355–356. 10.1023/A:1008335420475
38. Döker R, Maurer T, Kremer W, Neidig KP, Kalbitzer HR: Determination of Mean and Standard Deviation of Dihedral Angles. BBRC 1999, 257: 348–350.
39. Laskowski RA, Rullmann JAC, MacArthur MW, Kaptein R, Thornton JM: AQUA and PROCHECK-NMR Programs for checking the quality of protein structures solved by NMR. J Biomol NMR 1996, 8: 477–486. 10.1007/BF00228148
40. Koradi R, Billeter M, Wüthrich K: MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graphics 1996, 14: 51–55. 10.1016/0263-7855(96)00009-4
Acknowledgements
Financial support by the European Commission (SPINE), the Fonds der Chemischen Industrie and the Deutsche Forschungsgemeinschaft is gratefully acknowledged.
Additional information
Authors' contributions
HRK, WG and KB conceived the project. KB and to a smaller part JMT wrote the ISIC software. KB, JMT and KPN implemented ISIC within the larger AUREMOL software package. KB calculated the improved structures and drafted the manuscript. WG and HRK coordinated the study and wrote the manuscript together with KB. All authors read and approved the final manuscript.
Cite this article
Brunner, K., Gronwald, W., Trenner, J.M. et al. A general method for the unbiased improvement of solution NMR structures by the use of related X-Ray data, the AUREMOL-ISIC algorithm. BMC Struct Biol 6, 14 (2006). https://doi.org/10.1186/1472-6807-6-14
Keywords
 Structural Bundle
 Residual Dipolar Coupling
 Distance Restraint
 Simulated Annealing Protocol
 Dihedral Angle Restraint