A general method for the unbiased improvement of solution NMR structures by the use of related X-ray data, the AUREMOL-ISIC algorithm
 Konrad Brunner^{1},
 Wolfram Gronwald^{1},
 Jochen M Trenner^{1},
 Klaus-Peter Neidig^{2} and
 Hans Robert Kalbitzer^{1}
DOI: 10.1186/1472-6807-6-14
© Brunner et al; licensee BioMed Central Ltd. 2006
Received: 11 March 2006
Accepted: 26 June 2006
Published: 26 June 2006
Abstract
Background
Rapid and accurate three-dimensional structure determination of biological macromolecules is mandatory to keep up with the vast progress made in the identification of primary sequence information. During the last few years the amount of data deposited in the Protein Data Bank has substantially increased, providing additional information for novel structure determination projects. The key question is how to combine the available database information with the experimental data of the current project, ensuring that only relevant information is used and a correct structural bias is produced. For this purpose a novel, fully automated algorithm based on Bayesian reasoning has been developed. It allows the combination of structural information from different sources in a consistent way to obtain high-quality structures with a limited set of experimental data. The new ISIC (Intelligent Structural Information Combination) algorithm is part of the larger AUREMOL software package.
Results
Our new approach was successfully tested on the improvement of the solution NMR structures of the Ras-binding domain of Byr2 from Schizosaccharomyces pombe, the Ras-binding domain of human RalGDS calculated from a limited set of NMR data, and the immunoglobulin binding domain of protein G from Streptococcus by their corresponding X-ray structures. In all test cases clearly improved structures were obtained. The largest danger in using data from other sources is a possible bias towards the added structure. In the worst case, instead of a refined target structure, the structure from the additional source is essentially reproduced. We could clearly show that the ISIC algorithm treats these difficulties properly.
Conclusion
In summary, we present a novel, fully automated method to combine strongly coupled knowledge from different sources. The combination with validation tools such as the calculation of NMR R-factors strengthens the impact of the method considerably, since the improvement of the structures can be assessed quantitatively. The ISIC method can be applied to a large number of similar problems where the quality of the obtained three-dimensional structures is limited by the available experimental data, such as the improvement of large NMR structures calculated from sparse experimental data or the refinement of low-resolution X-ray structures. Structures may also be refined using other available structural information such as homology models.
Background
In any structure determination process of a biological macromolecule the general goal is to obtain from the available data a structure that is as accurate as possible. For all high-throughput procedures as used in structural genomics projects the structure determination process has to be as fast as possible, demanding that only a minimal set of experimental data is recorded. One way to speed up the NMR structure determination process is to reduce the required number of experimental restraints and/or to use only restraints that are relatively easy to obtain, e.g. backbone dihedral angles, chemical shifts, residual dipolar couplings, hydrogen bonds, or H^{N}-H^{N} NOEs. When the amount of available experimental data is limited, the use of additional information such as structural data from homologous proteins is advisable. Most fast methods previously described in the literature are mainly aimed at determining the global fold of a protein [1–9]. Another set of methods directly uses information from different sources, namely NMR and X-ray, for joint structure refinement. It is common to these approaches that discrepancies between NMR and X-ray data are corrected manually, for example by removing violated NOEs, reassigning NOEs or hydrogen bonds, and taking spin-diffusion effects on NMR restraints into account [10–15].
From the conceptual point of view in any structural prediction or calculation from a set of mixed data one has to decide beforehand what kind of structure is the target of the procedure since there is nothing like "the structure". This question is inherently answered in purely experimental structure determination since solution NMR spectroscopy determines the structure in solution and a crystal structure in the crystal. More importantly, the selected experimental conditions such as the buffer and the absence or presence of ligands select the target structural set.
Here, we present a novel, general and fully automated approach called ISIC (Intelligent Structural Information Combination) for the combination of structural information from different sources. It allows the predefinition and selection of the target structural set and properly treats discrepancies inherent in the input structural data, thereby ensuring that the additional input data are properly biased toward the target structural set. Using the combined information, high-resolution structures are calculated and results are automatically verified on experimental data. One possible application of the ISIC algorithm for rapid structure determination would be the use of experimental solution NMR data that are relatively easy to obtain, such as backbone dihedral angles, chemical shifts, residual dipolar couplings, hydrogen bonds, or H^{N}-H^{N} NOEs, which alone allow the calculation of a low to medium resolution NMR structure, supplemented with, for example, data from homology modeling or from a homologous X-ray structure.
In this paper, ISIC was tested for three applications that may occur in "real life": firstly, the refinement of a solution structure of a protein with an X-ray structure of the same protein determined under slightly different conditions (proper choice); secondly, the refinement of a structure calculated from a limited set of NMR data with an X-ray structure of the same protein, also determined under slightly different conditions; and lastly, the refinement of a known NMR structure with a known X-ray structure of the same protein that is largely different (wrong choice). For the first case we selected the Ras-binding domain of Byr2 (Byr2-RBD) from Schizosaccharomyces pombe (residues 71–165, referred to here as residues 1–95), for which both a solution structure of the free protein [16] and a crystal structure of Byr2-RBD in complex with Ras [17] are available. Both structures are of medium quality of about 3 Å resolution (X-ray) or equivalent resolution (NMR), making the protein an ideal target for structure refinement. In addition, it is expected that the two structures are not identical since complex formation with Ras leads to small but significant conformational changes in the structure of Byr2. The aim of the second test was to refine a structure that was obtained using only readily available NMR data. For this case the Ras-binding domain of human RalGDS (RalGDS-RBD) was used. The solution structure (residues 1–97, corresponding to residues 788–884 of the full-length protein, Swiss-Prot accession code: Q12967) has been published previously [18]. For the current tests the low-resolution structure of a shorter construct (amino acids 11 to 97) was obtained by using only relatively easily available NMR data such as hydrogen bonds, dihedral angles, and backbone NOEs. In addition, a medium-quality (3.4 Å resolution) X-ray structure of RalGDS in complex with Ras is available [19].
As in the first test case, small but significant conformational changes between RalGDS in its free solution form and its crystal form in complex with Ras are expected. As a third example we used the NMR structure [20] [PDB ID: 1Q10] and the crystal structure [21] [PDB ID: 1PGX] of the immunoglobulin binding domain of protein G from Streptococcus, species Lancefield group G. In this case large global structural differences were observed, since in solution dimerization introduced by core mutations induces a domain swapping of a β-pleated sheet.
Results
Theoretical considerations
General considerations
In the improvement of structures by including information from other sources two main cases have to be distinguished. In the first case the additional information describes the same set of structures (e.g. a solution structure of a protein at a given pH, temperature and sample composition). Here the proper weighting of the additional information is the main point when the "true" structure is to be optimally approximated. In the second case the additional information is taken from structures that are supposed to be similar but are nevertheless different (e.g. a solution structure and a crystal structure of a different complex). Here an additional difficulty arises, since one has to estimate how well the additional structure applies to the structure in question; otherwise a properly biased solution will not be obtained. The problem can be formulated as the aim to obtain the most probable structure or the most probable set of structures S_{0} with a conditional probability P(S_{0} | A, I_{i}, i = 1,...,N) higher than a threshold value P_{t}. The combination of information from N different sources I_{i} is a problem often encountered in structural biology. When S_{0} is a set of purely NMR-derived protein structures, A would be the general knowledge about the system, that is, the physical model including the covalent structure and the interaction potentials as they enter a typical molecular dynamics calculation. The NMR-derived information I_{1} is usually expressed as a set of experimental restraints R_{1} = {R_{1}^{1},...., R_{1}^{M}} containing M restraints that essentially reduce the accessible conformational space of the probable solutions. The experimental restraints are rather inhomogeneous since they include information such as distance restraints from NOESY spectra, dihedral angle information from J-couplings or chemical shifts, as well as intramolecular orientational restraints from residual dipolar couplings.
An elegant semiquantitative way to find the most probable structures S_{i} is the simulated annealing protocol [22], where the information A is an intrinsic part of the molecular dynamics routines used.
In case two the situation becomes much more complex since structural information that does not correspond exactly to the conditions used in the actual experiment is added from other sources. When this information is again expressed in the form of sets of restraints R_{i}, structures S_{0}^{p} (p = 1,...,L_{0}, with L_{0} being the total number of structures in set S_{0}) have to be found with high probabilities P(S_{0}^{p} | A, R_{i}, i = 1,...,N). When a restrained simulated annealing approach is used, the physical model is again an implicit feature, that is, P(S_{0}^{p} | A, R_{i}, i = 1,...,N) can be replaced by P(S_{0}^{p} | R_{i}, i = 1,...,N). With the exception of the restraint set R_{1} corresponding to the leading set of structures S_{1}, the primary restraints R_{i}* (i = 2,...,N) that are derived from the other sources in general do not directly apply to the conditions of the leading set of structures. This can occur, for example, because of different experimental conditions. As a consequence, new restraints R_{i} have to be calculated that directly apply to the true set of structures S_{0}. This means that for R_{1} one can define R_{1} = R_{1}*, but for the other restraint sets R_{i}* we have to determine to what extent their individual restraints apply to the true structures S_{0}, as explained below.
P(S_{0} | R_{i}, i = 1,...,N) = P(S_{i} | R_{1}* = R_{1}, R_{i}*, i = 2,...,N) (1)
In general, the complete description of the sets of restraints R_{i} has to be given as a multidimensional probability distribution p(R_{i}, i = 1,...,N). The different sets of restraints and the restraints themselves are coupled since they are derived from related structures and coupled by the physical model. The probability P and thus the probability distribution p of a set of restraints R_{i} in the leading structures can be calculated from the known R_{i}* by
P(R_{i}) = P(R_{i} | R_{i}*, i = 1,...,N) P(R_{i}*, i = 1,...,N) (2)
Equation 2 shows that R_{i} depends again on a multidimensional probability distribution and a simplification of the problem is mandatory.
In the standard simulated annealing approach the individual restraints R_{i}^{k} are primarily assumed to be independent; their coupling is performed indirectly by the algorithm itself, which selects consistent solutions. As long as the same restraints R_{i}^{k} are considered (and the restraints in a given structure can be considered to be uncoupled) one can calculate the probability that a newly created restraint R_{0}^{k} that corresponds to the "true" solution structures S_{0} has a given value in the set S_{0}. The restraints R_{0}^{k} are used later on for calculating the set of true solution structures S_{0}.
P(R_{0}^{k}) = P(R_{0}^{k} | R_{i}^{k*}, i = 1,...,N) P(R_{i}^{k*}, i = 1,...,N) (3)
The indices i and k specify the data set used and the specific restraint, respectively. Here, it is assumed that to first order the individual restraints R_{0}^{k} and R_{0}^{l} are independent for k≠l. For the calculation of P(R_{0}^{k}) it would be useful to have information about the same restraints in the structures derived from the different data sets. Below it will be shown how a reasonable estimate can be obtained by using an MD-sampling procedure.
Equation 3 can be used in two different ways. When a good estimate of the conditional probability is known, it can be applied directly. If this is not the case, one can test the hypothesis that P(R_{0}^{k} | R_{i}^{k}*) is close to 1 for a data set i. Since we assume that the experimental data set 1 represents the "true" ensemble, one can test whether a restraint R_{i}^{k} is part of the same ensemble as R_{1}^{k} and simply discard all restraints R_{i}^{k} that do not fulfill this condition. P(R_{i}^{k*}, i = 1,...,N) in eq. 3 describes the probability that a substitute restraint R_{i}^{k*} has a given value in the set of structures S_{i}, and clearly this probability depends on factors such as the corresponding second moments σ of the restraints in the set of structures S_{i}.
Main features of the algorithm
One important concept is that the available structural information from different sources is first converted into a dense network of derived substitute restraints R_{i}^{k*} that can be compared directly (eq. 3). They are calculated from a structural bundle and are coded as main chain and side chain dihedral angle restraints, as well as distance restraints between selected sets of atoms. The expectation values and standard deviations s of the sample are calculated directly from the given structural bundle by the PERMOL algorithm [23, 24]. In case the leading structural set S_{1} consists of a set of NMR structures, such a bundle is already available. When no structural bundle is available, it first has to be created in a well-defined manner (see below). The restraints R_{1}^{k}* = R_{1}^{k} (k = 1,..., M) are then combined with the sets of restraints R_{i}^{k}* (i = 2,...,N; k = 1,...,M_{i}, M_{i} ≤ M) to obtain a final set of restraints R_{0}^{k} (k = 1,..., M), and a new bundle of structures S_{0} is calculated. The quality of the new structural bundle can be validated against the original experimental data, a step that increases the confidence in the result and can be used to assess the improvement of the structures but is not required by the algorithm.
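The conversion of a structural bundle into substitute distance restraints can be illustrated with a minimal Python sketch. It is not the PERMOL implementation: the function name `substitute_distance_restraints`, the dictionary-based bundle layout, and the filtering by mean distance are assumptions made for illustration only. The default distance window of 0.18 nm to 1.00 nm is taken from the parameter tables of this paper.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def substitute_distance_restraints(bundle, d_min=0.18, d_max=1.00):
    """Return {(atom_a, atom_b): (mean, sample_std, n)} for a bundle.

    bundle: list of models (at least two), each a dict mapping a
            hypothetical atom label to (x, y, z) coordinates in nm.
    Only atom pairs whose mean distance lies in [d_min, d_max] are kept
    (an assumed reading of the "distance range" parameter).
    """
    # Use only atoms that are present in every model of the bundle.
    atoms = sorted(set.intersection(*(set(model) for model in bundle)))
    restraints = {}
    for a, b in combinations(atoms, 2):
        values = [math.dist(model[a], model[b]) for model in bundle]
        avg = mean(values)
        if d_min <= avg <= d_max:
            # Expectation value and sample standard deviation s.
            restraints[(a, b)] = (avg, stdev(values), len(values))
    return restraints
```

Each entry holds the expectation value, the sample standard deviation and the bundle size, which are the quantities needed later for the pairwise comparison of restraint sets.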
Structure improvement of the Rasbinding domain of Byr2
As a first example, the AUREMOL-ISIC algorithm was tested on the structure improvement of the Ras-binding domain of Byr2 for which both a set of 10 solution NMR structures [16] and a single X-ray structure of Byr2 in complex with Ras [17] are available. The X-ray structure was used as source structure to improve the NMR structure S_{1}.
Permol parameters used for the generation of distance and angle restraints from the X-ray structure (S_{2}), which are then used in the MD calculation to create the X-ray bundle (S_{2}^{x}). Distances were calculated between all of the atoms used.

| Restraint generation parameters from the X-ray structure (S_{2}) | |
|---|---|
| Confidence level | 99.00% |
| Distances | |
| Distance range | 0.18 nm – 1.00 nm |
| Used atoms | N, C, C_{α}, C_{β}, C_{γ}, C_{δ}, C_{ε}, C_{ζ}, O |
| Number | 5248 |
| Angles | |
| Selected angles | ψ, φ, χ_{1}, χ_{2}, χ_{21}, χ_{22}, χ_{3}, χ_{31}, χ_{32}, χ_{4}, χ_{5}, χ_{6} |
| Number | 321 |
Permol parameters used for the generation of distance, angle and hydrogen bond restraints from the NMR bundle (S_{1}) and the X-ray bundle (S_{2}^{x}), which are then used for the combination.

| Restraint generation parameters from the NMR bundle (S_{1}) and the X-ray bundle (S_{2}*) (R_{2}) | |
|---|---|
| Confidence level | 99.90% |
| Selected residues NMR | 1–95 |
| Selected residues X-ray | 1–56, 70–95 |
| Distance range bb | 0.18 nm – 1.00 nm |
| Used atoms bb | N, C |
| Distance range sc | 0.18 nm – 0.60 nm |
| Used atoms sc | H_{N}, H_{α}, H_{α2}, H_{α3}, H_{β}, H_{β1}, H_{β2}, H_{β3}, H_{γ}, H_{γ2}, H_{γ3}, H_{γ1}, H_{δ}, H_{δ1}, H_{δ2}, H_{δ3}, H_{ε}, H_{ε2}, H_{ε3}, H_{ε1} |
| Number NMR | 6642 |
| Number X-ray | 5600 |
| Angles | |
| Selected angles | ψ, φ, χ_{1}, χ_{2}, χ_{21}, χ_{22}, χ_{3}, χ_{31}, χ_{32}, χ_{4}, χ_{5}, χ_{6} |
| Number NMR | 453 |
| Number X-ray | 396 |
| Hydrogen bonds | |
| Donors | H_{N}, H_{γ}, H_{η11}, H_{η12}, H_{η22}, H_{ζ1}, H_{ζ2}, H_{ζ3}, H_{γ1} |
| Acceptors | O, O_{δ1}, O_{δ2}, O_{ε2}, N, N_{η1}, N_{η2}, N_{δ2} |
| Number NMR | 106 |
| Number X-ray | 53 |
Restraint combination parameters and obtained numbers of restraints.

| Combination parameters | |
|---|---|
| Angle filter | Favored regions, GLY, PRO, CHI1CHI2: < level 2 |
| H-bond threshold | 0.75% |
| H-bond exchange | 0.90% |
| Significance level | 0.2% |
| Number of obtained restraints | |
| Distance | 6642 |
| Angles | 338 |
| H-bonds | 26 |
Quality values from AUREMOL and Procheck.

| | S_{1} (NMR) | S_{2} (X-ray) | S_{0} | S_{0_WR} |
|---|---|---|---|---|
| AUREMOL R-factor (whole) | 0.534 | - | 0.455 | 0.451 |
| RMSD MolMol N [nm] to mean | 0.144 | 0.067 | 0.026 | 0.033 |
| Ramachandran m.f. + a. [%] | 87.3 | 88.5 | 94.3 | 90.8 |
| Most favored [%] | 67.8 | 70.1 | 71.3 | 78.2 |
| Additional allowed [%] | 19.5 | 18.4 | 23.0 | 12.6 |
| Generously allowed [%] | 11.5 | 8.0 | 4.6 | 8.0 |
| Disallowed [%] | 1.1 | 3.4 | 1.1 | 1.1 |
Structure improvement of the Ras-binding domain of RalGDS-RBD
As a second test case the Ras-binding domain of RalGDS was chosen, using a set of low-resolution solution NMR structures as input together with a single X-ray structure of RalGDS in complex with Ras [19]. As in the first test case, the X-ray structure was used to improve the NMR structure.
Permol parameters used for the generation of distance and angle restraints from the X-ray structure (S_{2}), which are then used in the MD calculation to create the X-ray bundle (S_{2}^{x}). Distances were calculated between all of the atoms used.

| Restraint generation parameters from the X-ray structure (S_{2}) equal to Table 1 | |
|---|---|
| Confidence level | 99.90% |
| Distances | |
| Distance range bb | 0.18 nm – 1.00 nm |
| Used atoms bb | N, C, O |
| Distance range sc | 0.18 nm – 1.00 nm |
| Used atoms sc | C_{β}, C_{γ}, C_{δ}, C_{ε}, C_{ζ} |
| Number | 2001 |
| Angles | |
| Selected angles | ψ, φ, χ_{1}, χ_{2}, χ_{21}, χ_{22}, χ_{3}, χ_{31}, χ_{32}, χ_{4}, χ_{5}, χ_{6} |
| Number | 263 |
Permol parameters used for the generation of distance, angle and hydrogen bond restraints from the NMR bundle (S_{1}) and the X-ray bundle (S_{2}^{x}), which are then used for the combination.

| Restraint generation parameters from the NMR bundle (S_{1}) and the X-ray bundle (S_{2}*) (R_{2}) | |
|---|---|
| Confidence level | 99.90% |
| Selected residues NMR | 11–97 |
| Selected residues X-ray | 12–49, 56–77, 90–96 |
| Distance range bb | 0.18 nm – 1.00 nm |
| Used atoms bb | N |
| Distance range sc | 0.5 nm – 1.5 nm |
| Used atoms sc | H_{δ2}, H_{δ21}, H_{δ22}, H_{δ3}, H_{ε}, H_{ε2}, H_{ε3}, H_{ε1} |
| Number NMR | 2344 |
| Number X-ray | 1784 |
| Angles | |
| Selected angles | ψ, φ, χ_{1}, χ_{2}, χ_{21}, χ_{22}, χ_{3}, χ_{31}, χ_{32}, χ_{4}, χ_{5}, χ_{6} |
| Number NMR | 417 |
| Number X-ray | 326 |
| Hydrogen bonds | |
| Donors | H_{N}, H_{γ}, H_{η11}, H_{η12}, H_{η22}, H_{ζ1}, H_{ζ2}, H_{ζ3}, H_{γ1} |
| Acceptors | O, O_{δ1}, O_{δ2}, O_{ε2}, N, N_{η1}, N_{η2}, N_{δ2} |
| Number NMR | 70 |
| Number X-ray | 13 |
Restraint combination parameters and obtained numbers of restraints.

| Combination parameters | |
|---|---|
| Angle filter | Favored regions, GLY, PRO, CHI1CHI2: < level 2 |
| H-bond threshold | 0.75% |
| H-bond exchange | 0.90% |
| Significance level | 0.2% |
| Number of obtained restraints | |
| Distance | 2344 |
| Angles | 285 |
| H-bonds | 27 |
Quality values from AUREMOL and Procheck.

| | S_{1} (NMR) | S_{2} (X-ray) | S_{0} |
|---|---|---|---|
| AUREMOL R-factor (whole) | 0.383 | - | 0.353 |
| RMSD MolMol N [nm] to mean | 0.21 | 0.13 | 0.07 |
| RMSD MolMol bb [nm] pairwise | 0.33 | 0.19 | 0.11 |
| Ramachandran m.f. + a. [%] | 91.3 | 74.4 | 88.8 |
| Most favored [%] | 72.8 | 36.7 | 72.8 |
| Additional allowed [%] | 18.5 | 38.0 | 16.0 |
| Generously allowed [%] | 6.2 | 16.5 | 7.4 |
| Disallowed [%] | 2.5 | 8.9 | 3.7 |
Structure improvement of the B2 Immunoglobulin-Binding Domain of Streptococcal protein G
Discussion and conclusion
Any determination of solution structures from experimental data is not (as sometimes automatically assumed) the direct calculation of the only existing solution but the search for a set of structures consistent with the experimental data and additional knowledge of the system (in this regard see also the paper by Rieping et al. [28]). The use of substitute restraints as introduced here with a simulated annealing protocol for restrained molecular dynamics is an efficient method to combine strongly coupled knowledge from different sources. A proper bias toward the selected target set of structures can be achieved by Bayesian reasoning, thus using the additional information only to increase the probability of finding the "true" ground state set of structures corresponding to the experimental conditions selected. The combination with validation tools such as the calculation of NMR R-factors strengthens the impact of the method considerably, since the improvement of the structures can be assessed quantitatively. This is clearly visible for the example of Byr2-RBD, where our improved structures also explain the experimental data better. Even the choice of largely inappropriate additional knowledge does not lead to distortion of the original structure, as shown for the immunoglobulin binding domain.
In the present paper the automated ISIC algorithm was used to improve a solution structure by related X-ray data. The qualities of both the originally submitted Byr2 NMR structures and the corresponding X-ray structure were limited, which makes them an excellent example for testing the ISIC algorithm. The same is true for the RalGDS-RBD test case, where both the set of low-resolution NMR structures of RalGDS, calculated only from easily available experimental data, and the corresponding X-ray data are of medium quality. This last test case in particular is a good example of how the inclusion of additional data can speed up the NMR structure determination process, for example in structural genomics efforts. However, ISIC can also be used for other applications such as the improvement of an NMR structure of a given protein by NMR structures of homologous proteins or pure homology models. The same would be true for the improvement of X-ray structures by NMR data when some parts of the electron density map are ill-defined.
Here, the X-ray R-factor would provide the validation tool. A similar application that one may encounter more often in the future is the calculation of NMR structures of very large proteins using only a limited set of experimental data. Other scenarios for the application of ISIC are conceivable. When no X-ray structure of the protein is available, homology models from related proteins may be used.
Methods
Details of the algorithm
Calculation of the network of substitute restraints
The calculation of a dense network of dihedral angle and distance restraints with the PERMOL algorithm from bundles of structures has been described earlier [23, 24] and is implemented in AUREMOL [29]. Here, the expectation values and standard deviations are calculated. Error ranges are approximated from the standard deviations on the basis of the t-test. In case the original set contains only one structure, the corresponding structural bundle has to be calculated first. In this regard we will discuss in the following only the most important case of crystal structures, which are usually represented as distinct single structures S_{i}^{p} (p = 1). However, the principle can be applied to other data.
Depending on the unit cell and the refinement method used, sometimes more than one structure is deposited in the database (p > 1). However, even then the statistical ensemble is too small. The solution to this problem is that, in analogy to the calculation of NMR structures, the inherent coordinate uncertainties can be used to calculate structural bundles, and from those a set of substitute restraints R_{i}* is obtained. Therefore, we first determine a set of restraints R_{i}^{x}* that represent the original X-ray structure(s) from interatomic distances and dihedral angles in the crystal structure(s) together with the corresponding coordinate uncertainties. Using these restraints a set of structures S_{i}^{x} is created, from which the set of substitute restraints R_{i}* is created using PERMOL. For generating the set R_{i}^{x}*, two factors that are usually published together with the structure can be used for a conservative estimate of the structural variations. First, in a first approximation the expected average error in atomic positions σ(r_{0}) is about 1/3 of the resolution R [30]. In a more involved analysis, σ(r_{m}) of the atoms m possessing low B-factors is often estimated from Luzzati plots. Second, the local B-factors can be used to introduce additional errors for specific atoms possessing significant B-values. Static and thermal disorder can effectively spread out the electron density of a given atom m, and this increases its B-factor. The B-factor is related to the rms error in the position of an atom by the equation:
$\sigma(r_{m})=\sqrt{\frac{B_{m}}{8\cdot\pi^{2}}}\qquad(4)$
B_{m} denotes the B-factor of a given atom m and σ(r_{m}) is the corresponding average error in atom positions.
Since for the calculations a conservative estimate of distance ranges is most useful, the square of the standard deviation σ^{2}(d_{m,n}) of the distance d_{m,n} between two atoms m and n (m ≠ n) is approximated by
σ^{2}(d_{m,n}) = σ(r_{m})^{2} + σ(r_{n})^{2} + 2σ(r_{0})^{2} (5)
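Equations 4 and 5, together with the rule of thumb that σ(r_{0}) is about one third of the resolution, can be evaluated with a few lines of Python. This is a minimal sketch; the function names are chosen here for illustration, and it is assumed that B-factors are given in Å^{2} and the resolution in Å, so that the returned variance is in Å^{2}.

```python
import math


def sigma_from_resolution(resolution):
    """First approximation of the average coordinate error: R / 3."""
    return resolution / 3.0


def sigma_from_bfactor(b_factor):
    """Eq. 4: rms positional error of an atom from its B-factor."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def distance_variance(b_m, b_n, resolution):
    """Eq. 5: conservative variance of the distance between atoms m and n."""
    s0 = sigma_from_resolution(resolution)
    return (sigma_from_bfactor(b_m) ** 2
            + sigma_from_bfactor(b_n) ** 2
            + 2.0 * s0 ** 2)
```

For example, a B-factor of 8π^{2} Å^{2} (about 79 Å^{2}) corresponds to σ(r_{m}) = 1 Å, and for two such atoms in a structure of 3 Å resolution eq. 5 gives σ^{2}(d_{m,n}) = 1 + 1 + 2 = 4 Å^{2}.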
For a more detailed description of the precision of protein structures see the article by Cruickshank [31]. When more than one structure of the same crystal is contained in the database, they can be considered as separate structural sets S_{i} and handled in an analogous way. As mentioned above, using this preliminary set of restraints R_{i}^{x}* a bundle of structures S_{i}^{x} is calculated by employing programs such as DYANA [32], XPLOR-NIH [33] or CNS [34]. From this bundle a set of restraints R_{i}* is calculated in the same way as for the restraint set R_{1} of the leading structure S_{1}.
Restraint combination
As derived above (eq. 2 and eq. 3), from the sets of restraints R_{1} (R_{1} = R_{1}*) and R_{i}* (i = 2,...,N) a new set R_{0} has to be calculated, which then enters the final structure calculation. Although the algorithm produces restraint sets R_{i}* that are matched to the leading set R_{1} for all data sets, in some cases no restraint R_{i}^{k*} matching a restraint R_{1}^{k} can be created for data set i. Such a case can occur when an atom or an amino acid of set R_{1} does not exist in the data used to generate set R_{i}*. In this case R_{0}^{k} is set to R_{1}^{k}. In all other cases the final restraint R_{0}^{k} has to be calculated according to eq. 3. Since P(R_{0}^{k} | R_{i}^{k}*, i > 1) is difficult to determine for distances and angles, we apply a pairwise null hypothesis test P(R_{1}^{k} | R_{i}^{k}*, i > 1) that the corresponding two restraints of the two data sets describe the same ensemble. If the hypothesis holds, a new probability distribution for the restraint is calculated; if not, the restraint R_{i}^{k}* is discarded and only R_{1}^{k} is used. If errors are also expected in the leading restraint set R_{1}, it is possible to discard the restraint R_{1}^{k} as well. However, this special option was not used in the current tests. When large structural bundles are created (as one of the possible options), the probability distributions can be obtained directly from the bundle. Since we have no a priori knowledge about the distribution type of the individual restraints, we can apply known statistical tests like the rank dispersion test according to Siegel and Tukey [35] or the comparison of two independent samples according to Kolmogoroff and Smirnoff [35]. In case the investigated restraints possess the same or nearly the same type of distribution, the so-called U test according to Wilcoxon, Mann and Whitney [35] can be applied. It is the distribution-free counterpart to the parametric Student t-test, which strictly can only be applied to normally distributed data.
On a variety of data sets we tested according to Kolmogoroff and Smirnoff, whether our data can be assumed to follow a normal distribution. As a result it was found that for all our test cases the data are normally distributed within a small degree of error. Therefore, for practical reasons it is sufficient to assume that the distribution can be approximated sufficiently well by a Gaussian distribution.
As a consequence we are allowed to check the null hypothesis by a pairwise two-sided t-test that compares the individual distance and angle restraints of all restraint sets R_{i}* (i > 1) with the corresponding restraints of set R_{1}*. The average distances <${d}_{i}^{k*}$> and dihedral angles <${a}_{i}^{k*}$> together with the corresponding standard deviations s(d_{i}^{k*}) and s(a_{i}^{k*}) have been calculated from the structural bundles, and the t-values t_{1}^{k} (i > 1) are now calculated for the distances and angles by:
$t_{1}^{k}=\frac{\left|\langle R_{1}^{k}\rangle-\langle R_{i}^{k*}\rangle\right|}{\sqrt{\frac{s^{2}(R_{1}^{k})}{L_{1}}+\frac{s^{2}(R_{i}^{k*})}{L_{i}}}}\qquad(6)$
After that the individual t-values ${t}_{1}^{k}$ are compared to the critical t-value t_{c}. The critical t-value at a given significance level and known degrees of freedom f (with f = L_{1}  L_{ i } 1) can be calculated or looked up in a t-value table.
In case the calculated t-value t_{1}^{k} is greater than the critical t-value t_{c}, the null hypothesis has to be rejected and the restraint R_{i}^{k*} is not used. Restraints with t_{1}^{k} ≤ t_{c} are retained and the weighted average value <R_{0}^{k}> of the restraint R_{0}^{k} is calculated together with the corresponding weighted total standard deviation σ(R_{0}^{k}).
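The pairwise test of eq. 6 and the subsequent accept-or-discard decision can be sketched in Python as follows. The critical value `t_crit` is passed in by the caller, since it is looked up for the chosen significance level. The text does not specify the weighting used for <R_{0}^{k}> and σ(R_{0}^{k}); the sample-size weighted mean and the pooled standard deviation used below are assumptions made only for this illustration, as are the function names.

```python
import math


def t_value(mean_1, s_1, n_1, mean_i, s_i, n_i):
    """Eq. 6: t-value comparing a leading restraint with a substitute one."""
    denom = math.sqrt(s_1 ** 2 / n_1 + s_i ** 2 / n_i)
    diff = abs(mean_1 - mean_i)
    if denom == 0.0:
        # Both bundles show no spread: identical means agree, others do not.
        return 0.0 if diff == 0.0 else math.inf
    return diff / denom


def combine_restraint(lead, other, t_crit):
    """Combine one restraint pair; each argument is (mean, std, n).

    Returns (mean_0, std_0, used_other). If the null hypothesis is
    rejected, the leading restraint is returned unchanged.
    """
    m1, s1, n1 = lead
    mi, si, ni = other
    if t_value(m1, s1, n1, mi, si, ni) > t_crit:
        return m1, s1, False
    # ASSUMPTION: weights proportional to bundle sizes, pooled variance.
    mean_0 = (n1 * m1 + ni * mi) / (n1 + ni)
    var_0 = ((n1 - 1) * s1 ** 2 + (ni - 1) * si ** 2) / (n1 + ni - 2)
    return mean_0, math.sqrt(var_0), True
```

With two bundles of 10 structures each, distances of 0.50 ± 0.05 nm and 0.52 ± 0.05 nm give a t-value of about 0.9 and would be merged for any common critical value, whereas 0.50 ± 0.05 nm and 0.70 ± 0.05 nm give a t-value of about 8.9 and the substitute restraint would be discarded.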
Hydrogen bond restraints
In addition to combined dihedral angle and distance restraints the ISIC algorithm also uses backbone hydrogen bond restraints R_{i}^{k}. For the sake of clarity they will be denoted as H_{i}^{k} in the following. In principle hydrogen bonds could be handled in a similar way as described above for distance restraints by using the distributions of hydrogen bonding energies as parameters, where the hydrogen bond energies are calculated according to Freund [36]. Since rapid calculations are required within ISIC, a somewhat faster method is actually used for hydrogen bond definition, accepting a maximum NHO distance of 0.24 nm and a hydrogen bond angle a_{NHO} of 180° ± 35°. In ISIC the frequencies X_{i}^{k*} of the hydrogen bonds in the different structural bundles S_{i} are determined and used as hydrogen bond probabilities P(H_{i}^{k*}). From these the conditional probabilities P(H_{0}^{k} | H_{1}^{k}, H_{i}^{k*}, i = 2,...,N) that a hydrogen bond exists in the solution structure are obtained.
$P({H}_{0}^{k}|{H}_{1}^{k},{H}_{i}^{k*},i=2,\dots ,N)=\frac{P(H)\,P({H}_{1}^{k},{H}_{i}^{k*},i=2,\dots ,N)}{P(H)\,P({H}_{1}^{k},{H}_{i}^{k*},i=2,\dots ,N)+(1-P(H))\,(1-P({H}_{1}^{k},{H}_{i}^{k*},i=2,\dots ,N))}\quad \left(7\right)$
Assuming that the restraints from different structural sets can be considered statistically independent and that with eq. 2 the probability P(H_{i}^{k}) that a hydrogen bond exists also under the conditions of true solution structures can be written as
P(H_{i}^{k}) = P(H_{i}^{k}|H_{i}^{k*}, i = 1,...,N) P(H_{i}^{k*}, i = 1,...,N) (8)
one obtains from eq. 7 and eq. 8
$\begin{array}{l}P({H}_{0}^{k}|{H}_{1}^{k},{H}_{i}^{k*},i=2,\dots ,N)=\hfill \\ \frac{P(H)\,P({H}_{1}^{k})\cdot {\displaystyle \prod _{i=2}^{N}P({H}_{i}^{k}|{H}_{i}^{k*})P({H}_{i}^{k*})}}{P(H)\,P({H}_{1}^{k})\cdot {\displaystyle \prod _{i=2}^{N}P({H}_{i}^{k}|{H}_{i}^{k*})P({H}_{i}^{k*})}+(1-P(H))\left(1-P({H}_{1}^{k})\cdot {\displaystyle \prod _{i=2}^{N}P({H}_{i}^{k}|{H}_{i}^{k*})P({H}_{i}^{k*})}\right)}\hfill \end{array}\quad \left(9\right)$
For the conditional probability P(H_{0}^{k}|H_{i}^{k*}) that a hydrogen bond also exists in solution when it exists in the crystal structure, a plausible value of 0.9 has been assumed in this paper. More accurate values for P(H_{0}^{k}|H_{i}^{k*}) could be obtained by a statistical analysis of the existing structural database. The a priori probability P(H) that a hydrogen bond exists between a given pair of atoms is rather small; a plausible value would be 1/Q, with Q the number of residues of the protein under consideration.
If P(${H}_{0}^{k}|{H}_{1}^{k},{H}_{i}^{k*}$, i = 2,..., N) exceeds a given user-defined threshold, for example 0.75, the corresponding hydrogen bond restraint is accepted and transformed into appropriate distance restraints, as is usually done in MD calculations.
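The geometric hydrogen bond criterion and the bundle frequencies that serve as the input probabilities P(H_{i}^{k*}) can be sketched as follows. This is an illustrative Python sketch, not the ISIC code; the function names are hypothetical, coordinates are assumed to be in nm, and the 0.24 nm cutoff is assumed here to apply to the distance between the amide hydrogen and the acceptor oxygen. An angle of 180° ± 35° is read as an N-H-O angle of at least 145°.

```python
import math

MAX_DISTANCE_NM = 0.24          # distance cutoff from the text
MIN_ANGLE_DEG = 180.0 - 35.0    # 180 deg +/- 35 deg means at least 145 deg


def angle_deg(a, b, c):
    """Angle at vertex b formed by the points a-b-c, in degrees."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.dist(a, b) * math.dist(c, b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def is_hydrogen_bond(n, h, o):
    """Apply the distance and angle criterion to one N-H...O triple."""
    return (math.dist(h, o) <= MAX_DISTANCE_NM
            and angle_deg(n, h, o) >= MIN_ANGLE_DEG)


def hydrogen_bond_frequency(bundle):
    """Fraction of models in a bundle in which the hydrogen bond is present.

    bundle: list of (n, h, o) coordinate triples, one per model.
    """
    hits = sum(1 for n, h, o in bundle if is_hydrogen_bond(n, h, o))
    return hits / len(bundle)
```

The frequency returned for each bundle is what enters the probability calculation; the user-defined threshold (for example 0.75) is applied to the resulting conditional probability, not to the raw frequency.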
Filtering of angle restraints
When dihedral angles are combined and averaged, the calculated average values may fall into disallowed regions of the Ramachandran plot. A filter is implemented that allows the user to disregard backbone and side-chain dihedral angles depending on their presence in unfavorable regions of the Ramachandran plot.
NMR spectroscopy and structures
The sequential assignments of the NMR signals of Byr2 and the experimental parameters have been described in [37]. A 2D ^{1}H NOESY spectrum obtained with a mixing time of 100 ms was used for structure validation. The following input data were selected: the NMR structure of the free Ras-binding domain of Byr2 (Byr2-RBD) from Schizosaccharomyces pombe (residues 71–165, here referred to as residues 1–95) [16] [PDB ID: 1I35], the crystal structure of Byr2-RBD in complex with Ras [17] [PDB ID: 1K8R], and the NMR structure [20] [PDB ID: 1Q10] and the crystal structure [21] [PDB ID: 1PGX] of the immunoglobulin binding domain of protein G from Streptococcus, species Lancefield group G.
Programs and structure validation
NMR data evaluation was performed with the program AUREMOL (V 2.2.1). Expectation values and standard deviations of cyclic quantities were calculated according to Döker et al. [38]. Sequence alignment was performed with a module for pairwise sequence alignment based on the Needleman-Wunsch algorithm and the BLOSUM62 matrix that we recently included in the AUREMOL module PERMOL [23, 24]. The resulting refined solution structures were validated against the experimental NMR data by the calculation of NMR R-factors [27]. PROCHECK-NMR [39] was employed to investigate the stereochemical quality, and rmsd values were calculated using MOLMOL [40].
Molecular dynamics calculations
Structure calculations were performed using the torsion angle molecular dynamics program DYANA v1.5 [32]. Details of the standard simulated annealing protocol used are given in the corresponding publication. From the resulting structures, the best in terms of the DYANA target function were selected for refinement in explicit solvent [25, 26].
Implementation
ISIC is written in ANSI C and is fully incorporated in the software package AUREMOL (http://www.auremol.de).
Abbreviations
NMR: nuclear magnetic resonance
rmsd: root mean square deviation
RBD: Ras-binding domain
Declarations
Acknowledgements
Financial support by the European Commission (SPINE), the Fonds der Chemischen Industrie and the Deutsche Forschungsgemeinschaft is gratefully acknowledged.
References
1. Annila A, Aitio H, Thulin E, Drakenberg T: Recognition of protein folds via dipolar couplings. J Biomol NMR 1999, 14: 223–230. 10.1023/A:1008330519680
2. Bowers PM, Strauss CEM, Baker D: De novo protein structure determination using sparse NMR data. J Biomol NMR 2000, 18: 311–318. 10.1023/A:1026744431105
3. Simons KT, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions. J Mol Biol 1997, 268: 209–225. 10.1006/jmbi.1997.0959
4. Simons KT, Ruczinski I, Kooperberg C, Fox BA, Bystroff C, Baker D: Improved Recognition of Native-Like Protein Structures Using a Combination of Sequence-Dependent and Sequence-Independent Features of Proteins. Proteins 1999, 34: 82–95. 10.1002/(SICI)10970134(19990101)34:1<82::AIDPROT7>3.0.CO;2A
5. Delaglio F, Kontaxis G, Bax A: Protein Structure Determination Using Molecular Fragment Replacement and NMR Dipolar Couplings. J Am Chem Soc 2000, 122: 2142–2143. 10.1021/ja993603n
6. Andrec M, Harano Y, Jacobson MP, Friesner RA, Levy RM: Complete protein structure determination using backbone residual dipolar couplings and sidechain rotamer prediction. J Struct Funct Genomics 2002, 2: 103–111. 10.1023/A:1020435630054
7. Haliloglu T, Kolinski A, Skolnick J: Use of Residual Dipolar Couplings as Restraints in Ab Initio Protein Structure Prediction. Biopolymers 2003, 70: 548–562. 10.1002/bip.10511
8. Albrecht M, Hanisch D, Zimmer R, Lengauer T: Improving fold recognition of protein threading by experimental distance constraints. In Silico Biology 2002, 2: 1–12.
9. Li W, Zhang Y, Kihara D, Huang YJ, Zheng D, Montelione G, Kolinski A, Skolnick J: TOUCHSTONEX: Protein Structure Prediction With Sparse NMR Data. Proteins 2003, 53: 290–306. 10.1002/prot.10499
10. Shaanan B, Gronenborn AM, Cohen GH, Gilliland GL, Veerapandian B, Davies DR, Clore GM: Combining Experimental Information from Crystal and Solution Studies: Joint X-ray and NMR refinement. Science 1992, 257: 961–964.
11. Schiffer CA, Huber R, Wüthrich K, van Gunsteren WF: Simultaneous Refinement of the Structure of BPTI Against NMR Data Measured in Solution and X-ray Diffraction Data Measured in Single Crystals. J Mol Biol 1994, 241: 588–599. 10.1006/jmbi.1994.1533
12. Hoffman DW, Cameron CS, Davies C, White SW, Ramakrishnan V: Ribosomal Protein L9: A Structure Determination by the Combined Use of X-ray Crystallography and NMR Spectroscopy. J Mol Biol 1996, 264: 1058–1071. 10.1006/jmbi.1996.0696
13. Miller M, Lubkowski J, Rao KKM, Danishefsky AT, Omichinski JG, Sakaguchi K, Sakamoto H, Appella E, Gronenborn AM, Clore GM: The Oligomerization Domain of p53: Crystal Structure of the Trigonal Form. FEBS Lett 1996, 399: 166–170. 10.1016/S00145793(96)012318
14. Raves ML, Doreleijers JF, Vis H, Vorgias CE, Wilson KS, Kaptein R: Joint refinement as a tool for thorough comparison between NMR and X-ray data and structures of HU protein. J Biomol NMR 2001, 21: 235–248. 10.1023/A:1012927325963
15. Chao J, Williamson JR: Joint X-Ray and NMR Refinement of the Yeast L30e-mRNA Complex. Structure 2004, 12: 1165–1176. 10.1016/j.str.2004.04.023
16. Gronwald W, Huber F, Grünewald P, Spörner M, Wohlgemuth S, Herrmann C, Kalbitzer HR: Solution Structure of the Ras-binding Domain of the Protein Kinase Byr2 from Schizosaccharomyces pombe. Structure 2001, 9: 1029–1041. 10.1016/S09692126(01)006712
17. Scheffzek K, Grünewald P, Wohlgemuth S, Kabsch W, Tu H, Wigler M, Wittinghofer A, Herrmann C: The Ras-Byr2RBD Complex: Structural Basis for Ras Effector Recognition in Yeast. Structure 2001, 9: 1043–1050. 10.1016/S09692126(01)006748
18. Geyer M, Herrmann C, Wohlgemuth S, Wittinghofer A, Kalbitzer HR: Structure of the Ras-binding domain of RalGEF and implications for Ras binding and signalling. Nat Struc Biol 1997, 4: 694–699. 10.1038/nsb0997694
19. Vetter IR, Linnemann T, Wohlgemuth S, Geyer M, Kalbitzer HR, Herrmann C, Wittinghofer A: Structural and Biochemical Analysis of Ras-Effector signaling via RalGDS. FEBS Lett 1999, 451: 175–180. 10.1016/S00145793(99)005554
20. Byeon IL, Louis JM, Gronenborn AM: A protein Contortionist: Core mutations of GB1 that Induce Dimerization and Domain Swapping. J Mol Biol 2003, 333: 141–152. 10.1016/S00222836(03)009288
21. Achari A, Hale SP, Howard AJ, Clore GM, Gronenborn AM, Hardman KD, Whitlow M: 1.67-Å X-ray Structure of the B2 Immunoglobulin-Binding Domain of Streptococcal Protein G and Comparison to the NMR Structure of the B1 Domain. Biochemistry 1992, 31: 10449–10457. 10.1021/bi00158a006
22. Kirkpatrick S, Gelatt CD, Vecchi MP: Optimization by Simulated Annealing. Science 1983, 220: 671–680.
23. Möglich A, Weinfurtner D, Maurer T, Gronwald W, Kalbitzer HR: A Restraint Molecular Dynamics and Simulated Annealing Approach for Protein Homology Modeling Utilizing Mean Angles. BMC Bioinformatics 2005, 6: 91. 10.1186/14712105691
24. Möglich A, Weinfurtner D, Gronwald W, Maurer T, Kalbitzer HR: PERMOL: Restraint-Based Protein Homology Modeling Using DYANA or CNS. Bioinformatics 2005, 21: 2110–2111. 10.1093/bioinformatics/bti276
25. Nabuurs SB, Nederveen AJ, Vranken W, Doreleijers JF, Bonvin AMJJ, Vuister GW, Vriend G, Spronk CAEM: DRESS: a Database of REfined Solution NMR Structures. Proteins 2004, 55: 483–486. 10.1002/prot.20118
26. Linge JP, Williams MA, Spronk CAEM, Bonvin AMJJ, Nilges M: Refinement of protein structures in explicit solvent. Proteins 2003, 50: 496–506. 10.1002/prot.10299
27. Gronwald W, Kirchhofer R, Gorler A, Kremer W, Ganslmeier B, Neidig KP, Kalbitzer HR: RFAC, a program for automated NMR R-factor estimation. J Biomol NMR 2000, 17: 137–151. 10.1023/A:1008360715569
28. Rieping W, Habeck M, Nilges M: Inferential Structure Determination. Science 2005, 309: 303–306. 10.1126/science.1110428
29. Gronwald W, Kalbitzer HR: Automated structure determination of proteins by NMR spectroscopy. Prog NMR Spectrosc 2004, 44: 33–96. 10.1016/j.pnmrs.2003.12.002
30. Holton J, Alber T: Automated Protein Crystal Structure Determination using ELVES. Proc Natl Acad Sci USA 2004, 101: 1537–1542. 10.1073/pnas.0306241101
31. Cruickshank DWJ: Remarks About Protein Structure Precision. Acta Cryst D 1999, 55: 583–601. 10.1107/S0907444998012645
32. Güntert P, Mumenthaler C, Wüthrich K: Torsion Angle Dynamics for NMR Structure Calculation with the New Program DYANA. J Mol Biol 1997, 273: 283–298. 10.1006/jmbi.1997.1284
33. Schwieters CD, Kuszewski J, Tjandra NL, Clore GM: The Xplor-NIH NMR molecular structure determination package. J Magn Reson 2003, 160: 65–73. 10.1016/S10907807(02)000149
34. Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL: Crystallography & NMR System: A New Software Suite for Macromolecular Structure Determination. Acta Cryst 1998, D54: 905–921.
35. Sachs L: Angewandte Statistik. Berlin: Springer Verlag; 1997.
36. Freund J University of Heidelberg; 1994.
37. Huber F, Gronwald W, Wohlgemuth S, Herrmann C, Geyer M, Wittinghofer A, Kalbitzer HR: Letter to the Editor: Sequential NMR Assignment of the Ras-Binding Domain of Byr2. J Biomol NMR 2000, 16: 355–356. 10.1023/A:1008335420475
38. Döker R, Maurer T, Kremer W, Neidig KP, Kalbitzer HR: Determination of Mean and Standard Deviation of Dihedral Angles. BBRC 1999, 257: 348–350.
39. Laskowski RA, Rullmann JAC, MacArthur MW, Kaptein R, Thornton JM: AQUA and PROCHECK-NMR: Programs for checking the quality of protein structures solved by NMR. J Biomol NMR 1996, 8: 477–486. 10.1007/BF00228148
40. Koradi R, Billeter M, Wüthrich K: MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graphics 1996, 14: 51–55. 10.1016/02637855(96)000094
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.