  • Research article
  • Open access

Comparative modelling by restraint-based conformational sampling

Abstract

Background

Although comparative modelling is routinely used to produce three-dimensional models of proteins, very few automated approaches are formulated in a way that allows inclusion of restraints derived from experimental data as well as those from the structures of homologues. Furthermore, proteins are usually described as a single conformer, rather than an ensemble that represents the heterogeneity and inaccuracy of experimentally determined protein structures. Here we address these issues by exploring the application of the restraint-based conformational space search engine, RAPPER, which has previously been developed for rebuilding experimentally defined protein structures and for fitting models to electron density derived from X-ray diffraction analyses.

Results

A new application of RAPPER for comparative modelling uses positional restraints and knowledge-based sampling to generate models with accuracies comparable to other leading modelling tools. Knowledge-based predictions are based on geometrical features of the homologous templates and rules concerning main-chain and side-chain conformations. By directly changing the restraints derived from available templates we estimate the accuracy limits of the method in comparative modelling.

Conclusion

The application of RAPPER to comparative modelling provides an effective means of exploring the conformational space available to a target sequence. Enhanced methods for generating positional restraints can greatly improve structure prediction. Generation of an ensemble of solutions that are consistent with both target sequence and knowledge derived from the template structures provides a more appropriate representation of a structural prediction than a single model. By formulating homologous structural information as sets of restraints we can begin to consider how comparative models might be used to inform conformer generation from sparse experimental data.

Background

The three-dimensional (3D) structures of proteins provide valuable insights into their biochemical activities and biological functions. The most widely used experimental methods for determining 3D structures, X-ray crystallography and nuclear magnetic resonance (NMR), have limitations in both time and tractability. For X-ray crystallography sufficient quantities of purified proteins may be difficult to produce and to crystallize when obtained [1]. For NMR, proteins are often too large or insufficiently soluble to be tractable [2]. Nevertheless, genome sequencing projects create a continuing need to translate sequence information into structure [3].

Where experimental methods are problematic, theoretical models can often provide valuable information about the structure of interest. Methods that use physical and chemical properties of amino acids together with information about small fragments of already solved structures have had success with smaller proteins but are still limited in accuracy and reliability [4]. However, knowledge-based methods, such as comparative modelling, which exploit information about amino acid substitutions that accumulate during divergent evolution and are compatible with preserving folded state and function [5], have been the most successful in producing good quality models. Comparative modelling approaches, which can be broadly classified as fragment-based, for example COMPOSER [6], 3D-JIGSAW [7] and SWISS-MODEL [8], or restraint-based, for example MODELLER [9], continue to improve [10]. The latest approaches, for example TASSER, use a combination of threading and restraint optimisation by sampling conformational restraints using Monte Carlo methods [11]. Other protocols use Monte Carlo searches in a reduced space determined by restraints from multiple templates and fragments generated from a consensus of results from a number of modelling programs [12]. Nevertheless, recent CASP exercises [13, 14] have demonstrated little significant improvement and have identified empirical limits for knowledge-based protein structure prediction, even when the problem of incorrect alignment has been eliminated [15].

We have previously applied the restraint-based conformational search engine RAPPER [16] to a number of protein modelling problems where partial structural information was available, including ab initio loop modelling [17, 18], Cα tracing [19] and modelling into electron density from X-ray crystallographic experiments [20, 21]. Here we develop the approach for comparative modelling, focusing on sampling ϕ/ψ torsion angles under spatial restraints derived from knowledge of homologous structures. We assess the limitations of the method by comparing the use of spatial restraints derived from the homologous template structures with that using restraints derived from experimentally defined structures of targets. We show that significant improvements in model accuracy can be achieved by incorporating additional restraints from main chain curvature and torsion as well as side chain χ angle conservation derived from the structures of homologues. By generating an ensemble of solutions consistent with both the target sequence and template structures we provide a more appropriate representation of the structure. By formulating homologous structural information as sets of restraints we can begin to consider how comparative models might be used to inform conformer generation from sparse experimental data [22].

Results and Discussion

We explored a number of different modes of modelling using RAPPER. The principal differences between these modes lie in the information used from the templates to derive the restraints. In order to minimise problems arising from inaccuracy of sequence alignment, we used structure-based alignments from the HOMSTRAD database [23]. For each of 10 targets, models were generated for one member of the family using four homologues, constructing fifteen models using all possible combinations of four, three, two and one homologue(s) as templates. For comparison, models were also built using the standard modelling mode in MODELLER. This combinatorial approach allowed RAPPER to be parameterised and the performance assessed against a variety of templates of varying sequence identity. In order to assess the usefulness of different restraints described below, models were generated for a greater number of targets using a more limited subset of templates based on percentage sequence identity of target and template. Again, for comparison purposes, models were also generated using MODELLER (See Tables 1, 2, 3, 4, 5).

Table 1 Templates for Each of the Targets Modelled.
Table 2 Fifteen Different Combinations of Templates Used in Exploring the Effect of PID.
Table 3 All-Atom RMSD for RAPPER Models for 15 Combinations of Template to Target.
Table 4 All-Atom RMSD for Models Built Using Different RAPPER Restraint Derivations and Templates and for Models Built by Modeller.
Table 5 Statistical Analysis of Modelling Methods.

Deriving theoretically optimal restraints

As a control, we also modelled the target structure based on the experimental Cα positions using RAPPER (as previously described [19]); this provides an upper bound on the quality of the models obtainable using Cα coordinates alone from the homologous templates. The root-mean-square deviations (RMSDs) of these models from the corresponding experimentally determined structures are similar in magnitude to the experimental variation in solution structures determined by NMR. For example, the solution structure of α-parvalbumin has an all-atom RMSD of 1.02 ± 0.08Å (excluding the five first and last residues) [24]. Models built using the Cα trace mode of RAPPER – guided by Cα atom coordinates derived from experimental structures – have loop regions with up to 1Å RMSD and organised secondary structural elements with up to 0.5–0.6Å RMSD from the parent structure. A significant proportion of the difference may result from different crystal packing in the target structure and that of the homologues used in the modelling [25]. The remainder probably represents errors introduced by (imperfect) restraints from homologous structures.

We calculated the RMSD at each residue in order to identify large local errors which can have an undue influence on the overall RMSD [26]. We calculated other measures such as TM [27], GDT [28] and MaxSub [29] scores as well as the overall RMSD but all failed to identify local regions of inaccuracy in the model. This is illustrated by models for the glycosyl hydrolase family 22 (Ghf22) protein family; Figure 1 shows that the last three residues contribute most to the overall RMSD and this is due to a hook-like conformation of the three C-terminal residues in all available templates, which is not present in the experimental target structure, perhaps due to crystal packing.

Figure 1

Contribution to Overall RMSD by Individual Residue. The per-residue all-atom RMSD for models generated by RAPPER (solid red) and MODELLER (dotted green) for a target of the Ghf22 family. The greatest contribution to the overall RMSD can be seen to be from the C-terminal residues. If these are excluded from the overall RMSD the recalculated RMSD is the same for both modelling procedures.
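The per-residue analysis described above can be illustrated with a minimal sketch. It assumes that model and target have already been superimposed and that atoms are listed in the same order; the function names and the toy coordinates are hypothetical and are not taken from RAPPER.

```python
import math

def per_residue_rmsd(model, target):
    """Return one all-atom RMSD value per residue.

    model, target: lists of residues; each residue is a list of (x, y, z)
    atom coordinates in matching order (assumed already superimposed).
    """
    profile = []
    for res_m, res_t in zip(model, target):
        sq = [sum((a - b) ** 2 for a, b in zip(am, at))
              for am, at in zip(res_m, res_t)]
        profile.append(math.sqrt(sum(sq) / len(sq)))
    return profile

def overall_rmsd(model, target, skip=()):
    """All-atom RMSD over all residues except those whose index is in `skip`."""
    sq_sum, n_atoms = 0.0, 0
    for i, (res_m, res_t) in enumerate(zip(model, target)):
        if i in skip:
            continue
        for am, at in zip(res_m, res_t):
            sq_sum += sum((a - b) ** 2 for a, b in zip(am, at))
            n_atoms += 1
    return math.sqrt(sq_sum / n_atoms)

# Toy data (invented): three one-atom residues, the last one displaced by 3 Å.
target = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
model = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 3.0, 0.0)]]
print(per_residue_rmsd(model, target))        # the last residue dominates
print(overall_rmsd(model, target))            # inflated by one residue
print(overall_rmsd(model, target, skip={2}))  # recalculated without it
```

In the toy data a single displaced residue accounts for the whole of the overall RMSD, which is the kind of local error that a single global score would not localise.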

Next, we generated the best possible comparative models but now using optimal spatial restraints for each residue. These target structures were superimposed on those of the templates in order to ascertain, for each residue, which template Cα atom is closest to the target. The coordinates of these atoms were then used as the centres for the restraint spheres in an analogous way to the Cα trace mode of RAPPER previously developed for X-ray refinement [19].
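A minimal sketch of this selection step is given below, assuming the templates have already been superimposed on the target and that coordinates are indexed by alignment position, with None marking a gap. The function name and data layout are hypothetical illustrations rather than part of RAPPER.

```python
import math

def closest_template_centres(target_ca, templates_ca):
    """For each aligned position, pick the template Cα closest to the target Cα.

    target_ca: list of (x, y, z) tuples, one per aligned position.
    templates_ca: list of templates, each a list of (x, y, z) or None (gap).
    Returns a list of chosen centres (None where no template covers a position).
    """
    centres = []
    for pos, target_xyz in enumerate(target_ca):
        candidates = [tpl[pos] for tpl in templates_ca if tpl[pos] is not None]
        if not candidates:
            centres.append(None)
            continue
        centres.append(min(candidates, key=lambda c: math.dist(c, target_xyz)))
    return centres

# Toy example (invented coordinates): two templates, one with a gap.
target_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
template_a = [(0.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
template_b = [(2.0, 0.0, 0.0), None]
print(closest_template_centres(target_ca, [template_a, template_b]))
```

Because the choice depends on the experimental target coordinates, such centres are only available in a benchmarking setting and serve to estimate the best result the restraints could deliver.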

Models based on restraints derived from the closest available template (defined by percentage sequence identity) are often close in accuracy to those defined by the Cα-trace model based on the actual structure (all-atom RMSD values shown in Table 4). In many cases, the models show a lower all-atom RMSD than the equivalent models produced by MODELLER, but there are the notable exceptions of Phospholipase A2 (Phos) and the Response Regulator Receiver domain (Resp). In the case of Phospholipase A2 this is due to an insertion of a small section of alpha helical secondary structure flanked by two short loop regions not present in any of the templates. With the lack of any other restraint RAPPER tends to generate expanded loops, while MODELLER's molecular dynamics energy function tends to generate a more compact loop. In the Response Regulator Receiver domain, a section of alpha-helical secondary structure has an incorrect orientation due to a slight extension in one of the flanking loop regions. As with Phospholipase A2 RAPPER minimises the contacts in the flanking loop region which pushes the extension out, resulting in an incorrect orientation of the secondary structural element. For regions which have few short range contacts, RAPPER is provided with few restraints and builds poor models. This might be improved by using secondary structure predictions as restraints to provide a more directed search of the available conformational space.

Improving on the naïve use of restraints

We explored whether model accuracy can be improved by using multiple templates [30]. We did this by deriving the restraints in three different ways. First, templates were weighted according to their percentage amino acid sequence identity. The size of the restraint sphere derived from each template was varied in order to influence the frequency of sampling. This provided a significant improvement in the accuracy (see Tables 4, 5 and 6).

Table 6 All-Atom RMSD's for RAPPER Using Two Different Restraint Derivations Compared to Those for MODELLER.

Secondly, we incorporated information from two newly developed prediction programs as restraints. The first program, CHORAL [31], calculates the curvature and torsion of the main chain residues for each template. Sequences of residues with similar patterns of curvature and torsion are clustered together and scored against the target sequence using environmentally-constrained substitution tables. CHORAL constructs a set of non-overlapping, structurally conserved clusters, which best represent the main chain of the target. Weighting sections of templates by the CHORAL prediction in this way reduces the influence from inappropriate templates on main chain restraints. The second prediction program, ANDANTE [32], predicts the side chain χ angles from likely conservation of those in structures of homologues. These predictions can be used to limit the rotamer search space by RAPPER. The predictions from CHORAL and ANDANTE are presented to RAPPER as possibilities for each template residue defined by ellipsoidal restraints for Cα and side chain centroids. If no prediction is made, then all of the templates are used to generate the restraint ellipsoids. RAPPER models generated using CHORAL/ANDANTE predictions showed significant improvements in the modelling by RAPPER (see Tables 4 and 5).

Third, we defined restraints from homologues of known structure as 3-D probability density functions, using a local percentage sequence identity calculated over a window of 20 residues. While testing this approach it quickly became obvious that using the standard deviation of the PDF to define the radius of the ellipsoid for the side chain centroid was too restrictive as it prevented effective exploration of a range of rotamer states. Thus the side chain restraint sphere size was set as a default value. A significant improvement in modelling was seen by using restraints generated with a PDF, with P (P = 0.000061 and greater than 0.01) values using a paired means t-test. No overall significant improvement was made compared to MODELLER (P = 0.24 and greater than 0.01). There were a few cases where the PDF-derived restraints led to inaccuracies. For example, when building targets in the flavodoxins (Flav) family, a significant increase in all-atom RMSD comes from the PDF being overly influenced by templates with similar local PID's but significantly different structures (see Figure 2). A similar problem is also observed for the globins (Glob). We had already chosen templates in the relevant functional state, so it was not due to an injudicious choice of templates. For both the flavodoxins and globins the problem arises from differences in conformations, particularly of loops, due to different environments in the crystals.

Figure 2

Problems in Deriving PDF's for Flav Family. A. The superimposed templates in gray with the derived centres of the PDF's shown as yellow spheres. Note the divergent loop on the left. The target structure is shown in green. B. The resulting models from different modes of building in RAPPER: RAPPER-PDF in gray, RAPPER-CHORAL in blue, RAPPER-Standard in yellow and MODELLER in pink. The target structure is also shown in green.

Comparing NMR and comparative ensembles

Although NMR methods have led to the generation of ensembles, X-ray and comparative models have usually been presented as single conformers, though often multiple models are generated during the experimental or modelling process. An ensemble of multiple conformers captures more information, as it allows regions to be identified that are structurally variable, representing the intrinsic dynamics of the target structure or uncertainties in the modelling process. In order to examine this, we compared the ensemble generated by RAPPER for 1pvaa as target, using other structures from the α-parvalbumin family (Parv), to the experimentally determined NMR ensemble of the same protein.

The RAPPER and NMR ensembles, superimposed on the crystallographic model, are shown in Figure 3A. It can be seen that the two ensembles have similar features with respect to compactness and diversity in different regions of the polypeptide chain, with the comparative modelling ensemble closer to the crystal model than the NMR ensemble. In order to gain more insight into this observation, the mode of the distribution of RMSD for the two ensembles was calculated for each residue. The two curves (Figure 3B) are very similar as shown by a correlation coefficient of 0.66 when comparing the first derivative for each curve (Figure 3C). The fact that the RAPPER ensemble is more similar to the crystallographic model than the NMR ensemble can be seen when the all-atom RMSD is calculated for each of the models in the ensemble (Figure 3D). If the all-atom RMSDs are calculated for the two ensemble representative models, the RAPPER representative model is closer to the crystal structure than the equivalent representative model from the NMR ensemble. The representative model is, in the case of RAPPER, the geometric average of the ensemble, while in the case of NMR it is that chosen by the NMR spectroscopist on deposition to the PDB. Furthermore the RAPPER representative model is always much closer to the crystal structure than any of the individual models that make up the ensemble. The wider variability seen in the NMR ensemble may be due to compaction by crystal packing. Also the crystallographic model is a single time and space averaged representation of the protein in question. This representation may be inadequate in fully explaining the experimental data, especially at medium and lower resolutions [33].

Figure 3

Comparison of RAPPER and NMR Ensembles to the Crystallographic Model. Comparison of a RAPPER ensemble of comparative models for the target 1PVA chain A from the Parvalbumin family with an NMR ensemble, the crystal structure and the deposited representative NMR structure. A: The backbone trace of 9 models from the RAPPER ensemble (cyan) generated by comparative modelling on all targets and the equivalent models generated by NMR (blue). Also shown are the deposited crystal structure (red) and the representative NMR single model (orange). All models are superimposed with reference to the crystal structure. B: The plot of ensemble mean and mode for each residue in the RAPPER ensemble. C: The 1st derivative of the per residue ensemble mean for RAPPER (red) and the NMR ensemble (green). D: The all atom per residue RMSD for the RAPPER representative single model (red) compared to the equivalent single NMR representative model (green).
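The comparison of first derivatives of the two per-residue curves can be sketched as follows. The sketch assumes that the derivative is approximated by the difference between consecutive residues and that the correlation is a Pearson coefficient; neither detail is stated in the text, and the profiles shown are invented toy values.

```python
import math

def first_difference(profile):
    """Finite-difference approximation to the first derivative along the chain."""
    return [b - a for a, b in zip(profile, profile[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy per-residue RMSD profiles (invented numbers, not data from the paper).
ensemble_a = [1.0, 2.0, 4.0, 3.0]
ensemble_b = [2.0, 4.0, 8.0, 6.0]
r = pearson(first_difference(ensemble_a), first_difference(ensemble_b))
print(round(r, 3))
```

Correlating the differences rather than the raw curves compares where variability rises and falls along the chain, independently of a constant offset between the two ensembles.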

Conclusion

The differences between the comparative modelling protocol of RAPPER, the Cα-trace models, and most importantly modes that use an optimal restraint network based on knowledge of the target structure demonstrate that there is a limit to which we could hope to build a reliable model based solely on homologous templates using RAPPER. Nevertheless, the restraint networks based on differential geometry, pattern recognition and χ angle conservation described here are all shown to be useful approaches to introducing further structural information.

The application of RAPPER to comparative modelling provides an effective means of exploring the conformational space available to a target sequence. The use of different methods for defining restraints from homologous templates shows that better methods for generating positional restraints can greatly improve structure prediction. Generation of an ensemble of solutions that are consistent with both target sequence and knowledge derived from the template structures provides a more appropriate representation of a structural prediction than a single model.

As we have already demonstrated in generating conformers using low resolution X-ray data [21], RAPPER allows the testing of weak hypotheses and speculations about structures where the ratio of observations to parameters is low. For comparative modelling, where restraints derived from distant homologues or regions of divergent structure are often inaccurate, we have now shown that RAPPER can explore conformational space defined by restraints from varying combinations of templates or secondary structure predictions. This suggests that there might be advantage in integrating restraints derived from knowledge of homologous structures with restraints provided by sparse or low resolution experimental data. Thus information from structures of homologues could be of particular use in generating conformers consistent with low resolution X-ray electron density and electron microscopy density, NMR where there are insufficient observations and small angle X-ray scattering (SAXS). We are now investigating such applications, not only with RAPPER but also with RAPPER-TK [34], which can be used to model not only proteins but also other macromolecules and their complexes.

Methods

Modelling data set

In order to develop and test the approach twenty four families were chosen from the HOMSTRAD database [23], representing each of the four main SCOP classes (all α, all β, α + β and α/β). For each family five members were chosen based on maximizing the range of the relative percentage identity (PID) based on sequence (calculated by Malform [35]) and ensuring all the solved structures were of relatively high resolution (greater than 2Å). One member was designated as the target, with the rest acting as the templates. This allowed fifteen combinations of the templates, exploiting one to four homologues as templates, so reflecting information from homologues across the range of PID. The data set was sub-divided into three. The first consisted of four families that were used to define the default parameters. The restraint defaults for main chain and side chain restraint sphere size were chosen by iteratively reducing the radii in a combinatorial manner until RAPPER was unable to generate a model. The second set, comprising a further six families, was used to generate all 15 combinations of template to target. The third set comprises all of the chosen families and was used to test alternative approaches to defining restraints. Table 1 shows the families and their constituent members. The possible combinations of target to templates are given in Table 2. Each of the combinations, including the target, was structurally aligned using COMPARER [36] and annotated by JOY [37]. The resulting alignments were manually corrected, resulting in the best possible alignment and thus minimising any error from an incorrect alignment.
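The count of fifteen follows from choosing every non-empty subset of the four templates (4 + 6 + 4 + 1 = 15). A short sketch, with placeholder template names that are not the identifiers used in Table 2:

```python
from itertools import combinations

def template_combinations(templates):
    """All non-empty subsets of the templates, largest subsets first."""
    return [combo
            for size in range(len(templates), 0, -1)
            for combo in combinations(templates, size)]

# Placeholder names for the four homologues of one family.
combos = template_combinations(["t1", "t2", "t3", "t4"])
print(len(combos))  # 15
print(combos[0])    # ('t1', 't2', 't3', 't4')
```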

Modelling procedure for RAPPER

The application of the conformational search engine RAPPER to comparative modelling by satisfaction of spatial restraints was achieved by extending the restraint engine as described for solving the Cα trace problem [19]. From the given alignment a structural superimposition of equivalent residues is made and optimised. A common core was defined from the set of aligned protein structures as the subset of equivalent residue atoms with relatively little structural variation as defined by the Altman-Gerstein algorithm [38] and implemented in RAPPER. Based on this superimposition and alignment, spatial restraints can then be described for each residue of the target sequence. There are four types of spatial restraint:

  1. As RAPPER builds from the N to C termini a bootstrap restraint is required to allow modelling to commence. The bootstrap is defined as the mean position of the Cβ coordinates from the templates, which is made the centre of a restraint sphere, the size of which is user-defined. In building the first two residues a position of the first residue Cβ is taken at a random offset from the mean Cβ coordinate position of the equivalent Cβ of the templates. From this the remaining backbone atom positions can be calculated from the ideal Engh and Huber [39] bond angles and lengths implicit in the RAPPER protein model. A ψ angle is then randomly picked from high-grained residue specific ϕ/ψ propensity tables as well as a random angle for the vector between the first and the second Cβ position. Thus the first peptide bond is generated.

  2. A set of spatial restraints is defined for the backbone (main chain) atoms, principally the Cα atoms. Each is defined as an ellipsoid generated from the union of the set of restraint spheres centred on the equivalent atom position from each of the templates, as defined in equation 1. The size of these spheres is user defined.

    \[ \lVert \vec{p} - \vec{O} \rVert \le r \tag{1} \]

    where \(\vec{p}\) is the position of the Cα atom, \(\vec{O}\) is the centre of the restraint sphere with radius \(r\).

  3. A similar set of spherical restraints can be defined for the side chain atoms, except that, rather than taking each atom separately, a virtual centroid (as defined in equation 2) of the side chain is calculated and this position is used to centre the restraint sphere. In fact two virtual centroid positions are calculated: a short virtual centroid position which essentially takes into account the atoms up to and including the Cγ position and a long virtual centroid position which accounts for the rest of the side chain.

    \[ \left\lVert \left( \sum_{i}^{N_{sc}} \vec{p}_i \right) / N_{sc} - \vec{O} \right\rVert \le r \tag{2} \]

    where \(N_{sc}\) is the number of side chain atoms.

  4. A set of spatial restraints is derived for secondary structure elements. Residues are defined to be in elements of secondary structure from consideration of the consensus across the template structures or from secondary structure prediction. The restraints are a combination of restricted ϕ/ψ sampling of the residue specific ϕ/ψ propensity tables to the alpha helical or beta sheet regions of ϕ/ψ space and short range hydrogen bonding distance restraints. Only short range hydrogen bonding is enforced and this primarily in alpha helical regions, although we have now developed algorithms for including more long range restraints (A Karmali and N Furnham, unpublished data).

As well as the specific restraints from homologues, a number of other restraints are also enforced including clash restraints against the framework structure as it is built and distance restraints from ideal bond angles, bond lengths and omega torsion angles. All of the restraints can be propagated along the chain for a user defined distance.
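The positional restraints of equations 1 and 2 can be illustrated with a short sketch. It treats the restraint region literally as the union of equal-radius spheres, one per template, and computes a single virtual centroid; the function names are hypothetical, and RAPPER's own handling of the short and long centroids is not reproduced here.

```python
import math

def in_union_of_spheres(point, centres, radius):
    """Equation 1 tested against each template-derived centre in turn:
    the restraint holds if the point lies within `radius` of any centre."""
    return any(math.dist(point, centre) <= radius for centre in centres)

def centroid(atoms):
    """Mean position of a set of atoms (the summation term of equation 2)."""
    n = len(atoms)
    return tuple(sum(atom[k] for atom in atoms) / n for k in range(3))

def side_chain_satisfied(side_chain_atoms, centres, radius):
    """Equation 2: apply the sphere test to the side chain virtual centroid."""
    return in_union_of_spheres(centroid(side_chain_atoms), centres, radius)

# Toy example (invented coordinates) with the 2 Å radius of the standard mode.
template_centres = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(in_union_of_spheres((1.5, 0.0, 0.0), template_centres, 2.0))  # True
print(in_union_of_spheres((0.0, 2.5, 0.0), template_centres, 2.0))  # False
```

A candidate conformation would be rejected during building as soon as one of these tests fails, which is what allows the restraints to prune the search.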

The standard building process in RAPPER as described previously is used [18, 19]. Briefly, the algorithm employs a branch and bound protocol to extend iteratively the polypeptide chain in the N to C-terminal direction. A population of 100 fragments that make up the growing polypeptide chain is maintained, with a maximum of 100,000 attempts to find the 100 solutions to the restraint network at each residue position. As some residues are in rare ϕ/ψ conformations this may still be insufficient to sample effectively the ϕ/ψ space. Thus, to optimise the time spent searching the target sequence is split into a number of fragments, avoiding regions where there is no template information available, but otherwise randomly. A population of 50 models is produced for each target. The geometric average of the model population is calculated in RAPPER. The resultant single model is then re-geometrised by TINKER [40]. The protocol is summarised in Figure 4.

Figure 4

Schematic of RAPPER Conformer Generation Applied to Comparative Modelling.
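The population step of the building process (a fixed number of solutions sought within a capped number of attempts) can be illustrated with a toy sketch. This is not RAPPER's implementation: the proposal and acceptance functions are hypothetical stand-ins for ϕ/ψ sampling and restraint checking, and only the bookkeeping of population size and attempt limit follows the description above.

```python
import random

def extend_population(population, propose, accept,
                      size=100, max_attempts=100_000, rng=None):
    """Extend fragments by one residue.

    Up to `max_attempts` proposals are made; each appends one proposed state
    to a randomly chosen parent fragment and is kept only if `accept` passes.
    Stops early once `size` extended fragments have been collected.
    """
    rng = rng or random.Random()
    new_population = []
    for _ in range(max_attempts):
        if len(new_population) == size:
            break
        parent = rng.choice(population)
        child = parent + [propose(rng)]
        if accept(child):
            new_population.append(child)
    return new_population

# Toy stand-ins: a "state" is a single angle and the restraint is a window.
def propose_angle(rng):
    return rng.uniform(-180.0, 180.0)

def within_window(fragment):
    return -90.0 <= fragment[-1] <= -30.0

fragments = extend_population([[]], propose_angle, within_window,
                              rng=random.Random(0))
print(len(fragments))
```

If the restraints at a position are too strict, the attempt limit is reached with fewer fragments than requested, which corresponds to the situation where a rare conformation is sampled insufficiently.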

Models were constructed using this standard comparative modelling mode. In each round of building 2Å spheres were enforced for the bootstrap, Cα main chain and side chain restraints. These values were determined from the subset of four families used to parameterise the modelling procedure. This parameterisation was achieved by iterative rounds of building adjusting each of the parameters in a combinatorial approach, starting from a large value and gradually decreasing in 0.5Å increments until the restraints were too strict for a model to be built. The last round where the model could be successfully generated was taken as the optimal parameters.
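For a single parameter, the scan described above can be sketched as below. The starting value of 5 Å and the `can_build` callback are assumptions made for illustration; the actual procedure varied several radii in combination and ran a full RAPPER build at each step.

```python
def tightest_buildable_radius(can_build, start=5.0, step=0.5):
    """Shrink the radius from `start` in `step` decrements.

    Returns the last radius for which `can_build(radius)` succeeded,
    or None if even the starting radius fails.
    """
    last_ok = None
    radius = start
    while radius > 0:
        if not can_build(radius):
            break
        last_ok = radius
        radius -= step
    return last_ok

# Hypothetical stand-in for a build: succeeds down to 2 Å, fails below.
print(tightest_buildable_radius(lambda r: r >= 2.0))  # 2.0
```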

RAPPER sampling by PID

The results of modelling using all the templates demonstrate that the approach would benefit from restricting the available search area. This can be simply achieved by weighting towards the restraints derived from the template with the highest PID to the target, which is accomplished by reducing, based on the PID of the template to target, the relative size of the restraint spheres. The range of PID across the available templates is calculated and is divided into four equal sub-ranges. If the PID of the template lies in the top quartile then the user defined restraint sphere radius is enforced. If the PID of the template lies in one of the other three quartiles, then the restraint sphere is reduced by a corresponding factor, with the restraint spheres generated from the template whose PID lies in the lowest quartile being reduced by 60%. In addition the sampling frequency of the restraint sphere generated from the template with the highest PID is enhanced.
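A sketch of this PID-dependent scaling follows. Only the two end points come from the text (full radius in the top sub-range, a 60% reduction in the lowest); the intermediate scale factors of 0.6 and 0.8 are assumptions chosen to interpolate linearly, and the enhanced sampling frequency for the best template is not modelled.

```python
def scaled_radii(pids, radius=2.0, scales=(0.4, 0.6, 0.8, 1.0)):
    """Scale each template's restraint sphere radius by its PID band.

    The PID range across the templates is split into four equal sub-ranges;
    `scales` gives the factor for each band, lowest first. The middle two
    factors are assumed values, not taken from the paper.
    """
    low, high = min(pids), max(pids)
    width = (high - low) / 4
    radii = []
    for pid in pids:
        if width == 0:
            band = 3  # all templates share one PID: keep the full radius
        else:
            band = min(int((pid - low) / width), 3)
        radii.append(radius * scales[band])
    return radii

# Toy PIDs (invented) for four templates.
print(scaled_radii([20, 35, 50, 80]))
```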

RAPPER using probability density function derived restraints

More distantly related homologous structures can be exploited if restraints are formulated as probability density functions (PDF). The position of each atom (or centroid for side chains) can be used to centre a probability function described as a Gaussian distribution, the mean of which is the atom position and the variance of which is set by the local PID taken over a window of 20 residues (equation 3).

\[ \mathrm{PDF}_i = \frac{A\, e^{-\frac{(x - x_1)^2}{2\sigma_1^2}}}{\sqrt{2\pi}} \tag{3} \]

where i is the position in the template sequence, \(x_1\) is the Cα position of the template and \(\sigma_1^2\) is inversely proportional to the PID of the template. The sum of the distributions of each of the homologous atom positions is calculated and normalised to generate a PDF (equation 4).

\[ P(x) = \int_{-\infty}^{\infty} \mathrm{PDF}_i(t_1) + \mathrm{PDF}_i(t_2) + \mathrm{PDF}_i(t_3) + \ldots \tag{4} \]

where x is the coordinate in question and t is the template. This is done for each of the x, y and z coordinates. The mean position of the combined PDF is taken as the centre of the restraint sphere. Its radius can either be user defined or be set by the standard deviation of the new distribution for each coordinate, in which case the three standard deviations define an ellipsoid (see Figure 5).
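As an illustration of equations 3 and 4, the following Python sketch combines one Gaussian per template into a restraint centre and a per-coordinate spread. It is not the RAPPER implementation. It assumes an amplitude A = 1/σ so that every component has unit area, and a hypothetical mapping `variance = scale / PID` for the inverse proportionality; the function names and the `scale` parameter are invented for this example. Under those assumptions the normalised sum is an equal-weight mixture, whose mean and variance have a closed form.

```python
import math


def gaussian_component(x, centre, sigma):
    """Equation 3 with the assumed amplitude A = 1/sigma (unit area)."""
    exponent = -((x - centre) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2.0 * math.pi))


def sigma_from_pid(pid, scale=100.0):
    """Hypothetical mapping: variance = scale / PID (inverse proportionality)."""
    return math.sqrt(scale / pid)


def combine_coordinate(centres, sigmas):
    """Mean and standard deviation of the normalised sum of the components."""
    n = len(centres)
    mean = sum(centres) / n
    # Second moment of an equal-weight Gaussian mixture.
    second_moment = sum(s * s + c * c for c, s in zip(centres, sigmas)) / n
    return mean, math.sqrt(second_moment - mean * mean)


def restraint_from_templates(ca_positions, local_pids):
    """ca_positions: one (x, y, z) per template; local_pids: one PID per template.

    Returns the restraint centre and the per-axis standard deviations,
    which could serve as the semi-axes of a restraint ellipsoid.
    """
    sigmas = [sigma_from_pid(pid) for pid in local_pids]
    centre, spread = [], []
    for axis in range(3):
        values = [position[axis] for position in ca_positions]
        mean, deviation = combine_coordinate(values, sigmas)
        centre.append(mean)
        spread.append(deviation)
    return tuple(centre), tuple(spread)
```

With two templates of equal PID, the centre falls midway between the two atom positions, and the spread along the axis on which they differ is larger than the spread of either component alone.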

Figure 5

Centres of PDFs Compared to the Target. PDFs and target for the Ltn family. The centre of each PDF is shown as a space-filled sphere, with the ribbon trace of the target in red. Note that the size of the sphere does not represent the size of the PDF sphere enforced in RAPPER.

RAPPER using CHORAL/ANDANTE predictions

An alternative way of defining restraints from homologous structures takes advantage of the predictions of two programs: CHORAL [31] and ANDANTE [32]. CHORAL, an amalgam of differential geometry and pattern recognition algorithms, identifies the clusters of conformers from homologous templates with conserved curvature and torsion that are most likely to represent the core backbone of the target structure. ANDANTE uses environment-specific substitution probabilities to predict where χ1, χ1 plus χ2, or χ1 plus χ2 plus χ3 can be used directly from a single template to limit the rotamer search space. Thus, RAPPER uses the equivalent template residue(s) predicted to contribute to either the target's core backbone or its side chain conformations to generate the restraint network. For example, if CHORAL predicts that residue i in the target sequence will have a backbone conformation similar to the equivalent residues of template 1 and template 2, the Cα atoms of these two templates are used as the centres of the main chain restraint spheres. Similarly, where ANDANTE predicts that the χ1 plus χ2 of template 2 is most likely to be conserved in the target, the short virtual centroid position is used as the centre of the short side chain restraint sphere. RAPPER then builds through this restraint network in the same manner as the standard method for restraint derivation.
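The selection of restraint centres described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not RAPPER, CHORAL or ANDANTE code: the `Restraint` class, the function name and the dictionary layout are hypothetical, and the predictions are assumed to arrive as a list of template names (backbone) and an optional single template name (side chain).

```python
from dataclasses import dataclass


@dataclass
class Restraint:
    residue: int    # position in the target sequence
    centre: tuple   # (x, y, z) of the restraint sphere centre
    kind: str       # "main_chain" or "side_chain"


def restraints_for_residue(residue, ca_by_template, centroid_by_template,
                           backbone_templates, side_chain_template=None):
    """Build restraint centres for one target residue.

    backbone_templates: templates predicted to share the backbone conformation.
    side_chain_template: the single template whose rotamer is predicted to be
    conserved, or None when there is no such prediction.
    """
    restraints = [
        Restraint(residue, ca_by_template[name], "main_chain")
        for name in backbone_templates
    ]
    if side_chain_template is not None:
        centre = centroid_by_template[side_chain_template]
        restraints.append(Restraint(residue, centre, "side_chain"))
    return restraints
```

For a residue where templates 1 and 2 are predicted for the backbone and template 2 for the side chain, the function returns two main chain restraints and one side chain restraint.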

For each target the protocol in the standard comparative modelling procedure is used to produce an ensemble of 50 models; the arithmetic mean is taken and the structure re-geometrised using TINKER [40]. Using CHORAL/ANDANTE predictions allowed tighter restraints of 1 Å radius to be enforced universally for both main chains and side chains. Where CHORAL or ANDANTE did not predict a conformation for a residue, i.e. in a variable loop region or where there was no prediction of side chain rotamer, all of the templates were used to generate the restraint network with the larger 2 Å radius. The backbone restraint sphere radius at the interface between the conserved core and a non-conserved region was "funnelled": it gradually increased from 1 Å to 2 Å at the end of one conserved core region and gradually decreased from 2 Å to 1 Å at the beginning of the next. This provided continuity in the main chain restraint network, ensuring that no unrealistic distances had to be satisfied.
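One way to picture the funnelling of the backbone radius is the short Python sketch below. The text does not state over how many residues the radius changes or what shape the change takes, so the linear ramp and the `ramp` parameter are assumptions made for illustration, as is the function name.

```python
def funnelled_radii(is_core, tight=1.0, loose=2.0, ramp=3):
    """Per-residue backbone restraint radii (in angstroms).

    is_core: list of booleans, True where a conserved core conformation
    was predicted. Non-core residues receive the loose radius. Core
    residues within `ramp` positions of a non-core residue are
    interpolated linearly between the tight and the loose radius.
    """
    n = len(is_core)
    radii = []
    for i in range(n):
        if not is_core[i]:
            radii.append(loose)
            continue
        # Sequence distance to the nearest non-core residue, if there is one.
        gaps = [abs(i - j) for j in range(n) if not is_core[j]]
        distance = min(gaps) if gaps else ramp + 1
        fraction = max(0, ramp + 1 - distance) / (ramp + 1)
        radii.append(tight + (loose - tight) * fraction)
    return radii
```

For two five-residue core segments separated by a two-residue loop, the radii rise in steps of 0.25 Å over the last three core residues, stay at 2 Å through the loop, and fall symmetrically at the start of the next core segment.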

Baseline Modelling

In addition to the basic comparative mode of RAPPER, further models were constructed in order to estimate the limitations of the method. For example, we used the Cα trace mode of RAPPER [19] to rebuild the target based on experimentally observed co-ordinates. We also exploited restraints from secondary structure information, using the actual atomic positions of the Cα atoms of the experimentally resolved target to define the restraint network. Alternatively, restraints were defined from the template whose Cα was closest to that of the target, while ensuring that this position was consistent with the previous restraint sphere centre, i.e. separated from it by approximately one Cα-Cα bond length.
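The last baseline, choosing the template Cα closest to the target while staying consistent with the previous centre, might be sketched as follows. The 3.8 Å value is the usual separation of consecutive Cα atoms; the `tolerance` value, the fallback to all candidates when none is consistent, and the function name are assumptions of this example and are not taken from the paper.

```python
import math

CA_CA_DISTANCE = 3.8  # typical separation of consecutive C-alpha atoms, in angstroms


def pick_centres(target_ca, template_cas, tolerance=0.5):
    """Choose one restraint centre per residue from aligned templates.

    target_ca: list of (x, y, z) for the experimentally resolved target.
    template_cas: list of templates, each a list of (x, y, z) aligned
    position by position to the target (gaps are not handled here).
    """
    centres = []
    for i, target in enumerate(target_ca):
        candidates = [template[i] for template in template_cas]
        if centres:
            previous = centres[-1]
            consistent = [
                c for c in candidates
                if abs(math.dist(c, previous) - CA_CA_DISTANCE) <= tolerance
            ]
            # Assumed fallback: keep every candidate if none is consistent.
            if consistent:
                candidates = consistent
        centres.append(min(candidates, key=lambda c: math.dist(c, target)))
    return centres
```

Because the consistency filter is applied before the nearest candidate is picked, a template position that is closest to the target can still be rejected when it sits too near to, or too far from, the previously chosen centre.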

Other modelling programs

The targets were also built using the well-established comparative modelling program MODELLER [41]. Ten models were produced by MODELLER using the standard model-building routine. A single model was automatically selected based on the average between the minimal energy as calculated by MODELLER and minimal steric violations.

References

  1. Blundell T, Johnson LN: Protein Crystallography. Academic Press; 1976.

  2. Wuthrich K: Protein structure determination in solution by NMR spectroscopy. J Biol Chem 1990, 265(36):22059–22062.

  3. Fiser A: Protein structure modeling in the proteomics era. Expert Rev Proteomics 2004, 1(1):97–110.

  4. Bonneau R, Baker D: Ab initio protein structure prediction: Progress and prospects. Annu Rev Biophys Biomolec Struct 2001, 30: 173–189.

  5. Bajaj M, Blundell T: Evolution and the tertiary structure of proteins. Annu Rev Biophys Bioeng 1984, 13: 453–492.

  6. Topham CM, Thomas P, Overington JP, Johnson MS, Eisenmenger F, Blundell TL: An assessment of COMPOSER: a rule-based approach to modelling protein structure. Biochem Soc Symp 1990, 57: 1–9.

  7. Bates PA, Kelley LA, MacCallum RM, Sternberg MJ: Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 2001, Suppl 5: 39–46.

  8. Guex N, Peitsch MC: SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 1997, 18(15):2714–2723.

  9. Sali A, Blundell TL: Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993, 234(3):779–815.

  10. Ginalski K: Comparative modeling for protein structure prediction. Curr Opin Struct Biol 2006, 16(2):172–177.

  11. Zhang Y, Arakaki AK, Skolnick J: TASSER: an automated method for the prediction of protein tertiary structures in CASP6. Proteins 2005, 61 Suppl 7: 91–98.

  12. Kolinski A, Bujnicki JM: Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models. Proteins-Structure Function and Bioinformatics 2005, 61: 84–90.

  13. Kryshtafovych A, Venclovas C, Fidelis K, Moult J: Progress over the first decade of CASP experiments. Proteins 2005, 61 Suppl 7: 225–236.

  14. Moult J: A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 2005, 15(3):285–289.

  15. Contreras-Moreira B, Ezkurdia I, Tress ML, Valencia A: Empirical limits for template-based protein structure prediction: the CASP5 example. FEBS Lett 2005, 579(5):1203–1207.

  16. de Bakker PI, Furnham N, Blundell TL, Depristo MA: Conformer generation under restraints. Curr Opin Struct Biol 2006.

  17. de Bakker PI, DePristo MA, Burke DF, Blundell TL: Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model. Proteins 2003, 51(1):21–40.

  18. DePristo MA, de Bakker PI, Lovell SC, Blundell TL: Ab initio construction of polypeptide fragments: efficient generation of accurate, representative ensembles. Proteins 2003, 51(1):41–55.

  19. DePristo MA, De Bakker PI, Shetty RP, Blundell TL: Discrete restraint-based protein modeling and the Calpha-trace problem. Protein Sci 2003, 12(9):2032–2046.

  20. DePristo MA, de Bakker PI, Johnson RJ, Blundell TL: Crystallographic refinement by knowledge-based exploration of complex energy landscapes. Structure 2005, 13(9):1311–1319.

  21. Furnham N, Dore AS, Chirgadze DY, de Bakker PI, Depristo MA, Blundell TL: Knowledge-based real-space explorations for low-resolution structure determination. Structure 2006, 14(8):1313–1320.

  22. Sali A, Overington JP, Johnson MS, Blundell TL: From Comparisons of Protein Sequences and Structures to Protein Modeling and Design. Trends in Biochemical Sciences 1990, 15(6):235–240.

  23. Mizuguchi K, Deane CM, Blundell TL, Overington JP: HOMSTRAD: A database of protein structure alignments for homologous families. Protein Sci 1998, 7(11):2469–2471.

  24. Baig I, Bertini I, Del Bianco C, Gupta YK, Lee YM, Luchinat C, Quattrone A: Paramagnetism-based refinement strategy for the solution structure of human alpha-parvalbumin. Biochemistry 2004, 43(18):5562–5573.

  25. Eyal E, Gerzon S, Potapov V, Edelman M, Sobolev V: The limit of accuracy of protein modeling: influence of crystal packing on protein structure. J Mol Biol 2005, 351(2):431–442.

  26. Rogen P, Fain B: Automatic classification of protein structure by using Gauss integrals. Proc Natl Acad Sci U S A 2003, 100(1):119–124.

  27. Zhang Y, Skolnick J: Scoring function for automated assessment of protein structure template quality. Proteins 2004, 57(4):702–710.

  28. Zemla A, Venclovas C, Moult J, Fidelis K: Processing and analysis of CASP3 protein structure predictions. Proteins 1999, Suppl 3: 22–29.

  29. Siew N, Elofsson A, Rychlewski L, Fischer D: MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 2000, 16(9):776–785.

  30. Burke DF, Deane CM, Nagarajaram HA, Campillo N, Martin-Martinez M, Mendes J, Molina F, Perry J, Reddy BV, Soares CM, Steward RE, Williams M, Carrondo MA, Blundell TL, Mizuguchi K: An iterative structure-assisted approach to sequence alignment and comparative modeling. Proteins 1999, Suppl 3: 55–60.

  31. Montalvao RW, Smith RE, Lovell SC, Blundell TL: CHORAL: a differential geometry approach to the prediction of the cores of protein structures. Bioinformatics 2005.

  32. Smith RE, Lovell SC, Burke DF, Montalvao RW, Blundell TL: Andante: Reducing side-chain rotamer search space during comparative modeling using environment-specific substitution probabilities. Bioinformatics 2007.

  33. Furnham N, Blundell TL, Depristo MA, Terwilliger TC: Is one solution good enough? Nat Struct Mol Biol 2006, 13(3):184–185.

  34. Gore SP, Karmali AM, Blundell TL: Rappertk: a versatile engine for discrete restraint-based conformational sampling of macromolecules. BMC Structural Biology 2007, 7:

  35. Clark SP: MALIGNED: a multiple sequence alignment editor. Comput Appl Biosci 1992, 8(6):535–538.

  36. Zhu ZY, Sali A, Blundell TL: A Variable Gap Penalty-Function and Feature Weights for Protein 3-D Structure Comparisons. Protein Eng 1992, 5(1):43–51.

  37. Mizuguchi K, Deane CM, Blundell TL, Johnson MS, Overington JP: JOY: protein sequence-structure representation and analysis. Bioinformatics 1998, 14(7):617–623.

  38. Gerstein M, Altman RB: Average Core Structures and Variability Measures for Protein Families - Application to the Immunoglobulins. J Mol Biol 1995, 251(1):161–175.

  39. Engh RA, Huber R: Accurate Bond and Angle Parameters for X-Ray Protein-Structure Refinement. Acta Crystallogr Sect A 1991, 47: 392–400.

  40. Ponder JW, Richards FM: An Efficient Newton-Like Method for Molecular Mechanics Energy Minimization of Large Molecules. J Comput Chem 1987, 8(7):1016–1024.

  41. Fiser AS, Sali A: MODELLER: Generation and refinement of homology-based protein structure models. Methods in Enzymology. Macromolecular Crystallography, Pt D 2003, 374: 461–491.

Acknowledgements

The authors would like to thank Rinaldo Wander Montalvao and Rick Smith for their help with CHORAL and ANDANTE respectively, as well as Mark DePristo for his help and insightful comments during the development of RAPPER. NF was supported by BBSRC studentships. PIWDB was supported by the Cambridge European Trust, the Isaac Newton Trust and the BBSRC. SG was supported by the Cambridge Commonwealth Trust and Universities UK ORS studentship. DFB was supported by a grant from the Wellcome Trust.

Author information

Corresponding author

Correspondence to Nicholas Furnham.

Additional information

Authors' contributions

NF conducted the data acquisition and processing and developed (with PIWDB) the comparative modelling mode of RAPPER. SG provided the PDF code and programming support. DFB participated in study design and aided in the analysis. TLB conceived of the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Furnham, N., de Bakker, P.I., Gore, S. et al. Comparative modelling by restraint-based conformational sampling. BMC Struct Biol 8, 7 (2008). https://doi.org/10.1186/1472-6807-8-7

  • DOI: https://doi.org/10.1186/1472-6807-8-7

Keywords