Comparative modelling by restraint-based conformational sampling

Background Although comparative modelling is routinely used to produce three-dimensional models of proteins, very few automated approaches are formulated in a way that allows inclusion of restraints derived from experimental data as well as those from the structures of homologues. Furthermore, proteins are usually described as a single conformer, rather than an ensemble that represents the heterogeneity and inaccuracy of experimentally determined protein structures. Here we address these issues by exploring the application of the restraint-based conformational space search engine, RAPPER, which has previously been developed for rebuilding experimentally defined protein structures and for fitting models to electron density derived from X-ray diffraction analyses. Results A new application of RAPPER for comparative modelling uses positional restraints and knowledge-based sampling to generate models with accuracies comparable to other leading modelling tools. Knowledge-based predictions are based on geometrical features of the homologous templates and rules concerning main-chain and side-chain conformations. By directly changing the restraints derived from available templates we estimate the accuracy limits of the method in comparative modelling. Conclusion The application of RAPPER to comparative modelling provides an effective means of exploring the conformational space available to a target sequence. Enhanced methods for generating positional restraints can greatly improve structure prediction. Generation of an ensemble of solutions that are consistent with both target sequence and knowledge derived from the template structures provides a more appropriate representation of a structural prediction than a single model. By formulating homologous structural information as sets of restraints we can begin to consider how comparative models might be used to inform conformer generation from sparse experimental data.


Background
The three-dimensional (3D) structures of proteins provide valuable insights into their biochemical activities and biological functions. The most widely used experimental methods for determining 3D structures, X-ray crystallography and nuclear magnetic resonance (NMR), have limitations in both time and tractability. For X-ray crystallography sufficient quantities of purified proteins may be difficult to produce and to crystallize when obtained [1]. For NMR, proteins are often too large or insufficiently soluble to be tractable [2]. Nevertheless, genome sequencing projects create a continuing need to translate sequence information into structure [3].
Where experimental methods are problematic, theoretical models can often provide valuable information about the structure of interest. Methods that use physical and chemical properties of amino acids together with information about small fragments of already solved structures have had success with smaller proteins but are still limited in accuracy and reliability [4]. However, knowledge-based methods, such as comparative modelling, which exploit information about amino acid substitutions that accumulate during divergent evolution and are compatible with preserving folded state and function [5], have been the most successful in producing good quality models. Comparative modelling approaches, which can be broadly classified as fragment-based, for example COMPOSER [6], 3D-JIGSAW [7] and SWISS-MODEL [8], or restraint-based, for example MODELLER [9], continue to improve [10]. The latest approaches, for example TASSER, use a combination of threading and restraint optimisation by sampling conformational restraints using Monte Carlo methods [11]. Other protocols use Monte Carlo searches in a reduced space determined by restraints from multiple templates and fragments generated from a consensus of results from a number of modelling programs [12]. Nevertheless, recent CASP exercises [13,14] have demonstrated little significant improvement and have identified empirical limits for knowledge-based protein structure prediction, even when the problem of incorrect alignment has been eliminated [15].
We have previously applied the restraint-based conformational search engine RAPPER [16] to a number of protein modelling problems where partial structural information was available, including ab initio loop modelling [17,18], Cα tracing [19] and modelling into electron density from X-ray crystallographic experiments [20,21]. Here we develop the approach for comparative modelling, focusing on sampling ϕ/ψ torsion angles under spatial restraints derived from knowledge of homologous structures. We assess the limitations of the method by comparing the use of spatial restraints derived from the homologous template structures with the use of restraints derived from experimentally defined structures of targets. We show that significant improvements in model accuracy can be achieved by incorporating additional restraints from main chain curvature and torsion as well as side chain χ angle conservation derived from the structures of homologues. By generating an ensemble of solutions consistent with both the target sequence and template structures we provide a more appropriate representation of the structure. By formulating homologous structural information as sets of restraints we can begin to consider how comparative models might be used to inform conformer generation from sparse experimental data [22].

Results and Discussion
We explored a number of different modes of modelling using RAPPER. The principal differences between these modes lie in the information used from the templates to derive the restraints. In order to minimise problems arising from inaccuracy of sequence alignment, we used structure-based alignments from the HOMSTRAD database [23]. For each of 10 targets, models were generated for one member of the family using four homologues, constructing fifteen models using all possible combinations of four, three, two and one homologue(s) as templates. For comparison, models were also built using the standard modelling mode in MODELLER. This combinatorial approach allowed RAPPER to be parameterised and the performance assessed against a variety of templates of varying sequence identity. In order to assess the usefulness of different restraints described below, models were generated for a greater number of targets using a more limited subset of templates based on percentage sequence identity of target and template. Again, for comparison purposes, models were also generated using MODELLER (see Tables 1, 2, 3, 4 and 5).

Deriving theoretically optimal restraints
As a control, we also modelled the target structure based on the experimental Cα positions using RAPPER (as previously described [19]); this provides an upper bound on the quality of the models obtainable using Cα coordinates alone from the homologous templates. The root-mean-square deviations (RMSDs) of these models from the corresponding experimentally determined structures are similar in magnitude to the experimental variation in solution structures determined by NMR. For example, the solution structure of α-parvalbumin has an all-atom RMSD of 1.02 ± 0.08Å (excluding the first and last five residues) [24]. Models built using the Cα trace mode of RAPPER, guided by Cα atom coordinates derived from experimental structures, have loop regions with up to 1Å RMSD and organised secondary structural elements with up to 0.5-0.6Å RMSD from the parent structure. A significant proportion of the difference may result from different crystal packing in the target structure and that of the homologues used in the modelling [25]. The remainder probably represents errors introduced by (imperfect) restraints from homologous structures.
We calculated the RMSD at each residue in order to identify large local errors which can have an undue influence on the overall RMSD [26]. We calculated other measures such as TM [27], GDT [28] and MaxSub [29] scores as well as the overall RMSD but all failed to identify local regions of inaccuracy in the model. This is illustrated by models for the glycosyl hydrolase family 22 (Ghf22) protein family; Figure 1 shows that the last three residues contribute most to the overall RMSD and this is due to a hook-like conformation of the three C-terminal residues in all available templates, which is not present in the experimental target structure, perhaps due to crystal packing.
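The effect of a few badly placed residues on the overall RMSD can be illustrated with a minimal sketch. It assumes one representative atom per residue (the analysis above is all-atom) and uses invented deviations, not values from this study; the function names are illustrative only.

```python
import math

def per_residue_deviation(model, reference):
    """Distance between equivalent atoms, one representative atom per residue."""
    return [math.dist(a, b) for a, b in zip(model, reference)]

def rmsd(deviations):
    """Root-mean-square of a list of per-residue deviations."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

# Hypothetical profile: 17 well-modelled residues and a 3-residue C-terminal tail.
deviations = [0.5] * 17 + [4.0, 5.0, 6.0]

overall = rmsd(deviations)             # dominated by the tail
without_tail = rmsd(deviations[:-3])   # the 17 core residues only

# Small coordinate example for the per-residue helper.
example = per_residue_deviation([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                                [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)])
```

In this invented profile three residues out of twenty raise the overall value roughly fourfold, which is why a per-residue profile is more informative than a single global score.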
Next, we generated the best possible comparative models but now using optimal spatial restraints for each residue. These target structures were superimposed on those of the templates in order to ascertain, for each residue, which template Cα atom is closest to the target. The coordinates of these atoms were then used as the centres for the restraint spheres in an analogous way to the Cα trace mode of RAPPER previously developed for X-ray refinement [19].
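The selection of optimal restraint centres described above can be sketched as follows. This is a minimal illustration, assuming the structures are already superimposed and aligned residue by residue; the function name and the use of `None` for alignment gaps are assumptions made for the example, not part of RAPPER.

```python
import math

def closest_template_centres(target_ca, templates_ca):
    """For each aligned residue, pick the template C-alpha nearest to the target C-alpha.

    target_ca: list of (x, y, z) tuples for the target.
    templates_ca: one list per template, aligned to the target; None marks a gap.
    Returns a list of (template_index, coordinate), or None where no template
    covers the residue.
    """
    centres = []
    for i, target_xyz in enumerate(target_ca):
        best = None
        for t, template in enumerate(templates_ca):
            xyz = template[i]
            if xyz is None:  # template has a gap at this position
                continue
            d = math.dist(target_xyz, xyz)
            if best is None or d < best[0]:
                best = (d, t, xyz)
        centres.append(None if best is None else (best[1], best[2]))
    return centres

# Invented coordinates for two residues and three templates.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
templates = [
    [(0.5, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), None],
    [(0.0, 1.0, 0.0), (3.8, 0.3, 0.0)],
]
centres = closest_template_centres(target, templates)
```

Because the choice uses the experimental target coordinates, it is only available as a control and cannot be applied to a genuinely unknown structure.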
Models based on restraints derived from the closest available template (defined by percentage sequence identity) are often close in accuracy to those defined by the Cα-trace model based on the actual structure (all-atom RMSD values shown in Table 4). In many cases, the models show a lower all-atom RMSD than the equivalent models produced by MODELLER, but there are the notable exceptions of Phospholipase A2 (Phos) and the Response Regulator Receiver domain (Resp). In the case of Phospholipase A2 this is due to an insertion of a small section of alpha helical secondary structure flanked by two short loop regions not present in any of the templates. With the lack of any other restraint RAPPER tends to generate expanded loops, while MODELLER's molecular dynamics energy function tends to generate a more compact loop. In the Response Regulator Receiver domain, a section of alpha-helical secondary structure has an incorrect orientation due to a slight extension in one of the flanking loop regions. As with Phospholipase A2, RAPPER minimises the contacts in the flanking loop region, which pushes the extension out, resulting in an incorrect orientation of the secondary structural element. For regions which have few short range contacts, RAPPER is provided with few restraints and builds poor models. This might be improved by using secondary structure predictions as restraints to provide a more directed search of the available conformational space.
Table: To explore the effect of the number and diversity of templates on RAPPER building using the Standard mode of conformer generation, fifteen combinations as shown in the table were used. Template 1 was closest by PID to the target with Template 4 the most remote.

Improving on the naïve use of restraints
We explored whether model accuracy can be improved by using multiple templates [30]. We did this by deriving the restraints in three different ways. First, templates were weighted according to their percentage amino acid sequence identity. The radius of the restraint sphere derived from each template was varied in order to influence the frequency of sampling. This provided a significant improvement in the accuracy (see Tables 4, 5 and 6).
Secondly, we incorporated information from two newly developed prediction programs as restraints. The first program, CHORAL [31], calculates the curvature and torsion of the main chain residues for each template. Sequences of residues with similar patterns of curvature and torsion are clustered together and scored against the target sequence using environmentally-constrained substitution tables. CHORAL constructs a set of non-overlapping, structurally conserved clusters, which best represent the main chain of the target. Weighting sections of templates by the CHORAL prediction in this way reduces the influence from inappropriate templates on main chain restraints. The second prediction program, ANDANTE [32], predicts the side chain χ angles from likely conservation of those in structures of homologues. These predictions can be used to limit the rotamer search space by RAPPER. The predictions from CHORAL and ANDANTE are presented to RAPPER as possibilities for each template residue defined by ellipsoidal restraints for Cα and side chain centroids. If no prediction is made, then all of the templates are used to generate the restraint ellipsoids. RAPPER models generated using CHORAL/ANDANTE predictions showed significant improvements in modelling accuracy (see Tables 4 and 5).
Table: Members of the homologous families (given by their HOMSTRAD abbreviation) used to model the target are shown with the target and its PDB code in the first column. The percentage sequence identity and the PDB code of each of the four templates used for modelling are given in the subsequent columns. The four families used to parameterise are indicated by *, while the ten families that were used to explore the effect of PID are indicated by §.
Table: The all-atom RMSD in Ångströms for RAPPER models built in the standard mode for ten families. The combinations of template to target are as indicated in Table 3.2. To provide a fair comparison the percentage of residues used in the RMSD calculation is given in parentheses.
Third, we defined restraints from homologues of known structure as 3-D probability density functions (PDFs), using a local percentage sequence identity calculated over a window of 20 residues. While testing this approach it quickly became obvious that using the standard deviation of the PDF to define the radius of the ellipsoid for the side chain centroid was too restrictive, as it prevented effective exploration of a range of rotamer states. Thus the side chain restraint sphere size was set to a default value. A significant improvement in modelling was seen by using restraints generated with a PDF (P = 0.000061 using a paired means t-test, significant at the 0.01 level). No overall significant improvement was made compared to MODELLER (P = 0.24, not significant at the 0.01 level). There were a few cases where the PDF-derived restraints led to inaccuracies. For example, when building targets in the flavodoxins (Flav) family, a significant increase in all-atom RMSD comes from the PDF being overly influenced by templates with similar local PIDs but significantly different structures (see Figure 2). A similar problem is also observed for the globins (Glob). We had already chosen templates in the relevant functional state, so it was not due to an injudicious choice of templates. For both the flavodoxins and globins the problem arises from differences in conformations, particularly of loops, due to different environments in the crystals.
Table: The all-atom RMSD given in Ångströms for RAPPER models built using A. Cα-trace mode; B. optimal spatial restraints mode based on all templates; C. standard restraint derivation using just the closest template by PID; D. standard restraint derivation using all templates; E. restraints predicted by CHORAL and ANDANTE using all templates; F. restraints based on the PID of the templates using all templates; G. MODELLER based on just the closest templates; and H. MODELLER based on all the templates.

Comparing NMR and comparative ensembles
Although NMR methods have led to the generation of ensembles, X-ray and comparative models have usually been presented as single conformers, though often multiple models are generated during the experimental or modelling process. An ensemble of multiple conformers captures more information, as it allows regions to be identified that are structurally variable, representing the intrinsic dynamics of the target structure or uncertainties in the modelling process. In order to examine this, we compared the ensemble generated by RAPPER for 1pvaa as target, using other structures from the α-parvalbumin family (Parv), to the experimentally determined NMR ensemble of the same protein.
The RAPPER and NMR ensembles, superimposed on the crystallographic model, are shown in Figure 3A. It can be seen that the two ensembles have similar features with respect to compactness and diversity in different regions of the polypeptide chain, with the comparative modelling ensemble closer to the crystal model than the NMR ensemble. In order to gain more insight into this observation, the mode of the distribution of RMSD for the two ensembles was calculated for each residue. The two curves (Figure 3B) are very similar as shown by a correlation coefficient of 0.66 when comparing the first derivative for each curve (Figure 3C). The fact that the RAPPER ensemble is more similar to the crystallographic model than the NMR ensemble can be seen when the all-atom RMSD is calculated for each of the models in the ensemble (Figure 3D). If the all-atom RMSDs are calculated for the two ensemble representative models, the RAPPER representative model is closer to the crystal structure than the equivalent representative model from the NMR ensemble. The representative model is, in the case of RAPPER, the geometric average of the ensemble, while in the case of NMR it is that chosen by the NMR spectroscopist on deposition to the PDB. Furthermore the RAPPER representative model is always much closer to the crystal structure than any of the individual models that make up the ensemble. The wider variability seen in the NMR ensemble may be due to compaction by crystal packing. Also the crystallographic model is a single time and space averaged representation of the protein in question. This representation may be inadequate in fully explaining the experimental data, especially at medium and lower resolutions [33].
Table: The statistical significance of the differences in all-atom RMSD between the modelling methods as reported in Table 3.4. The differences are assessed using a t-test of a paired sample between two means. The truth logic given is for p values of 0.01 and 0.05.
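The comparison of first derivatives used for Figure 3C can be sketched as below. This is an illustrative reading of that analysis, assuming the derivative is approximated by the difference between consecutive residues and the similarity by a Pearson correlation; the profiles are invented and the function names are not taken from RAPPER.

```python
import math

def first_difference(values):
    """Discrete first derivative of a per-residue profile."""
    return [b - a for a, b in zip(values, values[1:])]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented per-residue profiles: the second is the first shifted upwards by 0.7.
rapper_profile = [0.4, 0.6, 1.5, 0.9, 0.5]
nmr_profile = [v + 0.7 for v in rapper_profile]

r = pearson(first_difference(rapper_profile), first_difference(nmr_profile))
```

Working with differences removes any constant offset between the two curves, so the coefficient reflects whether the ensembles become more or less variable at the same positions along the chain.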

Conclusion
The differences between the comparative modelling protocol of RAPPER, the Cα-trace models and, most importantly, modes that use an optimal restraint network based on knowledge of the target structure demonstrate that there is a limit to how reliable a model we could hope to build based solely on homologous templates using RAPPER. Nevertheless, the restraint networks based on differential geometry, pattern recognition and χ angle conservation described here are all shown to be useful approaches to introducing further structural information.
The application of RAPPER to comparative modelling provides an effective means of exploring the conformational space available to a target sequence. The use of different methods for defining restraints from homologous templates shows that better methods for generating positional restraints can greatly improve structure prediction. Generation of an ensemble of solutions that are consistent with both target sequence and knowledge derived from the template structures provides a more appropriate representation of a structural prediction than a single model.
As we have already demonstrated in generating conformers using low resolution X-ray data [21], RAPPER allows the testing of weak hypotheses and speculations about structures where the ratio of observations to parameters is low. For comparative modelling, where restraints derived from distant homologues or regions of divergent structure are often inaccurate, we have now shown that RAPPER can explore conformational space defined by restraints from varying combinations of templates or secondary structure predictions. This suggests that there might be advantage in integrating restraints derived from knowledge of homologous structures with restraints provided by sparse or low resolution experimental data. Thus information from structures of homologues could be of particular use in generating conformers consistent with low resolution X-ray electron density and electron microscopy density, NMR where there are insufficient observations and small angle X-ray scattering (SAXS). We are now investigating such applications, not only with RAPPER but also with RAPPER-TK [34], which can be used to model not only proteins but also other macromolecules and their complexes.
Figure 1. Contribution to Overall RMSD by Individual Residue. The per-residue all-atom RMSD for models generated by RAPPER (solid red) and MODELLER (dotted green) for a target of the Ghf22 family. The greatest contribution to the overall RMSD can be seen to be from the C-terminal residues. If these are excluded from the overall RMSD the recalculated RMSD is the same for both modelling procedures.

Modelling data set
In order to develop and test the approach twenty-four families were chosen from the HOMSTRAD database [23], representing each of the four main SCOP classes (all α, all β, α + β and α/β). For each family five members were chosen based on maximizing the range of the relative percentage identity (PID) based on sequence (calculated by Malform [35]) and ensuring all the solved structures were of relatively high resolution (better than 2Å). One member was designated as the target, with the rest acting as the templates. This allowed fifteen combinations of the templates exploiting one to four homologues as templates, so reflecting information from homologues across the range of PID. The data set was sub-divided into three. The first consisted of four families that were used to define the default parameters. The restraint defaults for main chain and side chain restraint sphere size were chosen by iteratively reducing the radii in a combinatorial manner until RAPPER was unable to generate a model. The second set, comprising a further six families, was used to generate all 15 combinations of template to target. The third set comprises all of the chosen families and was used to test alternative approaches to defining restraints. Table 1 shows the families and their constituent members. The possible combinations of target to templates are given in Table 2.
Each of the combinations, including the target, was structurally aligned using COMPARER [36] and annotated by JOY [37]. The resulting alignments were manually corrected to give the best possible alignment, thus minimising any error from an incorrect alignment.
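The count of fifteen template combinations follows from taking every non-empty subset of four templates (4 + 6 + 4 + 1). A short sketch, with placeholder template labels, makes the enumeration explicit:

```python
from itertools import combinations

def template_combinations(templates):
    """All non-empty subsets of the templates, largest subsets first."""
    combos = []
    for size in range(len(templates), 0, -1):
        combos.extend(combinations(templates, size))
    return combos

# T1..T4 are placeholder labels, ordered from closest to most remote by PID.
combos = template_combinations(["T1", "T2", "T3", "T4"])
print(len(combos))  # 15
```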

Modelling procedure for RAPPER
The application of the conformational search engine RAPPER to comparative modelling by satisfaction of spatial restraints was achieved by extending the restraint engine as described for solving the Cα trace problem [19]. From the given alignment a structural superimposition of equivalent residues is made and optimised. A common core was defined from the set of aligned protein structures as the subset of equivalent residue atoms with relatively little structural variation as defined by the Altman-Gerstein algorithm [38] and implemented in RAPPER. Based on this superimposition and alignment, spatial restraints can then be described for each residue of the target sequence.
Table: The RMSD for models of each of the family targets generated using RAPPER with restraints derived from PDFs, RAPPER with the standard restraint derivation procedure and MODELLER. The RMSDs for MODELLER and Standard RAPPER differ from the equivalent RMSDs in Tables 3 and 4 as they are calculated over the equivalent number of residues for all models in each family. The two families with difficulties in PDF generation are indicated by ‡.
Figure 2. Problems in Deriving PDFs for Flav Family. A. The superimposed templates in gray with the derived centres of the PDFs shown as yellow spheres. Note the divergent loop on the left. The target structure is shown in green. B. The resulting models from different modes of building in RAPPER: RAPPER-PDF in gray, RAPPER-CHORAL in blue, RAPPER-Standard in yellow and MODELLER in pink. The target structure is also shown in green.
Figure 3. Comparison of RAPPER and NMR Ensembles to the Crystallographic Model. Comparison of a RAPPER ensemble of comparative models for the target 1PVA chain A from the Parvalbumin family with an NMR ensemble, the crystal structure and the deposited representative NMR structure. A: The backbone trace of 9 models from the RAPPER ensemble (cyan) generated by comparative modelling on all targets and the equivalent models generated by NMR (blue). Also shown are the deposited crystal structure (red) and the representative NMR single model (orange). All models are superimposed with reference to the crystal structure. B: The plot of ensemble mean and mode for each residue in the RAPPER ensemble. C: The 1st derivative of the per residue ensemble mean for RAPPER (red) and the NMR ensemble (green). D: The all atom per residue RMSD for the RAPPER representative single model (red) compared to the equivalent single NMR representative model (green).
There are four types of spatial restraint:
1 - As RAPPER builds from the N to C termini a bootstrap restraint is required to allow modelling to commence. The bootstrap is defined as the mean position of the Cβ coordinates from the templates, which is made the centre of a restraint sphere, the size of which is user-defined. In building the first two residues a position of the first residue Cβ is taken at a random offset from the mean Cβ coordinate position of the equivalent Cβ of the templates. From this the remaining backbone atom positions can be calculated from the ideal Engh and Huber [39] bond angles and lengths implicit in the RAPPER protein model. A ψ angle is then randomly picked from fine-grained residue-specific ϕ/ψ propensity tables, together with a random angle for the vector between the first and the second Cβ position. Thus the first peptide bond is generated.
2 - A set of spatial restraints is defined for the backbone (main chain) atoms, principally the Cα atoms. Each is defined as an ellipsoid generated from the union of the set of restraint spheres centred on the equivalent atom position from each of the templates, as defined in equation 1. The terms of equation 1 are the position of the Cα atom and the centre of the restraint sphere, which has radius r. The size of these spheres is user defined.
3 - A similar set of spherical restraints can be defined for the side chain atoms, except that, rather than taking each atom separately, a virtual centroid (as defined in equation 2) of the side chain is calculated and this position is used to centre the restraint sphere. In fact two virtual centroid positions are calculated: a short virtual centroid position, which essentially takes into account the atoms up to and including the Cγ position, and a long virtual centroid position, which accounts for the rest of the side chain. In equation 2, N_sc is the number of side chain atoms.
4 - A set of spatial restraints is derived for secondary structure elements. Residues are defined to be in elements of secondary structure from consideration of the consensus across the template structures or from secondary structure prediction. The restraints are a combination of restricted ϕ/ψ sampling of the residue-specific ϕ/ψ propensity tables to the alpha helical or beta sheet regions of ϕ/ψ space and short range hydrogen bonding distance restraints. Only short range hydrogen bonding is enforced, and this primarily in alpha helical regions, although we have now developed algorithms for including more long range restraints (A Karmali and N Furnham, unpublished data).
As well as the specific restraints from homologues, a number of other restraints are also enforced including clash restraints against the framework structure as it is built and distance restraints from ideal bond angles, bond lengths and omega torsion angles. All of the restraints can be propagated along the chain for a user defined distance.
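The main chain and side chain positional restraints listed above can be illustrated with a minimal sketch. It assumes that a position satisfies the main chain restraint if it lies within the radius of at least one template-derived sphere, and that the virtual centroid is the unweighted mean of the side chain atom positions; both function names and all coordinates are hypothetical and do not reproduce RAPPER's implementation.

```python
import math

def satisfies_union_of_spheres(position, centres, radius):
    """True if the position lies inside at least one restraint sphere."""
    return any(math.dist(position, centre) <= radius for centre in centres)

def centroid(atoms):
    """Unweighted mean position of a set of atoms (a virtual centroid)."""
    n = len(atoms)
    return tuple(sum(atom[k] for atom in atoms) / n for k in range(3))

# Two invented template C-alpha positions defining 2 A restraint spheres.
template_centres = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
inside = satisfies_union_of_spheres((1.5, 0.0, 0.0), template_centres, 2.0)
outside = satisfies_union_of_spheres((1.5, 2.0, 0.0), template_centres, 2.0)

# Invented side chain atoms; a real short centroid would use atoms up to C-gamma.
side_chain_centre = centroid([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)])
```

Treating the restraint as a union means that adding a divergent template enlarges the allowed region rather than shifting it, which is one reason remote templates can loosen the search.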
The standard building process in RAPPER as described previously is used [18,19]. Briefly, the algorithm employs a branch and bound protocol to extend iteratively the polypeptide chain in the N to C-terminal direction. A population of 100 fragments that make up the growing polypeptide chain is maintained, with a maximum of 100,000 attempts to find the 100 solutions to the restraint network at each residue position. As some residues are in rare ϕ/ψ conformations this may still be insufficient to sample effectively the ϕ/ψ space. Thus, to optimise the time spent searching the target sequence is split into a number of fragments, avoiding regions where there is no template information available, but otherwise randomly. A population of 50 models is produced for each target.
The geometric average of the model population is calculated in RAPPER. The resultant single model is then regeometrised by TINKER [40]. The protocol is summarised in Figure 4.
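The averaging step can be sketched as follows, assuming the average is a per-atom arithmetic mean over models that are already superimposed and share the same atom order. The function name and coordinates are illustrative only.

```python
def average_model(ensemble):
    """Per-atom mean coordinates over a list of superimposed models.

    ensemble: list of models, each a list of (x, y, z) tuples in the same atom order.
    """
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
        for a in range(n_atoms)
    ]

# Two invented two-atom models.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 0.0, 2.0)],
]
mean_model = average_model(ensemble)
```

Averaging coordinates in this way generally distorts bond lengths and angles wherever the models disagree, which is consistent with the need for a subsequent re-geometrisation step.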
Models were constructed using this standard comparative modelling mode. In each round of building 2Å spheres were enforced for the bootstrap, Cα main chain and side chain restraints. These values were determined from the subset of four families used to parameterise the modelling procedure. This parameterisation was achieved by iterative rounds of building, adjusting each of the parameters in a combinatorial approach, starting from a large value and gradually decreasing in 0.5Å increments until the restraints were too strict for a model to be built. The parameters of the last round in which a model could be successfully generated were taken as optimal.
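The parameterisation loop can be sketched for a single radius as below. This is a simplification under stated assumptions: the procedure described above varies several parameters combinatorially, the starting value of 5Å is invented, and `can_build` is a hypothetical stand-in for a full modelling run that reports success or failure.

```python
def tightest_buildable_radius(can_build, start=5.0, step=0.5):
    """Shrink the radius in fixed steps and return the last value that still built.

    can_build: callable taking a radius and returning True if a model was generated.
    Returns None if even the starting radius fails.
    """
    radius = start
    last_ok = None
    while radius > 0:
        if not can_build(radius):
            break
        last_ok = radius
        radius -= step
    return last_ok

# Toy stand-in: pretend models can be built whenever the radius is at least 2 A.
optimal = tightest_buildable_radius(lambda r: r >= 2.0)
print(optimal)  # 2.0
```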

RAPPER sampling by PID
The results of modelling using all the templates demonstrate that the approach would benefit from restricting the available search area. This can be simply achieved by weighting towards the restraints derived from the template with the highest PID to the target, which is accomplished by reducing, based on the PID of the template to target, the relative size of the restraint spheres. The range of PID across the available templates is calculated and is divided into four equal sub-ranges. If the PID of the template lies in the top quartile then the user defined restraint sphere radius is enforced. If the PID of the template lies in one of the other three quartiles, then the restraint sphere is reduced by a corresponding factor, with the restraint spheres generated from the template whose PID lies in the lowest quartile being reduced by 60%. In addition the sampling frequency of the restraint sphere generated from the template with the highest PID is enhanced.
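The PID-based scaling of restraint spheres can be sketched as follows. The text specifies the full radius for the top sub-range and a 60% reduction for the lowest; the 20% and 40% reductions used for the two intermediate sub-ranges are assumptions made for this example, as is the function name.

```python
def scaled_radii(pids, base_radius, reductions=(0.6, 0.4, 0.2, 0.0)):
    """Scale each template's restraint radius by the sub-range its PID falls in.

    The PID range across templates is split into four equal sub-ranges.
    reductions[0] applies to the lowest sub-range and reductions[3] to the top one;
    the two middle values are assumed, not taken from the text.
    """
    lowest, highest = min(pids), max(pids)
    span = highest - lowest
    radii = []
    for pid in pids:
        if span == 0:
            band = 3  # all templates equally close: keep the full radius
        else:
            band = min(int(4 * (pid - lowest) / span), 3)
        radii.append(base_radius * (1.0 - reductions[band]))
    return radii

# Invented PIDs for four templates with a 2 A user-defined radius.
radii = scaled_radii([25, 40, 55, 85], base_radius=2.0)
```

With these invented PIDs the radii come out close to 0.8, 1.2, 1.6 and 2.0Å, so the most similar template contributes the largest sphere and is therefore sampled most often.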

RAPPER using probability density function derived restraints
More distantly related homologous structures can be exploited if restraints are formulated as probability density functions (PDFs). The position of each atom (or centroid for side chains) can be used to centre a probability function described as a Gaussian distribution, the mean of which is the atom position and the variance of which is set from the local PID taken over a window of 20 residues (equation 3). In equation 3, i is the position in the template sequence, x₁ is the Cα position of the template and σ₁² is inversely proportional to the PID of the template. The sum of the distributions of each of the homologous atom positions is calculated and normalised to generate a PDF (equation 4), in which x is the coordinate in question and t is the template. This is done for each of the x, y and z coordinates. The resulting mean position of the combined PDF is taken as the centre of the restraint sphere, the radius of which can either be user defined or defined by the standard deviation of the new distribution for each coordinate, which can then be used to define an ellipsoid (see Figure 5).
Figure 4. Schematic of RAPPER Conformer Generation Applied to Comparative Modelling.
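The combination of template Gaussians along one coordinate axis can be sketched as below. It assumes that each template contributes a unit-area Gaussian with equal weight and that the variance equals a constant divided by the local PID; the constant `scale` and the function name are hypothetical, and the paper's exact normalisation may differ.

```python
import math

def combine_templates(positions, pids, scale=100.0):
    """Mean and standard deviation of an equal-weight mixture of Gaussians.

    positions: template coordinates along one axis (x, y or z).
    pids: local percentage identity of each template; variance = scale / pid.
    """
    variances = [scale / pid for pid in pids]
    n = len(positions)
    mean = sum(positions) / n
    # Second moment of a mixture: average of (variance + mean^2) over components.
    second_moment = sum(v + p * p for p, v in zip(positions, variances)) / n
    return mean, math.sqrt(second_moment - mean * mean)

# Invented example: two templates at x = 1 and x = 3 with local PIDs of 50% and 25%.
centre, spread = combine_templates([1.0, 3.0], [50.0, 25.0])
```

Repeating the calculation for y and z gives a centre and three standard deviations, which can serve as the semi-axes of a restraint ellipsoid. Under the equal-weight assumption the PID affects only the spread and not the centre, so templates with similar local PID but divergent structure widen the restraint considerably.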

RAPPER using CHORAL/ANDANTE predictions
An alternative approach to defining restraints based upon information from homologous structures can be achieved by taking advantage of the predictions of two programs: CHORAL [31] and ANDANTE [32]. CHORAL, an amalgam of differential geometry and pattern recognition algorithms, identifies the clusters of conformers from homologous templates with conserved curvature and torsion that are most likely to represent the core backbone of the target structure. ANDANTE uses environmental-specific substitution probabilities to predict where χ1, χ1 plus χ2, or χ1 plus χ2 plus χ3 can be directly used from a single template to limit the rotamer search space. Thus, RAPPER uses the equivalent template residue(s) predicted to contribute either to the target's core backbone or side chain conformations to generate the restraint network. For example, if CHORAL predicts that residue i in the target sequence will have similar backbone conformations to the equivalent residues of template 1 and template 2, the Cα atoms of these two templates are used as the centres of the main chain restraint spheres. Similarly, where ANDANTE predicts that the χ1 plus χ2 of template 2 is most likely to be conserved in the target, the short virtual centroid position is used as the centre of the short side chain restraint sphere. RAPPER then builds through this restraint network in the same manner as the standard method for restraint derivation.
For each target, the protocol of the standard comparative modelling procedure is used to produce an ensemble of 50 models; the arithmetic mean is taken and the structure re-geometrised using TINKER [40]. Using CHORAL/ANDANTE predictions allowed tighter restraints of 1 Å radius to be enforced universally for both main chains and side chains. Where CHORAL or ANDANTE did not predict a conformation for a residue (i.e. in a variable loop region, or where no side chain rotamer was predicted), all of the templates were used to generate the restraint network with the larger 2 Å radius. For the backbone, the restraint sphere radius at the interface between the conserved core and a non-conserved region was "funnelled": it gradually increased from 1 Å to 2 Å at the end of a conserved core region and gradually decreased from 2 Å to 1 Å at the beginning of the next. This provided continuity in the main chain restraint network, ensuring that no unrealistic distances had to be satisfied.
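The funnelling of main chain restraint radii can be sketched as follows. This is an illustration only: the linear interpolation and the ramp length of three residues (`ramp`) are assumptions, since the text states only that the radius changes gradually between 1 Å and 2 Å; the function name is hypothetical.

```python
def funnelled_radii(is_core, core_radius=1.0, loop_radius=2.0, ramp=3):
    """Assign a main chain restraint radius (in Å) to every residue.

    is_core: list of booleans, True where a conserved core conformation
             was predicted for the residue.
    ramp: assumed number of core residues over which the radius is ramped.
    """
    n = len(is_core)
    radii = []
    for i, core in enumerate(is_core):
        if not core:
            radii.append(loop_radius)
            continue
        # Sequence separation from the nearest non-core residue, if any.
        dist = min((abs(i - j) for j in range(n) if not is_core[j]), default=None)
        if dist is None or dist > ramp:
            radii.append(core_radius)
        else:
            # dist == 1 is adjacent to the loop and gets the widest core radius.
            frac = (ramp + 1 - dist) / (ramp + 1)
            radii.append(core_radius + frac * (loop_radius - core_radius))
    return radii
```

With the default values, the three core residues flanking a loop receive radii of 1.25, 1.5 and 1.75 Å, so consecutive restraint spheres never change size abruptly.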

Baseline Modelling
In addition to the basic comparative mode of RAPPER, further models were constructed in order to estimate the limitations of the method. For example, we used the Cα trace mode of RAPPER [19] to rebuild the target based on experimentally observed co-ordinates. We also exploited restraints from secondary structure information, using the actual atomic positions of the Cα atoms of the experimentally resolved target to define the restraint network. Alternatively, for each residue the template whose Cα lay closest to that of the target was used to define the restraint, provided that it was consistent with the previous restraint sphere centre, i.e. separated from it by approximately one Cα-Cα bond length.
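The last of these baselines, choosing the template Cα closest to the target while keeping consecutive centres roughly one Cα-Cα distance apart, can be sketched as below. The value of 3.8 Å is the typical distance between consecutive Cα atoms; the tolerance of 0.5 Å, the function name and the handling of residues with no consistent candidate are assumptions made for illustration.

```python
import math

CA_CA_DISTANCE = 3.8  # typical distance (Å) between consecutive Cα atoms


def pick_centres(target_ca, template_ca, tolerance=0.5):
    """Choose one restraint centre per residue.

    target_ca: list of (x, y, z) Cα positions of the target.
    template_ca: for each residue, a list of candidate (x, y, z) Cα
                 positions, one from each aligned template.
    tolerance: assumed allowed deviation (Å) from CA_CA_DISTANCE.
    Returns a list of centres; None where no candidate was consistent.
    """
    centres = []
    for target, candidates in zip(target_ca, template_ca):
        # Try candidates in order of increasing distance from the target Cα.
        ranked = sorted(candidates, key=lambda c: math.dist(c, target))
        chosen = None
        for cand in ranked:
            previous = centres[-1] if centres else None
            if previous is None:
                chosen = cand
                break
            if abs(math.dist(cand, previous) - CA_CA_DISTANCE) <= tolerance:
                chosen = cand
                break
        centres.append(chosen)
    return centres
```

Because the target coordinates are used to rank the candidates, this selection is only meaningful as a baseline for estimating the best result obtainable from the available templates.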

Other modelling programs
The targets were also built using the well-established comparative modelling program MODELLER [41]. Ten models were produced by MODELLER using the standard model-building routine. A single model was selected automatically on the basis of the average of two criteria: minimal energy as calculated by MODELLER and minimal steric violations.

Authors' contributions
NF conducted the data acquisition and processing and developed (with PIWDB) the comparative modelling mode of RAPPER. SG provided the PDF code and programming support. DFB participated in study design and aided in the analysis. TLB conceived of the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.

Figure 5. Centres of PDFs compared to the target. PDFs and target for the Ltn family. The centre of each PDF is shown as a space-filled sphere, with the ribbon trace of the target in red. Note that the size of the sphere does not represent the size of the PDF sphere enforced in RAPPER.