BMC Structural Biology BioMed Central

Background: Automatic protein modelling pipelines are becoming ever more accurate; this has come hand in hand with an increasingly complicated interplay between all components involved. Nevertheless, there are still potential improvements to be made in template selection, refinement and protein model selection.


Repair algorithm
In the repair algorithm, backbone bonds of non-standard length are first identified and fixed. Second, fragments are created according to the secondary structure prediction, using internal coordinates with standard bond and angle values (IUPAC).
These fragments are spliced into the backbone of the incomplete model and subsequently adjusted using an algorithm based on the cyclic coordinate descent (CCD) algorithm [1]. In this algorithm the Ramachandran Plot [2,3] is used for Φ/Ψ sampling, clashes are prevented and the convergence criterion is based upon structural similarity between the initial and repaired fragment stems. The exact procedure is outlined below.
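The core rotation step of generic CCD can be sketched as follows. This is a minimal illustration of the textbook CCD update only, not the modified algorithm described here: the Ramachandran-based Φ/Ψ sampling, clash prevention and stem-similarity convergence criterion are omitted, and all function names are hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def ccd_angle(origin, axis, moving, targets):
    """Angle about one rotatable bond that minimises the summed squared
    distance between the moving end atoms and their fixed target positions."""
    k = unit(axis)
    a = b = 0.0
    for m, t in zip(moving, targets):
        rel = sub(m, origin)
        along = dot(rel, k)
        # Component of the moving atom perpendicular to the rotation axis.
        r = (rel[0] - along * k[0], rel[1] - along * k[1], rel[2] - along * k[2])
        f = sub(t, origin)
        a += dot(f, r)            # coefficient of cos(angle)
        b += dot(f, cross(k, r))  # coefficient of sin(angle)
    return math.atan2(b, a)

def rotate(point, origin, axis, angle):
    """Rotate a point about the axis through origin (Rodrigues' formula)."""
    k = unit(axis)
    v = sub(point, origin)
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(k, v)
    kv = dot(k, v)
    return tuple(origin[i] + v[i] * c + kxv[i] * s + k[i] * kv * (1 - c)
                 for i in range(3))
```

In a full loop-closure run this step would be repeated over each rotatable backbone bond in turn until the chosen convergence criterion is met.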

Identification of breaks
Various categories of breaks can appear in protein models. Here we broadly distinguish between breaks arising from incorrect peptide bond lengths or deletions from the template (geometrical breaks, II) and those caused by missing residues (sequence breaks, I & III) (Figure S1 a). Geometrical breaks were identified by measuring the peptide bond length between the N and C atoms of adjacent residues.
Missing residues were identified using the residue numbering and the single sequence alignment [4] between the full query sequence and the model sequence.
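A minimal sketch of the two checks, assuming residues are given in chain order as dictionaries with a residue number and backbone N and C coordinates. The ideal C-N bond length of about 1.33 Å is the standard value; the tolerance of 0.1 Å and all names are assumptions for illustration, and the alignment-based check is not shown.

```python
import math

IDEAL_CN = 1.33   # Å, typical peptide C-N bond length
TOLERANCE = 0.1   # Å, hypothetical cut-off, not taken from the text

def find_geometrical_breaks(residues, ideal=IDEAL_CN, tol=TOLERANCE):
    """Return (num_i, num_j, distance) for consecutively numbered residues
    whose C(i)-N(i+1) distance deviates from the ideal peptide bond length."""
    breaks = []
    for prev, nxt in zip(residues, residues[1:]):
        if nxt["num"] != prev["num"] + 1:
            continue  # numbering gap: a sequence break, reported separately
        d = math.dist(prev["C"], nxt["N"])
        if abs(d - ideal) > tol:
            breaks.append((prev["num"], nxt["num"], d))
    return breaks

def find_sequence_breaks(residues):
    """Return lists of missing residue numbers, found from numbering gaps."""
    gaps = []
    for prev, nxt in zip(residues, residues[1:]):
        if nxt["num"] > prev["num"] + 1:
            gaps.append(list(range(prev["num"] + 1, nxt["num"])))
    return gaps
```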
The probability of introducing errors into a protein structure increases with the length of the missing secondary structure elements to be inserted. Longer insertions often hinder the closure of subsequent breaks. Thus, it is advisable to define an explicit order and a maximum length for insertions. The maximum length is set to 15 residues for coil regions and 25 for helical regions. Moreover, we decided to complete the backbone in order from short to long fragments.
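The ordering rule can be sketched as below. The limits of 15 and 25 residues come from the text; the tuple layout, the function name and the choice to pass through secondary structure types that have no stated limit are assumptions.

```python
# Maximum insertion lengths stated in the text.
MAX_LEN = {"coil": 15, "helix": 25}

def plan_insertions(gaps):
    """Split gaps into those to model (sorted short to long) and those skipped.

    Each gap is a (start_residue, length, ss_type) tuple. Types without a
    stated limit are kept unchanged (an assumption of this sketch).
    """
    to_model, skipped = [], []
    for gap in gaps:
        _, length, ss_type = gap
        limit = MAX_LEN.get(ss_type)
        if limit is not None and length > limit:
            skipped.append(gap)
        else:
            to_model.append(gap)
    # sorted() is stable, so gaps of equal length keep their chain order.
    return sorted(to_model, key=lambda g: g[1]), skipped
```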

Closing geometrical breaks
If a geometrical break is within 10 residues of the terminal regions, the whole secondary structure element is remodelled; otherwise the CCD algorithm is applied.
In the initial attempt, flanking residues on either side of the break are included to facilitate closure. If this attempt fails, the number of surrounding residues is expanded stepwise. However, this expansion is not allowed to include residues outside the coil element that contains the break. When a break lies within a helical structure, the whole helix is replaced, remodelled with its correct length and fitted into the original model.
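The stepwise expansion for coil breaks might look like the following sketch. The initial flank of two residues per side and the step of one residue are assumptions, since the text does not give these values; the only rule taken from the text is that the window never leaves the coil element containing the break.

```python
def closure_windows(break_pos, coil_start, coil_end, initial_flank=2, step=1):
    """Yield growing (start, end) residue ranges around a break located
    between residues break_pos and break_pos + 1, clipped to the coil
    element [coil_start, coil_end]. Stops once the window cannot grow."""
    flank = initial_flank
    previous = None
    while True:
        start = max(coil_start, break_pos - flank + 1)
        end = min(coil_end, break_pos + flank)
        if (start, end) == previous:
            return  # the whole coil element is already covered
        yield (start, end)
        previous = (start, end)
        flank += step
```

A caller would try to close the break on each window in turn and stop at the first success.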
In some cases, a geometrical break is proximal to a sequence break. If the two are less than five residues apart, they are realigned, thereby reducing the conformational change required (see Figure S1 b). Although this procedure results in a local realignment, it avoids major rearrangements of the backbone and prevents total failure of the closing algorithm.

Modelling of missing parts
Using PSIPRED [5] secondary structure prediction, broad three-state conformations are assigned to the missing secondary structure elements. Angles, distances and torsion angles comply with the official IUPAC definitions. All residues within a modelled insertion have torsion angles distributed within the highly populated areas of the Ramachandran Plot. These elements are modelled separately and spliced into the initial model. After the insertion of the new secondary structure elements, the resulting breaks in the protein backbone are closed using the CCD algorithm, sampling highly populated Φ/Ψ angle combinations.
If a sequence break is proximal to either terminus, the CCD algorithm is not required and new residues are simply added to the model. The conformation of the addition is checked for backbone clashes, using the coarse backbone clash score [6]. If backbone clashes occur, the terminus is remodelled.
If a sequence break is not proximal to either terminus, a method to maintain the overall topology is required. The CCD algorithm is applied, and a closed conformation is accepted once the SC score between the closed and initial structures is above 0.98.

Server
The modelling pipeline is integrated into the new server 3D-JIGSAW v3.0. The user can create models for a chosen protein sequence in an automatic or an interactive mode. Furthermore, an upload mode is offered, in which users can upload their own models for recombination (see supplementary Figure S4).
For all modes, the five best models are available for download or direct inspection.
The list of templates used, including their function, is returned. The top alignment is given, as well as sequence-based predictions such as PSIPRED or DISOPRED [7]. A plot is also provided (see supplementary Figure S5).
In the interactive mode an intermediate results page is created, where the identified templates are visualised, including the function, sequence identity and coverage (see supplementary Figure S6). Each alignment can be inspected and manually adapted.
Single models can be selected for model building without recombination.
Alternatively, several models can be selected for recombination. The user can choose between three different modes: automatic, interactive and upload. For each mode, the sequence to be modelled must either be pasted into the text field or supplied as a sequence file in FASTA format. For the upload mode, the user must provide a file containing all the models to be recombined, separated by TER tags.
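As an illustration of the upload format, the sketch below splits a multi-model PDB-style text into individual models at TER records. It assumes that every TER line ends one model and that only ATOM and HETATM records are of interest; the server's actual parsing rules are not described in the text, and the function name is hypothetical.

```python
def split_models(text):
    """Split PDB-style text into models, each a list of coordinate lines,
    using TER records as separators (an assumption of this sketch)."""
    models, current = [], []
    for line in text.splitlines():
        if line.startswith("TER"):
            if current:
                models.append(current)
                current = []
        elif line.startswith(("ATOM", "HETATM")):
            current.append(line)
    if current:
        # Keep a final model that is not followed by a TER record.
        models.append(current)
    return models
```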

Figure S5
Final results page.
On the final results page, the sequence predictions, the POPULUS energy profile and up to five different models are given. For each model, an energy score, the coverage and a Ramachandran Plot can be displayed.

Figure S6
Interactive mode results page.
In the interactive mode an intermediate results page is created. Here, the different sequence predictions are available. For each model, the template, the template function, the sequence identity and the sequence coverage are given. The alignments can be adapted manually, and models can be selected either for single modelling or for recombination.