Rappertk: a versatile engine for discrete restraint-based conformational sampling of macromolecules

Background Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography. Results Here we present a software toolkit (Rappertk) which generalizes discrete restraint-based sampling for use in structural biology. Its modular design and multi-layered architecture enable Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. The ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density. Conclusion Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users from .

eled with weighted sampling of knowledge-based preferences, excluded volume restraints and ideal stereochemistry, when further optimized with an all-atom forcefield, have accurately predicted the native conformation [8,9]. The combination of knowledge-based local preferences (for fragments smaller than 10 residues) with nonlocal physical energy terms like hydrophobic burial and hydrogen bonding in a simulated annealing protocol has been effective in protein structure prediction [10], homology modeling [11] and structure determination [12]. Interpretations of crystallographic data of both high [13] and low [14] resolutions have been achieved by combining discrete and continuous approaches.
The promise of this hybrid approach has not yet been fully exploited; for instance, it has not been used to assess conformational ensembles to enhance structure determination with NMR and EM data, to explore the flexibility of ligands, including macromolecules such as RNA, or to examine diversity at macromolecular interfaces. Our approach, encoded in RAPPER [15] (Fig. 1), has been applied successfully to a range of protein modeling problems where restraints have been introduced from knowledge of structures or experimental observations. However, RAPPER is limited in applicability by its inflexibility in molecular representation (proteins only), sampling direction (N to C) and search algorithm (genetic algorithm with branch and bound, GABB). These limitations have to be removed if the idea of discrete restraint-based sampling is to be applied to new problems. We found that this was quite challenging within the RAPPER codebase (> 30,000 lines of C++ code).
In this paper we describe an alternative framework, Rappertk, which (a) programmatically decouples logically distinct concepts such as search algorithms, knowledge-based conformational preferences, sampling and building techniques and (b) provides access to them through a scripting language. The former reduces development time by allowing modules to be treated in isolation, e.g. RNA sampling and building can be implemented independently of GABB. The latter speeds up the process of adapting the software to new scenarios, say by coding high-level tasks like parsing and file manipulation in the scripting language. We show that both impact scientific productivity by allowing faster application of discrete restraint-based sampling to new problems. Analogous to MD software packages, which provide a platform to run MD/minimization schedules, Rappertk provides a platform for discrete restraint-based sampling and reproduces RAPPER functionality for proteins as a special case. The following sections describe the design, implementation and benchmarking of Rappertk. We demonstrate that Rappertk is a flexible, robust and easy-to-use software library which generalizes and builds upon the major concepts of the RAPPER methodology in a modular, multi-layered fashion.
Fig. 2 shows a typical step in RAPPER-like incremental sampling. This involves three distinct steps: sampling of dihedral angles ϕ, ψ, ω; building coordinates for the next peptide using those of the previous one; and checking the Cα positional restraint. This suggests the concepts of sampler, builder and restraint. RAPPER maintains a population of conformers and executes these steps repeatedly on them according to GABB. This can be abstracted as a search strategy which is responsible for the correct ordering and execution of samplers, builders and restraints. In the modular, layered design of Rappertk (Fig. 3), application scripts reside at a level higher than search strategies: they carry out the task of preprocessing, creating the necessary builders, samplers and restraints for the problem at hand, and passing them to the appropriate strategy.

Implementation
We have chosen a C++/SWIG/PYTHON style of coding, whereby the interface of the C++ code is exposed in PYTHON by generating suitable wrappers automatically with SWIG. Such an architecture has become popular in academic software (e.g. Xplor-NIH [16]) as it provides robustness without losing the fluidity needed in academic implementations. We now describe the major concepts in more detail.

Coordinates
Different sets of coordinates need to be maintained in order to allow for sets of conformers, either for population-based searches or for using ensemble-averaged restraints. Some coordinates are known and fixed, e.g. secondary structure elements in a loop building exercise. Each point has an associated hard-sphere (van der Waals) radius adapted from those used in PROBE [17]. A high-level application script generates the coordinates. Builders and restraints operate upon specified indices in given coordinates.

Samplers
A sampler chooses a datum from an underlying distribution of conformational preferences by random weighted sampling. Well-known examples are weighted ϕ, ψ sampling for protein backbone [18], RNA backbone [4] and sidechain rotamer sampling [6], all derived from high quality crystallographic structures. New types of sampling can be easily incorporated by writing a new sampler for the corresponding builder, say tri-ϕ, ψ sampler for tripeptide fragments, substructure sampler etc.
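The idea of random weighted sampling can be sketched in a few lines of PYTHON. This is only an illustration, not Rappertk's sampler interface: the class name `WeightedSampler`, the three (ϕ, ψ) bins and their weights are all made up for the example.

```python
import random


class WeightedSampler:
    """Draws discrete conformational states in proportion to their weights."""

    def __init__(self, states, weights, seed=None):
        if not states or len(states) != len(weights):
            raise ValueError("states and weights must be non-empty and equal in length")
        self.states = list(states)
        self.weights = list(weights)
        self.rng = random.Random(seed)

    def sample(self):
        # random.choices performs weighted sampling with replacement.
        return self.rng.choices(self.states, weights=self.weights, k=1)[0]


# Hypothetical (phi, psi) bins in degrees with made-up propensities.
phi_psi_states = [(-60, -45), (-120, 130), (60, 45)]
phi_psi_weights = [0.55, 0.40, 0.05]

sampler = WeightedSampler(phi_psi_states, phi_psi_weights, seed=0)
draws = [sampler.sample() for _ in range(1000)]
```

A sampler for another builder (e.g. sidechain rotamers) would differ only in the states and weights it is given.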

Restraints
Values of various geometric entities are useful in constraining the conformational space, e.g. internuclear distances derived from NMR NOEs, electron density from X-ray analyses, Cα positional information from templates in comparative modelling, and so on. A restraint object holds the information of points on which it is to be tested, and the method of testing. A restraint is generally binary; it is either satisfied or violated. A restraint can also be optional, i.e. it can be discarded if sampling consistently fails due to that restraint.
Figure 1. The central search algorithm in RAPPER is a blend of genetic algorithm and branch-and-bound approach (GABB). Red nodes represent restraint-violating conformation extensions and green nodes stand for the restraint-obeying ones. Some conformational extensions may be left unsampled (not shown). Only subtrees emanating from green nodes are explored further. The set of green nodes at each level is kept below a fixed size (population size), and this allows conformational exploration in time proportional to protein length, leading to an ensemble of restraint-satisfying conformations.
Figure 2. Basic concepts in Rappertk. A sampler samples discrete conformational preferences (e.g. ϕ, ψ, ω). A builder uses the sample and calculates a set of unknown coordinates from a set of known coordinates (e.g. peptide building). A restraint checks that the calculated coordinates satisfy some geometric criterion (e.g. whether the calculated Cα coordinate lies within a spherical region).
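A binary, optionally discardable restraint can be illustrated as follows. The sketch assumes a spherical positional restraint of the kind used for Cα atoms; `SphericalRestraint` and `all_satisfied` are hypothetical names and do not reproduce the Rappertk classes.

```python
import math


class SphericalRestraint:
    """Binary restraint: the point at `index` must lie within `radius` of `centre`."""

    def __init__(self, index, centre, radius, optional=False):
        self.index = index
        self.centre = centre
        self.radius = radius
        self.optional = optional  # optional restraints may be discarded by a strategy

    def satisfied(self, coords):
        return math.dist(coords[self.index], self.centre) <= self.radius


def all_satisfied(restraints, coords, skip_optional=False):
    """True if every restraint holds; optional ones can be skipped on request."""
    return all(
        r.satisfied(coords)
        for r in restraints
        if not (skip_optional and r.optional)
    )


# Two hypothetical points and restraints (coordinates in Angstrom).
coords = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0)}
restraints = [
    SphericalRestraint(0, (0.5, 0.0, 0.0), radius=1.0),
    SphericalRestraint(1, (3.8, 2.0, 0.0), radius=1.0, optional=True),
]
strict = all_satisfied(restraints, coords)                       # second restraint is violated
relaxed = all_satisfied(restraints, coords, skip_optional=True)  # violated restraint is dropped
```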

Builders and followers
A builder consists of the indices of coordinates it uses and those it calculates, along with the calculation technique.
For instance, the ϕ, ψ-based peptide backbone builder } coordinates are used by a backbone-dependent sidechain builder; and so on. Thus builder is an abstraction of coordinate calculations operating on input and output indices within a coordinate set. A builder may have an empty input set or may have only known coordinates as inputs, in which case it is called a seed builder (e.g. peptide N terminal anchor builder). There is a maximum number of trials a builder can undertake to extend a conformation; this will depend upon the conformational space available to sample. In order to avoid futile sampling, the builder may implement a session in which only unique samples are used, thus improving sampling diversity. Follower is a concept specific to population-based searches. A builder is another's follower if it is advisable to execute it in the same population-search step as the leading builder. This was an improvisation first used during C α -trace scripting to build sidechain immediately after the relevant mainchain.
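The notion of a sampling session that uses only unique samples, bounded by a maximum number of trials, can be sketched as weighted sampling without replacement. The function name `unique_samples`, the rotamer labels and the weights are illustrative assumptions, not part of Rappertk.

```python
import random


def unique_samples(states, weights, max_trials, rng):
    """Yield up to max_trials weighted samples, never repeating a state.

    Weights are assumed to be strictly positive.
    """
    remaining = list(zip(states, weights))
    for _ in range(min(max_trials, len(remaining))):
        idx = rng.choices(
            range(len(remaining)),
            weights=[w for _, w in remaining],
            k=1,
        )[0]
        # Removing the chosen state guarantees it is not tried again.
        yield remaining.pop(idx)[0]


rotamers = ["g+", "t", "g-"]   # hypothetical chi1 rotamer labels
weights = [0.5, 0.3, 0.2]      # made-up propensities
trial_order = list(unique_samples(rotamers, weights, max_trials=5, rng=random.Random(1)))
```

Because each state is tried at most once, a builder with a small discrete state space stops as soon as the space is exhausted rather than repeating failed samples.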

Sampling Strategy
The sampling strategy orchestrates the builders and restraints systematically to generate conformations. The sampling strategy can be divided into ordering and execution of restraints and builders. Automatic ordering allows the application script to create builders and restraints in any convenient order. Because strategies are coded in PYTHON, it is easy to write a new strategy.

Ordering of builders and restraints
Figure 3. Rappertk architecture - modular and layered. Modules written in C++ are exposed to PYTHON using automatic wrappers generated with SWIG.
A correct strategy must calculate the order of execution for builders and restraints. There is a partial ordering induced on builders due to their input and output coordinates, i.e. a builder may not be executed unless its input coordinates have been computed, except for seed builders. Thus there is a digraph of builders, with possibly many seed builders and others depending on one or more builders. Restraints can be checked only after all coordinates to be tested have been computed, hence there is restraint-builder dependence. An efficient strategy must test a restraint as early as possible in order to avoid sampling the disallowed conformational space. Once a builder succeeds or fails in its task, an efficient sampling strategy must use the builder dependence digraph to identify the builder to be attempted next. The strategy currently implemented in Rappertk determines the builder order by topologically sorting the builder digraph, more specifically as follows:
• In case of multiple seed (parentless) builders, a dummy builder is assumed to be their parent. A procedure similar to DFS (depth first search) is used to assign unique parents to all nodes, i.e. convert the digraph into a tree. A node appears as child of another node only if the latter is the only unvisited parent of the former.
• The size of subtree rooted at each node is found.
• Using DFS again, an order is established for the nodes. When a node is popped off the DFS stack, its children are pushed onto the stack in the ascending order of subtree sizes.
• The order thus obtained is the final ordering used by the default strategy. If a builder fails, its unique parent builder may be executed, and the results of the parent and all its children discarded. If a builder succeeds, the builder next in order may be called.
• From this builder order, restraints are identified for each builder such that they have all the necessary points computed after the builder. Thus every builder has associated restraints to check after it is executed.
As an illustration, consider conformational sampling of a three-residue peptide (see Fig. 4) under Cα spherical positional restraints. Four kinds of builders are employed. NanchorBuilder uses the first two Cα restraints to anchor the peptide. A backbone-dependent sidechain builder is used for sampling sidechains. Since this builder also requires parts of the backbone from adjacent residues, two dummy Gly residues are added, one each at the beginning and end of the tripeptide. PeptideBuilder is used to build peptides in the forward and backward directions. NanchorBuilder is the seed builder as it has no input points. Reverse PeptideBuilder, PeptideBuilder-1 and SidechainBuilder-1 depend on it because their input coordinates are partly or completely contained in its output coordinates. Similarly, SidechainBuilder-3 depends upon PeptideBuilder-1 and PeptideBuilder-2. Restraints CARestraint-1 and 2 depend upon PeptideBuilder-1 and 2. From these dependencies, a directed graph can be constructed with builders and restraints as nodes. A topological sort on this graph produces a linear order of the builders, which suggests the builder to be tried after a successful (restraint-satisfying) builder. The backward ordering (or fallback ordering) determines the builder to be called after an unsuccessful (not satisfying restraints) builder.
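The dependence between builders and restraints can be made concrete with a small sketch. Note that it uses a plain Kahn-style topological sort with alphabetical tie-breaking, not the tree-based DFS procedure listed above, and that the coordinate labels (`bb0` ... `bb4`, `sc1` ... `sc3`) and the exact inputs and outputs of each builder are assumptions loosely modelled on the tripeptide example.

```python
import heapq


def builder_order(builders):
    """builders: dict name -> (input_labels, output_labels).

    Returns a linear order in which every builder comes after all builders
    that produce its inputs. Inputs produced by no builder count as known.
    """
    producer = {}
    for name, (_, outputs) in builders.items():
        for label in outputs:
            producer[label] = name
    parents = {
        name: {producer[i] for i in inputs if i in producer} - {name}
        for name, (inputs, _) in builders.items()
    }
    pending = {name: len(ps) for name, ps in parents.items()}
    ready = [name for name, count in pending.items() if count == 0]  # seed builders
    heapq.heapify(ready)  # the heap breaks ties alphabetically
    order = []
    while ready:
        current = heapq.heappop(ready)
        order.append(current)
        for name, ps in parents.items():
            if current in ps:
                pending[name] -= 1
                if pending[name] == 0:
                    heapq.heappush(ready, name)
    if len(order) != len(builders):
        raise ValueError("builder dependencies contain a cycle")
    return order


def attach_restraints(order, builders, restraints):
    """Attach each restraint to the earliest builder after which all the
    labels it tests have been computed."""
    computed = set()
    attached = {name: [] for name in order}
    todo = dict(restraints)
    for name in order:
        computed |= set(builders[name][1])
        for r in sorted(todo):
            if set(todo[r]) <= computed:
                attached[name].append(r)
                del todo[r]
    return attached


# Hypothetical inputs/outputs; "bbN" = backbone of residue N, "scN" = sidechain N.
builders = {
    "NanchorBuilder": ((), ("bb1", "bb2")),
    "ReversePeptideBuilder": (("bb1", "bb2"), ("bb0",)),
    "PeptideBuilder-1": (("bb1", "bb2"), ("bb3",)),
    "PeptideBuilder-2": (("bb2", "bb3"), ("bb4",)),
    "SidechainBuilder-1": (("bb0", "bb1", "bb2"), ("sc1",)),
    "SidechainBuilder-2": (("bb1", "bb2", "bb3"), ("sc2",)),
    "SidechainBuilder-3": (("bb2", "bb3", "bb4"), ("sc3",)),
}
restraints = {"CARestraint-1": ("bb3",), "CARestraint-2": ("bb4",)}

order = builder_order(builders)
attached = attach_restraints(order, builders, restraints)
```

Attaching each restraint to the earliest possible builder reflects the requirement that restraints be tested as early as possible.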

Execution of builders and restraints
Once the ordering among builders and restraints is established, various search strategies can be used to sample conformational space. The simplest is an exhaustive search, where each restraint-satisfying option available to a builder is explored. RAPPER uses the PopulationSearch algorithm (GABB), as mentioned earlier. GABB limits the number of restraints to be checked at every extension step and provides a pool of fit parents to build upon. Each parent is allowed to contribute more than one child, and parents compete to put their children in the children pool. In addition to PopulationStrategy, Rappertk provides a minor variation which allows limited backtracking (using the fallbacks described earlier). The number and size of backtrack steps can be specified. In cases where the parents are not extensible at a certain step, the population search is restarted some steps earlier, as determined by the number and size of backtrack steps specified. This saves the cost of starting from the first step in case of failure at an advanced step.
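A toy version of a population-based search conveys the idea. The sketch below grows 2D chains of unit steps under spherical positional restraints and caps the children pool at a fixed population size; it omits backtracking, and the function name, the 2D geometry and all parameter values are assumptions made for illustration rather than a description of GABB as implemented.

```python
import math
import random


def population_search(targets, radius, angles, population_size, trials_per_parent, rng):
    """Grow 2D chains one point at a time, keeping at most population_size
    restraint-satisfying extensions per step. Returns [] on failure."""
    population = [[targets[0]]]  # seed: chain anchored at the first target
    for target in targets[1:]:
        children = []
        for parent in population:
            x, y = parent[-1]
            for _ in range(trials_per_parent):
                a = math.radians(rng.choice(angles))  # discrete "sampler"
                point = (x + math.cos(a), y + math.sin(a))  # unit-step "builder"
                if math.dist(point, target) <= radius:  # binary positional restraint
                    children.append(parent + [point])
        if not children:
            return []  # no parent could be extended at this step
        rng.shuffle(children)  # parents compete for places in the pool
        population = children[:population_size]
    return population


targets = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
angles = list(range(-90, 91, 30))  # allowed step directions in degrees
models = population_search(targets, 0.6, angles, 10, 20, random.Random(0))
```

Because the pool size is fixed, the work per step stays bounded regardless of chain length.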

Spatial grid for checking clashes
Steric clashes are a very important restraint on conformational freedom. Hence the output of every builder is verified with a 3D grid that uses geometric caching to check the clashes efficiently. A GridHelper is provided to the grid to modify clash-checking functionality according to the application requirements. For instance, in atomic models, first and second covalent neighbours of an atom need not be clash-checked, the van der Waals radius of sidechain atoms needs to be reduced due to discrete sidechain rotameric states, etc.
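A minimal sketch of grid-based clash checking is given below. It assumes a uniform hash grid whose cell size is at least the largest sum of two radii, so that only the 27 neighbouring cells need to be examined. `ClashGrid`, its methods, and the `excluded` and `scale` arguments (which stand in for the kind of customisation a GridHelper provides) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


class ClashGrid:
    """Uniform 3D grid; cell_size must be >= the largest sum of two radii."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # cell index -> [(atom_id, xyz, radius)]

    def _cell(self, xyz):
        return tuple(math.floor(c / self.cell_size) for c in xyz)

    def add(self, atom_id, xyz, radius):
        self.cells[self._cell(xyz)].append((atom_id, xyz, radius))

    def clashes(self, atom_id, xyz, radius, excluded=frozenset(), scale=1.0):
        """Return ids of stored atoms overlapping the query atom.

        excluded: ids that are never checked (e.g. covalent neighbours).
        scale: factor applied to the sum of radii (e.g. to soften contacts).
        """
        cx, cy, cz = self._cell(xyz)
        found = []
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for other_id, other_xyz, other_radius in self.cells.get(
                (cx + dx, cy + dy, cz + dz), ()
            ):
                if other_id == atom_id or other_id in excluded:
                    continue
                if math.dist(xyz, other_xyz) < scale * (radius + other_radius):
                    found.append(other_id)
        return found


grid = ClashGrid(cell_size=4.0)
grid.add("CA1", (0.0, 0.0, 0.0), 1.9)
grid.add("O5", (3.0, 0.0, 0.0), 1.5)
grid.add("N9", (7.0, 0.0, 0.0), 1.7)

# Query a new atom bonded to CA1; its covalent neighbour is excluded.
hits = grid.clashes("C1", (1.0, 0.0, 0.0), 1.8, excluded={"CA1"})
soft_hits = grid.clashes("C1", (1.0, 0.0, 0.0), 1.8, excluded={"CA1"}, scale=0.5)
```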

PDB reader, model renderer etc
The i/o functionality is written in PYTHON. PDB reader is largely adapted from a previous work [19]. ModelRenderer is currently a PDB writer, but can be extended to write models in other formats too. ModelRenderer is invoked by the strategy when it succeeds in sampling a conformation within given restraints.

Application scripts
Application scripts are high-level PYTHON scripts which generate problem-specific context by preprocessing given information and creating necessary Rappertk components to be used by the search strategy. They can be invoked as execution modes from Rappertk launcher script. Application scripts are assisted by various utility scripts like the one for creating a standard set of builders and restraints.
Figure 4. Automatic ordering of builders and restraints involved in sampling a three-residue peptide.

Benchmarking
As tracing a polypeptide chain is central to all the tasks performed by RAPPER, we compare Rappertk's performance at chain tracing with that of RAPPER for 9 large (> 300 residues) proteins from the [20] benchmark set (see Table 1). Ensembles of 50 models were generated and the average RMSD values for the ensembles calculated. If the conformational search could not generate a model within 24 hours of computational time, the search was considered unsuccessful. Mainchain-only models were generated using C α restraint radii of 0.5, 1, 1.5 and 2Å. The C α restraint threshold defines the radius of the sphere within which the C α atom of the modelled residue is restrained to lie. The centre of this sphere is given by the native C α position. All-atom models were generated under C α restraint of 1Å and 2Å restraint on the centroid of the sidechain atoms. The van der Waals radii were reduced by 25% to compensate for the fact that only specific sidechain rotamers were allowed. Sidechain centroid restraint places and orients the side chain atoms with respect to the mainchains and affects bulky side chains more than the smaller sidechain groups.
Rappertk can trace either from the N to the C terminus (forward) or from the C to the N terminus (backward), with and without sidechains, in guided or standard sampling modes. Standard sampling is RAPPER-like ϕ, ψ sampling which is unaware of the Cα restraint to be satisfied. Such sampling can be the bottleneck when restraints are tight or only a small portion of the restraint sphere is reachable geometrically. Hence we have also incorporated guided sampling, in which the sampler is aware of the restraint and produces samples within that restraint. As shown in Fig. 5, the location of Cα_i can be described by specifying the locations of C_i-2, N_i-1 and Cα_i-1 along with either {r, α, θ} or {ϕ, ψ, ω}. Thus the restraint sphere is sampled spatially by the guided sampler to obtain r, α, θ samples. Corresponding ϕ, ψ, ω values are found using a precalculated mapping from r, α, θ to allowed ϕ, ψ, ω. Since this mapping is one-to-many, a random sample is taken from the available ϕ, ψ, ω values. Such sampling ensures that the restraint sphere is sampled efficiently while still using ϕ, ψ values from the allowed region of the Ramachandran plot.

Mainchain modelling
In addition to comparing the mainchain modelling accuracy between RAPPER and Rappertk in standard forward mode, models were built in the backward (C to N) mode in order to check whether the performance varies. Table 2 shows the model accuracy under a spherical positional restraint of radius 1Å on Cα atoms. The similar values of mainchain and Cα RMSDs obtained demonstrate that the performance of Rappertk is comparable to that of RAPPER and consistent across the whole target set. The low standard deviation values within each ensemble show that all three approaches produce tight clusters containing models that are all equally acceptable. Larger restraint radii result in looser restraints and give models that deviate further from the native structures. RMSD values in Table 3 demonstrate that both RAPPER and Rappertk perform equally well under different Cα restraint thresholds. For the restraint radius of 0.5Å both RAPPER and Rappertk failed to find complete ensembles for proteins 4enl, 8abp and 8tln. For 8tln, the conformational search repeatedly failed at Leu-133. Since the conformational search builds one residue at a time, slight errors introduced earlier can sometimes make it difficult to find a suitable conformation for a residue, causing repeated failures at the same position. This limitation can be circumvented by building the peptide chain in the reverse direction. Using backward building for 8tln, 5 models could be found having an average mainchain RMSD of 0.41Å (0.01) and a Cα RMSD of 0.35Å (0.01). Models for proteins 8abp and 4enl were built using the guided sampling mode in Rappertk.

All atom modelling
As can be seen from Table 4, the model accuracies for RAPPER and Rappertk are comparable and do not vary signifi-

Computational Cost and Quality Check
Model quality was assessed using PROCHECK [21]. All structures have mainchain bond lengths and angles within the limits of the standard deviation of their small-molecule values and also have good sidechain stereochemistry. Computational cost scales with the size of the restraint sphere used to generate the models. As can be seen from Fig. 6, the computational cost for Rappertk is less than that for RAPPER for restraint radii of 0.5, 1 and 1.5Å. The average time taken by Rappertk under a Cα restraint of 2Å is only slightly higher, at 37 s/model, compared to RAPPER, which takes 32 s/model. There is a visi-
Figure 5. Guided sampling. The location of Cα_i can be described by specifying the locations of C_i-2, N_i-1, Cα_i-1 along with {r, α, θ} or {ϕ, ψ, ω}. This leads to a one-to-many mapping between {r, α, θ} and {ϕ, ψ, ω}.

Illustrations
We now describe the use of Rappertk to carry out some new sampling tasks.

Protein-ligand interface sampling in electron density
Protein ligand interactions are central to understanding the roles of ligands as well as the mechanisms of enzymes.
The approximate location of a ligand is often known but small ligands often have poor electron density. This scenario is suitable for automatically fitting various ligand conformations into the density with Rappertk, thus creating an ensemble for further refinement. From a recent paper on automatic modeling of ligands [22], we chose a medium resolution (2.6Å) structure (1di9) of p38 kinase in complex with a quinazoline ligand.
In order to describe the degrees of freedom in a ligand, a file format was devised. It describes the ligand's bootstrapping (init lines), rotatable bonds (rotbond lines) and internal distance restraints (mindist lines). Builders and restraints are created using the information given in this file. Covalent bond lengths and angles are not altered from the initial coordinates given as input.
Depending on ligand proximity, small sections of protein chains are identified and sampled using a loop sampling and closure procedure. Loop closure samples the location of Cα_i given the locations of Cα_i-1 and Cα_i+1 as shown in Fig. 8. The sidechain centroid restraint and Cα restraint are lenient close to the ligand.
Table footnotes: c Percentage of side chains with χ1 > 40° of the equivalent χ1 in the crystal structure, averaged over all proteins in target set. d Percentage of side chains with χ1,2 > 40° of the equivalent χ1,2 in the crystal structure, averaged over all proteins in target set.
Electron density restraints are employed using the excellent Clipper libraries [23] for crystallographic computations. The deposited PDB structure is used to phase the structure factor amplitudes and to obtain an electron density map. EDrestraint is satisfied by builder outputs which lie in reasonable density (> 0.25 σ) and have good mean density (> 1 σ). EDrestraints are optional except for the ligand. EDrestraints operate on the output of each builder.
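The acceptance test described above can be written as a short function. In this sketch the density values at the atoms built by one builder are assumed to have been interpolated from the map beforehand (the map handling itself, done with Clipper in Rappertk, is not shown); the function name and its argument names are hypothetical.

```python
def ed_restraint_satisfied(densities, min_each=0.25, min_mean=1.0):
    """densities: map values, in sigma units, at the atoms produced by one builder.

    Satisfied when every atom lies in density above min_each and the mean
    density over the atoms exceeds min_mean.
    """
    if not densities:
        raise ValueError("at least one density value is required")
    mean = sum(densities) / len(densities)
    return min(densities) > min_each and mean > min_mean


# Made-up density values (sigma) for three hypothetical builder outputs.
well_fitted = ed_restraint_satisfied([1.2, 0.9, 1.5])
one_atom_out = ed_restraint_satisfied([1.2, 0.1, 2.5])
weak_overall = ed_restraint_satisfied([0.5, 0.6, 0.7])
```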
This scheme of flexible-protein flexible-ligand yields an ensemble of protein-ligand interface conformations which are consistent with the expected degrees of freedom of ligand, electron density, hard-sphere clash restraints and covalent geometry of the protein (Fig. 9). Further refinement and ensemble interpretation will be addressed in future work. Apart from crystallographic application, such sampling can be used by small molecule docking programs also to generate trial conformers of the ligand and protein.

Protein-RNA interface sampling
Although RNA conformational preferences are harder to identify due to the much larger conformational space (7 backbone dihedral angles), recent analysis has revealed the rotameric nature of the RNA backbone [4]. We use these preferences to extend the RNA chain as shown in Fig. 10. Bootstrapping copies the initial few atoms from the given structure to the region specified by restraints on them. Incremental building of the RNA chain is done by RNAsuiteBuilder, which depends on atoms {C5*, C4*, C3*} and builds atoms {O3*, P, O1P, O2P, O5*, C5*, C4*, C3*} along with the sugar and base.
In this illustration (Fig. 11), we choose protein chain A and RNA chain E from a recently solved protein-RNA complex (helicase-core region of Vasa bound to a single-stranded RNA [24]). We identify sections of the protein chain in close proximity to the RNA. These sections are later sampled as loops with loop closure and restrained with Cα and sidechain centroid positional restraints. The RNA bootstrap builder regards the {C5*, C4*, C3*, P, O1P, O2P, O5*} atoms of the first nucleotide as a rigid body and translates/rotates it so that the C5*, C4*, C3* atoms are within 2Å of their native positions. During incremental building, the C3* atom is restrained to lie within 2Å of the native C3* atom. As before, the deposited PDB structure is used to phase the deposited structure factor amplitudes, and builders are restrained to build within a mean electron density of 1 σ.
Figure 6. Computational cost scales as a function of Cα restraint radius for RAPPER (squares), Rappertk (diamonds) and Rappertk using backward building (triangles). 5cpa is excluded.
Figure 7. Computational cost for all-atom modelling across target set. The average time required to build a successful model is shown for RAPPER (diamonds), Rappertk (squares) and Rappertk with guided sampling (triangles).
Generation of multiple conformations of protein-RNA interface with Rappertk can be useful in deriving multiple interpretations permitted by the crystallographic data. Interface diversity thus assessed may lead to novel insights into function. This issue will be addressed in detail in a future study.

Sampling β sheets
In low-resolution crystallographic or EM data, salient features of the structure (β-sheet or α-helix) are more detectable than the terminal regions or loops, making it desirable to start building a model at such features. α-helices are easier to sample than β-sheets because hydrogen bond restraints in helices are sequential, unlike those in sheets. Hence sequential sampling is inefficient for the later strands in a sheet. As Rappertk is not restricted to sequential sampling, a β-hairpin can be built as shown in Fig. 12. The builder order is zigzag and helps in maintaining strict hydrogen bond geometry (O-N distance between 1.5Å and 3.5Å, C-O-N angle > 100°). We observed that this builder order is more efficient in sampling the hairpin under positional and hydrogen bonding restraints than the simple sequential order.
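The hydrogen bond geometry criterion can be expressed as a simple test on three atomic positions. The sketch reads the criterion as an O-N distance between 1.5Å and 3.5Å together with a C-O-N angle above 100°; the function names and example coordinates are illustrative only.

```python
import math


def angle_deg(a, b, c):
    """Angle at b, in degrees, formed by points a-b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def hbond_ok(c_atom, o_atom, n_atom, d_min=1.5, d_max=3.5, min_angle=100.0):
    """True if the O-N distance and the C-O-N angle are both acceptable."""
    distance = math.dist(o_atom, n_atom)
    return d_min <= distance <= d_max and angle_deg(c_atom, o_atom, n_atom) > min_angle


carbonyl_c = (0.0, 0.0, 0.0)
carbonyl_o = (1.23, 0.0, 0.0)

near_linear = hbond_ok(carbonyl_c, carbonyl_o, (4.1, 0.3, 0.0))  # ~2.9 A, ~174 degrees
bent = hbond_ok(carbonyl_c, carbonyl_o, (1.23, 2.9, 0.0))        # 2.9 A, 90 degrees
too_far = hbond_ok(carbonyl_c, carbonyl_o, (6.0, 0.0, 0.0))      # 4.77 A
```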
Rappertk extends this scheme of sampling β-sheets to parallel sheets and arbitrarily many strands (see Fig. 13). A strand may be linked to other strands in both parallel and antiparallel arrangements, e.g. in a 3-stranded sheet with corresponding residue positions (...i - 1, i, i + 1...), (...j - 1, j, j + 1...) and (...k + 1, k, k - 1...), residue j is involved in (N_j, O_i-1), (O_j, N_i+1) while residue j + 1 forms hydrogen bonds (N_j+1, O_k-1), (O_j+1, N_k-1); this pattern repeats every alternate residue. This scheme is used in the next example.
Figure 10. Rappertk procedure for RNA sampling. The bootstrap builder assigns the location of a few initial atoms of the first nucleotide by rigid-body transformation from the given structure, satisfying the positional restraints specified on the C5*, C4*, C3* atoms. Chain extension is carried out by RNAbuilder according to the backbone dihedrals sampled by RNAsampler (using the rotamericity described in [4]). RNAbuilder also builds the corresponding sugar and base.

All-atom model generation from approximate secondary structure information and particle shape
Techniques like EM and SAXS are valued for their ability to estimate macromolecular shape and to help in global relative positioning of parts of the particle. Automatic identification of secondary structures and prediction of their topology is possible [25,26] by morphological analysis of EM data. Coupled with secondary structure prediction from sequence, this generates approximate positional restraints on Cα atoms in secondary structures. We demonstrate here that Rappertk can combine the shape and secondary structure positional restraints to generate atomic models.
In order to simulate this scenario, we generated an artificially blurred electron density map at 10Å resolution using EMAN [27] and built into the envelope defined by the 1 σ contour. Cα positional restraints of 3Å are placed on residues in secondary structures. There are no positional restraints on sidechains and loops. Hydrogen bonding restraints are used for β sheets as described earlier, and also on α helices. Ten models thus generated are shown in Fig. 14.
Figure 11. Multiple conformations possible for a protein-RNA interface (helicase-core region of Vasa, chains A, E in PDB 2db3) within electron density restraints. Native structure is rendered as sticks and cartoon, five models as lines and ribbon. Electron density not shown for clarity.

Discussion and Conclusions
Rappertk's design makes it possible to apply discrete restraint-based modeling to a variety of problems robustly and easily because:
• Introducing new builders, restraints, samplers and search strategies is easy.
• Any level of granularity can be chosen to represent the structure.
• Automatic ordering of builders and restraints spares the user from the tedious task, but a preferred order may be imposed if needed.
• Any number of coordinates may be known before modelling. They can be used as restraints or to make seed builders or just as steric obstructions.
• Ensemble building and average restraints can be introduced easily by adding restraints which check the average value of some property of the conformational pool.
The modularity and flexibility of Rappertk make it an attractive platform for carrying out discrete restraint-based modeling tasks under a variety of restraints, as we have demonstrated here. Rappertk can also be used to generate decoy sets for developing energy functions that discriminate between non-native and native conformations.
Our immediate goals with this toolkit include exploring protein-ligand and protein-RNA interface conformations, aiding automation of X-ray refinement and developing a protocol for interpreting NMR restraints. To address these tasks more effectively, some more features will likely be needed. For instance, non-binary restraints are not implemented at present. To introduce such analog restraints, the population search strategy will be modified to allow scoring of conformational extensions as well as members of an ensemble of conformations. We also intend to implement coarse samplers to address sparse restraint scenarios, e.g. by analyzing geometric preferences between adjacent secondary structure elements, a coarse-grained incremental sampling of secondary structure can be achieved. Another concern is that although the builder order in Rappertk is flexible, it is still a linear order, hence concerted conformational change is not possible. We are working on implementing a strategy inspired by the SCWRL algorithm [28], which will operate at the level of sidechains as well as fragments and optimize the conformational possibilities independently of builder order. Another strategy under consideration involves simulated annealing and the incorporation of conformation modifiers which tweak the structure in a particular way, e.g. local backbone moves, rigid-body fragment movements, sidechain flips and so on. Tweakers will form the move-set for simulated annealing, which will be used to obtain a coarse structural framework that will be further explored to get atomic models.
In conclusion, we believe that Rappertk will prove to be a useful platform for conformational sampling and searching for a wide range of applications.
Figure 12. β-hairpin building. Blue is NanchorBuilder's output, red is that of forward PeptideBuilder, brown is that of backward PeptideBuilder and magenta that of PeptideBridgeBuilder. Dotted lines show the distance restraints used for hydrogen bonding in addition to 0.5Å Cα positional restraints.

Authors' contributions
SPG designed and implemented the software library and drafted the manuscript. AMK performed benchmarking runs and contributed the corresponding section in the manuscript. TLB critically reviewed the manuscript and provided valuable guidance. All authors read and approved the final manuscript.
Figure 13. β sheet building by identifying the ladder and sampling along the steps. Multiple strands and both (anti/parallel) arrangements can be sampled within hydrogen bonding (angle C-O-N and distance O-N) restraints.
Figure 14. Combining shape and secondary structure skeleton to generate atomic models.