Volume 13 Supplement 1
Selected articles from the Computational Structural Bioinformatics Workshop 2012
Unbiased, scalable sampling of protein loop conformations from probabilistic priors
 Yajia Zhang^{1} and
 Kris Hauser^{1}
https://doi.org/10.1186/1472-6807-13-S1-S9
© Zhang and Hauser; licensee BioMed Central Ltd. 2013
Published: 8 November 2013
Abstract
Background
Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in an ad hoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences.
Results
Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (> 10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints.
Conclusion
Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion-binding demonstrates its potential as a tool for loop ensemble generation and missing structure completion.
Keywords
Conformation sampling, Monte Carlo methods, protein loops, ensemble generation, graphical models
Background
Sampling conformations of kinematic chains (rigid objects connected by articulated joints) is a fundamental problem in protein structure prediction, the geometry of folding linkages, and robot motion planning. Sampling poses a challenging computational problem when chains are large and must satisfy a variety of constraints and statistical preferences. Conformations may be required to satisfy hard feasibility constraints, such as loop closure and collision avoidance, while also obeying soft preference constraints, such as low energy and high structural likelihood. Particularly around folded protein structures, the subset of feasible and favorable conformations comprises a minuscule fraction of the conformation space, and due to the "curse of dimensionality" this fraction shrinks dramatically with the dimensionality of the state space. Because interesting biological macromolecules have large numbers of degrees of freedom, ranging up to hundreds or thousands, new techniques are needed to sample severely constrained conformations efficiently.
Protein loops are flexible structures that often deform during binding, and are extremely important for understanding protein functioning [1]. Loop sampling has been used in missing fragment reconstruction, generating fluctuations in equilibrium conformations, and generating decoy sets for function prediction. Such applications typically require methods for sampling energetically likely and diverse configuration ensembles rather than optimizing a single point estimate. The loop closure constraint, which requires the terminal atoms of a loop to lie at fixed positions dictated by the surrounding structured regions, poses a major challenge in sampling. Existing loop sampling methods include discrete search [2], optimization [1], and inverse kinematics (IK) methods [3–6]. Fiser et al.'s [1] approach takes an energy function that encodes spatial restraints and preferences on dihedral angles, and then runs a numerical optimization to minimize that energy. Optimization is relatively computationally expensive, which is usually worthwhile for single structure prediction but less so for generating conformation ensembles. Discrete search methods are able to explore a wider space of conformations by incrementally building a tree of clash-free subchain conformations starting from one end of the loop and progressing toward the terminal end [2, 7, 8]. Although effective for small loops, these methods do not scale well to large loops due to combinatorial explosion. As a result, discrete search is intractable with chains containing 78 or more residues. They also introduce discretization artifacts, and are not able to close the gap between a terminal atom and its desired position. Inverse kinematics (IK) techniques from the robotics field have been adopted to sample conformations with exact loop closure [3–5, 9]. However, these methods prioritize the loop closure constraint and do not take energies into account during sampling. Hence, some authors employ a secondary energy optimization step to generate more plausible conformations [3, 6, 10].
For each of these methods, the sampling distribution is thoroughly entangled with the sampling procedure. To achieve a desired distribution, the sampling procedure must be tuned in an opaque, nontrivial manner, and it is unclear that a desired distribution can even be achieved. Instead, extensive post hoc empirical testing is needed to assess the quality of the resulting distribution and to argue that a method samples well. Monte Carlo (MC) techniques represent a more principled class of approaches that sample directly from a desired distribution. They have a long history of use in computational biology because they can quickly explore multiple energy minima and transition pathways, while molecular dynamics and optimization techniques often get stuck in single local minima [11–14]. They are also well suited for generating conformation ensembles. The general Metropolis-Hastings approach generates a sequence of incrementally perturbed configurations via a random walk, with a carefully designed acceptance criterion (the detailed balance condition) that ensures that the sampling distribution approaches the desired one as more samples are drawn. However, there is a tradeoff in choosing the perturbation size: small perturbations raise the fraction of accepted moves but lower the speed of conformation space exploration. Moreover, standard MC techniques cannot be directly applied to protein loops due to the loop closure constraint, which causes each step to be accepted with probability 0.
Characteristics of loop generation techniques
| Technique | Loop closure | Prior distribution/energy function | Global search | Scalability |
| --- | --- | --- | --- | --- |
| Optimization | Exact | Y | N | + |
| Inverse kinematics sampling | Exact | N | Y | ++ |
| Discrete search | Inexact | Y | Finite subset | |
| Standard Monte Carlo | No | Y | Y, requires mixing | + |
| SLIKMC | Exact | Y | Y, requires mixing | ++ |
Methods
SLIKMC is a Markov chain Monte Carlo (MCMC) method that takes as input an experimental conformation scoring function Φ, a protein structure from the Protein Data Bank (PDB), the beginning and ending residues of the loop, and outputs a sequence of perturbed loop conformations such that the sequence asymptotically approaches a probability distribution proportional to Φ. If the structure is missing, a rough initial structure is sampled using existing inverse kinematics loop closure techniques. To generate a subsequent conformation, it performs the following operations:
1. Sample a new subloop conformation that satisfies kinematic constraints.
2. Compute the Metropolis-Hastings importance ratio α of the new conformation against the previous conformation.
3. Accept or reject the new subloop conformation with probability α.
The method terminates when a fixed number of conformations has been generated or a desired time cutoff is reached. The novel contributions of this paper include an exact derivation of the importance ratio α for the inverse kinematics sampler of step 1 and the use of sparse PGMs to evaluate the importance ratio quickly per subloop. We also describe extensions that handle flexibility in sidechains and molecules with multiple branches or loops (e.g., polycyclic compounds).
As an MCMC method, SLIKMC samples from a complex joint probability distribution by constructing a Markov chain whose equilibrium distribution is equal to the desired distribution. It is a hybrid MCMC algorithm that combines blocked Gibbs sampling and Metropolis-Hastings (MH) sampling. MH permits the use of non-normalized probability distributions, which is important because it is relatively simple to define a useful scoring function but virtually impossible to ensure that it integrates to one. The blocked Gibbs sampling method samples a small subloop at each step, which helps SLIKMC scale better to large chains, because acceptance rates decrease roughly exponentially in the number of variables sampled at once. This section will first review classical MCMC methods and then describe the new approach.
Markov chain Monte Carlo framework
where T is the system temperature.
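The quantity $\frac{P\left({x}^{\prime}\right)Q\left(x;{x}^{\prime}\right)}{P\left(x\right)Q\left({x}^{\prime};x\right)}$, in which $Q\left({x}^{\prime};x\right)$ denotes the density of proposing ${x}^{\prime}$ from the current state x, is called the importance ratio. If the ratio is greater than 1, then the new sample is accepted; otherwise it is accepted with probability equal to the ratio. With the detailed balance construction, P(x) is indeed the stationary distribution of the Markov chain generated by successive samples.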
The key question for MH is how to choose a proposal distribution that we can sample from and evaluate. The acceptance strategy must evaluate the terms in (3) exactly so that the MH algorithm respects the detailed balance. One of our key contributions is a technique for evaluating Q exactly when sampling from closed chain submanifolds, which enables our method to generate an unbiased sampling sequence.
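As a minimal sketch of the accept/reject test, the following Python fragment runs Metropolis-Hastings on a toy one-dimensional target. The target `score` (an unnormalized standard normal), the Gaussian random-walk proposal, and every function name are illustrative assumptions and not part of SLIKMC. The point is only that the ratio is built from the unnormalized score together with the forward and reverse proposal densities.

```python
import math
import random

SIGMA = 1.0  # assumed proposal width for this toy example


def score(x):
    """Unnormalized target density (toy choice: standard normal)."""
    return math.exp(-0.5 * x * x)


def propose(x, rng):
    """Draw x' from the proposal Q(x'; x), here a Gaussian random walk."""
    return x + rng.gauss(0.0, SIGMA)


def proposal_density(a, b):
    """Q(a; b): density of proposing a when the chain is at b."""
    return math.exp(-0.5 * ((a - b) / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))


def mh_step(x, rng):
    """One MH step; assumes the current state x has positive score."""
    x_new = propose(x, rng)
    ratio = (score(x_new) * proposal_density(x, x_new)) / (
        score(x) * proposal_density(x_new, x)
    )
    if rng.random() < min(1.0, ratio):
        return x_new, True
    return x, False


def run_chain(n_steps, seed=0):
    """Return the visited states and the fraction of accepted moves."""
    rng = random.Random(seed)
    x, samples, accepted = 0.0, [], 0
    for _ in range(n_steps):
        x, ok = mh_step(x, rng)
        samples.append(x)
        accepted += ok
    return samples, accepted / n_steps
```

With a symmetric proposal such as this one the two Q terms cancel, but they are kept explicit because they do not cancel in general, and in particular not for a sampler that proposes moves on a closed-chain submanifold.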
Gibbs sampling generates each new sample by drawing a single variable x_{ i } from its conditional density $P\left({x}_{i}\mid {x}_{1},...,{x}_{i-1},{x}_{i+1},...,{x}_{n}\right)$ (5) and keeping the remaining variables fixed. The variable is updated and the index i is incremented in looping fashion. If the dependencies between the variables are sparse (e.g., every variable x_{ i } only depends on a handful of variables rather than the remaining n − 1 variables), then Gibbs sampling can be efficient even for very large problems. This is exploited in sparse PGMs. Blocked Gibbs sampling is a variation of Gibbs sampling that groups multiple variables as a block and samples the block from the joint distribution conditioned on all other variables.
Our method combines Gibbs sampling with MH sampling to generate a new sample from (5). To do so, simply consider all other variables fixed, sample ${x}_{i}^{\prime}$ from a conditional proposal distribution $Q\left({x}_{i}^{\prime};{x}_{i}\mid {x}_{1},...,{x}_{i-1},{x}_{i+1},...,{x}_{n}\right)$, and then apply the importance ratio test as usual to determine whether to accept the step ${x}_{i}^{\left(k+1\right)}\leftarrow {x}_{i}^{\prime}$ or keep ${x}_{i}^{\left(k+1\right)}={x}_{i}^{\left(k\right)}$.
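The combination of Gibbs-style coordinate updates with an MH test can be sketched on a toy problem as follows. The two-variable target (an unnormalized bivariate normal with correlation `RHO`), the symmetric Gaussian proposal, and all names are assumptions made for illustration; they are not the distributions used by SLIKMC.

```python
import math
import random

RHO = 0.5  # assumed correlation of the toy target


def phi(x):
    """Unnormalized bivariate normal with unit variances and correlation RHO."""
    x1, x2 = x
    return math.exp(-0.5 * (x1 * x1 + x2 * x2 - 2.0 * RHO * x1 * x2) / (1.0 - RHO * RHO))


def gibbs_mh_sweep(x, rng, step=1.0):
    """Update each variable in turn, holding the others fixed."""
    x = list(x)
    for i in range(len(x)):
        candidate = list(x)
        candidate[i] = x[i] + rng.gauss(0.0, step)
        # Symmetric proposal, so the Q terms cancel in the importance ratio.
        ratio = phi(candidate) / phi(x)
        if rng.random() < min(1.0, ratio):
            x = candidate
    return x


def estimate_correlation(n_sweeps, seed=0):
    """Run the chain and return the sample correlation of (x1, x2)."""
    rng = random.Random(seed)
    x, xs = [0.0, 0.0], []
    for _ in range(n_sweeps):
        x = gibbs_mh_sweep(x, rng)
        xs.append((x[0], x[1]))
    m1 = sum(a for a, _ in xs) / len(xs)
    m2 = sum(b for _, b in xs) / len(xs)
    cov = sum((a - m1) * (b - m2) for a, b in xs) / len(xs)
    v1 = sum((a - m1) ** 2 for a, _ in xs) / len(xs)
    v2 = sum((b - m2) ** 2 for _, b in xs) / len(xs)
    return cov / math.sqrt(v1 * v2)
```

If the chain mixes well, `estimate_correlation` should approach `RHO` as the number of sweeps grows.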
Sparse factored models
A factored model writes the scoring function as a product of local terms, $\mathrm{\Phi}\left(x\right)=\prod_{i}{\varphi}_{i}\left({S}_{i}\right)$, where each ϕ_{ i } is known as a factor and each S_{ i } is a subset of {x_{1}, ..., x_{ n }} known as the domain of the factor ϕ_{ i }. For example, in protein structure prediction factors may include Ramachandran plots relating each pair of dihedral angles (φ, ψ), steric clashes, energy functions defined over atom positions, and prior knowledge from B-factors or electron density maps.
Probabilistic graphical models like Bayesian networks and Markov random fields are inherently factored: the domain of each factor consists only of a vertex and its neighbors in the graph. A graphical model is sparse if each variable x_{ i } is involved in only a handful of factors (i.e., bounded by a constant unrelated to n), and hence only interacts directly with a few other variables. An important consequence in the discrete case is that probabilistic inference is computationally tractable in sparse models (polynomial in n), whereas inference is intractable in dense models (in general, exponential in n). A key step in our method converts the representation of a kinematic chain from dense to sparse form, as described below. The implementation described in this paper currently supports:

• Ramachandran plots ϕ_{RP(r)}(φ, ψ) which vary by residue r.
• Steric clashes ϕ_{SC(j, k)}(p_{ j }, p_{ k }) which are 0 if atom j collides with atom k and 1 otherwise.
• B-factors defined as Gaussians ${\varphi}_{BF(j)}({p}_{j})=\frac{1}{c\sqrt{2\pi {B}_{j}}}\mathrm{exp}\left(-\frac{{\Vert {p}_{j}-{\mu}_{j}\Vert}^{2}}{2{B}_{j}{c}^{2}}\right)$ where µ_{ j } is the predicted atom position and B_{ j } is the B-factor value in the protein's PDB file. A constant of proportionality c can be set by the user according to their confidence in the quality of the B-factor estimates.
• Sidechain rotamer distributions, as described in the Sidechain sampling section.
Each factor can be evaluated quickly, but over thousands or millions of evaluations they accumulate significant computational cost. Significant savings can be achieved in sparse models, because when a few variables are changed, the change in Φ can be calculated quickly by only evaluating those factors involved, rather than recomputing Φ from scratch. Although steric clashes are theoretically considered as O(n^{2}) pairwise factors, in practice we use a grid-based hashing data structure that only checks nearby atoms for collision. As a result, each Gibbs sampling step can be performed in O(1) time.
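The saving from sparsity can be illustrated with a small sketch that indexes factors by the variables in their domains and re-evaluates only the factors touched by a change. The class `FactoredScore`, the helper `build_chain_model`, and the toy unary and nearest-neighbor factors are hypothetical and stand in for the Ramachandran, steric and B-factor terms; this is not the LoopTK data structure.

```python
import math


class FactoredScore:
    """Product of factors, each defined on a small tuple of variable indices."""

    def __init__(self, factors):
        # factors: list of (domain, function) pairs
        self.factors = factors
        self.by_var = {}
        for k, (domain, _) in enumerate(factors):
            for v in domain:
                self.by_var.setdefault(v, []).append(k)

    def full(self, x):
        """Evaluate every factor (cost grows with the number of factors)."""
        total = 1.0
        for domain, f in self.factors:
            total *= f(*(x[v] for v in domain))
        return total

    def local(self, x, changed):
        """Evaluate only the factors whose domain contains a changed variable."""
        touched = sorted({k for v in changed for k in self.by_var.get(v, [])})
        total = 1.0
        for k in touched:
            domain, f = self.factors[k]
            total *= f(*(x[v] for v in domain))
        return total, len(touched)


def build_chain_model(n):
    """Toy sparse model: one unary factor per variable plus neighbor pairs."""
    factors = [((i,), lambda a: math.exp(-0.5 * a * a)) for i in range(n)]
    factors += [((i, i + 1), lambda a, b: math.exp(-((a - b) ** 2))) for i in range(n - 1)]
    return FactoredScore(factors)
```

Because the untouched factors cancel in the ratio Φ(x′)/Φ(x), the ratio of the two `local` values equals the ratio of the two `full` values, while the number of factor evaluations stays constant as the chain grows.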
In future work we are interested in including additional statistical potentials and/or all-atom energy function terms in scoring. With a naive implementation, each atom is involved in O(n) pairwise interactions, but we expect to exploit the weakness of distant interactions to reduce the number of factors included in the computation.
Kinematic chain modeling
Consider a jointed kinematic chain with reference frames T_{0}, T_{1}, ..., T_{ N }, connected with relative rotational angles q_{1}, ..., q_{ N }. For a protein backbone, there is a one-to-one correspondence between frames and backbone atom positions p_{1}, ..., p_{ M }, and the rotational variables are simply the backbone dihedral angles φ_{1}, ψ_{1}, ..., φ_{N/2}, ψ_{N/2}.
Although it is standard practice and beneficial for certain algorithms to define the system state with a minimal set of coordinates, e.g., x = (T_{0}, q_{1}, ..., q_{ N }), a key step of our method is to consider an expanded state. Minimal coordinates use the fact that each subsequent frame T_{1}, ..., T_{ N } can be determined from x through straightforward forward kinematics, leading to a lower dimensional representation. However, this approach eliminates sparsity in the probabilistic model because a factor defined on T_{ N } will depend on all variables, a factor defined over T_{N − 1} will depend on all variables except q_{ N }, and so on. Moreover, if a sampler is asked to generate certain variables from a density defined over T_{1}, ..., T_{ N } (for example, atom positions), the generated distribution may be biased unless it computes the determinant of an N × N metric tensor for each evaluation of Φ. As described below, this is a consequence of nonlinear transformations of distributions (see Appendix). On the other hand, computing determinants takes O(N^{3}) time, which scales poorly with large N.
where ${T}_{j}^{rel}$ is the relative transformation of frame j relative to frame j − 1 and R(a, q) is the rotation of angle q about axis a. Fixed-endpoint constraints can also be encoded with indicator factors ϕ_{ closure }(T_{0}) and ϕ_{ closure }(T_{ N }) that are zero everywhere except at the fixed frames.
With (7) encoded so that factors contain few variables in their domain, the model becomes sparse. However, we have added the complication of maintaining a valid kinematic structure, because the set of x for which Φ is nonzero lies on a lower-dimensional manifold. Technically speaking, the probability density must be considered with respect to a base measure that assigns finite, nonzero density to the manifold. For 3D chains, the state space has dimensionality 7N but the manifold has dimensionality 6 + N for free-endpoint chains or N − 6 for fixed-endpoint chains. The next section will describe how we handle these submanifolds in detail.
Block sampling and selection
A block is a subset of variables that are simultaneously sampled. The number of variables in a block must be sufficiently large to give at least one continuous degree of freedom of movement. The Metropolis-Hastings criterion is used to accept or reject a move because it is unrealistic to sample directly from the block's conditional density. This key subroutine, SampleBlockMH, takes as input the previous sample x^{(k)} and a block B of b consecutive joint angles and their intervening frames. It then samples a candidate move, and accepts it according to the MH criterion. Pseudocode is as follows:
1. Using SampleBlock as described below, sample a candidate conformation ${{x}^{\prime}}_{B}$ of B at random, keeping the rest of the chain ${x}_{C}^{\left(k\right)}$ fixed.
2. Compute the MH acceptance probability $\alpha =\mathrm{min}\left(1,\frac{{\mathrm{\Phi}}_{B}\left({{x}^{\prime}}_{B}\right){Q}_{B}\left({x}_{B}^{(k)}\mid {x}_{C}^{(k)}\right)}{{\mathrm{\Phi}}_{B}\left({x}_{B}^{(k)}\right){Q}_{B}\left({{x}^{\prime}}_{B}\mid {x}_{C}^{(k)}\right)}\right).$
3. Accept the move ${x}_{B}^{\left(k+1\right)}\leftarrow {{x}^{\prime}}_{B}$ with probability α.
Here the subscript B denotes the subset of variables in the block, while the subscript C denotes the complement of the block. The score Φ_{ B } calculates the product of factors ϕ_{ i } whose domains S_{ i } overlap with B, which is more efficient than recomputing Φ from scratch. The remaining details of the method (the block size, the block sampling procedure, and the calculation of the sampling probability Q_{ B }) are described in the remainder of this section. To generate a new conformation x^{(k+1)} of the entire chain, SampleBlockMH is called several times with overlapping blocks incremented sequentially down the chain. Block ordering (e.g., forward, backward, or random order) has no effect on the asymptotic distribution, and experiments suggest virtually no noticeable effect apart from the first handful of samples. Thanks to sparsity, each pass is performed in O(N) time, which takes a fraction of a second for chains with hundreds of variables.
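The control flow of SampleBlockMH and of a pass with overlapping blocks can be sketched as below. To keep the example self-contained, it is applied to a toy free-endpoint chain of angles with no closure constraint: the unary factors exp(κ cos(θ − µ)), the uniform block proposal, and the names `block_score`, `sample_block_uniform` and `sweep` are all assumptions for illustration. In the actual method the proposal comes from SampleBlock and its densities are not uniform.

```python
import math
import random

KAPPA, MU = 1.0, 0.0  # assumed parameters of the toy angular prior


def block_score(x, block):
    """Product of the toy unary factors whose variable lies in the block."""
    return math.exp(sum(KAPPA * math.cos(x[i] - MU) for i in block))


def sample_block_uniform(x, block, rng):
    """Toy proposal: independent uniform angles; returns (values, Q_fwd, Q_rev)."""
    values = [rng.uniform(-math.pi, math.pi) for _ in block]
    q = (1.0 / (2.0 * math.pi)) ** len(block)
    return values, q, q


def sample_block_mh(x, block, sample_block, score, rng):
    """One SampleBlockMH-style step on the variables indexed by `block`."""
    proposal = sample_block(x, block, rng)
    if proposal is None:  # the block sampler failed; keep the old state
        return x, False
    values, q_forward, q_reverse = proposal
    x_new = list(x)
    for i, v in zip(block, values):
        x_new[i] = v
    ratio = (score(x_new, block) * q_reverse) / (score(x, block) * q_forward)
    if rng.random() < min(1.0, ratio):
        return x_new, True
    return x, False


def sweep(x, block_size, rng):
    """One pass down the chain with overlapping blocks shifted by one variable."""
    for start in range(len(x) - block_size + 1):
        block = list(range(start, start + block_size))
        x, _ = sample_block_mh(x, block, sample_block_uniform, block_score, rng)
    return x
```

Only the factors overlapping the block enter the ratio, so the cost of each step does not depend on the chain length; a full pass is then linear in the number of blocks.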
How many variables should be included in a block? Standard Gibbs sampling (i.e., b = 1) does not work because loop closure constraints constrain the conditional density of any variable given the rest (5) to a Dirac delta. Hence, the state would never change. In fact, no mixing occurs for b ≤ 5, except possibly at singular conformations, which occupy a set of measure zero in conformation space and are therefore unlikely to occur naturally. For 6 angles, analytical inverse kinematics (IK) techniques are available to compute solutions for a pair of fixed end frames [9]. In fact, any number from 0 to 16 solutions may exist for a given 6-angle problem. Nevertheless, b = 6 is not suitable because it restricts the random walk to only a finite set of conformations.
SampleBlock
1. Sample values for the independent subchain at random.
2. Attempt to close the chain by calculating an analytical IK solution for the dependent subchain. We use the method of [9].
3. If more than one IK solution exists, one is picked at random, and if no solution exists, the process terminates with failure.
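The three steps above can be illustrated with a planar analogue, which is deliberately much simpler than the 6-angle spatial problem solved by the method of [9]. In this sketch, which is an assumption made for illustration only, the block is a planar chain of unit-length links between two fixed points; all links but the last two form the independent subchain, and the last two are closed analytically by intersecting two circles, giving 0, 1 or 2 solutions. The names `close_two_links` and `sample_block` are hypothetical.

```python
import math
import random


def close_two_links(p, target, l1=1.0, l2=1.0):
    """All elbow positions e with |e - p| = l1 and |target - e| = l2."""
    dx, dy = target[0] - p[0], target[1] - p[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > l1 + l2 or d < abs(l1 - l2):
        return []  # out of reach (or degenerate): no isolated solution
    a = (l1 * l1 - l2 * l2 + d * d) / (2.0 * d)
    h = math.sqrt(max(l1 * l1 - a * a, 0.0))
    mx, my = p[0] + a * dx / d, p[1] + a * dy / d
    if h == 0.0:
        return [(mx, my)]
    return [(mx - h * dy / d, my + h * dx / d), (mx + h * dy / d, my - h * dx / d)]


def sample_block(start, end, n_links, rng, link=1.0):
    """Return the joint positions of a closed block, or None on failure."""
    pts = [start]
    # Step 1: sample the independent subchain at random.
    for _ in range(n_links - 2):
        theta = rng.uniform(-math.pi, math.pi)
        x, y = pts[-1]
        pts.append((x + link * math.cos(theta), y + link * math.sin(theta)))
    # Step 2: close the chain analytically with the dependent subchain.
    elbows = close_two_links(pts[-1], end, link, link)
    # Step 3: fail if no solution exists, otherwise pick one at random.
    if not elbows:
        return None
    pts.append(rng.choice(elbows))
    pts.append(end)
    return pts
```

A call such as `sample_block((0.0, 0.0), (2.0, 0.0), 4, random.Random(0))` either returns five joint positions with exact closure or `None` when the sampled independent links leave the end point out of reach.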
Calculation of subloop sampling densities
To calculate the MH importance ratio, we must calculate the sampling density for the sampling procedure SampleBlock. Several concepts from differential geometry are required in order to derive this density ${Q}_{B}\left({{x}^{\prime}}_{B}\mid {x}_{C}^{\left(k\right)}\right)$.
where $\frac{\partial f}{\partial y}\left(y\right)$ is the Jacobian of the function f. Here we have also introduced a positive semidefinite weighting matrix W for the purpose of weighting the relative importance of matching the prior along certain axes. In the standard case, W is an identity matrix, but it can also be useful to choose a nonuniform diagonal matrix to account for heterogeneous units (e.g., angle vs. position variables).
A remaining issue is that it is often difficult to explicitly compute the Jacobian of the IK function involved in f. In other words, with z ≡ z(y) denoting the 6 angles in the dependent chain, it is difficult to evaluate ∂ z /∂ y. So, we compute an implicit chart Jacobian by considering the implicit form of the constraints C(x_{ B }) = 0. These vector-valued constraints state that the difference between the terminal frame of the subchain and the desired frame is zero.
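By the implicit function theorem, the relation $\frac{\partial z}{\partial y}=-{\left(\frac{\partial C}{\partial z}\right)}^{-1}\frac{\partial C}{\partial y}$ holds as long as $\frac{\partial C}{\partial z}$ is invertible, which is true everywhere except at singular conformations. Each derivative of C in the above expression is a submatrix of the Jacobian and can be computed using standard techniques.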
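The implicit Jacobian can be checked numerically on a small planar analogue. The sketch below assumes a three-link planar chain with unit links and absolute link angles, one independent angle y and two dependent angles z = (z1, z2), and a positional closure constraint C(y, z) = 0; it evaluates ∂z/∂y = −(∂C/∂z)^{-1} ∂C/∂y in closed form. The setup and the function names are illustrative assumptions, not the 6-angle spatial case of the paper.

```python
import math


def end_point(y, z):
    """End position of a planar chain of three unit links with absolute angles."""
    z1, z2 = z
    return (math.cos(y) + math.cos(z1) + math.cos(z2),
            math.sin(y) + math.sin(z1) + math.sin(z2))


def constraint(y, z, target):
    """Closure constraint C(y, z): end position minus the fixed target."""
    ex, ey = end_point(y, z)
    return (ex - target[0], ey - target[1])


def implicit_dz_dy(y, z):
    """dz/dy = -(dC/dz)^{-1} dC/dy for the planar closure constraint."""
    z1, z2 = z
    # dC/dz is the 2x2 matrix [[a, b], [c, d]]; dC/dy is the vector cy.
    a, b = -math.sin(z1), -math.sin(z2)
    c, d = math.cos(z1), math.cos(z2)
    cy = (-math.sin(y), math.cos(y))
    det = a * d - b * c  # equals sin(z2 - z1); zero when the two links align
    if abs(det) < 1e-12:
        raise ValueError("singular conformation: dC/dz is not invertible")
    inv_cy = ((d * cy[0] - b * cy[1]) / det, (-c * cy[0] + a * cy[1]) / det)
    return (-inv_cy[0], -inv_cy[1])
```

Moving y by a small step ε and z by ε·∂z/∂y should keep the chain closed up to terms of order ε², whereas moving y alone opens the loop by a distance of order ε. The determinant vanishes exactly when the two dependent links are parallel, which is the planar counterpart of a singular conformation.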
in which I is the identity matrix and all frame derivatives are calculated using the chain rule $\frac{d{T}_{j}}{dy}=\frac{\partial {T}_{j}}{\partial y}+\frac{\partial {T}_{j}}{\partial z}\frac{\partial z}{\partial y}$. These partial derivatives are calculated using standard techniques.
Beyond computing the proper sampling density, it is also important to design the algorithm to efficiently compute the MH acceptance probability. Since clash detection takes 60 times more computation time than calculating the rest of the terms in Φ, we check collisions after determining whether a move will be accepted. Compared to the naive method, this method achieves an order of magnitude speedup.
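Because the steric factor is binary, the expensive clash test can be postponed until the cheap part of the ratio has already passed the acceptance draw. A minimal sketch of this ordering follows; `accept_with_deferred_clash`, `cheap_ratio` and `has_clash` are hypothetical names, and the current conformation is assumed to be clash-free.

```python
import random


def accept_with_deferred_clash(cheap_ratio, has_clash, rng):
    """Decide acceptance, calling the expensive clash test only when needed.

    cheap_ratio: importance ratio computed without the steric factor.
    has_clash:   zero-argument callable that runs clash detection.
    """
    if rng.random() >= min(1.0, cheap_ratio):
        return False  # rejected on the cheap terms; clash detection is skipped
    # The steric factor is 0 or 1, so a clash sets the full ratio to zero.
    return not has_clash()
```

The decision is distributed exactly as if the full ratio, including the steric factor, had been computed up front: a clashing candidate is never accepted, and a clash-free one is accepted with probability min(1, `cheap_ratio`).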
Extension to other topologies
Although the core method applies to linear closed kinematic chains, it can be extended to handle other molecular topologies, such as free-endpoint chains and sidechains. In theory, polycyclic compounds may also be handled. Each new topological structure requires specialized block selection and sampling routines. For example, free-endpoint chains need separate sampling subroutines for the start and end blocks. Standard MC methods are used to do so.
Sidechain deformations are important for shaping binding cavities, and SLIKMC can be adapted to generate sidechain conformations in the same graphical modeling framework. It is known that the sidechain conformation depends on the backbone dihedral angle of the corresponding residue [18]. This requires sampling sidechains after the backbone conformation is sampled. Furthermore, since the distribution of sidechain torsional angles is limited to a small number of typical conformations (rotamers) for most residues [19], we sample sidechains according to experimentally determined distributions.
Sidechain sampling
For sidechain conformation priors we use the 2010 Backbone-dependent Rotamer Library [20]. In this library, each rotameric residue is associated with a list of rotamers which represent the high probability regions for sidechain torsion angles. The probability of a rotamer conformation χ is modeled as a continuous distribution given the backbone dihedral angle pairs. The dihedral angle (φ, ψ) space of each rotameric backbone residue r is discretized into a grid and each cell [a, b] × [c, d] contains its experimentally observed probabilities P(χ | r, a ≤ φ ≤ b, c ≤ ψ ≤ d). Each distribution over χ is specified as a Gaussian mixture model. For non-rotameric residues the terminal χ angles are handled specially due to the asymmetry in their distributions.
where ϕ_{ SC } indicates steric clashes and ϕ_{R(r)} indicates the sidechain conformation prior for residue r. Sidechain B-factors are typically not included since we want to give enough freedom to explore the conformation space, and our experiments indicate that the flexibility of the protein chain is greatly reduced when B-factors are specified as priors for both backbone and sidechain atoms.
Extending block sampling to include sidechains requires justifying the importance ratio carefully to ensure unbiased sampling. An efficient sampling procedure is as follows: first compute a closed-loop backbone subchain from the blocked Gibbs sampling step and compute its acceptance probability as usual. If accepted, sample each sidechain along the block according to its backbone-dependent rotameric distribution. Because it is a Gaussian mixture, we can sample from ϕ_{R(r)} directly: pick a Gaussian from the mixture according to its weight and then sample from the Gaussian. Finally, reject the sample if the sidechains collide.
Since the first term is simply the importance ratio of the backbone and ϕ_{ SC } is binary, we conclude that the block acceptance probability is either the backbone importance ratio if clash-free or zero if clashing. Hence the sidechain sampling procedure is sound.
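The sidechain step can be sketched as follows: draw a mixture component by weight, draw the torsion angle from that Gaussian, and discard the whole proposal if the sidechains clash. The mixture parameters used in the usage note are made-up placeholders rather than values from the rotamer library [20], and the function names and the `clashes` callback are hypothetical.

```python
import random


def sample_mixture(components, rng):
    """Draw a torsion angle (degrees) from a Gaussian mixture.

    components: list of (weight, mean_deg, std_deg) tuples.
    """
    weights = [w for w, _, _ in components]
    _, mean, std = rng.choices(components, weights=weights, k=1)[0]
    chi = rng.gauss(mean, std)
    return (chi + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)


def sample_sidechains_after_backbone(backbone_accepted, mixtures, clashes, rng):
    """Sample one chi angle per residue once the backbone move is accepted.

    mixtures: one component list per residue in the block.
    clashes:  callable taking the sampled angles, True if any sidechain collides.
    Returns the sampled angles, or None if the block move is rejected.
    """
    if not backbone_accepted:
        return None
    chis = [sample_mixture(m, rng) for m in mixtures]
    if clashes(chis):
        return None  # binary steric factor: a clash rejects the whole block
    return chis
```

For instance, `sample_mixture([(0.7, -65.0, 8.0), (0.3, 65.0, 8.0)], random.Random(0))` returns an angle near −65° about 70% of the time and near 65° otherwise. Sampling the prior directly is what makes its factor cancel in the importance ratio, leaving only the backbone ratio and the clash indicator.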
Multiply-closed kinematic loops
It may be possible to extend SLIKMC to handle multiply-closed loops such as those that occur in polycyclic compounds. This requires special care to divide the structure into blocks that can be split into dependent and independent subchains, such that a conformation of the independent subset completely determines the dependent subset, up to some finite multiplicity. In other words, the independent subchains form a chart of the space of closed-chain conformations of the whole block. The union of all blocks must also cover all state variables.
Mixing and autocorrelation
In any MCMC method it is important to empirically examine the mixing rate of the Markov chain. Firstly, it can potentially take many iterations to "forget" the effects of a poor initialization. For protein sampling, this is not a significant problem because we initialize the chain with the native structure in the PDB, which is typically quite good.
Results and discussion
SLIKMC is implemented as an add-on to the software package LoopTK [21, 22] for protein loop sampling and is available at http://www.iu.edu/~motion/slikmc/. All experiments are run on an Intel i7 2.7 GHz computer with 4 GB RAM. The library currently supports sampling with prior information from Ramachandran plots, steric clashes, and B-factors, and supports integration with the Backbone-Dependent Rotamer Library for sidechain sampling. Numerical experiments suggest that SLIKMC generates higher quality samples for large loops with lower computational cost than standard Monte Carlo techniques for open-ended chains and the RAMP loop completion package [23].
Loop sampling with prior distributions
Missing loop completion
Scalability tests on free-endpoint chains
We compare SLIKMC against a standard Metropolis-Hastings algorithm that samples backbone angles according to a Gaussian proposal distribution with 1° standard deviation. The target distribution for both methods includes steric clashes, Ramachandran plots, and B-factors. Note that standard MH has probability zero of sampling a conformation that satisfies terminal endpoint constraints exactly, and is not applicable to closed loops. So, these tests ignore the loop closure constraint altogether.
Simultaneous backbone and sidechain sampling
Conclusion
We propose SLIKMC, a Markov chain Monte Carlo method for sampling closed chains according to a specified probability distribution. A probabilistic graphical model (PGM) is proposed to specify the structure preferences. A novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints, and the mathematical conditions necessary for unbiased sampling are derived. Simulation experiments show that SLIKMC completes large loops (> 10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques.
SLIKMC is demonstrated to be applicable to various tasks such as conformation ensemble generation and missing structure construction. For future work we intend to integrate SLIKMC with more complex energy functions, statistical potentials, and machine-learning-based structural function predictors. One limitation of the technique is that due to the locality of each block adjustment, large-magnitude global motions may take a huge number of iterations to sample, particularly when the motion must cross low-scoring chasms in conformation space. We intend to investigate annealing-like or random restart techniques for overcoming these difficulties, as well as different block choices that allow the algorithm to take larger steps. Finally, we are interested in extending our method to study simultaneous backbone and sidechain flexibility in protein-ligand and protein-protein binding.
Appendix
This appendix presents a fundamental statement about probability densities under a transformation of variables.
for any subset U ⊆ A, where dµ is the m-volume element of M.
where X(u) is the m-volume of the parallelotope spanned by the axes of the coordinate chart f centered at u: $\frac{\partial f}{\partial {u}_{1}}\left(u\right),...,\frac{\partial f}{\partial {u}_{m}}\left(u\right)$.
Note that this can be expressed more compactly as det(A^{ T } A) where A is the matrix with v_{1}, ..., v_{ m } as its columns. Hence, $X\left(u\right)=\sqrt{\text{det}G\left(u\right)}$. Finally, substituting g_{ u } in the r.h.s. of (22) gives the desired result.
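As a small numerical illustration of the volume element $X(u)=\sqrt{\det G(u)}$, the sketch below uses the spherical-coordinate chart of the unit sphere, f(θ, φ) = (sin θ cos φ, sin θ sin φ, cos θ), for which the Gram determinant of the two chart axes gives X = |sin θ|. The chart and the function names are assumptions chosen for the example; the chart in the paper is the closed-chain chart induced by the independent subchain.

```python
import math


def chart_partials(theta, phi):
    """Axes of the spherical chart: df/dtheta and df/dphi at (theta, phi)."""
    d_theta = (math.cos(theta) * math.cos(phi),
               math.cos(theta) * math.sin(phi),
               -math.sin(theta))
    d_phi = (-math.sin(theta) * math.sin(phi),
             math.sin(theta) * math.cos(phi),
             0.0)
    return d_theta, d_phi


def volume_element(columns):
    """sqrt(det(A^T A)) for a matrix A with the two given columns."""
    v1, v2 = columns
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    g11, g12, g22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    return math.sqrt(max(g11 * g22 - g12 * g12, 0.0))


def sphere_area(n=100):
    """Integrate the volume element over the chart with the midpoint rule."""
    d_theta, d_phi = math.pi / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        for j in range(n):
            phi = (j + 0.5) * d_phi
            total += volume_element(chart_partials(theta, phi)) * d_theta * d_phi
    return total
```

Integrating the volume element over the chart recovers the surface area 4π of the sphere. Sampling (θ, φ) uniformly in the chart would therefore over-represent the poles unless each sample is weighted by X, which is the kind of bias the metric-tensor determinant corrects.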
Declarations
Acknowledgements
The authors thank Predrag Radivojac for valuable discussions that inspired us to start this project and helped clarify our understanding of protein structure and function. This research is partially supported by NSF Grant No. 1218534.
Declarations
The publication costs for this article were funded by Dr. Kris Hauser.
This article has been published as part of BMC Structural Biology Volume 13 Supplement 1, 2013: Selected articles from the Computational Structural Bioinformatics Workshop 2012. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcstructbiol/supplements/13/S1.
Authors’ Affiliations
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.