 Research
 Open Access
 Published:
Modeling protein conformational transitions by a combination of coarsegrained normal mode analysis and roboticsinspired methods
BMC Structural Biology volume 13, Article number: S2 (2013)
Abstract
Background
Obtaining atomicscale information about largeamplitude conformational transitions in proteins is a challenging problem for both experimental and computational methods. Such information is, however, important for understanding the mechanisms of interaction of many proteins.
Methods
This paper presents a computationally efficient approach, combining methods originating from robotics and computational biophysics, to model protein conformational transitions. The ability of normal mode analysis to predict directions of collective, largeamplitude motions is applied to bias the conformational exploration performed by a motion planning algorithm. To reduce the dimension of the problem, normal modes are computed for a coarsegrained elastic network model built on short fragments of three residues. Nevertheless, the validity of intermediate conformations is checked using the allatom model, which is accurately reconstructed from the coarsegrained one using closedform inverse kinematics.
Results
Tests on a set of ten proteins demonstrate the ability of the method to model conformational transitions of proteins within a few hours of computing time on a single processor. These results also show that the computing time scales linearly with the protein size, independently of the protein topology. Further experiments on adenylate kinase show that main features of the transition between the open and closed conformations of this protein are well captured in the computed path.
Conclusions
The proposed method enables the simulation of largeamplitude conformational transitions in proteins using very few computational resources. The resulting paths are a first approximation that can directly provide important information on the molecular mechanisms involved in the conformational transition. This approximation can be subsequently refined and analyzed using stateoftheart energy models and molecular modeling methods.
Background
Conformational transitions in proteins are generally related to their capacity to interact with other molecules. Their study is therefore essential for the understanding of protein functions. Unfortunately, it is very difficult to obtain this type of dynamic information at the atomic scale using experimental techniques. Modeling protein conformational transitions with conventional computational methods is also challenging because, in many cases, these transitions are rare, slow events. Standard molecular dynamics (MD) simulations with current computational resources cannot be applied in practice to model largeamplitude (slow timescale) conformational transitions. Such simulations require variants of MD methods that enhance sampling of rare events or that bias the exploration in a given direction (e.g. [1–5]), or, alternatively, to have access to outstanding computational power [6].
Modeling conformational transitions in proteins has motivated the development of specific methods, computationally more efficient than MD simulations. Many of these methods (e.g. [7–9]) are based on the deformation of a trivial initial path between the two given conformations toward the minimum energy path connecting them. Consequently, the performance of these methods is strongly conditioned by the suitability of the initial path. In recent years, methods to model conformational transitions have also been developed on the basis of robot motion planning algorithms [10–13]. Most of these roboticsinspired methods are aimed at providing qualitative information about the conformational transition using few computational resources. For this, they exploit the efficiency of samplingbased exploration algorithms applied to simplified molecular models.
The high dimensionality of the space to be explored is the main difficulty that all computational methods to model protein conformational transitions have to face. Therefore, several approaches have been developed to reduce the dimensionality of the problem (e.g. [14–16]). Normal mode analysis (NMA) [17] is a particularly interesting tool in this regard, since a small number of lowfrequency normal modes provide a good hint of the direction of largeamplitude conformational changes [18–21]. Several recent works apply this property of NMA to improve the performance of conformational exploration methods.
The approach presented in this paper was originally introduced in [22]. The basic principle is to use NMA to bias the conformational exploration performed by a Rapidlyexploring Random Tree (RRT) algorithm [23], aiming to efficiently compute conformational transition paths. The main novelty presented in the present work is the introduction of a multiscale model for the protein. In this model, an elastic network is defined considering only a single node (called a particle) per tripeptide. Motion directions provided by NMA of such a coarsegrained elastic network are then applied to the allatom model for a more accurate conformational exploration. The introduction of this multiscale model has important outcomes. First, the number of normal modes is largely reduced thanks to the use of the coarsegrained model, which significantly reduces the time required to compute them. In addition, generating the allatom model from the coarsegrained model can be accurately and efficiently achieved using methods from robot kinematics [24], avoiding the need of artifacts such as the RTB approach (rotationstranslations of blocks) [25].
Next section presents the overall method, and explains each of its elementary components: elastic network normal mode analysis, tripeptidebased multiscale protein modeling, and motionplanningbased conformational exploration. Then, several types of results aimed to validate the approach and to show its good computational performance are presented for a set of proteins with different sizes and topologies. A more detailed analysis of results is presented for adenylate kinase (ADK). Finally, together with the conclusions, we discuss possible directions for future work. Note that a preliminary version of this work was presented in [26]. Compared to this previous version, this paper includes more detailed explanations of the method, a more exhaustive presentation of results, with additional figures and tables, as well as additional results for the ADK protein. In addition, some movies that illustrate results obtained with the proposed method are included as supplementary material.
Methods
This section presents a new method to model protein conformation transitions. It builds on the combination of several components inside an iterative algorithm. One of these components is NMA performed on a coarsegrained elastic network model of the protein, which enables very fast computation of normal modes. Indeed, a single particle of the elastic network is considered for each group of three consecutive aminoacid residues (i.e. one particle per tripeptide). The allatom model, which is used to accept or reject sampled states during the conformational exploration, is accurately reconstructed from the coarsegrained one using closedform inverse kinematics. The RRT algorithm is applied to explore linear combinations of normal modes computed from intermediate conformations along the path. All these elementary components of the method are further explained below.
Elastic networks and normal mode analysis
Based on a harmonic approximation of the potential energy, normal mode analysis provides information about the directions and frequencies of vibration of a molecule from a minimumenergy conformation. Each mode represents a motion pattern, in which all the atoms move with the same frequency and phase. Lowfrequency normal modes correspond to collective motions (e.g. domain motions), whereas highfrequency normal modes correspond to local fluctuations [19, 27].
Normal modes are calculated by diagonalizing the Hessian matrix of the potential energy of the molecule. For reducing the computational cost of this operation, several works propose to use simplified potentials and coarsegrained models. An extensively used simplified potential is based on the elastic network model (ENM) [28], which represents the molecule as a set of particles connected by virtual springs. All the protein atoms can be considered as particles in the elastic network. However, a coarsegrained representation that only considers C_{ α } atoms (i.e. a single particle per aminoacid residue) is often applied [19, 20]. Moreover, particles are connected by virtual springs only if they are closer than a userdefined cutoff distance d_{ cut }.
The potential energy function of such an elastic network takes the following form:
where d_{ ij } is the distance between particle i and particle j, ${d}_{ij}^{0}$ is the distance between the two particles at the equilibrium state and C is the elastic constant. This type of simplified potential has been used in many works and for very different applications [29–32].
In this work, we investigate a further simplification of the ENM. Indeed, the ENM is built using a coarser model based on tripeptides, instead of using C_{ α } atoms. Figure 1 illustrates the approach. Note that coarsegrained NMA approaches considering more than one residue per particle have already been proposed [25, 33, 34]. However, these approaches, which are mainly devised to analyze motions of very large systems made of protein assemblies, consider rigidbody motions of groups of residues. In contrast, the approach presented here preserves full flexibility of the protein, which leads to a more accurate simulation of conformational transitions.
Several works show that using a simplified ENM does not necessarily imply a loss of accuracy in the prediction of largeamplitude motion directions [20, 25]. However, it certainly leads to a computational performance gain. This issue is further discussed in the results section, where the performance of NMA using tripeptidebased models and Cαbased models is compared.
The anisotropic network model (ANM) approach, as described in [27, 35], is adopted in this work to construct the Hessian matrix from the positions of the particles of the tripeptidebased model. Each 3 × 3 submatrix corresponding to the interaction between two particles is computed as follows:
If the distance between particles i and j is more than the cutoff distance d_{ cut }, then the whole 3 × 3 matrix is replaced by zeros. The Hessian matrix is then diagonalized to compute the eigenvalues and eigenvectors. Each eigenvalue and eigenvector pair corresponds to one normal mode, where the eigenvalue defines the mode frequency and the eigenvector defines the motion direction for each particle in the elastic network.
Multiscale model
Tripeptidebased model
The multiscale modeling approach applied in this work is based on a decomposition of the protein chain into fragments of three amino acid residues, which we refer to as tripeptides. The reason for choosing such a subdivision is that, assuming fixed bond lengths, bond angles and peptide bond torsions, the backbone of a tripeptide involves 6 degrees of freedom (three pairs of angles φ, ψ), and thus, an analogy can be made with a 6R mechanism like a robotic manipulator [24]. Two Cartesian reference frames attached to the N atom in the backbone of the first residue and to the C atom in the last residue define respectively the baseframe and the endframe of the 6R mechanism. Since tripeptides are linked through rigid peptide bonds, the location of the endframe of tripeptide i can be determined from the baseframe of tripeptide i + 1 by a constant 3D transformation. Given the location of the baseframe and the endframe, the conformation of a tripeptide backbone can be obtained by inverse kinematics. Consequently, the conformation of the whole protein backbone can be determined from the pose of a single reference frame attached to each tripeptide (this is true for all the protein backbone except two short fragments at the Nterminal and Cterminal ends of the chain, which require a particular treatment). In the following, we will refer to these reference frames as (oriented) particles. They are the particles in the coarsegrained ENM.
Reconstructing the allatom model
The interest of the decomposition of the protein into tripeptides explained above is that closedform inverse kinematics (IK) can be applied to reconstruct the allatom protein model from the coordinates of the particles. The IK solver applied in this work has been adapted from the method developed by Renaud [36]. This solver is based on algebraic elimination theory, and develops an adhoc resultant formulation inspired by the work of Lie and Liang [37]. Starting from a system of equations representing the IK problem, the elimination procedure leads to an 8by8 quadratic polynomial matrix in one variable. The problem can then be treated as a generalized eigenvalue problem, as proposed in [38], for which efficient and robust methods such as the Schur factorization can be applied. Note however that our approach is not dependent on this solver, so that other IK methods (e.g. [38, 39]) could be applied.
In general, the IK problem for a 6R serial kinematic chain has a finite number of solutions (up to 16 in the most general case). All the solutions correspond to geometrically valid conformations of the tripeptide backbone with fixed ends defined by the pose of the particles. However, when the goal is to simulate continuous motions, the closest conformation to the previous one (i.e. the one before a perturbation applied to the particles) has to be selected in order to avoid jumps in the conformational space. All IK solutions are rejected if none of them remains within a distance threshold that depends on the perturbation stepsize.
The explanations above concern only the reconstruction of the allatom model of the protein backbone from the coarsegrained tripeptidebased model. Sidechains are treated separately, using a simple method based on energy minimization as explained below.
Path finding algorithm
The path finding method works by iteratively generating short portions of the transition between two given conformations of a protein, which we will refer to as q_{ init } and q_{ goal }. Algorithm 1 presents the pseudocode with the main steps of the method. At each iteration, normal modes are computed for a root conformation q_{ root }. Note that q_{ root } = q_{ init } for the first iteration. Then, the RRT algorithm is applied to explore motions corresponding to linear combinations of normal modes. RRT is run until the protein moves a predefined distance toward the target conformation q_{ goal }. The conformational exploration performed by the RRT algorithm is further explained below. Once the RRT exploration is stopped, the closest node q_{ close } in the tree to q_{ goal } is searched. The path between q_{ root } and q_{ close } is then extracted and saved. All the conformations in this path are guaranteed to have a collisionfree backbone (including C_{ β } atoms) which generally implies getting acceptable energy values after a short minimization to rearrange sidechain conformations. Such an energy minimization procedure is performed on q_{ close }, which will be the root conformation in the next iteration. The algorithm keeps iterating until a predefined distance d_{ target }to q_{ goal } is reached. The resulting path is defined by the sequence of minimized conformations q_{ close } at each iteration. If a finergrained path is required, other intermediate conformation can be extracted from the subpaths computed at each iteration. These conformations may require energy minimization to rearrange sidechains, as it is done for q_{ close }.
Algorithm 1: COMPUTE _PATHWAY
input : Initial conformation q_{ init }, final conformation q_{ goal }, minimum distance to target d_{ target }
output : The transition path p
begin
q_{ root } ← q_{ init };
while RMSD(q_{ root }, q_{ goal }) >d_{ target } do
a ← COMPUTE _NORMAL _MODES(q_{ root });
t ← BUILD _RRT(q_{ root }, q_{ goal }, a);
q_{ close } ← CLOSEST _TO _TARGET(t, q_{ goal });
q_{ root } ← MINIMIZE(q_{ close });
p ← CONCATENATE(p, q_{ root });
end
Algorithm 2: BUILD _RRT
input : Initial conformation q_{ root }, final conformation q_{ goal }, normal modes a
output : The tree t
begin
t ← INIT _TREE(q_{ root });
while not STOP _CONDITION(t, q_{ goal }) do
q_{ rand } ← SAMPLE(t, a);
q_{ near } ← BEST _NEIGHBOR(t, q_{ rand });
q_{ new } ← EXPAND _TREE(q_{ near }, q_{ rand });
if IS VALID (q_{ new }) then
ADD _NEW _NODE(t, q_{ new });
ADD _NEW _EDGE(t, q_{ near }, q_{ new });
end
Implementation details
The RRT algorithm, iteratively applied in Algorithm 1, performs the same steps as the basic RRT [23]. The steps are sketched in Algorithm 2. At each iteration, a conformation q_{ rand } is randomly sampled. Note that q_{ rand } is not required to be a feasible conformation. Then, the tree is searched for the closest conformation to q_{ rand }, called q_{ near } . A new conformation, q_{ new } , is generated by moving from q_{ near } towards q_{ rand } with a predefined short step size. The new conformation is added to the tree if it does not violate feasibility constraints, which in the present work are limited to geometric constrains related to no atom overlapping and no bond breaking. The difference with respect to the basic RRT algorithm concerns the implementation of the methods for sampling conformations, searching the nearest neighbor, and expanding the tree. These methods, which are further explained below, are specific to the present framework because of the multiscale protein model and the application of NMA to bias the exploration.
Sampling random conformations
The idea is to randomly sample conformations q_{ rand } using information given by the normal modes. The coarsegrained tripeptidebased model is used at this level. Hence, q_{ rand } is not an allatom conformation, but an array of particle positions. Random particle positions are generated by moving them from their initial positions, defined by q_{ root }, using a linear combination of normal modes with randomly sampled weights. More precisely:

A sequence of 3n random weights w_{ j } are sampled in the range [1, 1], where n is the number of particles, being 3n the number of normal modes (actually, the number of normal modes is 3n − 6, since 6 degrees of freedom correspond to rigidbody motions of the whole set of particles).

The new positions of the n particles are computed by a linear combination of all the randomly weighted modes as follows:
$${q}_{rand}={q}_{root}+{\displaystyle \sum _{3n}^{}}f*{w}_{j}*{a}_{j}$$
where a_{ j } refers to each normal mode, and f is an amplification factor used to push the sampled conformation away from q_{ root } (this factor is the same for all the normal modes). Note that, since the normal modes are not normalized, low frequency modes have larger norm. Thus, they contribute more significantly in the sum.
Finding nearest neighbors
Nearest neighbor search is also performed using the coarsegrained model. Indeed, the computed distance is based on the root mean squared deviation (RMSD) of the particle positions. In the current implementation, the distance is biased to pull the exploration towards the target conformation as follows:
In this work, we have implemented a simple bruteforce algorithm to find q_{ near }. More sophisticated nearest neighbor search algorithms could be used to reduce the number of performed distance computations. Note, however, that currently used algorithms based on space partitioning techniques (e.g. kdtrees) do not perform well in highdimensional spaces [40]. A computationally efficient solution would require the implementation of an approximate nearest neighbor search algorithm.
Generating new conformations
For generating q_{ new }, all particle positions in q_{ near } are linearly interpolated towards q_{ rand } with a predefined step size k. Given these new particle positions, the allatom model corresponding to q_{ new } is obtained by solving an IK problem for every tripeptide. The implemented method proceeds iteratively. If no IK solution is found for a tripeptide t_{ i }(the tripeptide between particles p_{ i } and p_{i+1}) or if the solution involves atom collisions, the pose (position and orientation) of particle p_{i+1}is slightly perturbed and the IK problem is solved again. This process is repeated until a collisionfree IK solution is found or a maximum number of trials is reached. If this process fails to find a collisionfree IK solution for any tripeptide, failure is reported and the RRT algorithm goes back to the random sampling step.
Once the treatment of all tripeptides has been completed, the conformation of the two terminal fragments is generated. For this, the pose of these fragments is updated with respect to the new poses of the first and last tripeptides. Random perturbations can be applied to these end fragments in order to remove possible collisions with the rest of the protein.
Protein conformations q_{ new } generated using the aforementioned process are guaranteed to satisfy geometric constraints: correct bond geometry and no overlap betweew backbone atoms. However, in order to speedup computations, sidechains are not treated at this stage (only C_{ β } atoms are considered for collision avoidance). This is because sidechains are known to be very flexible, and resolving possible collisions along the conformational transition path can be done in a postprocessing stage. Indeed, sidechain collisions are resolved during the minimization step at the end of each short RRT execution.
Results and discussion
This section discusses several experiments aimed to validate the proposed method and to evaluate its performance. First, the question concerning the accuracy of the tripeptidebased elastic network model is addressed. Then, results are presented on conformational transitions computed for a set of ten proteins with different sizes and topologies. Finally, further results on adenylate kinase are presented and compared to available data on the transition between the open and closed forms of this protein.
Validating the coarsegrained ENM
Previous works (e.g. [19, 20]) have shown that simple ENMs built using C_{ α } atoms perform as well as ENMs built using the allatom model when studying the dynamic properties of proteins with NMA. Here, we compare the performance of the proposed tripeptidebased model with the C_{ α }based model for predicting directions of conformational transitions. A set of seven proteins listed in Table 1 was used for this comparison. These proteins were also used in related work [20] for the validation of the C_{ α }based ENM.
For evaluating the capability of normal modes to predict directions of conformational transitions, we use the notion of overlap as proposed in related work [20]. The overlap I_{ j } between a normal mode j and an experimentally observed conformational change between two conformations (open and closed) q^{o}and q^{c}is defined as a measure of similarity between the conformational change and the direction given by the normal mode j. It can be computed as follows:
where $\text{\Delta}{q}_{i}={q}_{i}^{o}{q}_{i}^{c}$ measures the difference between the particle coordinates in conformations q^{o}and q^{c}, a_{ ij } corresponds to the i^{th} coordinate of the normal mode j, and n is the number of particles. A value of 1 for the overlap means that the direction given by the normal mode matches exactly the conformational change, whereas a value around 0.2 or less means that the normal mode is unable to provide any meaningful prediction.
Before conducting the comparative analysis, we need to determine an optimal cutoff distance for the tripeptidebased ENM. A good cutoff distance should create an elastic network that correctly captures the topology of the protein. For C_{ α }based models, 8 Å is generally used, since this cutoff distance has been empirically shown to provide the best results in most cases. It can be intuitively inferred that the same cutoff distance may not be the optimal choice in our case, because distances between particles of the tripeptidebased model are larger than distances between C_{ α } atoms. Moreover, defining the optimal cutoff value theoretically is not straightforward. Therefore, we have measured and compared the overlap values for the seven proteins with cutoff distances between 8 and 34 Å in order to empirically determine the most suitable range of cutoff values. Figure 2 shows the overlap value for each cutoff distance averaged over the seven proteins. Note that, for each protein, overlap values were computed for all the normal modes, and the best value was considered for the average. As clearly shown in the figure, the best overlap values are for cutoff distances of 15, 16 and 17 Å.
The tripeptidebased ENMs for four of the proteins in Table 1, using a cutoff distance of 16 Å, are represented in Figure 3. The figure shows that the main topological features of the proteins appear in the coarsegrained model.
Table 2 compares overlap values of tripeptidebased ENMs using a cutoff distance of 16 Å with those presented in [20] for C_{ α }based ENM using a cutoff distance of 8 Å. In the table, columns labeled "Open" correspond to the opentoclosed conformation and columns labeled "Closed" are for the opposite case. The similar overlap values show that the coarsegarined, tripeptidebased ENM is also able to capture the topological information required to compute normal modes that correctly predict directions of largeamplitude motions. Importantly, such a similar performance in terms of overlap is obtained with much less computational cost. Since the computational complexity of the Hessian matrix diagonalization is $\mathcal{O}\left({n}^{3}\right)$, the reduction of n by a factor 3 (a tripeptide involves 3 C_{ α } atoms) provides a theoretical gain of more than one order of magnitude. This theoretical gain has been confirmed with some experiments. In summary, the time required to compute the normal modes with the tripetidebased model ranges from 0.05 seconds to 0.9 seconds, whereas several minutes may be necessary using the C_{ α } model.
Finding conformational transitions
Experimental setup
The proposed method was applied to compute conformational transition paths for the ten proteins listed in Table 3, and represented in Figure 4. For each protein, at least two experimental structures corresponding to different conformations are available in the Protein Data Bank (PDB) [41]. The difference between these conformations involves largeamplitude domain motions. The ten proteins are varied in size and topology, as well as in the type of domain motions they undergo. This heterogeneity is important to analyze the reliability and scalability of the method.
Each iteration of the algorithm that computes the transition path performs a short RRT exploration, as mentioned in the previous section. In the current implementation, such a local exploration runs until the protein moves 0.3 Å C_{ α }RMSD towards the goal. This distance is gradually reduced to 0.15 Å as the distance to the target conformation decreases. The reason is that the speed of convergence tends to decrease when approaching the target conformation, and recomputing normal modes more frequently provides better results in this situation. If the distance stopping condition is not reached first, the exploration stops after a predefined number of iterations (4000 in our case). This additional stopping condition prevents too long runs of RRT in case of blocking situations.
At the end of the RRT exploration, the closest conformation to the goal is identified and submitted to an energy minimization procedure aimed at generating better sidechain conformations. In this work, we have used the AMBER software package [42] for energy minimization.
Results
Table 4 summarizes the results achieved by the proposed method for the set of ten proteins. In this table, CαRMSD_{ end } is the distance between the goal conformation and the conformation obtained at the end of the iterative path finding process. Time_{ total } is overall computing time, which includes the RRT running time (TimeRRT ) and the time for computing the normal modes and running minimizations at the end of each iteration. The number of iterations of the main algorithm (i.e. the number of NMA calculations) is also indicated in the table. Note that, in all the experiments, the RRT exploration takes more than 90% of the total computing time, which corresponds to runs on a single core of an AMD Opteron 148 processor at 2.6 GHz.
In all cases, the method was able to compute the conformational transition, reaching conformations very close to the given goal conformations. Figure 5 shows superimposed structures (structure superimpositions and images have been done using PyMOL [43]) of open and closed forms of the proteins (q_{ init } and q_{ goal }), and of the closed form and the last conformation of the computed transition path (q_{ goal } and q_{ final }). The distances between the final and goal conformations are below 2 Å (measured using C_{ α }RMSD) for all the tested proteins with the exception of DDT and GroEL. Note that 2 Å RMSD corresponds to the current accuracy of experimental methods for highresolution protein structure determination. As can be seen in Figure 5 the superimpositions of the final and goal conformations is very good, even for DDT and GroEL. Note that the method could have reached closer conformations to the goal with a higher number of iterations. Nevertheless, the strategy applied in these experiments was to stop iterating when the distance to the goal reached a very slow rate of convergence.
We also conducted experiments to analyze the relationship between the computing time and the size of the protein. Since the lengths of the transition paths for the different test systems is variable, we measured the computing time to move 1Å along these paths. The results of these experiments, presented in Table 5 and Figure 6, show a linear relationship between the computing time and the protein size. This scalability is an interesting property of the method. Note that the performance of the method seems not be (or only slightly) affected by the topology of the protein. This is an important advantage over the method presented in [22], which experienced some difficulties in dealing with relative motions of domains connected by several linkers, mainly because of the internalcoordinate representation of proteins used in this previous work.
Finally, we did a profiling of the algorithm to identify possible bottlenecks and points to be improved to enhance computational efficiency. Table 6 gives values of the percentage of the time spent in the most timeconsuming operations within the RRT exploration: nearest neighbor search (NN), collision checking (CC), inverse kinematics (IK) and random sampling (RS). Surprisingly, nearest neighbor search takes around 60% of the overall computing time. This is due to the bruteforce algorithm applied in the current implementation. As mentioned before, a more sophisticated nearest neighbor algorithm should be implemented. The performance of the method could also be enhanced by applying simplified distance metrics (e.g. [16, 44]). The use of an appropriate simplified distance metric could reduce computing time while preserving good exploration properties of the algorithm.
A closer look at adenylate kinase
Adenylate kinase (ADK) [45] is a widely studied protein involved in signal transduction. The structure of ADK is composed of three domains known as: LID, CORE and NMPbind. Several works tend to show that the LID and NMPbind domains undergo largeamplitude conformational changes with respect to the CORE domain, which remains stable [46, 47]. Some of these works (e.g. [47]) also suggest that the conformational transition between open and closed states of ADK proceeds in two steps: (1) the LID domain moves more clearly than the NMPbind domain at the beginning of the opentoclose transition; (2) then NMPbind domain moves at a faster pace towards the end of the transition path.
The open conformation of ADK (PDB ID 4AKE), the closed conformations (PDB ID 1AKE) of ADK, and several intermediate conformations obtained with our method are represented in Figure 7 The figure shows significant conformational changes of the LID and NMPbind domains, as expected. The motion of these two regions is also illustrated in Figure 8, which represents the displacement of the residues along the conformational transition. Two darker regions, involving residues 2060 and 130160, indicate the parts of the protein that undergo larger displacements. These regions correspond to the NMPbind domain and LID domain, approximately. Figure 8 also shows that residues 2060, corresponding to the NMPbind domain, start moving more significantly near the end of the transition path, whereas residues 130160, corresponding to the LID domain, start moving at an earlier stage. This reflects the twostep nature of the conformational transition discussed earlier, and shows that our method provides results that are qualitatively comparable with those presented in previous work on ADK.
The open conformation of ADK (PDB IDs 4AKE), the closed conformations (PDB IDs 1AKE) of ADK, and several intermediate conformations obtained with our method are represented in Figure 7 The figure shows significant conformational changes of the LID and NMPbind domains, as expected. The motion of these two regions is also illustrated in Figure 8, which represents the displacement of the residues along the conformational transition. Two darker regions, involving residues 2060 and 130160, indicate the parts of the protein that undergo larger displacements. These regions correspond to the NMPbind domain and LID domain, approximately. Figure 8 also shows that residues 2060, corresponding to the NMPbind domain, start moving more significantly near the end of the transition path, whereas residues 130160, corresponding to the LID domain, start moving at an earlier stage. This reflects the twostep nature of the conformational transition discussed earlier, and shows that our method provides results that are qualitatively comparable with those presented in previous work on ADK.
We have also compared intermediate conformations in the computed transition path of the ADK to a small number of other experimentally solved structures of this protein. These structures correspond to homolog proteins or mutants with very high sequence identity, and some of them are known to be intermediate structures between open and closed forms of the protein. Interestingly, four of these structures are very close to conformations along the transition path. Table 7 shows the distance between each of these structures and the closest conformation in the transition path. The table also shows the position of this conformation in the path. More precisely, the table shows the corresponding iteration number and the percentage of the path length. 2RH5 (A) is very close to the conformation generated by the first iteration, whereas 1E4Y (A) is close to the conformation generated by iteration 27 (near the closed structure). 1DVR (A) is also very close to a conformation toward the beginning of the path (near the open structure), whereas 2RH5 (B) is a slightly less open structure. These results are comparable to those provided by previous studies [12, 48], which further validates the proposed method.
Conclusions
This paper has presented an efficient approach for computing largeamplitude conformational transitions in proteins. It exploits the ability of normal modes to predict directions of collective, largeamplitude motions and the efficiency of the RRT algorithm to explore large spaces. The proposed approach also relies on a multiscale representation of the protein, based on a decomposition into tripeptides, which significantly contributes to the good performance of the method.
Interestingly, first results presented in the paper show that using an ENM based on the coarsegrained tripeptidebased model instead of a Cαbased model preserves the ability of NMA to predict directions of largeamplitude motions, while significantly reducing computing time.
The proposed method was applied to simulate largeamplitude conformational transitions in proteins of different sizes and topologies. Results show a good performance of the method in all the cases. Computing time scales linearly with the number of residues. It ranges from a few hours for mediumsize proteins to a few days for very large ones. This computational performance could be significantly improved by the implementation of more sophisticated methods to perform the most timeconsuming operations within the RRT algorithm, in particular, nearest neighbor search.
A deeper analysis of the conformational transition between open and closed forms of ADK shows that results provided by the proposed method are qualitatively consistent with results obtained with other computational methods and with experimental data. Nevertheless, it is important to note that the resulting paths are a first approximation, which cannot be used directly for an accurate evaluation of energy variations along conformational transitions. This would require a subsequent refinement and analysis using stateoftheart energy models and molecular modeling methods. It could also be possible to integrate energy evaluations within the RRT exploration with the aim of obtaining betterquality solutions, at the expense of additional computational cost. An interesting extension that could be investigated is to use TRRT [49, 50], instead of RRT, to compute paths that follow more accurately the valleys of the conformational energy landscape.
In this work, we have shown the ability of the proposed method to compute transition paths between two given conformations of a protein. Nevertheless, the approach could also be applied to more challenging problems, such as the prediction of other (meta)stable states reachable from a given protein conformation, or the discrimination between probable and improbable transitions. This would require some extensions, mainly in the definition of energy/scoring functions to identify interesting intermediate and metastable states, as well as highenergy barriers, during the conformational exploration.
Additional files
Abbreviations
 MD:

molecular dynamics
 NMA:

normal mode analysis
 RRT:

rapidlyexploring random tree
 RTB:

rotationstranslations of blocks
 ENM:

elastic network model
 ANM:

anisotropic network model
 IK:

inverse kinematics
 RMSD:

root mean squared deviation
 PDB:

Protein Data Bank.
References
 1.
Voter AF: A method for accelerating the molecular dynamics simulation of infrequent events. J Chem Phys 1997, 106(11):4665–4677. 10.1063/1.473503
 2.
Izrailev S, Stepaniants S, Isralewitz B, Kosztin D, Lu H, Molnar F, Wriggers W, Schulten K: Steered molecular dynamics In Computational Molecular Dynamics: Challenges, Methods, Ideas. SpringerVerlag; 1998:39–65.
 3.
Sørensen RR, Voter AF: Temperatureaccelerated dynamics for simulation of infrequent events. J Comput Phys 2000, 112: 9599–9606.
 4.
Laio A, Parrinello M: Escaping freeenergy minima. Proc Natl Acad Sci USA 2002, 99(20):12562–12566. 10.1073/pnas.202427399
 5.
Hamelberg D, Morgan J, McCammon JA: Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules. J Chem Phys 2004, 120: 11919–11929. 10.1063/1.1755656
 6.
Shaw DE, Maragakis P, LindorffLarsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomiclevel characterization of the structural dynamics of proteins. Science 2010, 330(6002):341–346. 10.1126/science.1187409
 7.
Mills G, Jónsson H: Quantum and thermal effects in H2 dissociative adsorption: Evaluation of free energy barriers in multidimensional quantum systems. Phys Rev Lett 1994, 72: 1124–1127. 10.1103/PhysRevLett.72.1124
 8.
E W, Ren W, VandenEijnden E: String method for the study of rare events. Phys Rev B. 2002, 66: 052301.
 9.
Bolhuis PG, Chandler D, Dellago C, Geissler PL: Transition path sampling and the calculation of rate constants. Annu Rev Phys Chem. 2002, 53: 291–318. 10.1146/annurev.physchem.53.082301.113146
 10.
Cortés J, Siméon T, Ruiz de Angulo V, Guieysse D, RemaudSiméon M, Tran V: A path planning approach for computing largeamplitude motions of flexible molecules. Bioinformatics 2005, 21(suppl 1):i116i125. 10.1093/bioinformatics/bti1017
 11.
Raveh B, Enosh A, SchuelerFurman O, Halperin D: Rapid sampling of molecular motions with prior information constraints. PLoS Comput Biol. 2009, 5(2):e1000295. 10.1371/journal.pcbi.1000295
 12.
Haspel N, Moll M, Baker M, Chiu W, Kavraki LE: Tracing conformational changes in proteins. BMC Struct Biol. 2010, 10(Suppl 1):S1. 10.1186/1472680710S1S1
 13.
AlBluwi I, Siméon T, Cortés J: Motion planning algorithms for molecular simulations: a survey. Comput Sci Rev. 2012, 6(4):125–143. 10.1016/j.cosrev.2012.07.002
 14.
Kim MK, Jernigan RL, Chirikjian GS: Rigidcluster models of conformational transitions in macromolecular machines and assemblies. Biophys J 2005, 89: 43–55. 10.1529/biophysj.104.044347
 15.
Thomas S, Tang X, Tapia L, Amato NM: Simulating protein motions with rigidity analysis. J Comput Biol. 2007, 14(6):839–855. 10.1089/cmb.2007.R019
 16.
Plaku E, Stamati H, Clementi C, Kavraki L: Fast and reliable analysis of molecular motion using proximity relations and dimensionality reduction. Proteins 2007, 67(4):897–907. 10.1002/prot.21337
 17.
Cui Q, Bahar I: Normal mode analysis: theory and applications to biological and chemical systems. Chapman and Hall/CRC mathematical and computational biology series, Chapman & Hall/CRC; 2006.
 18.
Brooks B, Karplus M: Normal modes for specific motions of macromolecules: application to the hingebending mode of lysozyme. Proc Natl Acad Sci USA 1985, 82(15):4995–4999. 10.1073/pnas.82.15.4995
 19.
Hinsen K: Analysis of domain motions by approximate normal mode calculations. Proteins 1998, 33(3):417–429. 10.1002/(SICI)10970134(19981115)33:3<417::AIDPROT10>3.0.CO;28
 20.
Tama F, Sanejouand YH: Conformational change of proteins arising from normal mode calculations. Protein Eng 2001, 14: 1–6. 10.1093/protein/14.1.1
 21.
Alexandrov V, Lehnert U, Echols N, Milburn D, Engelman D, Gerstein M: Normal modes for predicting protein motions: a comprehensive database assessment and associated Web tool. Protein Sci 2005, 14(3):633–643. 10.1110/ps.04882105
 22.
Kirillova S, Cortés J, Stefaniu A, Siméon T: An NMAguided path planning approach for computing largeamplitude conformational changes in proteins. Proteins 2008, 70: 131–143.
 23.
LaValle SM, Kuffner JJ: Rapidlyexploring random trees: progress and prospects. In Algorithmic and Computational Robotics: New Directions. Edited by: Donald B, Lynch K, Rus D, Boston. A.K Peters; 2001:293–308.
 24.
Siciliano B, Khatib O: Springer Handbook of Robotics. Springer; 2008.
 25.
Tama F, Gadea FX, Marques O, Sanejouand YH: Buildingblock approach for determining lowfrequency normal modes of macromolecules. Proteins 2000, 41: 1–7. 10.1002/10970134(2000)41:4+<1::AIDPROT10>3.0.CO;22
 26.
AlBluwi I, Vaisset M, Siméon T, Cortés J: Coarsegrained elastic networks, normal mode analysis and roboticsinspired methods for modeling protein conformational transitions. Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on: 4–7 October 2012 2012, 40–47. 10.1109/BIBMW.2012.6470359
 27.
Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I: Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J. 2001, 80: 505–515. 10.1016/S00063495(01)76033X
 28.
Tirion MM: Large amplitude elastic motions in proteins from a singleparameter, atomic analysis. Phys Rev Lett 1996, 77(9):1905–1908. 10.1103/PhysRevLett.77.1905
 29.
Kim MK, Jernigan RL, Chirikjian GS: Efficient generation of feasible pathways for protein conformational transitions. Biophys J 2002, 83(3):1620–1630. 10.1016/S00063495(02)739313
 30.
Tama F, Miyashita O, Brooks CL III: Normal mode based flexible fitting of highresolution structure into lowresolution experimental data from cryoEM. J Struc Biol 2004, 147(3):315–326. 10.1016/j.jsb.2004.03.002
 31.
Cavasotto CN, Kovacs JA, Abagyan RA: Representing receptor flexibility in ligand docking through relevant normal modes. J Am Chem Soc 2005, 127(26):9632–9640. 10.1021/ja042260c
 32.
Mouawad L, Perahia D: Motions in hemoglobin studied by normal mode analysis and energy minimization: evidence for the existence of tertiary Tlike, quaternary Rlike intermediate structures. J Mol Biol 1996, 258(2):393–410. 10.1006/jmbi.1996.0257
 33.
Schuyler AD, Chirikjian GS: Efficient determination of lowfrequency normal modes of large protein structures by clusterNMA. J Mol Graph Model 2005, 24: 46–58. 10.1016/j.jmgm.2005.05.002
 34.
Demerdash ONA, Mitchell JC: Densitycluster NMA: a new protein decomposition technique for coarsegrained normal mode analysis. Proteins 2012, 80(7):1766–1779.
 35.
Eyal E, Yang LW, Bahar I: Anisotropic network model: systematic evaluation and a new web interface. Bioinformatics 2006, 22(21):2619–2627. 10.1093/bioinformatics/btl448
 36.
Renaud M: A simplified inverse kinematic model calculation method for all 6R type manipulators. In Current Advances in Mechanical Design and Production VII. Edited by: Hassan MF, Megahed SM. New York: Pergamon; 2000:57–66.
 37.
Lee HY, Liang CG: A new vector theory for the analysis of spatial mechanisms. Mech Mach Theory 1988, 23(3):209–217. 10.1016/0094114X(88)901061
 38.
Manocha D, Canny JF: Efficient inverse kinematics for general 6R manipulators. IEEE Trans Robot Autom. 1994, 10(5):648–657. 10.1109/70.326569
 39.
Coutsias EA, Seok C, Jacobson MP, Dill KA: A kinematic view of loop closure. J Comput Chem. 2004, 25(4):510–528. 10.1002/jcc.10416
 40.
Plaku E, Kavraki LE: Quantitative analysis of nearestneighbors search in highdimensional samplingbased motion planning. In Algorithmic Foundations of Robotics VII. Edited by: Akella S, Amato NM, Huang WH, Mishra B,. Berlin: SpringerVerlag; 2008:3–18.
 41.
Research Collaboratory for Structural Bioinformatics PDB[http://www.rcsb.org/pdb/]
 42.
Case DA, Darden TA, Cheatham TE III, Simmerling CL, Wang J, Duke RE, Luo R, Merz KM, Pearlman DA, Crowley M, et al.: AMBER 9. San Francisco: University of California; 2006.
 43.
The PyMOL Molecular Graphics System, Version 1.5, Schrödinger, LLC
 44.
Shehu A, Olson B: Guiding the search for nativelike protein conformations with an abinitio treebased exploration. Int J Robot Res 2010, 29(8):1106–1127. 10.1177/0278364910371527
 45.
Müller CW, Schulz GE: Structure of the complex between adenylate kinase from Escherichia coli and the inhibitor Ap5A refined at 1.9 Å resolution. A model for a catalytic transition state. J Mol Biol 1992, 224: 159–177. 10.1016/00222836(92)905825
 46.
Müller CW, Schlauderer GJ, Reinstein J, Schulz GE: Adenylate kinase motions during catalysis: an energetic counterweight balancing substrate binding. Structure 1996, 4(2):147–156. 10.1016/S09692126(96)000184
 47.
Maragakis P, Karplus M: Large amplitude conformational change in proteins explored with a plastic network model: adenylate kinase. J Mol Biol 2005, 352: 807–822. 10.1016/j.jmb.2005.07.031
 48.
Feng Y, Yang L, Kloczkowski A, Jernigan RL: The energy profiles of atomic conformational transition intermediates of adenylate kinase. Proteins 2009, 77(3):551–558. 10.1002/prot.22467
 49.
Jaillet L, Cortés J, Siméon T: Samplingbased path planning on configurationspace costmaps. IEEE Trans Robot. 2010, 26(4):635–646.
 50.
Jaillet L, Corcho FJ, Pérez JJ, Cortés J: Randomized tree construction algorithm to explore energy landscapes. J Comput Chem. 2011, 32(16):3464–3474. 10.1002/jcc.21931
Acknowledgements
This work has been partially supported by the French National Research Agency (ANR) under project ProtiCAD (project number ANR12MONU001501).
Declarations
The publication costs for this article were funded by the French National Research Agency (ANR) underproject ProtiCAD (project number ANR12MONU001501).
This article has been published as part of BMC Structural Biology Volume 13 Supplement 1, 2013: Selected articles from the Computational Structural Bioinformatics Workshop 2012. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcstructbiol/supplements/13/S1.
Author information
Additional information
Competing interests
The authors declare that there are no competing interests.
Authors' contributions
TS and JC designed this research and supervised the work. IA implemented the method and carried out experiments. MV participated in the software design and implementation. IA and JC wrote the manuscript. All authors have read and approved the manuscript.
Electronic supplementary material
Rights and permissions
About this article
Published
DOI
Keywords
 Normal Mode
 Root Mean Square Deviation
 Inverse Kinematic
 Conformational Transition
 Adenylate Kinase