Generalized spring tensor models for protein fluctuation dynamics and conformation changes

Background In the last decade, various coarse-grained elastic network models have been developed to study the large-scale motions of proteins and protein complexes where computer simulations using detailed all-atom models are not feasible. Among these models, the Gaussian Network Model (GNM) and Anisotropic Network Model (ANM) have been widely used. Both models have strengths and limitations. GNM can predict the relative magnitudes of protein fluctuations well, but due to its isotropy assumption, it can not be applied to predict the directions of the fluctuations. In contrast, ANM adds the ability to do the latter, but loses a significant amount of precision in the prediction of the magnitudes. Results In this article, we develop a single model, called generalized spring tensor model (STeM), that is able to predict well both the magnitudes and the directions of the fluctuations. Specifically, STeM performs equally well in B-factor predictions as GNM and has the ability to predict the directions of fluctuations as ANM. This is achieved by employing a physically more realistic potential, the Gō-like potential. The potential, which is more sophisticated than that of either GNM or ANM, though adds complexity to the derivation process of the Hessian matrix (which fortunately has been done once for all and the MATLAB code is freely available electronically at http://www.cs.iastate.edu/~gsong/STeM), causes virtually no performance slowdown. Conclusions Derived from a physically more realistic potential, STeM proves to be a natural solution in which advantages that used to exist in two separate models, namely GNM and ANM, are achieved in one single model. It thus lightens the burden to work with two separate models and to relate the modes of GNM with those of ANM at times. By examining the contributions of different interaction terms in the Gō potential to the fluctuation dynamics, STeM reveals, (i) a physical explanation for why the distance-dependent, inverse distance square (i.e., ) spring constants perform better than the uniform ones, and (ii), the importance of three-body and four-body interactions to properly modeling protein dynamics.


Introduction
It is now well accepted that the functions of a protein are closely related to not only its structure but also its dynamics. With the advancement of the computational power and increasing availability of computational resources, function-related protein dynamics, such as large-scale conformation transitions, has been probed by various computational methods at multiple scales.
Among these computational methods, coarse-grained models play an important role since many functional processes take place over time scales that are well beyond the capacity of all-atom simulations [1]. One type of coarse-grained models, the elastic network models (ENMs), have been particularly successful and widely used in studying protein dynamics and in relating the intrinsic motions of a protein with its functional-related conformation changes over the last decade [2][3][4][5].
The reason why ENMs have been well received as compared to the conventional normal mode analysis (NMA) lies at its simplicity to use. ENMs do not require energy minimization and therefore can be applied directly to crystal structures to compute the modes of motions. In contrast, minimization is required for carrying out the conventional normal mode analysis (NMA). The problematic aspect of energy minimization is that it usually shifts the protein molecule away from its crystal conformation by about 2 Å. In addition, in ENMs analytical solutions to residue fluctuations and motion correlations can be easily derived. On the other hand, the simplicity of ENMs leaves much room for improvement and many new models have been proposed [6][7][8][9][10][11][12].
The two most widely used ENM models are Gaussian Network Model (GNM) and Anisotropic Network Model (ANM). They have been used to predict the magnitudes or directions of the residue fluctuations from a single structure and have been applied in many research areas [4,5], such as domain decomposition [13] and allosteric communication [14][15][16][17]. Both models have their own advantages and disadvantages. GNM can predict the relative magnitudes of the fluctuations well, but due to its isotropy assumption, it can not be applied to predict the directions of the fluctuations. In contrast, ANM adds the ability to do the latter, but it loses a significant amount of precision in the prediction of the magnitudes.
Gaussian network model. Gaussian Network Model (GNM) was first introduced in [2] under the assumption that the separation between a pair of residues in the folded protein is Gaussianly distributed. Given its simplicity, the model performs extremely well in predicting the experimental B-factors. The model represents a protein structure using its C a atoms. The connectivity among the C a 's is expressed in Kirchhoff matrix Γ (see Eq. (1)). Two C a 's are considered to be in contact if their distance falls within a certain cutoff distance. The cutoff distance between a pair of residues is the only parameter in the model and is normally set to be 7 Å to 8 Å. Let Δr i and Δr j represent the instantaneous fluctuations from equilibrium positions of residues i and j and r ij and r 0 ,ij be the respective instantaneous and equilibrium distances between residues i and j. The Kirchhoff matrix Γ is: where i and j are the indices of the residues and r c is the cutoff distance.
The simplicity of the Kirchhoff matrix formulation results from the assumption that the fluctuations of each residue are isotropic and Gaussianly distributed along the X, Y and Z directions. The expected value of residue fluctuations, <Δr i 2 >, and correlations, <Δr i · Δr j >, can be easily obtained from the inverse of the Kirchhoff matrix: where k B is the Boltzmann constant and T is the temperature. g is the spring constant. The <Δr i 2 > term is directly proportional to the crystallographic Bfactors.
Anisotropic network model. GNM provides only the magnitudes of residue fluctuations. To study the motions of a protein in more details, especially to determine the directions of the fluctuations, normal mode analysis (NMA) is needed. Traditional NMA is all-atom based and requires a structure to be first energy-minimized before the Hessian matrix and normal modes can be computed, which was rather cumbersome. Even after the energy minimization, the derivation of the Hessian matrix is not easy due to the complicated all-atom potential. In Tirion's pioneering work [18], the energy minimization step was removed and a much simpler Hookean potential was used, and yet it was shown that the low frequency normal modes remained mostly accurate. Since then, the Hookean spring potentials have been favored in most coarse-grained C α models [3,19,20]. One of such models is best known as Anisotropic Network Model (ANM) [3] since it has anisotropic, directional information of the fluctuations. The potential in ANM has the simplest harmonic form. Assuming that a given structure is at equilibrium, the Hessian matrix (3N×3N) can be derived analytically from such a potential [3] where H i j is the interaction tensor between residues i and j and can be expressed as: lie in several aspects: (i) it is a coarse-grained model and uses the C a 's to represent the residues in a structure; (ii) it does not require energy minimization and thus can be applied directly to crystal structures to compute the modes of motions; (iii) it provides analytical solutions to the mean square fluctuations and motion correlations.
The limitations of the GNM model. GNM provides only information on the magnitudes of residue fluctuations but no directional information. Therefore, the modes of GNM should not be interpreted as protein motions or components of the motions, since the potential in GNM is not rotationally invariant [21].
The limitations of the ANM model. In contrast to that in GNM, the potential in ANM is based on simple, harmonic Hookean springs and is rotationally invariant. And thus, the modes of ANM do represent the possible modes of protein motions. In doing this, however, ANM loses a significant amount of precision in predicting the magnitudes of the fluctuations. The reason is that, in GNM, the fluctuations in the separation between a pair of residues are assumed to be Gaussianly distributed and isotropic, while in ANM, because only a Hookean spring is attached between a pair of residues i and j, the fluctuation of residue j is constrained only longitudinally along the axis from i to j. The fluctuation is unconstrained transversely. The interaction spring tensor H i,j ANM between residues i and j in Eq. (5) becomes the following in the local frame (where the Z axis is along the direction from residues i to j): Because the fluctuation of residue j is unconstrained transversely relative to residue i, the fluctuations given by ANM are less realistic than those by GNM, which are assumed to be isotropic. The isotropy in GNM is equivalent to an interaction spring tensor between residues i and j of the following form: From the two tensors H i,j ANM and H i,j GNM given in Eqs. 8 and 9, the causes for the limitations in GNM and ANM are clearly displayed. The unrealistic-ness in ANM is an artifact resulting from its over-simplified potential. The isotropy assumption of GNM, on the other hand, does a better job than ANM in modeling the effect of residue interactions on the magnitudes of the fluctuations, but gives up completely on representing the anisotropic nature that is intrinsic to all physical forces and interactions, since only the magnitudes of the mean-square fluctuations and cross-correlations were of concern when GNM was first proposed. Therefore, to overcome the limitations of GNM and ANM, what is needed is a generalized interaction spring tensor that both is anisotropic and can exert more proper constraints on the fluctuations than the ANM tensor H i,j ANM does. This calls for a model that has a physically more realistic potential than that of ANM. Since potentials with only two-body interactions can provide only longitudinal constraints, it is necessary to include multibody interactions in the potential in order to have transversal constraints as well. The multi-body interactions provide additional diagonal and off-diagonal terms to the interaction spring tensor between residues i and j. For example, by properly including three-body interactions, the interaction spring tensor may look like: where k represent the indices of the residues that interact with both residues i and j through three-body interaction S. The first tensor on the right side of the equation represents the two-body interaction, which is similar to H i,j ANM , except that the interaction strength T (i, j) depends on residues i and j, and thus may be distance-dependent as well.
Our contributions. To overcome the limitations of ANM and GNM, we have developed a generalized spring tensor model for studying protein fluctuation dynamics and conformation changes. It is called generalized spring tensor model, or STeM, for the reason that the interaction between a pair of residues i and j is no longer a Hookean spring that has the tensor form of Eq. (8), but takes a generalized tensor form (similar to that in Eq. 10) that can provide both longitudinal and transversal constraints on a residue's fluctuations relative to its neighbours. We obtain the generalized tensor form by deriving the Hessian matrix from a physically more realistic Gō-like potential (Eq. 11), which has been successfully used in many MD simulations to study protein folding processes and conformation changes [22][23][24]. In additional to the Hookean spring interactions, the potential includes bond bending and torsional interactions, both of which had been found to be helpful in removing the "tip effect" of the ANM model [9]. The inclusion of the bond bending and torsional interactions is reflected in the generalized tensor spring interaction between residues i and j, in such a way that the tensor now includes not only the two-body interaction between residues i and j, but also three-body and four-body interactions that involve residues i and j (see Eq. 10). In doing this, the STeM model is able to integrate all the aforementioned attractive features of ANM and GNM and overcome their limitations. Specifically, STeM performs equally well in B-factors predictions as GNM and has the ability to predict the directions of the fluctuations as ANM. This is accomplished with virtually no performance slowdown. The only potential drawback of this model is the significantly increased complexity in deriving the Hessian matrix. Fortunately, this has been done once for all and the derivation results are available electronically at http://www.cs.iastate.edu/~gsong/STeM. STeM is physically more accurate by explicitly including the bond bending and torsional interactions since they capture the chain behavior of protein molecules, which are neglected in most elastic network models where a protein is treated as an elastic rubber. Therefore, we have reasons to expect this model will further distinguish itself in studying protein dynamics where a correct modeling of bond bending and/or torsional rotations is critical.

Results and discussion
Crystallographic B-factor prediction Table 1 shows the correlation coefficients between the experimental and calculated B-factors of the 111 proteins in the first dataset. The mean values of the correlation coefficients of ANM, GNM, and STeM are 0.53, 0.59, and 0.60 respectively. STeM provides the directional information of the residue fluctuations as ANM and has an accuracy even slightly better than GNM in B-factor predictions. Figure 1 shows the distributions of the correlation coefficients between the calculated B-factors and the experimental B-factors. STeM is the only model in which there are instances where the correlation coefficient is above 0.85 and no instances where the correlation coefficient is below 0.25. This implies that the performance of STeM is more steady than either ANM or GNM. The scatter plot of the correlation coefficients between ANM and STeM in Figure 2 shows that STeM performs better than ANM for 80% of the proteins in the dataset. Protein structures of higher resolution have more accurate data on atom coordinates and B-factors. We investigate whether our model's performance can be further improved when the dataset used is limited to structures with higher resolution. We select the 12 structures with resolution better than 1.3 A from the first dataset. The mean values of the correlation coefficients of these 12 structures are 0.56, 0.62, and 0.63 for ANM, GNM, and STeM, respectively, which gives an improvement of about 5%-6% for all of the three models. Since the improvement is based on a relatively small set of 12 structures, a larger dataset is needed to further examine this potential dependence of B-factor prediction accuracy on structure quality.

The contributions of different interaction terms to the fluctuations
The Gō-like potential in eq. (11) has four different interaction terms, namely, bond stretching, bond bending, torsional interactions, and the non-bonded interactions. It is of great interest to investigate the relative contributions of these different terms to the agreement with experimental B-factors. Since only the nonbonded interaction term (V 4 ) is able to provide by itself enough constraints to ensure the Hessian matrix to have no more than six zero eigenvalues, V 4 is used as the base term for the evaluation of different terms' contributions to the mean-square fluctuations. The Hessian matrix of ANM, denoted by H ANM , is used as another baseline for comparison purposes. Table 2 lists the contributions of these different terms to the improvement of B-factor predictions as they are added to the potential.
First, it is seen that the non-bonded interactions, as are present in H V 4 and H ANM , play a dominant role in contributing to the B-factors. This is not surprising since the mean-square fluctuations of a residue are mostly constrained by its interactions with its spatial neighbours, most of which are through non-bonded interactions. What is more interesting is that H V 4 term alone performs better than H ANM . This is in agreement with recent results that the performance of B-factor predictions can be improved by using distance-dependent force constants [25,26]. Particularly, the spring constants that take the form of inverse distance square have been shown to be superior in a recent exhaustive study that experimented with different distance-dependent spring constants on a large dataset [10]. The Taylor expansion of the non-bonded interaction term (V 4 ) shows that it has an equivalent spring constant of the form 120 r 0, 2  ij (see Eq. 36), which is exactly proportional to the inverse of the pairwise distance square. Thus, STeM provides a physics-based explanation for the choice of using inverse square distance spring constants.
The contribution to the improvement in B-factor predictions from each of the bonded interactions, such as that of bond stretching, is small, as had been pointed out by Bahar et al when GNM was first proposed over a decade ago [2]. However, when the contributions of all of these four terms are added up, they together enable the STeM model to gain a significant improvement over ANM to reach the level of accuracy on a par with GNM.

Conformational change evaluation
It is known that the modes derived from the open form of a structure have better overlaps and correlations with the direction of a protein's conformation change than the ones derived from the closed form [20]. Here we apply the STeM model to study the conformation changes between the open and closed forms of 20 proteins and the open forms are used to calculate the normal modes. Table 3 lists the overlaps and correlations of the observed conformation changes and the indices of the modes that are most involved in the conformation changes. GNM is not considered since it cannot provide directional information. The mean values of the overlaps and correlation coefficients of ANM are 0.49 and 0.61 respectively, and 0.52 and 0.64 respectively for STeM. These amount to an improvement of about 5% for STeM over ANM on both overlap and correlation.
Since the results are obtained based on a relatively small set of 20 protein pairs, the significance of the improvement seen here needs to be further tested by conducting a more exhaustive analysis that uses a larger set of proteins and varying parameters, and preferably taking into account the effect of crystal packing as well.
We will leave this for future work. It is also worth noting that, in both the overlap and correlation calculations, the modes that are most involved in the conformation change tend to have lower indices in STeM than in ANM (see Table 3), which may imply the modes of STeM be of higher quality than those of ANM.

Conclusions
Protein mean-square fluctuations and conformation changes are two closely related aspects of protein dynamics. However, in the past, two separate groups of models were needed to best explain protein meansquare fluctuations or conformation changes. Specifically, the best models for predicting mean-square fluctuations cannot predict conformation changes, and the models that can predict conformation changes do not have the best performance in predicting meansquare fluctuations. There is thus an obvious gap between the models that work well in predicting one aspect of the dynamics and those in another. Since protein mean-square fluctuations and conformation changes are two closely related dynamic phenomena and share a similar physical origin, we reasoned that models based on a physically more accurate potential should be able to bridge the gap and predict both aspects of the protein dynamics well. Indeed, by using a Gō-like potential, we have successfully developed a spring tensor model (STeM) that is able to singly predict well both meansquare fluctuations and conformation changes. Specifically, STeM performs equally well in B-factor predictions as GNM and has the ability to predict the directions of fluctuations as ANM. The new STeM model does come with a cost. As is seen, the derivation process of the Hessian matrix in STeM is much more complex than models using only two-body Hookean potentials, such as those used in ANM. However, the introduced complexity in the potential is necessary in resolving the aforementioned gap that is mainly due to over-simplified potentials and in providing a single, unified model for protein dynamics. Moreover, the derivation process, though more complex, needs to be done only once. Examining the different interaction terms in the Gō potential and their contributions to the agreement with experimental B-factors provides further benefits. Along the way, we have discovered a physical explanation for why the distance-dependent, inverse Figure 1 The distributions of the correlation coefficients between the experimental and calculated B-factors Figure 2 The scatter plot of the correlation coefficients by ANM and those by STeM For 80% of the proteins listed in Table 1, STeM does better than ANM.
Lin and Song BMC Structural Biology 2010, 10(Suppl 1):S3 http://www.biomedcentral.com/1472-6807/10/S1/S3 distance square (i.e., 1 r 2 ) spring constants perform better than the uniform ones. The van der Waals interaction term in the potential naturally renders inverse distance square spring constants! By including the bond bending and torsional interactions and their contributions to the improvement in B-factor predictions, the STeM model confirms the importance of 3-body and 4-body potentials. The importance of multi-body potentials is made even more evident when their contribution to the interaction spring tensor is examined -the multi-body potentials are shown to be necessary in providing proper constraints on residue fluctuations, even transversely. It is worth noting that the 3-body and 4-body potentials introduced through bond bending and torsional interactions only scratch the surface of the full extensity of the multi-body potentials since bond bending and torsional interactions are restricted to only conssecutive residues along the protein chain. The improvement seen here calls for other generalized spring tensor models that have a thorough treatment of the multi-body potentials. Chain breaking, such as that due to missing residues, has a more felt impact on STeM than on ANM or GNM, since the first, the second, and the third terms of the potential used to derive the model are all related to the continuity of the chain. We have not evaluated such impact in the current work but this could be a future research direction and our STeM model would be a proper tool for evaluating the impact of chain breaking on protein motions. STeM does not always outperform ANM in B-factor predictions -it does better than ANM for 80% of the proteins studied. it would be interesting to find out why this is so. Crystal packing has been known to impact significantly the mean-square fluctuations. Therefore, a proper inclusion of the crystal packing effect may further enhance STeM's performance. Since STeM takes into account bond bending and

Methods
In this section we will show the derivations of the Hessian matrix from a Gō-like potential proposed by Clementi et al [22].
The Gō-like potential The Gō-like potential in [22] takes the non-native and native (equilibrium) conformations as input and it can be divided into four terms. The first term of this Gō-like potential (defined as V 1 for later use) preserves the chain connectivity. The second (V 2 ) and third terms (V 3 ) define the bond angle and torsional interactions respectively and the last term (V 4 ) is the nonlocal interactions. The Gō-like potential has the following expression: In Eq. (11), r and r 0 represent respectively the instantaneous and equilibrium lengths of the virtual bonds between the C a atoms of consecutive residues. Similarly, the θ (θ o ) and F (F 0 ) are respectively the instantaneous (equilibrium) virtual bond angles formed by three consecutive residues and the instantaneous (equilibrium) virtual dihedral angles formed by four consecutive residues. The r ij and r 0,ij represent respectively the instantaneous and equilibrium distances between two non-consecutive residues i and j. The Gō-like potential in Eq. (11) includes several force parameters (K r , K θ , K 1    , K 3    and ε) and the values of these parameters are taken directly from [22] without any tuning. The values of these parameters are: Anisotropic fluctuations from the second derivative of the Gō-like potential Similar to ANM, STeM has a 3N×3N Hessian matrix that can be decomposed into N×N super-elements. Each super-element in STeM, H i,j , is a summation of four 3×3 matrices. The first 3×3 matrix is the contribution from bond stretching. The second and third 3×3 matrices are the contributions from bond bending and torsional rotations respectively. The fourth 3×3 matrix is the contribution from nonlocal contacts. , The Hessian matrix is the second derivative of the overall potential (equation 11). Let us first consider the first term of the Gō-like potential and let (X i ,Y i , Z i ) and (X j , Y j , Z j ) be the Cartesian coordinates of two consecutive residues i and j.
The protein sets studied To evaluate the STeM model, we apply it to compute Bfactors and to study protein conformation changes and compare the results with those computed from ANM and GNM. For B-factors computations, the protein dataset is from [27] and contains 111 proteins. Two proteins, 1CYO and 5PTP, are removed from the dataset because they no longer exist in the current Protein Data Bank [28]. The proteins in the first dataset all have a resolution that is better than or equal to 2.0 Å. For conformation change studies, the dataset is from [20], which contains 20 pairs of protein structures. Each pair of protein structures have significantly large structure difference from each other.

Evaluation techniques
We used the same evaluation techniques as have been applied before [20,27]. Specifically, the following three numerical measures are used.

The correlation between the experimental and calculated B-factors
The linear correlation coefficient between the experimental and calculated B-factors is calculated using the following formula. The overlap between the experimental observed conformation changes and the calculated modes