- Research article
- Open Access
- Published:

# Large-scale evaluation of dynamically important residues in proteins predicted by the perturbation analysis of a coarse-grained elastic model

*BMC Structural Biology*
**volume 9**, Article number: 45 (2009)

## Abstract

### Backgrounds

It is increasingly recognized that protein functions often require intricate conformational dynamics, which involves a network of key amino acid residues that couple spatially separated functional sites. Tremendous efforts have been made to identify these key residues by experimental and computational means.

### Results

We have performed a large-scale evaluation of the predictions of dynamically important residues by a variety of computational protocols including three based on the perturbation and correlation analysis of a coarse-grained elastic model. This study is performed for two lists of test cases with >500 pairs of protein structures. The dynamically important residues predicted by the perturbation and correlation analysis are found to be strongly or moderately conserved in >67% of test cases. They form a sparse network of residues which are clustered both in 3D space and along protein sequence. Their overall conservation is attributed to their dynamic role rather than ligand binding or high network connectivity.

### Conclusion

By modeling how the protein structural fluctuations respond to residue-position-specific perturbations, our highly efficient perturbation and correlation analysis can be used to dissect the functional conformational changes in various proteins with a residue level of detail. The predictions of dynamically important residues serve as promising targets for mutational and functional studies.

## Background

Protein conformational dynamics [1, 2] is critically involved in many biochemical processes ranging from catalysis [3] to allostery [4–9] and signal transduction [10]. Protein dynamics spans a wide range of time scales (from picoseconds to seconds or minutes). Biologically relevant conformational motions of proteins are often collective (for example in the form of hinge-bending or shearing motions between rigid domains, see [11]). These highly coordinated motions are thought to involve a network of key amino acid residues that couple spatially separated functional sites [9]. The conservation and variation of protein functions are likely underscored by the conservation and co-evolution of these dynamically important residues. The existence of a sparse network of allosterically coupled residues in various proteins has been revealed by the statistical coupling (or correlated mutation) analysis based on multiple sequence alignment (see [12–14]). The discovery of dynamically important residues will lead to the following important applications: first, facilitate the mechanistic studies of a variety of biomolecular systems whose conformational dynamics plays a key role in function [1]; second, enable drug design that dynamically alters target proteins via small molecule binding [15–18]; third, guide the engineering of new molecular devices with novel dynamic properties [19, 20].

In complement with experimental efforts for probing protein dynamics at atomic resolution (such as NMR, see [21, 22]; and time-resolved Xray crystallography, see [23]), structure-based computer simulations have promised to elucidate the fine details of protein conformational motions. When multiple crystal structures are available for a protein at different states, structural analysis can identify those residues involved in the functional conformational changes between these states (such as local motions in allosteric proteins, see [24]; conformational changes due to binding of small molecules and other proteins, see [25, 26]). In one study, the analysis of local structural changes (such as changes in the pseudo-bond angles and pseudo-dihedral angles along the C_{
α
}trace of a protein) was employed to identify key residues involved in the lid-closing conformational change in adenylate kinase [27]. Hinge residues of protein domain motions can be identified by structural analysis tools such as DynDom [28] and Hingefind [29], or graph theory based method [30]. To further yield dynamic information from static crystal structures, atomistic molecular dynamics (MD) [31] and related methods (such as non-equilibrium MD, see [32]; targeted MD, see [33]; biased MD, see [34]) have been employed to identify key residues or structural elements involved in protein conformational fluctuations and transitions (see [32, 35, 36]). Nevertheless, the applications of atomistic MD simulations are limited by the high cost of simulating conformational dynamics beyond tens of nanoseconds [31].

To overcome the time-scale barrier for atomistic MD simulations, a variety of coarse-grained (low-resolution) modeling techniques [37] have been developed to explore protein conformational motions more efficiently. For example, the Go model [38] has been recently used to simulate conformational transitions between known protein structures (see [39]). A structural thermodynamic model (COREX) based on "Go-like" sampling of protein ensembles was developed to simulate coupling between local structural fluctuations, ligand binding and global conformational changes [40]. Of particular interest to the present study is the elastic network model (ENM) [41–43] and its isotropic variation – Gaussian network model (GNM) [44, 45], which represent a protein structure as a network of C_{
α
}atoms locally connected by springs with a uniform force constant [46]. The normal mode analysis (NMA) of ENM often yields a handful of low-frequency modes that dominate the large-scale domain motions observed between protein crystal structures [43, 47, 48]. Numerous studies have established ENM as an efficient means to tease out the functionally relevant conformational dynamics from protein structures with no limit in time scale or system size (for reviews, see [49–51]). The ENM/GNM-based NMA has formed the basis of several recently developed computational methods for locating ligand-binding sites [52, 53], predicting hinge residues using low-frequency modes [54–56], and modeling conformational transition pathways [57–61]. Thanks to its high efficiency the ENM/GNM-based NMA has proven to be a powerful tool for bioinformatic analysis of protein structures and motions at a large scale [47, 48, 62].

In several recent studies, we have proposed and employed an ENM-based perturbation analysis to predict the dynamically important residues involved in the observed protein functional motions [63–65]. This method analyzes how the "dominant mode" (the normal mode that dominates the observed protein functional motions) changes in response to residue-position-specific perturbations to the ENM force constant that mimic the effects of point mutations (see Methods). By combining the perturbation analysis with the linear response theory, a correlation analysis was developed to locate hinge residues that control the structural fluctuations of the entire protein structure or its active site [66, 67]. These novel methods are available at an NIH-based webserver http://enm.lobos.nih.gov. In our previous studies of chaperonin GroEL, helicase, myosin and polymerases [63–67], we have shown that the key residues involved in the functional motions of these proteins are highly conserved, and many of them were found to be functionally important by mutational studies [63–65]. Nevertheless, the applicability of these methods and other coarse-grained methods to a large variety of proteins remains to be established. Additionally, the perturbation analysis was based on a single dominant mode, while the conformational changes in proteins often involve multiple modes [47, 48]. So it is necessary to extend our method to study protein functional motions that are not dominated by a single normal mode.

In this study, we will first introduce a new fluctuation-based perturbation protocol (see Methods) that does not assume the existence of a single dominant mode. Then we will perform a comprehensive evaluation of our ENM-based methods together with several alternative ones for predicting dynamically important residues. For the lack of systematic data on protein dynamics and functions, we will use sequence conservation as the primary metric for method evaluation. The evaluation is performed on two lists of protein structure pairs – a short list with 25 pairs of protein structures which is compiled from previous works by us and others [43, 68, 69]; a long list with >500 pairs gleaned from Protein Data Bank by an automated procedure (see Methods). The above two lists are complementary in the following ways. The short list contains functionally relevant structural changes whose biological significance was established in literature (all of them are involved in the binding/release of a biologically relevant ligand), but the arbitrary selection of a small number of cases may introduce artificial bias and statistical uncertainty; the long list, however, may contain biologically irrelevant structural changes (for example, due to crystal packing), but it provides a less biased data set with smaller statistical errors.

We have found that the dynamically important residues predicted by the perturbation and correlation analysis are strongly or moderately conserved for >67% of the test cases. These key residues, which constitute a small fraction of all residues (~15%), are clustered both in 3D space and along protein sequence, and they dominate the structural dynamics of ENM. Together, they form a sparse network which may couple distant functional sites in a protein complex. The conservation of these key residues is mainly due to their dynamic role instead of ligand binding or high network connectivity. This large-scale study, along with previous database-scale studies using NMA [47, 62, 70–72], will pave the way for future bioinformatic analyses of protein structure-function relationships via high-throughput dynamic modeling. These techniques promise to offer new biophysical insights that will enrich the existing databases of protein structures and motions [71].

## Methods

### 1. Elastic Network Model (ENM)

In an ENM, a protein structure is represented as a network of C_{
α
}atoms of amino acid residues whose equilibrium coordinates are given by a crystal structure. A harmonic potential with a uniform force constant *C* accounts for elastic interactions between two C_{
α
}atoms that lie within a cutoff distance *R*_{
c
}(it is usually set within the range of 7 Å–20 Å). The potential energy function of ENM is [46]

where *d*_{
ij
}is the distance between the C_{
α
}atom *i* and *j*, is the equilibrium distance between the C_{
α
}atom *i* and *j* in the crystal structure, *N* is the number of C_{
α
}atoms, and *θ* (*x*) is the Heaviside function.

The above potential energy function is expanded to second order:

where *δX* = *X* - *X*_{0}, *X* is a 3*N*-dimensional vector representing the Cartesian coordinates of *N* C_{
α
}atoms, C_{
α
}gives the equilibrium C_{
α
}coordinates in the crystal structure, is the Hessian matrix, and the matrix element of *H*^{ij}is given by , where *a* and *b* are indices for x, y and z components of the Cartesian coordinates of C_{
α
}atoms *i*' and *j*'.

A normal mode analysis of the Hessian matrix yields 3*N*-6 non-zero normal modes (excluding 6 zero modes corresponding to 3 rotations and 3 translations), which are numbered from 1 to 3*N*-6 in order of ascending eigenvalue. To validate ENM, each normal mode is compared with the observed structural change between two crystal structures, which is represented by a 3*N*-dimensional vector Δ*X*_{
obs
}obtained by superimposing the two structures with minimal Root Mean Squared Deviation (RMSD). An overlap is calculated for mode *m*, where *V*_{
m
}is its eigenvector, |Δ*X*_{
obs
}| and |*V*_{
m
}| represent the amplitudes of Δ*X*_{
obs
}and *V*_{
m
}. The mode with the highest overlap is named the dominant mode, if its overlap value is high (|*I*_{
m
}| > 0.5).

To assess how collective the observed structural change is, we define the collectivity of Δ*X*_{
obs
}as [43], where and Δ*X*_{obs, i}is the 3D component of Δ*X*_{
obs
}at residue *i*.

### 2. Perturbation analysis based on a single dominant mode

To simulate the dynamic effect of a point mutation, we have introduced a perturbation to the local elastic interactions involving a given residue *i* as follows [63]:

where *δC* represents an arbitrary change to the force constant of those springs connecting residue *i* to its neighbors (residue *j*). This perturbation results in the following change in the Hessian matrix:

In the single-mode-based SPM proposed earlier [63–65], we analyzed how much the above perturbations change the eigenvalue (or eigenvector) of the mode that dominates the protein functional motions:

where *δλ*_{m, i}(denoted as *δω*_{
m
}in [63]) is the resulting change in the eigenvalue of mode *m*, which is proportional to the elastic energy stored in the springs that connect residue *i* to its neighbors following a displacement of the protein structure in the direction of the eigenvector *V*_{
m
}. Because , if we set *δC* = *C*/2, then (named as normalized *δλ*_{m, i}) gives the fractional contribution of local interactions at residue *i* to mode *m*. was found to be a good indicator for the dynamic importance of residue positions [63].

### 3. Perturbation analysis based on overall structural fluctuations and structural fluctuations in the direction of observed conformational change

In addition to the perturbation analysis of individual normal modes [63–65], we have also evaluated how much the perturbation in Eq. 3 changes the overall structural fluctuations of the entire protein structure [66] or its active site [67].

The change in the overall structural fluctuations caused by a perturbation at residue *i* (see Eq. 3) is given by the follow score [66]:

where , *δH*_{
i
}is given in Eq. 4, *Tr* represents the trace of a matrix. In practice, *H*^{-1} can be calculated efficiently by inverting *H* + *εI* using a sparse linear equation solver (*I* is the identity matrix and *ε* is a small number set to be 0.00001). The purpose of adding a small *ε* is to change the eigenvalues of six zero modes of *H* to positive values so that *H* can be inverted by a linear equation solver. The eigenvectors are not changed by the addition of *ε*. Because *δH*_{
i
}is translationally and rotationally invariant, the inclusion of translational and rotational modes in *H*^{-1} does not affect the calculations in Eq. 6 except for a small increase in the eigenvalues of all non-zero modes by *ε*.

Because , if we assume *δC* = -*C*/2, then δ2, then *δf*_{
i
}/⟨(*X* - *X*_{0})^{2}⟩ (named as normalized *δf*_{
i
}) gives the fractional contribution of local interactions at residue *i* to the overall structural fluctuations of the protein structure. This score was previously used to identify the hinge residues in myosin motor domain [66].

To focus on the fluctuations in the direction of the crystallographically observed conformational change (Δ*X*_{
obs
}), we introduce a new fluctuation-based perturbation score:

Because , if we assume *δC* = -*C*/2, then (named as normalized ) gives the fractional contribution of local interactions at residue *i* to the fluctuations in the direction of Δ*X*_{
obs
}Therefore, those residue positions with high are significantly involved in the structural fluctuations that sample the new structural state at *X*_{0} + Δ*X*_{
obs
}If the observed conformational change is dominated by a single mode *m* (Δ*X*_{
obs
}~*V*_{
m
}), is reduced to . Otherwise, can be calculated without assuming the existence of a single dominant mode.

### 4. Alternative scores for assessing dynamic importance of residue positions

For comparison with the ENM-based scores of dynamic importance defined in Eqs 5–7, we have also examined the following alternative scores based on structural analysis or GNM modes:

**a**. Two scores based on changes in the pseudo-bond angles and pseudo-dihedral angles associated with the observed conformational change (Δ*X*_{
obs
}): they are defined as follows

where *δθ*(*i* - l, *i*, *i* + l) is the change in the pseudo-bond angle of three consecutive C_{
α
}atoms *i*-1, *i*, *i*+1; *δφ* (*i*-2, *i*-1, *i*, *i*+1) (*δφ* (*i*-1, *i*, *i*+1, *i*+2)) is the change in the pseudo-dihedral angle of four consecutive C_{
α
}atoms *i*-2, *i*-1, *i*, *i*+1 (*i*-1, *i*, *i*+1, *i*+2).

**b**. Strain energy score: it is defined as the elastic energy stored in the springs that connect residue *i* to its neighbors following the observed conformational change (Δ*X*_{
obs
}):

**c**. GNM-based fluctuation score: it is defined as follows using the lowest two nonzero GNM modes (following [55])

where *V*_{
mi
}is the *i'th* component of the eigenvector of the GNM mode *m*, and *λ*_{
m
}is the corresponding eigenvalue. Note that residues with high values have low mobility and correspond to hinge centers of a protein structure (see [55]).

### 5. Evaluation of predictions of dynamically important residue positions

Since the dynamically important residues are not completely known for the proteins in our lists, we will evaluate our predictions indirectly by assessing the conservation of the predicted key residues. As we proposed previously [63], the dynamically important residue positions are under functional constraints so they must be either conserved in amino acid type/property, or under co-evolution with each other (See [12]). Therefore, they are expected to be more conserved than those residues not under functional constraints. Consequently, the quality of our predictions can be evaluated by the average conservation scores for the predicted key residue positions.

For a given score of dynamic importance (*S*_{
i
}= *δλ*_{m, i}, *δf*_{
i
}, , , , , or ), we rank and select the top 15% key residue positions (the choice of a different percentage between 10% and 20% does not qualitatively change our results).

Next we calculate the average conservation score for *N*_{
key
}= 15%·N key residue positions (), and for all residue positions (). The conservation scores (*CS*) are calculated based on the multiple sequence alignments of homologous protein sequences by ConSurf server (http://consurf.tau.ac.il/, [73]). The lowest (highest) conservation score represents the most (least) conserved residue position.

Then we calculate the following Z score to assess statistically how well the predicted residue positions are conserved than average:

where *σ*_{
rand
}is the squared root of the variation of the average conservation score for *N*_{
key
}randomly chosen residue positions calculated as follows:

where {*X*_{1}, ... } is a randomly selected subset of {*CS*_{1}, ... *CS*_{
N
}}.

The more negative *Z* is, the more significant is the overall conservation of the key residue positions.

The above evaluation is performed on two lists of test cases (see Table S1&S2 in Additional file 1). For each score, the average (⟨Z⟩) and standard deviation (*σ*_{Z}) of the Z scores are calculated for the test cases in both lists. A more negative ⟨Z⟩ is indicative of a better performance

### 6. Generation of a long list of PDB structure pairs

We follow an automated procedure to generate a long list of PDB structure pairs (represented as (pdb1, pdb2)):

**a**. Start from an initial list of 2039 high-resolution protein structures (with sequence identity <30%, resolution <1.6 Å, and R-factor < 0.25) obtained from culledpdb web site http://dunbrack.fccc.edu/;

**b**. Remove protein structures with <100 residues from the initial list;

**c**. For each protein structure (denoted as pdb1) from the initial list, generate a list of homologous structures (denoted as pdb2) that satisfy the following conditions: (i). sequence identity with pdb1 ≥ 90%; (ii). number of residues ≥ 100; (iii). RMSD between pdb1 and pdb2 ≥ 1 Å. Note: A larger list of homologous structures (denoted as pdb2') with only condition (i) satisfied is compiled to define ligand-binding residues of pdb1, which are either in heavy-atom-contact with a hetero atom in pdb1 or mapped sequentially to residues of a pdb2' which are in heavy-atom-contact with a hetero atom in pdb2';

**d**. Select the pdb2 with the maximal RMSD relative to pdb1 (denoted as pdb2_{max}) and add the structure pair (pdb1, pdb2_{max}) to the final list.

The final list contains 502 PDB structure pairs. Among them, 473 have ConSurf scores calculated for ≥ 50% of residue positions in pdb1, which are used for Z score calculations. For details see Table S2 in Additional file 1.

## Results and discussion

To evaluate the performance of various scores for selecting dynamically important residues (*δλ*_{m, i}, *δf*_{
i
}, , , , , and , see Methods), we have assessed the conservation of the predicted residues (using a Z score, see Methods) for both a short and a long list of test cases (see Table S1&S2 in Additional file 1). The two lists contain a broad range of proteins with diverse biochemical functions (including signaling proteins, DNA-binding proteins, motor proteins and enzymes) and structural architectures (single-domain and multi-domain proteins). The results are summarized in Table 1. Then we have analyzed factors (including ENM parameter, properties of protein structures and conformational changes, see Table 2) that may affect the performance of the ENM-based scores (*δλ*_{m, i}, *δf*_{
i
}, ), and their relationships with each other and alternative scores based on structural analysis (, ) and GNM (). Next, we have studied how the predicted key residues correlate with ligand-binding residues and hub residues with high network connectivity, and verified that they form a sparse spatially and sequentially clustered network within protein complexes. Finally, we have illustrated the application of perturbation and correlation protocols to two motor proteins (myosin and kinesin).

For the simplicity of evaluation, we have chosen the dynamically important residues as the top 15% residues ranked by a given score. The use of a small percentage like 15% is in line with the previous findings of a sparse network of allosterically coupled residues in various proteins (for example, the allosteric network consists of only 14% of all residues in GPCR, see [13]). It is conceivable that the percentage of dynamically important residues varies from protein to protein. A more sophisticated cutoff scheme will be developed in our future studies.

### 1. Conservation of dynamically important residues

We first examine the distribution of Z scores for the key residues predicted by the perturbation-based scores (*δλ*_{m, i}, *δf*_{
i
}, , see Figure 1). We have divided all test cases in the long list into three classes based on their Z scores: highly conserved (Z ≥ -2), moderately conserved (-2<Z ≥ -1) and not conserved (-1<Z). Assuming a normal distribution of the average conservation score for *N*_{
key
}randomly chosen residue positions (*N*_{
key
}is the number of key residues), a Z score ≥ -2 corresponds to 98% confidence level, and a Z score ≥ -1 corresponds to 84% confidence level. Using one of the three scores (*δλ*_{m, i}, *δf*_{
i
}, , see Figure 1), the predicted residues in 41–48% (20–26%) of cases are highly (moderately) conserved. Therefore, in most (>67%) of the cases, the dynamically important residues predicted by our perturbation-based scores are either highly or moderately conserved.

We have found large variations in the conservation of dynamically important residues (see Figure 1 and Table 1), which may be attributed to the following causes:

First, it is likely that those key residues co-evolve with each other, which results in weaker conservation of them individually. A detailed analysis of correlated mutations (see [12–14]) is needed to explore this possibility which is beyond the present study.

Second, the accuracy of ENM (with its simplified force field) in predicting the key residues may vary from case to case. We will further explore how the properties of protein structures and conformational changes affect the accuracy in Subsection 3.

Third, some crystallographically observed conformational changes in the long list may not be functionally relevant, therefore the residues involved in these changes are not under functional constraints. The relevance of this factor is supported by the observation that the average Z scores are significantly more negative for the short list (see Table 1) which includes manually selected protein conformational changes whose functional relevance is established in literature.

Judging from the average Z scores (⟨Z⟩), and *δf*_{
i
}perform slightly better than *δλ*_{m, i}(with more negative ⟨Z⟩, see Table 1), which is attributed to the use of more modes than the dominant one in the perturbation and correlation analysis. To show the statistical significance of the above improvement, we perform the following Z score estimation. The standard error of ⟨Z⟩ for the long list is estimated to be (*σ*_{Z} is given in Table 1, 473 is the number of protein structure pairs used for Z score calculations, see Methods). The significance of the difference in ⟨Z⟩ between and *δλ*_{m, i}is assessed by the following Z score:(-1.8 + 1.60)/*σ*_{⟨Z⟩}≈ -2.6 (⟨Z⟩ values are given in Table 1). This large negative Z score indicates that the improvement from *δλ*_{m, i}to is statistically significant. Similarly, we can show that the improvement from *δλ*_{m, i}to *δf*_{
i
}is also statistically significant.

### 2. Optimal performance of perturbation and correlation analysis at a low cutoff distance *R*_{
c
}= 7 Å

To explore how the performance of perturbation-based scores (*δλ*_{m, i}, *δf*_{
i
}, ) depends on ENM parameter (cutoff distance *R*_{
c
}), the Z score calculations are done for a range of *R*_{
c
}values (7 Å ≤ *R*_{
c
}≤ 15 Å) (we do not consider *R*_{
c
}<7 Å because it causes many zero modes to arise due to insufficient connectivity in ENM). For both the short and long list, we have found that the best performance (the most negative ⟨Z⟩) for *δλ*_{m, i}, *δf*_{
i
}and is attained at *R*_{
c
}= 7 Å (see Table 1). In addition, if the C_{
α
}-C_{
α
}distance cutoff at 7 Å is combined with a heavy-atom-contact criterion (namely, two residues are in contact if the minimal distance between their heavy atoms is <4 Å), the performance is further improved (see Table 1).

The above results suggest that the accuracy of ENM-based perturbation and correlation analysis is optimal when the range of C_{
α
}-C_{
α
}interactions in ENM matches the center of the distribution of C_{
α
}-C_{
α
}distances between residues in atomic contacts (4.4 Å ~12.8 Å, see [74]). This is contrary to previous studies that found better ENM performance for fitting crystallographic B factors at relatively high *R*_{
c
}(15 Å ≤ *R*_{
c
}≤ 24 Å) [62]. This is also at odds with our finding that the maximal overlap for the dominant mode is lower at *R*_{
c
}= 7 Å (0.35 ± 0.14 for the long list) than at 8 Å ≤ *R*_{
c
}≤ 11 Å (0.42 ± 0.16 for the long list). Interestingly, the finding of optimality at low *R*_{
c
}is roughly consistent with a recent study that found the optimal descriptions of structural fluctuations and conformational changes by a generalized anisotropic network model at *R*_{
c
}= 8 Å [75].

The above discrepancy may be explained as follows. Intuitively, the use of high *R*_{
c
}tends to suppress local motions (for example, in a dangling loop) that often arise as extra zero modes at low *R*_{
c
}. Therefore, high *R*_{
c
}helps to remove the tip effects (overly flexible pieces of proteins that protrude out of the main globular body, see [76]) and improve the description of collective domain motions observed crystallographically [11]. However, the introduction of additional elastic interactions beyond the range of physical interactions between contacting residues may compromise the accuracy of modeling local perturbations by point mutations, which explains the lower performance as *R*_{
c
}increases from 7 Å to 15 Å (Table 1). Therefore, the optimal choice of ENM parameter is application-dependent. To simulate the dynamic effects of local perturbations to protein structures, it is preferable to use a relatively low *R*_{
c
}, which allows the structural fluctuations and low-frequency modes of ENM to respond more sensitively to residue-position-specific perturbations to local interactions (see Eq. 3 of Methods). Indeed, we have found that the percentage of structural fluctuations contributed by the top 15% key residues (ranked by *δf*_{
i
}) decreases significantly (from 59 ± 10% to 31 ± 18%, for cases in the long list) as *R*_{
c
}increases from 7 Å to 15 Å. Therefore, despite the findings that low-frequency modes are robust to ENM parameters (see [77]), the use of perturbation methods to probe ENM dynamics [63, 64, 53, 78] remains feasible (especially at low *R*_{
c
}). In this study, we will use ENM constructed with *R*_{
c
}= 7 Å and the heavy-atom-contact criterion.

### 3. Dependence of performance on properties of protein structures and conformational changes

We now study how the performance of three perturbation-based scores (, *δf*_{
i
}, *δλ*_{m, i}) depends on four properties of protein structures and conformational changes in the long list. To reduce the statistical noise due to large variations in Z scores (large width in distribution of Z scores, see Figure 1), we have performed the following three-tier averaging: for each property, all test cases are sorted by the property value and then divided equally into three tiers (the bottom 1/3, middle 1/3 and top 1/3 go into the low, medium and high tier, respectively); the average (*σ*_{
Z
}) and the standard deviation (⟨Z⟩) of the Z scores are calculated separately for each tier (see Table 2). The statistical significance of the observed differences in ⟨Z⟩ between tiers can be demonstrated following the same Z score estimation as given at the end of subsection 1.

#### a. Protein size (number of residues in a protein structure)

For all three scores (, *δf*_{
i
}, *δλ*_{m, i}), ⟨Z⟩ decreases significantly from low-size, medium-size to high-size tier. Thus a better performance is expected for larger proteins despite large variations in Z scores. This result agrees with the general understanding that ENM modeling is more suitable for large proteins whose low-frequency dynamics is more dependent on the global shape, and less sensitive to inaccuracy in local interactions.

#### b. RMSD between two PDB structures

The dependence on RMSD differs between three scores – *δλ*_{m, i}has the weakest RMSD-dependence; *δf*_{
i
}performs best in medium-RMSD tier; performs best in high-RMSD tier. Therefore, *δf*_{
i
}and may be used selectively depending on the magnitude of conformational change. It is encouraging that is applicable to large-scale conformational changes beyond thermal fluctuations.

#### c. Overlap of the dominant mode for the observed conformational change

A negative correlation (⟨Z⟩ decreases from low-, medium- to high-overlap tier) is found for all three scores, which is strongest for *δλ*_{m, i}and relatively weak for *δf*_{
i
}and . This is expected because *δλ*_{m, i}relies on the assumption of a single dominant mode while *δf*_{
i
}and do not. Therefore ENM remains valid for describing structural fluctuations and residues involved even if it fails to capture the observed conformational changes by a single dominant mode.

#### d. Collective nature of the observed conformational change

Previous studies have found that ENM performs better in describing protein conformational changes (see Yang 2007) with high collectivity (for definition, see Methods). We have explored the dependence of Z scores on the collectivity of the observed conformational change. Surprisingly, it is found that high/medium collectivity only improves the performance of *δλ*_{m, i}but not or *δf*_{
i
}. Therefore, unlike other ENM assessing metrics (such as the overlap of the dominant mode, see [48]), the accuracy of ENM-based predictions for dynamically important residues (using or *δf*_{
i
}) does not strongly depend on the collectivity of conformational changes.

In complement to the collectivity calculations, we have also determined how the performance depends on whether the observed conformational change can be approximated by rigid-body domain motions. To this end, we have applied DynDom [28] to the observed conformational changes in the long list, which are then divided into two subsets (success/failure) depending on whether DynDom succeeds/fails in dynamic domain partition. Domain partitions were made successfully by DynDom for 135 out of 502 cases. We have found that all three scores perform better if the conformational change involves domain motions. For those cases where domain partition succeeds, performs the best; for those cases where domain partition fails, *δf*_{
i
}performs the best.

In sum, the performance of the three perturbation-based scores depends more on the protein size than the properties of conformational changes (see Table 2). In particular, *δf*_{
i
}and perform well even when the observed conformational change is not collective or not dominated by a single mode. In fact, they outperform *δλ*_{m, i}under those conditions (see Table 2). Therefore the fluctuation-based perturbation and correlation analysis may be applied more broadly than the single-mode-based protocol to cases where many modes are involved in the functional motions.

### 4. Relationship between various scores of dynamic importance

To evaluate how much the key residues predicted by two scores of dynamic importance (denoted as S1 and S2) overlap with each other, we have calculated a correlation factor – it is defined as the enrichment in the probability of finding a key residue predicted by S2 within the set of key residues predicted by S1 relative to that expected if randomly selecting a residue from all residues in a protein. In practice, the correlation factor is calculated as , where *N*_{key 12}is the number of key residues predicted by both S1 and S2, *N*_{key 1}(*N*_{key 2}) is the number of key residues predicted by S1 (S2), and *N* is the total number of residues. A correlation factor >>1 indicates strong correlation between S1 and S2. Based on the statistics of the long list (same below in this subsection), the correlation factor between the key residues predicted by and *δf*_{
i
}(*δλ*_{m, i}) is 3.57 ± 1.13 (3.18 ± 1.39). In addition, 55 ± 17% (49 ± 21%) of the key residues predicted by and *δf*_{
i
}(*δλ*_{m, i}) overlap with each other. Namely, 55 ± 17% (49 ± 21%) of the key residues predicted by are also predicted by *δf*_{
i
}(*δλ*_{m, i}). Therefore, there are strong overlaps between the key residues predicted by the three perturbation-based scores (, *δf*_{
i
}and *δλ*_{m, i}). Many common residues are involved in both the equilibrium fluctuations (in all directions as probed by *δf*_{
i
}) and the large conformational change (in a particular direction as probed by ). Therefore, it may be possible to predict those residues involved in the *slow* conformational change from a known structural state toward an unknown structural state, by simulating the *fast* equilibrium fluctuations near the known state [27]. This useful strategy avoids the need for costly long-time MD simulations.

Compared with the above perturbation-based scores, the residues involved in the observed conformational changes as identified by alternative scores based on energetic or structural analysis (, , , see Methods) are much less conserved (see Table 1). Therefore, we infer that not all residues involved in the observed conformational changes are equally constrained by functions. From the point of view of structural transition, not all changes in local structures/interactions are equally important to the transition – those early-occurring changes are more likely to affect the transition rate than the late-occurring ones (see [60]). It is conceivable that the early structural changes involve intrinsic motions described by the low-frequency normal modes. Therefore, by capturing these early structural changes, the NMA-based perturbation and correlation analysis can predict dynamically important residues more accurately than standard structural and energetic analysis.

Following [55] we have also calculated a GNM-based score for predicting hinge residues with minimal mobility. This score () is based on the calculation of mean-squared fluctuations due to the lowest two GNM modes (see Methods). It is found that the hinge residues (defined as the top 15% residues ranked by ) are significantly conserved (see Table 1), which supports their functional importance [55]. Nevertheless, they do not overlap significantly with the key residues predicted by the perturbation and correlation analysis. Indeed, the correlation factor between the key residues predicted by and is only 1.6 ± 0.95, and only 25 ± 14% of key residues predicted by and overlap with each other. Such a lack of correlation is not unexpected, because these methods are based on different principles: the GNM score depends on the distribution of structural fluctuations, while and *δf*_{
i
}probe how the overall fluctuations are coupled to local perturbations. Therefore, the two methods may complement each other in probing different aspects of protein conformational dynamics.

### 5. Correlation between dynamically important residues and ligand-binding residues

The correlation between conformational dynamics and ligand binding in enzymes has been explored computationally in several recent studies. In one study, Yang and Bahar found that catalytic residues are often co-localized with hinge residues identified by GNM [55]. Another study found that the ability to trigger large changes in conformational distribution makes a good score function for predicting ligand-binding site [52, 53]. One can argue that evolution favors ligand binding to dynamically important regions of proteins to effectively trigger or regulate protein dynamics. Thus these key residues may be constrained by both conformational dynamics and ligand binding affinity.

To explore the correlation between ligand-binding residues and the dynamically important residues predicted by our perturbation and correlation analysis, we have evaluated if these two sets of residues co-localize in protein structures. To enable analysis of a large dataset, the identification of ligand-binding residues is automated using two criteria: first, they are highly conserved (with grade = 9 based on the ConSurf conservation scores, [73]); second, they are in heavy-atom-contact with at least one hetero atom (from the HETATM records of PDB files, excluding waters and hydrogen atoms) in the PDB file of the given structure or a homologous protein structure with ≥ 90% sequence identity. The first criterion allows us to automatically filter out those unconserved residues that bind biologically irrelevant ligands or reagents used for crystallization (see [79]). For each test case of the long list, we have calculated a correlation factor (defined as the enrichment in the probability of finding a ligand-binding residue from the set of key residues relative to that expected if randomly selecting a residue from all residues in a protein). A high correlation factor (>>1) indicates a high level of co-localization between the two residue sets. The average correlation factors for the three perturbation-based scores (, *δf*_{
i
}and *δλ*_{m, i}) are 1.58 ± 1.32, 1.45 ± 1.38 and 1.47 ± 1.30, respectively. This result indicates a weak tendency of co-localization – there is ~1.5 times higher probability for ligands to bind with dynamically important residues than random. The ligand-binding residues only comprise a small fraction of the predicted key residues (the average percentage of key residues which are also ligand-binding is only ~6–7%). Therefore, we infer that the conservation of the predicted key residues is not primarily due to the conservation of ligand-binding residues. Indeed, we have reevaluated the performance of the perturbation-based scores after removing all highly conserved ligand-binding residues, the average Z scores are only slightly reduced (see Table 1). Therefore, most of the predicted key residues are evolutionarily constrained by their roles in conformational dynamics rather than specific interactions with ligands.

### 6. Correlation between dynamically important residues and hub residues

In an ENM, the hub residues with high connectivity (defined as the number of neighboring residues in heavy-atom-contact with a given residue) are likely involved in both structural stability and dynamics [80]. Indeed, we have found a high level of conservation for the hub residues with top 15% connectivity (their Z scores are -3.35 ± 1.42 for cases in the long list). Because the residue-position-specific perturbation (see Eq. 3 of Methods) is applied to the interactions between a given residue and its neighbors, it is natural to expect such perturbation to be larger for residues with more neighbors than those with fewer neighbors. To evaluate the connectivity-dependence of the perturbation-based scores, we have assessed how much the predicted key residues overlap with the hub residues.

The correlation factors for the three perturbation-based scores (, *δf*_{
i
}and *δλ*_{m, i}) are 1.65 ± 0.45, 1.69 ± 0.48 and 1.65 ± 0.50, respectively (statistics of long list). This result indicates a relatively weak overlap between the two sets of residues – there is ~1.7 times higher probability to find dynamically important residues to be hub residues than random. The hub residues only comprise a minor fraction of the predicted key residues (the average percentage of key residues predicted by the three scores which are also hub residues is 34–35% for cases in the long list). Thus the majority (>65%) of the dynamically important residues are not hub residues, whose dynamic importance is not due to their high connectivity in ENM.

### 7. A sparse network formed by dynamically important residues

The three perturbation-based scores (after normalization, see Methods) give the fractional contribution of each residue position to either structural fluctuations (, *δf*_{
i
}) or elastic energy (*δλ*_{m, i}). For cases in the long list, on average 59–70% of the structural fluctuations or elastic energy are contributed by the top 15% key residues predicted by one of the three scores. Therefore the protein dynamics is dominated by a sparse set of key residues, which agrees with similar findings by sequence-based analysis (see [12, 13]).

To further explore how the key residues are distributed in 3D space and along protein sequence, we have calculated a spatial (or sequential) correlation factor defined as the enrichment in the probability of finding two key residues in heavy-atom-contact (or sequentially separated by <10 residues) relative to that expected if these residues are distributed randomly in a protein structure (or sequence). In practice, the spatial correlation factor is calculated as , where *N*_{key, par}(*N*_{
par
}) is the number of key residue pairs (all residue pairs) that are in heavy-atom-contact. For cases in the long list, the spatial correlation factors for the three scores (, *δf*_{
i
}and *δλ*_{m, i}) are 3.69 ± 0.63, 3.61 ± 0.59 and 3.56 ± 0.69, respectively; the sequential correlation factors for the three scores are 2.14 ± 0.67, 1.99 ± 0.58 and 2.08 ± 0.66, respectively. Therefore, the dynamically important residues are clustered both in 3D space and along protein sequence (though to a less extent), which form a strongly coupled network. This finding complements a recent finding that the residues involved in local motions of allosteric proteins are correlated in 3D space and along sequence [24]. The high degree of spatial and sequential connectivity allows these key residues to mediate signal transmissions over long distances between spatially separated functional sites.

### 8. Case studies of dynamically important residues in myosin and kinesin

Finally, we will apply the above methods to the functional conformational changes in myosin and kinesin – two filament-based motor proteins driven by ATP binding and hydrolysis. Myosin has been previously studied by the single-mode-based perturbation analysis [64] and correlation analysis [66]. Here we will focus on a comparison of the predictions of dynamically important residues made by various protocols (, *δf*_{
i
}, *δλ*_{m, i}, , connectivity and DynDom).

#### a. myosin

We have examined the large conformational change (see Figure 2d) from a pre-powerstroke transition-state structure (PDB: 1VOM) to a post-powerstroke nucleotide-free structure (PDB: 2AKA), which pertains to the force generation and hydrolysis product release in myosin [81]. This conformational change involves large rotations of the converter, the upper/lower 50 kDa subdomains (U50/L50) relative to the N-terminal subdomain (see Figure 2d). This collective conformational change is captured by the lowest two ENM modes calculated for 1VOM[66]. The key residues predicted by , *δf*_{
i
}and *δλ*_{m, i}are highly conserved (with Z scores of -8.35, -7.80, -7.70, respectively). The key residues predicted by overlap strongly with those predicted by *δf*_{
i
}(66% identical, see Figure 2a) and *δλ*_{m, i}(74% identical, see Figure 2b). These residues are clustered in 3D space, forming a sparse network that connects the active site to the force-generating component (converter) via two flexible connectors – relay helix and SH1 helix (see Figure 2a, b). Interestingly, a few key residues predicted by *δf*_{
i
}are located near the actin-binding cleft (see Figure 2a), which may allow actin binding to regulate the structural fluctuations of myosin [66].

For comparison, we have also shown the hinge residues predicted by the GNM score [55], the hub residues with high connectivity (see Figure 2c) and the hinge-bending residues identified by DynDom (see Figure 2d). They overlap insignificantly with the key residues predicted by (18% identical for the GNM score, 27% identical for the hub residues, 11% identical for DynDom). Unlike the key residues predicted by the perturbation-based scores, the hub residues are extensively distributed rather than clustered in space, while the hinge residues predicted by the GNM score are concentrated near the active site [55]. Therefore, they do not provide a signaling path that connects the active site to converter.

#### b. kinesin

We have studied the observed conformational change (see Figure 3d) from an ADP-bound KIF1A structure (PDB: 1I5S) to an ATP-analog-bound KIF1A structure (PDB: 1VFW) which pertains to ADP release followed by ATP-binding-induced force generation in kinesin [82, 83]. This conformational change involves an *en block* translation and rotation of the switch II cluster (including *α* 4, *α* 5 helices and switch II at the active site, see Figure 3a, d) and local changes in the switch I region (see Figure 3a, d) [82, 83]. The key residues predicted by , *δf*_{
i
}and *δλ*_{m, i}are highly conserved (with Z scores of -4.48, -2.09, -5.21, respectively). The key residues predicted by overlap moderately with those predicted by *δf*_{
i
}(44% identical, see Figure 3a) and *δλ*_{m, i}(30% identical, see Figure 3b). The divergence among the three perturbation-based scores may be due to the involvement of many modes in the structural fluctuations and the observed conformational changes. However, despite the lack of a dominant mode (the maximal overlap is only 0.23), the perturbation-based scores still predict a highly conserved network of residues that connect the active site to the force-generating neck linker via *α* 4 helix (see Figure 3a, b, d).

For comparison, we have also shown the hinge residues predicted by the GNM score, and the hub residues with high connectivity (see Figure 3c). They overlap moderately with the key residues predicted by (30% identical for the GNM score, 22% identical for the hub residues). Similar to myosin, the hub residues are extensively distributed while the hinge residues predicted by the GNM score are clustered spatially (see Figure 3c).

## Conclusion

In this study, we have performed a large-scale evaluation of the predictions of dynamically important residues by a variety of computational protocols including three based on the perturbation and correlation analysis of ENM, and alternative ones based on structural analysis or GNM. We have found that the two fluctuation-based scores ( and *δf*_{
i
}), by accounting for contributions from many ENM modes, outperform the previously proposed single-mode-based score (*δλ*_{m, i}, see [63–65]) especially in cases where many modes are involved in the conformational changes. The dynamically important residues predicted by the perturbation-based scores (, *δf*_{
i
}, *δλ*_{m, i}) are strongly or moderately conserved, and they form a sparse network of residues which are clustered both in 3D space and along protein sequence. Their overall conservation is attributed to their dynamic role rather than ligand binding or high connectivity in ENM. Future functional studies are needed to dissect the detailed roles of these predicted residues.

As shown by numerous studies (see reviews [49–51]), the coarse-grained ENM captures the essence of collective protein dynamics, which is largely determined by the global shape of and local packing in protein structures. Therefore, ENM provides a simple and adequate framework for exploring, with a residue level of details, how much a perturbation to local packing affects protein structural dynamics. Nevertheless, our perturbation methods are only useful for locating the residue positions of dynamic importance, but not for predicting the functional effects of a particular perturbation, which may require more accurate modeling of energetics and dynamics beyond coarse-grained modeling.

Compared with standard NMA, the major advantage of the fluctuation-based scores ( and *δf*_{
i
}) is their low computational cost (the CPU time for analyzing the long list of >500 pairs of structures is < 25 minutes on a quad-core Xeon workstation). The calculation does not require the solution of all ENM modes which is computationally expensive and memory-demanding for large proteins. Instead, it only requires the inversion of a sparse Hessian matrix (see Eq. 6 and 7 in Methods), which can be done much faster than NMA, thanks to the availability of highly efficient sparse linear equation solver [84]. The sparseness of the Hessian matrix is particularly high for low *R*_{
c
}where the perturbation and correlation analysis is optimal (see Table 1).

The perturbation-based protocols are different from and complementary to the GNM-based technique (see [55]) and various others (see [56] and references therein) that find hinge regions of protein domain motions. The dynamically important residues defined by our perturbation analysis are different from the hinge residues defined based on other criteria (see [56]). Our approach can be applied to protein conformational changes that are not rigid-body domain motions, where inter-domain hinges are not well defined.

The finding of conservation of the key residues involved in the fluctuations toward the direction of a structural transition supports the functional importance of such fluctuations. For a biochemical transition from an apo to a ligand-bound state, our finding supports the proposal that sampling of the ligand-bound-state conformation in the absence of a ligand is essential for the transition (see [4]). The key residues predicted by are likely involved in coordinating such pre-existing sampling, which call for future mutational and functional studies of these residues.

It is increasingly recognized that in many proteins the allosteric effects on function involve changes in dynamics in the absence of detectable structural changes [4, 5, 85]. The ENM-based perturbation and correlation analysis involves local perturbations that do not change the equilibrium structure, so they are well suited for simulating the allosteric effects via dynamic regulation and entropic changes [85]. These methods complement alternative computational techniques that simulate changes in equilibrium structure during biomolecular transitions, including several transition pathway modeling methods [57–61]. Our perturbation methods also complement alternative allostery-modeling techniques, including those based on Markov propagation of information across the protein structure [86–88].

The perturbation methods described here are similar in spirit to the dynamics perturbation analysis algorithm proposed by Ming and Wall [89]. Both are based on the first order perturbation theory, although the mathematic form of the perturbation to Hessian matrix differs.

Besides predicting dynamically important residues, our perturbation methods can also be used to parameterize ENM to fit crystallographic temperature factors, which depend on both the intrinsic dynamics of a protein structure and its crystalline environment (see [90]). We have also explored the use of perturbation and correlation analysis to probe allosteric couplings in F_{1} ATPase [91] and the coupling between normal modes in myosin motor [92].

## References

- 1.
Henzler-Wildman K, Kern D:

**Dynamic personalities ofproteins.***Nature*2007,**450**(2):964–972. - 2.
Gerstein M, Echols N:

**Exploring the range of protein flexibility, from a structural proteomics perspective.***Curr Opin Chem Biol*2004,**8**(2):14–19. - 3.
Yon JM, Perahia D, Ghelis C:

**Conformational dynamics and enzyme activity.***Biochimie*1998,**80**(2):33–42. - 4.
Kern D, Zuiderweg ERP:

**The role of dynamics in allosteric regulation.***Curr Opin Struct Biol*2003,**13**(2):748–757. - 5.
Gunasekaran K, Ma BY, Nussinov R:

**Is allostery anintrinsic property of all dynamic proteins?***Proteins*2004,**57**(2):433–443. - 6.
Changeux JP, Edelstein SJ:

**Allosteric mechanisms of signaltransduction.***Science*2005,**308**(2):1424–1428. - 7.
Swain JF, Gierasch LM:

**The changing landscape of protein allostery.***Curr Opin Struct Biol*2006,**16**(2):102–108. - 8.
Cui Q, Karplus M:

**Allostery and cooperativity revisited.***Protein Sci*2008,**17**(2):1295–1307. - 9.
Goodey NM, Benkovic SJ:

**Allosteric regulation andcatalysis emerge via a common route.***Nat Chem Biol*2008,**4**(2):474–482. - 10.
Rousseau F, Schymkowitz J:

**A systems biology perspective on protein structural dynamics and signal transduction.***Curr Opin Struct Biol*2005,**15**(2):23–30. - 11.
Gerstein M, Lesk AM, Chothia C:

**Structural Mechanisms for Domain Movements in Proteins.***Biochemistry*1994,**33**(2):6739–6749. - 12.
Lockless SW, Ranganathan R:

**Evolutionarily conserved pathways of energetic connectivity in protein families.***Science*1999,**286**(2):295–299. - 13.
Suel GM, Lockless SW, Wall MA, Ranganathan R:

**Evolutionarily conserved networks of residues mediate allosteric communication in proteins (vol 10, pg 59, 2003).***Nat Struct Biol*2003,**10**(2):232–232. - 14.
Kass I, Horovitz A:

**Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations.***Proteins*2002,**48**(2):611–617. - 15.
Bertrand D, Gopalakrishnan M:

**Allosteric modulation of nicotinic acetylcholine receptors.***Biochem Pharmacol*2007,**74**(2):1155–1163. - 16.
Raddatz R, Schaffhauser H, Marino MJ:

**Allosteric approaches to the targeting of G-protein-coupled receptors for novel drug discovery: A critical assessment.***Biochem Pharmacol*2007,**74**(2):383–391. - 17.
Ross RA:

**Allosterism and cannabinoid CB1 receptors: the shape of things to come.***Trends Pharmacol Sci*2007,**28**(2):567–572. - 18.
Shi ZS, Resing KA, Ahn NG:

**Networks for the allosteric control of protein kinases.***Curr Opin Struct Biol*2006,**16**(2):686–692. - 19.
Dueber JE, Mirsky EA, Lim WA:

**Engineering synthetic signaling proteins with ultrasensitive input/output control.***Nat Biotechnol*2007,**25**(2):660–662. - 20.
Dueber JE, Yeh BJ, Chak K, Lim WA:

**Reprogramming control of an allosteric signaling switch through modular recombination.***Science*2003,**301**(2):1904–1908. - 21.
Kay LE:

**NMR studies of protein structure and dynamics.***J Magn Reson*2005,**173**(2):193–207. - 22.
Boehr DD, Dyson HJ, Wright PE:

**An NMR perspective on enzyme dynamics.***Chem Rev*2006,**106**(2):3055–3079. - 23.
Schotte F, Lim MH, Jackson TA, Smirnov AV, Soman J, Olson JS, Phillips GN, Wulff M, Anfinrud PA:

**Watching a protein as it functions with 150-ps time-resolved X-ray crystallography.***Science*2003,**300**(2):1944–1947. - 24.
Daily MD, Gray JJ:

**Local motions in a benchmark of allosteric proteins.***Proteins*2007,**67**(2):385–399. - 25.
Najmanovich R, Kuttner J, Sobolev V, Edelman M:

**Side-chain flexibility in proteins upon ligand binding.***Proteins*2000,**39**(2):261–268. - 26.
Betts MJ, Sternberg MJ:

**An analysis of conformational changes on protein-protein association: implications for predictive docking.***Protein Eng*1999,**12**(2):271–283. - 27.
Henzler-Wildman KA, Lei M, Thai V, Kerns SJ, Karplus M, Kern D:

**A hierarchy of timescales in protein dynamics is linked to enzyme catalysis.***Nature*2007,**450**(2):913-U927. - 28.
Hayward S, Lee RA:

**Improvements in the analysis of domain motions in proteins from conformational change: DynDom version 1.50.***J Mol Graph Model*2002,**21**(2):181–183. - 29.
Wriggers W, Schulten K:

**Protein domain movements: Detection of rigid domains and visualization of hinges in comparisons of atomic coordinates.***Proteins*1997,**29**(2):1–14. - 30.
Jacobs DJ, Rader AJ, Kuhn LA, Thorpe MF:

**Protein flexibility predictions using graph theory.***Proteins*2001,**44:**150–165. - 31.
Karplus M, McCammon JA:

**Molecular dynamics simulations of biomolecules (vol 9, pg 646, 2002).***Nat Struct Biol*2002,**9**(2):788–788. - 32.
Ota N, Agard DA:

**Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion.***J Mol Biol*2005,**351**(2):345–354. - 33.
Schlitter J, Engels M, Kruger P:

**Targeted Molecular-Dynamics – a New Approach for Searching Pathways of Conformational Transitions.***J Mol Graph*1994,**12**(2):84–89. - 34.
Paci E, Karplus M:

**Forced unfolding of fibronectin type 3 modules: An analysis by biased molecular dynamics simulations.***J Mol Biol*1999,**288**(2):441–459. - 35.
Ma JP, Karplus M:

**Molecular switch in signal transduction: Reaction paths of the conformational changes in ras p21.***Proc Natl Acad Sci USA*1997,**94**(2):11905–11910. - 36.
Kong YF, Karplus M:

**Signaling pathways of PDZ2 domain: A molecular dynamics interaction correlation analysis.***Proteins*2009,**74**(2):145–154. - 37.
Tozzini V:

**Coarse-grained models for proteins.***Curr Opin Struct Biol*2005,**15**(2):144–150. - 38.
Ueda Y, Taketomi H, Go N:

**Studies on Protein Folding, Unfolding, and Fluctuations by Computer-Simulation .2. 3-Dimensional Lattice Model of Lysozyme.***Biopolymers*1978,**17**(2):1531–1548. - 39.
Koga N, Takada S:

**Folding-based molecular simulations reveal mechanisms of the rotary motor F-1-ATPase.***Proc Natl Acad Sci USA*2006,**103**(2):5367–5372. - 40.
Hilser VJ, Garcia-Moreno B, Oas TG, Kapp G, Whitten ST:

**A statistical thermodynamic model of the protein ensemble.***Chem Rev*2006,**106**(2):1545–1558. - 41.
Hinsen K:

**Analysis of domain motions by approximate normal mode calculations.***Proteins*1998,**33**(2):417–429. - 42.
Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I:

**Anisotropy of fluctuation dynamics of proteins with an elastic network model.***Biophys J*2001,**80**(2):505–515. - 43.
Tama F, Sanejouand YH:

**Conformational change of proteins arising from normal mode calculations.***Protein Eng*2001,**14**(2):1–6. - 44.
Haliloglu T, Bahar I, Erman B:

**Gaussian dynamics of folded proteins.***Phys Rev Lett*1997,**79**(2):3090–3093. - 45.
Bahar I, Atilgan AR, Erman B:

**Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential.***Folding & Design*1997,**2**(2):173–181. - 46.
Tirion MM:

**Large Amplitude Elastic Motions in Proteins from a Single-Parameter, Atomic Analysis.***Phys Rev Lett*1996,**77**(2):1905–1908. - 47.
Krebs WG, Alexandrov V, Wilson CA, Echols N, Yu HY, Gerstein M:

**Normal mode analysis of macromolecular motions in a database framework: Developing mode concentration as a useful classifying statistic.***Proteins*2002,**48**(2):682–695. - 48.
Yang L, Song G, Jernigan RL:

**How well can we understand large-scale protein motions using normal modes of elastic network models?***Biophys J*2007,**93**(2):920–929. - 49.
Ma JP:

**Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes.***Structure*2005,**13**(2):373–380. - 50.
Bahar I, Rader AJ:

**Coarse-grained normal mode analysis in structural biology.***Curr Opin Struct Biol*2005,**15**(2):586–592. - 51.
Tama F, Brooks CL:

**Symmetry, form, and shape: Guiding principles for robustness in macromolecular machines.***Annu Rev Biophys Biomol Struct*2006,**35:**115–133. - 52.
Ming DM, Wall ME:

**Quantifying allosteric effects in proteins.***Proteins*2005,**59**(2):697–707. - 53.
Ming DM, Wall ME:

**Interactions in native binding sites cause a large change in protein dynamics.***J Mol Biol*2006,**358**(2):213–223. - 54.
Xu CY, Tobi D, Bahar I:

**Allosteric changes in protein structure computed by a simple mechanical model: Hemoglobin T <-> R2 transition.***J Mol Biol*2003,**333**(2):153–168. - 55.
Yang LW, Bahar I:

**Coupling between catalytic site and collective dynamics: A requirement for mechanochemical activity of enzymes.***Structure*2005,**13**(2):893–904. - 56.
Flores SC, Keating KS, Painter J, Morcos F, Nguyen K, Merritt EA, Kuhn LA, Gerstein MB:

**HingeMaster: normal mode hinge prediction approach and integration of complementary predictors.***Proteins*2008,**73**(2):299–319. - 57.
Kim MK, Chirikjian GS, Jernigan RL:

**Elastic models of conformational transitions in macromolecules.***J Mol Graph Model*2002,**21**(2):151–160. - 58.
Miyashita O, Onuchic JN, Wolynes PG:

**Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins.***Proc Natl Acad Sci USA*2003,**100**(2):12570–12575. - 59.
Maragakis P, Karplus M:

**Large amplitude conformational change in proteins explored with a plastic network model: Adenylate kinase.***J Mol Biol*2005,**352**(2):807–822. - 60.
Zheng WJ, Brooks BR, Hummer G:

**Protein conformational transitions explored by mixed elastic network models.***Proteins*2007,**69**(2):43–57. - 61.
Franklin J, Koehl P, Doniach S, Delarue M:

**MinActionPath: maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape.***Nucleic Acids Res*2007,**35:**W477–482. - 62.
Eyal E, Yang LW, Bahar I:

**Anisotropic network model: systematic evaluation and a new web interface.***Bioinformatics*2006,**22**(2):2619–2627. - 63.
Zheng WJ, Brooks BR, Doniach S, Thirumalai D:

**Network of dynamically important residues in the open/closed transition in polymerases is strongly conserved.***Structure*2005,**13**(2):565–577. - 64.
Zheng WJ, Brooks BR, Thirumalai D:

**Low-frequency normal modes that describe allosteric transitions in biological nanomachines are robust to sequence variations.***Proc Natl Acad Sci USA*2006,**103**(2):7664–7669. - 65.
Zheng W, Brooks BR, Thirumalai D:

**Allosteric transitions in the chaperonin GroEL are captured by a dominant normal mode that is most robust to sequence variations.***Biophys J*2007,**93**(2):2289–2299. - 66.
Zheng W, Brooks B:

**Identification of dynamical correlations within the myosin motor domain by the normal mode analysis of an elastic network model.***J Mol Biol*2005,**346**(2):745–759. - 67.
Zheng W, Liao JC, Brooks BR, Doniach S:

**Toward the mechanism of dynamical couplings and translocation in hepatitis C virus NS3 helicase using elastic network model.***Proteins*2007,**67**(2):886–896. - 68.
Zheng WJ, Brooks BR:

**Normal-modes-based prediction of protein conformational changes guided by distance constraints.***Biophys J*2005,**88**(2):3109–3117. - 69.
Zheng WJ, Brooks BR:

**Modeling protein conformational changes by iterative fitting of distance constraints using reoriented normal modes.***Biophys J*2006,**90**(2):4327–4336. - 70.
Wako H, Kato M, Endo S:

**ProMode: a database of normal mode analyses on protein molecules with a full-atom model.***Bioinformatics*2004,**20**(2):2035–2043. - 71.
Echols N, Milburn D, Gerstein M:

**MolMovDB analysis and visualization of conformational change and structural flexibility.***Nucleic Acids Res*2003,**31**(2):478–82. - 72.
Yang LW, Liu X, Jursa CJ, Holliman M, Rader A, Karimi HA, Bahar I:

**iGNM: a database of protein functional motions based on Gaussian Network Model.***Bioinformatics*2005,**21**(2):2978–2987. - 73.
Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, Pupko T, Ben-Tal N:

**ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures.***Nucleic Acids Res*2005,**33:**W299-W302. - 74.
Cieplak M, Hoang TX:

**Universality classes in folding times of proteins.***Biophys J*2003,**84**(2):475–488. - 75.
Zheng WJ:

**A unification of the elastic network model and the Gaussian network model for optimal description of protein conformational motions and fluctuations.***Biophys J*2008,**94**(2):3853–3857. - 76.
Lu MY, Poon B, Ma JP:

**A new method for coarse-grained elastic normal-mode analysis.***J Chem Theor Comput*2006,**2**(2):464–471. - 77.
Kondrashov DA, Van Wynsberghe AW, Bannen RM, Cui Q, Phillips GN:

**Protein structural variation in computational models and crystallographic data.***Structure*2007,**15**(2):169–177. - 78.
Taly A, Corringer PJ, Grutter T, de Carvalho LP, Karplus M, Changeux JP:

**Implications of the quaternary twist allosteric model for the physiology and pathology of nicotinic acetylcholine receptors.***Proc Natl Acad Sci USA*2006,**103**(2):16965–16970. - 79.
Hu LG, Benson ML, Smith RD, Lerner MG, Carlson HA:

**Binding MOAD (Mother of All Databases).***Proteins*2005,**60**(2):333–340. - 80.
Bode C, Kovacs IA, Szalay MS, Palotai R, Korcsmaros T, Csermely P:

**Network analysis of protein dynamics.***FEBS Lett*2007,**581**(2):2776–2782. - 81.
Reubold TF, Eschenburg S, Becker A, Kull FJ, Manstein DJ:

**A structural model for actin-induced nucleotide release in myosin.***Nat Struct Biol*2003,**10**(2):826–830. - 82.
Kikkawa M, Sablin EP, Okada Y, Yajima H, Fletterick RJ, Hirokawa N:

**Switch-based mechanism of kinesin motors.***Nature*2001,**411**(2):439–445. - 83.
Nitta R, Kikkawa M, Okada Y, Hirokawa N:

**KIF1A alternately uses two loops to bind microtubules.***Science*2004,**305**(2):678–683. - 84.
Chen Y, Davis TA, Hager WW, Rajamanickam S:

**Algorithm 8xx CHOLMOD, supernodal sparse Cholesky factorization and update/downdate.***ACM Trans Math Softw*2008,**35:**22. - 85.
Tsai CJ, del Sol A, Nussinov R:

**Allostery: Absence of a change in shape does not imply that allostery is not at play.***J Mol Biol*2008,**378**(2):1–11. - 86.
Chennubhotla C, Bahar I:

**Markov propagation of allosteric effects in biomolecular systems: application to GroEL-GroES.***Mol Syst Biol*2006,**2:**36. - 87.
Chennubhotla C, Bahar I:

**Signal propagation in proteins and relation to equilibrium fluctuations.***PLoS Comput Biol*2007,**3**(2):1716–26. - 88.
Lu H, Liang J:

**Perturbation-based Markovian TransmissionModel for Macromolecular Machinery in Cell.***Engineering in Medicine and Biology Society, 29th Annual International Conference of the IEEE*2007, 5029–5034. - 89.
Ming D, Cohn JD, Wall ME:

**Fast dynamics perturbation analysis for prediction of protein functional sites.***BMC Struct Biol*2008,**8:**5. - 90.
Riccardi D, Cui Q, Phillips GN Jr:

**Application of elastic network models to proteins in the crystalline state.***Biophys J*2009,**96**(2):464–75. - 91.
Zheng W:

**Normal mode based modeling of allosteric couplings that underlie cyclic conformational transition in F**_{ 1 }**ATPase.***Proteins*2009,**76**(2):747–762. - 92.
Zheng W, Thirumalai D:

**Coupling between normal modes drives protein conformational dynamics: illustrations using allosteric transitions in myosin II.***Biophys J*2009,**96**(2):2128–2137.

## Acknowledgements

This study is supported by the funding from SUNY Buffalo and a grant from AHA (0835292N).

## Author information

### Affiliations

### Corresponding author

## Additional information

### Authors' contributions

WZ conceived of the study, implemented and tested the methods, and drafted the paper. MT provided assistance with webserver applications and references preparation. Both authors read and approved the final manuscript.

## Electronic supplementary material

**Supplementary tables**

Additional file 1: . The data provided offer detailed information for the two lists of test cases (Tables S1&S2) and the predicted key residues in myosin and kinesin (Tables S3&S4). (DOC 90 KB)

## Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Zheng, W., Tekpinar, M. Large-scale evaluation of dynamically important residues in proteins predicted by the perturbation analysis of a coarse-grained elastic model.
*BMC Struct Biol* **9, **45 (2009). https://doi.org/10.1186/1472-6807-9-45

Received:

Accepted:

Published:

### Keywords

- Root Mean Square Deviation
- Normal Mode Analysis
- Important Residue
- Conformational Dynamic
- Residue Position