A protein molecule is not a rigid body. Protein motions span a broad range of scales, from local fluctuations such as those seen in loop regions to global motions involving changes in the relative position of rigid domains. Flexible regions and linkers connecting rigid regions are often observed in large proteins. Such flexible regions are often necessary for proteins to perform their specific biological functions [1–4], e.g. by enabling proteins to adjust their conformations in response to external stimuli, such as the binding of a ligand or a change in the surrounding environment. Structural flexibility is therefore an important characteristic that must be examined to understand proteins.
When we examine the motions of a protein backbone segment in ordered structures, the movement can theoretically be classified into two forms: internal and external motion. The former is a deformation of the segment itself, whereas the latter involves only translational and rotational motions of the segment. In external motion, the segment fluctuates as a rigid body through changes in the dihedral angles of the flanking residues. For this reason, internal and external motions are considered fundamentally different (Additional file 1: Figure S1). Distinguishing between these motions is expected to provide new insights into the relation between structural flexibility and protein function.
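The distinction between the two forms of motion can be illustrated numerically. The following is a minimal sketch, not the definition used in this work: it assumes two conformations of the same segment are available as N x 3 coordinate arrays, fits the best rigid-body transformation with the Kabsch algorithm, and treats the fitted rigid displacement as external motion and the residual as internal motion. The function names `kabsch_rotation`, `decompose_motion`, and `rms_displacement` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def kabsch_rotation(P, Q):
    """Rotation R (3x3) minimizing |P @ R.T - Q| for centered N x 3 arrays."""
    H = P.T @ Q                      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def decompose_motion(ref, moved):
    """Split per-atom displacements into rigid-body and residual parts.

    Assumption for illustration: 'external' is the displacement produced by
    the best-fit rigid-body move of `ref` onto `moved`; 'internal' is what
    remains after that superposition.
    """
    ref = np.asarray(ref, dtype=float)
    moved = np.asarray(moved, dtype=float)
    ref_center = ref.mean(axis=0)
    moved_center = moved.mean(axis=0)
    R = kabsch_rotation(ref - ref_center, moved - moved_center)
    rigid = (ref - ref_center) @ R.T + moved_center  # ref moved as a rigid body
    external = rigid - ref      # translation + rotation of the segment
    internal = moved - rigid    # deformation of the segment itself
    return external, internal

def rms_displacement(disp):
    """Root-mean-square length of a set of displacement vectors."""
    return float(np.sqrt((np.asarray(disp) ** 2).sum(axis=1).mean()))

# Example: a short five-atom trace that is rotated and translated rigidly.
segment = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.5, 0.0],
                    [4.5, 0.5, 1.0], [6.0, 1.0, 1.5]])
angle = 0.3
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
shifted = segment @ rot_z.T + np.array([1.0, 2.0, 3.0])
external, internal = decompose_motion(segment, shifted)
print(rms_displacement(external), rms_displacement(internal))
```

For a purely rigid move such as the one in the example, the internal component vanishes up to rounding error; any deformation of the segment shows up in the residual.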
Nuclear magnetic resonance (NMR) is a powerful experimental technique for analyzing protein dynamics at the atomic and molecular levels. In particular, NOE and relaxation experiments provide information on picosecond-microsecond motions of the backbone atoms [8–10]. Model-free analysis enables quantitative determination of fluctuations and of slow conformational change (i.e. on the millisecond order) of the backbone amide vector [11, 12]. The latter motion is assumed to be related to the internal motion described above. Although NMR provides a detailed view of protein dynamics, it is time-consuming.
In contrast, computational methods can estimate the dynamics of any protein whose structure is available. One approach is to compare structures of a protein crystallized under different conditions, or different conformers in an NMR ensemble; the structural differences reveal flexible regions [13–15]. Another approach is to simulate protein dynamics. Among simulation methods, Normal Mode Analysis (NMA) provides a straightforward means of calculating the dynamics of a protein from its structure. NMA is less CPU-intensive than other computer simulation methods such as Molecular Dynamics (MD), Monte Carlo (MC) simulation, and the Framework Rigidity Optimized Dynamics Algorithm (FRODA)/Floppy Inclusions and Rigid Substructure Topography (FIRST) software [16, 17], yet it can detect concerted motions of clusters of atoms and supports discussion of how those motions relate to function [18–21]. Using NMA results, Nishikawa and Go examined internal and external motions of secondary structure, and Ishida et al. studied a subtilisin-eglin c complex to explore internal and external motions of enzymes and inhibitors.
With the increasing number of available protein structures and the development of high-performance computers, databases of protein dynamics have been constructed. For example, iGNM and ProMode are databases of protein motions analyzed using a Gaussian Network Model (GNM) and NMA, respectively. Another database, MolMovDB, presents numerous graphical representations, including motions of loops, domains, and subunits. In addition, DynDom provides domains, hinge axes, and hinge-bending residues in proteins, determined from two different conformations of the same protein.
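To make the idea of a Gaussian Network Model concrete, the following minimal sketch computes relative residue fluctuations from C-alpha coordinates alone. It follows the standard GNM formulation (a contact-based Kirchhoff matrix whose nonzero modes are summed with weights of inverse eigenvalue); it is not taken from iGNM or ProMode. The function name `gnm_fluctuations`, the 7.0 Å default cutoff, and the eigenvalue threshold are assumptions made for this illustration.

```python
import numpy as np

def gnm_fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuation of each node in a Gaussian Network Model.

    coords : N x 3 array of node positions (e.g. C-alpha atoms), in angstroms.
    cutoff : distance below which two nodes are connected (assumed value).
    Assumes the contact network is connected (exactly one zero mode).
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(n, dtype=bool)

    # Kirchhoff matrix: -1 for each contact, contact count on the diagonal.
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))

    # Sum over nonzero modes; the zero mode (rigid translation) is dropped.
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    keep = eigvals > 1e-8
    return (eigvecs[:, keep] ** 2 / eigvals[keep]).sum(axis=1)

# Example: a straight five-node chain in which only neighbors are in contact.
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
print(gnm_fluctuations(chain, cutoff=4.0))
```

In this toy chain the terminal nodes have fewer contacts and therefore fluctuate more than the central node, which is the qualitative behavior expected of chain termini and loosely packed regions.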
Recently, web-based tools for predicting internal motion have been developed. For example, FlexOracle and HingeProt predict hinge regions in proteins. Furthermore, DFprot predicts main-chain deformability, which corresponds fundamentally to the internal motion described above.
However, all these tools, like NMA itself, are of limited practical use because they require the three-dimensional structure of a protein. Recently, a few prediction techniques that address protein motions using only amino acid sequence information have been proposed. Of those, ASP and the Protein Continuum Secondary Structure Predictor identify conformational switches in proteins using secondary structure information. FlexPred also predicts ordered conformational change in the protein backbone using information on sequence neighbors, evolutionary conservation, and solvent accessibility [31, 32]. These definitions of protein motion are similar to that of the internal motion described above. The support vector machine-based predictor Wiggle predicts functionally flexible regions defined using a coarse-grained protein dynamics modeling approach.
In this paper, we present a novel method for predicting internal and external motions in ordered structures. The proposed method is based on the Random Forest (RF) algorithm and uses information associated with adjacent paired amino acid residues and predicted secondary structure. Its advantage is that it predicts protein motions using the amino acid sequence alone as input. The predicted internal and external motions correlate moderately with those calculated by NMA: the respective correlation coefficients are 0.525 and 0.597. To investigate whether the proposed method detects flexible regions related to protein function, we applied it to 20 proteins that undergo large conformational change upon protein-protein interaction. In 85% of the proteins studied, the predicted region of high internal motion overlapped the observed region of conformational change.
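As a rough illustration of what a sequence-only input built from adjacent residue pairs might look like, and of how agreement with NMA-derived values can be scored, consider the sketch below. The exact feature design of the proposed method is not given in this section, so the windowed pair-count encoding, the `half_window` parameter, and the function names `pair_features` and `pearson` are all assumptions made for illustration; predicted secondary structure is omitted entirely.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Index of each ordered residue pair, e.g. "AC" -> 1 (400 pairs in total).
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}

def pair_features(sequence, half_window=4):
    """Count adjacent residue pairs inside a window centered on each residue.

    Hypothetical encoding: row i holds the counts of every ordered pair
    (sequence[j], sequence[j + 1]) lying fully inside the window around i.
    """
    n = len(sequence)
    feats = np.zeros((n, len(PAIR_INDEX)))
    for i in range(n):
        start = max(0, i - half_window)
        stop = min(n, i + half_window + 1)
        for j in range(start, stop - 1):
            pair = sequence[j:j + 2]
            if pair in PAIR_INDEX:          # skip non-standard residues
                feats[i, PAIR_INDEX[pair]] += 1
    return feats

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Example: one feature row per residue of a short made-up sequence.
features = pair_features("MKTAYIAKQR", half_window=2)
print(features.shape)
```

A feature matrix of this kind could be passed, together with per-residue target values calculated by NMA, to any Random Forest regressor (for instance `RandomForestRegressor` in scikit-learn), and `pearson` could then compare the predicted and calculated values residue by residue.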