Universal partitioning of the hierarchical fold network of 50-residue segments in proteins
© Ito et al; licensee BioMed Central Ltd. 2009
Received: 07 October 2008
Accepted: 20 May 2009
Published: 20 May 2009
Several studies have demonstrated that protein fold space is structured hierarchically and that power-law statistics hold for the relation between the numbers of protein families and protein folds (or superfamilies). We examined the internal structure and statistics of the fold space of 50 amino-acid residue segments taken from various protein folds. We used inter-residue contact patterns to measure the tertiary structural similarity among segments. Using this similarity measure, the segments were classified into a number (Kc) of clusters. We examined various Kc values for the clustering. The spatial resolution with which segment tertiary structures are differentiated increases with increasing Kc. Furthermore, we constructed networks by linking structurally similar clusters.
The network was partitioned persistently into four regions for Kc ≥ 1000. This main partitioning is consistent with results of earlier studies, where similar partitioning was reported in classifying protein domain structures. Furthermore, the network was partitioned naturally into several dozen sub-networks (i.e., communities): clusters within a sub-network were mutually connected by numerous links, whereas clusters in different sub-networks were connected by only a few links. For Kc ≥ 1000, there were about 40 major sub-networks, and the contents of the major sub-networks were conserved. This sub-partitioning is a novel finding, suggesting that the network is structured hierarchically: segments construct a cluster, clusters form a sub-network, and sub-networks constitute a region. Additionally, the network was characterized by non-power-law statistics, which is also a novel finding.
Main findings are: (1) The universe of 50-residue segments found here was characterized by non-power-law statistics. Therefore, the universe differs from those reported so far for protein domains. (2) The 50-residue segments were partitioned persistently and universally into some dozens (ca. 40) of major sub-networks, irrespective of the number of clusters. (3) These major sub-networks encompassed 90% of all segments. Consequently, protein tertiary structure is constructed from these dozens of elements (sub-networks).
Despite the vast number of amino-acid sequences, protein folds (or superfamilies) are quantitatively limited [1–4]. Consequently, protein fold classification is an important subject for elucidating the construction of protein tertiary structures. A key word to characterize protein folds is "hierarchy". Well-known databases – SCOP  and CATH  – have classified the tertiary structures of protein domains hierarchically. Similarly, a tree diagram was produced to classify the folds .
Mapping the tertiary structures of full-length protein domains to a conformational space, a structure distribution is generated: a so-called protein fold universe [8–11]. A key word to characterize the fold universe is "space partitioning". A two-dimensional (2D) representation of the fold universe was proposed in earlier reports [12, 13], where the universe was partitioned into three fold (α, β, and α/β) regions. A three-dimensional (3D) fold universe was partitioned into four fold regions: all-α, all-β, α/β, and α+β . Software that is accessible on a web site, PDBj http://eprots.protein.osaka-u.ac.jp/globe.cgi, serves the distribution on a global surface .
The structures of short protein segments have also been studied: Segments of a few (2–3) amino-acid residues long were projected in a two-dimensional (2D) space, where some typical combinations frequently appeared . Fold universes of segments of 4–9 residues long  and 10–20 residues long [17–19] showed several clearly distinguishable structural clusters. A systematic survey for 10–50 residue segments has shown that the fold universe is classifiable into segment universes of three types: short (10–22 residues), medium (23–26 residues), and long (27–50 residues) . In this work, the 3D shape of the universe varied abruptly at 23 and 27 residues long. A sequence-structure correlation found in short segments supports the tertiary structure prediction of full-length proteins [21–23].
These studies of protein segments and domains exemplify some structural clusters existing in the low-dimensional (2D or 3D) conformational space. The benefit of the low-dimensional expression is that one can readily imagine the shape of the universe. With increasing segment length, however, the lowering of the space dimensionality hides the internal architecture of the structure distribution. Consequently, the internal architecture of the distribution for 50-residue segments (or longer segments) is unclear. To supplement the low-dimensional expression with full-dimensional information, a network is helpful in which two structures close to each other in the full-dimensional conformational space are connected.
Presume an ensemble of points (or nodes). Inter-node linkages form the networks. The network concept has been applied recently to biological systems [24–27]. Structurally similar segments can be linked for the segment fold universe. The structural similarity is computed for the overall structures of two segments (i.e., all coordinates of the segments). Therefore, the similarity is a quantity defined in full-dimensional space. Consequently, a 2D or 3D universe consisting of linked nodes involves full-dimensional information. To assign inter-node linkage in the ensemble, a score is important to quantify the structural similarity between two tertiary structures. Inter-residue contact (native contact) patterns have been used as reaction coordinates in protein folding studies [28–30]. When two structures have similar native contact patterns, they exhibit similar inter-residue packing. Results of several studies indicate that the native contacts are useful indicators to assess the protein folding process [31–43] and folding time scale [41–43].
Herein, we constructed a fold network of 50-residue segments taken from four major structural classes of protein domains. We used the inter-residue contact pattern for the similarity score. The resultant networks showed the main partitioning, as expected. Furthermore, as a new finding, the network of the segment structures was partitioned into dozens of universal communities (sub-networks). From these observations, we propose a novel protein structure hierarchy with community sites at a hierarchy level. The novelty of the currently identified hierarchy was ensured by non-power-law statistics in the hierarchy, which differs from power-law statistics characterizing other hierarchies ever found for protein tertiary structures.
As described in Methods, 50-residue segments were taken from representative proteins and classified into Kc clusters, each of which consists of structurally similar segments. We calculated the native contact patterns that are common in each cluster, and constructed networks by connecting the clusters according to their contact pattern similarity. In Results, we first examine the general aspects of the obtained clusters. Second, we check the conformational distribution using a 3D map. Finally, we analyze the characterization of 50-residue segment universe using a network analysis.
As described in this paper, indices i and j are used for specifying residue positions in a 50-residue segment, s and t for segment ordinal numbers, u and v for cluster ordinal numbers, and w for a community ordinal number.
General aspects for clusters
The segments were generated by sliding a 50-residue window one residue at a time along the domain sequences (see Methods). Consequently, two segments taken from the same protein domain that are adjacent in the sequence might have similar structures and might therefore fall into the same cluster. We did the following analysis to assess this possibility quantitatively. Presume that a cluster u involves n_m segments originating from a protein m. We then introduced the quantity $O_u = \frac{1}{N_p} \sum_m n_m$, where the summation is taken over proteins that supply segment(s) to the cluster u, and Np is the number of those proteins. Figure 1B presents a plot of the average of O_u as a function of Kc: $\langle O \rangle = \frac{1}{K_c} \sum_u O_u$. For Kc = 1000, <O> converged to 2.2. Consequently, a protein supplies only two or three segments to a cluster on average: i.e., a cluster does not contain excessive segments derived from a single protein for Kc ≥ 1000.
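The redundancy check described above can be sketched in a few lines. This is only an illustration under stated assumptions: O_u is taken to be the mean number of segments contributed per protein, <O> is taken to be the unweighted average of O_u over clusters, and the function names and toy protein labels are hypothetical.

```python
from collections import Counter

def mean_segments_per_protein(protein_ids):
    """O_u for one cluster (assumed form: (1/N_p) * sum over proteins of n_m).

    protein_ids holds one source-protein label per segment in the cluster.
    """
    counts = Counter(protein_ids)            # n_m for each contributing protein m
    return sum(counts.values()) / len(counts)

def average_over_clusters(clusters):
    """<O>: unweighted average of O_u over all clusters (an assumption)."""
    return sum(mean_segments_per_protein(c) for c in clusters) / len(clusters)

# Hypothetical toy data: two clusters, each segment labelled by its protein.
toy_clusters = [
    ["p1", "p1", "p1", "p2"],   # n_p1 = 3, n_p2 = 1, so O_u = 2.0
    ["p3", "p4", "p4"],         # n_p3 = 1, n_p4 = 2, so O_u = 1.5
]
print(average_over_clusters(toy_clusters))  # 1.75
```

A value of <O> near 2 means that, on average, each contributing protein places about two of its overlapping windows in the same cluster.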
Fold universe and network of clusters
The inter-cluster (inter-node) links were assigned to the Kc clusters according to the adjacency matrix a_uv. Directly connected clusters have mutually similar inter-residue contact patterns. Internal architectures of the networks were investigated by dividing the networks into communities (sub-networks) using Newman's method. In parallel, we projected the networks into a 3D space to obtain positions in the conformational space (see Additional file 1 for details). Although the clusters were embedded in the 3D space, the inter-cluster links were given to clusters that are mutually close in the full-dimensional space.
Each community was characterized by five biophysical structural features: the α, β, αβ secondary-structure elements, the radius of gyration, and the number of inter-residue contacts, denoted respectively as n α , n β , n αβ , Rg, and Ncontact. Then, the communities were classified into four types (α, β, αβ, and randomly structured communities) depending on the five structural features (see Methods for details).
The protein-domain universe is known to be an extremely biased distribution [8, 45]. Many studies have suggested a power-law statistic to represent the relation between the number of families and the number of folds [9, 46, 47]. For instance, Shakhnovich and co-workers created a protein-domain universe graph (PDUG) with adoption of a DALI Z-score for the similarity score, and showed that the domain universe followed a power-law distribution . Consequently, it is interesting to check if the currently produced network of the 50-residue segments follows the power law distribution.
Robustness of communities
We next calculated the number of communities at various Kc. We classified the communities into major and minor communities. Major ones are communities consisting of more than three clusters. Then, minor ones are small isolated communities consisting of only one or two clusters without links to other communities. No community involves only one cluster linked to another community. The Kc dependence of the number (Ncom) of the major communities is presented in Figure 10B. The minor communities do not characterize the overall property of the network because only 10% of clusters belong to the minor communities at any Kc. The increment of Ncom with increasing Kc was rapid for 100 ≤ Kc ≤ 1000 and slow for Kc ≥ 1000. The values of Ncom were, respectively, 36, 38, and 38 at Kc = 1000, 2000, and 3000. This result shows that the number of communities was conserved for Kc ≥ 1000.
Figure 10B portrays that the current universe for the 50-residue segments consists of some dozens (ca. 40) of major communities. Kihara and Skolnick reported that the current PDB database might cover almost all structures of small proteins . Crippen and Maiorov generated many self-avoiding conformations of a chain and suggested that the possible structures of a 50-residue chain are classifiable roughly into a small number of types, although the secondary-structure formation was not incorporated in their model . A study proposed the conjecture that tertiary-structure evolution of proteins might be achieved using limited repertoires of basic units such as supersecondary structure elements . Results of such studies are consistent with our results because we have shown that protein tertiary structures can be decomposed into the dozens of major communities of 50-residue segments. Actually, 90% of clusters belong to the major communities. To link those studies with our study more closely, detailed contents of each major community should be investigated. In fact, such a research project is proceeding now. Moreover, the role of the minor communities in the protein structure construction should be studied.
The currently observed 50-residue segment universe was characterized by a non-power-law distribution. Our result apparently differs from the power-law distribution widely known for the hierarchical protein domain universe [9, 46, 47, 53]. The emergence of the non-power-law statistics might be related to the use of the inter-residue contact pattern, which is a more relaxed similarity measure than widely used ones such as RMSD or the DALI Z-score. It is known that, under power-law statistics, the fraction of isolated clusters among all clusters is high. Under our non-power-law statistics, that fraction was low because the relaxed measure provided linkages between clusters. Thus, the two statistics complement each other in surveying the fold universe. From the non-power-law universe, we could show a novel hierarchy (Figure 12) in the universe and the existence of about 40 repertoires (Figure 10) from which protein tertiary structures are constructed, neither of which has been reported from the power-law universe. These results were also found in the 60- and 70-residue segment universes (data not shown). This suggests that the non-power law is likely to be a general property of segment universes.
The current network helps to trace conformational changes of segments along the network linkages. Supplementary Results show that the conformation changes gradually when the view is shifted from one cluster to another (see Additional file 1).
The inter-residue contact (native contact) has been widely used as a reaction coordinate in protein folding (see Introduction). We intend to use the currently obtained networks for protein folding study. The networks of fixed-length segments are readily applicable for conformational sampling in protein folding, where the chain length is usually fixed. The randomly structured clusters are located at the root of the distribution (Figure 4 and Figure 5), from which the segment conformation can diversify to mainly α, mainly β, or αβ regions with increased compactness (Figure 7).
We constructed a 50-residue segment network for investigating the protein local structure universe. The network was partitioned into some dozens (ca. 40) of major communities with high modularity (0.60 <Qmod < 0.65), independent of the spatial resolution (Kc). The major communities existed universally and persistently in the universe. Surprisingly, 90% of all segments were covered by the major communities. Consequently, numerous similarities exist among local regions (i.e., 50-residue segments) of proteins. Furthermore, the currently constructed segments networks are characterized by non-power-law (non-scale-free) statistics, which apparently differs from reported characteristics for the fold universe of full-length proteins.
This section includes six subsections. The first three – "Generation of 50-residue segment library", "Clustering segments", and "Computation of inter-residue contact patterns" – are preparative subsections describing construction of the 50-residue segment fold universe. In the subsection titled "Construction of a universe and network", construction of the fold universe and the network is described. "Modularity analysis" presents analyses used to examine the network. The subsection "Characterization of communities by structural features" describes a method to characterize communities depending on five structural features. Specification of indices i, j, s, t, u, v, and w is given at the beginning of Results.
Generation of 50-residue segment library
We generated a structure library of 50-residue segments with reference to the all-α, all-β, α/β, and α+β fold classes defined in the SCOP database (release 1.69) . The SCOP database presents a list that provides a representative for each protein family. We selected tertiary structures of the representative domains from the PDB database  with elimination of multi-chain domains, those involving structurally undetermined regions, and those shorter than 50 residues. Furthermore, we eliminated domains consisting of 400 residues or more, which might involve structurally repeating units. Then we obtained 1803 domains (456 from all-α, 393 from all-β, 393 from α/β, and 561 from α+β). A domain that is nr amino-acid residues long produces nr - 49 segments from sliding a 50-residue window along the sequence one residue-by-one residue. Finally, we obtained an ensemble of 186 821 segments (32 040 from all-α, 39 375 from all-β, 63 177 from α/β, and 52 229 from α+β). The residue site of each segment was re-numbered from 1 to 50 in our study.
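The window-sliding step and the reported totals can be checked with a short sketch. The function name and the toy 120-residue domain are hypothetical; the per-class counts are the ones quoted above.

```python
def sliding_segments(residues, window=50):
    """Return every contiguous window of `window` residues, shifted by one residue."""
    n = len(residues)
    if n < window:
        return []                      # domains shorter than the window yield nothing
    return [residues[start:start + window] for start in range(n - window + 1)]

# A hypothetical domain of nr = 120 residues yields nr - 49 = 71 segments.
domain = list(range(1, 121))
segments = sliding_segments(domain)
print(len(segments))  # 71

# Consistency check of the totals reported in the text.
domains_per_class = {"all-alpha": 456, "all-beta": 393, "alpha/beta": 393, "alpha+beta": 561}
segments_per_class = {"all-alpha": 32040, "all-beta": 39375, "alpha/beta": 63177, "alpha+beta": 52229}
print(sum(domains_per_class.values()), sum(segments_per_class.values()))  # 1803 186821
```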
Clustering segments
We classify the collected segments into clusters as follows: First, the inter-Cα atomic distances were calculated for segment s, where the distance between residues i and j is denoted as r_s(i, j). We eliminated residue pairs with |i - j| < 3 because the distances for these pairs are similar for all segments. In other words, those distances have less sensitivity to discriminate the structural differences of segments. Then, the number (Npair) of the Cα-atomic pairs in a 50-residue segment is 1128: Npair = 1128. The set of distances is expressed as an Npair-dimensional vector: $\mathbf{r}_s = [r_s(1, 4), r_s(1, 5), \ldots, r_s(47, 50)]$. We define the root mean square distance (rmsd_st) between $\mathbf{r}_s$ and $\mathbf{r}_t$ in the Npair-dimensional Cartesian space as $\mathrm{rmsd}_{st} = \sqrt{|\mathbf{r}_s - \mathbf{r}_t|^2 / N_{pair}}$.
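As a minimal sketch of this distance-vector representation, the code below enumerates the residue pairs with |i - j| ≥ 3, builds the distance vector from Cα coordinates, and computes an rmsd between two vectors. The division by Npair inside the square root is an assumption based on the name "root mean square"; the function names are hypothetical, and indices are 0-based here whereas the text numbers residues from 1.

```python
import math

def pair_indices(length=50, min_sep=3):
    """All residue pairs (i, j) with j - i >= min_sep, using 0-based indices."""
    return [(i, j) for i in range(length) for j in range(i + min_sep, length)]

def distance_vector(ca_coords, pairs):
    """Inter-C-alpha distances r_s(i, j) for the listed pairs."""
    return [math.dist(ca_coords[i], ca_coords[j]) for i, j in pairs]

def rmsd(vec_s, vec_t):
    """Root mean square difference of two distance vectors (assumed 1/Npair normalization)."""
    n = len(vec_s)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_s, vec_t)) / n)

pairs = pair_indices()
print(len(pairs))  # 1128, i.e. 1225 pairs in total minus 49 (|i-j| = 1) minus 48 (|i-j| = 2)
```

Because the vectors hold internal distances rather than coordinates, no structural superposition is needed before comparing two segments.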
For classifying the 186 821 segments into Kc clusters, we applied Lloyd's K-means algorithm  to the set of rmsd st values, where s, t = 1, ..., 186821. One should set Kc in advance in the K-means algorithm. We examined various values for Kc (Kc ≤ 5000). In Lloyd's method, the Kc clusters are set randomly at the beginning. The finally converged clusters are output. We have checked that the main results are independent of the initial set of clusters.
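The clustering step can be illustrated with a small pure-Python version of Lloyd's iteration. This is a sketch, not the authors' implementation: it assumes that clusters are represented by centroid vectors and that each segment is assigned to the nearest centroid by squared Euclidean distance, which gives the same ranking as the rmsd between distance vectors. The function names, the seed handling, and the toy 2D points are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's K-means: random initial centers, then alternate assignment and update."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]   # random initial clusters
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:        # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy data standing in for the 1128-dimensional distance vectors.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = lloyd_kmeans(points, k=2)
print(labels)
```

Running the function with several seeds and comparing the partitions is one simple way to check, as the text does, that the result does not depend on the initial set of clusters.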
The n u is the number of constituent segments of the cluster u.
where the summation is taken over all the Kc clusters.
Computation of inter-residue contact patterns
In this subsection, we present computation of the inter-cluster and intra-cluster structural similarity based on the inter-residue contact patterns. The inter-residue contacts in segment s were defined as follows: Calculating all the inter-heavy atomic distances between residues i and j for the segment, their minimum distance was registered as the inter-residue distance q s (i, j). Then, if q s (i, j) < 6.0 Å, we judged that the residues i and j were contacting and set a quantity c s (i, j) to 1 (otherwise, c s (i, j) = 0). Here, we again eliminated residue pairs of |i - j| < 3 in the calculation of c s (i, j). The set of c s (i, j) constructs a matrix C s , where element (i, j) is c s (i, j).
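The contact definition above translates directly into code. The sketch below is a hedged illustration: the input layout (one list of heavy-atom coordinates per residue), the symmetric filling of the matrix, and the function name are assumptions; the 6.0 Å cutoff, the strict inequality, and the exclusion of pairs with |i - j| < 3 follow the text.

```python
import math

def contact_matrix(residue_atoms, cutoff=6.0, min_sep=3):
    """Binary contact matrix C_s from heavy-atom coordinates.

    residue_atoms[i] is a list of (x, y, z) heavy-atom positions of residue i.
    c[i][j] = 1 when the minimum inter-heavy-atom distance q_s(i, j) < cutoff.
    """
    n = len(residue_atoms)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):          # skip pairs with |i - j| < 3
            q = min(math.dist(a, b)
                    for a in residue_atoms[i] for b in residue_atoms[j])
            if q < cutoff:
                c[i][j] = c[j][i] = 1
    return c

# Hypothetical toy chain: five one-atom "residues" spaced 1.5 A apart on a line.
toy = [[(1.5 * k, 0.0, 0.0)] for k in range(5)]
c = contact_matrix(toy)
print(c[0][3], c[0][4], c[1][4])  # 1 0 1  (4.5 A, exactly 6.0 A, 4.5 A)
```

The pair at exactly 6.0 Å is not counted because the criterion is a strict inequality.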
The upper limit (6.0 Å) for q s (i, j) allows no penetration of a water molecule between residues i and j: At q s (i, j) = 6.0 Å, the substantial space for water penetration between the residues is approximately 2.0 Å (= 6.0 - 2 × 2.0) assuming that radii of segment heavy atoms are 2.0 Å. This space of 2.0 Å is smaller than the diameter of a water molecule (2.8 Å).
The inter-residue contact patterns are similar between clusters u and v only when . Herein, we set f0 to 0.7. The meaning of 0.7 is explained in the Results section.
The larger the value of , the more similar the inter-residue contact patterns in each cluster are, on average.
Construction of a universe and network
We constructed a distribution (i.e., fold universe) of the Kc clusters by embedding the clusters in a 3D conformational space. Details are presented in Additional file 1. As explained in the Introduction, lowering the space dimensionality hides the internal architecture of the fold universe. To supplement the 3D distribution with full-dimensional information, links were assigned between clusters with similar inter-residue contact patterns (a_uv = 1). The generated networks were subjected to the modularity analysis described in the next subsection.
Modularity analysis
The modularity Qmod of a division of the network into communities is $Q_{mod} = \sum_{w=1}^{N_{com}} \left[ \frac{I_w}{I} - \left( \frac{d_w}{2I} \right)^2 \right]$, where I_w is the number of links connecting clusters within a community w, Ncom is the number of communities existing in the entire network, and I is the number of links existing in the entire network. The quantity d_w is called the "total degree", which is defined for each community as d_w = 2I_w + I_w-other, where I_w-other is the number of links connecting clusters in the community w and clusters outside the community. The value of Qmod is 0–1: Qmod approaches 1 when the number of links connecting different communities decreases. For instance, the network in Figure 14A has Qmod of 0.466 (I = 34, I_1 = 18, I_2 = 15, d_1 = 37, and d_2 = 31). That of Figure 14B has Qmod of 0.388 (I = 37, I_1 = 18, I_2 = 15, d_1 = 40, and d_2 = 34). The two networks are equivalent except for the inter-community links.
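The two worked values quoted for Figure 14 can be reproduced with a short calculation. The sketch assumes the standard Newman form of the modularity, Qmod = Σ_w [I_w/I − (d_w/2I)²] with d_w = 2I_w + I_w-other; the function name is hypothetical, and the inter-community link counts (1 and 4) are inferred from the quoted d_w values.

```python
def modularity(total_links, communities):
    """Qmod from per-community link counts.

    communities is a list of (I_w, I_w_other) pairs: links inside community w
    and links from community w to clusters outside it.
    """
    q = 0.0
    for intra, inter in communities:
        d_w = 2 * intra + inter                       # total degree of community w
        q += intra / total_links - (d_w / (2 * total_links)) ** 2
    return q

# Figure 14A: I = 34, I_1 = 18, I_2 = 15, d_1 = 37, d_2 = 31 (one inter-community link).
q_a = modularity(34, [(18, 1), (15, 1)])
# Figure 14B: I = 37, I_1 = 18, I_2 = 15, d_1 = 40, d_2 = 34 (four inter-community links).
q_b = modularity(37, [(18, 4), (15, 4)])
print(f"{q_a:.4f} {q_b:.4f}")  # 0.4667 0.3886
```

Both values agree with the 0.466 and 0.388 quoted in the text to within rounding, and adding inter-community links lowers Qmod as expected.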
Characterization of communities by structural features
The manner of differentiating the communities is important. Herein, we characterize the communities depending on five biophysical structural features: radius of gyration (Rg), number of inter-residue contacts ($N_{contact} = \sum_{i<j} c_s(i, j)$, with removal of pairs of |i - j| < 3), number of α-helical residues (n_α), number of β-strand residues (n_β), and the sum of n_α and n_β (i.e., n_αβ = n_α + n_β).
First, we calculate the five quantities for each segment. The secondary-structure assignment to each residue in a segment is done using software available at the STRIDE web site http://webclu.bio.wzw.tum.de/stride/. Next, we take the average of each of the five quantities over the segments in a community. We designate the average quantities in a community w as Rg(w), Ncontact(w), n_α(w), n_β(w), and n_αβ(w). Then, we classify the communities into α, β, αβ, and randomly structured ones according to the five quantities: Randomly structured communities are those with Rg(w) > 14 Å and Ncontact(w) < 100 or those with n_αβ(w) < 15. In the remaining communities, α communities are those with n_α(w) > 0.7 × n_αβ(w). In the remaining communities, β communities are those with n_β(w) > 0.7 × n_αβ(w). The finally remaining communities are classified as αβ communities. Each segment in the αβ communities significantly involves both an α helix and a β strand.
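The classification rules can be written as a small decision function. This is a sketch under two labeled assumptions: the random-structure rule is read as (Rg(w) > 14 Å and Ncontact(w) < 100) or n_αβ(w) < 15, and the β rule is taken to be n_β(w) > 0.7 × n_αβ(w), the symmetric counterpart of the α rule. The function name and the toy input values are hypothetical.

```python
def classify_community(rg, n_contact, n_alpha, n_beta):
    """Assign a community to 'random', 'alpha', 'beta', or 'alpha-beta'.

    Inputs are community averages: Rg(w) in angstroms, Ncontact(w),
    n_alpha(w), and n_beta(w).
    """
    n_ab = n_alpha + n_beta                           # n_alphabeta(w)
    # Assumed grouping: (extended AND few contacts) OR little secondary structure.
    if (rg > 14.0 and n_contact < 100) or n_ab < 15:
        return "random"
    if n_alpha > 0.7 * n_ab:
        return "alpha"
    if n_beta > 0.7 * n_ab:                           # assumed beta counterpart
        return "beta"
    return "alpha-beta"

# Hypothetical community averages.
print(classify_community(15.0, 80, 20, 10))    # random
print(classify_community(10.0, 150, 30, 5))    # alpha
print(classify_community(10.0, 150, 5, 30))    # beta
print(classify_community(10.0, 150, 15, 15))   # alpha-beta
```

The rules are applied in order, so a community is tested for the α and β types only after the random-structure test has failed.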
KI and JH were partly supported by BIRD of Japan Science and Technology Agency (JST). JH was also partly supported by New Energy and Industrial Technology Development Organization (NEDO).
- Chothia C: Proteins. One thousand families for the molecular biologist. Nature 1992, 357: 543–544. doi:10.1038/357543a0
- Gibrat JF, Madej T, Bryant SH: Surprising similarities in structure comparison. Curr Opin Struct Biol 1996, 6: 377–385. doi:10.1016/S0959-440X(96)80058-3
- Coulson AFW, Moult J: A unifold, mesofold, and superfold model of protein fold use. Proteins 2002, 46: 61–71. doi:10.1002/prot.10011
- Liu X, Fan K, Wang W: The number of protein folds and their distribution over families in nature. Proteins 2004, 54: 491–499. doi:10.1002/prot.10514
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247: 536–540.
- Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH – a hierarchic classification of protein domain structures. Structure 1997, 5: 1093–1108. doi:10.1016/S0969-2126(97)00260-8
- Efimov AV: Structural trees for protein superfamilies. Proteins 1997, 28: 241–260. doi:10.1002/(SICI)1097-0134(199706)28:2<241::AID-PROT12>3.0.CO;2-I
- Holm L, Sander C: Mapping the protein universe. Science 1996, 273: 595–602. doi:10.1126/science.273.5275.595
- Dokholyan NV, Shakhnovich B, Shakhnovich EI: Expanding protein universe and its origin from the biological Big Bang. Proc Natl Acad Sci USA 2002, 99: 14132–14136. doi:10.1073/pnas.202497999
- Hou J, Sims GE, Zhang C, Kim S-H: A global representation of the protein fold space. Proc Natl Acad Sci USA 2003, 100: 2386–2390. doi:10.1073/pnas.2628030100
- Hou J, Jun S-R, Zhang C, Kim S-H: Global mapping of the protein structure space and application in structure-based inference of protein function. Proc Natl Acad Sci USA 2005, 102: 3651–3656. doi:10.1073/pnas.0409772102
- Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol 1993, 233: 123–138. doi:10.1006/jmbi.1993.1489
- Orengo CA, Flores TP, Taylor WR, Thornton JM: Identification and classification of protein fold families. Protein Eng 1993, 6: 485–500. doi:10.1093/protein/6.5.485
- Standley DM, Kinjo AR, Kinoshita K, Nakamura H: Protein structure databases with new web services for structural biology and biomedical research. Brief Bioinfo 2008, 9: 276–285. doi:10.1093/bib/bbn015
- Takahashi K, Go N: Conformational classification of short backbone fragments in globular proteins and its use for coding backbone conformations. Biophys Chem 1993, 47: 163–178. doi:10.1016/0301-4622(93)85034-F
- Tomii K, Kanehisa M: Systematic detection of protein structural motifs. In Pattern discovery in biomolecular data. Edited by: Wang JTL, Shapiro BA, Shasha D. New York: Oxford University Press; 1999:97–110.
- Choi IG, Kwon J, Kim S-H: Local feature frequency profile: A method to measure structural similarity in proteins. Proc Natl Acad Sci USA 2004, 101: 3797–3802. doi:10.1073/pnas.0308656100
- Ikeda K, Tomii K, Yokomizo T, Mitomo D, Maruyama K, Suzuki S, Higo J: Visualization of conformational distribution of short to medium size segments in globular proteins and identification of local structural motifs. Protein Sci 2005, 14: 1253–1265. doi:10.1110/ps.04956305
- Sawada Y, Honda S: Structural diversity of protein segments follows a power-law distribution. Biophys J 2006, 91: 1213–1223. doi:10.1529/biophysj.105.076661
- Ikeda K, Hirokawa T, Higo H, Tomii K: Protein-segment universe exhibiting transitions at intermediate segment length in conformational subspaces. BMC Structural Biology 2008, 8: 37. doi:10.1186/1472-6807-8-37
- Simons KT, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 1997, 268: 209–225. doi:10.1006/jmbi.1997.0959
- Bonneau R, Strauss CE, Rohl CA, Chivian D, Bradley P, Malmström L, Robertson T, Baker D: De novo prediction of three-dimensional structures for major protein families. J Mol Biol 2002, 322: 65–78. doi:10.1016/S0022-2836(02)00698-8
- Chikenji G, Fujitsuka Y, Takada S: A reversible fragment assembly method for de novo protein structure prediction. J Chem Phys 2003, 119: 6895–6903. doi:10.1063/1.1597474
- Jeong H, Mason SP, Barabási AL, Oltvai ZN: Lethality and centrality in protein networks. Nature 2001, 411: 41–42. doi:10.1038/35075138
- Holme P, Huss M, Jeong H: Subnetwork hierarchies of biochemical pathways. Bioinformatics 2003, 19: 532–538. doi:10.1093/bioinformatics/btg033
- Guimerà R, Amaral LAN: Functional cartography of complex metabolic networks. Nature 2005, 433: 895–900. doi:10.1038/nature03288
- Palla G, Derényi I, Farkas I, Vicsek T: Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005, 435: 814–818. doi:10.1038/nature03607
- Go N: Theoretical studies of protein folding. Annu Rev Biophys Bioeng 1983, 12: 183–210. doi:10.1146/annurev.bb.12.060183.001151
- Go N, Abe H: Randomness of the process of protein folding. Int J Pept Protein Res 1983, 22: 622–632.
- Wolynes PG, Onuchic JN, Thirumalai D: Navigating the folding routes. Science 1995, 267: 1619–1620. doi:10.1126/science.7886447
- Galzitskaya OV, Finkelstein AV: A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc Natl Acad Sci USA 1999, 96: 11299–11304. doi:10.1073/pnas.96.20.11299
- Munoz V, Eaton WA: A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc Natl Acad Sci USA 1999, 96: 11311–11316. doi:10.1073/pnas.96.20.11311
- Shea J-E, Brooks CL III: From folding theories to folding proteins: a review and assessment of simulation studies of protein folding and unfolding. Annu Rev Phys Chem 2001, 52: 499–535. doi:10.1146/annurev.physchem.52.1.499
- Koga N, Takada S: Roles of native topology and chain-length scaling in protein folding: A simulation study with a Go-like model. J Mol Biol 2001, 313: 171–180. doi:10.1006/jmbi.2001.5037
- Makarov DE, Keller CA, Plaxco KW, Metiu H: How the folding rate constant of simple, single-domain proteins depends on the number of native contacts. Proc Natl Acad Sci USA 2002, 99: 3535–3539. doi:10.1073/pnas.052713599
- Zhou HX: Theory for the rate of contact formation in a polymer chain with local conformational transitions. J Chem Phys 2003, 118: 2010–2015. doi:10.1063/1.1531588
- Nakamura HK, Sasai M, Takano M: Scrutinizing the squeezed exponential kinetics observed in the folding simulation of an off-lattice Go-like protein model. Chem Phys 2004, 307: 259–267. doi:10.1016/j.chemphys.2004.07.011
- Mitomo D, Nakamura HK, Ikeda K, Yamagishi A, Higo J: Transition state of a SH3 domain detected with principle component analysis and a charge-neutralized all-atom protein model. Proteins 2006, 64: 883–894. doi:10.1002/prot.21069
- Ikebe J, Kamiya N, Shindo H, Nakamura H, Higo J: Conformational sampling of a 40-residue protein consisting of α and β secondary-structure elements in explicit solvent. Chem Phys Lett 2007, 443: 364–368. doi:10.1016/j.cplett.2007.06.102
- Kamiya N, Mitomo D, Shea J-E, Higo J: Folding of the 25 residue Abeta(12–36) peptide in TFE/water: temperature-dependent transition from a funneled free-energy landscape to a rugged one. J Phys Chem B 2007, 111: 5351–5356. doi:10.1021/jp067075v
- Baker D: A surprising simplicity to protein folding. Nature 2000, 405: 39–42. doi:10.1038/35011000
- Kamagata K, Arai M, Kuwajima K: Unification of the folding mechanisms of non-two-state and two-state proteins. J Mol Biol 2004, 339: 951–965. doi:10.1016/j.jmb.2004.04.015
- Kamagata K, Kuwajima K: Surprisingly high correlation between early and late stages in non-two-state protein folding. J Mol Biol 2006, 357: 1647–1654. doi:10.1016/j.jmb.2006.01.072
- Newman MEJ: Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 2006, 74: 036104. doi:10.1103/PhysRevE.74.036104
- Grant A, Lee D, Orengo C: Progress towards mapping the universe of protein folds. GenomeBiology 2004, 5: 107.
- Koonin EV, Wolf YI, Karev GP: The structure of the protein universe and genome evolution. Nature 2002, 420: 218–223. doi:10.1038/nature01256
- Qian J, Luscombe NM, Gerstein M: Protein Family and Fold Occurrence in Genomes: Power-law Behaviour and Evolutionary Model. J Mol Biol 2001, 313: 673–681. doi:10.1006/jmbi.2001.5079
- Barabási AL, Albert R: Emergence of scaling in random networks. Science 1999, 286: 509–512. doi:10.1126/science.286.5439.509
- Newman MEJ, Girvan M: Fast algorithm for detecting community structure in networks. Phys Rev E 2004, 69: 026113. doi:10.1103/PhysRevE.69.026113
- Kihara D, Skolnick J: The PDB is a covering set of small protein structures. J Mol Biol 2003, 334: 793–802. doi:10.1016/j.jmb.2003.10.027
- Crippen GM, Maiorov VN: How Many Protein Folding Motifs are There? J Mol Biol 1995, 252: 144–151. doi:10.1006/jmbi.1995.0481
- Soding J, Lupas AN: More than the sum of their parts: on the evolution of proteins from peptides. BioEssays 2003, 25: 837–846. doi:10.1002/bies.10321
- Krishnadev O, Brinda KV, Vishveshwara S: A graph spectral analysis of the structural similarity of protein chains. Proteins 2005, 61: 152–163. doi:10.1002/prot.20532
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The protein data bank. Nucleic Acids Res 2000, 28: 235–242. doi:10.1093/nar/28.1.235
- Lloyd SP: Least squares quantization in PCM. IEEE Transactions on Information Theory 1982, 28: 129–137. doi:10.1109/TIT.1982.1056489
- Frishman D, Argos P: Knowledge-based protein secondary structure assignment. Proteins 1995, 23: 566–579. doi:10.1002/prot.340230412
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.