Clustering and percolation in protein loop structures
 Xubiao Peng^{1},
 Jianfeng He^{2} and
 Antti J. Niemi^{1, 3}
DOI: 10.1186/s12900-015-0049-x
© Peng et al. 2015
Received: 7 April 2015
Accepted: 13 October 2015
Published: 29 October 2015
Abstract
Background
High precision protein loop modelling remains a challenge, both in template based and template independent approaches to protein structure prediction.
Method
We introduce the concepts of protein loop clustering and percolation, to develop a quantitative approach to systematically classify the modular building blocks of loops in crystallographic folded proteins. These fragments are all different parameterisations of a unique kink solution to a generalised discrete nonlinear Schrödinger (DNLS) equation. Accordingly, the fragments are also local energy minima of the ensuing energy function.
Results
We show how the loop fragments cover practically all ultrahigh resolution crystallographic protein structures in the Protein Data Bank (PDB), with a 0.2 Ångström root-mean-square (RMS) precision. We find that no more than 12 different loop fragments are needed to describe around 38 % of ultrahigh resolution loops in PDB. But there is also a large number of loop fragments that are either unique or very rare, and examples of unique fragments are found even in the structure of a myoglobin.
Conclusions
Protein loops are built in a modular fashion. The loops are composed of fragments that can be modelled by the kink of the DNLS equation. The majority of loop fragments are common, in the sense that they are shared by many proteins. These common fragments are probably important for supporting the overall protein conformation. But there are also several fragments that are either unique to a given protein, or very rare. Such fragments are probably related to the function of the protein. Furthermore, we have found that the amino acid sequence does not determine the structure in a unique fashion. There are many examples of loop fragments with an identical amino acid sequence, but with a very different structure.
Keywords
Loop modeling, Protein backbone, Cα trace problem
Background
Protein taxonomy [1–5] reveals that crystallographic protein structures have surprisingly little conformational diversity. It might be that the majority of different conformations have already been found [6, 7]. This apparent convergence in protein structure provides the rationale for the development of comparative modelling or threading techniques [8–12]. These approaches try to predict the tertiary structure of a folded protein using libraries of known protein structures as templates. According to the community-wide Critical Assessment for Structural Prediction (CASP) tests [13], methods of this kind currently have the best predictive power to determine a folded conformation.
In the loop regions, comparative modelling approaches still lack precision [14, 15]. It is not uncommon that there are gaps in the loop regions that need to be filled by various insertion techniques. The success in loop modelling is also often limited to supersecondary structures where α-helices and β-strands are connected to each other by relatively short twists and turns [16, 17]. In the case of a very short loop, with no more than three residues, the shape can be determined by a combination of geometrical considerations and stereochemical constraints [18]. In the case of longer loops, both template based and template independent methods are being developed to predict their shapes [19–21]. The underlying assumption is that the number of loop conformations which can be accommodated by a given sequence should be limited. The different fragments which are already available in the Protein Data Bank (PDB) [22] database could then be used like Lego bricks, as structural building blocks in constructing the loops. A given amino acid sequence is simply divided into short fragments, and the shape of the ensuing loop is deduced using homologically related fragments that have known structures. The entire protein is then assembled by joining these fragments together. For the process of joining the fragments, both all-atom energy functions and comparisons with closely homologous template structures in the Protein Data Bank can be utilised [8, 9, 12, 14].
In the present article we propose a new systematic, purely quantitative method to identify and classify the modular building blocks of PDB loops; we identify a loop following the DSSP [23] convention. Our approach is based on a first-principles energy function [24–29]. It is built on the concept of universality [30–36] to model the fragments of even long protein loops in terms of different parameterisations of a unique kink that solves a variant [37, 38] of the discrete nonlinear Schrödinger (DNLS) equation [39, 40]. Our starting point is the observation made in [41] that over 92 % of loops in those PDB structures that have been measured with better than 2.0 Å resolution can be composed from 200 different parameterisations of the kink profile, with better than 0.65 Ångström RMSD (root-mean-square distance) accuracy. Here we refine this observation, with the aim to develop it into a systematic loop fragment classification scheme. For this we consider only those ultrahigh precision PDB structures that have been measured with better than 1.0 Å resolution. This ensures that the B-factors in the loop regions are small, and in particular that the structures have not been subjected to extensive refinement procedures. Indeed, two loop fragments should be considered different only when the average interatomic distance is larger than the average Debye-Waller B-factor fluctuation distance. If the B-factors are large, any systematic attempt to identify and/or distinguish two fragments becomes ambiguous. In the case of these ultrahigh resolution structures we can aim for an RMSD precision of 0.2 Å. We estimate this to be the extent of zero-point fluctuations, i.e. a distance around 0.2 Å corresponds to the intrinsic uncertainty in the determination of heavy atom positions along the protein backbone. Thus any difference less than 0.2 Å between average atomic coordinates is essentially undetectable.
By explicit constructions, we show how in the case of this subset of ultrahigh resolution PDB protein structures, the loops can be systematically modelled using combinations of the unique kink of the generalised DNLS equation. As such, our approach provides a foundation for a general approach to classify loops in high precision crystallographic PDB structures, in terms of an energy function based on a first-principles mathematical concept.
Method
Cα based Frenet frames
This transformation has no effect on the backbone coordinates r_{i}, and it leaves the Cα backbone intact.
The Cα trace visualization, loops and kinks
The Cα map
describes a β-strand. We note that Fig. 4a is akin to the Newman projection of stereochemistry: The vector t_{i}, which is denoted by the red dot at the center of the figure, points along the backbone from the proximal Cα at r_{i} towards the distal Cα at r_{i+1}, and the colour intensity displays the statistical distribution of the r_{i+2} direction. We also note that Fig. 4 provides non-local information on the backbone geometry; the information content extends over several peptide units. This is unlike the Ramachandran map, which can only provide localised information in the immediate vicinity of a single Cα carbon. As shown in [46], the Cα backbone bond and torsion angles (κ_{i},τ_{i}) are sufficient to reconstruct the entire backbone, while the Ramachandran angles are not.
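The defining equations of the frames are not reproduced in this text, so the following is only a minimal sketch of how bond and torsion angles can be extracted from a Cα trace. It assumes the standard discrete Frenet construction (unit tangents along consecutive Cα atoms, binormals from their cross products); the function name `frenet_angles` and the sign convention for the torsion angle are assumptions made for illustration, not the authors' stated definitions.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))  # fails for zero vectors, e.g. three collinear atoms
    return (a[0] / n, a[1] / n, a[2] / n)

def _angle(a, b):
    # Clamp to [-1, 1] to protect acos against rounding errors.
    return math.acos(max(-1.0, min(1.0, _dot(a, b))))

def frenet_angles(coords):
    """Return (bond_angles, torsion_angles) in radians for a C-alpha trace.

    N residues give N-2 bond angles and N-3 torsion angles.
    """
    t = [_unit(_sub(coords[i + 1], coords[i])) for i in range(len(coords) - 1)]
    b = [_unit(_cross(t[i - 1], t[i])) for i in range(1, len(t))]
    kappa = [_angle(t[i], t[i + 1]) for i in range(len(t) - 1)]
    tau = []
    for i in range(len(b) - 1):
        # Assumed sign convention: positive for a right-handed rotation
        # of the binormal about the bond shared by the two planes.
        sign = 1.0 if _dot(_cross(b[i], b[i + 1]), t[i + 1]) >= 0 else -1.0
        tau.append(sign * _angle(b[i], b[i + 1]))
    return kappa, tau
```

For a six-residue fragment this returns four bond angles and three torsion angles, which is the set of local variables the kink profile is fitted to.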
Here [x] denotes the integer part of x, and Γ is the total rotation angle (in radians) that the projections of the Cα atoms of the consecutive loop residues make around the north pole. The folding index is a positive integer when the rotation is counterclockwise, and a negative integer when the rotation is clockwise. The folding index can be used to detect and classify loop structures and entire folded proteins, in terms of its values. The value is equal to twice the number of times the ensuing pathway encircles the north pole in the map of Fig. 4; for the trajectory shown in Fig. 4b the folding index is +2.
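The display equation for the folding index is not reproduced in this text. The sketch below assumes the form [Γ/π], which is consistent with the statement that the index equals twice the number of encirclements of the north pole. The function name `folding_index` and the input format (a list of longitudes of the projected points, in radians) are assumptions for illustration.

```python
import math

def folding_index(longitudes):
    """Folding index from the longitudes (radians) of consecutive projected points.

    Assumes each step between consecutive points rotates by less than pi,
    so that the wrapped difference recovers the true rotation.
    """
    total = 0.0
    for a, b in zip(longitudes, longitudes[1:]):
        # Wrap the step into [-pi, pi) so crossings of the branch cut are handled.
        step = (b - a + math.pi) % (2.0 * math.pi) - math.pi
        total += step
    # Integer part of Gamma / pi, truncated toward zero (assumed form).
    return math.trunc(total / math.pi)
```

A path that winds counterclockwise by somewhat more than one full turn gives 2 with this definition, and the same path traversed in the opposite direction gives -2.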
The discrete nonlinear Schrödinger equation
For the torsion angles, from (11) we conclude that the overall scale of the parameters (d,q) and (e,b) cancels in the expression (11). This leaves us with only three independent parameters. In (10) there are four parameters when we use translation invariance to remove s. Thus the profile of a single kink becomes fully determined in terms of seven independent parameters. This coincides exactly with the number of independent coordinates along a Cα backbone segment with six residues. For this, we may always place the first residue to coincide with the origin of a Cartesian (xyz) coordinate system. We can always place the second residue along the z-axis, and we can always place the third residue on the x=0 plane. Thus, there is only one independent coordinate for the three first residues. Since the remaining three residues can each be placed to arbitrary angular directions, there are six additional independent coordinates. Accordingly, a segment with six residues indeed engages seven independent parameters.
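The counting above can be written out as a short worked equation. The second line is an equivalent way to arrive at the same number and assumes, as the phrase "arbitrary angular directions" suggests, that the distance between consecutive Cα atoms is held fixed; N denotes the number of residues in the segment.

```latex
\begin{align*}
\text{kink profile:}\quad & \underbrace{4}_{\text{bond angle profile (10)}}
  + \underbrace{3}_{\text{torsion angle profile (11)}} = 7, \\
\text{backbone segment:}\quad & \underbrace{3N}_{\text{coordinates}}
  - \underbrace{6}_{\text{rigid motions}}
  - \underbrace{(N-1)}_{\text{fixed bond lengths}}
  = 2N - 5 \;\overset{N=6}{=}\; 7 .
\end{align*}
```

The same number follows from counting N-2 = 4 bond angles and N-3 = 3 torsion angles for N = 6.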
Clustering and percolation

We define a cluster to be a set of loop fragments such that for each fragment in a given cluster there is at least one other fragment within a prescribed RMS cutoff distance.
Two clusters are disjoint, when the RMSD between any fragment in the first cluster and any fragment in the second cluster exceeds this prescribed RMS cutoff distance.

We define the initiator of a cluster to be an a priori random loop fragment which defines the cluster by completion, as follows: We start with the initiator. We identify all those fragments in our entire data set which deviate from the initiator by less than the given RMS cutoff distance. We continue the process by identifying all those fragments, that deviate from the fragments that we have identified in the previous step, by less than the RMS cutoff distance. We repeat the procedure until we find no additional fragments in PDB, within the RMS cutoff distance from any of those fragments that have been identified in the previous steps.
The cluster is clearly independent of its initiator; any element of the cluster could be used as the initiator. But the cluster depends on the RMS cutoff distance. Moreover, if the RMS cutoff distance is too large, no clear clustering is observed.
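The completion procedure described above amounts to a breadth-first search over fragments, where two fragments are linked when their distance is below the cutoff. The following is a minimal sketch under that reading; the function name `grow_cluster`, the `dist` callable and the toy one-dimensional data are illustrative assumptions, and in practice `dist` would be an RMSD between six-residue fragments.

```python
from collections import deque

def grow_cluster(initiator, fragments, cutoff, dist):
    """Return the set of indices reachable from `initiator` by steps shorter than `cutoff`."""
    cluster = {initiator}
    queue = deque([initiator])
    while queue:
        current = queue.popleft()
        for j in range(len(fragments)):
            # Add every fragment not yet in the cluster that lies within the cutoff
            # of a fragment identified in a previous step.
            if j not in cluster and dist(fragments[current], fragments[j]) < cutoff:
                cluster.add(j)
                queue.append(j)
    return cluster

# Toy data: "fragments" are points on a line and the distance is |x - y|.
toy = [0.00, 0.15, 0.30, 1.00, 1.10]
toy_dist = lambda x, y: abs(x - y)
cluster_a = grow_cluster(0, toy, 0.2, toy_dist)  # start from the first point
cluster_b = grow_cluster(2, toy, 0.2, toy_dist)  # start from the third point
```

In the toy data both starting points produce the same cluster, which illustrates the independence of the cluster from its initiator. The first and third points are farther apart than the cutoff and are joined only through the intermediate point.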
Thus, two loop fragments that have been measured with 2.0 Å resolution should be (on average) considered different only when their RMS distance exceeds 0.65 Å.
Using a combination of Fig. 5 with various tests that we have performed, we have arrived at the conclusion that 0.2 Å in RMS distance can be currently adopted as a reasonable estimate for the minimal zero-point fluctuation distance in ultrahigh resolution structures, those that have been measured with better than 1.0 Å resolution. Thus we shall try and see to what extent loops in these protein structures can be classified in terms of elemental fragments, such that two fragments are considered different when their RMS distance exceeds 0.2 Å. According to Fig. 5, over 99 % of individual Cα carbons that have been measured with below 1.0 Å resolution have a B-factor fluctuation distance which is larger than 0.2 Å; our choice of cutoff distance is close to the 3 σ level.
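The relation between a B-factor and a fluctuation distance is not written out in this text. The sketch below assumes the standard isotropic Debye-Waller relation B = 8π²⟨u²⟩, with u the displacement along one direction; whether the authors use this or the three-dimensional convention is not stated here, so the numbers should be read as indicative only. The function names are hypothetical.

```python
import math

def fluctuation_distance(b_factor):
    """RMS displacement (Å) along one direction for an isotropic B-factor (Å^2).

    Assumes B = 8 * pi^2 * <u^2> (one-dimensional convention).
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

def b_factor_for(distance):
    """Inverse relation: B-factor (Å^2) that corresponds to an RMS displacement (Å)."""
    return 8.0 * math.pi ** 2 * distance ** 2
```

Under this assumption, a fluctuation distance of 0.2 Å corresponds to a B-factor slightly above 3 Å², and 0.65 Å corresponds to a B-factor between 33 and 34 Å².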
We note that other cutoff values can be introduced; the ultimate appears to be 0.1 Å. But our qualitative conclusions are fairly independent of the value chosen, provided it is small enough to provide a clustering pattern. In this article our goal is to present a proof-of-concept. To our knowledge, no related analysis has been previously attempted to systematically classify the loop structures in ultrahigh resolution crystallographic protein conformations in a quantitative fashion using an energy function. In particular, no commonly accepted experimental standard exists that we could rely on to infer the “most preferred” cutoff value. We hope that such a value can be eventually inferred from careful experimental measurements. Thus, at the moment we have no criterion to prefer any other particular value; 0.2 Å, i.e. around 3 σ, appears to be a reasonable choice at this point.
We start the identification of loop fragments using the set of 200 fragments constructed in [41]. But our results are independent of the starting point; quite similar results can be obtained using a fairly generic set of loop fragments as a starting point. We note that the fragments in [41] have between five and nine residues, and most of them (116 out of 200) have six residues. We have already argued that six is the optimal number of residues in a loop fragment, as it matches the number of independent parameters in the kink profile (10), (11). Thus, we shall consider only fragments that have six residues in the clustering algorithm. In this manner, we find that we can classify all PDB fragments into clusters, each determined by their initiator.
We have found that there are clusters that have a very large number of fragments. But we also find that there are clusters with only a single, or very few fragments. It is natural to expect that those clusters which are large, contain mostly fragments that are structurally important. On the other hand, those clusters which are small should include mainly fragments that are functionally important. Furthermore, we find several examples of amino acid sequences that are included in different clusters: The sequence does not define the structure, in a unique fashion. This leads us to address the concept of cluster percolation: Given the sequence of a loop fragment in a cluster, percolation means that there are other, possibly new clusters where the same sequence appears but with a different structure.
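Detecting percolation in the sense defined above reduces to grouping fragments by sequence and keeping the sequences that occur in more than one cluster. The following is a minimal sketch; the function name `percolating_sequences`, the input format and the example labels are hypothetical and are not taken from the article's data.

```python
from collections import defaultdict

def percolating_sequences(assignments):
    """Map each sequence found in more than one cluster to the set of its clusters.

    `assignments` is an iterable of (sequence, cluster_label) pairs.
    """
    clusters_by_sequence = defaultdict(set)
    for sequence, cluster in assignments:
        clusters_by_sequence[sequence].add(cluster)
    # Keep only sequences whose structures fall into at least two different clusters.
    return {s: c for s, c in clusters_by_sequence.items() if len(c) > 1}

# Hypothetical labels for illustration only.
example = [("AAGAAA", "c1"), ("AAGAAA", "c2"), ("GGSGGG", "c1"), ("GGSGGG", "c1")]
result = percolating_sequences(example)
```

In the example only the first sequence percolates, since the second one appears twice but always in the same cluster.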
Results
Clustering
We have constructed our clusters by starting with the 200 loop fragments that were introduced in [41]. Around 92 % of all loops in those PDB structures that have been measured with resolution better than 2.0 Å, are within a 0.65 Å RMS distance from some of the 200 loop fragments. However, when we decrease the RMSD cutoff distance to 0.2 Å, which is the cutoff distance used in the present article, the coverage drops to below 2 % [41].
We remark that the authors of reference [41] did not investigate clustering, as the concept is defined here. In [41] all the RMS distances were evaluated from the fixed set of 200 loop fragments, and the coverage of PDB loop structures was determined in terms of these fixed loop fragments.
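The coverage with respect to a fixed reference set, as opposed to clusters grown by completion, can be sketched as follows. The function name `coverage`, the `dist` callable and the toy one-dimensional data are illustrative assumptions; a real calculation would pass six-residue fragments and an RMSD function.

```python
def coverage(fragments, references, cutoff, dist):
    """Fraction of `fragments` lying within `cutoff` of at least one reference fragment."""
    if not fragments:
        return 0.0
    covered = sum(
        1 for f in fragments if any(dist(f, r) < cutoff for r in references)
    )
    return covered / len(fragments)

# Toy data on a line; distances are |x - y|.
toy_fragments = [0.0, 0.1, 0.5, 0.9, 2.0]
toy_references = [0.0, 1.0]
toy_dist = lambda x, y: abs(x - y)
tight = coverage(toy_fragments, toy_references, 0.2, toy_dist)
loose = coverage(toy_fragments, toy_references, 0.65, toy_dist)
```

Unlike the completion procedure, this measure never follows chains of neighbours, so lowering the cutoff can only reduce the coverage of a fixed reference set.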
When we restrict to the subset of PDB structures in [41] that have been measured with better than 1.0 Å resolution, we find that a total of 102 out of the 200 fragments in [41] have been measured with this resolution. We use these 102 loop fragments as the initiators, to start our clustering construction.
Table 1. The list of 12 initiators for clusters that have 6 residues and give rise to 30 or more entries in the ensuing clusters (PDB code, chain, PDB sites), together with the number of entries
Cluster #  Initiator  # entries 

I  1vyr_A (174–179)  76 
II  1g4i_A (56–61)  138 
III  1gkm_A (163–168)  186 
IV  4f18_A (1244–1249)  199 
V  1a6m_A (18–23)  215 
VI  1cex_A (140–145)  273 
VII  1a6m_A (56–61)  308 
VIII  1iee_A (47–52)  481 
IX  1brf_A (5–10)  1166 
X  1ixh_A (200–205)  1405 
XI  2o7a_A (62–67)  1586 
XII  1gkm_A (9–14)  2374 
We proceed to describe some of the major features of the ensuing 12 clusters. Additional details, including a breakdown according to amino acid constituents in each cluster, are presented in Figure S2 of Additional file 1.
On the other hand, a comparison with (8) suggests that the initiator IV exhibits a somewhat small variation in the values of the torsion angles, for a kink. This can be seen in Fig. 6. The torsion angle values suggest that the initiator IV resembles more a bent α-helix than a kink. In Fig. 10b, c we show the spectrum of the bond and torsion angles of the initiator IV, both before and after we have implemented the \(\mathbb Z_{2}\) gauge transformation. Since this bent structure determines an isolated cluster according to our 0.2 Å cutoff criterion, it is included among our loop fragments.
Table 2. The coverage of the 12 clusters obtained using the initiators in Table 1, as a function of the cutoff distance
Cutoff (Å)  0.2  0.3  0.4  0.5 

Coverage (%)  37.8  43.6  49.6  56.4 
Cluster elongation and completion
There is also an overlap with each of the 12 clusters that we obtained previously. Together the 13 clusters cover around 96.1 % of all PDB loop structures.
It is apparent that an initiator with only five residues is too short to identify a clustering pattern of PDB loops, even with 0.2 Å precision. Consequently we have elongated this initiator. For this, we have systematically added residues at the beginning and at the end of the individual elements in its cluster, to search for clustering patterns. For example, we may take the element 1p1x_A (80–84), elongate it to 1p1x_A (80–85) and 1p1x_A (79–84), and then use these two elongated fragments as initiators for the clustering. We denote by H, S and L a residue which is located in a helix, strand and loop respectively, according to the PDB classification. The five residue long cluster which is generated by 1p1x_A (80–84) consists of several different elements, for example LLLLL, HLLLL, LLLLS etc.
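The elongation step and the H/S/L labelling of a window can be sketched as below. Both function names and the `secondary` mapping are hypothetical; the sketch does not check that the elongated window stays inside the chain, which a real implementation would need to do.

```python
def elongate(start, end):
    """Return the two windows obtained by adding one residue at either end of [start, end]."""
    return [(start, end + 1), (start - 1, end)]

def window_pattern(secondary, start, end):
    """Secondary-structure string (H, S or L per residue) for sites start..end inclusive.

    `secondary` maps a residue number to 'H', 'S' or 'L'; this mapping is hypothetical.
    """
    return "".join(secondary[i] for i in range(start, end + 1))

# The five-residue window (80-84) becomes the two six-residue windows (80-85) and (79-84).
windows = elongate(80, 84)
```

Each elongated window can then be used as an initiator for the six-residue clustering.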
Table 3. The 30 clusters with six residues, obtained by elongation of the LLLLL subset of the cluster which is generated by 1p1x_A (80–84)
Cluster #  Initiator  Match #  Cluster #  Initiator  Match # 

1  1kwf_A (324–329)  32  16  1xg0_A (15–20)  96 
2  1byi_A (123–128)  34  17  2pve_A (23–28)  98 
3  4iau_A (78–83)  34  18  1vyr_A (23–28)  114 
4  2o9s_A (841–846)  37  19  1j0p_A (54–59)  135 
5  4ayo_A (233–238)  37  20  2rh2_A (48–53)  151 
6  1pwm_A (171–176)  38  21  3p8j_A (240–245)  200 
7  1gdq_A (123–128)  39  22  4gda_B (62–67)  240 
8  2wur_A (30–35)  40  23  7a3h_A (232–237)  309 
9  3zsj_A (190–195)  41  24  1n55_A (31–36)  368 
10  4kxu_A (257–262)  42  25  1f94_A (40–45)  507 
11  1n4u_A (121–126)  43  26  2pfh_A (305–310)  628 
12  1nls_A (155–160)  49  27  1ab1_A (41–46)  723 
13  3dk9_A (356–361)  51  28  1gci_A (188–193)  777 
14  1o7j_C (119–124)  52  29  3ne0_A (1094–1099)  1505 
15  4hen_A (169–174)  95  30  3hyd_A (1–6)  2275 
By completing the elongation process we have identified 3240 different clusters with 0.2 Å cutoff. These clusters cover around 85 % of all those PDB loop sites in our set of proteins with resolution better than 1.0 Å. Among these clusters there are 1677 unique ones, in the sense that the cluster has only a single element. Thus, around 14 % of all loop structures in PDB appear to be unique to the given protein. In addition, there are 1531 rare clusters with two or more, but less than 32 elements. Thus, there are 32 clusters with 32 or more elements.
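The split into unique, rare and large clusters is a simple classification by cluster size. The sketch below uses the threshold of 32 elements from the text; the function name `size_classes` and the class labels are illustrative choices.

```python
from collections import Counter

def size_classes(cluster_sizes, large_threshold=32):
    """Count unique (1 element), rare (2..threshold-1) and large (>= threshold) clusters."""
    counts = Counter()
    for size in cluster_sizes:
        if size == 1:
            counts["unique"] += 1
        elif size < large_threshold:
            counts["rare"] += 1
        else:
            counts["large"] += 1
    return counts

# Consistency check of the totals quoted in the text.
assert 1677 + 1531 + 32 == 3240
```

The three counts quoted in the text add up to the total number of clusters, as the final line confirms.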
The remaining ∼15 % of loop fragments that are not covered by the 3240 clusters can be constructed by completion. For example, we can search for novel clusters by using the patterns other than LLLLL in the five residue cluster generated by 1p1x_A (80–84). But when the four patterns HLLLL, LLLLH, SLLLL and LLLLS are included, the coverage increases by no more than around one per cent.
Cluster percolation
We have also investigated the relation between the sequence and the structure, using the 42 clusters listed in Tables 1 and 3. Here we only describe some of the major features; more details can be found in Figure S3 in Additional file 1.
There are several examples of identical sequences that correspond to different structures in different proteins. Accordingly a sequence clearly does not determine a unique structure. When a given sequence gives rise to multiple structures, we have a phenomenon we call cluster percolation. These sequences with multiplet structures may be utilised to try and introduce novel clusters.
Table 4. Sequences that appear both in and outside of cluster VIII; only the entry outside of the cluster is identified. The RMSD is evaluated from the initiator of cluster VIII; H stands for helix, L for loop and S for strand
Sequence  PDB entry  PDB structure  RMSD(Å) 

TDGSTD  2vb1_A (47–52)  LLLLSS  0.24 
TDGSTD  3lzt_A (47–52)  LLLLSS  0.26 
TDGSTD  4lzt_A (47–52)  LLLLSS  0.27 
DAGMRF  3odv_A (20–25)  HHLLSS  0.71 
ESGNVV  2agt_A (126–131)  LLLLLL  0.63 
ESGNVV  2pzn_A (126–131)  LLLLLL  0.72 
ESGNVV  3u2c_A (126–131)  LLLLLL  0.54 
ADGKPV  4hen_A (54–59)  SLLSSS  1.43 
ESGLSK  1g2y_B (18–23)  HHHLHH  1.19 
NVGWPR  1mn8_B (47–52)  HLLLLL  0.79 
KDGVAD  4a7u_A (91–96)  LLLLSS  0.68 
SDGNGM  1iee_A (100–105)  HLLHHH  1.12 
SDGNGM  2vb1_A (100–105)  HHLLHH  0.38 
SDGNGM  4b4e_A (100–105)  HLLHHH  1.07 
SDGNGM  4lzt_A (100–105)  HLLLLH  0.33 
QQGLTL  3akq_A (161–166)  HHLLLL  0.62 
QQGLTL  3akt_A (161–166)  HHLLLL  0.66 
QQGLTL  3akt_B (161–166)  HHLLLL  0.59 
Figure 18b shows the comparison of the sequence ADGKPV to the initiator. The difference between the structures of 4hen_A (54–59) and the initiator is again clear. The structure of 4hen_A (54–59) is also quite different from the structures in Fig. 18a, and from the Cluster VIII shown in Fig. 12.
Example: Myoglobin
Myoglobin is a widely studied protein, thus we have analysed its loop structure from the present perspective. We have chosen the crystallographic oxymyoglobin structure 1A6M [50] which is one of the few myoglobin structures that have been measured with resolution better than 1.0 Å, for our comparative study.
Table 5. RMS distance between the four kinks in 1A6M and the corresponding segments in the three other ligation states (in Ångströms)
Segment  1A6N  1A6K  1A6G 

41–46  0.07  0.04  0.17 
48–53  0.04  0.02  0.03 
77–82  0.04  0.05  0.07 
78–83  0.06  0.05  0.07 
We conclude that the four kinks are stable, in the sense that they do not change their conformation when the ligation state changes.
Chain inversion
We note that a regular secondary structure such as an α-helix becomes mapped onto itself, i.e. remains invariant under chain inversion. But we have found that the 12 clusters that we have constructed are not inversion invariant; the inversion does not map a cluster onto itself. Thus one might expect that new clusters could be found by inversion of these clusters. However, surprisingly we have found only a single example of a PDB segment by inversion. This is the segment (1115–1120) in the PDB structure 1MC2. Thus local chain inversion is apparently a broken symmetry, in the case of protein loops. This sets the loops apart from the regular structures like α-helices and β-strands.
Discussion
We have introduced the concept of loop clustering to analyse those ultrahigh resolution crystallographic protein structures in PDB that have been measured with resolution 1.0 Å or better. We have chosen these structures since we expect that in the case of an ultrahigh resolution measurement there should be less need to introduce structure validation. Thus there should also be less bias towards a priori chemical knowledge and stereochemical paradigms in this subset of PDB proteins. Moreover, our investigation of the 2.0 Å subset shows that high resolution is necessary to reveal the clustering structure in the case of protein crystals.
We have inquired to what extent the protein structures can be constructed in a modular fashion. For the modular building blocks we have chosen different parameterisations of the unique kink solution to a generalised discrete nonlinear Schrödinger equation. The precision we have used as a criterion in making a difference between two structures is 0.2 Å in RMSD. We have concluded that this should be the shortest meaningful RMS distance that can be introduced, at the moment, to classify different modular protein components.
We have identified a set of 12 different kink parameterisations, which cover around 38 % of all PDB loop structures. Accordingly, these are loop patterns that are abundantly present in the folded proteins. It appears to us that these kinks are often located in such protein segments that are structurally important, as opposed to those that are functionally important. We have introduced various techniques to extend the initial set of 12 kinks, and we have found that around 52 % of loop regions become covered when we introduce a set of 29 additional kinks. But in order to cover the remaining ∼48 % of protein loops, we need to substantially increase the number of kinks. For example, we need to introduce over 1000 kinks to cover over 88 % of loops. In particular, we have concluded that there are several kinks that are very rare, even unique, in PDB when we use the present cutoff value. We propose that a rare or even unique kink should have an important functional role in a protein. This can be exemplified by the myoglobin 1A6M segments (41–46), (48–53) and (78–83) which are all rare. These segments also constitute the CD corner and EF corner in myoglobin, which have been argued to be closely related to the ligand migration process [51, 52].
Conclusions
Protein loops are built in a modular fashion, in terms of various parameterisations of the kink solution to a generalised version of the discrete nonlinear Schrödinger equation. Most loops can be built from a very small number of modular components; these loops are most likely important for the overall structure of the protein. However, there are also several unique, or very rare loops, which are most likely related to the function. The amino acid sequence does not define the structure uniquely; instead a given sequence can give rise to several different conformations.
Availability of supporting data
The datasets supporting the results of this article are available in the Protein Data Bank (PDB), by restricting to structures with resolution better than 1.0 Å (http://www.rcsb.org).
Abbreviations
DNLS: Discrete nonlinear Schrödinger
PDB: Protein Data Bank
RMS: Root-mean-square
CASP: Critical Assessment for Structural Prediction
Declarations
Acknowledgements
AJN acknowledges support from Vetenskapsrådet, Carl Trygger’s Stiftelse för vetenskaplig forskning, and Qian Ren Grant at BIT.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
1. Sillitoe I, Cuff A, Dessailly B, Dawson N, Furnham N, Lee D, et al. New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res. 2013; 41(Database issue):D490.
2. Sillitoe I, Lewis TE, Cuff A, Das S, Ashford P, Dawson NL, et al. CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res. 2015; 43(D1):D376–81.
3. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995; 247:536–40.
4. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008; 36(suppl 1):D419–25.
5. Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG. SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res. 2014; 42(D1):D310–4.
6. Rackovsky S. Quantitative organization of the known protein X-ray structures. I. Methods and short-length-scale results. Proteins. 1990; 7:378–402.
7. Skolnick J, Arakaki AK, Seung YL, Brylinski M. The continuity of protein structure space is an intrinsic property of proteins. Proc Natl Acad Sci USA. 2009; 106:15690–5.
8. Schwede T, Kopp J, Guex N, Peitsch MC. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 2003; 31(13):3381–5.
9. Chivian D, Baker D. Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res. 2006; 34(17):e112.
10. Song Y, DiMaio F, Wang RYR, Kim D, Miles C, Brunette T, et al. High-resolution comparative modeling with RosettaCM. Structure. 2013; 21(10):1735–42.
11. Zhang Y. Protein structure prediction: when is it useful? Curr Opin Struc Biol. 2009; 19(2):145–55.
12. Roy A, Kucukural A, Zhang Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010; 5(4):725–38.
13. Moult J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struc Biol. 2005; 15(3):285–9.
14. Olson MA, Feig M, Brooks CL. Prediction of protein loop conformations using multiscale modeling methods with physical energy scoring functions. J Comput Chem. 2008; 29(5):820–31.
15. Jamroz M, Kolinski A. Modeling of loops in proteins: a multi-method approach. BMC Struct Biol. 2010; 10(1):5.
16. Fidelis K, Stern PS, Bacon D, Moult J. Comparison of systematic search and database methods for constructing segments of protein structure. Protein Eng. 1994; 7(8):953–60.
17. van Vlijmen HW, Karplus M. PDB-based protein loop prediction: parameters for selection and methods for optimization. J Mol Biol. 1997; 267(4):975–1001.
18. Nekouzadeh A, Rudy Y. Three-residue loop closure in proteins: A new kinematic method reveals a locus of connected loop conformations. J Comput Chem. 2011; 32(12):2515–25.
19. Fiser A, Do RKG, Šali A. Modeling of loops in protein structures. Protein Sci. 2000; 9(9):1753–73.
20. Jacobson MP, Pincus DL, Rapp CS, Day TJ, Honig B, Shaw DE, et al. A hierarchical approach to all-atom protein loop prediction. Proteins. 2004; 55(2):351–67.
21. Eswar N, Eramian D, Webb B, Shen MY, Sali A. Protein structure modeling with MODELLER. In: Structural Proteomics. New York: Springer; 2008, pp. 145–159.
22. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The protein data bank. Nucleic Acids Res. 2000; 28:235–42.
23. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983; 22(12):2577–637.
24. Niemi AJ. Phases of bosonic strings and two dimensional gauge theories. Phys Rev D. 2003; 67:106004.
25. Danielsson UH, Lundgren M, Niemi AJ. Gauge field theory of chirally folded homopolymers with applications to folded proteins. Phys Rev E. 2010; 82:021910.
26. Hu S, Jiang Y, Niemi AJ. Energy functions for stringlike continuous curves, discrete chains, and space-filling one dimensional structures. Phys Rev D. 2013; 87:105011.
27. Ioannidou T, Jiang Y, Niemi AJ. Spinors, strings, integrable models, and decomposed Yang-Mills theory. Phys Rev D. 2014; 90(2):025012.
28. Niemi AJ. Gauge fields, strings, solitons, anomalies, and the speed of life. Theor Math Phys. 2014; 181(1):1235–62.
29. Niemi AJ. WHAT IS LIFE - Subcellular Physics of Live Matter. 2014. arXiv preprint arXiv:1412.8321.
30. Widom B. Surface Tension and Molecular Correlations near the Critical Point. J Chem Phys. 1965; 43:3892–7.
31. Kadanoff LP. Scaling laws for Ising models near T(c). Physics. 1966; 2:263–72.
32. Wilson KG. Renormalization Group and Critical Phenomena. I. Renormalization Group and the Kadanoff Scaling Picture. Phys Rev B. 1971; 4:3174–83.
33. Wilson KG, Kogut J. The renormalization group and the ε expansion. Phys Rep. 1974; 12(2):75–199.
34. Fisher ME. The renormalization group in the theory of critical behavior. Rev Mod Phys. 1974; 46:597–616.
35. De Gennes PG. Scaling concepts in polymer physics. New York: Cornell University Press; 1979.
36. Schafer L. Excluded volume effects in polymer solutions, as Explained by the Renormalization Group. Berlin: Springer; 1999.
37. Chernodub M, Hu S, Niemi AJ. Topological solitons and folded proteins. Phys Rev E. 2010; 82(1):011916.
38. Molkenthin N, Hu S, Niemi AJ. Discrete Nonlinear Schrödinger Equation and Polygonal Solitons with Applications to Collapsed Proteins. Phys Rev Lett. 2011; 106:078102.
39. Faddeev LD, Takhtadzhyan LA. Hamiltonian Methods in the Theory of Solitons. Berlin: Springer; 1987.
40. Ablowitz MJ, Prinari B, Trubatch AD. Discrete and continuous nonlinear Schrödinger systems. Vol. 302. London: Cambridge University Press; 2004.
41. Krokhotin A, Niemi AJ, Peng X. Soliton concepts and protein structure. Phys Rev E. 2012; 85(3):031906.
42. Hu S, Lundgren M, Niemi AJ. Discrete Frenet frame, inflection point solitons, and curve visualization with applications to folded proteins. Phys Rev E. 2011; 83:061908.
 Lundgren M, Niemi AJ, Sha F. Protein loops, solitons, and sidechain visualization with applications to the lefthanded helix region. Phys Rev E. 2012; 85:061909.View ArticleGoogle Scholar
 Lundgren M, Niemi AJ. Correlation between protein secondary structure, backbone bond angles, and sidechain orientations. Phys Rev E. 2012; 86(2):021904.View ArticleGoogle Scholar
 Peng X, Chenani A, Hu S, Zhou Y, Niemi AJ. A three dimensional visualisation approach to protein heavyatom structure reconstruction. BMC Struct Biol. 2014; 14(1):27.PubMed CentralView ArticlePubMedGoogle Scholar
 Hinsen K, Hu S, Kneller GR, Niemi AJ. A comparison of reduced coordinate sets for describing protein structure. J Chem Phys. 2013; 139:124115.View ArticlePubMedGoogle Scholar
 Lundgren M, Krokhotin A, Niemi AJ. Topology and structural selforganization in folded proteins. Phys Rev E. 2013; 88(4):042709.View ArticleGoogle Scholar
 Hu S, Krokhotin A, Niemi AJ, Peng X. Towards quantitative classification of folded proteins in terms of elementary functions. Phys Rev E. 2011; 83(4):041907.View ArticleGoogle Scholar
 Petsko GA, Ringe D. Fluctuations in protein structure from Xray diffraction. Ann Rev Biophys Bioeng. 1984; 13:331–71.View ArticleGoogle Scholar
 Vojtěchovskỳ J, Chu K, Berendzen J, Sweet RM, Schlichting I. Crystal structures of myoglobinligand complexes at nearatomic resolution. Biophys J. 1999; 77(4):2153–74.PubMed CentralView ArticlePubMedGoogle Scholar
 Lucas MF, Guallar V. An atomistic view on human hemoglobin carbon monoxide migration processes. Biophys J. 2012; 102(4):887–96.PubMed CentralView ArticlePubMedGoogle Scholar
 Cottone G, Lattanzi G, Ciccotti G, Elber R. Multiphoton Absorption of Myoglobin–Nitric Oxide Complex: Relaxation by DNEMD of a Stationary State. J Phys Chem B. 2012; 116(10):3397–410.PubMed CentralView ArticlePubMedGoogle Scholar