A comprehensive analysis of non-sequential alignments between all protein structures
© Abyzov and Ilyin; licensee BioMed Central Ltd. 2007
Received: 14 May 2007
Accepted: 16 November 2007
Published: 16 November 2007
The majority of relations between proteins can be represented as a conventional sequential alignment. Nevertheless, unusual non-sequential alignments with different connectivity of the aligned fragments in compared proteins have been reported by many researchers. It is interesting to understand these non-sequential alignments: are they unique, sporadic cases or do they occur frequently; do they belong to a few specific folds or are they spread among many different folds as a common feature of protein structure? We present here a comprehensive large-scale study of non-sequential alignments between available protein structures in the Protein Data Bank.
The study has been conducted on a non-redundant set of 8,865 protein structures aligned with the aid of the TOPOFIT method. It has been estimated that between 17.4% and 35.2% of all alignments are non-sequential, depending on variations in the parameters. Analysis of the data revealed that non-sequential relations between proteins do occur systematically and in large quantities. Various sizes and numbers of non-sequential fragments have been observed, with all possible complexities of fragment rearrangements found for alignments consisting of up to 12 fragments. It has been found that non-sequential alignments are not limited to proteins of any particular fold and are present in more than two hundred of them. Moreover, many of them are found between proteins with different fold assignments. It has been shown that protein structure symmetry does not explain non-sequential alignments. Therefore, compelling evidence has been provided that non-sequential alignments between proteins are systematic and widespread across the protein universe.
The phenomenon of the widespread occurrence of non-sequential alignments between proteins might represent a missing rule of protein structure organization. More detailed study of this phenomenon will enhance our understanding of protein stability, folding, and evolution.
Protein structure comparison is a fundamental approach in many areas of biomedical studies. Its applications range from protein classification and establishing evolutionary relationships between proteins to functional prediction, molecular modeling and protein engineering. While structure comparison can be done in a number of ways, protein structure alignment is one of the major techniques used, populated today with more than 40 methods, the most complete list of which can be found at Wikipedia. These methods rely on a wide variety of statistical, geometrical, physical, and other structure properties in order to produce an alignment. But most of them follow a simple sequential rule: two proteins are aligned in sequential order, by placing their chains adjacent to each other from N-terminal to C-terminal and introducing gaps.
Understanding more about these types of alignments is interesting: are they unique, sporadic cases; do they occur frequently; do they belong to a few specific folds or are they spread among many different folds as a common feature of protein structure? Such a large-scale study is also important for the theoretical understanding of protein organization and the evolution of proteins, and the non-sequential approach has a practical application as a design tool in protein engineering.
Many researchers have reported cases of non-sequential alignments such as circular permutations, domain or region swaps [3–15], and β-hairpin flips [6, 10]. The most studied case of non-sequential alignment is a circular permutation, in which the N-terminal of each aligned protein is aligned with the C-terminal of the other protein. Circular permutations have been analyzed by both sequence and structure related computational methods [16, 17]. A suggested evolutionary mechanism for circular permutation in proteins states that first a gene duplication of the precursor gene occurs in such a way that both genes become fused in frame, leading to a tandem protein. After generation(s) of a new start codon within the 5' part of the tandem gene and a stop codon at an equivalent position in the 3' part of the gene, a protein is encoded that represents a circular permutation of the precursor gene product. Later the mechanism was shown to be valid for a protein family of adenine-N6 DNA methyltransferases. Many naturally occurring proteins were experimentally redesigned to have a circular permutation, and it was shown that they preserve their structure and function [20–30], thus providing evidence that circular reordering of protein structural elements does not affect protein folding and functionality.
The appearance of similar domains/regions in different orders in sequence, as a domain/region swap, has been analyzed by Fliess and coworkers. Their study was based on sequence alignments of proteins in the Swiss-Prot database, where they found 140 swap cases and concluded that the swapping of regions is a relatively rare evolutionary event. A structure-based analysis of non-sequential cases that was large-scale for its time was reported about a decade ago, in which 426 representative structures from PDB were analyzed by the SARF2 method. Along with other results, that work presented several cases of non-sequential alignments and estimated that they are found in 11% of cases.
Since then, several methods for protein structure alignment have been developed which can produce non-sequential alignments [15, 33–38], including TOPOFIT, developed in our group. The MASS method was developed to produce multiple structure alignments; GANGSTA and SCALI were suggested for structure classification; SSM and KENOBI appear to be computationally efficient; and OPAAS was applied to the analysis of alternative structure alignments. TOPOFIT compares topologies of Delaunay tessellation patterns calculated using positions of Cα-atoms in protein structures and does not assume any sequential order of residues in an alignment. Its distinctive feature is that the method does not balance between a lower RMSD and a higher number of aligned positions (Ne) but rather identifies the largest group of residues which have the same neighbors in the same locations in both compared structures, defined mathematically as a topological invariant and detected by a saturation point (topomax point) in the spatial tessellation graph. Such an objective methodology provides unambiguous identification and separation of the structurally invariant parts from the variable parts by identifying a precise border between the two. Unlike all other methods that can produce non-sequential alignments, which compose alignments from fragments or secondary structure elements, TOPOFIT extends an alignment residue pair by residue pair and thus is not biased by fragment choice or secondary structure element definition. The method is also computationally efficient, so that alignments for all proteins in the PDB (as of July 2005) have already been calculated, grouped into clusters and stored in the TOPOFIT-DB database. We have used TOPOFIT in our comprehensive large-scale analysis of non-sequential relations between proteins. To the best of our knowledge this is the first comprehensive large-scale analysis of non-sequential alignments between all available protein structures.
Non-sequential alignments between proteins do occur systematically and in large quantities
Another dataset has been collected by compiling alignments between protein families as defined by SCOP (release 1.69). For each family, the first structure in the list of proteins for the corresponding family has been used as a representative, resulting in 2,845 representatives. By comparing the representatives, 4,045,590 structural alignments have been produced and stored in the TOPOFIT-DB database. As for dataset D1, only alignments with Z-score > 7 have been used, resulting in a total of 4,648 alignments. The distributions of their alignment sizes and RMSD are similar to the ones for dataset D1. These alignments will be referred to below as dataset D2.
Table 1. General statistics on non-sequential cases.
Best alignment (dataset D1): 28,949 (35.2%)
Alternative (dataset D1): 17,428 (21.2%)
Alternative with tightened (dataset D1)
Alternative with tightened (dataset D2)
Types of observed non-sequential alignments
General statistics on all different alignment types are shown in Table 1 and described in the following sections.
Non-sequential alignments can be trivial if they occur as a result of symmetry or shift in protein structure, but such cases are easily detected: in this case an alternative sequential alignment should exist. It is known that proteins with symmetries and repeats have many alternative alignments; thus, for each protein pair we have evaluated all possible alternative alignments with similar length (ΔNe < 20). Once an alternative sequential alignment had been found, the protein pair was considered to be sequential. Only those non-sequential alignments without any alternative sequential alignments have been considered as true non-sequential cases and are included in the following analysis.
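The filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Alignment` record, its field names, and the reading of ΔNe as the absolute difference in the number of aligned positions are hypothetical and are not taken from TOPOFIT.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one alignment of a protein pair."""
    n_e: int             # number of aligned positions (Ne)
    is_sequential: bool  # True if the aligned parts keep sequence order


def is_true_non_sequential(best: Alignment,
                           alternatives: Iterable[Alignment],
                           max_delta_ne: int = 20) -> bool:
    """Return True only if no sequential alternative of similar length exists.

    Assumption: "similar length" means |Ne(best) - Ne(alt)| < max_delta_ne.
    """
    if best.is_sequential:
        return False
    for alt in alternatives:
        if alt.is_sequential and abs(best.n_e - alt.n_e) < max_delta_ne:
            # A comparable sequential alignment exists: treat the pair as sequential.
            return False
    return True
```

With this sketch, a non-sequential best alignment of 100 positions is kept as a true non-sequential case if its only sequential alternative has 70 positions, and is discarded if a sequential alternative has 85 positions.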
General classification of non-sequential alignments
We have classified non-sequential alignments between proteins into three classes based on the types of alignment fragments in the alignment: forward (all fragments are of forward type), reverse (all fragments are of reverse type), and mixed (different fragment types). Furthermore, each class has been subdivided into subclasses based on the pattern of fragment permutation: simple (the order of fragments is not permuted), circular (cases fitting the definition of circular permutation), swaps (two fragments are swapped but the alignment is not a circular permutation), and complex (all other cases). Statistics on the number of non-sequential cases using different thresholds (see Methods) and considering alternative alignments have been summarized in Table 1.
As seen from Table 1, the majority of non-sequential alignments (13.2–22.7%) are of the forward class; the number of mixed alignments is smaller but still considerable (3.9–10.7%), while reverse alignments are much less populated (0.3–1.8%), with only several hundred such cases found. Forward circular alignments form the most populated subclass, with more than 50% of all non-sequential alignments belonging to it.
There is a clear tendency for more complicated alignments to be less prevalent in the forward and reverse classes, i.e. there are fewer complex than swap alignments, and fewer swap than circular alignments. Contrary to this tendency, more complicated alignments in the mixed class are more abundant, i.e. there are more complex than swap alignments, and more swap than circular alignments. Interestingly, the number of simple alignments in this class is of the same order as the number of complex ones, i.e. there is a tendency that if an alignment has two types of fragments (reverse and forward) then it is either very simple (has no permutations) or very complex (has too many permutations). Table 1 also demonstrates that variation in parameters (using different thresholds and considering alternative alignments) does change the proportion of non-sequential alignments; nevertheless, the proportion remains significant, of the order of 20%. Table 1 also shows that the usage of different data sets results in comparable numbers, thus cross-checking the obtained numbers.
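A minimal sketch of this two-level classification is given below. It is not the authors' implementation: the function names are hypothetical, the fragment order is assumed to be given as the numbers 1..n read in the order of the second protein, a circular permutation is assumed to be any non-trivial rotation of (1, ..., n), and a swap is assumed to be an exchange of exactly two fragments that is not such a rotation.

```python
def alignment_class(directions):
    """Class of an alignment from its fragment types ('forward' or 'reverse')."""
    kinds = set(directions)
    if not kinds or not kinds <= {"forward", "reverse"}:
        raise ValueError("directions must be a non-empty list of 'forward'/'reverse'")
    if kinds == {"forward"}:
        return "forward"
    if kinds == {"reverse"}:
        return "reverse"
    return "mixed"


def permutation_subclass(order):
    """Subclass from the fragment order, e.g. [1, 3, 2, 4]."""
    order = list(order)
    n = len(order)
    identity = list(range(1, n + 1))
    if order == identity:
        return "simple"    # order of fragments is not permuted
    if any(order == identity[k:] + identity[:k] for k in range(1, n)):
        return "circular"  # assumed definition: a rotation of the identity
    mismatches = [i for i in range(n) if order[i] != identity[i]]
    if len(mismatches) == 2:
        return "swap"      # exactly two fragments exchanged
    return "complex"       # all other cases
```

Note that under these assumptions a forward alignment with the "simple" pattern is just a sequential alignment, so the simple subclass is only informative for the reverse and mixed classes.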
NS alignments occur across many folds, as well as between different folds
Distribution of non-sequential alignments by protein classes based on analysis of dataset D2. Recoverable table labels: α and β; symmetry and/or shift related.
Distribution of non-sequential (NS) alignments among different protein folds as defined by SCOP.
Columns: % of all NS alignments (dataset D1); % of all alignments in fold (dataset D1); % of all NS alignments (dataset D2); % of all alignments in fold (dataset D2).
Rows (SCOP folds): c.1) TIM α/β-barrel; b.69) 7-bladed β-propeller; c.66) S-ALMD methyltransferase; b.68) 6-bladed β-propeller; a.102) α/α toroid; b.82) Double-stranded β-helix; b.29) Concanavalin A-like lectins/glucanases; b.80) Right-handed β-helix; d.159) Metallo-dependent phosphatases; f.4) Transmembrane β-barrels; a.24) 4-helical up-and-down bundle; h.4) Antiparallel coiled-coil; c.68) Nucleotide-diphospho-sugar transferases; b.67) 5-bladed β-propeller; c.2) NAD(P)-binding Rossmann-fold domains; c.3) FAD/NAD(P)-binding domain; Different folds or no fold assignment.
The table also shows that the numbers obtained using the two data sets agree, with cases of large discrepancy (e.g. the 'FAD/NAD(P)-binding domain' fold) being exceptional. The reason for these discrepancies is the outdated version of SCOP (dataset D2) when compared to TOPOFIT-DB (dataset D1), and ambiguity in assigning SCOP folds to TOPOFIT-DB's centroids, which are not split into domains and can represent multi-domain proteins. Thus, the discrepancies in numbers are explained by purely technical rather than biological or methodological reasons, and the results obtained using the two datasets are consistent.
Protein structure symmetry does not explain non-sequential alignments
To show that non-sequential cases are found not only in symmetrical structures, we have made an additional test. Knowing that 48.9% of non-sequential alignments are found when aligned structures belong to different folds (using dataset D2), we have excluded from the analysis all folds in which at least two proteins have a non-sequential alignment. Thus, all potentially symmetrical folds have been excluded, resulting in a new dataset (reduced dataset) where all non-sequential alignments occur only between proteins of different folds. Non-sequential cases were found in 7.7% of cases in the reduced dataset, which is smaller than the 21.2% for the whole data set but is still very significant. In other words, at least one third of non-sequential alignments are found in non-symmetrical structures.
The previously observed results can be briefly summarized: 1) Non-sequential alignments are found in many non-symmetrical folds; 2) Non-sequential alignments are spread more or less evenly across folds, i.e. there is no specific fold (or folds) preferable for non-sequential alignments; 3) Up to 50% of non-sequential alignments are found for proteins with different folds; 4) The proportion of non-sequential alignments for proteins with different folds is comparable with the proportions for proteins with the same fold; 5) At least one third of non-sequential alignments are found in non-symmetrical structures. Thus, the conclusion is that non-sequential alignments occur in any class and type of protein structure, and protein structure symmetry/shift does not explain non-sequential alignments. In other words, the occurrence of non-sequential alignments is a general feature of protein structure.
All possible complexities of fragment rearrangements have been observed
Non-sequential alignments can be so simple that only one fragment is non-sequential, or so complex that only one fragment can be put in sequential order in both sequences. In other words, we have observed both very simple and very complex rearrangements of structurally equivalent elements in proteins. In order to address rearrangement complexity we introduce the term "rank" of an alignment, which is the number of rearrangements of structurally equivalent parts of proteins needed to put them in sequential order in the sequences of both proteins. According to this definition, sequential alignments are represented as a single structural equivalent and thus have rank zero, while circular permutations and cases similar to the one shown in Figure 1 have rank one, and more complex alignments have rank two or higher. Technically, we have calculated rank as the number of segment rearrangements rather than fragment rearrangements (see Methods). This was done to ensure that rank is not overestimated due to the presence of several fragments in one segment. Using this definition, it is easy to see that any alignment with n fragments can have the highest rank of n - 1, because at least one structural element is not rearranged relative to the others (we do not consider reverse alignments here).
Analysis of the redundant data set
It is interesting to understand whether there are any non-sequential cases in highly similar proteins, both in structure and in sequence, i.e. those that have been grouped into clusters in TOPOFIT-DB. Thus, alignments between the structures within each of the 8,865 clusters have been collected, for a total of 2,509,599 alignments. The analysis reveals that, with few exceptions, the vast majority of detected non-sequential cases are circular permutations. Statistically, 31,358 out of 2,509,599 alignments were non-sequential, of which 95.5% (29,938 cases) were circular permutations, 3.5% represented alignments of different conformations of the same protein, and the remaining 1% were non-sequential alignments in only 7 protein families: fructose-1,6-bisphosphatase (1fpk:A and 1d9q:B), arrestin (1cf1:A and 1ayr:B), annexin (1hm6:A and 1hvg), aspartate/ornithine carbamoyltransferase (2atc:B and 1rac:B), 3-isopropylmalate dehydrogenase (1iso and 1hqs:A), NADH peroxidase (1f3p:A and 1nhs), and α-β tubulin (1jff:B and 1tub:B). Thus, we can state that for the vast majority of proteins with high sequence similarity, the only non-sequential alignments are circular permutations.
In the presented study a comprehensive large-scale analysis of non-sequential alignments between all PDB structures (as of July 2005) has been performed. We have found that up to 35.2% of all significant alignments are non-sequential. Different thresholds and alternative alignments have been considered to ensure robust detection of non-sequential cases. These variations in methodology revealed that non-sequential alignments are found in at least 17.4% of cases. Thus, the estimated proportion of non-sequential alignments is between 17.4% and 35.2%, which is a significant proportion of structural relations not detected by most of the current methods.
It was found that the majority (more than 50%) of the non-sequential alignments fit the formal definition of circular permutation. It is important to stress here how this number should be understood. Often, proteins aligned in a circular way are assumed to be evolutionarily related, and this assumption is often encoded into an alignment method to detect such cases. There is no such assumption (of evolutionary origin) in the methodology used in this study; thus, a large number of circular alignments alone does not necessarily mean an evolutionary relationship between the compared proteins. In the same way, the origin of more complex non-sequential alignments is not clear.
Besides circular permutations, non-sequential alignments with a large variety of alignment patterns have been found. All possible complexities of rearrangements, and various sizes and numbers of non-sequential fragments, have been observed. It has been found that non-sequential alignments are not limited to proteins of any particular fold and are present in more than two hundred different folds. Moreover, up to 50% of non-sequential alignments are found for proteins with a different fold assignment. While many of the non-sequential alignments were found for proteins with symmetrical structures, it has been shown that protein structure symmetry does not explain non-sequential alignments. Therefore, compelling evidence of different forms has been provided, confirming that non-sequential alignments between proteins are diverse and widespread across the protein universe.
Many cases of reverse alignments in various folds have been found in this study. To the best of our knowledge, only one case of reverse alignment is well known: the α-helix bundle with several helices, where one or many of the helices can be aligned in the opposite direction. The α-helix bundles have been studied experimentally, and successful attempts at redesigning the four-helix bundle to have inverted helices have been reported [45, 46]. Such successful redesign of the α-helix bundle can theoretically be extended to other protein folds with the cases of reverse alignments observed in this study. Thus, the existence of reverse alignments for proteins of other folds can serve as the basis for new approaches in protein engineering to redesign proteins.
The discovery of the existence of all theoretically possible complexities of fragment rearrangement in proteins is intriguing (see Results and Figure 11). The plot is not complete due to limited statistics, which we attribute to the lack of data for large proteins. We believe that there is strong support for the statement that any possible combination of fragments can be found in any protein structure. Currently, one can introduce a hypothesis to test (with strong support from all the presented results), which can be formulated as follows: the three-dimensional shape of tertiary structure does not depend on the order of protein fragments in the polypeptide chain; the protein core just has to be organized in a complementary manner and internal fragments have to fit each other, while the external loops might reconnect the internal fragments in any reasonable way. The protein core here is the structural invariant, which was introduced earlier in our TOPOFIT method, while the external loops are the fragments outside of the structural invariant.
Such a hypothesis can be tested experimentally and will provide a strong empirical basis for protein redesign as a recombination of different fragments; one can see many practical applications of it for creating new proteins. The validation of the hypothesis will broaden our understanding of protein structure organization and folding, and can be directly applied in fragment-based methods for protein structure and function prediction. It is encouraging that the hypothesis is supported by experimental studies on circularly permuting protein structure [20–30] and redesigning four-helix bundle proteins to have several different topologies of helices [45, 46]. Therefore, a similar reengineering by rearranging fragments may be applied to other protein folds.
The discovery of the widespread occurrence of non-sequential alignments among many different protein folds presents an interesting phenomenon. Based on this phenomenon, one may suggest that there is some unknown common rule that governs the relations between proteins detected by non-sequential alignments, a missing rule (or rules) in our understanding of protein structure organization. Finding such a rule can be a challenge for future research, but, apparently, the existence of non-sequential alignments is not a rare effect but rather a systematic feature of all proteins. More detailed studies of these alignments will bring new insight into our understanding of protein evolution, protein stability, and protein folding and functionality. As a first step toward understanding non-sequential alignments, a testable hypothesis has been suggested, stating that the three-dimensional shape of protein structure does not depend on the order of protein fragments in the polypeptide chain.
Selecting representative data sets
For this study, the structural relations between the representative proteins (centroids) from the TOPOFIT-DB database have been analyzed. The data set from TOPOFIT-DB contains all 33,315 proteins from PDB (as of July 12, 2005). All structures in the database are divided into clusters of high similarity, both in structure and in size, with a centroid assigned to represent each cluster. The 8,865 protein clusters in TOPOFIT-DB can be considered as an analog of the structural families in CATH and SCOP. For each cluster, a centroid structure is chosen as a representative by the maximum sum of Z-scores to all other proteins in the cluster. Comparison of the centroids and proteins inside each cluster resulted in 39,276,862 structural alignments stored in the database. For this study, only centroid-centroid alignments from TOPOFIT-DB with Z-score > 7 have been used, leading to a total of 82,263 alignments.
A second data set has been collected by compiling alignments between protein families as defined by SCOP (release 1.69). For each family, the first structure in the list of proteins assigned to the family has been used as a representative, resulting in 2,845 representatives. By comparing the representatives, 4,045,590 structural alignments have been produced and stored in the TOPOFIT-DB database. For this study, only alignments with Z-score > 7 have been used, leading to a total of 4,648 alignments.
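The centroid rule (maximum sum of Z-scores to all other proteins in the cluster) can be illustrated with a short sketch. The data layout, the function name, and the example identifiers and scores are hypothetical; how TOPOFIT-DB actually stores Z-scores or breaks ties is not described in the text.

```python
def choose_centroid(z_scores):
    """Pick the cluster member with the largest summed Z-score to all others.

    z_scores: dict mapping each member id to a dict {other member id: Z-score}.
    Ties are resolved by dictionary order (an arbitrary choice for this sketch).
    """
    def total(member):
        return sum(z for other, z in z_scores[member].items() if other != member)

    return max(z_scores, key=total)


# Hypothetical three-member cluster with made-up pairwise Z-scores.
cluster = {
    "protA": {"protB": 10.0, "protC": 8.0},
    "protB": {"protA": 10.0, "protC": 12.0},
    "protC": {"protA": 8.0, "protB": 12.0},
}
centroid = choose_centroid(cluster)  # "protB": 22.0 beats 18.0 and 20.0
```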
Identifying sequential parts (segments) and noise filtering procedure
Since TOPOFIT alignments can be fragmented, we define an alignment fragment as a sequential part of an alignment without "long gaps", i.e. gaps longer than 2 residues. The cut-off has been chosen based on the analysis of the gap distribution in all alignments. We then define an alignment segment as a sequential (reverse or forward) part of a structural alignment (see Figure 1). An alignment segment is different from an alignment fragment in that the segment can have long gaps (longer than 2 residues) and, consequently, may consist of one or more fragments. Thus, a fragment is a particular case of a segment. In Figure 1 segments are highlighted in different colors. For simplicity, only the term "segment" is used in the following description of the procedure. During the procedure some alignment residue pairs were considered as noise and removed (circled in the figure). Let us define an interfering segment z, for a pair of segments x and y, as a segment located between the two segments in either of the sequences (see the example in Figure 1). The input parameter of the algorithm is the value of F min, which controls the minimal size of a segment, i.e. all segments smaller than F min are eventually removed from the alignment or combined with other segments.
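The fragment definition can be made concrete with a small sketch that splits an alignment, given as a list of aligned residue-index pairs, into fragments. Several details are assumptions made for illustration only: the pairs are taken to be one-to-one, a gap is counted separately in each sequence, and a change of direction (forward to reverse or back) always starts a new fragment.

```python
MAX_GAP = 2  # gaps longer than this many residues split a fragment


def split_into_fragments(pairs, max_gap=MAX_GAP):
    """Split aligned (i, j) residue pairs into fragments without long gaps."""
    fragments = []
    current = []
    direction = 0  # +1 forward, -1 reverse, 0 not yet decided
    for pair in sorted(pairs):  # order by position in the first protein
        if current:
            di = pair[0] - current[-1][0]
            dj = pair[1] - current[-1][1]
            step = 1 if dj > 0 else -1
            continues = (
                1 <= di <= max_gap + 1          # gap in protein 1 is short
                and 1 <= abs(dj) <= max_gap + 1  # gap in protein 2 is short
                and direction in (0, step)       # same direction as before
            )
            if continues:
                direction = step
                current.append(pair)
                continue
            fragments.append(current)
            current = []
            direction = 0
        current.append(pair)
    if current:
        fragments.append(current)
    return fragments
```

For example, the pairs (1,1), (2,2), (3,3), (5,5), (10,20), (11,21), (12,22) give two fragments: the one-residue gap between positions 3 and 5 is tolerated, while the jump to position 10 starts a new fragment.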
For each pair of segments the following values are evaluated:
1. the number of segments interfering with it (smaller is preferred);
2. the number of aligned residues in the interfering segments (smaller is preferred);
3. the cumulative number of residues in the tested pair of segments (larger is preferred).
The best pair is found by comparing those values, where each next value is used only if the preceding values are equal. Segments in the best pair are combined only if the pair has no interfering segments. Otherwise, the interfering segment having the minimal number of aligned residues is removed from the structural alignment. So, at each step, the number of segments decreases by one. Steps are repeated until all segments are combined into one or the segment to be removed has a length greater than or equal to the value of F min.
The procedure considers forward and reverse segments simultaneously; however, only segments of the same type (both either forward or reverse) are combined. Special care is taken with segments of length one; they are evaluated in pairs with both forward and reverse segments. Here it is important to stress that the minimal fragment parameter F min is not like a conventional threshold, because short fragments are not simply removed from the alignment but are first tested for the possibility of being combined with longer fragments and are removed only upon failure.
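The tie-breaking comparison ("each next value is used only if the preceding values are equal") is a lexicographic ordering, which can be sketched with tuple keys. Only the selection of the best pair is shown; the `PairScore` type, its field names, and the example numbers are hypothetical, and the computation of the three values from a real alignment is not reproduced here.

```python
from typing import NamedTuple


class PairScore(NamedTuple):
    """Hypothetical container for the three values computed for a segment pair."""
    n_interfering: int         # number of interfering segments (smaller is better)
    interfering_residues: int  # aligned residues in interfering segments (smaller is better)
    combined_residues: int     # residues in the two segments together (larger is better)


def sort_key(score):
    # Tuples compare element by element, so a later value matters only when
    # all earlier ones are equal. The last value is negated because larger
    # is preferred there.
    return (score.n_interfering, score.interfering_residues, -score.combined_residues)


def best_pair(scores):
    """Return the key of the best-scoring pair in a {pair: PairScore} mapping."""
    return min(scores, key=lambda pair: sort_key(scores[pair]))


# Made-up scores for three candidate pairs of segments x, y, z.
candidates = {
    ("x", "y"): PairScore(1, 5, 40),
    ("x", "z"): PairScore(0, 0, 25),
    ("y", "z"): PairScore(0, 0, 30),
}
chosen = best_pair(candidates)  # ("y", "z"): no interfering segments, larger pair
```

In the procedure described above, the chosen pair would then be combined if it has no interfering segments; otherwise the smallest interfering segment would be removed.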
Robustness of non-sequential alignment detection, signal/noise discrimination, optimal values of F min
The major change in the distribution occurs when F min changes from 2 to 3. Not only has the area under the distribution changed dramatically (i.e. the number of non-sequential cases is reduced), but the spike in the distribution at lower values has disappeared. Thus, it is evident that the noise is mostly represented by short fragments of length 1 and 2 residues. The distributions for F min values from 3 to 6 do not differ much, while larger F min values lead to significant disruptions in the shape of the distributions in the region from 75 to 110. Consequently, non-sequential alignments mostly consist of aligned segments of 6 or more aligned residues. Therefore, the best signal-to-noise discrimination can be achieved when the value of the F min parameter equals 3–6 residues. This is where the majority of the noise is filtered out while the signal (the quantity of non-sequential alignments) is not cut. In the overall analysis presented here, the value F min = 4 has been used, while additionally a tightened criterion, F min = 6, has been applied for cross-checking.
Applying the tightened criterion resulted in an 11% decrease (25,849 compared to 28,949) in the number of non-sequential cases detected. Thus, we concluded that at the selected values of the F min parameter, detection of non-sequential cases is robust.
The rank of an alignment is defined as the number of rearrangements of structurally equivalent parts of proteins needed to put them in sequential order in the sequences of both proteins. Technically, the rank was calculated as the number of segment permutations. In order to calculate the number of permutations in an alignment, the corresponding alignment segments have been ordered by sequence order in the first aligned protein and numbered incrementally starting from one. Then, the segments have been ordered by sequence order in the second aligned protein. If the considered alignment is non-sequential, this reordering permutes the assigned numbers. For example, the order of numbers for the alignment shown in Figure 1 will be (1,3,2,4). A simple bubble sort algorithm has been used to calculate the number of permutations needed to sort the numbers in ascending order. For the alignment shown in Figure 1 only one permutation is needed. For reverse alignments, a reverse order of amino acids for the second sequence has been considered while calculating permutations, and for mixed alignments, a reverse order of amino acids for the second sequence has been considered only if the cumulative Ne of reverse segments is higher than the cumulative Ne of forward segments.
The non-sequential alignments were visualized and analyzed in the integrated software package Friend with the integrated TOPOFIT method. The final views (shown in figures) of protein structures were produced with Chimera. Data analysis has been performed with the aid of the ROOT software package. All data are publicly available in TOPOFIT-DB and can be accessed at our web site.
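The numbering and bubble-sort steps can be sketched as follows for forward alignments. This is an illustrative reading, not the authors' code: segments are represented here by hypothetical (start in protein 1, start in protein 2) tuples, and because the text does not state whether one "permutation" is a single adjacent swap or a whole bubble-sort pass, the sketch reports both counts.

```python
def segment_order(segments):
    """Number segments by position in protein 1, then read them in protein 2 order.

    segments: list of distinct (start_in_protein_1, start_in_protein_2) tuples.
    """
    by_first = sorted(segments, key=lambda s: s[0])
    label = {seg: i + 1 for i, seg in enumerate(by_first)}
    by_second = sorted(segments, key=lambda s: s[1])
    return [label[seg] for seg in by_second]


def bubble_sort_counts(order):
    """Return (adjacent swaps, passes containing a swap) needed to sort `order`."""
    seq = list(order)
    swaps = 0
    passes = 0
    while True:
        swapped = False
        for i in range(len(seq) - 1):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                swaps += 1
                swapped = True
        if not swapped:
            break
        passes += 1
    return swaps, passes


# Four made-up segments whose second and third members are exchanged in protein 2.
example = [(1, 1), (20, 40), (40, 20), (60, 60)]
order = segment_order(example)      # [1, 3, 2, 4]
counts = bubble_sort_counts(order)  # (1, 1): one swap in one pass
```

Both counts equal one for the order (1,3,2,4), in agreement with the example in the text. They differ for more scrambled orders: a fully reversed order of four segments needs six adjacent swaps but three passes.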
We are grateful to Chesley Leslin for his outstanding help in collecting data for TOPOFIT-DB and for the maintenance of the database and reading the manuscript. We also thank the members of our laboratory and the Biology department at Northeastern University for useful discussions and comments.
- Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48(3):443–453. 10.1016/0022-2836(70)90057-4
- Einspahr H, Parks EH, Suguna K, Subramanian E, Suddath FL: The crystal structure of pea lectin at 3.0-A resolution. J Biol Chem 1986, 261(35):16518–16527.
- Alexandrov NN, Fischer D: Analysis of topological and nontopological structural similarities in the PDB: new examples with old structures. Proteins 1996, 25(3):354–365. 10.1002/(SICI)1097-0134(199607)25:3<354::AID-PROT7>3.3.CO;2-W
- Essen LO, Perisic O, Lynch DE, Katan M, Williams RL: A ternary metal binding site in the C2 domain of phosphoinositide-specific phospholipase C-delta1. Biochemistry 1997, 36(10):2753–2762. 10.1021/bi962466t
- Fuentes-Prior P, Noeske-Jungblut C, Donner P, Schleuning WD, Huber R, Bode W: Structure of the thrombin complex with triabin, a lipocalin-like exosite-binding inhibitor derived from a triatomine bug. Proc Natl Acad Sci USA 1997, 94(22):11845–11850. 10.1073/pnas.94.22.11845
- Gong W, O'Gara M, Blumenthal RM, Cheng X: Structure of pvu II DNA-(cytosine N4) methyltransferase, an example of domain permutation and protein fold assignment. Nucleic Acids Res 1997, 25(14):2702–2715. 10.1093/nar/25.14.2702
- Polekhina G, Board PG, Gali RR, Rossjohn J, Parker MW: Molecular basis of glutathione synthetase deficiency and a rare gene permutation event. Embo J 1999, 18(12):3204–3213. 10.1093/emboj/18.12.3204
- Gooptu B, Hazes B, Chang WS, Dafforn TR, Carrell RW, Read RJ, Lomas DA: Inactive conformation of the serpin alpha(1)-antichymotrypsin indicates two-stage insertion of the reactive loop: implications for inhibitory function and conformational disease. Proc Natl Acad Sci USA 2000, 97(1):67–72. 10.1073/pnas.97.1.67
- Grishin NV, Osterman AL, Brooks HB, Phillips MA, Goldsmith EJ: X-ray structure of ornithine decarboxylase from Trypanosoma brucei: the native structure and the structure in complex with alpha-difluoromethylornithine. Biochemistry 1999, 38(46):15174–15184. 10.1021/bi9915115
- Grishin NV: Fold change in evolution of protein structures. J Struct Biol 2001, 134(2–3):167–185. 10.1006/jsbi.2001.4335
- Tsai LC, Shyur LF, Lee SH, Lin SS, Yuan HS: Crystal structure of a natural circularly permuted jellyroll protein: 1,3–1,4-beta-D-glucanase from Fibrobacter succinogenes. J Mol Biol 2003, 330(3):607–620. 10.1016/S0022-2836(03)00630-2
- Levdikov VM, Blagova EV, Brannigan JA, Cladiere L, Antson AA, Isupov MN, Seror SJ, Wilkinson AJ: The crystal structure of YloQ, a circularly permuted GTPase essential for Bacillus subtilis viability. J Mol Biol 2004, 340(4):767–782. 10.1016/j.jmb.2004.05.029
- Shin DH, Lou Y, Jancarik J, Yokota H, Kim R, Kim SH: Crystal structure of YjeQ from Thermotoga maritima contains a circularly permuted GTPase domain. Proc Natl Acad Sci USA 2004, 101(36):13198–13203. 10.1073/pnas.0405202101
- Yuan X, Bystroff C: Non-sequential structure-based alignments reveal topology-independent core packing arrangements in proteins. Bioinformatics 2005, 21(7):1010–1019. 10.1093/bioinformatics/bti128
- Uliel S, Fliess A, Unger R: Naturally occurring circular permutations in proteins. Protein Eng 2001, 14(8):533–542. 10.1093/protein/14.8.533
- Jung J, Lee B: Circularly permuted proteins in the protein structure database. Protein Sci 2001, 10(9):1881–1886.
- Ponting CP, Russell RB: Swaposins: circular permutations within genes encoding saposin homologues. Trends Biochem Sci 1995, 20(5):179–180. 10.1016/S0968-0004(00)89003-9
- Jeltsch A: Circular permutations in the molecular evolution of DNA methyltransferases. J Mol Evol 1999, 49(1):161–164. 10.1007/PL00006529
- Viguera AR, Blanco FJ, Serrano L: The order of secondary structure elements does not determine the structure of a protein but does affect its folding kinetics. J Mol Biol 1995, 247(4):670–681. 10.1006/jmbi.1994.0171
- Ay J, Gotz F, Borriss R, Heinemann U: Structure and function of the Bacillus hybrid enzyme GluXyn-1: native-like jellyroll fold preserved after insertion of autonomous globular domain. Proc Natl Acad Sci USA 1998, 95(12):6613–6618. 10.1073/pnas.95.12.6613
- Ay J, Hahn M, Decanniere K, Piotukh K, Borriss R, Heinemann U: Crystal structures and properties of de novo circularly permuted 1,3–1,4-beta-glucanases. Proteins 1998, 30(2):155–167. 10.1002/(SICI)1097-0134(19980201)30:2<155::AID-PROT5>3.0.CO;2-M
- Keitel T, Simon O, Borriss R, Heinemann U: Molecular and active-site structure of a Bacillus 1,3–1,4-beta-glucanase. Proc Natl Acad Sci USA 1993, 90(11):5287–5291. 10.1073/pnas.90.11.5287
- Pieper U, Hayakawa K, Li Z, Herzberg O: Circularly permuted beta-lactamase from Staphylococcus aureus PC1. Biochemistry 1997, 36(29):8767–8774. 10.1021/bi9705117
- Wright G, Basak AK, Wieligmann K, Mayr EM, Slingsby C: Circular permutation of betaB2-crystallin changes the hierarchy of domain assembly. Protein Sci 1998, 7(6):1280–1285.
- Tougard P, Bizebard T, Ritco-Vonsovici M, Minard P, Desmadril M: Structure of a circularly permuted phosphoglycerate kinase. Acta Crystallogr D Biol Crystallogr 2002, 58(Pt 12):2018–2023. 10.1107/S0907444902015548
- Barrientos LG, Louis JM, Ratner DM, Seeberger PH, Gronenborn AM: Solution structure of a circular-permuted variant of the potent HIV-inactivating protein cyanovirin-N: structural basis for protein stability and oligosaccharide interaction. J Mol Biol 2003, 325(1):211–223. 10.1016/S0022-2836(02)01205-6
- Chu V, Freitag S, Le Trong I, Stenkamp RE, Stayton PS: Thermodynamic and structural consequences of flexible loop deletion by circular permutation in the streptavidin-biotin system. Protein Sci 1998, 7(4):848–859.
- Horne WS, Yadav MK, Stout CD, Ghadiri MR: Heterocyclic peptide backbone modifications in an alpha-helical coiled coil. J Am Chem Soc 2004, 126(47):15366–15367. 10.1021/ja0450408
- Manjasetty BA, Hennecke J, Glockshuber R, Heinemann U: Structure of circularly permuted DsbA(Q100T99): preserved global fold and local structural adjustments. Acta Crystallogr D Biol Crystallogr 2004, 60(Pt 2):304–309. 10.1107/S0907444903028695
- Fliess A, Motro B, Unger R: Swaps in protein sequences. Proteins 2002, 48(2):377–387. 10.1002/prot.10156
- Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, et al.: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31(1):365–370. 10.1093/nar/gkg095
- Szustakowski JD, Weng Z: Protein structure alignment using a genetic algorithm. Proteins 2000, 38(4):428–440. 10.1002/(SICI)1097-0134(20000301)38:4<428::AID-PROT8>3.0.CO;2-N
- Dror O, Benyamini H, Nussinov R, Wolfson H: MASS: multiple structural alignment by secondary structures. Bioinformatics 2003, 19(Suppl 1):i95–104. 10.1093/bioinformatics/btg1012
- Krissinel E, Henrick K: Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr 2004, 60(Pt 12 Pt 1):2256–2268. 10.1107/S0907444904026460
- Kolbeck B, May P, Schmidt-Goenner T, Steinke T, Knapp EW: Connectivity independent protein-structure alignment: a hierarchical approach. BMC Bioinformatics 2006, 7: 510. 10.1186/1471-2105-7-510
- Shih ES, Hwang MJ: Alternative alignments from comparison of protein structures. Proteins 2004, 56(3):519–527. 10.1002/prot.20124
- Shih ES, Gan RC, Hwang MJ: OPAAS: a web server for optimal, permuted, and other alternative alignments of protein structures. Nucleic Acids Res 2006, 34(Web Server):W95–98. 10.1093/nar/gkl264
- Ilyin VA, Abyzov A, Leslin CM: Structural alignment of proteins by a novel TOPOFIT method, as a superimposition of common volumes at a topomax point. Protein Sci 2004, 13(7):1865–1874. 10.1110/ps.04672604
- Leslin CM, Abyzov A, Ilyin VA: TOPOFIT-DB, a database of protein structural alignments based on the TOPOFIT method. Nucleic Acids Res 2007, (35 Database):D317–321. [http://mozart.bio.neu.edu/topofit] 10.1093/nar/gkl809
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247(4):536–540. 10.1006/jmbi.1995.0159
- Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 1998, 11(9):739–747. 10.1093/protein/11.9.739
- Nagano N, Orengo CA, Thornton JM: One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J Mol Biol 2002, 321(5):741–765. 10.1016/S0022-2836(02)00649-6
- Corbett KD, Shultzaberger RK, Berger JM: The C-terminal domain of DNA gyrase A adopts a DNA-bending beta-pinwheel fold. Proc Natl Acad Sci USA 2004, 101(19):7293–7298. 10.1073/pnas.0401595101
- Kresse HP, Czubayko M, Nyakatura G, Vriend G, Sander C, Bloecker H: Four-helix bundle topology re-engineered: monomeric Rop protein variants with different loop arrangements. Protein Eng 2001, 14(11):897–901. 10.1093/protein/14.11.897
- Micklatcher C, Chmielewski J: Helical peptide and protein design. Curr Opin Chem Biol 1999, 3(6):724–729. 10.1016/S1367-5931(99)00031-9
- Kolodny R, Petrey D, Honig B: Protein structure comparison: implications for the nature of 'fold space', and structure and function prediction. Curr Opin Struct Biol 2006, 16(3):393–398. 10.1016/j.sbi.2006.04.007
- Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, et al.: The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res 2005, (33 Database):D247–251.
- Abyzov A, Errami M, Leslin CM, Ilyin VA: Friend, an integrated analytical front-end application for bioinformatics. Bioinformatics 2005, 21(18):3677–3678. 10.1093/bioinformatics/bti602
- Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE: UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 2004, 25(13):1605–1612. 10.1002/jcc.20084
- ROOT software [http://root.cern.ch]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.