CUSP: an algorithm to distinguish structurally conserved and unconserved regions in protein domain alignments and its application in the study of large length variations
© Sandhya et al; licensee BioMed Central Ltd. 2008
Received: 15 January 2008
Accepted: 31 May 2008
Published: 31 May 2008
Distantly related proteins adopt and retain similar structural scaffolds despite length variations that could be as much as two-fold in some protein superfamilies. In this paper, we describe an analysis of indel regions that accommodate length variations amongst related proteins. We have developed an algorithm, CUSP, that examines multi-membered PASS2 superfamily alignments to identify indel regions in an automated manner. Further, we have used the method to characterize the length, structural type and biochemical features of indels in related protein domains.
CUSP examines protein domain structural alignments to distinguish regions of conserved structure common to related proteins from structurally unconserved regions that vary in length and type of structure. On a non-redundant dataset of 353 domain superfamily alignments from PASS2, we find that 'length-deviant' protein superfamilies show > 30% length variation from their average domain length. 60% of additional lengths that occur in indels are short-length structures (< 5 residues) while 6% of indels are > 15 residues in length. Structural types in indels also show class-specific trends.
The extent of length variation differs across superfamilies, and indels show class-specific trends for preferred lengths and structural types. Such indels of different lengths, even within a single protein domain superfamily, could have structural and functional consequences that drive their selection, underlining their importance in similarity detection and computational modelling. The availability of systematic algorithms, like CUSP, should enable decision making in a domain superfamily-specific manner.
Protein databanks such as the PDB, with nearly 47,000 structures in the current year, are growing at a rapid pace. Interestingly, the increase in the number of protein structures in the last decade is not accompanied by a concomitant rise in the number of novel folds. This suggests that protein folds are resilient: they exploit their large degrees of conformational freedom and can tolerate large modifications in sequence and length. Structural comparisons of related proteins show that changes, in the form of substitutions, deletions or insertions, are accommodated into existing protein scaffolds. Protein domains show anything from variations of two or three residues to over two-fold length variations, as in the PDB entries for P-loop NTP hydrolases and the TIM fold.
Recent studies correlating domain length variations with the taxonomy spans of domains report that over one-third of all domains tend to increase or decrease in domain size. The fraction of domains that increase in domain size is two-fold larger than the fraction that decrease in size. Analyses of protein length distributions across the main kingdoms have also shown that mean protein lengths are 40–60% greater in eukaryotes than in prokaryotes. Such expansions in length correlate with the accretion of functional motifs during the evolution of sophisticated regulation networks in higher eukaryotes.
Structural variation is influenced by the number, length and location of insertions and deletions of residues (indels). Pascarella and Argos noted that fewer than 2% of indels are longer than 10 residues, suggesting that a gradual accretion of protein length through shorter indels can achieve structural diversity. Reeves and co-workers, in an analysis of domain variations in CATH superfamilies, have shown that even at low sequence identities (< 30%), 50% of the domain structure is conserved. However, changes in the form of structural re-orientations and the number of structural elements are high between remotely related proteins. Domain length variations, although discontinuous in sequence, co-locate in 3D space and mediate functional variety.
In a separate analysis of physical parameters between related domains of a superfamily, "structural templates" were shown to have a strong correlation of physical parameters such as solvent accessibility, hydrogen bonding patterns, spatial orientations and interactions between different members. Such segmental conservation suggests that these features are not as well preserved in poorly conserved regions, resulting in structural and functional diversity amongst related domains through variable regions. Length accretions are critical in mediating structural and functional variety in proteins and it is, therefore, important to understand their properties and determine if class-distinct trends operate on protein domains. We extend earlier analyses of indel properties by annotating such regions in terms of their preferred structural types, lengths and biochemical parameters and look for class-specific trends, if any. Such indels also extend functional and structural support to protein domains and this is also discussed briefly for a few superfamilies.
We report an algorithm, CUSP, which identifies conserved units of structure in proteins and distinguishes such regions from indels where length variations are introduced. The PASS2 database provides structure-based alignments of non-redundant representatives of protein superfamilies sharing < 40% sequence identity. Since initial equivalences are specified using STAMP 4.0 or LSQMAN, such alignments maximize structurally similar regions amongst related domains and distinguish them from indels that are structurally variable across different members. These alignments derived through COMPARER have examined protein domains that show not only low sequence conservation but also demonstrate variety in length and thus serve as ideal starting points to describe indel regions.
CUSP was used to examine length variations in 353 multi-membered superfamily alignments (> 3 sequence-diverse relatives) from the PASS2 database. To determine if observed trends are affected by the inclusion of more proteins, sequence homologues of the structural entries in PASS2 were included from the GenDis database. In a separate analysis, the CUSP algorithm was also applied to such alignments to detect core features of a domain superfamily. Further, we have extended the study to analyze the conservation of a biochemical property such as solvent exposure in structurally conserved and unconserved regions.
353 structure-based superfamily alignments from the PASS2 database, with more than 3 members, at a < 40% identity cut-off and with nearly equal representation from the four major structural classes (72, 81, 88 and 112 superfamilies from α, β, α/β (AorB) and α+β (AplusB) classes respectively) were considered for the analysis. Since PASS2 derives from SCOP hierarchical schemes, only non-redundant representatives are considered in the alignments and biases due to over-representation of similar structures are avoided.
CUSP: Detection of conserved units of structure in protein structural alignments
In a similar manner, average solvent accessibility scores were also associated with each structural block (not shown in the figure). Since the score is averaged over each position in the alignment, the block score is indicative of the extent of conservation of each structural type (H, C, E or -) in each block. Each structural block was associated with one of the tags 'poor' (block score < 3), 'medium' (block score 3–4.5) or 'high' (block score 4.5–5.0). Finally, a consensus structural alignment is derived for the protein superfamily that not only delineates the structurally conserved blocks (SSB: H, E or C) from structurally unconserved blocks (USB: *, -) but also annotates such regions, based on block scores, as 'high', 'medium' or 'poor' to indicate the degree of conservation.
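The tagging step can be illustrated with a minimal Python sketch. It assumes that per-position conservation scores on a 0–5 scale are already available for a block; the function name `tag_block` is hypothetical, and assigning the shared boundary value 4.5 to 'high' is an assumption, since the ranges above do not say which tag owns it.

```python
from statistics import mean

def tag_block(position_scores):
    """Return (block_score, tag) for one structural block.

    position_scores: per-position conservation scores for the block
    (assumed here to lie on a 0-5 scale).
    """
    if not position_scores:
        raise ValueError("a block needs at least one aligned position")
    # The block score is the average of the per-position scores.
    block_score = mean(position_scores)
    if block_score < 3:
        tag = "poor"
    elif block_score < 4.5:  # assumption: 4.5 itself is tagged 'high'
        tag = "medium"
    else:
        tag = "high"
    return block_score, tag

# Illustrative (made-up) scores for two blocks.
print(tag_block([5, 5, 4, 5]))
print(tag_block([2, 3, 2]))
```

Keeping the thresholds in one small function makes it easy to test alternative cut-offs against alignments with different numbers of members.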
Validation of the algorithm and scoring schemes
The scoring scheme that we have employed was arrived at after examining domain superfamily alignments that varied in the number of representative members in the alignment. Although it is well appreciated that different approaches produce quite different alignments, structural alignments of ten superfamilies, derived independently using other alignment methods such as CE and CDD, were also tested with the CUSP scoring scheme. Primarily, we wanted to determine if the applied scores were robust in identifying structurally conserved features in related domains. The number of structurally equivalent positions reported by either method was obtained and compared with the number of 'core' conserved residues identified by the CUSP scoring scheme when applied to domain superfamily alignments from PASS2. In each of the superfamilies considered, the CUSP scoring scheme was robust in capturing strictly conserved features. Specifically, inherent biases in the superfamily, for instance, the conservation of a minimum of 4 helices in the cytochrome C superfamily, could be captured independent of the alignment method and to that extent the schemes employed are predictive and can describe the strictly conserved features of a superfamily.
Length variation in protein superfamilies
where l_i = length of the i-th member of a protein superfamily
M = mean domain size of the superfamily
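As a rough illustration of how a percentage length variation could be computed from these two quantities, the Python sketch below expresses the standard deviation of the member lengths l_i as a percentage of the mean domain size M. This particular formula (a coefficient of variation using the population standard deviation) is an assumption made for illustration and is not necessarily the exact expression used in this study; the function names and example lengths are hypothetical. The 30% threshold is the one quoted for 'length-deviant' superfamilies.

```python
from statistics import mean, pstdev

def percent_length_variation(lengths):
    """Standard deviation of domain lengths as a percentage of their mean.

    Assumption: population standard deviation divided by the mean M.
    """
    if len(lengths) < 2:
        raise ValueError("need at least two superfamily members")
    m = mean(lengths)  # M, the mean domain size
    return 100.0 * pstdev(lengths) / m

def is_length_deviant(lengths, threshold=30.0):
    """True if the superfamily exceeds the quoted 30% length variation."""
    return percent_length_variation(lengths) > threshold

# Hypothetical domain lengths (residues) for two superfamilies.
uniform = [100, 100, 100]
variable = [100, 200, 300]
print(percent_length_variation(uniform), is_length_deviant(uniform))
print(round(percent_length_variation(variable), 1), is_length_deviant(variable))
```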
Application of CUSP on diverse folds and functional implications of indels
Structurally conserved features of several domain superfamilies have been studied in detail in the past through careful examination of multiple alignments and are available in the literature. We have applied CUSP to a few classical domain superfamilies, such as the globin, ferritin and cytochrome C domain superfamilies (see Additional information), to determine CUSP's performance in identifying core regions and in distinguishing indel regions in these well characterized folds. In addition, other structure alignment methods were also applied to such folds. The functional and structural implications of indel regions detected by CUSP were also examined.
Results and Discussion
Extent of length variation in protein domain superfamilies
List of length-rigid and length-deviant domain superfamilies. This list is shown only for the helix-rich class. See Additional Tables 1 and 2 for the full list. *Highly populated domain superfamilies (> 10 members).
a) List of 'Length-rigid superfamilies' (> 4 members).
Average domain size
Nuclear receptor ligand-binding domain
Calponin-homology domain, CH-domain
b) List of 'Length-deviant superfamilies' (> 4 members).
"Winged helix" DNA-binding domain
C-terminal effector domain of bipartite response regulator
Putative DNA-binding domain
IHF-like DNA-binding proteins
6-phosphogluconate dehydrogenase C-terminal domain-like
Terpenoid cyclases/Protein prenyltransferases
CUSP assignments of structurally conserved and unconserved blocks in proteins
Structural modifications, it is observed, can form extensions of pre-existing structures or insert as new structural elements in the middle of domains. Such insertions although not contiguous in sequence may lie close to each other in structure and even form sub-domain like structures. Alternately, they may accrue as additional regular (α-helix, β-strand) and irregular structures (coils) at the N and C terminal ends (Table S2). Since CUSP delineates protein alignments into structurally conserved regions and unconserved regions, it would be useful to identify if a selection principle is operational in identifying where structural modifications, because of additional lengths, can occur in related protein domains.
Extent of length variation accommodated in SSB and USB
80% of length variations in length-deviant superfamilies from all classes are observed in USB regions with some superfamilies from the α/β class accommodating a wider range of length variation (Additional information, Section I: S2 – S4). Truncated structural elements account for 10% of these length differences (Additional Figure S1a).
Structural characteristics and lengths of 'indels'
The length distributions of such indels in different classes also show interesting trends. We find that 60% of indels are < 5 residues (Figure 3b). Medium-sized indels of between 5–10 residues are noticed in 20% of all indels in the dataset. Only 6% of all indels are found to be > 15 residues in length. Similar trends were also observed in earlier analysis on homologous superfamilies [5, 6] although on smaller and different datasets.
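A length distribution of this kind can be tabulated with a few lines of Python. The sketch below is illustrative only: the function name `bin_indel_lengths`, the 11–15 residue bin (which is not reported separately in the text) and the example lengths are assumptions.

```python
from collections import Counter

# (label, lowest length, highest length); None means no upper limit.
# The 11-15 bin is an assumption that fills the range between the
# 'medium-sized' and '> 15 residues' categories.
BINS = [("<5", 1, 4), ("5-10", 5, 10), ("11-15", 11, 15), (">15", 16, None)]

def bin_indel_lengths(lengths):
    """Return the percentage of indels that fall in each length bin."""
    if not lengths:
        raise ValueError("no indel lengths supplied")
    counts = Counter()
    for n in lengths:
        if n < 1:
            raise ValueError("indel lengths must be positive")
        for label, lo, hi in BINS:
            if n >= lo and (hi is None or n <= hi):
                counts[label] += 1
                break
    total = len(lengths)
    return {label: 100.0 * counts[label] / total for label, _, _ in BINS}

# Hypothetical indel lengths (residues) pooled from several alignments.
example = [1, 2, 3, 4, 6, 8, 12, 20, 3, 2]
for label, pct in bin_indel_lengths(example).items():
    print(f"{label:>5}: {pct:.0f}%")
```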
45% of the additional α-helices in indels of helix-rich length-deviant superfamilies are shorter than 5 residues (Figure 3c). A majority of the α-helices appearing in USB regions of β and α-β protein superfamilies are < 5 residues although in all superfamilies, longer α-helices (between 5 and 15 residues) are also observed (~20%). ~70% of indels appearing as β-strands are short length (< 5 residues) and this may relate to the cost involved in satisfying the inherent nature of β-strands to form sheets. Additional strands of longer length (> 10 residues) are observed in fewer than 5% of all length-deviant β-rich superfamilies. Such strands in indels could be extensions of pre-existing strands or occur as shorter length β-hairpins and, therefore, strands longer than 15 residues are not noticed in indel regions.
We observe that percentage variation in terms of the total number of α-helices, coils and β-strands is more in length-deviant superfamilies than in length-rigid superfamilies (see Additional information, Section I: S1, and Tables S4 and S5). In some of the length-deviant superfamilies (Additional Tables S2 and S5), the number of additional structures is large enough to form domain like structures.
Manual examination of the structural alignments of the giant and dwarf domains of length-deviant domain superfamilies shows that in all classes, the accretion of single, long secondary structures is less common and instead many short length indels are arranged to form super secondary structural motifs (Table S2). Thus, isolated or solvent-exposed extra secondary structures are avoided and additional units confer structural or functional support in each domain superfamily. In order to address if these trends are observed after including immediate sequence homologues of these superfamilies, we consulted the pre-curated results from the GenDis database for the top-five length-deviant and length-rigid superfamilies belonging to the four major structural classes. For each superfamily, between 250 and 800 sequence homologues were considered for assessing trends in length variation. We find similar trends of length variation in the superfamilies distinguished as "length-rigid" and "length-deviant" using structural homologues alone, even on the inclusion of sequence homologues in these superfamilies (data not shown). This suggests that superfamilies identified as length-deviant/rigid are likely to remain so even with the availability of more structures.
Solvent accessibility in conserved structural blocks
The conservation of a biochemical property such as solvent accessibility in regions annotated as SSB and USB was analyzed (see Additional information, S5) to determine if such regions behaved distinctly from each other. Additional Figure S2a shows that in Beta class superfamilies, structurally unconserved regions (USB) arising from indels or structural replacements are usually exposed to the solvent. In structurally conserved regions (SSB) of the same superfamilies, β-strands show a distinct preference for avoiding solvent while coils and α-helices are partially/well-exposed to the solvent. Likewise, Additional Figure S2b shows the distribution of the average PSA scores in different types of structurally conserved blocks in all classes. (Additional Figures S3–S5 show trends in these parameters for other classes).
Application of CUSP algorithm in the identification of structural scaffolds
The CUSP algorithm examines structure-derived alignments to delineate structurally conserved regions from structurally variable regions in protein domain superfamilies. Thus, applications of the algorithm on domain superfamily alignments are well capable of identifying 'core' regions, common to all members, from 'variant' regions. In order to verify this, we have examined the scores assigned to various conserved blocks identified by the program on some well characterized folds such as the Globins, Ferritins and Cytochrome C (see Additional information). Each of these folds is known to show considerable variations in length that are accommodated as indels.
Ferritin like superfamily
This superfamily of the alpha-rich fold includes members that are di-iron carboxylate proteins. The average domain size of the superfamily is 250 residues, and the superfamily includes small domains such as rubrerythrin (1dvba1, 147 residues) and giant domains such as methane monooxygenase hydroxylase/MMO (1mtyd, 512 residues). The two domains catalyze dioxygen-dependent oxidation-hydroxylation reactions. All members are characterized by the presence of a duplicated motif consisting of two consecutive helices. An iron-coordinating glutamic or aspartic acid is located in the first helix and there is an EXXH (single-letter code for amino acids) motif in the second, but there are no other obvious sequence homologies. CUSP, when applied to structure-based alignments for the domain superfamily, detects the consecutive helices that strictly co-ordinate Fe (Figure 5b). These conserved helices, in fact, typify the conserved scaffold of the domain superfamily and are also detected from independently derived CE alignments of Ferritin domains. As seen in Table S3, both CE and CUSP agree well on the number of structurally equivalent residues for the domain superfamily. The large difference in size, which occurs as additional helices and several loops, is found to be associated with the number of interacting domains, which is far greater in the giant member MMO than in rubrerythrin. This difference could account for the acquisition of extra structural elements that can interact with different domains.
Role of indels in structural and functional diversity
In our analysis, we have estimated the extent of length variation in protein superfamilies and employed it as a measure of structural variation between homologous proteins. Numerous measures have been used to quantify protein structural similarity and these include RMSD, SSAP, contact maps, DALI and VAST scores [23–26]. We are interested in the tolerance of folds to large variations in length and have, therefore, employed standard deviation and mean length variation to determine this. Proteins of similar lengths may still differ in the orientations of individual secondary structures and adopt different folds. To that extent, a simple scoring scheme that parses pre-derived structural alignments of known related proteins from the PASS2 database and quantifies the extent of length variation in all protein superfamilies is used to empirically estimate trends emerging in the dataset. We have also performed the analysis on multi-membered domain superfamilies (> 3 members) for an empirical assessment of the data involving 353 domain superfamilies. Additionally, trends obtained in the dataset are noticed on consideration of the most length-deviant or highly populated domain superfamilies alone.
We have presented a method, CUSP, which processes protein structure-driven alignments to identify conserved structural units, common to all related proteins. In doing so, regions that allow variations to accumulate and confer uniqueness to each protein, annotated as USB, are also identified for every superfamily. The scoring schemes were arrived at after examining alignments derived independently from other approaches such as CE and CDD. In 8 of the 10 superfamilies examined, CUSP detects > 60% of the conserved residues reported by other alignment methods. For the two superfamilies which show < 45% coverage, the large difference in the number of structural entries examined may be responsible for the difference in performance. While the alignments from CDD included very close sequence homologues, the structural representatives considered in the CE alignment included domain members of similar lengths and also more sequence-diverse members. A strict cut-off of 75% is employed to characterize structural types at each alignment position as H, E or C and this, in fact, increases the stringency of the scores. These assignments and scores are, therefore, representative and predictive of SSB assignments in all and new sequence relatives in the superfamily. Cut-off schemes similar to ours have been employed earlier in JOY representations of structural alignments and in estimating equivalences of secondary structures (SSE) for deriving matrices.
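To make the 75% cut-off concrete, the following Python sketch derives a consensus string from aligned per-residue secondary-structure assignments and groups it into SSB and USB blocks. It is a simplified illustration under stated assumptions, not the CUSP implementation: the function names are hypothetical, the input is assumed to be equal-length strings over H, E, C and '-' (gap), and marking gap-dominated columns as '-' and all remaining columns as '*' is an assumed convention.

```python
from collections import Counter
from itertools import groupby

def consensus_column(column, cutoff=0.75):
    """Consensus state for one alignment position."""
    counts = Counter(column)
    n = len(column)
    for state in "HEC":
        if counts[state] / n >= cutoff:
            return state  # structurally conserved position
    # Assumption: gap-dominated columns are '-', everything else '*'.
    if counts["-"] / n >= cutoff:
        return "-"
    return "*"

def consensus_blocks(rows, cutoff=0.75):
    """Return (consensus string, list of (kind, state, start, end))."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    consensus = "".join(consensus_column(col, cutoff) for col in zip(*rows))
    blocks, start = [], 0
    for state, run in groupby(consensus):
        length = len(list(run))
        kind = "SSB" if state in "HEC" else "USB"
        blocks.append((kind, state, start, start + length))
        start += length
    return consensus, blocks

# Hypothetical four-member alignment of secondary-structure states.
rows = ["HHHH-CCEEE",
        "HHHHHHCEEE",
        "HHHH--CEEE",
        "HHHC-ECEE-"]
consensus, blocks = consensus_blocks(rows)
print(consensus)
for block in blocks:
    print(block)
```

With four members, a state must appear in at least three of them to reach the 75% cut-off, which is why position 3 is still called H while position 5 is left as an unconserved '*'.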
We have also attempted a study of the domain contexts, associations of length-deviant domains and their functional consequences (Table S2, and manuscript under preparation). Reeves et al. have examined equivalent secondary structures between CATH superfamilies and suggest that such additional structural elements contribute effectively to functional variety in the highly populated superfamilies.
Since the CUSP algorithm works with a scoring scheme to detect consensus trends in a majority of the superfamily members, the extent of conservation of each structural type in each block is annotated and it is possible, therefore, to extract features that correlate with the extent of conservation of each structural type. An analysis of the nature of such USBs shows that additional lengths can either occur as extensions or insert in the middle of a protein structure. A class-specific trend for the type of structure adopted in indel regions has also emerged in the current analysis and each class prefers a specific type of structure (Figure 3, Figure S1 (b-d)). Figure 4 shows examples of different superfamilies that exhibit class-specific nature in accommodating length variations.
We find that in all superfamilies examined, the structurally unconserved regions amongst related proteins do not all retain a uniform pattern in solvent accessibility. This coincides with the expectation that it is in such regions that variation in lengths between proteins is introduced. To preserve the core scaffold, which may be the driving force in limiting the number of folds, indel regions are more prone to structural changes and this may result in greater solvent exposure in some proteins or alter protein surfaces to modify interaction interfaces. β-strands show a universal preference for solvent avoidance and this reflects the preference of such strands to avoid isolations from the protein core and integrate into the structure as well-ordered sheets (Table S2). In proteins of the α-β class, coils show a clear preference for solvent exposure, more so in α + β class superfamilies where they are vital in segregating α and β units. Inferences on solvent exposure, in the present analysis, are limited to individual domains of the proteins and do not consider multi-domain contexts and oligomerisation states of the proteins.
Based on the extent of length variation observed in different superfamilies, we have clustered all the superfamilies into length-rigid and length-deviant groups. Interestingly, length-rigid proteins are not as well-populated (as reflected in the number of members that are functionally diverse and in the number of families) as length-deviant proteins. While on the one hand, this does indicate that with the availability of more structures, trends in length-deviations could be affected in the identified rigid superfamilies, one may argue that such superfamilies are not preferred due to their strict length limitations and limited functional promiscuity (as reflected in the number of families). Length-deviant proteins, on the other hand, are found to include superfolds such as the P-loop NTP hydrolases, Ferredoxin folds etc., that have already been shown to be well represented in many genomes.
In many length-deviant protein superfamilies, despite large differences in length (over two-fold in some cases), the core is often well preserved. The large additional lengths often do not involve the active site and in many cases they affect the oligomerization states and interacting surfaces of the protein (Ferritin like domain superfamily), introduce substrate-specificity (SH3 domains) and in some cases play an auto-regulatory role (Table S2). Since our analysis is derived from the PASS2 database of domain superfamilies, which in turn is guided by the domain definitions of SCOP, it is highly likely that severe length deviations, exhibited as additional domains, have escaped our attention.
These interesting trends that we have obtained on the nature and type of indels in protein superfamilies from different classes could impact the area of comparative modeling in indel regions of newer superfamily members. We have obtained some distinct trends on indels that are class-specific, with information on typical lengths. Such information, we expect, will be useful in the choice of specific structural types for newer relatives of protein superfamilies. Each superfamily shows a distinct trend in length variability and such information can be fed, by the assignment of variable gap penalties, into sequence alignment approaches to improve homology detection amongst members that vary considerably in length. We trust that such analyses would provide guiding principles during sequence searches, alignment and homology modeling of distant relationships.
R.S. was an International Senior Research Fellow of the Wellcome Trust, U.K. S.S. thanks the Council of Scientific and Industrial Research, India for a PhD research fellowship. We gratefully acknowledge NCBS-TIFR for infrastructural support.
1. Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J: The Protein Data Bank and the challenge of structural genomics. Nat Struct Biol 2000, 7(Suppl):957–959. doi:10.1038/80734
2. Wolf Y, Madej T, Babenko V, Shoemaker B, Panchenko AR: Long-term trends in evolution of indels in protein sequences. BMC Evol Biol 2007, 7:19. doi:10.1186/1471-2148-7-19
3. Zhang J: Protein-length distributions for the three domains of life. Trends Genet 2000, 16(3):107–109. doi:10.1016/S0168-9525(99)01922-8
4. Bhaduri A, Pugalenthi G, Sowdhamini R: PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics 2004, 5:35. doi:10.1186/1471-2105-5-35
5. Pascarella S, Argos P: Analysis of insertions/deletions in protein structures. J Mol Biol 1992, 224(2):461–471. doi:10.1016/0022-2836(92)91008-D
6. Reeves GA, Dallman TJ, Redfern OC, Akpor A, Orengo CA: Structural diversity of domain superfamilies in the CATH database. J Mol Biol 2006, 360(3):725–741. doi:10.1016/j.jmb.2006.05.035
7. Chakrabarti S, Sowdhamini R: Regions of minimal structural variation among members of protein domain superfamilies: application to remote homology detection and modelling using distant relationships. FEBS Lett 2004, 569(1–3):31–36. doi:10.1016/j.febslet.2004.05.028
8. Russell RB, Barton GJ: Structural features can be unconserved in proteins with similar folds. An analysis of side-chain to side-chain contacts secondary structure and accessibility. J Mol Biol 1994, 244(3):332–350. doi:10.1006/jmbi.1994.1733
9. Kleywegt GJ, Jones TA: A super position. CCP4/ESF-EACBM Newsletter on Protein Crystallography 1994, 31:9–14.
10. Sali A, Blundell TL: Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J Mol Biol 1990, 212(2):403–428. doi:10.1016/0022-2836(90)90134-8
11. Pugalenthi G, Bhaduri A, Sowdhamini R: GenDiS: Genomic Distribution of protein structural domain Superfamilies. Nucleic Acids Res 2005, 33(Database issue):D252–255.
12. Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247(4):536–540.
13. Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637. doi:10.1002/bip.360221211
14. Lee B, Richards FM: The interpretation of protein structures: estimation of static accessibility. J Mol Biol 1971, 55(3):379–400. doi:10.1016/0022-2836(71)90324-X
15. Mizuguchi K, Deane CM, Blundell TL, Johnson MS, Overington JP: JOY: protein sequence-structure representation and analysis. Bioinformatics 1998, 14(7):617–623. doi:10.1093/bioinformatics/14.7.617
16. Godzik A: The structural alignment between two proteins: is there a unique answer? Protein Sci 1996, 5(7):1325–1338.
17. Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 1998, 11(9):739–747. doi:10.1093/protein/11.9.739
18. Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic Acids Res 2004, 32(Web Server issue):W327–331. doi:10.1093/nar/gkh454
19. Bashford D, Chothia C, Lesk AM: Determinants of a protein fold. Unique features of the globin amino acid sequences. J Mol Biol 1987, 196(1):199–216. doi:10.1016/0022-2836(87)90521-3
20. Lecomte JT, Vuletich DA, Lesk AM: Structural divergence and distant relationships in proteins: evolution of the globins. Curr Opin Struct Biol 2005, 15(3):290–301. doi:10.1016/j.sbi.2005.05.008
21. Nordlund P, Eklund H: Di-iron-carboxylate proteins. Curr Opin Struct Biol 1995, 5(6):758–766. doi:10.1016/0959-440X(95)80008-5
22. Lougheed JC, Holton JM, Alber T, Bazan JF, Handel TM: Structure of melanoma inhibitory activity protein, a member of a recently identified family of secreted proteins. Proc Natl Acad Sci USA 2001, 98(10):5515–5520. doi:10.1073/pnas.091601698
23. Mizuguchi K, Go N: Seeking significance in three-dimensional protein structure comparisons. Curr Opin Struct Biol 1995, 5(3):377–382. doi:10.1016/0959-440X(95)80100-6
24. Holm L, Sander C: Dali: a network tool for protein structure comparison. Trends Biochem Sci 1995, 20(11):478–480. doi:10.1016/S0968-0004(00)89105-7
25. Gibrat JF, Madej T, Bryant SH: Surprising similarities in structure comparison. Curr Opin Struct Biol 1996, 6(3):377–385. doi:10.1016/S0959-440X(96)80058-3
26. Orengo CA, Taylor WR: SSAP: sequential structure alignment program for protein structure comparison. Methods Enzymol 1996, 266:617–635.
27. Johnson MS, Overington JP, Blundell TL: Alignment and searching for common protein folds using a data bank of structural templates. J Mol Biol 1993, 231(3):735–752. doi:10.1006/jmbi.1993.1323