Systematic analysis of short internal indels and their impact on protein folding
© Kim and Guo; licensee BioMed Central Ltd. 2010
Received: 11 February 2010
Accepted: 4 August 2010
Published: 4 August 2010
Protein sequence insertions/deletions (indels) can be introduced during evolution or through alternative splicing (AS). Alternative splicing is an important biological phenomenon and is considered as the major means of expanding structural and functional diversity in eukaryotes. Knowledge of the structural changes due to indels is critical to our understanding of the evolution of protein structure and function. In addition, it can help us probe the evolution of alternative splicing and the diversity of functional isoforms. However, little is known about the effects of indels, in particular the ones involving core secondary structures, on the folding of protein structures. The long term goal of our study is to accurately predict the protein AS isoform structures. As a first step towards this goal, we performed a systematic analysis on the structural changes caused by short internal indels through mining highly homologous proteins in Protein Data Bank (PDB).
We compiled a non-redundant dataset of short internal indels (2-40 amino acids) from highly homologous protein pairs and analyzed the sequence and structural features of the indels. We found that about one third of indel residues are in disordered state and majority of the residues are exposed to solvent, suggesting that these indels are generally located on the surface of proteins. Though naturally occurring indels are fewer than engineered ones in the dataset, there are no statistically significant differences in terms of amino acid frequencies and secondary structure types between the "Natural" indels and "All" indels in the dataset. Structural comparisons show that all the protein pairs with short internal indels in the dataset preserve the structural folds and about 85% of protein pairs have global RMSDs (root mean square deviations) of 2Å or less, suggesting that protein structures tend to be conserved and can tolerate short insertions and deletions. A few pairs with high RMSDs are results of relative domain positions of the proteins, probably due to the intrinsically dynamic nature of the proteins.
The analysis demonstrated that protein structures have the "plasticity" to tolerate short indels. This study can provide valuable guides in modeling protein AS isoform structures and homologous proteins with indels through placing the indels at the right locations since the accuracy of sequence alignments dictate model qualities in homology modeling.
Sequence insertions/deletions (indels) occur during evolution and alternative splicing (AS) process in eukaryotes. The generation of various protein isoforms through alternative splicing has been considered as one of the major evolutionary mechanisms for increasing the proteome size and functional diversity [1, 2]. Recent high-throughput analysis based on mRNA-SEQ data from diverse human tissue and cell lines suggested that alternative splicing is almost universal (up to 94%) in human multi-exon genes . While there are several types of splicing events that result in different splice isoforms when compared to the primary sequences, such as truncation, substitution, insertion and deletion, the internal insertion/deletion cases are the dominant form of alternative splicing variants and are of great interest due to its potential impact on the folding and stability of isoform structures [3, 4]. In addition, genes containing "switch-like" exons are more likely to have isoforms with indels . It is critical to our understanding of the function of alternatively spliced protein isoforms if we know how sequence changes, especially sequence insertions and deletions, affect the structure of the splice variants as structures hold key information for the function of proteins.
Our current knowledge about how alternative splicing affects protein structures is very limited. While there are about 28,000 annotated protein isoforms from recent UniProt release 15.11 (November 24, 2009)  and over 60,000 protein structures deposited in Protein Data Bank (PDB) , fewer than 10 pairs of alternatively spliced isoforms have documented structures . Prediction of isoform structures generally falls into the category of homology modeling. However, homology modeling of proteins with indels is not a trivial task. The key to the success in homology modeling with indels is alignment accuracy, especially the positioning of the insertion or deletion sequences. For example, several groups at CASP8 (the 8th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction) used the same protein 2G39 as the template to model target protein T0438, but only three of nine models placed the insertion sequence (12 amino acids) in the right place . Another infamous/famous example in indel positioning is the modeling of the long AS isoform of Piccolo C2A domain that has a nine-residue insertion in a loop. Instead of folding as part of the loop, the nine-residue insert displaces a β-strand that is pushed into the calcium-binding region through local rearrangement, leading to a dramatic change in calcium binding affinity .
While it is generally believed that insertions and deletions are well tolerated in loops [10, 11], insertions and deletions within secondary structures (α-helices and β-sheets) may have a dramatic effect on the overall structure and are considered deleterious and unfavorable during evolution [4, 12]. Tress et al. argued that AS isoform is probably an unlikely route to increase functional diversity due to probably large structural impact induced by indels . Yet in a number of studies with genetically engineered insertions and deletions on T4 lysozyme, Matthews' group showed that the protein has structural plasticity to tolerate indels within secondary structures [13–15]. Three recent large scale analyses also offered a similar view that protein structures have some degree of "plasticity" to tolerate insertions and deletions through maintaining the same structural fold [16–18].
The major goal of this paper is to investigate the impact of short internal indels (less than 40 amino acids) on protein structures, especially for indels within secondary structures. Large indels may fold as an individual domain or the protein pairs may adopt different folds due to the large differences between two sequences . Terminal indels are not considered in this study as terminal fragments are relatively flexible and terminal deletion/truncation have become a standard protocol in recombinant protein expression and protein crystallization [19–21]. In addition, the widely-used cloning vectors with His-tags introduce sequence artifacts that are included in the PDB SEQRES records and determining the exact tag sequences is problematic [22, 23]. Though there are several similar surveys about indel statistics since 1992 [10, 24–27], our approach is different as our goal is to study the impact of indels on protein structure and to provide guidance for isoform structure prediction. Therefore it is critical that the locations of the indels are unique and unambiguous. Otherwise the structural changes would be less well defined. For example, it is not uncommon that two proteins with high global sequence identity have a low local sequence similarity . To address this issue, we take local sequence similarity into account and only consider protein pairs with both high global and local (sequences flanking the indels) sequence similarity (>75%) to ensure the uniqueness of indel sequences and positions. In addition, we include the "disordered conformation" in our structural analysis. It has been demonstrated that intrinsically disordered or unstructured regions are responsible for many important cellular functions and a link between alternative splicing and protein intrinsic disorder has been recently reported [17, 29, 30].
In this paper, we report a systematic analysis of a large non-redundant indel dataset with highly homologous protein pairs. Previously we found that the immunoglobulin (Ig) family, rich in certain amino acids including tyrosine, glycine, and serine in the third complementarity-determining region of the Ig heavy chain (CDR-H3), was overrepresented in an indel dataset [28, 31, 32]. Therefore those Ig-related indel sequences are not considered in our current analysis. Our results show that internal indels tend to have less regular secondary structures (α-helices and β-strands), but are rich in "disordered conformation", which is in line with the work by Romero et al . Our data also show that proteins with short indels, including the ones within regular secondary structures, generally preserve the structural fold with some local structure rearrangement and refolding presumably for structural stability and functionality. The source of the indel, either naturally occurring or experimentally engineered, are described and the statistical significance of the features from natural indels is discussed. A webserver SCINDEL http://bioinfozen.uncc.edu/scindel was developed for convenient visualization of indel induced structural changes.
Generation of a non-redundant dataset of short internal indels
The second step of the procedure is to ensure the uniqueness of indel sequences and locations by checking the local sequence similarity. Two proteins with high global sequence identity may have regions that show low local sequence similarity . If an indel happens to be in the low sequence similarity area, the placement of this indel may change dramatically with minor changes of alignment parameters, resulting in different indel sequences and locations. In our approach, similarities of the sequences flanking the indels (20 amino acids on each side) were calculated. We only consider indel sequences from protein pairs with both high global sequence similarity and highly similar flanking regions (above a cutoff value of 75%).
Lastly, indel sequences were further processed to generate a non-redundant dataset of indels ranging from 2 to 40 amino acids. Two indel sequences are considered redundant if two protein pairs are from the same family and have the same indel sequences with very similar secondary structures at approximately the same residue positions.
To check if an internal indel is a result from engineered mutants or from natural variants, we combined the information from the SEQADV record of PDB files with manual inspection of related publications. The PDB SEQADV record describes conflicts between residue sequences in the ATOM/HETATM records and those in sequence databases . Since there are several possible reasons for these conflicts, including engineered mutants, natural variants, disordered fragments, or cloning artifact, careful manual inspection is needed.
Secondary structure types and relative solvent accessibility of indel residues
Each indel residue was assigned to one of four secondary structure states, helix, strand, coil and disordered. DSSP program was used to assign the first three secondary structure states: helix, strand and coil . Following the widely used convention, H (α-helix), G (310-helix) and I (π-helix) from DSSP are classified as helix type while E (extended strand) and B (residue in isolated β-bridge) states are classified as strand type. All the other states from DSSP are considered as coil. The disordered residues are defined by comparing the "ATOM" and "SEQRES" records in PDB file. If a residue or a fragment appears in "SEQRES", but is missing from the "ATOM" record in a PDB file, this residue or fragment is considered as disordered or unstructured . The relative solvent accessibility was calculated by dividing the absolute value from DSSP by the maximum accessibility of each residue . We employ a three-state classification for relative solvent accessibility: buried (≤7%), intermediate (>7% and ≤37%), and exposed (>37%) . The disordered/unstructured residues were considered as exposed in solvent accessibility analysis.
For comparison purpose, a non-redundant data set with 4731 protein chains, in which no pair of protein chains has more than 25% sequence identity, each structure has a resolution of better than 2.5 Å, and the size is in the 50-1000 amino acids range, was used as background analysis for amino acid composition, secondary structure types, and residue solvent accessibility.
Protein sequence and structure comparisons
Sequence alignment is done with the Needle program with default parameters [35, 36]. Two different structure alignment programs, FAST  and CE , were used for global and local structure alignment respectively. The structural difference/similarity is measured by the Cα RMSD of aligned residues between two structures. The structural changes induced by indels were evaluated by comparing the structure and sequence alignments of each pair. A webserver SCINDEL http://bioinfozen.uncc.edu/scindel was developed for convenient visualization of both the sequence and structure alignments and structural changes caused by indels.
Results and discussion
A non-redundant dataset of short internal indels from highly homologous protein pairs
The protein chains were first clustered into 11,541 groups using BLASTClust as described in the Methods section. The first filtering step of removing redundant protein chains in each cluster resulted in 1,932 clusters that have two or more members. A total of 1,237,062 internal indels were identified from 445,552 distinct protein pairs. A dataset of 1,114 non-redundant indels were generated after removing indels that are false, redundant, or the results of low local sequence similarity. As described earlier , the dataset is rich in indels (931 of 1114) derived from one specific family member, immunoglobulin variable domain (b.1.1.1) based on the latest SCOP release 1.75 . These indels are generally located in the CDR-H3 loops that play crucial roles in antigen recognition and binding specificity [32, 45]. The CDR-H3 loops are dominated by residues tyrosine, glycine, and serine, but have fewer lysine, glutamine, and glutamic acid [28, 31, 32]. Due to the over-representation of indels from the immunoglobulin family, these indels were removed and the final non-redundant indel dataset includes 183 indels. The detailed information for each indel, including length, amino acid sequence, host proteins, start and end positions of the indel, and SCOP classification is available at http://bioinfozen.uncc.edu/scindel/nonredundant_indels.html.
Statistical analysis of the indel sequences and structures
We also checked the conformation of five residues flanking the indel sequences on each side and found that more residues are in β-sheet or α-helix states when they are further way from the indel sites (Additional file 1, Figure S2).
Source of the non-redundant indel sequences
A number of studies have been devoted to study the structural and functional consequences of insertions and deletions with artificially engineered constructs [12–15]. For example, Matthews' group has used T4 lysozyme as a model system to investigate the effect of insertions on protein structures, including an insertion in an α-helix [13, 14]. The very question about the indel dataset in this study is how many of the indel sequences are from naturally occurring proteins rather than "man-made". Based on the SEQADV records of PDB files and manual inspection, we found that about 70% of the indel sequences in our dataset are the results of engineered insertion/deletion mutants. Besides the T4 lysozyme structural studies, the majority of the engineered insertion/deletion mutants were constructed to investigate the functional importance of a particular fragment. Deletion mutants have also served as one of the rational engineering approaches to better crystallization in protein structure determination . Indels from protein pair 2J4OA-2POPA (19 AAs) and 2RHSB-2RHQB (4 AAs) are such examples. The indel statistics from naturally occurring indels in terms of secondary structure types and solvent accessibilities are significantly different from those of the reference dataset and are highly similar to those derived from all indels (Figures 4B and 4C). Though there are variations in amino acid frequencies between all indels and naturally occurring ones, the general trend is surprisingly similar (Figure 4A). Due to the small size of the naturally occurring indel dataset (55 indels, 263 residues), the significances of differences between the observed numbers of amino acids, secondary structure types, and relative solvent accessibility and the expected numbers (using either "Background" or "All" indels as references) were calculated with χ2 test. Not surprisingly, there are significant differences between the observed numbers in the naturally occurring indel dataset and the expected numbers (based on background distributions) with p-values of 5.1e-4, 2.6e-68, and 9.32e-41 for amino acid, secondary structure types, and relative solvent accessibility, respectively (Additional file 1, Table S2). More importantly, there are no statistically significant differences between the naturally occurring indels and the all indel dataset in terms of amino acid frequencies (p = 0.46) and secondary structure types (p = 0.91) (Additional file 1, Table S2), suggesting that the features from our all indel dataset may represent the properties of real, non-engineered indel sequences. Statistical analysis also showed that the levels of solvent accessibility are different at a significance level of 0.01 between all indels and naturally occurring indels (p = 0.0094), largely due to the differences in the intermediate and exposed categories (Figure 4C). Unlike secondary structure types, the classification of relative solvent accessibility into three categories is rather broad and arbitrary; so the differences may become minor if different bins or classifications are used.
Impacts on structural changes by indels
We shall point out that even though all the short indels (engineered or natural) in our dataset do not show big impact on protein structures, it does not necessary mean that short internal indels have no deleterious effect. First of all, all the proteins with natural indels in our dataset are probably the ones that survived from evolution events while the others with dramatic structural effect might have disappeared during evolution. Secondly, even in the cases where indels do not induce structural change, a disastrous loss of function may occur. Nevertheless, these data (natural and engineered indels) strongly suggest the inherent structural plasticity of protein structures [16–18].
We performed a systematic study to investigate the impact of short internal indels on protein structures through mining the highly homologous protein pairs in PDB. In addition to protein evolution, indels can be the results of alternative splicing. We found that short internal indels tend to occur between secondary structure elements and a significant number of indels are disordered, which is in agreement with the earlier studies that demonstrated the associations among indels, structural disorder, and functional diversity . The rationale of choosing highly homologous protein pairs (with high sequence identity for both overall and indel flanking sequences) is two-fold: 1) to avoid "random" positioning of indel(s) in a protein pair due to low local sequence similarity even though overall sequence similarity is high; and 2) to provide a better approximation to the AS isoforms with internal gaps (100% identical in indel flanking regions). These steps ensure unambiguous indel sequences and their unique positions, reducing the possibility of including false indels due to sequence alignment errors.
One important observation from this study is that most of the indels in the dataset are not derived from evolution events. Indels have been engineered into proteins for various purposes, including structural and functional studies of short peptides and better protein crystallization. Our statistical analysis showed that there are significant differences between naturally occurring indels and the control dataset. On the other hand, there are no statistically significant differences between naturally occurring indels and all indels in terms of amino acid frequencies and secondary structure types. These data suggest that the indel properties derived from our all indel dataset are very useful.
The very question about modeling isoform structures or structural changes due to indels is how to improve the sequence alignment for comparative modeling since the performance of current comparative modeling techniques rely heavily on accurate alignments . Very rarely, sequence alignment errors can be recovered by current comparative modeling programs. We believe this systematic analysis, along with earlier reports on the case studies with individual or a small number of indel pairs, will help us in this regard as well as in our understanding of the structural plasticity of proteins.
List of abbreviations used
Protein Data Bank
Root Mean Square Deviation
This research was supported by the NSF CAREER grant (DBI#0844749) and UNC Charlotte startup fund to JTG. The authors would like to thank Mr. Jon McCafferty for his help with programming.
- Pennisi E: Why do humans have so few genes? Science (New York, NY) 2005, 309(5731):80..View ArticleGoogle Scholar
- Xing Y, Lee C: Alternative splicing and RNA selection pressure--evolutionary consequences for eukaryotic genomes. Nature reviews 2006, 7(7):499–509. 10.1038/nrg1896View ArticlePubMedGoogle Scholar
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456(7221):470–476. 10.1038/nature07509PubMed CentralView ArticlePubMedGoogle Scholar
- Tress ML, Martelli PL, Frankish A, Reeves GA, Wesselink JJ, Yeats C, Olason PI, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski RA, Lopez G, Sadowski MI, Watson JD, Fariselli P, Rossi I, Nagy A, Kai W, Storling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramirez F, Schlicker A, Denoeud F, Jones P, Kerrien S, et al.: The implications of alternative splicing in the ENCODE protein complement. Proceedings of the National Academy of Sciences of the United States of America 2007, 104(13):5495–5500. 10.1073/pnas.0700800104PubMed CentralView ArticlePubMedGoogle Scholar
- UniProt-Consortium: The Universal Protein Resource (UniProt) 2009. Nucleic acids research 2009, (37 Database):D169–174.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic acids research 2000, 28(1):235–242. 10.1093/nar/28.1.235PubMed CentralView ArticlePubMedGoogle Scholar
- Stetefeld J, Ruegg MA: Structural and functional diversity generated by alternative mRNA splicing. Trends in biochemical sciences 2005, 30(9):515–521. 10.1016/j.tibs.2005.07.001View ArticlePubMedGoogle Scholar
- Keedy DA, Williams CJ, Headd JJ, Arendall WB, Chen VB, Kapral GJ, Gillespie RA, Block JN, Zemla A, Richardson DC, Richardson JS: The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models. Proteins 2009, 77(Suppl 9):29–49. 10.1002/prot.22551PubMed CentralView ArticlePubMedGoogle Scholar
- Garcia J, Gerber SH, Sugita S, Sudhof TC, Rizo J: A conformational switch in the Piccolo C2A domain regulated by alternative splicing. Nature structural & molecular biology 2004, 11(1):45–53.View ArticleGoogle Scholar
- Pascarella S, Argos P: Analysis of insertions/deletions in protein structures. Journal of molecular biology 1992, 224(2):461–471. 10.1016/0022-2836(92)91008-DView ArticlePubMedGoogle Scholar
- Wright CF, Christodoulou J, Dobson CM, Clarke J: The importance of loop length in the folding of an immunoglobulin domain. Protein Eng Des Sel 2004, 17(5):443–453. 10.1093/protein/gzh052View ArticlePubMedGoogle Scholar
- Laskowski RA, Thornton JM: Understanding the molecular machinery of genetics through 3 D structures. Nature reviews 2008, 9(2):141–151.View ArticlePubMedGoogle Scholar
- Heinz DW, Baase WA, Dahlquist FW, Matthews BW: How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme. Nature 1993, 361(6412):561–564. 10.1038/361561a0View ArticlePubMedGoogle Scholar
- Sagermann M, Baase WA, Matthews BW: Sequential reorganization of beta-sheet topology by insertion of a single strand. Protein Sci 2006, 15(5):1085–1092. 10.1110/ps.052018006PubMed CentralView ArticlePubMedGoogle Scholar
- Vetter IR, Baase WA, Heinz DW, Xiong JP, Snow S, Matthews BW: Protein structural plasticity exemplified by insertion and deletion mutants in T4 lysozyme. Protein Sci 1996, 5(12):2399–2415. 10.1002/pro.5560051203PubMed CentralView ArticlePubMedGoogle Scholar
- Wang P, Yan B, Guo JT, Hicks C, Xu Y: Structural genomics analysis of alternative splicing and application to isoform structure modeling. Proceedings of the National Academy of Sciences of the United States of America 2005, 102(52):18920–18925. 10.1073/pnas.0506770102PubMed CentralView ArticlePubMedGoogle Scholar
- Romero PR, Zaidi S, Fang YY, Uversky VN, Radivojac P, Oldfield CJ, Cortese MS, Sickmeier M, LeGall T, Obradovic Z, Dunker AK: Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proceedings of the National Academy of Sciences of the United States of America 2006, 103(22):8390–8395. 10.1073/pnas.0507916103PubMed CentralView ArticlePubMedGoogle Scholar
- Birzele F, Csaba G, Zimmer R: Alternative splicing and protein structure evolution. Nucleic acids research 2008, 36(2):550–558. 10.1093/nar/gkm1054PubMed CentralView ArticlePubMedGoogle Scholar
- Halle B: Flexibility and packing in proteins. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(3):1274–1279. 10.1073/pnas.032522499PubMed CentralView ArticlePubMedGoogle Scholar
- Li X, Romero P, Rani M, Dunker AK, Obradovic Z: Predicting Protein Disorder for N-, C-, and Internal Regions. Genome Inform Ser Workshop Genome Inform 1999, 10: 30–40.PubMedGoogle Scholar
- Dale GE, Oefner C, D'Arcy A: The protein as a variable in protein crystallization. Journal of structural biology 2003, 142(1):88–97. 10.1016/S1047-8477(03)00041-8View ArticlePubMedGoogle Scholar
- Carson M, Johnson DH, McDonald H, Brouillette C, Delucas LJ: His-tag impact on structure. Acta Crystallogr D Biol Crystallogr 2007, 63(Pt 3):295–301. 10.1107/S0907444906052024View ArticlePubMedGoogle Scholar
- David FP, Yip YL: SSMap: a new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase. BMC bioinformatics 2008, 9: 391. 10.1186/1471-2105-9-391PubMed CentralView ArticlePubMedGoogle Scholar
- Benner SA, Cohen MA, Gonnet GH: Empirical and structural models for insertions and deletions in the divergent evolution of proteins. Journal of molecular biology 1993, 229(4):1065–1082. 10.1006/jmbi.1993.1105View ArticlePubMedGoogle Scholar
- Wrabl JO, Grishin NV: Gaps in structurally similar proteins: towards improvement of multiple sequence alignment. Proteins 2004, 54(1):71–87. 10.1002/prot.10508View ArticlePubMedGoogle Scholar
- Chang MS, Benner SA: Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments. Journal of molecular biology 2004, 341(2):617–631. 10.1016/j.jmb.2004.05.045View ArticlePubMedGoogle Scholar
- Hsing M, Cherkasov A: Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins. BMC bioinformatics 2008, 9: 293. 10.1186/1471-2105-9-293PubMed CentralView ArticlePubMedGoogle Scholar
- Kim RM, J Guo J-T: A systematic study of homologous protein structures with insertions/deletions. In Computational Systems Bioinformatics: 2009. Volume 000. Stanford, CA, USA: Life Sciences Society; 2009:103–113.Google Scholar
- Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield CJ, Campen AM, Ratliff CM, Hipps KW, Ausio J, Nissen MS, Reeves R, Kang C, Kissinger CR, Bailey RW, Griswold MD, Chiu W, Garner EC, Obradovic Z: Intrinsically disordered protein. J Mol Graph Model 2001, 19(1):26–59. 10.1016/S1093-3263(00)00138-8View ArticlePubMedGoogle Scholar
- Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK: Intrinsic disorder in cell-signaling and cancer-associated proteins. Journal of molecular biology 2002, 323(3):573–584. 10.1016/S0022-2836(02)00969-5View ArticlePubMedGoogle Scholar
- Birtalan S, Zhang Y, Fellouse FA, Shao L, Schaefer G, Sidhu SS: The intrinsic contributions of tyrosine, serine, glycine and arginine to the affinity and specificity of antibodies. Journal of molecular biology 2008, 377(5):1518–1528. 10.1016/j.jmb.2008.01.093View ArticlePubMedGoogle Scholar
- Zemlin M, Klinger M, Link J, Zemlin C, Bauer K, Engler JA, Schroeder HW Jr, Kirkham PM: Expressed murine and human CDR-H3 intervals of equal length exhibit distinct repertoires that differ in their amino acid composition and predicted range of structures. Journal of molecular biology 2003, 334(4):733–749. 10.1016/j.jmb.2003.10.007View ArticlePubMedGoogle Scholar
- Wang G, Dunbrack RL Jr: PISCES: a protein sequence culling server. Bioinformatics (Oxford, England) 2003, 19(12):1589–1591. 10.1093/bioinformatics/btg224View ArticleGoogle Scholar
- Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology 1970, 48(3):443–453. 10.1016/0022-2836(70)90057-4View ArticlePubMedGoogle Scholar
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 2000, 16(6):276–277. 10.1016/S0168-9525(00)02024-2View ArticlePubMedGoogle Scholar
- Zhang Y, Stec B, Godzik A: Between order and disorder in protein structures: analysis of "dual personality" fragments in proteins. Structure 2007, 15(9):1141–1147. 10.1016/j.str.2007.07.012PubMed CentralView ArticlePubMedGoogle Scholar
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637. 10.1002/bip.360221211View ArticlePubMedGoogle Scholar
- Vucetic S, Obradovic Z, Vacic V, Radivojac P, Peng K, Iakoucheva LM, Cortese MS, Lawson JD, Brown CJ, Sikes JG, Newton CD, Dunker AK: DisProt: a database of protein disorder. Bioinformatics (Oxford, England) 2005, 21(1):137–140. 10.1093/bioinformatics/bth476View ArticleGoogle Scholar
- Miller S, Janin J, Lesk AM, Chothia C: Interior and surface of monomeric proteins. Journal of molecular biology 1987, 196(3):641–656. 10.1016/0022-2836(87)90038-6View ArticlePubMedGoogle Scholar
- Kim D, Xu D, Guo JT, Ellrott K, Xu Y: PROSPECT II: protein structure prediction program for genome-scale applications. Protein engineering 2003, 16(9):641–650. 10.1093/protein/gzg081View ArticlePubMedGoogle Scholar
- Zhu J, Weng Z: FAST: a novel protein structure alignment algorithm. Proteins 2005, 58(3):618–627. 10.1002/prot.20331View ArticlePubMedGoogle Scholar
- Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein engineering 1998, 11(9):739–747. 10.1093/protein/11.9.739View ArticlePubMedGoogle Scholar
- Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic acids research 2004, (32 Database):D226–229. 10.1093/nar/gkh039Google Scholar
- Xu JL, Davis MM: Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities. Immunity 2000, 13(1):37–45. 10.1016/S1074-7613(00)00006-6View ArticlePubMedGoogle Scholar
- Grishin NV: Fold change in evolution of protein structures. Journal of structural biology 2001, 134(2–3):167–185. 10.1006/jsbi.2001.4335View ArticlePubMedGoogle Scholar
- Guo JT, Jaromczyk JW, Xu Y: Analysis of chameleon sequences and their implications in biological processes. Proteins 2007, 67(3):548–558. 10.1002/prot.21285View ArticlePubMedGoogle Scholar