- Methodology article
- Open Access
Analysis of binding properties and specificity through identification of the interface forming residues (IFR) for serine proteases in silico docked to different inhibitors
- Cristina Ribeiro†1,
- Roberto C Togawa†2,
- Izabella AP Neshich3,
- Ivan Mazoni3,
- Adauto L Mancini3,
- Raquel C de Melo Minardi5,
- Carlos H da Silveira4,
- José G Jardine3,
- Marcelo M Santoro1 and
- Goran Neshich3Email author
© Ribeiro et al; licensee BioMed Central Ltd. 2010
- Received: 8 June 2010
- Accepted: 20 October 2010
- Published: 20 October 2010
Enzymes belonging to the same super family of proteins in general operate on variety of substrates and are inhibited by wide selection of inhibitors. In this work our main objective was to expand the scope of studies that consider only the catalytic and binding pocket amino acids while analyzing enzyme specificity and instead, include a wider category which we have named the Interface Forming Residues (IFR). We were motivated to identify those amino acids with decreased accessibility to solvent after docking of different types of inhibitors to sub classes of serine proteases and then create a table (matrix) of all amino acid positions at the interface as well as their respective occupancies. Our goal is to establish a platform for analysis of the relationship between IFR characteristics and binding properties/specificity for bi-molecular complexes.
We propose a novel method for describing binding properties and delineating serine proteases specificity by compiling an exhaustive table of interface forming residues (IFR) for serine proteases and their inhibitors. Currently, the Protein Data Bank (PDB) does not contain all the data that our analysis would require. Therefore, an in silico approach was designed for building corresponding complexes
The IFRs are obtained by "rigid body docking" among 70 structurally aligned, sequence wise non-redundant, serine protease structures with 3 inhibitors: bovine pancreatic trypsin inhibitor (BPTI), ecotine and ovomucoid third domain inhibitor. The table (matrix) of all amino acid positions at the interface and their respective occupancy is created. We also developed a new computational protocol for predicting IFRs for those complexes which were not deciphered experimentally so far, achieving accuracy of at least 0.97.
The serine proteases interfaces prefer polar (including glycine) residues (with some exceptions). Charged residues were found to be uniquely prevalent at the interfaces between the "miscellaneous-virus" subfamily and the three inhibitors. This prompts speculation about how important this difference in IFR characteristics is for maintaining virulence of those organisms.
Our work here provides a unique tool for both structure/function relationship analysis as well as a compilation of indicators detailing how the specificity of various serine proteases may have been achieved and/or could be altered. It also indicates that the interface forming residues which also determine specificity of serine protease subfamily can not be presented in a canonical way but rather as a matrix of alternative populations of amino acids occupying variety of IFR positions.
- Serine Protease
- Protein Data Bank
- Hydrophobic Residue
- Polar Residue
Serine proteases play an important role in processes such as blood clotting, digestion and in some pathways of cell development . Serine proteases can hydrolyze either peptide bonds or esters. Proteases digest proteins by hydrolyzing the peptide bonds which are responsible for keeping amino acids together [2, 3]. The cleavage specificity of elastase, trypsin, chymotrypsin and other serine proteases depends on the volume/size, form/shape, and polarity/charge/hydrophobicity of the specific part of a protein surface where a substrate will be docking - the specificity pocket [4, 5]. There are three amino acid residues responsible for the enzymatic activity that are present in all serine proteases, which are denominated as the catalytic triad: His 57, Asp 102 and Ser 195 (chymotrypsin numbering system is used throughout - see ). Interestingly, out of those three amino acids, only Asp 102 does not make part of the interface (the definition of which is based on decreased solvent accessible area upon substrate/inhibitor binding). This is due to the fact that Asp 102 is already not accessible to solvent in isolated enzyme, because this amino acid is located at the very bottom of the active site cleft where solvent molecules (water) can not access it. The role of the first two amino acids (His 57 and Asp 102) during trypsin catalysis, for example, is to function as a proton shuttle. Trypsin cleaves peptides after Lys and Arg residues with the co-participation of Asp 189, which interacts with the positive charge on peptide [7, 8]. Chymotrypsin, on the other hand, cleaves proteins after aromatic (and also large hydrophobic) residues . To achieve such specificity, in chymotrypsin, one can easily identify the existence of the hydrophobic pocket, normally shielded by Met 192. Thrombin is a protease which cleaves peptides with more specificity than trypsin: it requires Arg on "P1" position [10, 11]. Our main objective in this work is to expand the scope of studies that analyze enzyme specificity by including into observations not only the catalytic triad and binding pocket but also a wider category of amino acids which we have named the Interface Forming Residues (IFR) [12–15]. Namely, only a part of the molecular surface is shielded from solvent upon formation of a bi-molecular complex. The residues having a lower accessibility to solvent upon complex formation have an important role in the process of docking and also in defining specificity [15, 16]. Therefore, we were motivated to calculate which amino acids are shielded from solvent in the bi-molecular complexes involving serine proteases and different types of inhibitors and then create a table (matrix) of all amino acid positions at the interface and their respective occupancies. By mapping those amino acids as a specific IFR, we are now able to analyze characteristics of each position; and by doing so, we also can make position-specific alignment among different subfamilies of serine proteases. The key step we needed to solve during our procedural approach was to find a sufficient number of PDB  structures containing complexes of serine proteases with respective inhibitors. It became clear that we would need to either produce those by some novel method or abandon our work because there were not enough samples available in the PDB. The solution to this challenge is presented in details in materials and methods.
Interface Forming Residue (IFR) tables
In order to provide a sufficient volume of data for analysis of the interfaces around active sites of serine proteases (even though there is no enough structural information available currently on complexes formed by one particular protease with any specific substrate and/or inhibitor) we needed to employ the in silico approach for building corresponding complexes. The key feature of our work is mapping the IFR 3D profile into a 2D matrix--from a known enzyme-inhibitor structure to those with no known structure for such complex. Mapping is done after structurally aligning all serine proteases with non-redundant sequences.
List of all PDB files containing serine protease structures used for structural alignment
Analysis of the tabulated IFR data (as presented in the Figure 1) clearly indicates subfamily differences with respect to characteristics, position and frequency of occupancy of interface residues. The data compiled here can be used as an additional tool for characterizing function and specificity. This is particularly relevant for planning structure/sequence changes intended to alter protein activity, e.g., from trypsin to chymotrypsin type activity. More generally, one subfamily type of serine protease can be adjusted to behave like another if corresponding changes are made in an initial protease sequence. Attempts have been made to alter protein activity [4, 19], but they have focused on a single or, at most, 5 to10 residues. Our data, shown on Figure 1, indicate that such alterations can be achieved by eliminating key differences in IFR occupancy.
In general, positions were designated as "Ph" if they were occupied by mostly (present in at least 50% of complexes) P olar and occasionally some h ydrophobic residues. Similarly, IFR positions occupied by only C harged residues were marked as "C", those with only P olar residues as "P" and those occupied by only H ydrophobic residues as "H". A combination of letters would mean presence of residues belonging to multiple/distinct amino acid classes. Capital letters are used to designate a predominant residue presence (again, present in at least 50% of complexes analysed) while small letters designate an occasional presence.
There are less restrictive positions which can be occupied by different classes of amino acids in either one sub family (such as position 143 and 175) or across all subfamilies (positions 37, 60, 96, 97, 192 and 224). A specific analysis of the position 97, shows that it is less restrictive for trypsins (positively charged: K, R; polar: N, Y; negatively charged: D) and elastases (polar: N; negatively charged: D; hydrophobic: V) than for chymotrypsins (hydrophobic: L, F) and thrombins (positively charged: R).
From the data presented in Figures 2 and 4 (restrictive IFR positions of complexes formed with BPTI and ovomucoid third domain inhibitors, respectively) IFR locations 94 and 172 are shown to contain only polar residues (no presence of hydrophobic residues was found in contrast to the same locations in the Figure 3). By comparing the data from the figures 2, 3 and 4, the complexes of serine proteases with ecotine inhibitor compile 13 restrictive (gray), 10 half-restrictive (light gray) and 17 non-restrictive IFR positions (total: 40 positions). In the case of the BPTI and ovomucoid inhibitors, respective data are: 14, 7 and 15 (36); 11, 7 and 16 (34). Those numbers indicate that ecotine binding to serine proteases involves 32.5% restrictive, 25% less restrictive and 42.5% non restrictive IFR positions while BPTI and ovomucoid show respectively: 39%, 19.5% and 41.5%; 32%, 20.5% and 47%. Consequently, these numbers show that the ovomucoid inhibitor binding to serine proteases is less specific than the binding of ecotine, which in turn is less specific than BPTI. The experimental Ki [M] values reported in the literature for BPTI against trypsin is 6 × 10-14 while for the chymotrypsin it goes down to 9 × 10-9 and for elastase to 3.5 × 10-6 (ref: http://www.gbiosciences.com/ResearchProducts/Aprotinin.aspx). On the other side, turkey ovomucoid third domain was reported to interact with very similar association constant with eight different serine proteinases-bovine chymotrypsins A and B, porcine pancreatic elastase I, proteinase K, Streptomyces griseus proteinases A and B, and subtilisin-Carlsberg . Pál et al.  reported the following Ki (M) values for ecotine against trypsin, chymotrypsin and elastase, respectively: 1 × 10-12, 4 × 10-12 and 1.3 × 10-9.
Amino acid composition at serine protease interfaces
Some subfamilies do not follow these patterns. The interface of the streptogrisin subfamily shows a high presence of glycine residues for complexes with each of the three inhibitors (negative gray bars in Figure 5), while the rest of the surface shows preference for polar residues (green bars). Elastase shows the lowest differences among interface and surface preferences for 4 classes of amino acids defined in this work. Charged residues were found to be uniquely prevalent at the interfaces between the "miscellaneous-virus" subfamily and the three inhibitors. This prompts speculation about how important this difference in IFR characteristics is for maintaining virulence of those organisms. The upper part of the Figure 5 shows a strong presence of hydrophobic residues at the interfaces (together with polar residues including glycine), while the bottom part of the Figure 5 shows an absence (except for the 3 experimental complexes) of the hydrophobic residues at the interface (left portion of this graph). The bottom part of the Figure 5 shows an increase in the presence of hydrophobic residues at the free enzyme surface area. The surfaces (not including interface areas) have a prevalence of charged residues while the interface contains more polar (including glycine) residues (with some exceptions as described above). Thus, the IFR pocket of serine proteases is not formed by predominantly hydrophobic residues; it is a rather polar environment. Also, the interface contains much fewer (with one exception being "miscellaneous-virus") charged residues (as compared to the rest of the protease surface).
The percent of the total enzyme buried area occupied by amino acid sub-classes
Inhibitor bound to serine proteases
[%] Hydrophobic residues in buried area
[%] Charged residues in buried area
[%] Polar residues at buried area (glycine %)
[%] (Charged + Polar) residues in buried area
Ovomucoid Third domain
Those differences may be aligned with the corresponding function that each protease performs. However, it is important to realize the boundaries we should not cross when applying what we have learned here based on inhibitor-protease binary complexes to substrate-protease complexes. Nevertheless, we assume that inhibitors of serine proteases bind to the active site clefts of their target proteases in a manner that is thought to resemble the binding mode of substrates (Laskowski et al. ). Trypsin and thrombin handle the positive charges of a polypeptide by imposing the (complementary) charge/polarity of their own IFR surface. Chymotrypsin mitigates the spatial constraints imparted by large hydrophobic (and some polar) residues present in the substrate/inhibitor by accommodating glycines and hydrophobic residues into its own interface. Elastases, on the other hand, utilize hydrophobic amino acids among their IFRs to complement small, hydrophobic amino acids at the cleavage site of a polypeptide. The general trends for interfaces described above, when combined with the selectivity of IFR positions discussed earlier in this section, provide a more complete description of the specificity of serine protease subfamilies in both general and detailed way, although, as we expected from the beginning, there is no canonical code that we may convey from this analysis.
Benchmarking the accuracy of the "hard docking" procedure for IFR identification
Evaluation of the success rate (ACC) of predicting IFR ensemble
Coagulation factor VIII
Evaluation of plausibility for in silico generated bi-molecular complexes
The presence of steric clashes involving enzyme and inhibitor residues was analyzed to evaluate our hard docking methodology. Steric clashes could yield unfavourable interactions within complexes that would lead to their decreased stability which might be translated to their faster de complexation in vivo.
The data collected, analyzed and presented in this work serves as a tool for understanding binding properties in general and protease specificity in particular, based on Interface Forming Residue (IFR) profile alignment.
The procedure we have developed for this work describes a): the ensemble of IFRs for any complex formed between a serine protease and an inhibitor and b): the differences in interface characteristics between various proteases with respect to specificity. From our data one may not extract a simple, concise and very straightforward rule for separation among proteases in terms of specificity (defined here mainly by IFR position occupancy and characteristics). Nature obviously prepared a mechanism for fine tuning of the activity that yields terms of specificity for each subfamily of serine proteases. There is obviously some space left for overlaps among IFR characteristics and positions that dictate the specificity for each sub family and consequently, what could have been a simple table with reduced complexity in terms of "colors" and characteristics of amino acids employed at each IFR position, became somewhat blurred. The data presented here are precise in pointing to such granular/spectral distribution in IFR occupancy that would be directly coupled to the specificity type of studied enzymes. It is our perception that the beauty of the adaptability of molecular mechanisms, expressed here in form of their specificity, can still be presented in a such simplified way (described in Figures 1 to 4) showing how specificity can be fine-tuned from different points of entry (multiple, rather than single): IFR positions and characteristics of amino acids that occupy those critical positions. The procedure we described may also open the possibility to yield some details that may account for differences in how strong might be the binding between an enzyme and an inhibitor (an issue we need to explore in the future with more details based on elaborated IFR tables and experimental data available in literature).
It has been established that increased structural plasticity in the binding pocket increases the variation of substrate size that can fit into the critical space directed toward the catalytic triad which correspondingly decreases the specificity . Consequently, it is expected that the higher the stiffness around the binding pocket, the higher the selective pressure will be on a particular substrate. Specificity then can be seen as directly proportional to structural limitations imposed first by the size of the docking space and then by the physical and chemical characteristics of this space. By focusing our attention on the type of residues occupying the enzyme and inhibitor interfaces, we have set aside the evaluation of how plasticity of binding pocket influences enzyme specificity. This factor, however, is not taken as a non important one. In fact, we will be undertaking further examination in order to understand better how plasticity may be accounted for by IFR properties - therefore we will be tackling this task in the future. As a start, the residue types occupying the interface positions implicitly contain the sort of information which relates directly to plasticity of the site.
The work presented here offers insight into the structure/function relationship of serine proteases. The superposition of structurally aligned backbones of serine proteases showed that they use essentially identical scaffolding and achieve a variety of specificities by varying surface residues. Previously, there have been attempts to modify the functionality of serine proteases. These attempts usually focus on several residues and generally result in the loss of enzyme activity. Our approach explains why and corroborates with the results presented by Ma et al.  and Novozymes Biotech, Inc. (Davis, CA, US) which patented a technology (Microbial trypsin mutants having chymotrypsin activity - United States Patent 20050037368) by confirming necessity for several IFR amino acids to be either substituted, or deleted ("no-show" event) or to be introduced in original protease sequence in order to transform trypsin to chymotrypsin.
Based on the results we are reporting here, we are poised to assemble, in the near future, the serine protease superfamily interface data resource as an expanding collection of sequence, structural, and functional information about the serine proteases interface forming residues. A combination of graphics, images and numerical data will be used to aid in the complete analysis of structure/function relationships that still require our attention before we may fully understand the precise role of each amino acid employed at the particular location by enzymes in order to achieve desired specificity.
In order to provide a sufficient volume of data for analysis of the interfaces around active sites of serine proteases (even if there is no currently available information on complex formation of one particular protease with any specific substrate and/or inhibitor) we needed to employ the in silico approach for building corresponding complexes. The key feature of our work is mapping the IFR 3D profile into a 2D matrix--from a known enzyme-inhibitor structure to those with no known structure for such complex. Mapping is done after structurally aligning all serine proteases with non-redundant sequences.
Selection of serine protease: structural data set
The PDB IDs for serine proteases were obtained from SCOP database  parseable files (release 1.73). All 1086 PDB IDs from family b.47 were selected. FASTA files relative to these IDs were obtained in the PDB. A BLAST search (blastp), using default parameters, was run to determine the percentage of identity between the selected sequences. Those sequences with lower than 95% identity were selected for further examination (meaning that there should be at least 12 to 15 different residues occupying corresponding positions among selected sequences. In general, the sequences are 240 to 260 amino acids long, which means that applied filtering eliminates structures obtained after single, or even after limited multiple, point mutations). From the resulting subset (which we named here as the "primary datamart"), further selection eliminated all but those structures showing the highest resolution (lower numerical values in Å) and best R-value. This yielded what we named as a "secondary datamart". The "secondary datamart" at this point contains 67 serine proteases. Those structures covered the following sub families: tryptase, chymase, chymotrypsins, elastase, exfoliative toxin, glutamyl endopeptidase, kallikrein, NS3 protease, streptogrisin, trypsin, venom, thrombin, serine proteases from prokaryotes (which we named as "miscellaneous-prokaryotes"), serine proteases from viruses (which we called "miscellaneous-virus") and serine proteases that do not fit any of the mentioned sub families which we called "miscellaneous". The above described division resulted in 15 "sub-families" on which we applied additional manual inspection (in order to re-confirm their classification) by consulting the following sources: PDBsum , BRENDA  and SCOP parseable files.
The list of all corresponding PDB IDs for serine proteases used in this work is presented in the Table 1.
Selection of experimentally described enzyme/inhibitor structural complexes
From the 57 available PDB files containing inhibitor/serine protease complexes (data from January 2009), comprising 11 SCOP inhibitor families, the three largest families were selected and the same approach as described above was followed to select for what we named here as the three "prototype complexes". Complexes containing BPTI, ecotin and ovomucoid inhibitors correspond respectively to g.8, b.16 and g.68 SCOP subfamilies and are represented, respectively, by 1FY8 (inhibiting rat trypsin), 1AZZ (inhibiting crab collagenase) and 1CHO (inhibiting cattle alpha-chymotrypsin) PDB files. We named this group as "prototype complexes" because they represent the experimentally described complexes with structures deposited into the PDB.
Before the alignment procedure took place, we needed to edit the PDB files. The 1CHO.PDB file was edited to replace the three enzyme chain IDs (E, F, G) with a single "chain E". This was done in order to easily distinguish the enzyme chains from the inhibitor chain named "I". For the 1AZZ file, containing two chains, only chain A (enzyme) and chain C (inhibitor) were used (the PDB file contains a description of a hetero-dimer). The 1FY8 was not edited before further processing. In order to obtain an IFR ensemble for the serine proteases lacking an available structure for the complex with chosen inhibitor (or any inhibitor at all), we first structurally aligned all 70 selected serine proteases (67 from the "secondary datamart" plus 3 serine proteases from the "prototype complexes") using PrISM  and its set of default parameters, http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PrISM. The PrISM software also provided a multiple sequence alignment (being generated as an output obeying the structural alignment), which was saved and parsed to preserve both residue numbering and residue positions occupied in the structural alignment.
Rigid body docking
As the primary purpose was to delineate a general area defined as an interface, additional precision, as obtained by minimization and dynamics procedures (generally applied to remove space-clashes), was not considered critical and therefore not applied in this stage of the work. It is important to note that many complexes are in fact "forced" to be formed in silico by a "rigid body docking" procedure, but due to various factors, including insufficient shape complementary between the inhibitor side and corresponding binding pocket on the enzyme side, they would form with a certain degree of instability and would therefore be short lived. A qualitative description of this scenario is described in Figure 9 (also see Figure 6. for an estimate of the degree of space clashing among enzymes and inhibitors in all in silico formed complexes).
Surface determination in rigid body docked complexes
The three sets of 70 enzyme-inhibitor complex structures were also studied with respect to solvent accessibility of surface amino acids (calculated both before and upon inhibitor binding). The change in solvent accessibility of surface amino acids for both inhibitor and enzyme (in isolation and upon binding) was calculated by the Surfv algorithm . Residues with a change in solvent accessibility were compiled into the ensemble of IFRs.
Interface Forming Residues Table
For each set of 70 complexes, we mapped all of the sequence positions that belonged to the IFR ensemble (table shown at Figure 1). We took the native trypsin-BPTI complex (1FY8) as a reference for sequence numbering because it follows standard serine protease sequence numbering. All residues showing a detectable loss of accessible area were labeled in the multiple structure-based sequence alignment obtained with PrISM according to the residue positions in the sequence of 1FY8 chain E. In the case of gap in the reference sequence, a letter was added to the previous number (designating the position of the particular amino acid in a given sequence), so that the label keeps following the reference sequence numbering. Columns without at least one residue labeled as an IFR were eliminated to facilitate the analysis. For the case of BPTI, 36 residues occupying IFR positions were compiled (table shown at Figure 2) and ranged from positions 37 to 228 (also see Figure 1). Corresponding numbers for the ecotine and ovomucoid third domain are 40 IFRs from 37 to 226 and 34 IFRs from 37 to 217A, respectively. The multiple alignment of IFRs for the 70 serine proteases complexed with BPTI is presented in the Figure 1 (the corresponding tabular data for ecotine and ovomucoid inhibitor are not shown here). The derivatives from the tables with 70 serine proteases bound to BPTI, ecotine, and ovomucoid third domain are presented in the tables shown at Figures 2, 3 and 4; there we have only those columns which had some prevalent residues occupying that specific position. Particular differences among 4 principal classes of serine proteases were aimed. In other words, some positions were eliminated as they had a same residue present in each of the four sub-families of serine proteases (such as the position of HIS 57 which does not contribute to differentiation among 4 subfamilies). Those positions which were not identified as an IFR for the particular pair of serine protease and inhibitor are presented by a "-" sign. It is important to mention that the Asp_102 is not seen in Figure 1 due to a very peculiar characteristic of the catalytic pocket (as explained previously in the "Background" session).
Once we constructed the 3 sets of 70 complexes of proteases with respective inhibitors, we made an inquiry into the relationships between structural characteristics and specificity of enzymes. We calculated the difference in residue type occupancy of the total enzyme surface and the respective interfaces. The goal was to find if the interface was different from the rest of the protein surface in terms of types of residues present. Using the Surfv algorithm, we calculated the total enzyme surface area and determined the fraction occupied by each of four classes of amino acids. The same procedure was applied for the IFR area. Amino acid classes are as indicated: Charged, Polar, Hydrophobic and Glycine (a single member sub group of polar). The results of this inquiry are shown in Figure 5, as well as tabulated in the Table 2, and are discussed in more detail in results and discussion sections.
Amino acid classification and color coding
Before we can discuss our findings, it is necessary to explain the founding principles for amino acid classification (and their color coding) that we adopted for this work. In this paper we decided to pay special attention to three amino acid classes (hydrophobic, polar and charged) and also to treat, in a separate way, the three amino acids G, C and H. In the case of Gly, its occasionally given separate consideration because of its peculiarity: Gly has one hydrogen atom instead of the side chain (in a sense, Gly residues received an individual treatment here just as in the Ramachandran plot , for example). In case of the Cys, special treatment is given because it is the only residue that forms disulfide bridges. Lastly, His is treated separately because this residue is "neutral polar" at physiological pH but positively charged when protonated.
The amino acid color code is an adaptation of "aminochromography" suggested by William Taylor . Residues AVLIMFP are colored grey (small hydrophobic [A and P]) and also large hydrophobic [VLIMF]). Residues STYNQWG are colored green (polar - we are including Y and W to this ensemble although both residues have aromatic rings: phenol and indole, respectively). Those aromatic rings are often considered as another important feature for Y and W - the source of their hydrophobicity. The hydrophobic character of W and Y is often imposed above their polar characteristics. However, we consider W as an ambivalent residue (polar and hydrophobic) and Y as a polar residue. Residues D and E are colored red (negatively charged). Residues R and K are colored blue (positively charged). Residue C is colored yellow (disulphide bridge forming), and residue H [positively charged but neutral at neutral pH] is coloured cyan (light-blue).
The classification of amino acids follows the scheme given in "Structural Bioinformatics", page 18  and is supported by hydrophobicity scales by Janin , Kyte and Doolitle  and Rose . It is known that some of the hydrophobicity tables cited here (and in other papers found in general literature) present an apparent contradiction with respect to classification of certain amino acids. For example G and W are often a subject of controversy in terms of being considered polar or hydrophobic. We needed to decide which option to use in our paper, and the preceding description defines which one we opted for. It is clear therefore that an alternative classification is also possible although, we would consider it less precise.
Validation of Methodology
To validate our methodology for defining IFRs through "hard docking", we selected 16 native complexes (not present in the "secondary datamart") of serine proteases bound to BPTI--the inhibitor with the largest number of native complexes available in the PDB (Table 3 lists the PDB IDs of the selected complexes). We investigated how successful our methodology was at predicting IFR residues. Namely, by using the 1FY8 complex as a reference and transferring the inhibitor coordinates to the enzyme chains of the sixteen complexes, we compared real IFRs and ones obtained by rigid body docking. This was done by aligning the enzyme chains of the sixteen complexes using PyMOL and rigid body docking the inhibitor of 1FY8. The docking results were then compared to those reported for native bound inhibitors. Afterwards, we used the Surfv algorithm to determine the residues composing the interface (IFR) of the sixteen in silico complexes. The same algorithm was used to determine the IFRs in the respective native complexes. A comparison between rigid body docking and native IFR in natural complexes was then possible. From these results, a "confusion matrix"  was generated. Detailed analysis of corresponding data is given in results and discussion sections.
We would like to sincerely thank José Miguel Ortega (Biodados-UFMG) for critically reviewing the manuscript and his indispensable comments and suggestions. This work was developed as part of CR PhD thesis, which has been sponsored by the Brazilian Ministry of Education (CAPES) and Foundation for Research Support of Minas Gerais State (FAPEMIG). We would also like to express our appreciation to Marko Spasic from University of California, Berkeley for the careful reading and editing of this manuscript.
- Neurath H: Proteolytic processing and physiological regulation. Trends Biochem Sci 1989, 14: 268–271. 10.1016/0968-0004(89)90061-3View ArticlePubMedGoogle Scholar
- Hedstrom L: An overview of serine proteases. Curr Protoc Protein Sci 2002., Chapter 21: Unit 21 10Google Scholar
- Hedstrom L: Serine protease mechanism and specificity. Chem Rev 2002, 102: 4501–4524. 10.1021/cr000033xView ArticlePubMedGoogle Scholar
- Perona JJ, Craik CS: Structural basis of substrate specificity in the serine proteases. Protein Sci 1995, 4: 337–360. 10.1002/pro.5560040301PubMed CentralView ArticlePubMedGoogle Scholar
- Bartlett GJ, Porter CT, Borkakoti N, Thornton JM: Analysis of catalytic residues in enzyme active sites. J Mol Biol 2002, 324: 105–121. 10.1016/S0022-2836(02)01036-7View ArticlePubMedGoogle Scholar
- Greer J: Comparative modeling methods: application to the family of the mammalian serine proteases. Proteins 1990, 7: 317–334. 10.1002/prot.340070404View ArticlePubMedGoogle Scholar
- Ruhlmann A, Kukla D, Schwager P, Bartels K, Huber R: Structure of the complex formed by bovine trypsin and bovine pancreatic trypsin inhibitor. Crystal structure determination and stereochemistry of the contact region. J Mol Biol 1973, 77: 417–436. 10.1016/0022-2836(73)90448-8View ArticlePubMedGoogle Scholar
- Read RJ, James MN: Refined crystal structure of Streptomyces griseus trypsin at 1.7 A resolution. J Mol Biol 1988, 200: 523–551. 10.1016/0022-2836(88)90541-4View ArticlePubMedGoogle Scholar
- Matthews BW, Sigler PB, Henderson R, Blow DM: Three-dimensional structure of tosyl-alpha-chymotrypsin. Nature 1967, 214: 652–656. 10.1038/214652a0View ArticlePubMedGoogle Scholar
- Bode W, Mayr I, Baumann U, Huber R, Stone SR, Hofsteenge J: The refined 1.9 A crystal structure of human alpha-thrombin: interaction with D-Phe-Pro-Arg chloromethylketone and significance of the Tyr-Pro-Pro-Trp insertion segment. Embo J 1989, 8: 3467–3475.PubMed CentralPubMedGoogle Scholar
- Schechter I, Berger A: On the size of the active site in proteases. I. Papain. Biochem Biophys Res Commun 1967, 27: 157–162. 10.1016/S0006-291X(67)80055-XView ArticlePubMedGoogle Scholar
- Neshich G, Mazoni I, Oliveira SR, Yamagishi ME, Kuser-Falcao PR, Borro LC, Morita DU, Souza KR, Almeida GV, Rodrigues DN, et al.: The Star STING server: a multiplatform environment for protein structure analysis. Genet Mol Res 2006, 5: 717–722.PubMedGoogle Scholar
- Neshich G, Borro LC, Higa RH, Kuser PR, Yamagishi ME, Franco EH, Krauchenco JN, Fileto R, Ribeiro AA, Bezerra GB, et al.: The Diamond STING server. Nucleic Acids Res 2005, 33: W29–35. 10.1093/nar/gki397PubMed CentralView ArticlePubMedGoogle Scholar
- Neshich G, Mancini AL, Yamagishi ME, Kuser PR, Fileto R, Pinto IP, Palandrani JF, Krauchenco JN, Baudet C, Montagner AJ, Higa RH: STING Report: convenient web-based application for graphic and tabular presentations of protein sequence, structure and function descriptors from the STING database. Nucleic Acids Res 2005, 33: D269–274. 10.1093/nar/gki111PubMed CentralView ArticlePubMedGoogle Scholar
- Neshich G, Togawa RC, Mancini AL, Kuser PR, Yamagishi ME, Pappas G Jr, Torres WV, Fonseca e Campos T, Ferreira LL, Luna FM, et al.: STING Millennium: A web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence. Nucleic Acids Res 2003, 31: 3386–3392. 10.1093/nar/gkg578PubMed CentralView ArticlePubMedGoogle Scholar
- Da Silva MC, de Sa MF, Chrispeels MJ, Togawa RC, Neshich G: Analysis of structural and physico-chemical parameters involved in the specificity of binding between alpha-amylases and their inhibitors. Protein Eng 2000, 13: 167–177. 10.1093/protein/13.3.167View ArticlePubMedGoogle Scholar
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235PubMed CentralView ArticlePubMedGoogle Scholar
- Sridharan S, Nicholls A, Honig B: A new vertex algorithm to calculate solvent accessible surface areas. Biophys J 1992, 61: A174.Google Scholar
- Page MJ, Cera ED: Is it possible to transform trypsin to thrombin? IUBMB Life 2007, 59: 413–414. 10.1080/15216540600994357View ArticlePubMedGoogle Scholar
- Laskowski M, Ardelt W: Turkey Ovomucoid Third Domain Inhibits Eight Different Serine Proteinasees of Varied Specificity on the Same ..Leu18-Glu19...Reactive Site. Biochemistry 1985, 24: 5313–30. 10.1021/bi00341a007View ArticlePubMedGoogle Scholar
- Pál G, Sprengel G, Patthy A, Gráf L: Aleteration of the specificity of ecotine, an E coli serine proteinase inhibitor, by site directed mutagenesis. FEBS Letters 1994, 342: 57–60. 10.1016/0014-5793(94)80584-9View ArticlePubMedGoogle Scholar
- Janin J, Bahadur RP, Chakrabarti P: Protein-protein interaction and quartenary structure. Q Rev of Biophys 2008, 41(2):133–180.View ArticleGoogle Scholar
- Laskowski M, Qasim MA: What can the structures of enzyme-inhibitor complexes tell us about the structures of enzyme substrate complexes? Biochim Biophys Acta 2000, 1477(1–2):324–37.View ArticlePubMedGoogle Scholar
- da Silveira CH, Pires DE, Minardi RC, Ribeiro C, Veloso CJ, Lopes JC, Meira W Jr, Neshich G, Ramos CH, Habesch R, Santoro MM: Protein cutoff scanning: A comparative analysis of cutoff dependent and cutoff free methods for prospecting contacts in proteins. Proteins 2009, 74: 727–743. 10.1002/prot.22187View ArticlePubMedGoogle Scholar
- Bone R, Silen JL, Agard DA: Structural plasticity broadens the specificity of an engineered protease. Nature 1989, 339: 191–195. 10.1038/339191a0View ArticlePubMedGoogle Scholar
- Ma W, Tang C, Lai L: Specificity of trypsin and chymotrypsin: loop-motion-controlled dynamic correlation as a determinant. Biophys J 2005, 89: 1183–1193. 10.1529/biophysj.104.057158PubMed CentralView ArticlePubMedGoogle Scholar
- Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res 2008, 36: D419–425. 10.1093/nar/gkm993PubMed CentralView ArticlePubMedGoogle Scholar
- Laskowski RA, Chistyakov VV, Thornton JM: PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res 2005, 33: D266–268. 10.1093/nar/gki001PubMed CentralView ArticlePubMedGoogle Scholar
- Barthelmes J, Ebeling C, Chang A, Schomburg I, Schomburg D: BRENDA, AMENDA and FRENDA: the enzyme information system in 2007. Nucleic Acids Res 2007, 35: D511–514. 10.1093/nar/gkl972PubMed CentralView ArticlePubMedGoogle Scholar
- Yang AS, Honig B: Sequence to structure alignment in comparative modeling using PrISM. Proteins 1999, (3):66–72. 10.1002/(SICI)1097-0134(1999)37:3+<66::AID-PROT10>3.0.CO;2-KGoogle Scholar
- Delano WL: The PyMOL Molecular Graphics System.Delano Scientific, San Carlos, CA, USA; 2002. [http://www.pymol.org]Google Scholar
- Ramachandran GN, Ramarkrishnan C, Sasisekharan V: Stereochemistry of polypeptide chain conformations. J Mol Biol 1963, 7: 95–99. 10.1016/S0022-2836(63)80023-6View ArticlePubMedGoogle Scholar
- Taylor WR: Residual colours: a proposal for aminochromography. Protein Eng 1997, 10: 743–746. 10.1093/protein/10.7.743View ArticlePubMedGoogle Scholar
- Philip E: Bourne and Helge Weissig. Structural Bioinformatics, Willey-Liss; 2003.Google Scholar
- Janin J: Surface and Inside Volumes in Globular Proteins. Nature 1979, 277: 491–492. 10.1038/277491a0View ArticlePubMedGoogle Scholar
- Kyte J, Doolite R: A Simple Method for Displaying the Hydropathic Character of a Protein. J Mol Biol 1982, 157: 105–132. 10.1016/0022-2836(82)90515-0View ArticlePubMedGoogle Scholar
- Rose G, Geselowitz A, Lesser G, Lee R, Zehfus M: Hydrophobicity of Amino Acid Residues in Globular Proteins. Science 1985, 229: 834–838. 10.1126/science.4023714View ArticlePubMedGoogle Scholar
- Kohavi R, Provost F: Glossary of Terms. Machine Learning 1998, 30: 271–274. 10.1023/A:1017181826899View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.