- Research article
- Open Access
Context dependent reference states of solvent accessibility derived from native protein structures and assessed by predictability analysis
© Singh and Ahmad; licensee BioMed Central Ltd. 2009
- Received: 09 September 2008
- Accepted: 27 April 2009
- Published: 27 April 2009
Solvent accessibility (ASA) of amino acid residues is often transformed from absolute values of exposed surface area to normalized relative values. This normalization is typically attained by assuming a highest-exposure conformation based on the extended state of the residue when it is flanked by Ala or Gly on both sides, i.e. the solvent exposed area in Ala-X-Ala or Gly-X-Gly. The exact sequence context, the folding state of the residues, and the actual environment of a folded protein, which impose additional constraints on the highest possible (or highest observed) values of ASA, are currently ignored. Here, we analyze the statistics of these constraints and examine how the normalization of absolute ASA values using context-dependent Highest Observed ASA (HOA) instead of context-free extended state ASA (ESA) of residues can influence the performance of sequence-based prediction of solvent accessibility. Characterization of buried and exposed states of residues based on this normalization is also shown to provide better enrichment of DNA-binding sites in exposed residues.
We compiled the statistics of highest observed ASA (HOA) of residues in their different contexts and analyzed their distribution in all 400 possible combinations for each residue type. We observe that many tripeptides are more exposed than ESA and that HOA residues are often found in turn, coil and bend conformations. On the other hand, several residues are never observed in an exposure state close to ESA values. A neural network trained with HOA-normalized data outperforms the one trained with ESA-normalized values. However, the improvements are subtle in some residues, while they are more significant in others.
HOA-based normalization of solvent accessibility from native structures is proposed; it shows improvement in sequence-based predictability, as well as enrichment of interface residues on the surface. There may still be some difference between the highest possible ASA and the highest observed ASA due to an insufficiently covered space of ASA distribution in the PDB, which limits the overall improvement in prediction to a relatively modest degree.
- Prediction Performance
- Extended State
- Solvent Accessibility
- Accessible Surface Area
- Mean Absolute Error
Protein three-dimensional structure prediction directly from amino acid sequence is an important issue in bioinformatics. An intermediate approach to this problem is to predict the so-called one-dimensional structural properties of proteins. The solvent accessibility or accessible surface area (ASA) of an amino acid residue in a protein structure is one such property, and knowledge of this property can significantly enhance the overall structure and function prediction of proteins [1, 2]. Given an amino acid sequence, the goal of such prediction is to estimate the ASA of each residue using previously observed ASA values taken from known protein structures. The knowledge from previously observed structures is modeled using machine learning and other methods [3–16]. Various methods of predicting ASA from sequence or sequence-derived evolutionary information have been developed, such as neural networks [8–12], Bayesian analysis, information theory [14, 15], multiple linear regression [16, 17], and support vector machines [18–22]. Among these, machine-learning methods such as neural networks [8–12] and support vector machines [18–22] have been shown to be the most effective for ASA prediction. Although methods to predict the solvent accessibility of each atom have also been developed, the more actively pursued area is the estimation of residue-wise ASA.
In almost all ASA prediction methods, solvent accessibility is first normalized to its relative value. On the one hand, this is required for training some computational models with bounded outputs; on the other, it gives a better idea of the fractional exposure of a residue relative to a hypothetical maximally exposed residue. Merely restricting the model outputs to finite values could have been achieved by simply rescaling all residue ASAs ignoring their identity (e.g. transforming all values by a sigmoidal function). However, the values obtained by such a transformation would have little physical meaning. Moreover, the trained parameters required to model such transformed values may make the relationship between residue environments and target ASA values even more difficult to model. Thus, each residue ASA is typically normalized by its corresponding extended state ASA (ESA) value, which uses the reference state of Ala-X-Ala for normalizing the ASA of residue X. Thus 20 ESA values are used to normalize all residue ASAs.
We argue that this type of normalization, although better than a single value for all amino acids, still suffers from two shortcomings. First, the currently employed extended state of a tripeptide has no practical meaning for residues in folded proteins, and hence reference states should come from folded proteins rather than extended states. Second, the structural constraints imposed by the actual sequence neighbors of a residue differ from those present when the residue is flanked by Ala residues on its N- and C-terminal sides. The first of these two issues (folded context versus extended state) could be addressed by using the highest observed ASA as the reference rather than the extended state. The second issue, sequence context, may be addressed by using 20 × 20 × 20 possible reference states instead of 20. There will still be a limitation that the highest observed ASA may not represent the highest possible ASA value due to the insufficient number of solved structures, and one has to be content with the approximations introduced by this.
Our primary benchmark in ASA normalization here is the effect of the improved scaling criterion on the prediction of ASA from sequence. An unbiased assessment of the role of normalization in prediction requires high quality data sets and prediction models developed for the two systems of normalization under similar conditions. Most non-redundant data sets of protein structures, including those used for ASA prediction, are based on similarity and resolution conditions and largely ignore the incidence of missing atoms in structures [e.g. [18, 22, 25]]. This may be especially important for accurate calculation of solvent accessibility for each residue. To unambiguously determine the role of normalization in prediction, we develop new data sets of protein structures by a systematic quality check on structures and by removing samples with missing atoms. Neural networks are then trained using two different sets of values as target vectors. Finally, the ASA values from both cases are transformed to absolute values in area units, and performance is compared in terms of the mean absolute error and the coefficient of correlation between predicted and experimental values of ASA in area units. Results indicate that HOA-based normalization can improve the performance of neural network based prediction. The improvement depends on the type of residue and on the score used to measure it. Improvement in the correlation coefficient between predicted and experimental values reaches as high as 10% using HOA instead of ESA for normalization. We also demonstrate that this type of normalization may be effective in estimating interface residues simply from their over-exposed status.
Distribution of context-dependent ASAs
Many tripeptides have higher ASA in folded state than the Ala-X-Ala extended state
Number of over-exposed residues with their exposed surface area (ASA) greater than the Ala-X-Ala extended state ASA (ESA).
In some tripeptides ESA values are never observed
Figure 1 shows that all residues in general, but the hydrophobic ones in particular (e.g. Trp), have several tripeptide contexts which lie to the left of the dropped line showing ESA values in their plots. This means that there are many tripeptide environments in which the central residue remains in a partially buried state, either due to the folded nature of the tripeptide or due to inevitable long range contacts. This argument is limited by the fact that some of the highest observed ASA (HOA) values may not represent the actual highest possible ASA, because structures showing these residues in a more exposed state may yet be solved in the future.
HOA histograms of some residue types have sharp peaks
Some residues such as His, Lys and Ser show a sharp peak in their histograms, and most HOA values are close to ESA values. This suggests that the highest exposure these residues can have depends only weakly on their neighbors, and HOA-normalized values will at best rescale the ESA-normalized values in these cases. On the other hand, residues such as Cys and Trp have less sharp peaks in their histograms, showing that the highest ASA of these residues depends strongly on their sequence neighbors. However, flatness of histograms is also caused by the low frequency of these residues in protein structures. It remains to be explored whether these residues will continue to show flat histograms when more data on their tripeptides become available.
Observation of ASA values higher than ESA has led in the past to relative ASA being more than 100%. Using HOA-normalization, relative values will never cross 100% and may be more suitable for using a machine learning method for prediction. However, the tripeptide data may be modified when more data becomes available and the results presented here may also need minor revisions with more solved structures.
Correlations between HOA- and ESA- normalized values are strong with subtle differences
Frequency of residues in Ala-X-Ala conformations, their extended state ASA (ESA) values and highest observed ASA (HOA) obtained from the entire data set of proteins (8.9 million residues, overall including residues with different sequence neighbors).
Further statistics and implications of even these small differences in HOA and ESA-normalized values are examined in the following sections.
Overall ASA distribution of a tripeptide
HOA residues primarily come from coil, turn and bend conformations
Distribution of 8000 HOA residue environments in various secondary structures (table columns: % HOA residues; number of HOA residues).
Why do HOA values differ from ESA values?
The actual protein environment, instead of a purely computational extended state, is used in our proposed method. Traditionally, Ala-X-Ala or Gly-X-Gly environments are generated in simulation software. These programs produce a hypothetical state of a tripeptide, which may never actually be observed in a protein or even in a fully capped tripeptide. In our approach, we scan the entire set of observed tripeptides in actual crystal structures, so that the effect of solvent, charge screening and the effect of subsequent peptide bonds on neighbors are implicitly taken care of. Thus the new approach of finding a normalization value is more realistic than the currently used method.
Instead of assuming Ala residues on both sides, a detailed residue context is used, which accounts for additional constraints as well as for the potential role of neighbors in over-exposing some residues in a given context.
HOA from native structures versus molecular calculations
Are there sufficient examples in the PDB to obtain a highest observed ASA (HOA) close enough to the highest possible ASA (HPA) for each tripeptide environment?
Is there any advantage of using native protein structures over conformations derived from molecular simulations of tripeptides?
We discuss these issues in the following.
Is there sufficient data for obtaining HOA values?
As stated above, the Highest Observed ASA (HOA) analyzed here may in some cases differ from the Highest Possible ASA (HPA), because the ensemble formed from the data set may not contain a sufficient number of representatives among the protein structures solved so far. This insufficiency has partly been the reason that the extended state has generally been used as a reference state. To address this concern, we first note that the overall number of residues from which HOA values have been extracted here is ~8.9 million (the number of residues on which the effect has been analyzed is 376000). This is a sufficiently large data size, and if some of the 8000 tripeptide patterns have not shown up sufficiently in an HOA ensemble of millions of residues, they must indeed be rare, which is unlikely to affect the results of the current analysis. We only use the HOA of a residue for normalization if the HOA query for that tripeptide was based on a minimum number of observations in the universal ensemble of tripeptides (the actual number in the predicted data is likely to be much smaller). To estimate the effect of insufficiency of the HOA search space, we present some additional statistics as follows.
Of all the 8000 tripeptide patterns, more than 97% occur at least 100 times in the overall search space (OSS). A frequency of 100 in the OSS corresponds to only a few occurrences (~4 for each of these tripeptide patterns) in the normalization benchmark dataset (NBD), whose size (~376000) is about 4% of the OSS. Note that even if about 4% of the 8000 patterns are infrequent, this does not mean that anywhere close to 4% of residues were normalized by HOA data from infrequent patterns: the 4% refers to the 8000 possible tripeptide patterns, and the rarer patterns are even rarer in the NBD, because the remaining 96% of patterns account for far more residues than these 4% do. Thus, although HOA values for very few tripeptides are based on a small number of observations, this is unlikely to affect the results of the current study. The statistics will also be refined from time to time to take into account any newly observed ASA of a tripeptide that surpasses its currently listed HOA value.
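As a rough check of the scaling argument above, the expected number of NBD occurrences for a pattern seen 100 times in the OSS can be worked out from the two data set sizes quoted in the text. The symbol n_NBD is introduced here only for illustration, and the estimate assumes that patterns are sampled in the NBD in proportion to its size.

```latex
\[
\frac{|\mathrm{NBD}|}{|\mathrm{OSS}|}
  \approx \frac{3.76 \times 10^{5}}{8.9 \times 10^{6}}
  \approx 0.042,
\qquad
n_{\mathrm{NBD}} \approx 100 \times 0.042 \approx 4 .
\]
```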
Is there any advantage in using native conformations over simulated structures?
Implications to ASA Prediction
Comparison of the prediction performance obtained by using ESA- and HOA-normalized target ASA values.
We also compared the mean absolute error in absolute ASA prediction for data corresponding to each of the 8000 tripeptides and found that in 5220 (65.3%) cases the mean absolute error of HOA-normalized tripeptides was lower, compared to 2771 (34.6%) cases in which the HOA-normalized prediction error was higher (9 cases showed no difference). This shows that there are many more tripeptide contexts in which prediction performance is improved by HOA-normalization than contexts in which performance fell (apparently due to noise in the prediction model). A test of significance on individual tripeptides is not possible because the prediction is performed on a (smaller) non-redundant data set of proteins. There are about 376000 residue-wise predictions for 8000 patterns, implying ~47 instances per tripeptide type on average. Given that prediction itself is not 100% accurate and prediction errors have large standard deviations, it is not possible to detect statistical significance in the differences between prediction performances for each of these tripeptide patterns.
Application to interface prediction
A Perl program to normalize ASA values by the proposed method is available for download at http://hoa.netasa.org/. This program converts absolute ASA values to HOA- or ESA-normalized values and vice versa. Users can also provide their own HOA data, which enables a quick update or a return to ESA values for some of the tripeptides. HOA data will be regularly updated if higher ASA values are observed for a new tripeptide.
In this study, we developed the statistics of highest observed ASA in various tripeptide environments of residues. Using ASA data normalized by these ASA values, we could predict ASA with ~15% MAE and a correlation of 0.67 from evolutionary information. Individual residues show varied degrees of improvement in their prediction when trained with data normalized by the new method. We also show that the exposed regions defined by the newly developed method of normalization are better enriched in binding sites for the DNA-binding proteins. It remains to be seen whether the proposed method of normalization has other universal applications, although the present observations suggest that trend.
Solvent accessibility information about protein structures was taken directly from the DSSP database available on the web http://swift.cmbi.ru.nl/gv/dssp. From all the 37,964 entries available in the DSSP database at the time of starting this work (October 2006), we removed those whose coordinate files in PDB had missing atoms and those whose resolution was poorer than 2.5 Å. All residues were checked for completeness to ensure the quality of tripeptide ASA data. This resulted in 18,758 structures, all of which were used to obtain the highest observed values of ASA for each tripeptide. This data set consists of more than 8.9 million residues and hence an identical number of tripeptides. We call this data set the overall search space (OSS), from which the highest observed ASA (HOA) is extracted.
For the purpose of evaluating predictive performance, we used a dataset taken from the protein sequence culling server PISCES with sequence identity less than 25% and an X-ray resolution cutoff of 2.5 Å. This dataset consists of 4478 protein chains. Chains with missing coordinates, unknown structure regions or length less than 30 amino acids were removed by an in-house program. Further, known membrane protein chains were also removed from the dataset. This resulted in 1708 protein chains. The DSSP program was used to calculate the residue solvent accessible surface area for a given protein structure. This dataset contains about 376000 residues and is called the normalization benchmark data set (NBD).
Residue context normalization values
Tripeptide patterns of solvent accessibility are similar to the 1P1N patterns of the look-up tables which we developed for predicting ASA. In that method, the ASA of a residue was assigned by taking the n-peptide environment of a residue from the query sequence and then scanning a previously compiled database of ASA values for such n-peptides. The database to be scanned is called a lookup table or ASA dictionary, and a residue's environment is defined as 1P1N, 2P2N etc. (P for previous and N for next neighbor). 1P1N refers to a tripeptide environment. However, 1P1N dictionaries consist of mean observed values in tripeptides, whereas we are interested in the highest observed ASA value here. There are 20 types of residue, each of which may be preceded by any of 21 environments (20 amino acid residues or the case of an absent neighbor at a terminus). So, there are 21 possibilities for the preceding position and an equal number of choices for the next neighbor. Some of these patterns have a very low frequency in the entire DSSP database. So, for our analysis and prediction, we considered only the patterns which occurred more than X times in our sample. ASA values of all other residue contexts, for which the corresponding normalization cannot be performed in the newly proposed system (due to insufficient data), were excluded from analysis. The value of X was taken as 30 in the current study. Values higher than 30 were also tried; 30 was found to strike an intuitive balance between retaining a reasonable number of tripeptide patterns and having enough data in each category.
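The compilation of the HOA dictionary described above can be sketched as follows. This is a minimal illustration and not the authors' program: the function name `build_hoa_table`, the `-` symbol for an absent terminal neighbor, and the input layout (one-letter sequence plus a list of absolute ASA values per chain) are all assumptions made for the example.

```python
from collections import defaultdict

TERMINUS = "-"  # hypothetical symbol for an absent neighbor at a chain end


def build_hoa_table(chains, min_count=30):
    """Return {tripeptide pattern: highest observed ASA of the central residue}.

    chains: iterable of (sequence, asa_values) pairs, where sequence is a
    one-letter string and asa_values holds one absolute ASA per residue.
    Only patterns observed more than min_count times are kept (X = 30 in
    the text).
    """
    highest = defaultdict(float)
    counts = defaultdict(int)
    for sequence, asa_values in chains:
        if len(sequence) != len(asa_values):
            raise ValueError("sequence and ASA list differ in length")
        padded = TERMINUS + sequence + TERMINUS
        for i, asa in enumerate(asa_values):
            pattern = padded[i:i + 3]  # previous, central, next residue
            counts[pattern] += 1
            if asa > highest[pattern]:
                highest[pattern] = asa
    return {p: highest[p] for p in counts if counts[p] > min_count}


# Toy data (invented values); min_count is lowered so the example keeps entries.
toy_chains = [("GAS", [50.0, 80.0, 60.0]), ("GAS", [40.0, 95.0, 70.0])]
toy_table = build_hoa_table(toy_chains, min_count=1)
```

Padding each chain with a terminus symbol gives the 21 neighbor states on either side mentioned in the text without special-casing the first and last residues.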
Normalization from the dictionary
To normalize the ASA of a residue, we first look up its tripeptide pattern in the table. For example, if we want to normalize the ASA of Ala in a sequence where Ala occurs as Gly-Ala-Ser, the normalization starts with a search for the G-A-S pattern in the table. If the pattern is present in the table, we normalize the ASA of Ala by the HOA value of that pattern. If the pattern does not exist in the dictionary, we either fall back to the previous normalization method of Ala-X-Ala or exclude the residue from the analysis.
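The lookup-with-fallback rule can be sketched in a few lines. The function name `relative_asa` and the reference areas in the toy tables are placeholders chosen for illustration, not values from the published HOA or ESA data.

```python
def relative_asa(prev_res, residue, next_res, asa, hoa_table, esa_table=None):
    """Return ASA as a percentage of its reference state, or None if excluded.

    The tripeptide pattern is searched in hoa_table first. If it is absent
    and esa_table is given, the context-free Ala-X-Ala value for the central
    residue is used instead; otherwise the residue is excluded (None).
    """
    pattern = prev_res + residue + next_res
    reference = hoa_table.get(pattern)
    if reference is None and esa_table is not None:
        reference = esa_table.get(residue)
    if not reference:
        return None
    return 100.0 * asa / reference


# Placeholder reference areas in square angstroms (not published values).
toy_hoa = {"GAS": 120.0}
toy_esa = {"A": 110.0}

in_table = relative_asa("G", "A", "S", 60.0, toy_hoa, toy_esa)   # HOA reference
fallback = relative_asa("L", "A", "K", 55.0, toy_hoa, toy_esa)   # ESA fallback
excluded = relative_asa("L", "A", "K", 55.0, toy_hoa)            # no fallback
```

Passing `esa_table=None` corresponds to the "exclude it from the analysis" option, while supplying an ESA table corresponds to the Ala-X-Ala fallback.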
Neural network details
This work primarily aims to study normalization method and therefore an established and widely used protocol for predicting ASA has been used. Thus, based on many published methods, including ours [e.g. [1, 8, 9, 12]], evolutionary information, amino acid composition and protein chain length are the descriptors used for the prediction model. Effort is not made to better the existing best performance for prediction, but to generate a simple reproducible model with identical inputs to the two normalizing methods, so that the role of normalization can be established.
It may be noted that normalizing ASA by HOA values prior to forming target vectors may be regarded as giving some kind of residue neighbor information to the feature vectors. However, residue neighbor information is implicitly provided in any sequence-based prediction anyway, and although the improvement in performance seen here could be due to this explicit availability in the initial weight matrix, the fact remains that such normalization improves model performance.
Evolutionary information, forming the input vectors for the prediction model, was generated using the program PSI-BLAST. The E-value cutoff for this purpose was 0.1, and similar sequences were searched in the non-redundant protein sequence database (NCBI NR database) to build the multiple alignments. Three iterations of PSI-BLAST were performed; no masking of low complexity regions or membrane domains was used. The alignments were represented as profiles or position-specific substitution matrices (PSSMs). PSSM rows provide the log-odds frequency of occurrence for the 20 amino acid residues at each position of the sequence. At positions where similar sequences are not observed, or where no other residue occurs at the given position of the aligned sequences, the PSSM row is simply the corresponding entry for that residue type in the BLOSUM62 substitution matrix. PSSM data obtained from BLASTPGP were directly used as inputs to our feed-forward neural network, consisting of an input, an output and a hidden layer.
Design and Training
There are 20 units for each residue from the PSSM. In order to allow a window to extend beyond the N-terminus and the C-terminus, a special null indicator was added for each residue. The protein sequences were presented to the neural networks as windows, or subsequences, of 17 residues including the amino acid of interest, which slide along the entire sequence. The total number of windows or patterns for a particular protein is therefore equal to the number of residues in the protein. Additional information on amino acid composition and chain length was also presented in the input vector. Therefore, the size of each input vector is 21 × 17 + 20 + 1 = 378 units. The Stuttgart Neural Network Simulator (SNNS) version 4.2 package with the default settings of the BP algorithm was used to train a fully connected, feed-forward neural network. The network architecture involves an input layer with as many nodes as there are input units, a hidden layer consisting of three nodes and an output layer consisting of a single node.
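The input layout of 21 × 17 + 20 + 1 = 378 units can be illustrated with a short sketch. The text does not specify how the null indicator, composition and chain length are encoded, so the choices below (a 0/1 flag, composition as fractions, raw chain length) and the name `encode_residue` are assumptions made only for this example.

```python
WINDOW = 17
HALF = WINDOW // 2
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_residue(pssm, sequence, position):
    """Build one 378-unit input vector for the residue at `position`.

    pssm: one row of 20 scores per residue of `sequence`.
    Each window slot contributes 20 PSSM scores plus a null flag that is 1.0
    when the slot falls beyond a chain terminus (assumed encoding).
    """
    vector = []
    for offset in range(-HALF, HALF + 1):
        j = position + offset
        if 0 <= j < len(pssm):
            vector.extend(list(pssm[j]) + [0.0])
        else:
            vector.extend([0.0] * 20 + [1.0])
    # Global features: amino acid composition (20 units) and chain length (1).
    composition = [sequence.count(a) / len(sequence) for a in AMINO_ACIDS]
    vector.extend(composition)
    vector.append(float(len(sequence)))
    return vector


# Toy chain with an all-zero profile, just to check the layout.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
toy_pssm = [[0.0] * 20 for _ in toy_sequence]
toy_vector = encode_residue(toy_pssm, toy_sequence, 0)
```

For the first residue of a chain, the eight window slots before it all carry the null flag, which is how the window extends beyond the N-terminus.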
Measurement of prediction performance
We adopted the same measures used in our earlier works. They are reproduced here for quick reference.
Mean absolute error (MAE)
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{ABS}(O_i - P_i) \qquad (1)$$
where the summation is carried out over all residues and N is the number of residues in the entire data. MAE is measured in percent units for relative ASA and in Å² for absolute values. O and P refer to observed and predicted values of ASA, and ABS indicates that the absolute errors are considered.
Pearson's correlation coefficient (CC)
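$$\mathrm{CC} = \frac{\sum_{i=1}^{N}(o_i - \bar{o})(p_i - \bar{p})}{\sqrt{\sum_{i=1}^{N}(o_i - \bar{o})^2 \, \sum_{i=1}^{N}(p_i - \bar{p})^2}} \qquad (2)$$
where o_i and p_i are the experimental and predicted values of relative solvent accessibility, respectively, and the bars denote their means over all N residues.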
Where MAE(HOA) and MAE(ESA) refer to the mean absolute error obtained by HOA- and ESA-normalized method respectively. MAE itself refers to relative or absolute ASA depending on which improvement is being measured. MAE is replaced by coefficient of correlation, when comparing performance by that score.
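The performance scores can be computed with a few lines of code. The sketch below uses the standard definitions of MAE and Pearson's correlation coefficient. The `percent_improvement` function is an assumption: it takes the improvement to be the reduction in MAE relative to the ESA baseline, which is one plausible reading of the description above rather than a formula confirmed by the text.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average of |O - P| over all residues."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_cc(observed, predicted):
    """Pearson's correlation coefficient between observed and predicted ASA."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def percent_improvement(mae_esa, mae_hoa):
    """ASSUMED form: reduction in MAE relative to the ESA baseline, in percent."""
    return 100.0 * (mae_esa - mae_hoa) / mae_esa


# Toy values (invented) to show the calls.
toy_mae = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
toy_cc = pearson_cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
toy_gain = percent_improvement(20.0, 18.0)
```

When the comparison is made on correlation coefficients instead of MAE, a higher value is better, so the sign convention of any improvement score would need to be reversed.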
Three-fold cross-validation was carried out. The whole dataset was randomly divided into three approximately equal parts. Training was done on two-thirds of the data and testing on the remaining third. After running this process three times, the averages of MAE and CC over the three test datasets were calculated and are listed in the results tables (Table 4).
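A three-fold split of this kind might look as follows. The text does not say whether the division was made over chains or over residues; the sketch assumes a chain-level split, and the function name and seed are illustrative.

```python
import random


def three_fold_splits(chain_ids, seed=0):
    """Yield (train, test) lists for three-fold cross-validation.

    Chains are shuffled once and dealt into three roughly equal folds; each
    fold serves as the test set once while the other two form the training
    set. Splitting by chain is an assumption of this sketch.
    """
    ids = list(chain_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::3] for i in range(3)]
    for k in range(3):
        test = folds[k]
        train = [c for j, fold in enumerate(folds) if j != k for c in fold]
        yield train, test


toy_splits = list(three_fold_splits(range(10)))
```

The reported MAE and CC would then be the averages of the scores obtained on the three test folds.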
Statistical significance of difference in prediction performance
Improvement in prediction performance is obtained by comparing the mean absolute error of prediction. However, since several types of normalization are considered, we reverse-transform all predicted and observed values to absolute area units before comparison. Thus the absolute error in absolute ASA prediction is used as the measure of performance in such comparisons. The overall performance of a neural network trained on a given normalization scheme is given by (1), with ASA referring to absolute area units in such comparisons. Reverse transformation to absolute units ensures that the comparisons of prediction performance are carried out on the same scale. The absolute error in each residue's ASA is computed, and p-values are obtained between the two error distributions being compared. A Student's t-test is used to assess the statistical significance of the difference between these two distributions. Since the error distributions are generally not normal, an additional test of significance was performed using the Mann-Whitney U-test, which gave largely similar results. Both tests of significance were carried out using modules in the open source programming language Octave http://www.octave.org.
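The reverse transformation and the two test statistics can be sketched in plain Python. This is an illustration only: the authors used Octave modules, the function names here are hypothetical, and the sketch stops at the test statistics (p-values would come from a statistics package). The pairwise Mann-Whitney count is quadratic in sample size and is meant for small examples, not for hundreds of thousands of residues.

```python
import math
from statistics import mean, variance


def to_absolute(relative_percent, reference_area):
    """Convert a relative ASA (percent) back to area units using its reference."""
    return relative_percent * reference_area / 100.0


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a-value > b-value, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Toy absolute errors (invented) for the two normalization schemes.
errors_esa = [2.0, 3.0, 4.0]
errors_hoa = [1.0, 2.0, 3.0]
toy_t = student_t(errors_hoa, errors_esa)
toy_u = mann_whitney_u(errors_hoa, errors_esa)
```

Because each prediction is converted with the same reference value that was used to normalize its target, errors from the ESA- and HOA-trained networks end up on a common scale before they are compared.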
Authors acknowledge the help from the Department of Biosciences, Jamia Millia Islamia (University) for providing lab facilities to conduct this research, as well as Prof. Seemi FB Khan for continued support.
- Rost B, Sander C: Improved prediction of protein secondary structure by using sequence profiles and neural networks. Proc Natl Acad Sci USA 1993, 90: 7558–7562. doi:10.1073/pnas.90.16.7558
- Wagner M, Adamczak R, Porollo A, Meller J: Linear regression models for solvent accessibility prediction in proteins. J Comput Biol 2005, 12: 355–369. doi:10.1089/cmb.2005.12.355
- Pascarella S, De Persio R, Bossa F, Argos P: Easy method to predict solvent accessibility from multiple protein sequence alignments. Proteins 1998, 32: 190–199. doi:10.1002/(SICI)1097-0134(19980801)32:2<190::AID-PROT5>3.0.CO;2-P
- Mucchielli-Giorgi MH, Hazout S, Tuffery P: PredAcc: prediction of solvent accessibility. Bioinformatics 1999, 15: 176–177. doi:10.1093/bioinformatics/15.2.176
- Li X, Pan XM: New method for accurate prediction of solvent accessibility from protein sequence. Proteins 2001, 42: 1–5. doi:10.1002/1097-0134(20010101)42:1<1::AID-PROT10>3.0.CO;2-N
- Gianese G, Bossa F, Pascarella S: Improvement in prediction of solvent accessibility by probability profiles. Protein Eng 2003, 16: 987–992. doi:10.1093/protein/gzg139
- Rost B, Sander C: Conservation and prediction of solvent accessibility in protein families. Proteins 1994, 20: 216–226. doi:10.1002/prot.340200303
- Cuff JA, Barton GJ: Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000, 40: 502–511. doi:10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q
- Ahmad S, Gromiha MM: NETASA: neural network based prediction of solvent accessibility. Bioinformatics 2002, 18: 819–824. doi:10.1093/bioinformatics/18.6.819
- Pollastri G, Baldi P, Fariselli P, Casadio R: Prediction of coordination number and relative solvent accessibility. Proteins 2002, 47: 142–153. doi:10.1002/prot.10069
- Ahmad S, Gromiha MM, Sarai A: Real-value prediction of solvent accessibility from amino acid sequence. Proteins 2003, 50: 629–635. doi:10.1002/prot.10328
- Adamczak R, Porollo A, Meller J: Accurate prediction of solvent accessibility using neural networks-based regression. Proteins 2004, 56: 753–767. doi:10.1002/prot.20176
- Thompson MJ, Goldstein RA: Predicting solvent accessibility: higher accuracy using Bayesian statistics and optimized residue substitution classes. Proteins 1996, 25: 38–47. doi:10.1002/(SICI)1097-0134(199605)25:1<38::AID-PROT4>3.3.CO;2-H
- Richardson CJ, Barlow DJ: The bottom line for prediction of residue solvent accessibility. Protein Eng 1999, 12: 1051–1054. doi:10.1093/protein/12.12.1051
- Naderi-Manesh H, Sadeghi M, Arab S, Moosavi Movahedi AA: Prediction of protein surface accessibility with information theory. Proteins 2001, 42: 452–459. doi:10.1002/1097-0134(20010301)42:4<452::AID-PROT40>3.0.CO;2-Q
- Carugo O: Predicting residue solvent accessibility from protein sequence by considering the sequence environment. Protein Eng 2000, 13: 607–609. doi:10.1093/protein/13.9.607
- Wang JY, Lee HM, Ahmad S: Prediction and evolutionary information analysis of protein solvent accessibility using multiple linear regression. Proteins 2005, 61: 481–491. doi:10.1002/prot.20620
- Yuan Z, Burrage K, Mattick JS: Prediction of protein solvent accessibility using support vector machines. Proteins 2002, 48: 566–570. doi:10.1002/prot.10176
- Kim H, Park H: Prediction of protein relative solvent accessibility with support vector machines and long-range interaction 3D local descriptor. Proteins 2004, 54: 557–562. doi:10.1002/prot.10602
- Yuan Z, Huang B: Prediction of protein accessible surface areas by support vector regression. Proteins 2004, 57: 558–564. doi:10.1002/prot.20234
- Minh NN, Jagath CR: Prediction of protein relative solvent accessibility with a two-stage SVM approach. Proteins 2005, 59: 30–37. doi:10.1002/prot.20404
- Wang JY, Lee HM, Ahmad S: SVM-Cabins: prediction of solvent accessibility using accumulation cutoff set and support vector machine. Proteins 2007, 68: 82–91. doi:10.1002/prot.21422
- Singh YH, Gromiha MM, Sarai A, Ahmad S: Atom-wise statistics and prediction of solvent accessibility in proteins. Biophysical Chemistry 2006, 124: 145–154. doi:10.1016/j.bpc.2006.06.013
- Oobatake M, Ooi T: Hydration and heat stability effects on protein unfolding. Prog Biophys Mol Biol 1993, 59: 237–284. doi:10.1016/0079-6107(93)90002-2
- Wang G, Dunbrack RL Jr: PISCES: a protein sequence culling server. Bioinformatics 2003, 19: 1589–1591. doi:10.1093/bioinformatics/btg224
- Raih MF, Ahmad S, Zheng R, Mohamed R: Solvent accessibility in native and isolated domain environments: general features and implications to interface predictability. Biophys Chem 2005, 114(1): 63–69. doi:10.1016/j.bpc.2004.10.005
- Porollo A, Meller J: Prediction-based fingerprints of protein-protein interactions. Proteins 2007, 66: 630–645. doi:10.1002/prot.21248
- Ahmad S, Gromiha MM, Sarai A: Analysis and prediction of DNA-binding proteins and their binding residues based on composition. Bioinformatics 2004, 20: 477–486. doi:10.1093/bioinformatics/btg432
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bond and geometrical features. Biopolymers 1983, 22: 2577–2637. doi:10.1002/bip.360221211
- Wang JY, Ahmad S, Gromiha MM, Sarai A: Look-up tables for protein solvent accessibility prediction and nearest neighbor effect analysis. Biopolymers 2004, 75: 209–216. doi:10.1002/bip.20113
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403–410.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.