K2D2: Estimation of protein secondary structure from circular dichroism spectra
© Perez-Iratxeta and Andrade-Navarro; licensee BioMed Central Ltd. 2008
Received: 17 July 2007
Accepted: 13 May 2008
Published: 13 May 2008
Circular dichroism spectroscopy is a widely used technique to analyze the secondary structure of proteins in solution. Predictive methods use the circular dichroism spectra from proteins of known tertiary structure to assess the secondary structure contents of a protein with unknown structure given its circular dichroism spectrum.
We developed K2D2, a method with an associated web server to estimate protein secondary structure from circular dichroism spectra. The method uses a self-organized map of spectra from proteins with known structure to deduce a map of protein secondary structure, which is then used to make the predictions.
The K2D2 server is publicly accessible at http://www.ogic.ca/projects/k2d2/. It accepts as input a circular dichroism spectrum and outputs the estimated secondary structure content (alpha-helix and beta-strand) of the corresponding protein, as well as an estimated measure of error.
Circular dichroism (CD) spectroscopy is a widely used technique to analyze the secondary structure of proteins in solution. It relies on the fact that the optical activity of a protein in the 170–240 nm wavelength range depends on the backbone orientation of its peptide bonds, with minor influences from the side chains. Because different types of secondary structure produce characteristic spectra, the spectrum of a given protein can be used to estimate its percentage content of the major secondary structure types. During the past 30 years, many methods that address this problem have been developed, applying a variety of approaches ranging from singular value decomposition to optimization algorithms, regression, and neural networks [2–13]. One of these methods is K2D. It uses a self-organizing map (SOM) algorithm, a type of neural network. Spectra from proteins with solved tertiary structure are used as a training set to produce the SOM. From the resulting map of spectra, "secondary structure maps" are derived. The secondary structure maps are directly related to the spectra SOM, and this relation is applied to estimate the alpha-helix and beta-strand percentages of a protein given its CD spectrum.
Here we present K2D2, a re-implementation of K2D using the latest version of the SOM_PAK package. K2D2 accepts a broader wavelength range for the input spectra (190 to 240 nm, in addition to the 200 to 240 nm range originally accepted by K2D) and has been trained with a much larger set of spectra. As a result, K2D2 shows a considerable improvement in performance over K2D.
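The relation between the spectra SOM and the secondary structure maps can be illustrated with a minimal Python sketch. It is not the K2D or SOM_PAK code: it assumes that a prediction is read from the single node whose spectrum is closest (by Euclidean distance) to the input, and the names `closest_node`, `predict`, `som_nodes` and `structure_map`, as well as the three-point toy spectra, are hypothetical.

```python
import math

def closest_node(spectrum, som_nodes):
    """Return (index, distance) of the node spectrum nearest to the input.

    Assumption: similarity is measured with the Euclidean distance.
    """
    best_index, best_distance = None, math.inf
    for index, node in enumerate(som_nodes):
        distance = math.dist(spectrum, node)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance

def predict(spectrum, som_nodes, structure_map):
    """Look up the (alpha, beta) fractions stored for the closest node."""
    index, distance = closest_node(spectrum, som_nodes)
    alpha, beta = structure_map[index]
    return alpha, beta, distance

# Toy data (invented for illustration): three nodes, three wavelengths each.
som_nodes = [[10.0, -5.0, -4.0], [-2.0, -3.0, -1.0], [0.0, 1.0, -2.0]]
structure_map = [(0.80, 0.02), (0.10, 0.45), (0.05, 0.10)]

query = [9.0, -4.5, -4.0]
alpha, beta, distance = predict(query, som_nodes, structure_map)
print(f"alpha={alpha:.2f} beta={beta:.2f} distance={distance:.3f}")
```

The distance returned alongside the prediction is kept because it indicates how far the input lies from anything represented on the map.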
CD spectra and structural data
Table 1. Performance on benchmarks for K2D and K2D2.
We selected the best-resolution tertiary structures for the proteins in the reference set from the Protein Data Bank (PDB). We ran the DSSP program on the PDB files to assign a secondary structure class to each amino acid of every protein in the reference set. We assigned alpha-helix to residues labeled H and beta-strand to residues labeled E, and then computed the fraction of amino acids of the protein in each conformation (see Table 1). In addition to the CDDATA.43 spectra, the training set included six additional reference spectra: three spectra of poly(L-lysine) in aqueous solution in alpha, beta and random conformations, and three model spectra in alpha, beta and random conformations constructed from 15 proteins.
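The mapping from DSSP labels to secondary structure fractions described above can be sketched in a few lines of Python. This is only an illustration: it assumes the per-residue DSSP states have already been extracted into a string, and the function name `secondary_structure_fractions` and the example string are hypothetical (the `-` character merely stands in for any state other than H or E).

```python
def secondary_structure_fractions(dssp_states):
    """Return (alpha, beta) fractions from a per-residue DSSP state string.

    Residues labeled 'H' count as alpha-helix and residues labeled 'E'
    count as beta-strand; every other state counts toward neither.
    """
    n_residues = len(dssp_states)
    if n_residues == 0:
        raise ValueError("empty DSSP assignment")
    alpha = dssp_states.count("H") / n_residues
    beta = dssp_states.count("E") / n_residues
    return alpha, beta

# Invented 10-residue assignment: 4 H, 3 E, 3 other states.
example = "HHHH-EEE-T"
alpha, beta = secondary_structure_fractions(example)
print(f"alpha={alpha:.2f} beta={beta:.2f}")
```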
Spectra SOM and secondary structure maps
Estimated maximum error
In principle, the more similar a given spectrum is to its closest node in the spectra SOM, the better the prediction. In other words, if a spectrum is very different from anything the method has "previously seen" (that is, the training set), the results cannot be expected to be very accurate. To provide users with an estimate of the maximum total error of the prediction (the sum of the errors of the alpha and beta predictions), we used the distances to the closest map node and the corresponding total errors observed in the benchmark. At a given distance, the estimated maximum error is the largest total error that was observed in the benchmark. Thus, the total error of the prediction is expected to be less than the estimated maximum error. If the distance is larger than any observed in the benchmark, a message indicates that the maximum error cannot be estimated; in this situation the structure prediction should be disregarded.
K2D2 can be accessed at the K2D2 site (http://www.ogic.ca/projects/k2d2/). Users must choose the input wavelength range (200–240 nm or 190–240 nm) and provide the spectrum of the protein of interest (see Figure 1A). Spectra must be in Δε units. Because results are better for the 190–240 nm range, this option is recommended whenever the user can supply spectra in this range; we maintain the shorter range because data down to 190 nm are sometimes difficult to obtain. The results consist of the estimated percentages of residues in alpha-helix and beta-strand, an estimated error for the prediction, and a plot comparing the predicted spectrum with the user input (see Figure 1B). The plot provides a visual assessment of the accuracy of the prediction.
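One possible reading of this lookup is sketched below in Python. The text does not specify how benchmark distances are grouped, so the sketch makes an explicit assumption: the estimated maximum error at a distance d is the largest total error among benchmark cases whose distance is at most d. The function name `estimated_max_error` and the benchmark values are hypothetical.

```python
def estimated_max_error(distance, benchmark):
    """Estimate the maximum total error for a prediction.

    benchmark: list of (distance_to_closest_node, observed_total_error).
    Returns None when the distance exceeds everything in the benchmark,
    meaning that no estimate is possible.
    Assumption: the estimate is the largest error among benchmark cases
    that are at least as close to the map as the query.
    """
    pairs = sorted(benchmark)
    if not pairs:
        raise ValueError("benchmark is empty")
    if distance > pairs[-1][0]:
        return None
    # If the query is closer than every benchmark case, fall back to the
    # error of the closest benchmark case.
    errors = [err for d, err in pairs if d <= distance] or [pairs[0][1]]
    return max(errors)

# Invented benchmark pairs for illustration only.
benchmark = [(0.5, 0.10), (1.0, 0.25), (2.0, 0.18), (3.0, 0.40)]
print(estimated_max_error(2.5, benchmark))  # largest error up to 2.5
print(estimated_max_error(3.5, benchmark))  # beyond the benchmark range
```

Returning `None` rather than a number mirrors the behavior described above, where the user is told that no estimate is possible instead of being given an extrapolated value.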
Results and Discussion
The performance of K2D2 was measured in a leave-one-out benchmark, comparing real and predicted values by means of the Pearson correlation coefficient (r) and the root mean square deviation (RMSD). We obtained average r values of 0.93 for alpha and 0.82 for beta, and average RMSD values of 0.08 and 0.09, respectively (see Table 1). In comparison, K2D was reported to produce average r values of 0.91 for alpha and 0.73 for beta, and average RMSD values of 0.11 and 0.14, respectively. The performance of K2D2 for alpha helix did not improve much, as expected, because K2D's prediction was already very good. In contrast, the prediction for beta strand improved considerably. Furthermore, K2D was originally tested with only 24 proteins, and when we evaluated it with the expanded set of 43 proteins we observed an even larger difference (see Table 1). Thus, K2D2 produces markedly better predictions than K2D.
We have presented K2D2, a re-implementation of the K2D method for prediction of protein secondary structure from CD spectra. By using a larger wavelength range and a larger training dataset, K2D2 represents an important improvement over K2D.
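The two scores used in the benchmark can be computed as in the following Python sketch. It only covers the scoring step: it assumes that each predicted value was obtained from a model trained without the corresponding protein, and the function names and the example numbers are hypothetical.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    covariance = sum((o - mean_o) * (p - mean_p)
                     for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(var_o * var_p)

def rmsd(observed, predicted):
    """Root mean square deviation between observed and predicted values."""
    n = len(observed)
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / n)

# Invented alpha-helix fractions for four proteins.
observed = [0.10, 0.35, 0.60, 0.80]
predicted = [0.15, 0.30, 0.55, 0.85]
print(f"r={pearson_r(observed, predicted):.3f} "
      f"RMSD={rmsd(observed, predicted):.3f}")
```

The two measures are complementary: r captures whether predictions rank proteins correctly, while RMSD reports the typical size of the error in fraction units.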
Table 2. Reported performance for different implementations of published methods.
K2D2 compares well with other published methods for prediction of protein secondary structure from CD spectra (see Table 2). We note, however, that the performance values are not readily comparable across methods because the methods have been trained and evaluated with different datasets (see the effect of this on K2D's performance in Table 1). Moreover, the performance of methods that predict different numbers of secondary structure types is also not comparable, because the variance of the predictions of methods that predict more types would be smaller, as the predictions are normalized.
In any case, K2D2 and its predecessor, K2D, have a feature that we believe makes them unique among these methods: they explicitly warn the user when a prediction is not reliable, based on the similarity between the user's input spectrum and the spectrum computed from the training set. In summary, we believe that K2D2 represents a significant improvement, and we have striven to make it easy to access and to use. We encourage users to suggest further improvements and to share novel CD spectra of proteins of known structure, which can be used by us and by other developers of similar methods to improve the accuracy of the predictions.
Finally, since other methods might offer features that we have not considered, and since the benchmark results do not appear to differ greatly, we recommend that users follow the recent literature to see which prediction programs are used by colleagues doing similar analyses, and that they try more than one method if the predictions are unconvincing.
Availability and requirements
Project name: K2D2
Project home page: http://www.ogic.ca/projects/k2d2/
Operating systems: Platform independent
Programming language: Perl
List of abbreviations used
RMSD: root mean square deviation
SOM: self-organizing map
PDB: Protein Data Bank
We thank N. Sreerama and R.W. Woody for making the spectral data accessible at their CDPRO site. MAA is a recipient of a Canada Research Chair.
- Fasman GD: Circular Dichroism and the Conformational Analysis of Biomolecules. New York: Plenum Press; 1996.
- Chen YH: A new approach to the calculation of secondary structures of globular proteins by optical rotatory dispersion and circular dichroism. Biochem Biophys Res Commun 1971, 44: 1285–1291. 10.1016/S0006-291X(71)80225-5
- Brahms S: Determination of protein secondary structure in solution by vacuum ultraviolet circular dichroism. J Mol Biol 1980, 138: 149–178. 10.1016/0022-2836(80)90282-X
- Hennessey JP Jr: Information content in the circular dichroism of proteins. Biochemistry 1981, 20: 1085–1094. 10.1021/bi00508a007
- Provencher SW: Estimation of globular protein secondary structure from circular dichroism. Biochemistry 1981, 20: 33–37. 10.1021/bi00504a006
- Perczel A: Convex constraint analysis: a natural deconvolution of circular dichroism curves of proteins. Protein Eng 1991, 669–679.
- Böhm G: Quantitative analysis of protein far UV circular dichroism spectra by neural networks. Protein Eng 1992, 5(3): 191–195. 10.1093/protein/5.3.191
- Andrade MA: Evaluation of secondary structure of proteins from UV circular dichroism spectra using an unsupervised learning neural network. Protein Eng 1993, 6(4): 383–390. 10.1093/protein/6.4.383
- Sreerama N: A self-consistent method for the analysis of protein secondary structure from circular dichroism. Anal Biochem 1993, 209: 32–44. 10.1006/abio.1993.1079
- Johnson WC: Analyzing protein circular dichroism spectra for accurate secondary structures. Proteins 1999, 35: 307–312. 10.1002/(SICI)1097-0134(19990515)35:3<307::AID-PROT4>3.0.CO;2-3
- Sreerama N: Estimation of protein secondary structure from circular dichroism spectra: comparison of CONTIN, SELCON, and CDSSTR methods with an expanded reference set. Anal Biochem 2000, 287: 252–260. 10.1006/abio.2000.4880
- Unneberg P: SOMCD: method for evaluating protein secondary structure from UV circular dichroism spectra. Proteins 2001, 42: 460–470. 10.1002/1097-0134(20010301)42:4<460::AID-PROT50>3.0.CO;2-U
- Manavalan P: Variable selection improves the prediction of protein secondary structure from circular dichroism. Anal Biochem 1987, 167: 76–85. 10.1016/0003-2697(87)90135-7
- Kohonen T: SOM_PAK: The Self-Organizing Map Program Package. Espoo, Finland: Helsinki University of Technology; 1996.
- Pancoska P: Comparison of and limits of accuracy for statistical analyses of vibrational and electronic circular dichroism spectra in terms of correlations to and predictions of protein secondary structure. Protein Sci 1995, 4: 1384–1401.
- Chang CT: Circular dichroic analysis of protein conformation: inclusion of the beta-turns. Anal Biochem 1978, 91:
- Sreerama N: Analysis of protein circular dichroism spectra based on the tertiary structure classification. Anal Biochem 2001, 299: 271–274. 10.1006/abio.2001.5420
- Wallace BA: Analyses of circular dichroism spectra of membrane proteins. Protein Sci 2003, 12: 875–884. 10.1110/ps.0229603
- Berman H: The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 2007, 35: D301–3. 10.1093/nar/gkl971
- Kabsch W: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22: 2577–2637. 10.1002/bip.360221211
- Yang JT: Calculation of protein conformation from circular dichroism. Methods Enzymol 1986, 130: 208–269.
- Lees JG: Novel methods for secondary structure determination using low wavelength (VUV) circular dichroism spectroscopic data. BMC Bioinformatics 2006, 7: 507. 10.1186/1471-2105-7-507
- Lees JG: A reference database for circular dichroism spectroscopy covering fold and secondary structure space. Bioinformatics 2006, 22: 1955–1962. 10.1093/bioinformatics/btl327