Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

BMC Structural Biology

Table 1 Prediction accuracy for different protein sequence representations based on 10-fold cross-validation tests.

		Classifier¹
Feature representation	Feature selection²	FlexRP (Logistic Regression)	SVM	C4.5	IB1	Naïve Bayes
Composition vector	N/A	67.37%	68.74%	57.70%	57.33%	65.20%
PSI-BLAST profile	N/A	66.38%	67.35%	62.47%	61.62%	66.24%
Binary encoding	No selection	66.38%	66.06%	58.82%	59.92%	61.84%
Binary encoding	Linear coefficient	69.58%	68.74%	62.82%	57.05%	69.10%
Binary encoding	Entropy based	69.19%	68.74%	63.24%	58.21%	69.00%
K-spaced AA pairs	Linear coefficient	74.37%	74.60%	66.04%	68.74%	72.97%
K-spaced AA pairs	Entropy based	79.51%³	78.46%	66.25%	66.93%	76.01%

¹ The tested classifiers include the proposed FlexRP method, Support Vector Machine (SVM), decision tree (C4.5), instance-based learner (IB1), and Naïve Bayes.
² The sequence representations based on binary encoding and on frequencies of the k-spaced amino acid pairs were each processed using two feature selection methods (linear coefficient and entropy based).
³ The best result is shown in bold.
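
The k-spaced amino acid pair representation listed in Table 1 can be illustrated with a minimal sketch. It assumes that a k-spaced pair consists of two residues separated by exactly k other residues, and that each feature is the relative frequency of a pair for a given k. The function names, the max_k parameter, and the normalization are illustrative assumptions and are not taken from the FlexRP implementation.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_counts(sequence, k):
    """Count pairs (a, b) where b follows a with exactly k residues in between.

    Assumption: k = 0 corresponds to adjacent residues (dipeptides).
    """
    gap = k + 1
    return Counter(
        (sequence[i], sequence[i + gap])
        for i in range(len(sequence) - gap)
    )


def k_spaced_pair_features(sequence, max_k):
    """Map (a, b, k) to the relative frequency of that pair, for k = 0..max_k.

    Pairs containing non-standard residues count toward the total but get
    no feature of their own (an illustrative choice, not from the paper).
    """
    features = {}
    for k in range(max_k + 1):
        counts = k_spaced_pair_counts(sequence, k)
        total = sum(counts.values())
        for a, b in product(AMINO_ACIDS, repeat=2):
            features[(a, b, k)] = counts[(a, b)] / total if total else 0.0
    return features


# Hypothetical toy sequence, for illustration only.
features = k_spaced_pair_features("ACAC", max_k=2)
```

With max_k = 2 this yields 20 × 20 × 3 = 1200 features per sequence. A feature selection step such as those named in the table would then keep only a subset of them.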

ISSN: 1472-6807