Table 1. Prediction accuracy of different protein sequence representations, based on 10-fold cross-validation tests.

From: Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

| Feature representation | Feature selection² | FlexRP (Logistic Regression)¹ | SVM¹ | C4.5¹ | IB1¹ | Naïve Bayes¹ |
|---|---|---|---|---|---|---|
| Composition vector | N/A | 67.37% | 68.74% | 57.70% | 57.33% | 65.20% |
| PSI-BLAST profile | N/A | 66.38% | 67.35% | 62.47% | 61.62% | 66.24% |
| Binary encoding | No selection | 66.38% | 66.06% | 58.82% | 59.92% | 61.84% |
| Binary encoding | Linear coefficient | 69.58% | 68.74% | 62.82% | 57.05% | 69.10% |
| Binary encoding | Entropy based | 69.19% | 68.74% | 63.24% | 58.21% | 69.00% |
| K-spaced AA pairs | Linear coefficient | 74.37% | 74.60% | 66.04% | 68.74% | 72.97% |
| K-spaced AA pairs | Entropy based | **79.51%**³ | 78.46% | 66.25% | 66.93% | 76.01% |

  1. The tested classifiers include the proposed FlexRP method, Support Vector Machine (SVM), decision tree (C4.5), instance-based learner (IB1), and Naïve Bayes.
  2. The sequence representations based on binary codes and frequencies of the k-spaced amino acid pairs were processed using two feature selection methods.
  3. The best result is shown in bold.
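
As a rough illustration of the "frequencies of the k-spaced amino acid pairs" representation named in the table, the sketch below counts ordered residue pairs separated by exactly k other residues and normalizes the counts to frequencies. It is only a minimal sketch under stated assumptions: the pair definition (k = 0 meaning adjacent residues), the per-k normalization, the 20-letter alphabet, and the function names `k_spaced_pair_counts` and `k_spaced_pair_frequencies` are all assumptions made here for illustration. They are not taken from the table and may differ from the paper's actual feature construction.

```python
from collections import Counter
from itertools import product

# Assumption: the standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_counts(sequence, k):
    """Count ordered pairs of residues with exactly k residues between them.

    Assumed definition: k = 0 gives adjacent pairs (dipeptides),
    k = 1 gives pairs of the form a-x-b, and so on.
    """
    gap = k + 1  # index distance between the two residues of a pair
    return Counter(
        sequence[i] + sequence[i + gap]
        for i in range(len(sequence) - gap)
    )


def k_spaced_pair_frequencies(sequence, max_k):
    """Build a feature dict keyed by (k, pair) for k = 0..max_k.

    Each count is divided by the number of pairs found at that spacing
    (an assumed normalization); pairs that never occur get 0.0.
    """
    features = {}
    for k in range(max_k + 1):
        counts = k_spaced_pair_counts(sequence, k)
        total = sum(counts.values())
        for a, b in product(AMINO_ACIDS, repeat=2):
            pair = a + b
            features[(k, pair)] = counts[pair] / total if total else 0.0
    return features


if __name__ == "__main__":
    feats = k_spaced_pair_frequencies("ACDA", max_k=1)
    print(len(feats))          # 2 spacings x 400 possible pairs
    print(feats[(1, "AD")])    # frequency of A-x-D among 1-spaced pairs
```

Under these assumptions each spacing contributes 400 features, so a feature selection step such as the two listed in the table would typically follow to reduce the feature set before classification.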