Detailed comparison of protein structure benchmark sets. The figure compares the performance of TM-align on the complete set of similarity relationships defined by SCOP (left column) with its performance on the novel SCOP-CATH consensus benchmark set proposed in this study (right column). The plots show the classification errors in detail. Panels (a) and (b) show the distribution of scores for the different levels of the classification. Although the fold scores are somewhat shifted to the right, the score distributions overlap substantially, so no clear threshold allows a safe classification of structure pairs. Panels (c)-(f) compare the errors for the comprehensive and consensus benchmark sets. An error is counted whenever a wrong domain scores better than a correct domain. The errors are substantially reduced on the consensus set, panels (d) and (f). Finally, panels (g) and (h) summarize the errors (the number of wrong folds scored better than a given correct fold) as boxplots. Again, fewer errors are observed in the consensus set. In both sets, few wrong folds score better than the best-scoring correct domains, whereas many wrong folds score better than the correct members with low scores. See the main text for a more detailed description. Overall, the number of errors is reduced disproportionately (about 50% error reduction) relative to the reduction in the number of pairs in the consensus benchmark (about 16%).
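
The error count used in the caption (wrong domains scored better than a correct domain) can be illustrated with a minimal sketch. This is not the evaluation code of the study: the function name `count_wrong_above`, the input layout (one list of scores and one list of correctness flags per query) and the toy values are assumptions made for illustration only, and ties are assumed not to count as errors.

```python
from bisect import bisect_right


def count_wrong_above(scores, is_correct):
    """For each correct hit of one query, count the wrong hits that score
    strictly better. Hypothetical helper, not taken from the study.

    scores     -- similarity scores of all hits (higher is better)
    is_correct -- parallel list of booleans, True if the hit shares the
                  query's classification
    Returns the counts ordered from the best-scoring correct hit to the
    worst-scoring one.
    """
    # Wrong scores in ascending order, so bisect can locate a cut point.
    wrong = sorted(s for s, ok in zip(scores, is_correct) if not ok)
    # Correct scores in descending order (best correct hit first).
    correct = sorted((s for s, ok in zip(scores, is_correct) if ok),
                     reverse=True)
    # bisect_right gives the number of wrong scores <= s; the remainder
    # are the wrong scores strictly greater than s.
    return [len(wrong) - bisect_right(wrong, s) for s in correct]


# Toy values, purely illustrative (not data from the benchmark).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [True, False, True, False, False, True]
print(count_wrong_above(toy_scores, toy_labels))  # [0, 1, 3]
```

In this toy case the best-scoring correct hit has no wrong hit above it, while the lowest-scoring correct hit has three, which mirrors the trend summarized in the boxplots. Collecting these counts over all queries, grouped by the rank of the correct hit, would give the kind of per-rank distributions that a boxplot summarizes.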