Fold classification based on secondary structure – how much is gained by including loop topology?

BMC Structural Biology

Table 1 Clustering scores of various methods.

Sample	Size	Averaging method	SSEA	DSSP (NCL)	DSSP (CL)	Ours (NCL)	Ours (CL)
ALL	1183	U	2.30	2.27	2.49	2.36	2.50
		R	2.08	2.07	2.27	2.09	2.26
		L	1.71	1.70	1.84	1.68	1.85
MEDIUM	631	U	1.82	1.87	1.98	1.81	2.04
		R	1.62	1.66	1.77	1.59	1.78
		L	1.18	1.18	1.27	1.11	1.26
LONG	475	U	1.96	2.03	2.05	1.92	2.00
		R	1.81	1.85	1.90	1.76	1.86
		L	1.64	1.68	1.73	1.61	1.71
RANDOM	591	U	1.76	1.77	1.87	1.88	1.98
		R	1.64	1.63	1.73	1.71	1.81
		L	1.42	1.37	1.47	1.43	1.53

Average log-odds scores of the clustering functions. Sample MEDIUM consists of the protein domains in ALL that have between 70 and 140 residues; LONG consists of those that are longer. RANDOM is the average over 40 samples, each obtained by randomly splitting ALL into two parts of equal size on average. Averaging methods: U is unweighted, R is weighted by the square root of the fold size, and L is weighted by the fold size (within a sample); in each case, folds with fewer than 2 representatives in a sample are excluded. SSEA is the score computed by the SSEA program from DSSP output; DSSP is the score obtained from DSSP output with our alignment program; "Ours" uses both our structure determination and our alignment programs. NCL and CL denote scores computed without and with closed-loop annotations, respectively. Our annotations of closed loops were transferred to the DSSP output to obtain the CL version of that score.
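
The three averaging methods described in the caption can be illustrated with a minimal sketch. It assumes that a log-odds score has already been computed for each fold; how that per-fold score is obtained is not specified here. The function name, argument names, and example values are hypothetical and are not taken from the paper.

```python
import math


def average_fold_score(fold_scores, fold_sizes, method="U", min_size=2):
    """Average per-fold scores using one of the weightings from the caption.

    fold_scores: dict mapping fold name -> per-fold score (assumed given)
    fold_sizes:  dict mapping fold name -> number of representatives in the sample
    method:      "U" (unweighted), "R" (square root of fold size), "L" (fold size)
    min_size:    folds with fewer representatives than this are excluded
    """
    weight_of = {
        "U": lambda n: 1.0,
        "R": lambda n: math.sqrt(n),
        "L": lambda n: float(n),
    }[method]

    total = 0.0
    weight_sum = 0.0
    for fold, score in fold_scores.items():
        n = fold_sizes[fold]
        if n < min_size:
            continue  # too few representatives in this sample
        w = weight_of(n)
        total += w * score
        weight_sum += w

    if weight_sum == 0:
        raise ValueError("no fold has enough representatives")
    return total / weight_sum


# Hypothetical example: fold "c" has a single representative and is excluded.
scores = {"a": 2.0, "b": 1.0, "c": 5.0}
sizes = {"a": 4, "b": 16, "c": 1}
for m in ("U", "R", "L"):
    print(m, round(average_fold_score(scores, sizes, m), 3))
```

In this toy example the larger fold has the lower score, so the average decreases from U to R to L as large folds receive more weight. The same ordering appears in every row group of the table.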

ISSN: 1472-6807