Table 1 Statistics of structure predictions by taxonomic group. For each group, the table shows the total number of sequences in COG (Total COG sequences); the number and fraction (%) of chosen representatives with <50% identity (Representatives); the number and fraction of these representatives with structure predictions (Representatives predicted); the number of SMOG sequence segments with predictions (SMOG segments predicted); the number and fraction of predicted SMOG segments that are fully covered by regions of structure prediction (Fully covered); and the number and fraction of fully covered SMOG segments that are covered by a single domain region (Single domain). The category "Other Bacteria" comprises the bacterial groups that are less represented in the COG database (Deinococcus-Thermus, Thermotogae, Fusobacteria, Aquificae, and Cyanobacteria).

From: Exploring dynamics of protein structure determination and homology-based prediction to estimate the number of superfamilies and folds

| Group | Total COG sequences | Representatives (%) | Representatives predicted (%) | SMOG segments predicted | Fully covered (% of predicted) | Single domain (% of fully covered) |
|---|---|---|---|---|---|---|
| Bacteria | | | | | | |
| Proteobacteria alpha | 23383 | 12997 (56) | 8676 (67) | 16681 | 6671 (40) | 6329 (95) |
| Proteobacteria gamma | 33375 | 12733 (38) | 7977 (63) | 15011 | 6234 (42) | 5846 (94) |
| Other Proteobacteria | 10979 | 5959 (54) | 4015 (68) | 7613 | 3108 (41) | 2913 (94) |
| Firmicutes | 20921 | 13626 (65) | 9314 (68) | 17249 | 7092 (41) | 6641 (94) |
| Actinobacteria | 9390 | 4241 (45) | 3059 (72) | 5741 | 2408 (42) | 2257 (94) |
| Chlamydiae – Spirochaetes | 2829 | 2039 (72) | 1468 (72) | 3078 | 1119 (36) | 1029 (92) |
| Other Bacteria | 13870 | 10670 (77) | 7485 (70) | 14905 | 5995 (40) | 5680 (95) |
| Archaea | | | | | | |
| Euryarchaeota | 21118 | 14893 (71) | 9968 (67) | 19625 | 7625 (39) | 7235 (95) |
| Crenarchaeota | 1254 | 1131 (90) | 774 (68) | 1428 | 549 (38) | 518 (94) |
| Eukaryota | | | | | | |
| Fungi | 7198 | 5880 (71) | 2778 (47) | 6801 | 3801 (56) | 3616 (95) |
| Total | 144317 | 84169 (58) | 55514 (66) | 108132 | 44602 (41) | 42064 (94) |
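
The percentages in parentheses can be recomputed from the counts, using the denominators named in the caption and column headers. The following is a minimal sketch for the Total row only. The helper `pct` and the variable names are hypothetical, and rounding to the nearest integer is an assumption about how the published values were derived.

```python
def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage, rounded to the nearest integer.

    Assumption: the table reports whole-number percentages obtained by
    ordinary rounding; the source does not state the rounding rule.
    """
    return round(100 * part / whole)


# Counts copied from the "Total" row of Table 1.
total_cog = 144317
representatives = 84169
representatives_predicted = 55514
smog_predicted = 108132
fully_covered = 44602
single_domain = 42064

# Each percentage uses the denominator given in the caption/column header.
total_row = {
    "Representatives (% of total COG sequences)": pct(representatives, total_cog),
    "Representatives predicted (% of representatives)": pct(
        representatives_predicted, representatives
    ),
    "Fully covered (% of SMOG segments predicted)": pct(fully_covered, smog_predicted),
    "Single domain (% of fully covered)": pct(single_domain, fully_covered),
}

for label, value in total_row.items():
    print(f"{label}: {value}")
```

The same helper can be applied to any other row by substituting that row's counts.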