Figure 1 | BMC Structural Biology

Figure 1

From: Exploring dynamics of protein structure determination and homology-based prediction to estimate the number of superfamilies and folds

Clustering and structure prediction for sequence domains.

A. Formation of SMOGs. Individual proteins in each COG are split into sequence-based domains using the ADDA database. The resulting sequence segments are grouped by sequence similarity within each COG; these groups from different COGs are then further clustered by complete linkage. The resulting clusters comprise sequence modules from orthologous groups of proteins (SMOGs), which are used as elementary units for structure assignment and sequence-based clustering (see Methods for details).

B. Structure prediction in SMOG sequences. The main steps of the procedure are labeled on the right. First, individual SMOG segments are compared to sequences and profiles for SCOP representatives from ASTRAL. Using alignments between members of the same SMOG, structure assignments at the SCOP superfamily level are propagated to the regions of the SMOG segments that are not directly linked to SCOP domains. These initial assignments are used to split SMOG segments into smaller fragments, generate PSI-BLAST profiles for these fragments, and perform PSI-BLAST searches against the database of SCOP domain sequences. These searches improve the precision of the initial assignments and produce additional assignments. Within a given SMOG, regions with the same superfamily assignment are clustered together based on PSI-BLAST alignments of the SMOG sequences to each other. These clusters are referred to as DOGs (see Methods for details).

C. Formation of links between SMOGs. SMOGs 1 and 2 are linked based on the fraction W of queries from SMOG 1 that detect sequences from SMOG 2 at an E-value cutoff E. In the example shown, W = 3/5 = 0.6. If all individual hits have E-values lower than E, the link is formed for W cutoffs lower than 0.6 (e.g. W = 0.5), but not for higher cutoffs (e.g. W = 1.0).
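The linking rule in panel C can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the query labels, the E-values, and the cutoff of 1e-3 are hypothetical, chosen only to reproduce the W = 3/5 example. Two further assumptions are made: a query with no hit in SMOG 2 is recorded as an infinite E-value, and a fraction exactly equal to the W cutoff counts as meeting it (the caption does not say how ties are treated).

```python
import math
from typing import Dict


def link_fraction(best_evalues: Dict[str, float], e_cutoff: float) -> float:
    """Return W: the fraction of SMOG 1 queries that detect SMOG 2.

    best_evalues maps each SMOG 1 query to the best (lowest) E-value of its
    hits against SMOG 2 sequences; math.inf marks a query with no hit
    (an assumption of this sketch).
    """
    if not best_evalues:
        raise ValueError("SMOG 1 must contain at least one query")
    detected = sum(1 for e in best_evalues.values() if e < e_cutoff)
    return detected / len(best_evalues)


def smogs_linked(best_evalues: Dict[str, float],
                 e_cutoff: float, w_cutoff: float) -> bool:
    """Decide whether SMOG 1 is linked to SMOG 2 at the given cutoffs.

    Assumption: W equal to the cutoff is treated as sufficient.
    """
    return link_fraction(best_evalues, e_cutoff) >= w_cutoff


# Hypothetical data: five queries from SMOG 1, three of which hit SMOG 2.
example_hits = {
    "query_1": 1e-20,
    "query_2": 1e-12,
    "query_3": 1e-6,
    "query_4": math.inf,  # no hit in SMOG 2
    "query_5": math.inf,  # no hit in SMOG 2
}
E_CUTOFF = 1e-3  # hypothetical E-value cutoff

w = link_fraction(example_hits, E_CUTOFF)
linked_at_half = smogs_linked(example_hits, E_CUTOFF, w_cutoff=0.5)
linked_at_one = smogs_linked(example_hits, E_CUTOFF, w_cutoff=1.0)
```

With these inputs, three of the five queries pass the E-value cutoff, so the link forms at a W cutoff of 0.5 but not at 1.0, matching the behavior described for the example in panel C.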
