Improving sequence-structure correlations with strongly correlated positions: example with the globin superfamily. Spearman's rank correlations between sequence similarity scores calculated from subsets of alignment positions and TM-align RMSD scores are plotted, with subset size increasing from left to right. Y-axis: correlation coefficient. Data are separated into close (> 30% identical) homologues (red lines) and distant homologues (blue lines). Thick lines depict the effect of adding positions in order of correlation, from most to least strongly correlated; thin dashed lines show the effect of adding positions in the reverse order. Dashed horizontal lines show the correlations obtained with full sequence identities for the distantly related (upper line at -0.68) and closely related (lower line at -0.73) sets.
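
The cumulative procedure behind the curves can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`ranks`, `spearman`, `cumulative_correlations`), the input layout (one list of per-pair similarity scores per alignment position), the use of a simple sum as the subset score, and the reading of "most strongly correlated" as "most negative correlation with RMSD" are all hypothetical choices made for the example.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        # Assumption: a constant input is reported as uncorrelated.
        return 0.0
    return cov / (vx * vy) ** 0.5


def cumulative_correlations(pos_scores, rmsd, reverse=False):
    """Add positions one at a time and track the subset correlation.

    pos_scores[p][k] is the similarity score at alignment position p for
    protein pair k; rmsd[k] is the structural RMSD for pair k.
    Returns the order in which positions were added and the correlation
    after each addition.
    """
    per_position = [spearman(scores, rmsd) for scores in pos_scores]
    # Assumption: similarity and RMSD are anticorrelated, so the most
    # negative coefficient counts as the strongest; reverse=True flips it.
    order = sorted(range(len(pos_scores)),
                   key=lambda p: per_position[p], reverse=reverse)
    totals = [0.0] * len(rmsd)
    curve = []
    for p in order:
        totals = [t + s for t, s in zip(totals, pos_scores[p])]
        curve.append(spearman(totals, rmsd))
    return order, curve


# Toy data (invented for illustration): three positions, five protein pairs.
rmsd = [0.5, 1.0, 1.5, 2.0, 2.5]
pos_scores = [
    [5, 4, 3, 2, 1],  # falls as RMSD rises
    [1, 2, 3, 4, 5],  # rises with RMSD
    [3, 3, 3, 3, 3],  # uninformative
]
forward_order, forward_curve = cumulative_correlations(pos_scores, rmsd)
reverse_order, reverse_curve = cumulative_correlations(pos_scores, rmsd, reverse=True)
```

In this toy case the forward curve starts at the strongest negative correlation and degrades once the position that rises with RMSD is added, while the reverse curve starts positive, which mirrors the contrast between the thick and the thin dashed lines.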