Dependence of α-helical and β-sheet amino acid propensities on the overall protein fold type
© Fujiwara et al.; licensee BioMed Central Ltd. 2012
Received: 24 March 2012
Accepted: 19 July 2012
Published: 2 August 2012
A large number of studies have been carried out to obtain amino acid propensities for α-helices and β-sheets. The obtained propensities for α-helices are consistent with each other, and the pair-wise correlation coefficient is frequently high. On the other hand, the β-sheet propensities obtained by several studies differed significantly, indicating that the context significantly affects β-sheet propensity.
We calculated amino acid propensities for α-helices and β-sheets for 39 and 24 protein folds, respectively, and addressed whether they correlate with the fold. The propensities were also calculated for exposed and buried sites, respectively. Results showed that α-helix propensities do not differ significantly by fold, but β-sheet propensities are diverse and depend on the fold. The propensities calculated for exposed sites and buried sites are similar for α-helix, but such is not the case for the β-sheet propensities. We also found some fold dependence on amino acid frequency in β-strands. Folds with a high Ser, Thr and Asn content at exposed sites in β-strands tend to have a low Leu, Ile, Glu, Lys and Arg content (correlation coefficient = −0.90) and to have flat β-sheets. At buried sites in β-strands, the content of Tyr, Trp, Gln and Ser correlates negatively with the content of Val, Ile and Leu (correlation coefficient = −0.93). "All-β" proteins tend to have a higher content of Tyr, Trp, Gln and Ser, whereas "α/β" proteins tend to have a higher content of Val, Ile and Leu.
The α-helix propensities are similar for all folds and for exposed and buried residues. However, β-sheet propensities calculated for exposed residues differ from those for buried residues, indicating that the exposed-residue fraction is one of the major factors governing amino acid composition in β-strands. Furthermore, the correlations we detected suggest that amino acid composition is related to folding properties such as the twist of a β-strand or association between two β sheets.
In 1974, Chou and Fasman published the calculated frequency of occurrence and conformational propensity of each amino acid in the secondary structures of 15 proteins, comprising 2473 amino acid residues. Since then, a vast number of protein structures have been determined and classified to reflect both structural and evolutionary relatedness [2, 3]. SCOP (Structural Classification of Proteins) is one of the major databases providing a detailed and comprehensive description of the relationships among all known protein structures. The classification is hierarchical: the first two levels, family and superfamily, describe near and far evolutionary relationships; the third, fold, describes geometrical relationships. Most of the folds (899/1086) are assigned to one of four structural classes: “all-α”, “all-β”, “α/β” (for proteins with α-helices and β-strands that are largely interspersed) and “α + β” (for those in which α-helices and β-strands are largely segregated). The remaining folds are assigned to the "Multi-domain", "Membrane and cell surface" or "Small" protein classes. In 2009, we developed a quaternary structural database for proteins, OLIGAMI, in which oligomer information was added to the SCOP classification, to allow an exhaustive survey of the tertiary or quaternary structures of proteins.
A large number of studies have been carried out to obtain amino acid propensities for α-helices and β-sheets [1, 5–28]. The propensities have been estimated from statistical analysis of three-dimensional structures [1, 6–15], experimental determination of α-helix or β-sheet content in peptides [16–23], and experimental determination of the thermodynamic stability of mutant proteins [23–28]. The α-helix propensities obtained are consistent between studies, with the pair-wise correlation coefficient (R) frequently being >0.8, although Richardson et al. and Engel et al. showed that amino acid propensities differ at specific locations within an α-helix, depending on the amino acid. Engel et al. also showed that most helices are amphiphilic and have a strong tendency to both begin and end on the solvent-inaccessible face of the α-helix, suggesting that α-helix propensities differ between the solvent-accessible and solvent-inaccessible faces. On the other hand, the β-sheet propensities obtained by several studies differ significantly, indicating that context significantly affects β-sheet propensity. β-sheets consist of various combinations of β-strands, differing in the number of strands and in whether the sheet is parallel, antiparallel or mixed. For the IgG-binding domain from protein G, which has four antiparallel β-strands, Minor and Kim showed that the β-sheet propensity measured at the center strand differs significantly from that measured at an edge strand. This context-dependent nature of β-sheet propensity may be reflected in its dependence on the overall protein fold. Previously, Jiang et al. and Costantini et al. calculated the secondary structure propensities for the four protein structural classes (“all-α”, “all-β”, “α/β” and “α + β”) and showed that β-sheet propensity depends on these structural classes.
However, it remains unclear which kinds of context give rise to these dependencies, since each structural class contains various folds with different contexts. It is therefore of interest to ask whether the propensity of each amino acid varies with the fold type.
In this study, to clarify the relationship between amino acid propensity and context in more detail, we calculated the occurrence of each amino acid residue in α-helical and β-strand conformations as a function of the SCOP fold of the protein (i.e. at a lower structural level than previously addressed), and categorized the residues as exposed to solvent or buried in the interior. The results indicate that α-helix propensities do not differ significantly by fold but that β-sheet propensities are diverse and indeed depend on the fold. Furthermore, by analyzing correlations between protein fold and amino acid propensity, we found some relationships between structural features and amino acid composition.
Table 1. SCOP folds included in the dataset of α-helices, β-strands and other conformations
Column headings: NE j ; N j ; N α j (f αexp j ); N β j (f βexp j ); N O j (f Oexp j )
All α proteins
Four-helical up-and-down bundle
Nuclear receptor ligand-binding domain
Adenine nucleotide alpha hydrolase-like
NAD(P)-binding Rossmann-fold domains
Periplasmic binding protein-like II
P-loop containing nucleoside triphosphate hydrolases
Ribonuclease H-like motif
Tryptophan synthase beta subunit-like PLP-dependent enzymes
α + β
Acyl-CoA N-acyltransferases (Nat)
Protein kinase-like (PK-like)
Thioesterase/thiol ester dehydrase-isomerase
All β proteins
Concanavalin A-like lectins/glucanases
PH domain-like barrel
Single-stranded right-handed beta-helix
Trypsin-like serine proteases
The secondary structure assignment program DSSP was used for all secondary structure assignments. DSSP assigns the following secondary structures: H, α-helix; G, 3₁₀-helix; I, 5-helix (π-helix); E, extended strand; B, residue in an isolated β-bridge; S, bend; and T, hydrogen-bonded turn. We regarded H (α-helix) and G (3₁₀-helix) as α-helix and E as β-strand; the remaining residues, except T, were defined as other conformation.
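The reduction of DSSP codes to three states described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example string are hypothetical, and it assumes that unassigned residues (blank or "-") count as "other conformation" along with I, B and S.

```python
# Three-state reduction of DSSP codes as described in the text:
# H and G -> helix; E -> strand; T is excluded; all remaining codes -> other.
HELIX_CODES = {"H", "G"}
STRAND_CODES = {"E"}
EXCLUDED_CODES = {"T"}


def reduce_dssp(code):
    """Map one DSSP code to 'helix', 'strand', 'other', or None if excluded."""
    if code in HELIX_CODES:
        return "helix"
    if code in STRAND_CODES:
        return "strand"
    if code in EXCLUDED_CODES:
        return None
    # Assumption: I, B, S and unassigned residues all count as "other".
    return "other"


def count_states(dssp_string):
    """Count residues per reduced state in a per-residue DSSP string."""
    counts = {"helix": 0, "strand": 0, "other": 0}
    for code in dssp_string:
        state = reduce_dssp(code)
        if state is not None:
            counts[state] += 1
    return counts


example = "-EEEETTSHHHHGGGB-"  # hypothetical DSSP string, for illustration only
print(count_states(example))
```

Keeping T as an explicit exclusion, rather than folding it into "other", mirrors the wording of the text and makes the choice easy to change.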
The difference in P ij values between two populations was considered significant, and the populations were considered to be different, if the test variable Z was >1.25, which corresponds to a 90% confidence level.
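The definitions of P ij and of the test variable Z are not reproduced in this section, so the sketch below rests on two assumptions: a Chou-Fasman-style ratio for the propensity, and a pooled two-proportion z statistic as a stand-in for Z. Only the cutoff of 1.25 comes from the text; the function names and counts are hypothetical.

```python
import math


def propensity(n_aa_in_ss, n_aa_total, n_ss_total, n_total):
    """Assumed Chou-Fasman-style ratio: the fraction of one amino acid found
    in a secondary structure, divided by the fraction of all residues found
    in that structure."""
    return (n_aa_in_ss / n_aa_total) / (n_ss_total / n_total)


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z statistic (an assumed stand-in for Z).

    k1 of n1 residues are in the secondary structure in population 1,
    k2 of n2 in population 2."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return abs(p1 - p2) / standard_error


Z_THRESHOLD = 1.25  # cutoff quoted in the text

# Hypothetical counts: residues of one amino acid in β-strands, in two folds.
z = two_proportion_z(60, 200, 30, 200)
print(f"Z = {z:.2f}, populations differ: {z > Z_THRESHOLD}")
print(f"P = {propensity(60, 200, 500, 2500):.2f}")
```

A propensity above 1 means the amino acid occurs in the structure more often than an average residue does; the z statistic then asks whether two folds differ by more than sampling noise would explain.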
Mean amino acid propensities for α-helix and β-strand conformations
We also calculated the amino acid propensities for exposed and buried residues (P exp i and P bur i ) in the secondary structural elements (Table 2). For α-helices, the three mean propensities P α i , P αexp i and P αbur i have similar trends. On the other hand, mean propensities for exposed residues (P βexp i ) and buried residues (P βbur i ) for β-strands differ significantly (Table 2). It is especially interesting that Lys and Arg, but not two other charged residues, Asp and Glu, are preferred as exposed residues in β-strands. Not surprisingly, all charged amino acids are disfavored as buried residues in β-strands. The buried regions disfavor charged amino acids for β-strands, whereas the α-helix can tolerate charged amino acids.
As previously reported in statistical studies, charged amino acids (including Lys and Arg) yield low values for P β [1, 6, 10, 13], in agreement with the mean propensities, P β i , determined in the present work. Our results, however, show that Lys and Arg have relatively high P βexp values for exposed residues, a property that is masked when mean propensities are compared. In our dataset, the fraction of exposed residues in β-strands is low (29%) compared with α-helices (46%). Most residues in β-strands are buried inside proteins and covered by α-helices or loop regions; exposed residues are thus less frequently encountered in β-strands, and their contributions to the mean P β i are therefore small. Jiang and coworkers have suggested that the hydrophobicities of amino acid side chains are the key determinant of β-sheet structures, but our data suggest that this holds for buried residues but not for exposed residues in β-sheet structures. Minor and Kim measured the propensity of the 20 amino acids for β-sheet formation in a variant of the IgG-binding domain from protein G, which has four antiparallel β-strands. Amino acid substitutions were made at a guest site on the solvent-exposed surface of the center strand. The propensities from those experiments show a strong correlation with the logarithmic P βexp i values obtained here (R = 0.82), although they show a weaker correlation with our logarithmic P βbur i values (R = 0.63). Furthermore, there is poor correlation between the propensities determined by Minor and Kim and those of Chou and Fasman. These results show that the preference for β-strands differs for exposed and buried sites.
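The comparison between logarithmic statistical propensities and an experimental scale can be sketched as a Pearson correlation. The numbers below are hypothetical and are not taken from this study or from Minor and Kim. The log transform is used because the text compares logarithmic propensities; a plausible reason is that a thermodynamic scale is proportional to the logarithm of a population ratio.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient R of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only.
statistical_propensity = {"V": 1.9, "I": 1.7, "T": 1.4, "K": 1.1, "A": 0.5, "G": 0.5}
experimental_scale = {"V": 0.9, "I": 1.0, "T": 1.1, "K": 0.3, "A": 0.0, "G": -1.2}

# Use one fixed residue order so the two lists stay aligned.
residues = sorted(statistical_propensity)
log_propensity = [math.log(statistical_propensity[aa]) for aa in residues]
scale = [experimental_scale[aa] for aa in residues]

r = pearson(log_propensity, scale)
print(f"R = {r:.2f}")
```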
In particular, a wide range of P α ij values was obtained for the aromatic residues Phe (0.66–2.00) and Tyr (0.58–1.89), depending on fold type, and the mean propensity for all folds is approximately 1.0 for these amino acids (Figure 1A and Table 2). The propensities of the charged residues Lys (0.65–1.56) and Arg (0.80–1.71) also varied widely depending on the fold. On the other hand, in >80% of SCOP folds, Leu or Glu are favored in the α-helical conformation, whereas Val, Pro, Ser, Thr, Asn, Asp and Gly are disfavored. Ala is favored in the α-helical conformation in the majority of the folds (79%) but is disfavored in two folds (Protein kinase-like and 4-helical cytokines). In particular, the propensity of Ala for the "4-helical cytokines" fold is quite low (P α ij = 0.64). Met, Cys, Trp and His do not have a fold-type population difference at the >90% confidence level in any pair of folds, although their propensities vary widely among the various folds. Therefore, we did not further assess these amino acids.
Richardson et al. showed that Ala is not favored at the ends of α-helices, suggesting that short α-helices do not favor Ala. The mean α-helix length of the 4-helical cytokines fold is, however, the third longest among the 39 folds (the longest and second longest are those of the "Ferritin-like" and "Four-helical up-and-down bundle" folds, respectively). We therefore calculated, for each amino acid, the correlation coefficient between the mean α-helix length and the amino acid propensity; all were smaller than 0.4. This result indicates that there is no relationship between the mean α-helix length and the helical propensity of any amino acid.
Engel et al. show that most helices are amphiphilic [7, 12], suggesting that the propensities for α-helix depend on the exposed residue fraction. So, we examined the correlations between the exposed residue fraction and the frequency of amino acids in α-helices. No amino acid showed a strong correlation (R < −0.7 or R > 0.7) between the exposed residue fraction and the amino acid frequency, although the charged residues, Lys and Asp have a relatively strong positive correlation (RK = 0.66, RD = 0.54). In contrast, the correlation coefficients of Glu and Arg (also charged amino acids) are small (RE = 0.26, RR = 0.07).
As shown in Figure 1B, a wide range of P β ij values was obtained for Trp (0.45–2.22), Thr (0.73–1.87), Lys (0.46–1.45) and Arg (0.51–1.42) depending on fold type. For Lys, although P β ij was <0.9 in 18 of 24 folds (mean value of P β ij = 0.79), three folds (the lipocalins fold, OB-fold, and protein kinase–like fold) yielded P β ij values > 1.2, which differed from those of the other folds at the 90% confidence level. These three folds are “all-β” or “α + β”, and all have largely exposed β-strands, whereas β-strands are usually covered by α-helical or loop regions, especially in “α/β” proteins (Table 1). It has long been thought that β-strands prefer hydrophobic residues [1, 6, 10]; however, it now appears that largely exposed β-sheet structures prefer hydrophilic residues such as Lys. In contrast, the four amino acids Val, Ile, Phe and Tyr are favored (P β ij > 1.1) in β-strands of more than 80% of folds, with Val (1.40–2.68) and Ile (1.17–2.33) having particularly high propensities in this regard. The six amino acids Pro, Ala, Asn, Asp, Glu and Gly are disfavored (P β ij < 0.9) in β-strands for more than 80% of folds, and Pro (0.16–0.71) and Asp (0.22–0.91) have quite low propensities.
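The favored/disfavored criterion used above (P β ij > 1.1 or < 0.9 in more than 80% of folds) can be expressed as a small classification routine. The thresholds are those quoted in the text; the function name, the "fold-dependent" label, the fold names and the per-fold values are hypothetical.

```python
FAVORED_CUTOFF = 1.1     # propensity above this counts as favored
DISFAVORED_CUTOFF = 0.9  # propensity below this counts as disfavored
FOLD_FRACTION = 0.8      # fraction of folds that must agree


def classify(propensity_by_fold):
    """Label an amino acid from its per-fold propensities."""
    values = list(propensity_by_fold.values())
    favored = sum(p > FAVORED_CUTOFF for p in values) / len(values)
    disfavored = sum(p < DISFAVORED_CUTOFF for p in values) / len(values)
    if favored > FOLD_FRACTION:
        return "favored"
    if disfavored > FOLD_FRACTION:
        return "disfavored"
    return "fold-dependent"


# Hypothetical per-fold propensities, for illustration only.
examples = {
    "Val": {"fold_a": 1.4, "fold_b": 2.1, "fold_c": 1.8, "fold_d": 2.6, "fold_e": 1.6},
    "Lys": {"fold_a": 0.5, "fold_b": 0.7, "fold_c": 1.3, "fold_d": 0.8, "fold_e": 1.4},
    "Pro": {"fold_a": 0.2, "fold_b": 0.5, "fold_c": 0.7, "fold_d": 0.3, "fold_e": 0.6},
}
for amino_acid, per_fold in examples.items():
    print(amino_acid, classify(per_fold))
```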
The exposed residue fractions were observed in the range from about 10% to 46% for 24 folds (Table 1) and Glu and Lys have strong and positive correlations between the amino acid propensities and the exposed residue fractions of β-strands in each fold (RE = 0.76, RK = 0.73). Gln, Arg and Ile also have relatively strong correlations, although the correlation for Ile is negative (RQ = 0.67, RR = 0.5, RI = −0.68). As opposed to the strong positive correlation found for Glu, there is no correlation for the other negatively charged amino acid, Asp. The exposed residue fraction appears to be one of the major factors governing charged amino acid composition of folds for β-strands.
For residues exposed in a β-strand (Figure 2C), a wide range of P βexp ij values was obtained for Ser (0.42–1.69), Lys (0.84–1.58) and Arg (0.68–1.85). For residues buried in a β-strand (Figure 2D), a wide range of P βbur ij values was obtained for Cys (0.61–2.61), Phe (0.66–1.83), Tyr (0.64–1.92), Trp (0.31–1.77) and His (0.41–1.87). P βexp ij values of Val, Ile, Phe, Tyr, Trp and Thr are high (P βexp ij > 1.1) for more than 75% of folds, indicating that these amino acids, which have a β-branched or aromatic side chain, are favored in the exposed regions of β-strands in most fold types. In contrast, the amino acids disfavored in β-strands in all folds are Pro (0.22–0.87), Ala (0.28–0.70) and Gly (0.23–0.88) for exposed regions, and Pro (0.12–0.87) for buried regions. It is interesting that the P βexp ij values for Ala are low in all folds (P βexp ij < 0.7), indicating that an exposed position on a β-strand is extremely unfavorable for Ala as well as for Pro and Gly. These strong tendencies support the view that backbone solvation is a major factor determining thermodynamic β-propensities.
Overall, there is a greater number of strong correlations (R < −0.7 or R > 0.7) for β-strands than for α-helices (Figure 3). For example, four strong positive correlations and five strong negative correlations are observed for β-strands, but there are only two paired strong correlations for α-helices (Ala and Gly, Tyr and Trp). Most of the positive correlations for β-strands involve paired amino acids having similar physicochemical characters (shown along the diagonal in Figure 3B), such as Val and Ile, Tyr and Trp, Ser and Gln/Thr/Asn, Asn and Thr, and Glu and Lys/Arg. In contrast, most of the negative correlations for β-strands involve pairs of amino acids having different physicochemical characters, such as Val and Tyr/Trp/Gln/Ser, Ile and Trp/Gln/Ser/Glu/Arg, Leu and Ser/Thr/Asn, Met and Asn, and Ala and Lys.
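A scan for strong pairwise correlations of this kind can be sketched as follows, using the |R| > 0.7 cutoff from the text. The per-fold frequencies are hypothetical and chosen only to show the mechanics; the function names are not from the paper.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient R of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


def strong_pairs(frequency_by_aa, cutoff=0.7):
    """Return (aa1, aa2, R) for every amino acid pair with |R| > cutoff."""
    pairs = []
    for aa1, aa2 in combinations(sorted(frequency_by_aa), 2):
        r = pearson(frequency_by_aa[aa1], frequency_by_aa[aa2])
        if abs(r) > cutoff:
            pairs.append((aa1, aa2, round(r, 2)))
    return pairs


# Hypothetical frequencies (%) in β-strands; each list uses the same fold order.
frequency = {
    "V": [12.0, 10.0, 7.0, 5.0],
    "I": [10.0, 9.0, 6.0, 4.0],
    "Y": [3.0, 4.0, 7.0, 8.0],
    "F": [5.0, 6.0, 5.0, 6.0],
}
for aa1, aa2, r in strong_pairs(frequency):
    print(f"{aa1}-{aa2}: R = {r}")
```

With these made-up numbers the scan reports a positive V-I correlation and negative correlations of both with Y, while F correlates strongly with none of them.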
Interestingly, the aromatic amino acid, Phe, shows low correlations with Trp and Tyr, for both α-helices and β-strands, although strong positive correlations between Trp and Tyr are observed for both α-helices and β-strands.
In contrast, for β-strands, most of the correlations shown in Figure 3B are strong correlations for exposed (Figure 6A) and buried (Figure 6B) residues. The strong negative correlations for Val/Ile and Tyr/Trp/Gln were observed for buried but not exposed residues. In other words, a fold type that prefers Val or Ile does not prefer Tyr, Trp or Gln, especially for buried residues.
Correlation coefficients for buried residues
f WYQ vs. f VI
f WYQ vs. f VIL
f WYQS vs. f VIL
Correlation coefficients for solvent-exposed residues
f IL vs. f STN
f EKR vs. f STN
f ILEKR vs. f STN
Wang et al. showed that isolated β-strands in molecular dynamics simulations are not twisted, suggesting that the stabilization of the twist must be due to inter-strand interactions. Another computer simulation study found that inter-strand interactions by side chains induce a twist and that β-branched side chains are important for twist formation. On the other hand, Koh et al. and Bosco et al. used statistical analyses to show that β-sheet structure is mainly determined by the backbone and that the contribution of side chains is small. This indicates that twisting is an inherent property of a polypeptide chain, implying that a β-strand should twist regardless of its amino acid sequence. However, some folds have a large, flat β-sheet, such as the SCOP groups concanavalin A and SS β-helix. Previous studies have targeted only the twisted β-strand and have not focused on the flat β-sheet. Our results suggest that the amino acid composition in the exposed regions of β-strands may be related to the twist and bend of the strand, showing that side chain interactions are also an important factor for β-strand twisting. An intuitive explanation is that the long side chains of Leu, Ile, Lys, Arg and Glu in the exposed regions come close together to form the hydrophobic core, resulting in the formation of a twist and/or bend in β-strands. In contrast, the side chains of Ser, Thr and Asn have low hydrophobicities and are short, so the hydrophobic interactions between the side chains are weak and produce a flat β-sheet. Therefore, it seems that the strain within a β-sheet is one of the major factors governing amino acid propensities of folds for β-strands.
The folds can be classified into three types according to their β-sheets: parallel, antiparallel and mixed. For the "all-β" and "α + β" protein classes, the β-sheets of all folds used in this study are completely antiparallel, except for the SS β-helix, which has a completely parallel β-sheet. The folds of the "α/β" protein class have completely or mainly parallel β-sheets. The β-sheets of three folds, "Flavodoxin-like", "NAD(P)-binding Rossmann-fold domains" and "TIM beta/alpha-barrel", are completely parallel, whereas "Periplasmic binding protein-like II" and "Thioredoxin fold" have mixed β-sheets.
For the exposed residues of β-strands (Figure 8), the plots for the folds of the "all-β" protein class are widely distributed, although all of these folds have completely antiparallel β-sheets except for the SS β-helix. Furthermore, the folds of the "α/β" protein class have amino acid compositions different from that of the SS β-helix, although they all have parallel β-sheets. Figure 7 shows that the plots for the folds of the "all-β" protein class are widely distributed and that the plot for the SS β-helix lies in the center of the graph. The residue fractions (f βbur VIL ) of the three folds that have completely parallel β-sheets are also widely distributed (51.4, 47.2 and 42.7%).
We checked the robustness of our results using a dataset of folds containing more than 1,500 but fewer than 2,000 residues, which were not included in the dataset used in this study: six folds for α-helices and eight folds for β-strands. For β-strands, strong correlations were again observed for buried residues (RWYQS-VIL = −0.81) and for exposed residues (RILEKR-STN = −0.78). In α-helices, there were no strong correlations for buried residues (RWYQS-VIL = −0.64) or for exposed residues (RILEKR-STN = −0.48). These results are the same as those obtained for the dataset containing more than 2,000 residues. Therefore, the results presented here appear to be independent of the dataset selection.
The amino acid propensities for secondary structures were investigated for each SCOP fold. The helix propensities calculated for exposed and buried residues are also similar to each other. For β-sheet propensities, however, propensities calculated for exposed residues are remarkably different from those of buried residues, which are similar to those calculated for all residues because β-sheets tend to be located in the interior of proteins.
We also detected correlations between amino acid compositions in β-strands. At buried sites, the content of Tyr, Trp, Gln and Ser correlates negatively with the content of the aliphatic amino acids Val, Ile and Leu. All-β proteins tend to have a higher content of Tyr, Trp, Gln and Ser, whereas α/β proteins tend to have a higher content of aliphatic amino acids at buried sites. In all-β proteins, the H-bonds between buried side chains may be necessary for correct alignment of two large β sheets. For exposed residues, there is a tendency that a fold with a high content of Ile, Leu, Glu, Lys and Arg would have a low content of Ser, Thr and Asn. Generally, α/β proteins have twisted and bent β-strands and favor longer side chains at exposed sites.
These findings should be useful for the design of β-sheets. They are especially effective when structural information is available, such as whether a residue is exposed or buried, whether two large β-sheets are packed together, whether a β-sheet has α-helices on at least one side, and whether a β-strand is twisted. Hecht and coworkers have succeeded in designing de novo proteins with binary patterning techniques, in which polar and non-polar amino acids are placed at desired sites along the sequence by synthesizing DNA with degenerate codons. To design a de novo protein library of the SS β-helix fold, for example, one should consider biasing exposed sites on β-strands in favor of Ser, Thr and Asn rather than Glu, Lys and Arg, because the frequency of Ser, Thr and Asn is relatively high, and the frequency of Ile, Leu, Glu, Lys and Arg is conversely low, at exposed sites on β-strands of the SS β-helix fold (Figure 8).
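To illustrate how a degenerate codon restricts a library position to a chosen set of amino acids, the sketch below expands IUPAC nucleotide codes against the standard genetic code. The two codons shown, AVC and RAR, are hypothetical choices made for this example; they are not taken from this study or from the work of Hecht and coworkers.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_amino_acids(degenerate_codon):
    """Sorted one-letter amino acids ('*' = stop) encoded by a degenerate codon."""
    choices = (IUPAC[base] for base in degenerate_codon)
    return sorted({CODON_TABLE["".join(codon)] for codon in product(*choices)})


print("AVC ->", encoded_amino_acids("AVC"))  # codons AAC, ACC, AGC
print("RAR ->", encoded_amino_acids("RAR"))  # codons AAA, AAG, GAA, GAG
```

Under the standard code, AVC covers Asn, Ser and Thr, the residues the text suggests favoring at exposed β-strand sites of an SS β-helix library, whereas RAR covers Glu and Lys.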
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.