Dependence of α-helical and β-sheet amino acid propensities on the overall protein fold type
BMC Structural Biology volume 12, Article number: 18 (2012)
A large number of studies have been carried out to obtain amino acid propensities for α-helices and β-sheets. The obtained propensities for α-helices are consistent with each other, and the pair-wise correlation coefficient is frequently high. On the other hand, the β-sheet propensities obtained by several studies differed significantly, indicating that the context significantly affects β-sheet propensity.
We calculated amino acid propensities for α-helices and β-sheets for 39 and 24 protein folds, respectively, and addressed whether they correlate with the fold. The propensities were also calculated separately for exposed and buried sites. Results showed that α-helix propensities do not differ significantly by fold, but β-sheet propensities are diverse and depend on the fold. The propensities calculated for exposed sites and buried sites are similar for α-helices, but this is not the case for β-sheets. We also found some fold dependence of amino acid frequency in β-strands. Folds with a high Ser, Thr and Asn content at exposed sites in β-strands tend to have a low Leu, Ile, Glu, Lys and Arg content (correlation coefficient = −0.90) and to have flat β-sheets. At buried sites in β-strands, the content of Tyr, Trp, Gln and Ser correlates negatively with the content of Val, Ile and Leu (correlation coefficient = −0.93). "All-β" proteins tend to have a higher content of Tyr, Trp, Gln and Ser, whereas "α/β" proteins tend to have a higher content of Val, Ile and Leu.
The α-helix propensities are similar for all folds and for exposed and buried residues. However, β-sheet propensities calculated for exposed residues differ from those for buried residues, indicating that the exposed-residue fraction is one of the major factors governing amino acid composition in β-strands. Furthermore, the correlations we detected suggest that amino acid composition is related to folding properties such as the twist of a β-strand or association between two β sheets.
In 1974, Chou and Fasman published the calculated frequency of occurrence and conformational propensity of each amino acid in the secondary structures of 15 proteins, consisting of 2473 amino acid residues. Since then, a vast number of protein structures have been determined and classified to reflect both structural and evolutionary relatedness [2, 3]. The SCOP classification (Structural Classification of Proteins) is one of the major databases and provides a detailed and comprehensive description of the relationships among all known protein structures. The classification has hierarchical levels: the first two levels, family and superfamily, describe near and far evolutionary relationships; the third, fold, describes geometrical relationships. Most of the folds (899/1086) are assigned to one of the four structural classes: “all-α”, “all-β”, “α/β” (for proteins with α-helices and β-strands that are largely interspersed) and “α + β” (for those in which α-helices and β-strands are largely segregated). The remaining folds are assigned to the "Multi-domain", "Membrane and cell surface" or "Small" protein classes. In 2009, we developed a quaternary structural database for proteins, OLIGAMI, in which oligomer information was added to the SCOP classification to allow an exhaustive survey of tertiary or quaternary structures of proteins.
A large number of studies have been carried out to obtain amino acid propensities for α-helix and β-sheet [1, 5–28]. The propensities have been estimated from statistical analysis of three-dimensional structures [1, 6–15], experimental determination of α-helix or β-sheet content in peptides [16–23], and experimental determination of the thermodynamic stability of mutant proteins [23–28]. The obtained propensities for α-helix are consistent between studies, with the pair-wise correlation coefficient (R) frequently being >0.8, although Richardson et al. and Engel et al. showed that amino acid propensities differ at specific locations of the α-helix, depending on the amino acid. Engel et al. also showed that most helices are amphiphilic and have a strong tendency to both begin and end on the solvent-inaccessible face of the α-helix, suggesting that the propensities for α-helix differ between solvent-accessible and solvent-inaccessible faces. On the other hand, the β-sheet propensities obtained by several studies differ significantly, indicating that the context significantly affects β-sheet propensity. β-sheets consist of various combinations of β-strands: they differ in the number of strands, in whether the sheet is parallel, anti-parallel or mixed, and so on. For the IgG-binding domain from protein G, which has four antiparallel β-strands, Minor and Kim showed that β-sheet propensity measured at the center strand differs significantly from that measured at an edge strand. This context-dependent nature of the β-sheet propensity may be reflected in its dependence on overall protein fold. Previously, Jiang et al. and Costantini et al. calculated the secondary structure propensities for four protein structural classes (“all-α”, “all-β”, “α/β”, and “α + β”) and showed that β-sheet propensity depends on these structural classes.
However, it has not been clarified what kind of contextual difference gives rise to these dependencies, since each structural class contains various folds that have different contexts. It is therefore of interest to address whether the propensity of each amino acid varies with the fold type.
In this study, to clarify the relationship between the amino acid propensity and the context in more detail, we calculated the occurrence of each amino acid residue in α-helical and β-strand conformations as a function of the SCOP fold of the protein (i.e. a lower structural level than previously addressed), and categorized the residues as exposed to solvent or buried in the interior. The results indicate that α-helix propensities do not differ significantly by fold but that β-sheet propensities are diverse and indeed depend on the fold. Furthermore, by analyzing correlations between protein folds and amino acid propensities, we found some relationships between structural features and amino acid composition.
Selecting protein structures to be included in the dataset
This study uses sets of non-redundant PDB entries (three-dimensional coordinates) for each fold type. To facilitate the analysis, we extracted monomeric or homo-oligomeric single-domain proteins from the PDB. This was accomplished using OLIGAMI (http://protein.t.soka.ac.jp/oligami/), a database that combines the SCOP database (Structural Classification of Proteins) with oligomeric information. From these coordinates, a non-redundant subset of PDB entries (in which no pair of structures had >60% sequence identity) was created for each fold of the four main SCOP classes of proteins: “all-α”, “all-β”, “α/β”, and “α + β”. The number of proteins (or protein domains) classified in each SCOP fold varies; for example, the SCOP fold “dipeptide transport proteins” contains only one entry, that of d-amino peptidase (PDB, 1HI9). This enzyme is a decamer of identical subunits, each with 88 and 68 residues in α-helical and β-strand conformations, respectively. Because the number of residues in such a SCOP fold is too small to yield statistically meaningful results, we selected only those SCOP folds that contained at least 2,000 residues in an α-helical, β-strand, or other conformation (Table 1). Consequently, we identified 39 (2,029 PDB entries) of 899 SCOP folds for the dataset of α-helices and 24 (1,879 PDB entries) of 899 SCOP folds for the dataset of β-strands. Twelve of these SCOP folds, such as the TIM barrel and Rossmann fold (both examples of α/β proteins), were included in the datasets for both α-helices and β-strands, and consequently we used 51 SCOP folds in total. We also identified 39 of the 51 folds for the dataset of other conformation as a control. SCOP release 1.73 was used for all calculations.
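The size filter described above can be sketched in a few lines of Python. This is only an illustration: the function name, the input layout and the toy counts are assumptions made for the example, and only the 2,000-residue threshold is taken from the text.

```python
MIN_RESIDUES = 2000  # threshold stated in the text


def select_folds(residue_counts, conformation, min_residues=MIN_RESIDUES):
    """Return the folds with at least `min_residues` residues in `conformation`.

    residue_counts: {fold_name: {"helix": int, "strand": int, "other": int}}
    (hypothetical layout; counts are summed over all entries of the fold)
    """
    return sorted(
        fold
        for fold, counts in residue_counts.items()
        if counts.get(conformation, 0) >= min_residues
    )


# Toy counts invented for illustration only (not values from Table 1).
toy_counts = {
    "fold_A": {"helix": 5200, "strand": 3100, "other": 4000},
    "fold_B": {"helix": 2500, "strand": 900, "other": 2100},
    "fold_C": {"helix": 88, "strand": 68, "other": 120},
}

helix_folds = select_folds(toy_counts, "helix")
strand_folds = select_folds(toy_counts, "strand")
```

With these toy counts, `fold_A` enters both datasets, `fold_B` enters only the helix dataset, and `fold_C` is dropped from both, mirroring how a fold can contribute to one dataset, to both, or to neither.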
Determining amino acid propensities in the secondary structure elements
The propensity, P ij, of amino acid i for SCOP fold j in α-helices (Pα ij) or β-strands (Pβ ij) was calculated as follows:
$$P^{S}_{ij} = \frac{f^{S}_{ij}}{f_{i}}$$
where fS ij is the frequency of amino acid i in secondary structure S of SCOP fold j (fS ij = NS ij /NS j), and f i is the frequency of amino acid i in all proteins of the dataset (f i = N i /N t). NS ij is the number of residues of amino acid i, and NS j is the number of all residues, in secondary structure S of SCOP fold j. N i is the number of residues of amino acid i, and N t is the total number of residues, in all 51 SCOP folds. The propensity is therefore the frequency of amino acid i in a secondary structure of a specific fold relative to the frequency of amino acid i in all proteins. If Pα i = 1, amino acid i is equally frequent in the α-helical region and in the proteins overall. When Pα i > 1, amino acid i is more frequent in the α-helical region than in the proteins overall. The standard deviation for the normalized function, P ij, was calculated as follows.
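As an illustration of the propensity definition itself (the fold-specific frequency fS ij divided by the overall frequency f i, not the standard deviation), a minimal Python sketch is given here. The function name, the input layout and the toy counts are assumptions made for the example.

```python
from collections import Counter


def propensities(ss_counts_by_fold, total_counts):
    """Compute P_ij = (N^S_ij / N^S_j) / (N_i / N_t) for every fold j.

    ss_counts_by_fold: {fold: Counter of amino acids found in secondary
                        structure S of that fold}  (N^S_ij)
    total_counts:      Counter of amino acids over all residues of all
                        folds in the dataset        (N_i)
    """
    n_total = sum(total_counts.values())  # N_t
    result = {}
    for fold, counts in ss_counts_by_fold.items():
        n_ss = sum(counts.values())  # N^S_j
        result[fold] = {
            aa: (counts[aa] / n_ss) / (total_counts[aa] / n_total)
            for aa in counts
        }
    return result


# Toy counts invented for illustration only.
total = Counter({"A": 100, "G": 100})
helix = {"toy_fold": Counter({"A": 30, "G": 10})}
p = propensities(helix, total)
```

In this toy case Ala makes up 75% of the helical residues but 50% of all residues, so its propensity is above 1, whereas Gly falls below 1.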
The secondary structure assignment program DSSP was used for all secondary structure assignments. The DSSP program assigns the following secondary structure codes: H, α-helix; G, 3₁₀-helix; I, 5-helix (π-helix); E, extended strand; B, residue in isolated β-bridge; S, bend; and T, hydrogen-bonded turn. We regarded H (α-helix) and G (3₁₀-helix) as α-helix and E as β-strand; the remaining residues, except T, were defined as other conformation.
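The grouping of DSSP codes described above can be written as a small lookup. The sketch below is a minimal illustration; the function name and the choice to return `None` for the excluded T residues are assumptions of the example, not part of DSSP.

```python
def classify_dssp(code):
    """Map a one-letter DSSP code to the conformation class used in the text.

    Returns "helix" for H and G, "strand" for E, None for T (excluded),
    and "other" for every remaining code, including unassigned residues.
    """
    if code in ("H", "G"):
        return "helix"
    if code == "E":
        return "strand"
    if code == "T":
        return None
    return "other"


# Example: classify a short string of per-residue DSSP codes.
classes = [classify_dssp(c) for c in "HGETSB I"]
```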
Defining exposed and buried residues in the secondary structure elements
Amino acid residues were defined as “exposed” when >20% of the total accessible surface area was exposed to solvent. This threshold level of 20% was determined as the value that could classify an almost equal number of residues as exposed (1,241 residues) or buried (1,276 residues) in β-strands for 37 soluble β-barrel proteins. The total accessible surface area for a given amino acid, X, was calculated for the tri-peptide G-X-G using DSSP. The frequencies of exposed, fSexp ij, and buried, fSbur ij, residues were calculated for each amino acid in an α-helical or β-strand conformation for each SCOP fold. The propensities for an α-helical or β-strand conformation for each SCOP fold for exposed residues, PSexp ij, and buried residues, PSbur ij, were obtained by dividing fSexp ij and fSbur ij by the frequencies of the exposed and buried residues in all SCOP folds, fexp i and fbur i, respectively:
$$P^{S\mathrm{exp}}_{ij} = \frac{f^{S\mathrm{exp}}_{ij}}{f^{\mathrm{exp}}_{i}}, \qquad P^{S\mathrm{bur}}_{ij} = \frac{f^{S\mathrm{bur}}_{ij}}{f^{\mathrm{bur}}_{i}}$$
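The exposed/buried split can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the reference area of the G-X-G tri-peptide is passed in as a parameter because the values used in the study are not given in the text.

```python
EXPOSED_THRESHOLD = 0.20  # ">20% of the total accessible surface area"


def is_exposed(asa, max_asa, threshold=EXPOSED_THRESHOLD):
    """Return True if the relative accessibility asa / max_asa exceeds the threshold.

    asa:     accessible surface area of the residue in the structure
    max_asa: accessible surface area of the same residue in G-X-G
             (reference value supplied by the caller)
    """
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa > threshold


def split_exposed_buried(residues):
    """Count exposed and buried residues from (asa, max_asa) pairs."""
    exposed = sum(1 for asa, max_asa in residues if is_exposed(asa, max_asa))
    return exposed, len(residues) - exposed


# Toy areas invented for illustration only.
n_exposed, n_buried = split_exposed_buried([(50, 200), (40, 200), (10, 200)])
```

A residue at exactly 20% relative accessibility is counted as buried here, because the text defines exposed residues with a strict ">".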
Population difference test
The Fisher-Irwin population test can be used to determine statistically significant differences between P ij values for different fold types. Because the n value (the sum of the number of each amino acid, i, from both populations) was large, the exact Fisher-Irwin test values were not calculated. Instead, a large sample number approximation was used.
The difference in P ij values between two populations was considered significant, and the populations were considered to be different, if the test variable Z was >1.25, which corresponds to a 90% confidence level.
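The formula for the test variable Z is not reproduced here, so the following Python sketch assumes the standard pooled two-proportion z statistic, which is the usual large-sample approximation to the Fisher-Irwin test. Whether this matches the exact expression used in the study is an assumption; only the cutoff of 1.25 is taken from the text, and the function names are hypothetical.

```python
import math

Z_CRITICAL = 1.25  # cutoff stated in the text


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z statistic (assumed form of the test variable Z).

    k1, k2: number of residues of amino acid i in the secondary structure
            of fold 1 and fold 2
    n1, n2: total number of residues in that secondary structure in each fold
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # both samples are all-or-nothing; no detectable difference
    return abs(p1 - p2) / se


def populations_differ(k1, n1, k2, n2, z_critical=Z_CRITICAL):
    """Return True if the two folds differ under the stated cutoff."""
    return two_proportion_z(k1, n1, k2, n2) > z_critical


# Toy counts invented for illustration only.
z = two_proportion_z(120, 1000, 80, 1000)
```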
Results and discussion
Amino acid propensities for the α-helical or β-strand conformation
For individual amino acids, a Pα of <0.9 denotes an α-helix breaker, a Pα of >1.1 denotes an α-helix-favored amino acid, and values between 0.9 and 1.1 denote that the amino acid is neutral in this regard. The same principle applies to Pβ. The amino acid propensities calculated using our dataset (Pα i and Pβ i) are shown in Table 2. Their standard deviations ranged from 0.001 to 0.004. The results are in good agreement with previous reports [1, 6, 10].
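The three-way labelling by propensity value can be expressed as a short function. The sketch below is only an illustration; the function name and the returned labels are choices made for the example, while the 0.9 and 1.1 cutoffs come from the text.

```python
def classify_propensity(p, low=0.9, high=1.1):
    """Label a propensity value as 'breaker', 'neutral' or 'favored'."""
    if p < low:
        return "breaker"
    if p > high:
        return "favored"
    return "neutral"


labels = [classify_propensity(p) for p in (0.64, 1.0, 1.5)]
```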
We also calculated the amino acid propensities for exposed and buried residues (Pexp i and Pbur i) in the secondary structural elements (Table 2). For α-helices, the three mean propensities Pα i, Pαexp i and Pαbur i have similar trends. On the other hand, the mean propensities for exposed residues (Pβexp i) and buried residues (Pβbur i) in β-strands differ significantly (Table 2). It is especially interesting that Lys and Arg, but not the two other charged residues, Asp and Glu, are preferred as exposed residues in β-strands. Not surprisingly, all charged amino acids are disfavored as buried residues in β-strands. Thus, buried regions of β-strands disfavor charged amino acids, whereas buried regions of α-helices can tolerate them.
As previously reported in statistical studies, charged amino acids (including Lys and Arg) yield low values for Pβ [1, 6, 10, 13], which is in agreement with the mean propensities, Pβ i, determined in the present work. Our results, however, show that Lys and Arg have relatively high Pβexp values for exposed residues, a property that is masked when comparing mean propensities. In our dataset, the fraction of exposed residues in β-strands is low (29%) compared to α-helices (46%). Most residues in β-strands are buried inside proteins and covered by α-helices or loop regions; exposed residues are thus less frequently encountered in β-strands, and their contributions to the mean Pβ i are therefore small. Jiang and coworkers have suggested that the hydrophobicities of amino acid side chains are the key determinant of β-sheet structures, but our data suggest that this holds for buried residues but not for exposed residues in β-sheet structures. Minor and Kim measured the propensity of the 20 amino acids for β-sheet formation in a variant of the IgG-binding domain from protein G, which has four antiparallel β-strands. Amino acid substitutions were made at a guest site on the solvent-exposed surface of the center strand. The propensities from those experiments show a strong correlation with the logarithmic Pβexp i values obtained here (R = 0.82), although they show a weaker correlation with our logarithmic Pβbur i values (R = 0.63). Furthermore, there is poor correlation between the propensities determined by Minor and Kim and those of Chou and Fasman. These results show that the preference for β-strands differs for exposed and buried sites.
Fold dependency of amino acid propensities for α-helices
The propensities of amino acid i in the helical region of fold j, Pα ij, and the β-strand region of fold j, Pβ ij, were calculated for 39 and 24 SCOP folds, respectively (Figure 1). Their standard deviations range from 0.01 to 0.05. With the exception of Met, Cys, Trp, Asn, Asp and His for Pα ij, and with the exception of Met, Pro and Cys for Pβ ij, the population of amino acids differed (>90% confidence level) for more than one pair of folds.
In particular, a wide range of Pα ij values was obtained for the aromatic residues Phe (0.66–2.00) and Tyr (0.58–1.89), depending on fold type, and the mean propensity for all folds is approximately 1.0 for these amino acids (Figure 1A and Table 2). The propensities of the charged residues Lys (0.65–1.56) and Arg (0.80–1.71) also varied widely depending on the fold. On the other hand, in >80% of SCOP folds, Leu or Glu are favored in the α-helical conformation, whereas Val, Pro, Ser, Thr, Asn, Asp and Gly are disfavored. Ala is favored in the α-helical conformation in the majority of the folds (79%) but is disfavored in two folds (Protein kinase-like and 4-helical cytokines). In particular, the propensity of Ala for the "4-helical cytokines" fold is quite low (Pα ij = 0.64). Met, Cys, Trp and His do not have a fold-type population difference at the >90% confidence level in any pair of folds, although their propensities vary widely among the various folds. Therefore, we did not further assess these amino acids.
Richardson et al. showed that Ala is not favored at the ends of α-helices, suggesting that a short α-helix does not favor Ala. The mean α-helix length of the "4-helical cytokines" fold is, however, the third longest among the 39 folds (the longest and the second longest are those of the "Ferritin-like" and "Four-helical up-and-down bundle" folds, respectively). We therefore calculated the correlation coefficient between the mean α-helix length and the propensity of each amino acid; all coefficients were smaller than 0.4. This result indicates that there is no relationship between the mean length of α-helices and the helical propensity of any amino acid.
Engel et al. showed that most helices are amphiphilic [7, 12], suggesting that the propensities for α-helix depend on the exposed residue fraction. We therefore examined the correlations between the exposed residue fraction and the frequency of amino acids in α-helices. No amino acid showed a strong correlation (R < −0.7 or R > 0.7) between the exposed residue fraction and the amino acid frequency, although the charged residues Lys and Asp have relatively strong positive correlations (RK = 0.66, RD = 0.54). In contrast, the correlation coefficients of Glu and Arg (also charged amino acids) are small (RE = 0.26, RR = 0.07).
Figure 2 presents propensities for exposed and buried amino acids for each SCOP fold. For the exposed regions of an α-helix (Figure 2A), fewer than ten amino acids show a population difference at the 90% confidence level for at least one pair of folds. This probably results from the fact that the dataset was limited to exposed residues. Glu (Pαexp ij: 1.0–1.92) is favored in exposed regions (Figure 2A), whereas Leu (Pαbur ij: 0.97–1.88) is favored in buried regions (Figure 2B), for more than 80% of the folds. Pro and Gly are extremely disfavored in both exposed and buried regions for more than 92% of the folds. The propensities of Ala in the exposed and buried regions of α-helices show a tendency similar to that of Pα ij. Ala is favored in the α-helical conformation in exposed and buried regions for 72% and 79% of the folds, respectively, whereas it is disfavored in 8% and 13% of the folds when exposed or buried, respectively. For the "4-helical cytokines" fold, the propensities of Ala in both exposed and buried regions are also low (Pαexp ij = 0.72 and Pαbur ij = 0.60). A wide range of Pαbur ij values was obtained for the aromatic residues Phe and Tyr, depending on fold type (Figure 2B), as was the case for Pα ij.
Fold dependency of amino acid propensities for β-strands
As shown in Figure 1B, a wide range of Pβ ij values was obtained for Trp (0.45–2.22), Thr (0.73–1.87), Lys (0.46–1.45) and Arg (0.51–1.42) depending on fold type. For Lys, although Pβ ij was <0.9 in 18 of 24 folds (mean value of Pβ ij = 0.79), three folds (the lipocalins fold, OB-fold, and protein kinase–like fold) yielded Pβ ij values > 1.2, which differed from those of the other folds at the 90% confidence level. These three folds are “all-β” or “α + β”, and all have largely exposed β-strands, whereas β-strands are usually covered by α-helical or loop regions, especially in “α/β” proteins (Table 1). It has long been thought that β-strands prefer hydrophobic residues [1, 6, 10]; however, it now appears that largely exposed β-sheet structures prefer hydrophilic residues such as Lys. In contrast, the four amino acids Val, Ile, Phe and Tyr are favored (Pβ ij > 1.1) in β-strands of more than 80% of folds, with Val (1.40–2.68) and Ile (1.17–2.33) having particularly high propensities. The six amino acids Pro, Ala, Asn, Asp, Glu and Gly are disfavored (Pβ ij < 0.9) in β-strands for more than 80% of folds, and Pro (0.16–0.71) and Asp (0.22–0.91) have quite low propensities.
The exposed residue fractions were observed in the range from about 10% to 46% for 24 folds (Table 1) and Glu and Lys have strong and positive correlations between the amino acid propensities and the exposed residue fractions of β-strands in each fold (RE = 0.76, RK = 0.73). Gln, Arg and Ile also have relatively strong correlations, although the correlation for Ile is negative (RQ = 0.67, RR = 0.5, RI = −0.68). As opposed to the strong positive correlation found for Glu, there is no correlation for the other negatively charged amino acid, Asp. The exposed residue fraction appears to be one of the major factors governing charged amino acid composition of folds for β-strands.
For residues exposed in a β-strand (Figure 2C), a wide range of Pβexp ij values was obtained for Ser (0.42–1.69), Lys (0.84–1.58) and Arg (0.68–1.85). For residues buried in a β-strand (Figure 2D), a wide range of Pβbur ij values was obtained for Cys (0.61–2.61), Phe (0.66–1.83), Tyr (0.64–1.92), Trp (0.31–1.77) and His (0.41–1.87). Pβexp ij values of Val, Ile, Phe, Tyr, Trp and Thr are high (Pβexp ij > 1.1) for more than 75% of folds, indicating that these amino acids, which have a β-branched or aromatic side chain, are favored in the exposed regions of β-strands across most fold types. In contrast, amino acids that are disfavored in all folds in β-strands are Pro (0.22–0.87), Ala (0.28–0.70) and Gly (0.23–0.88) for exposed regions, and Pro (0.12–0.87) for buried regions. It is interesting that the Pβexp ij values of Ala are low for all folds (Pβexp ij < 0.7), indicating that an exposed position on a β-strand is extremely unfavorable for Ala as well as for Pro and Gly. These strong tendencies support the idea that backbone solvation is a major factor determining thermodynamic β-propensities.
Correlations between amino acid propensities and SCOP fold
To investigate the factors that determine the fold dependence of the amino acid propensity for the secondary structures, correlation coefficients were calculated using amino acid propensities obtained from 39 SCOP folds for α-helices (Figure 3A) and 24 SCOP folds for β-strands (Figure 3B). Figure 4, for example, shows the relationships between the propensities of Glu and Lys for α-helices and β-strands. Each data point represents a fold in which more than 2,000 residues are found in each of α-helices and β-strands. For β-strands (Figure 4B), these two amino acid propensities have a correlation coefficient of 0.70, which suggests that folds rich in Glu are likely to also be rich in Lys. In contrast, for α-helices (Figure 4A) no significant correlation was observed. For β-strands, “α/β” proteins (□ in Figure 4B) show low propensities for Glu and Lys, although lipocalins and OB-folds (both “all-β”, + in Figure 4B) show higher propensities for Glu and Lys. For “α+β” proteins (▵ in Figure 4B), there is no correlation between the propensities of Glu and Lys. The correlation coefficients for “all-β” proteins and “α/β” proteins are 0.83 and 0.86, respectively.
Overall, there is a greater number of strong correlations (R < −0.7 or R > 0.7) for β-strands than for α-helices (Figure 3). For example, four strong positive correlations and five strong negative correlations are observed for β-strands, but there are only two paired strong correlations for α-helices (Ala and Gly, Tyr and Trp). Most of the positive correlations for β-strands involve paired amino acids having similar physicochemical characters (shown along the diagonal in Figure 3B), such as Val and Ile, Tyr and Trp, Ser and Gln/Thr/Asn, Asn and Thr, and Glu and Lys/Arg. In contrast, most of the negative correlations for β-strands involve pairs of amino acids having different physicochemical characters, such as Val and Tyr/Trp/Gln/Ser, Ile and Trp/Gln/Ser/Glu/Arg, Leu and Ser/Thr/Asn, Met and Asn, and Ala and Lys.
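The pairwise analysis can be sketched as follows: for every pair of amino acids, compute the Pearson correlation of their propensities across folds and keep the pairs with R < −0.7 or R > 0.7. This is a minimal pure-Python illustration; the function names, the input layout and the toy propensities are assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(prop_by_aa, cutoff=0.7):
    """Return {(aa1, aa2): R} for pairs with R < -cutoff or R > cutoff.

    prop_by_aa: {amino acid: [propensity in fold 1, fold 2, ...]},
                with the folds listed in the same order for every amino acid
    """
    pairs = {}
    for a, b in combinations(sorted(prop_by_aa), 2):
        r = pearson(prop_by_aa[a], prop_by_aa[b])
        if abs(r) > cutoff:
            pairs[(a, b)] = r
    return pairs


# Toy propensities for three hypothetical folds (not values from Figure 1).
toy = {
    "Val": [1.0, 2.0, 3.0],
    "Ile": [2.0, 4.0, 6.0],
    "Trp": [3.0, 2.0, 1.0],
    "Ala": [1.0, 3.0, 1.0],
}
strong = strong_pairs(toy)
```

In the toy data Val and Ile rise together, Trp falls as they rise, and Ala is uncorrelated with all three, so only the three pairs among Val, Ile and Trp pass the cutoff.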
Interestingly, the aromatic amino acid, Phe, shows low correlations with Trp and Tyr, for both α-helices and β-strands, although strong positive correlations between Trp and Tyr are observed for both α-helices and β-strands.
Correlations between SCOP fold and propensities for exposed or buried amino acids
We also calculated correlation coefficients for amino acid propensities of exposed and buried residues for α-helices (Figure 5), β-strands (Figure 6) and other conformation (data not shown). Although amino acid propensities for α-helices have two strong correlations (Figure 3A), there is no strong correlation for exposed (Figure 5A) or buried (Figure 5B) residues in α-helices. The strong positive correlation between Trp and Tyr for all residues was absent for exposed residues, but a weak positive correlation was observed for buried residues. These results indicate that a fold that favors Trp in the interior of α-helices also favors Tyr there. Again, Phe had no correlation with Trp or Tyr for exposed or buried residues. The positive correlations among Ser, Asn and Thr, and the negative correlations between Ser/Thr and Glu, were observed only for exposed residues. Although some new correlations were observed, these values were relatively low for α-helices. For other conformation, no strong correlation was observed for either exposed or buried residues.
Correlation for buried amino acids in β-strand
In contrast, for β-strands, most of the correlations shown in Figure 3B are also strong for exposed (Figure 6A) and buried (Figure 6B) residues. The strong negative correlations between Val/Ile and Tyr/Trp/Gln were observed for buried but not exposed residues. In other words, a fold type that prefers Val or Ile does not prefer Tyr, Trp or Gln, especially for buried residues.
By visually inspecting buried residues for β-strands in the SCOP fold group of “concanavalin A–like lectins/glucanases” (concanavalin A), in addition to buried Tyr and Trp residues we found many polar amino acids such as Gln, Ser or Thr, and charged amino acids such as Glu, Lys or Arg, involved in H-bonds with each other to counterbalance the polarity in the hydrophobic environment. For the buried residues, we calculated the correlation coefficients between the combined frequencies of hydrophobic amino acids (Val, Ile and Leu) and some polar amino acids (Table 3 and Figure 7). Because each propensity is the frequency divided by a fold-independent constant, the correlation coefficients calculated from the frequencies are the same as those calculated from the propensities; we therefore use frequencies, which make the amino acid occurrences easier to interpret. The combined frequencies of buried Trp, Tyr and Gln have a strong negative correlation (R = −0.87) with those of hydrophobic amino acids (Val, Ile and Leu). The inclusion of Ser in the group with Trp, Tyr and Gln strengthened the correlation to R = −0.93 (Figure 7). The fact that the correlation coefficients for the individual pairs between Val/Ile/Leu and Tyr/Trp/Gln/Ser range only from −0.19 to −0.75 indicates synergy in the correlation of the combined frequencies for β-strands that does not exist for α-helices and other conformation (Table 3). The synergy between these amino acid groups suggests that the amino acids within the same group can be exchanged. For example, in a fold type where Leu is preferred for buried residues, Ile will also be preferred. Thus, at buried sites, fold types with many aliphatic residues (Val, Ile and Leu) contain low quantities of Tyr, Trp, Gln and Ser. Figure 7 also shows that “all-β” proteins tend to have a higher content of Tyr, Trp, Gln and Ser, whereas “α/β” proteins have a higher content of aliphatic amino acids at buried sites.
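The combined frequencies used above are simply the summed fractions of a group of amino acids among the buried β-strand residues of a fold. The sketch below computes the pair of values that would form one point per fold in a plot such as Figure 7; the function name, the group constants and the toy counts are assumptions made for the example.

```python
ALIPHATIC = ("V", "I", "L")
POLAR_AROMATIC = ("W", "Y", "Q", "S")


def group_fraction(counts, group):
    """Fraction of residues in `counts` that belong to the amino acids in `group`.

    counts: {one-letter amino acid code: number of buried β-strand residues}
    """
    total = sum(counts.values())
    return sum(counts.get(aa, 0) for aa in group) / total


# Toy counts for one hypothetical fold (invented for illustration only).
toy_fold = {"V": 30, "I": 10, "L": 10, "W": 5, "Y": 5, "Q": 5, "S": 5, "A": 30}

f_vil = group_fraction(toy_fold, ALIPHATIC)
f_wyqs = group_fraction(toy_fold, POLAR_AROMATIC)
```

Computing this pair for every fold and taking the Pearson correlation over the folds gives a coefficient of the kind reported in Table 3.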
The top six folds for the content of Tyr, Trp, Gln and Ser at buried sites in β-strands are “all-β” proteins and have two large β-sheets packed together (lipocalins, concanavalin A, 6-bladed beta-propeller (6-bb-propeller), galactose-binding domain-like (Gbd), double-stranded β-helix (DS β-helix), and immunoglobulin-like beta-sandwich folds (Ig)). Other “all-β” proteins, which consist of only one small β-sheet or a small β-barrel structure, have a small hydrophobic core. The H-bonds between the buried side chains may be necessary for the correct alignment of two large β-sheets in particular.
Correlation for exposed amino acids in β-strand
Negative correlations for Ile/Leu and Ser/Thr/Asn were observed in the exposed residues (Figure 6A), although the correlations for Ile and Thr/Asn were not observed when both exposed and buried residues were calculated together (Figure 3B). Negative correlations were also observed for Glu and Ser/Asn and for Arg and Thr. We examined the correlation of the combined frequencies for these exposed amino acids in β-strands as shown in Table 4. This result shows that strong correlations exist in the frequencies of certain hydrophobic amino acids (Ile, Leu), charged amino acids (Glu, Lys, Arg), and polar amino acids (Ser, Thr, Asn) in the exposed regions of β-strands. It is interesting that the frequencies of hydrophobic (Ile, Leu) and charged (Glu, Lys, Arg) amino acids correlate negatively with those for polar amino acids (Ser, Thr, Asn). A common feature for Ile, Leu, Glu, Lys and Arg is that they have relatively long side chains, including more than two hydrophobic methylene groups, whereas Ser, Thr and Asn have short side chains.
Figure 8 shows a strong negative correlation between the combined frequencies of Ser, Thr and Asn and those of Ile, Leu, Glu, Lys and Arg (R = −0.90). For the exposed regions of β-strands, it is clear that in all “α/β” proteins and all “α+β” proteins, Ile, Leu, Glu, Lys and Arg are preferred and Ser, Thr and Asn are disfavored. Fold types that prefer Ser, Thr or Asn have a relatively low content of Ile, Leu, Glu, Lys, or Arg, and they are “all-β” proteins. Figure 8 also shows the widespread distribution of the folds of “all-β” proteins. For the two SCOP folds DS β-helix and OB-fold of “all-β” proteins, the residues Ile, Leu, Glu, Lys or Arg are preferred in the exposed regions of the β-strands. These fold types have twisted and bent β-strands. Some Cα atoms in the β-strands are positioned at the bottom of the narrow and deep valley formed by the twisted and bent β-strands (Figure 9D and E). At such positions, the short, polar side chain of Ser, Thr or Asn is unable to reach the solvent, so amino acids with long side chains are favored. Much the same is true for “α/β” proteins (Figure 9F and G). In “α/β” proteins, the β-sheet is twisted and covered by α-helices, leaving only narrow spaces for the residues at the ends of the β-strands to reach solvent. In contrast, the two SCOP folds concanavalin A and single-stranded right-handed β-helix (SS β-helix) have a remarkably high content of Ser, Thr and Asn in the exposed regions of β-strands and have largely exposed and flat β-sheets (Figure 9A, B and C). Figure 9C shows that Ser, Asn and Thr are dominant in the flat β-sheet and do not make significant contact with each other. These results suggest that amino acid composition in the exposed regions of β-strands governs the formation of a twist in β-sheets.
Wang et al. showed that isolated β-strands in molecular dynamics simulations are not twisted, suggesting that the stabilization of the twist must be due to inter-strand interactions. Another computer simulation study found that inter-strand interactions by side chains induce a twist and that β-branched side chains are important for twist formation. On the other hand, Koh et al. and Bosco et al. used statistical analyses to show that β-sheet structure is mainly determined by the backbone, and that the contribution of side chains is small. This indicates that twisting is an inherent property of a polypeptide chain, implying that a β-strand should twist regardless of its amino acid sequence. However, some folds have a large, flat β-sheet, such as the SCOP groups concanavalin A and SS β-helix. Previous studies have targeted only the twisted β-strand and have not focused on the flat β-sheet. Our results suggest that the amino acid composition in the exposed regions of β-strands may be related to the twist and bend of the strand, showing that side chain interactions are also an important factor for β-strand twisting. An intuitive explanation is that the long side chains of Leu, Ile, Lys, Arg and Glu in the exposed regions come close together to form a hydrophobic core, resulting in the formation of a twist and/or bend in β-strands. In contrast, the side chains of Ser, Thr and Asn have low hydrophobicities and are short, so the hydrophobic interactions between the side chains are weak and produce a flat β-sheet. Therefore, it seems that the strain within a β-sheet is one of the major factors governing amino acid propensities of folds for β-strands.
The types of β-sheets and the amino acid propensity
The folds can be classified into three groups by β-sheet type: parallel, antiparallel and mixed. In the "all-β" and "α + β" protein classes, the β-sheets of all folds used in this study are completely antiparallel, except for SS β-helix, which has a completely parallel β-sheet. The folds of the "α/β" protein class have completely or mainly parallel β-sheets. The β-sheets of three folds, "Flavodoxin-like", "NAD(P)-binding Rossmann-fold domains" and "TIM beta/alpha-barrel", are completely parallel, whereas "Periplasmic binding protein-like II" and "Thioredoxin fold" have mixed β-sheets.
For the exposed residues of β-strands (Figure 8), the plots for the folds of the "all-β" protein class were widely distributed, even though all of these folds except SS β-helix have completely antiparallel β-sheets. Furthermore, the folds of the "α/β" protein class have amino acid compositions different from that of SS β-helix, although they also have parallel β-sheets. Figure 7 shows that the plots for the folds of the "all-β" protein class were widely distributed and that the plot of SS β-helix lies in the center of the graph. The residue fractions (fβbur VIL) of the three folds that have completely parallel β-sheets were also widely distributed (51.4, 47.2 and 42.7%).
Robustness of the dataset
We checked the robustness of our results using a dataset of folds with more than 1,500 and fewer than 2,000 residues, which are not included in the dataset used in this study; this gave six folds for α-helices and eight folds for β-strands. For β-strands, strong correlations were again observed for buried residues (R_WYQS-VIL = −0.81) and for exposed residues (R_ILEKR-STN = −0.78). In α-helices, no strong correlations were found for buried residues (R_WYQS-VIL = −0.64) or for exposed residues (R_ILEKR-STN = −0.48). These results agree with those obtained for the dataset containing more than 2,000 residues. Therefore, the results presented here seem to be independent of the dataset selection.
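As an illustration of how a correlation coefficient such as the one between the combined Ile, Leu, Glu, Lys and Arg content and the combined Ser, Thr and Asn content can be obtained, the following Python sketch computes the fraction of each residue group per fold and then the Pearson correlation across folds. It is a minimal sketch, not the program used in this study: the fold names and residue counts are made-up placeholders, and the helper names group_fraction and pearson_r are assumptions introduced here.

```python
from math import sqrt

STN = "STN"        # Ser, Thr, Asn (one-letter codes)
ILEKR = "ILEKR"    # Ile, Leu, Glu, Lys, Arg


def group_fraction(counts, group):
    """Fraction of all counted residues that belong to the given group."""
    total = sum(counts.values())
    return sum(counts.get(aa, 0) for aa in group) / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL residue counts at exposed beta-strand sites, one dict per fold.
# These numbers are invented for illustration and are not data from the paper.
toy_counts = {
    "fold_1": {"S": 30, "T": 35, "N": 20, "I": 5, "L": 5, "E": 5, "K": 5, "R": 5, "V": 40},
    "fold_2": {"S": 20, "T": 20, "N": 10, "I": 10, "L": 10, "E": 10, "K": 10, "R": 10, "V": 50},
    "fold_3": {"S": 10, "T": 5, "N": 5, "I": 20, "L": 20, "E": 20, "K": 15, "R": 15, "V": 40},
}

# One (x, y) point per fold, then a single correlation across folds.
f_stn = [group_fraction(c, STN) for c in toy_counts.values()]
f_ilekr = [group_fraction(c, ILEKR) for c in toy_counts.values()]
r = pearson_r(f_ilekr, f_stn)
print(f"R(ILEKR-STN) on toy data = {r:.2f}")
```

With real data, each dictionary would hold the residue counts at exposed (or buried) β-strand sites of one SCOP fold, and the same two functions could be reused for the Trp, Tyr, Gln and Ser versus Val, Ile and Leu comparison.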
The amino acid propensities for secondary structures were investigated for each SCOP fold. The α-helix propensities are similar across folds, and those calculated for exposed and buried residues are also similar to each other. For β-sheets, however, the propensities calculated for exposed residues differ remarkably from those for buried residues; the latter are similar to those calculated for all residues because β-sheets tend to be located in the interior of proteins.
We also detected correlations between amino acid compositions in β-strands. At buried sites, the content of Tyr, Trp, Gln and Ser correlates negatively with the content of the aliphatic amino acids Val, Ile and Leu. All-β proteins tend to have a higher content of Tyr, Trp, Gln and Ser, whereas α/β proteins tend to have a higher content of aliphatic amino acids at buried sites. In all-β proteins, H-bonds between buried side chains may be necessary for the correct alignment of two large β-sheets. For exposed residues, folds with a high content of Ile, Leu, Glu, Lys and Arg tend to have a low content of Ser, Thr and Asn. Generally, α/β proteins have twisted and bent β-strands and favor longer side chains at exposed sites.
These findings should be useful for the design of β-sheets. They are especially effective when structural information is available, such as whether a residue is exposed or buried, whether two large β-sheets are packed together, whether a β-sheet has α-helices on at least one side, and whether a β-strand is twisted. Hecht and coworkers have succeeded in designing de novo proteins with binary patterning techniques, in which polar and non-polar amino acids are placed at desired sites along the sequence by synthesizing DNA with degenerate codons. To design a de novo protein library of SS β-helix, for example, one should consider biasing exposed sites on β-strands in favor of Ser, Thr and Asn rather than Glu, Lys and Arg, because in SS β-helix folds the frequency of Ser, Thr and Asn at exposed sites on β-strands is relatively high and the frequency of Ile, Leu, Glu, Lys and Arg is low (Figure 8).
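When a library is biased with degenerate codons, it helps to check which amino acids a given codon actually encodes. The following Python sketch expands an IUPAC degenerate codon using the standard genetic code. It is only an illustration under stated assumptions: the function name encoded_amino_acids and the example codon AVC are chosen here and are not taken from this study or from the work of Hecht and coworkers.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_amino_acids(degenerate_codon):
    """Count how many concrete codons encode each amino acid.

    `degenerate_codon` is a three-letter string of IUPAC nucleotide codes.
    """
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return Counter(CODON_TABLE["".join(c)] for c in product(*choices))


# AVC expands to AAC (Asn), ACC (Thr) and AGC (Ser).
print(encoded_amino_acids("AVC"))
```

Under the standard genetic code, AVC covers exactly Ser, Thr and Asn, so a codon of this kind could be one way to favor these residues at exposed β-strand sites. Whether it suits a particular library depends on the expression host and on the synthesis method, which this sketch does not model.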
Chou PY, Fasman GD: Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry 1974, 13(2):211–222. 10.1021/bi00699a001
Hubbard TJ, Ailey B, Brenner SE, Murzin AG, Chothia C: SCOP: a Structural Classification of Proteins database. Nucleic Acids Res 1999, 27(1):254–256. 10.1093/nar/27.1.254
Orengo CA, Pearl FM, Bray JE, Todd AE, Martin AC, Lo Conte L, Thornton JM: The CATH Database provides insights into protein structure/function relationships. Nucleic Acids Res 1999, 27(1):275–279. 10.1093/nar/27.1.275
Fujiwara K, Ikeguchi M: OLIGAMI: OLIGomer Architecture and Molecular Interface. The Open Bioinformatics Journal 2008, 2: 50–53. 10.2174/1875036200802010050
Serrano L: The relationship between sequence and structure in elementary folding units. Adv Protein Chem 2000, 53: 49–85.
Williams RW, Chang A, Juretic D, Loughran S: Secondary structure predictions and medium range interactions. Biochim Biophys Acta 1987, 916(2):200–204. 10.1016/0167-4838(87)90109-9
Richardson JS, Richardson DC: Amino acid preferences for specific locations at the ends of alpha helices. Science 1988, 240(4859):1648–1652. 10.1126/science.3381086
Munoz V, Serrano L: Intrinsic secondary structure propensities of the amino acids, using statistical phi-psi matrices: comparison with experimental scales. Proteins 1994, 20(4):301–311. 10.1002/prot.340200403
Swindells MB, MacArthur MW, Thornton JM: Intrinsic phi, psi propensities of amino acids, derived from the coil regions of known structures. Nat Struct Biol 1995, 2(7):596–603. 10.1038/nsb0795-596
Jiang B, Guo T, Peng L, Sun Z: Folding Type-Specific Secondary Structure Propensities of Amino Acids, Derived from α-Helical, β-Sheet, α/β, and α + β Proteins of Known Structures. Biopolymers 1998, 45: 35–49. 10.1002/(SICI)1097-0282(199801)45:1<35::AID-BIP4>3.0.CO;2-#
Pal D, Chakrabarti P: beta-sheet propensity and its correlation with parameters based on conformation. Acta Crystallogr D: Biol Crystallogr 2000, 56(Pt 5):589–594.
Engel DE, DeGrado WF: Amino acid propensities are position-dependent throughout the length of alpha-helices. J Mol Biol 2004, 337(5):1195–1205. 10.1016/j.jmb.2004.02.004
Costantini S, Colonna G, Facchiano AM: Amino acid propensities for secondary structures are influenced by the protein structural class. Biochem Biophys Res Commun 2006, 342(2):441–451. 10.1016/j.bbrc.2006.01.159
Malkov SN, Zivkovic MV, Beljanski MV, Hall MB, Zaric SD: A reexamination of the propensities of amino acids towards a particular secondary structure: classification of amino acids based on their chemical structure. J Mol Model 2008, 14(8):769–775. 10.1007/s00894-008-0313-0
Bhattacharjee N, Biswas P: Position-specific propensities of amino acids in the beta-strand. BMC Struct Biol 2010, 10: 29. 10.1186/1472-6807-10-29
O'Neil KT, DeGrado WF: A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 1990, 250(4981):646–651. 10.1126/science.2237415
Park SH, Shalongo W, Stellwagen E: Residue helix parameters obtained from dichroic analysis of peptides of defined sequence. Biochemistry 1993, 32(27):7048–7053. 10.1021/bi00078a033
Rohl CA, Chakrabartty A, Baldwin RL: Helix propagation and N-cap propensities of the amino acids measured in alanine-based peptides in 40 volume percent trifluoroethanol. Protein Sci 1996, 5(12):2623–2637. 10.1002/pro.5560051225
Yang J, Spek EJ, Gong Y, Zhou H, Kallenbach NR: The role of context on alpha-helix stabilization: host-guest analysis in a mixed background peptide model. Protein Sci 1997, 6(6):1264–1272. 10.1002/pro.5560060614
Lacroix E, Viguera AR, Serrano L: Elucidating the folding problem of alpha-helices: local motifs, long-range electrostatics, ionic-strength dependence and prediction of NMR parameters. J Mol Biol 1998, 284(1):173–191. 10.1006/jmbi.1998.2145
Kim CA, Berg JM: Thermodynamic beta-sheet propensities measured using a zinc-finger host peptide. Nature 1993, 362(6417):267–270. 10.1038/362267a0
Myers JK, Pace CN, Scholtz JM: Trifluoroethanol effects on helix propensity and electrostatic interactions in the helical peptide from ribonuclease T1. Protein Sci 1998, 7(2):383–388.
Myers JK, Pace CN, Scholtz JM: Helix propensities are identical in proteins and peptides. Biochemistry 1997, 36(36):10923–10929. 10.1021/bi9707180
Horovitz A, Matthews JM, Fersht AR: Alpha-helix stability in proteins. II. Factors that influence stability at an internal position. J Mol Biol 1992, 227(2):560–568. 10.1016/0022-2836(92)90907-2
Blaber M, Zhang XJ, Matthews BW: Structural basis of amino acid alpha helix propensity. Science 1993, 260(5114):1637–1640. 10.1126/science.8503008
Blaber M, Baase WA, Gassner N, Matthews BW: Alanine scanning mutagenesis of the alpha-helix 115–123 of phage T4 lysozyme: effects on structure, stability and the binding of solvent. J Mol Biol 1995, 246(2):317–330. 10.1006/jmbi.1994.0087
Minor DL, Kim PS: Measurement of the beta-sheet-forming propensities of amino acids. Nature 1994, 367(6464):660–663. 10.1038/367660a0
Minor DL, Kim PS: Context is a major determinant of beta-sheet propensity. Nature 1994, 371(6494):264–267. 10.1038/371264a0
Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637. 10.1002/bip.360221211
Warren GL, Petsko GA: Composition analysis of alpha-helices in thermophilic organisms. Protein Eng 1995, 8(9):905–913. 10.1093/protein/8.9.905
Chou PY, Fasman GD: Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol 1978, 47: 45–148.
Avbelj F, Baldwin RL: Role of backbone solvation in determining thermodynamic beta propensities of the amino acids. Proc Natl Acad Sci U S A 2002, 99(3):1309–1313. 10.1073/pnas.032665499
Wang L, O'Connell T, Tropsha A, Hermans J: Molecular simulations of beta-sheet twisting. J Mol Biol 1996, 262(2):283–293. 10.1006/jmbi.1996.0513
Chou KC, Nemethy G, Scheraga HA: Role of interchain interactions in the stabilization of the right-handed twist of beta-sheets. J Mol Biol 1983, 168(2):389–407. 10.1016/S0022-2836(83)80025-4
Koh E, Kim T, Cho HS: Mean curvature as a major determinant of beta-sheet propensity. Bioinformatics 2006, 22(3):297–302. 10.1093/bioinformatics/bti775
Ho BK, Curmi PM: Twist and shear in beta-sheets and beta-ribbons. J Mol Biol 2002, 317(2):291–308. 10.1006/jmbi.2001.5385
Hecht MH, Das A, Go A, Bradley LH, Wei Y: De novo proteins from designed combinatorial libraries. Protein Sci 2004, 13(7):1711–1723. 10.1110/ps.04690804
The authors declare that they have no competing interests.
KF conceived the project and wrote the manuscript. HT wrote the programs and performed all analyses. MI participated in the discussion of the project and was involved in the revision of the manuscript. All authors read and approved the final manuscript.