Mirrors in the PDB: left-handed α-turns guide design with D-amino acids

Background: Incorporating variable amino acid stereochemistry in molecular design has the potential to improve the stability of existing proteins and to create new topologies inaccessible to homochiral molecules. The Protein Data Bank has been a reliable, rich source of information on molecular interactions and their role in protein stability and structure. D-amino acids rarely occur naturally, making it difficult to infer general rules for how they would be tolerated in proteins through an analysis of existing protein structures. However, protein elements containing short left-handed turns and helices turn out to contain useful information. Molecular mechanisms used in proteins to stabilize left-handed elements by L-amino acids are structurally enantiomeric to potential synthetic strategies for stabilizing right-handed elements with D-amino acids.
Results: Propensities for amino acids to occur in contiguous αL helices correlate with published thermodynamic scales for incorporation of D-amino acids into αR helices. Two backbone rules for terminating a left-handed helix are found: an αR conformation is disfavored at the amino terminus, and a βR conformation is disfavored at the carboxy terminus. Sidechain-backbone helix-capping interactions unique to αL helices are also found, including elevated propensities for L-Asn and L-Thr at the amino terminus and for L-Gln, L-Thr and L-Ser at the carboxy terminus.
Conclusion: By examining left-handed α-turns containing L-amino acids, new interaction motifs for incorporating D-amino acids into right-handed α-helices are identified. These will provide a basis for de novo design of novel heterochiral protein folds.


Background
Solid phase chemical synthesis allows for the incorporation of non-natural amino acids into polypeptides [1]. The field has developed rapidly, permitting the construction of synthetic, protein-sized molecules. This has allowed protein chemists to explore the physical and biological effects of varying amino acid stereochemistry. A dramatic example was the chemical synthesis of the ninety-nine amino acid HIV-1 protease from both L- and D-amino acids [2]. The resulting enantiomeric molecules were both well folded, and each was specifically active on a protease substrate of the same amino acid chirality as the enzyme. In this study, we use the Protein Data Bank (PDB) as a source of structural information for specific D-amino acid sidechain interactions with α-helical backbones.
Much of the work on the role of variable stereochemistry on structure and stability has been conducted on short peptides [3,4]. This work has been motivated by natural examples of polypeptides that combine L and D amino acids. The antimicrobial toxin, gramicidin, is a well studied example of such a molecule, containing alternating L and D amino acids. This allows it to adopt the β-helix, a novel secondary structure composed of alternating positions in the β L and β R conformation [5,6]. The β-helix has been used as the foundation for novel cyclic peptide folds [7] and peptide nanotubes with ion channel activity and antimicrobial properties [8][9][10][11]. Other microbial peptides such as tolaasin use D-amino acids to enforce sharp bends in an α-helical domain [12]. Methods are being developed for incorporating L and D amino acids in computational de novo protein design [13][14][15][16].
Another practical application is the development of thermostable proteins that incorporate D-amino acids. Amino acids in proteins are rarely found in backbone conformations with positive φ and ψ angles at the α L region of Ramachandran space [17]. This paucity of α L residues is primarily due to unfavorable interactions between the sidechain and its backbone carbonyl and that of the preceding residue. The energetic cost of this steric clash has been estimated at around 1 kcal/mole by replacing L-Ala with D-Ala in a model α R -helical peptide [18,19]. The only amino acid that does not contribute this type of steric clash is Gly, which lacks a sidechain. Consequently, α L positions in proteins are primarily occupied by Gly [20,21]. This feature of glycine has been applied to the thermostabilization of a bacterial formate dehydrogenase which has five non-glycine amino acids throughout the protein in the α L conformation. Replacing these amino acids with Gly increases the activity at otherwise inactivating temperatures [22].
The backbone amide of glycine forms hydrogen bonds with exposed carbonyls at the C-terminal end of a helix [23][24][25][26]. This allows the chain to maintain a network of stabilizing interactions while terminating the helix and changing the direction of the chain. Other amino acids besides glycine are sometimes found in such positions, but are rare due to the steric constraints already mentioned. Small polar amino acids are commonly found at the N-terminus of an α R helix, making sidechain hydrogen bonds to exposed amides of the backbone [27][28][29][30][31][32]. Together, these interactions are called 'helix caps'. D-amino acids can function as C-terminal helix caps. While substitution of α L positions with Gly may remove unfavorable contacts, the entropic cost of fixing glycine in a given conformation can offset the energetic benefits gained. D-amino acids, which favor the α L conformation, have been substituted for Gly, sometimes resulting in increased protein stability [33][34][35]. Observed folding free energy changes have ranged from zero to over 2 kcal/mol. In a monomeric helical peptide, adding D-Ala to the C-terminus of a helix resulted in no significant change in stability, whereas D-Arg increased stability by approximately 1 kcal/mol, presumably due to stabilization of the helix macrodipole [36]. These varying results indicate that the roles of sidechain identity and stereochemistry in protein stability are still an open problem.
While much has been learned about standard capping interactions from the analysis of high-resolution protein structures in the PDB, the number of proteins containing D-amino acids is very low. Approximately 150 entries in the PDB contain D-amino acids that are not artifactual, and most of these are shorter than twenty amino acids [32]. A handful of these contain D-amino acids in helix C-capping contexts [1,34]. A number of designed heterochiral peptides are in the Cambridge Structural Database (CSD) of small molecules, but these are of limited use for the unbiased discovery of novel capping interactions.
One possible source of information is a set of small, contiguous left-handed turns and helices in proteins. These are rare due to the unfavorable steric interactions required to place L-amino acids in the α L conformation. Where such structures do exist, they often play key structural and functional roles [37]. Stabilizing interactions identified in a study of naturally occurring left-handed structures would be mediated by L-amino acids. Hence, their value to protein engineering and design lies in recognizing that the structural enantiomer of such interactions would involve right-handed structures stabilized with D-amino acids.
This report outlines the search of a non-redundant subset of the PDB for left-handed turns and short α L helices. The total fraction of amino acids in the α L conformation is 4%, over half of which is attributed to glycine [20]. Despite this, a small set of left-handed structures is identified for structural analysis. The intrinsic α L -helical preferences of most amino acids correlate with thermodynamic scales for inserting D-amino acids into α R helices. Furthermore, several N- and C-terminal capping motifs unique to left-handed helices are described. These are promising candidates for novel D-amino acid capping motifs of α R -helices. Implications for protein stabilization and heterochiral protein design are discussed.

Backbone Geometry in Left-handed Turns and Helices
A non-redundant subset of structures in the PDB was searched for three or more contiguous residues in an α L conformation. Seventy-two three-residue turns, ten four-residue helices and two five-residue helices were found (see Additional Files 1, Table S1). To keep nomenclature consistent with previous studies [25], the relative positions of amino acids within these turns and helices are described as follows: the Ncap residue is the first amino acid in a contiguous left-handed conformation; the Ccap residue is the last amino acid in a contiguous left-handed conformation. The remaining positions are described by their position relative to the Ncap or Ccap. In three-residue turns, N1 = C1.
Left-handed helices are understandably rare in proteins due to the inherent conformational preferences dictated by backbone stereochemistry. Less than one percent of residues are found in contiguous left-handed turns or helices of length three or greater. In the three-residue turns, the backbone angles progressively shift from being centered around the α L region (φ, ψ ≈ 60°, 40°) to the 3₁₀-L region (φ, ψ ≈ 70°, 20°) (Figure 1). We do not detect a similar trend in four-residue structures, although the number of examples is much smaller. Presumably this is due to the accommodation of an i, i+3 hydrogen bond in three-residue turns.

Amino Acid Preferences in Left-handed Structures
Amino acid propensities at specific positions in the left-handed turns were computed as described in equations E1-E3 (Methods). The results, reported in Table 1, range from 0.0 (very unfavorable) through 1.0 (neither favorable nor unfavorable) to 7.0 (very favorable). Due to the low counts and the very high frequency of Gly and Asn, which account for over a third of all residues in the data set, the 95% confidence intervals for many of the amino acids at specific positions are very large. The absolute values must therefore be interpreted very cautiously. In cases where a favorable or unfavorable interaction is indicated by sequence statistics, the corresponding structures are also analyzed, or in some cases modelled using idealized structures.
The highest propensities at the N1-N3 positions belong to Gly and L-Asn. L-Asp is highly represented at the Ncap and N1 positions. These are also the three amino acids with the highest individual α L propensity in the database [20]. The preference of L-Asn (and L-Asp) for the α L conformation has been suggested to result from favorable dipole-dipole interactions of sidechain and backbone carbonyls [38]. The β-branched amino acids L-Ile, L-Val and L-Thr are highly unfavorable. L-Pro is not found in these structures because its cyclic sidechain restricts φ to approximately -60°.
Can propensities obtained from L-amino acids in α L turns provide insight into the thermodynamic effects of D-amino acids on α R -helix folding? To investigate this, database-derived propensities were compared with experimental stabilities from host-guest studies (Table 2). Host-guest peptide systems have been used to quantify the helix-stabilizing propensities of the various amino acids. This approach has been applied to both L-amino acids [39,40] and D-amino acids [41,42].
The influence of D-amino acid substitutions on the stability of an amphipathic, monomeric helix was studied by Krause and coworkers [43]. Comparing estimated statistical energies, calculated as -ln(P) of combined propensities over the Ncap, N1/C1 and Ccap positions, to the Krause scale shows a reasonable correlation of approximately R = 0.58 (Figure 2A). The most distant outliers from the fit are the aromatic amino acids, Phe, Tyr and Trp. If these are omitted, the correlation improves to R = 0.85 (Table 3).
This strong correlation between database and experimental values is surprising, given the comparison of three-residue turns to the much longer eighteen-residue α-helix used in the host-guest studies. In an α-helix, an amino acid sidechain will often interact with the i-3 and i-4 positions, either directly through van der Waals packing or hydrogen bonding, or indirectly through shielding of solvent interactions. It is possible that the host-guest scale is dominated by local stereochemical effects, rather than by interactions with nearby residues that could have a cooperative effect on folding. To test this, a different set of database propensities was calculated using amino acids in an α L conformation where the preceding and following amino acids were not α L . In this case, the correlation with the Krause scale also improves (R = 0.79, Figure 2B). This suggests that the experimental D-scale describes the propensities of amino acids to assume backbone φ and ψ angles relating to an α R conformation, rather than reporting on steric interactions with the i-3 and i-4 positions in a helical context. Because monomeric helix folding-unfolding is not a two-state process [44,45], the amphipathic monomeric helix used may not reflect thermodynamic contributions in a larger protein where helix folding is coupled with assembly of other structural elements.
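The comparison described above, converting propensities to statistical energies and correlating them with an experimental scale, can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names are hypothetical, and the numbers are placeholders that are not taken from Table 1 or from the Krause scale.

```python
import math

def statistical_energy(propensity):
    """Return -ln(P), a unitless statistical energy. Requires P > 0."""
    return -math.log(propensity)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical placeholder values, NOT data from this study.
example_propensity = {"Gly": 3.0, "Asn": 2.5, "Ala": 0.8, "Val": 0.2}
example_ddg = {"Gly": 0.1, "Asn": 0.3, "Ala": 0.9, "Val": 1.6}

residues = sorted(example_propensity)
energies = [statistical_energy(example_propensity[r]) for r in residues]
ddg_values = [example_ddg[r] for r in residues]
r_value = pearson_r(energies, ddg_values)
```

A propensity of exactly 0.0 has no finite -ln(P), so such entries would need to be excluded or given a pseudocount before fitting; which of these, if either, was done here is not stated in the text.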

Figure 1. Ramachandran plot of three-residue left-handed turns.
If the stereochemically inverse comparison is done, computing database α R propensities within a helix and in isolation, and correlating them with L-amino acid substitutions in a model two-state helical coiled-coil system [39], we now find that propensities in the helix (R = 0.73) correlate better with experimental values than those outside a helix (R = 0.42) (Figure 3). A similar result was observed for L-amino acids in right-handed helices by O'Neil et al. [40], who found a reasonable correlation (R = 0.75) between an experimental scale and propensities estimated from the PDB [46].
In a direct comparison of the two experimental scales, the outliers are the β-branched amino acids (Ile, Val and Thr) and the aromatic amino acids (Phe, Tyr and Trp). When we remove these from the regression fit, the correlation improves from -0.41 to -0.88 (Figure 4). The six aromatic and β-branched amino acids are also the most highly ranked in several β-sheet propensity scales [47]. Thus, these particular residues are relatively unfavorable in a helix, regardless of its handedness because they favor the β L or β R region of conformational space, depending on their chirality. Less clear is the reason for the inverse correlation between α L and α R states of the remaining fourteen amino acids. It is possible that an L-amino acid that has both a low α R propensity and β R propensity will be more likely to occupy α L . Stabilization of one handedness is reflected in low occupancy of the other.
The amino acid with the lowest stability in α R helices is L-His [40]. Conversely D-His is one of the least destabilizing amino acids in α R helices [43]. L-His is observed with elevated frequency at the N1 position in this study. Assuming the neutral imidazole tautomer where Nδ 1 is deprotonated, histidine is the only other amino acid beside Asn and Asp that presents a lone pair separated by three bonds from the Cα carbon on the backbone. If the dipole-dipole interaction between backbone and sidechain carbonyls suggested for L-Asp and L-Asn [38] can stabilize the α L conformation, one may speculate that a similar mechanism may be at work in the case of the imidazole Nδ 1 lone pair and its dipolar interaction with the backbone carbonyl.

Backbone Conformations for Positions Flanking a Left-handed Turn
As defined in this study, the N' and C' positions are the amino acids directly preceding and following the left-handed turn. Most of these fall in the α R and β R /polyproline II (PPII) regions of Ramachandran space (Figure 5). Certain regions are sparsely occupied, and these empty regions differ between the N- and C-termini. At N', residues are in the γ R (ψ > 0°) rather than the α R region (φ ≈ -65°, ψ ≈ -40°). At C', residues are rarely found in standard β R conformations and instead primarily occupy the PPII region.
These unoccupied areas can be used to develop rules of conformational exclusion surrounding an α L helix. Similar rules have been developed in studies by Fitzkee and Rose, who found that an α R helix is not followed by certain regions of β [48,49]. In the right-handed helix, steric clashes between the C' carbonyl and the neighboring carbonyl at the C2 position of the helix prevent α R being followed directly by a β-strand [48]. In a left-handed helix, modeling suggests a similar constraint is enforced by a repulsive interaction between the C' and C3 carbonyls (Figure 6). This prevents the β R -strand conformation from following an α L helix. Placing the C' amino acid in the α R or polyproline II (PPII) conformation relieves this steric clash. For C-capping residues in the α R conformation, a Schellman-like capping interaction is possible, allowing for hydrogen bonds from the C' and C" backbone amides to the C3 carbonyl.
Figure 2. Comparison of statistical propensities to thermodynamic scales for D-amino acids. (A) Log propensities for the twenty amino acids to occur in left-handed turns are plotted relative to the D-amino acid host-guest studies of Krause et al. [43]. The line represents the best fit using linear regression. (B) Log propensities were calculated for α L amino acids where preceding and following residues were not α L .
Figure 3. Comparison of statistical propensities to thermodynamic scales for L-amino acids. (A) Log propensities for the twenty amino acids to occur in right-handed turns are plotted relative to thermodynamic scales from L-amino acid host-guest studies [39]. The line represents the best fit using linear regression. (B) Log propensities were calculated for α R amino acids where preceding and following residues were not α R .
At N', the α R conformation is disallowed. When a helix is modeled with an α R residue followed by an α L helix, no strong steric clash is observed (Figure 6). However, the Cβ sidechain methyl of the N' residue prevents solvation of the N2 backbone amide. Desolvation of polar groups is energetically unfavorable when no intrinsic hydrogen bond within the protein replaces the interaction [50]. This desolvation penalty can be partially relieved by placing the N' residue in the β R , PPII or γ R conformation. Thus, two conformational rules unique to flanking positions of left-handed helices emerge: α R -(α L ) n and (α L ) n -β R are disfavored where n ≥ 3. Similar rules would apply to the structural enantiomer where D-amino acids precede or follow an α R helix: α L -(α R ) n and (α R ) n -β L would be disfavored for n ≥ 3.
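The two flanking rules and their mirror-image counterparts can be written down as a small check over a sequence of Ramachandran region labels. The sketch below is illustrative only: the label strings ("aL", "aR", "bL", "bR") and function names are hypothetical, and it relies on the standard observation that mirroring a residue changes the sign of both φ and ψ.

```python
# Hypothetical region labels: aL/aR = alpha-L/alpha-R, bL/bR = beta-L/beta-R.
# Only these four regions are mirrored; any other label passes through unchanged.
MIRROR = {"aL": "aR", "aR": "aL", "bL": "bR", "bR": "bL"}

def mirror_angles(phi, psi):
    """Backbone angles of the mirror-image residue: both dihedrals change sign."""
    return -phi, -psi

def mirror_labels(labels):
    """Map each region label onto its enantiomeric region."""
    return [MIRROR.get(label, label) for label in labels]

def disfavored_flanks(labels, helix="aL", min_len=3):
    """Return (rule, start, end) tuples for helix runs that break a flanking rule.

    For helix="aL": an aR residue at N' or a bR residue at C' is reported.
    For helix="aR": an aL residue at N' or a bL residue at C' is reported.
    """
    bad_n = MIRROR[helix]
    bad_c = "bR" if helix == "aL" else "bL"
    hits = []
    i = 0
    while i < len(labels):
        if labels[i] != helix:
            i += 1
            continue
        j = i
        while j < len(labels) and labels[j] == helix:
            j += 1
        if j - i >= min_len:
            if i > 0 and labels[i - 1] == bad_n:
                hits.append(("N-flank", i, j - 1))
            if j < len(labels) and labels[j] == bad_c:
                hits.append(("C-flank", i, j - 1))
        i = j
    return hits

# An L-amino acid string that breaks both rules, and its D-amino acid mirror.
l_chain = ["aR", "aL", "aL", "aL", "bR"]
d_chain = mirror_labels(l_chain)
```

Calling `disfavored_flanks(l_chain)` and `disfavored_flanks(d_chain, helix="aR")` reports the same two violations, which is the sense in which the rules for the structural enantiomer follow from those observed for left-handed helices.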

Sidechain-Backbone Interactions at the N-terminus
If the N' residue is in the β R conformation as pictured in Figure 6B, unfavorable desolvation of the N2 amide is avoided, but the N' sidechain projects away from the top of the helix, preventing any specific polar capping interactions with the N-terminal amides. Such capping interactions are prevalent in α R helices, which often feature L-Thr, L-Asn or L-Asp at the N-terminus making sidechain oxygen acceptor hydrogen bonds to exposed backbone amides [27,28]. To accommodate this, the capping residue is usually in the β conformation. A similar propensity for small polar amino acids at the N' is observed in our database of left-handed turns. However, for these to facilitate sidechain-backbone capping hydrogen bonds while avoiding desolvation of N2, the residue must be in the γ R (ψ > 0°) conformation. Although both α R and α L N-terminal capping interactions involve small polar amino acids, the interactions presented here are structurally distinct from those previously identified. L-Asn and L-Thr show an elevated propensity to occur at the N' position. N' residues in the γ R conformation are enriched for small, polar amino acids. L-Asn and L-Thr in the γ R conformation adopt rotamers that allow hydrogen bonding between the sidechain oxygen and the N1 and N2 amides (Figure 7A, B). The χ 2 rotamer angle places the sidechain oxygen rather than nitrogen over the terminus, consistent with L-Asn functioning as a hydrogen bond acceptor. In this configuration, the sidechain oxygen also forms a hydrogen bond with its own backbone amide, contributing further to the stability of this motif.
In a designed turn-helix peptide, a D-Asp was utilized to contribute similar interactions at the N-terminus of an α R helix ( Figure 7C) [51]. These N-terminal interactions are a subset of a larger class of motifs in proteins and peptides described by Milner-White and colleagues as peptide 'nests' [30,52]. These nests often serve as anion binding sites, complexing both sidechains and prosthetic groups such as phosphates and iron-sulfur clusters [53,54].

Sidechain-Backbone Interactions at the C-terminus
The majority of C' amino acids in our survey of three-residue left-handed turns are found in the α R /γ R conformation. This facilitates formation of a Schellman-like interaction between the C' amide and the carbonyl of the C2 position. In α R helices, the Schellman capping motif often involves glycine, which readily adopts the α L conformation [55]. In α L helices, an α R cap is facilitated by the chirality of L-amino acids, avoiding the entropic cost associated with fixing the conformation of glycine. We looked for additional stabilization of these Schellman-like caps through sidechain-backbone interactions. The highest propensity at the C' is that of L-Gln, which occurs 3.6-fold more often than random expectation. An analysis of structures with a C-terminal L-Gln shows a bivalent hydrogen bond to the C2 carbonyl from both the backbone and sidechain amide (Figure 8). This effect is very specific for L-Gln, and a similar propensity is not observed for L-Asn. L-Thr and L-Ser also make capping interactions at the C-terminus of left-handed helices. A similar bivalent hydrogen bond is accepted by the C2 and C3 carbonyls from the C' sidechain hydroxyl and backbone amide (Figure 9). L-Lys is also elevated at the C' position, suggesting stabilization of a helix macrodipole, although L-Arg does not have a high propensity at this position.
It is interesting to compare our observations with studies on the energetics of C-terminus helix capping through chemical synthesis of proteins with D-amino acids. acids are less stable by nearly 1 kcal/mol. D-Val is less stable than D-Thr by approximately 0.5 kcal/mol. Although the study states that these energy differences relative to glycine correlate with changes in solvation of the carboxy terminus, it is possible that specific interactions such as the ones we observe are also contributing to capping energetics. This would explain the increased stability of D-Thr over D-Val, since D-Thr is able to form Ccap hydrogen bonds in the α L conformation to an α R helix. The similarity in energetics of D-Gln and D-Ala shows that in ubiquitin, D-Gln sidechain capping interactions are not playing a significant role in protein stabilization. In high resolution structures of the D-Gln 35 mutant, the sidechain does not make the same capping interaction we observe, but rather is involved in quaternary contacts with other ubiquitins in the asymmetric unit [1,34]. With three rotameric degrees of freedom, Gln has to pay a higher entropic cost to form the specific hydrogen bond to the C2 carbonyl. This may cancel the energy gained by forming a capping hydrogen bond. We have recently shown that Gly to D-Gln mutations can significantly increase the stability of the Trp-cage relative to the D-Ala substitution (manuscript in preparation).

Stabilization Through Tertiary Interactions
Two examples of five-residue left-handed helices are in our database. Alanine racemase is an enzyme which catalyzes the conversion of L-Ala to D-Ala and plays an important role in bacterial cell wall synthesis. Residues 40-44 in alanine racemase from B. stearothermophilus (PDB 1BD0) form a contiguous α L helix [56]. This feature was originally noticed by Kleywegt using a spatial motif search [57]. Strong sequence conservation maintains this structural motif across several other bacterial species (Figure 10). L-Lys 39 and L-Tyr 43 are part of the enzyme active site and are functionally conserved positions [58]. L-His 45 serves as a Schellman-like C' in the α R conformation with an additional hydrogen bond between the imidazole Nδ 1 and the carbonyl of the C1 position. An additional stabilizing hydrogen bond is provided by L-Asp/L-Asn 41 to the N-terminus of an adjacent right-handed helix. This interaction serves both to stabilize the α L helix and specify the helix-bend-helix conformation. Bent motifs with adjacent helical structures of opposite handedness and chirality were found in previous simulations of heterochiral secondary structures in poly-alanine [15] and in the molecular structure of tolaasin [12]. A specific hydrogen bond such as the one provided by residue 41 could be used in the design of de novo heterochiral helical bends.
Figure 5. Backbone conformational preferences for positions flanking an α L turn.
The other five-residue motif achieves stability through disulfides. Three of the repeat domains in reelin, a protein involved in neurological development, have been shown to contain a five-residue 3₁₀-L helix [59]. L-Cys residues at the N" and N2 positions participate in disulfides with an adjacent β-hairpin (Figure 11). Although L-His is found at the capping position in all three reelin repeat domain structures, it does not participate in sidechain-backbone capping interactions as was observed in alanine racemase. This structure provides a useful exemplar upon which to design novel α L -helix-β-hairpin folds.

Availability and requirements
The PERL script used to identify α L regions is included as Additional Files 2: findalphaleft.pl

Conclusion
To make the rational engineering and design of heterochiral proteins tractable, the role of amino acid stereochemistry in stability and structure needs to be better understood. This study presents potential rules based on insights gained from the analysis of natural proteins.

Using left-handed turns and helices in the database of existing protein structures as a case study, we have found several candidates for motifs that could be applied to the thermostabilization of proteins by synthetic amino acids. As synthetic methods for building proteins continue to improve, the ability to construct larger molecules with mixtures of natural and synthetic amino acids becomes increasingly practical. Natural proteins can provide important insight into how designed proteins can take advantage of the increased chemical diversity made possible by synthetic methods.
Figure 6. Modeling favorable and unfavorable helix flanking conformations.

Compiling a non-redundant set of PDB files
A list of non-redundant protein chains was assembled using PISCES (http://dunbrack.fccc.edu/) [60,61]. Structures obtained through X-ray crystallography with a resolution better than 2.5 Å and pairwise sequence identity below 25% were included. The final database consisted of 3517 unique chains.
Figure 8. C' Gln capping of α L turns.
Figure 10. An α L -helix-turn-α R -helix in alanine racemase. Conserved interactions across multiple bacterial species include a histidine α L -helix C' and a tertiary Asn/Asp hydrogen bond to the N-terminus of the α R -helix.

Searching for α L helices
PERL scripts were constructed to search each file on the non-redundant list for the presence of α L helices of three residues or longer (see Additional Files 2). φ and ψ values were computed based on deposited backbone coordinates of the N, C, Cα and O atoms (see scripts for details). φ values between 35.0° and 95.0° and ψ values between 10.0° and 70.0° were classified as α L . Allowable ranges were settled on after starting with more generous ranges and narrowing the window until all structures showed i, i+3 and/or i, i+4 hydrogen bonding (determined geometrically by checking that the backbone N to O distance was less than or equal to 3.5 Å). Initially, the search returned eighty-five α L helices and turns, of which seventy-three were three residues long, ten were four residues long and two were five residues long.
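The search criteria described above can be sketched in a few lines of Python. This is not the PERL script supplied in Additional Files 2; it is a minimal illustration under stated assumptions: the function names are hypothetical, the window boundaries are treated as inclusive, and each residue is represented simply by a (φ, ψ) pair, or None where an angle is undefined.

```python
import math

PHI_WINDOW = (35.0, 95.0)   # degrees, alpha-L window quoted in the text
PSI_WINDOW = (10.0, 70.0)   # degrees
N_O_CUTOFF = 3.5            # angstroms, backbone N to O distance

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def is_alpha_l(phi, psi):
    """True if (phi, psi) falls inside the alpha-L window (bounds inclusive)."""
    return (PHI_WINDOW[0] <= phi <= PHI_WINDOW[1]
            and PSI_WINDOW[0] <= psi <= PSI_WINDOW[1])

def alpha_l_runs(angles, min_len=3):
    """Return inclusive (start, end) index pairs of contiguous alpha-L runs."""
    runs = []
    start = None
    for i, pair in enumerate(list(angles) + [None]):  # sentinel closes a final run
        inside = pair is not None and is_alpha_l(*pair)
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs

def has_backbone_hbond(o_xyz, n_xyz, cutoff=N_O_CUTOFF):
    """Geometric hydrogen-bond test: N to O distance at or below the cutoff."""
    return math.dist(o_xyz, n_xyz) <= cutoff
```

In use, `alpha_l_runs` would be applied to the per-residue angles of each chain, and each returned run would then be checked for an i, i+3 or i, i+4 contact with `has_backbone_hbond`. A distance-only criterion is what the text describes; a stricter test would also consider the hydrogen bond angle.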
In order to assess local structure quality, backbone B-factors were examined for the three-residue α-turns in our data set. Three turns in our data set with B-factors greater than one standard deviation from the mean were flagged for manual examination. WinCoot was used to visualize electron density maps based on structure factors deposited at the EDS. One structure for which there was poor electron density in the turn region was removed from the data set (see Additional Files 1: Figure S1).
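The B-factor screen can be illustrated with a short sketch. Several details are assumptions made for the example and are not stated in the text: each turn is summarized by a single mean backbone B-factor, the sample standard deviation is used, and only values above the mean are flagged. The turn identifiers and values are hypothetical.

```python
from statistics import mean, stdev

def flag_high_b_factor(turn_b_factors, n_sd=1.0):
    """Return identifiers of turns whose B-factor exceeds mean + n_sd * SD.

    turn_b_factors maps a turn identifier to its mean backbone B-factor.
    """
    values = list(turn_b_factors.values())
    cutoff = mean(values) + n_sd * stdev(values)
    return sorted(t for t, b in turn_b_factors.items() if b > cutoff)

# Hypothetical identifiers and values, for illustration only.
example_turns = {"turn_a": 10.0, "turn_b": 12.0, "turn_c": 11.0, "turn_d": 40.0}
flagged = flag_high_b_factor(example_turns)
```

Flagged turns are candidates for manual inspection of the electron density rather than automatic removal, in line with the procedure described above.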

Calculating amino acid propensities
Sequences of the three residue left-handed turns were analyzed to determine amino acid propensities at each position in the turn. The occurrence of an amino acid at each position was divided by occurrence in the PDB dataset to obtain normalized values.
For three-residue left-handed turns, propensities of the twenty amino acids were calculated for the N', Ncap, N1/C1, Ccap and C' positions (Table 1). Propensities for each amino acid type aa_i at position pos_j were normalized to total occurrence in the database: The propensity for a particular amino acid to occur in the Ncap, N1 or Ccap position was compared to the overall frequency of that amino acid type in the α L conformation in any context. Overall frequencies were calculated using the same data set of proteins from which the left-handed turns were selected.
The contribution of sampling error to the mean and 95% confidence intervals was estimated using a Wilson score interval for the counts in helices [62]. Corrected values are reported in Table 1.
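A propensity with a Wilson score interval can be sketched as below. The equations E1-E3 are not reproduced in this text, so the propensity form used here (frequency at a position divided by a background frequency) is an assumption based on the verbal description, as is treating the background frequency as exact when scaling the interval. All function names and counts are hypothetical.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1.0 + z * z / total
    center = (p_hat + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / total
                         + z * z / (4.0 * total * total)) / denom
    return center - half, center + half

def propensity(count_at_pos, total_at_pos, count_background, total_background):
    """Frequency at a position divided by the background frequency."""
    return (count_at_pos / total_at_pos) / (count_background / total_background)

def propensity_with_interval(count_at_pos, total_at_pos,
                             count_background, total_background):
    """Propensity plus a Wilson-based interval, background treated as exact."""
    background = count_background / total_background
    low, high = wilson_interval(count_at_pos, total_at_pos)
    value = propensity(count_at_pos, total_at_pos,
                       count_background, total_background)
    return value, low / background, high / background

# Hypothetical counts: 10 of 72 turns carry the residue at one position,
# against a background of 100 in 1000.
value, low, high = propensity_with_interval(10, 72, 100, 1000)
```

With only a few dozen turns per position, the Wilson interval remains well behaved even when a count is zero, which is one reason it suits small samples better than the simple normal approximation.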