The Ramachandran plots of glycine and pre-proline

Background: The Ramachandran plot is a fundamental tool in the analysis of protein structures. Of the 4 basic types of Ramachandran plots, the interactions that determine the generic and proline Ramachandran plots are well understood. The interactions of the glycine and pre-proline Ramachandran plots are not.
Results: In glycine, the ψ angle is typically clustered at ψ = 180° and ψ = 0°. We show that these clusters correspond to conformations where either the Ni+1 or O atom is sandwiched between the two Hα atoms of glycine. We show that the shape of the 5 distinct regions of density (the α, αL, βS, βP and βPR regions) can be reproduced with electrostatic dipole-dipole interactions. In pre-proline, we analyse the origin of the ζ region of the Ramachandran plot, a region unique to pre-proline. We show that it is stabilized by a COi-1···CδHδi+1 weak hydrogen bond. This is analogous to the COi-1···NHi+1 hydrogen bond that stabilizes the γ region in the generic Ramachandran plot.
Conclusion: We have identified the specific interactions that affect the backbone of glycine and pre-proline. Knowledge of these interactions will improve current force-fields, and help understand structural motifs containing these residues.


Background
The Ramachandran plot [1] is the 2d plot of the φ-ψ torsion angles of the protein backbone. It provides a simple view of the conformation of a protein. The φ-ψ angles cluster into distinct regions in the Ramachandran plot where each region corresponds to a particular secondary structure. There are four basic types of Ramachandran plots, depending on the stereo-chemistry of the amino acid: generic (which refers to the 18 non-glycine non-proline amino acids), glycine, proline, and pre-proline (which refers to residues preceding a proline [2]). The generic and proline Ramachandran plots are now well understood [3] but the glycine and pre-proline Ramachandran plots are not.
The generic Ramachandran plot was first explained by Ramachandran and co-workers in terms of steric clashes [1]. This has become the standard explanation for the observed regions in the Ramachandran plot [4,5]. However, recent studies found significant discrepancies between the classic steric map and the Ramachandran plot of high-resolution protein structures [6][7][8][9]. These discrepancies have now been resolved. The first discrepancy is that the N···H i+1 and O i-1 ···C steric clashes in the classic steric map have no effect in the observed Ramachandran plot [3]. By removing these steric clashes, a better steric map can be constructed. The second discrepancy is that the observed density clusters into distinct regions within the sterically-allowed regions of the Ramachandran plot [8,10]. These clusters have now been explained in terms of backbone dipole-dipole interactions [3,11,12].
The proline Ramachandran plot has been reproduced in a calculation [13]. The proline Ramachandran plot is severely restricted by the pyrrolidine ring, where the flexibility in the pyrrolidine ring couples to the backbone [14].
The observed glycine Ramachandran plot has a distinctive distribution (Figure 1A) quite different to the generic Ramachandran plot. An early attempt to explain the observed Ramachandran plot in terms of a steric map of glycine [15] (Figure 2A) fails to account for the observed distribution. It does not explain the observed clustering at ψ = 180° and ψ = 0°, nor the clustering into 5 distinct regions [8]. Using a molecular-dynamics simulation of Ace-Gly-Nme [16], Hu and co-workers found that the glycine Ramachandran plot generated by standard force-fields reproduced the original steric map but not the observed Ramachandran plot. They calculated a somewhat better result with a quantum-mechanics/molecular-mechanics model, which reproduced the observed clustering along ψ, but not the partitioning into the 5 clusters. In this study, we identify the specific interactions that define the observed glycine Ramachandran plot by studying the conformations of glycine in the structural database. We test these interactions with a simple model based on electrostatics and Lennard-Jones potentials.
Although the overall shape of the pre-proline Ramachandran plot ( Figure 1B) is well understood, there exists a region unique to pre-proline that remains unexplained. The basic shape of pre-proline was predicted by Flory using steric interactions [17]. This was later confirmed in a statistical analysis of the protein database [2]. However, the statistical analysis also revealed the existence of a little leg of density poking out below the β-region ( Figure 1B; purple in Figure 2C), which Karplus called the ζ region [10]. More recent calculations using standard molecular mechanics force-fields reproduced the energy surface of the original Flory calculation [13,18] but not the ζ region.
In this study, we focus on the physical origin of the ζ region.

A non-redundant PDB data-set
To extract the statistical distributions of the glycine and pre-proline Ramachandran plots, we chose a high-resolution subset of the PDB [19] provided by the Richardson lab [9] of 500 non-homologous proteins. These proteins have a resolution of better than 1.8 Å where all hydrogen atoms have been projected from the backbone and optimized in terms of packing. Following the Richardsons, we only consider atoms that have a B-factor of less than 30.

Regions in the glycine Ramachandran plot
Glycine is fundamentally different to the other amino acids in that it lacks a sidechain. In particular, glycine does not have the C β atom, which induces many steric clashes in the generic Ramachandran plot. We call the hydrogen atom that is shared with the other amino acids, the H α1 atom. We call the hydrogen atom that replaces the C β atom, the H α2 atom. The absence of the C β atom allows the glycine Ramachandran plot to run over the borders at -180° and 180° ( Figure 1A).
The observed glycine map has 5 regions of density [8]. In order to display the observed density in one continuous region, we shift the coordinates from φ-ψ to φ'-ψ', where 0° < φ' < 360° and -90° < ψ' < 270°. With the shifted glycine Ramachandran plot (Figure 3A), we can clearly identify the different regions. Along the horizontal strip ψ' ~ 180°, there are three separate regions. One of these is an elongated version of the β P region of the generic Ramachandran plot. The β P region corresponds to the polyproline II structure, which forms an extended left-handed helix along the protein chain [20]. The β PR region is a reflection of the β P region where a sequence of glycine residues in the β PR conformation will form a right-handed helix. Finally, there is a region that corresponds to the β S region of the generic Ramachandran plot. This region corresponds to the extended conformation of residues in β-sheets. However, the glycine β S region, centered on (φ', ψ') = (180°, 180°), is slightly displaced from the β S region of the generic Ramachandran plot. There are also the diagonal α and α L regions (Figure 3A), which are associated with helices and turns [3]. Unlike the generic Ramachandran plot, the glycine α region is symmetric to the α L region [8,21]. In the generic Ramachandran plot, there is also a γ region corresponding to the hydrogen bonded γ-turn [12]. The glycine Ramachandran plot does not have any density in the γ region.
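The coordinate shift described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `shift_angles` is hypothetical, and the boundary values (φ' = 0°, ψ' = -90°) are assigned to the lower edge of each range by assumption.

```python
def shift_angles(phi, psi):
    """Map (phi, psi) in degrees onto the shifted ranges
    0 <= phi' < 360 and -90 <= psi' < 270 (boundary handling is an assumption)."""
    phi_shifted = phi % 360.0
    psi_shifted = (psi + 90.0) % 360.0 - 90.0
    return phi_shifted, psi_shifted


# A point in the conventional alpha region and one near the plot border.
print(shift_angles(-60.0, -45.0))
print(shift_angles(-180.0, 180.0))
```

With this mapping, density that straddles the -180°/180° border of the conventional plot becomes contiguous around (φ', ψ') = (180°, 180°).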

Steric interactions in glycine
The original steric map of glycine (Figure 2A) [15] fails to explain large parts of the observed glycine Ramachandran plot (Figure 1A). In the observed glycine Ramachandran plot (Figure 3A), there are two large excluded horizontal strips at 50° < ψ' < 120° and -120° < ψ' < -50°, which are not excluded in the glycine steric map (Figure 2A). Conversely, the glycine steric map excludes a horizontal strip at -30° < ψ' < 30° (Figure 2A), but this region is populated in the observed plot (Figure 1A). There are also diagonal steric boundaries in the observed glycine Ramachandran plot (Figure 1A), whereas the steric map predicts vertical boundaries (Figure 2A).
We carried out a re-evaluation of the steric map of glycine ( Figure 2B) by following the methodology of Ho and coworkers [3]. For each interaction in the glycine backbone, we consider the variation of the inter-atomic distance with respect to the φ'-ψ' angles. We compare the observed variation to the variation generated from a model that uses canonical backbone geometry. We divide these interactions into 3 categories: the φ' dependent, ψ' dependent and φ'-ψ' co-dependent distances.
For some of the interactions, the results for glycine are identical to that of the generic Ramachandran plot [3]. For brevity, we omit the analysis of these interactions and summarize the results. The excluded horizontal strip -30°< ψ' < 30°, due to the N···H i+1 steric interaction in the glycine steric map (Figure 2A), does not exist in the observed distribution ( Figure 1A). Similarly, the O i-1 ···C steric clash in the original glycine steric map, which excludes a vertical strip centered on φ' = 0° (Figure 2A), does not exist in the observed distribution ( Figure 1A). We ignore the effect of the N···H i+1 and O i-1 ···C steric clashes. The diagonal boundaries of the observed distribution are defined by the φ'-ψ' co-dependent steric interactions O i-1 ···O and O i-1 ···N i+1 . In Figure 3A, we show the fit of these steric interactions to the data.
Here, we analyze the most distinctive feature of the glycine Ramachandran plot: the tendency for ψ' to cluster near 180° and 0°. We focus on the ψ'-dependent interactions. For each interaction, we first calculate the model curve of the corresponding inter-atomic distance as a function of ψ' (see Methods). We then compare the observed ψ' frequency distribution (Figure 3B) to the curve. If a hard-sphere repulsion restricts ψ', then, in regions of ψ' where the model curve is below the van der Waals (VDW) diameter (horizontal dashed line in Figure 3B), the ψ' frequency distribution should drop correspondingly.
The observed ψ' dependence in glycine is due to the H α1 ···O, H α2 ···O, H α1 ···N i+1 and H α2 ···N i+1 steric clashes. A simple interpretation is that the ψ' dependence in glycine arises from conformations that place either the N i+1 or O atom between the two H α atoms (Figure 4A). The observed limits in the distributions have been drawn in Figure 3A as horizontal lines.
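The model curves of inter-atomic distance against ψ can be illustrated with a small geometric sketch. The code below builds an idealized glycine-like fragment from internal coordinates and scans ψ in 5° steps. The bond lengths and angles are rounded textbook values chosen for illustration (an assumption, not the CHARMM22 geometry used in the Methods), the function names are hypothetical, and the hydrogen labels `Ha` and `Hb` are arbitrary and are not mapped onto H α1 and H α2.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    norm = math.sqrt(sum(a * a for a in u))
    return tuple(a / norm for a in u)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Place atom d bonded to c so that |cd| = bond, angle(b, c, d) = angle_deg
    and dihedral(a, b, c, d) = torsion_deg."""
    theta, tau = math.radians(angle_deg), math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    local = (-bond * math.cos(theta),
             bond * math.sin(theta) * math.cos(tau),
             bond * math.sin(theta) * math.sin(tau))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))


def scan_psi(step=5):
    """Return {(hydrogen, partner): [(psi, distance), ...]} for the fragment."""
    # Idealized backbone geometry (assumed values, in Angstrom and degrees).
    ca = (0.0, 0.0, 0.0)
    c = (1.52, 0.0, 0.0)
    ang = math.radians(111.0)
    n = (1.46 * math.cos(ang), 1.46 * math.sin(ang), 0.0)
    # Two H-alpha atoms, +/-120 degrees away from N about the C-CA axis.
    hydrogens = {"Ha": place_atom(n, c, ca, 1.08, 109.5, 120.0),
                 "Hb": place_atom(n, c, ca, 1.08, 109.5, -120.0)}
    curves = {(h, p): [] for h in hydrogens for p in ("N(i+1)", "O")}
    for psi in range(-180, 180, step):
        # psi is the N-CA-C-N(i+1) torsion; O sits trans to N(i+1) on the carbonyl.
        partners = {"N(i+1)": place_atom(n, ca, c, 1.33, 116.0, psi),
                    "O": place_atom(n, ca, c, 1.23, 120.5, psi + 180.0)}
        for h, h_xyz in hydrogens.items():
            for p, p_xyz in partners.items():
                curves[(h, p)].append((psi, math.dist(h_xyz, p_xyz)))
    return curves


def psi_of_closest_approach(curve):
    """Return the psi value at which the scanned distance is smallest."""
    return min(curve, key=lambda row: row[1])[0]


curves = scan_psi()
for key in sorted(curves):
    print(key, psi_of_closest_approach(curves[key]))
```

In this idealized fragment each hydrogen eclipses N i+1 at ψ = ±120° and O at ψ = ±60°, which falls inside the excluded strips reported above, while at ψ = 180° the N i+1 atom is equidistant from the two hydrogens. Comparing such curves against a VDW diameter would require the actual CHARMM22 geometry and radii.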

Dipole-dipole interactions in glycine
The revised glycine steric map does not explain the diagonal shape of the α, α L , β P , β PR and β S regions. In the generic Ramachandran plot, it was found that the diagonal shape of regions could be reproduced using electrostatic dipole-dipole interactions [3] but only when the dipole-dipole interactions were considered individually. The overall electrostatic interaction does not reproduce the observed Ramachandran plot [23]. Here, we use the same approach of treating individual electrostatic dipole-dipole interactions along the glycine backbone.
We calculate the energy map of φ-ψ for the 4 dipole-dipole interactions in the glycine backbone: CO i-1 ···CO, NH···NH i+1 , CO···NH and CO i-1 ···NH i+1 (Figure 5C-F). The electrostatic interactions are calculated with the Lennard-Jones potentials of the steric clashes identified in the section above. We find that the shapes of the different regions of the glycine Ramachandran plot (Figure 3A) are reproduced (Figure 5). The CO···NH interaction produces the diagonal α L , α and β S regions (Figure 5E). The NH···NH i+1 interaction also produces diagonal α L and α regions (Figure 5D). The α region is symmetric to the α L region. The CO i-1 ···CO interaction produces minima corresponding to the β P and β PR regions (Figure 5C).
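Treating a single dipole-dipole interaction in isolation amounts to summing the partial-charge Coulomb terms between the atoms of one group and the atoms of the other. The sketch below illustrates this under stated assumptions: the function name, the ±0.5 charges and the collinear coordinates are hypothetical placeholders rather than CHARMM22 values, and the 331 prefactor from the Methods is assumed to be divided by the inter-atomic distance in Å.

```python
import math

COULOMB_PREFACTOR = 331.0  # kcal/mol prefactor quoted in the Methods


def group_interaction_energy(group_a, group_b):
    """Sum pairwise Coulomb energies (kcal/mol) between two groups.

    Each group is a list of (partial_charge, (x, y, z)) sites; only
    inter-group pairs are counted, so each dipole pair is treated on its own.
    """
    energy = 0.0
    for q_a, xyz_a in group_a:
        for q_b, xyz_b in group_b:
            energy += COULOMB_PREFACTOR * q_a * q_b / math.dist(xyz_a, xyz_b)
    return energy


# Hypothetical two-site dipoles on a line (illustrative charges and positions).
dipole_a = [(+0.5, (0.0, 0.0, 0.0)), (-0.5, (1.0, 0.0, 0.0))]
aligned = [(+0.5, (3.0, 0.0, 0.0)), (-0.5, (4.0, 0.0, 0.0))]
opposed = [(-0.5, (3.0, 0.0, 0.0)), (+0.5, (4.0, 0.0, 0.0))]

print(group_interaction_energy(dipole_a, aligned))  # head-to-tail: favorable
print(group_interaction_energy(dipole_a, opposed))  # reversed: unfavorable
```

Evaluating such a function over a grid of backbone conformations, one dipole pair at a time, would give one energy map per interaction in the spirit of Figure 5C-F.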
In the original glycine steric map (Figure 2A), the region near (φ, ψ) = (-180°, 180°) is forbidden due to a steric clash between O and H. Yet glycine has density in this region in the observed Ramachandran plot ( Figure 3A). This can also be seen in the frequency distribution of d(O···H) (Figure 3C), where there is a peak at d(O···H) ~ 2.4 Å. At this peak, the O and H atoms are in contact, as the VDW diameter is 2.5 Å. Thus, in the β S region of glycine, the favorable CO···HN dipole-dipole interaction overcomes the steric repulsion of the O and H atoms ( Figure 5E).

The pre-proline Ramachandran plot
Schimmel and Flory argued in 1968 that pre-proline (amino acids preceding proline) has a particularly restricted Ramachandran plot, compared to the generic Ramachandran plot [17]. This was finally observed in the protein database by MacArthur and Thornton (Figure 1B) [2].
There are three main differences between the pre-proline Ramachandran plot and the generic Ramachandran plot. In the pre-proline Ramachandran plot, there is a large excluded horizontal strip at -40° < ψ < 50°, which restricts the α L and α regions. The α L region is also shifted to higher ψ. These two features were reproduced in the Schimmel-Flory calculation [17] and subsequent calculations [13,18]. The third feature is a little leg of density poking out below the β-region (Figure 1B; purple in Figure 2C). Karplus called this the ζ region [10], which is unique to pre-proline.
Previous calculations [2,17,18] did not focus on the individual interactions, and did not account for the ζ region. Here, we identify the exact steric clashes that determine the pre-proline Ramachandran plot. We will then analyse the interactions responsible for the ζ region.

Steric interactions in the pre-proline backbone
In pre-proline, instead of an interaction with the N H atom in the succeeding generic amino acid, the pre-proline interacts with a CH 2 group of the succeeding proline (Figure 1B). The CH 2 group exerts a much larger steric effect on the pre-proline Ramachandran plot. MacArthur and Thornton [2] suggested that the dominant effect is due to the N···C δ i+1 and C β ···C δ i+1 steric clashes. Here we can analyse the efficacy of each clash by analysing the statistical distributions directly.

We consider the φ-ψ co-dependent interactions that involve the C δ , H δ1 and H δ2 atoms of the succeeding proline ( Figure 1B). For each interaction, we generate the contour plot in φ-ψ of the VDW diameter distance. By comparing the contour plot to the observed density in the pre-proline Ramachandran plot, we identify the interactions that induce the best match in the boundaries ( Figure  6A, the interactions are identified in Figure 2C). We found that the chunk taken out of the bottom-left β-region of the observed density is due to the O i-1 ···C δ i+1 steric clash. Another restriction on the α L and α regions is due to the H···C δ i+1 steric clash.
We next consider the ψ dependent interactions. In the pre-proline ψ frequency distribution, we found three distinct peaks (bottom of Figure 6B). The left-most peak at ψ ~ -50° corresponds to the α region of pre-proline. We focus on the two peaks in the β-region 50° < ψ < 180°. The larger peak centred on ψ ~ 150° corresponds to the β S region of the generic Ramachandran plot. In the generic Ramachandran plot, this β S region is bounded by the C β ···O and C β ···N i+1 steric clashes. In pre-proline, the smaller peak centred on ψ ~ 70° corresponds to the ζ region and occurs in a region that would be excluded by the C β ···O steric clash. Instead the smaller peak is bounded from below by the N···C δ i+1 steric clash. This can be seen by comparing the ψ distribution to the model curve of N···C δ i+1 vs. ψ (middle of Figure 6B).
Using parameters from CHARMM22, we calculate the Lennard-Jones 12-6 potential due to the revised steric clashes ( Figure 7A). Lennard-Jones potentials cannot account for the ζ region.

Interactions that stabilize the pre-proline ζ region
As the ζ region (purple in Figure 2C) brings the C β ···O interaction into steric conflict, there must be a compensating interaction that stabilizes the ζ region. What is this interaction? To understand this interaction, we consider an analogy with the γ region in the generic Ramachandran plot. In the γ region, a distorted CO i-1 ···HN i+1 hydrogen bond is formed, which brings the H i+1 atom into contact with the O i-1 atom. Similarly, in the ζ region of pre-proline, the O i-1 atom of pre-proline is in contact with the H δ1 and H δ2 atoms (see Figure 4B; Table 1), suggesting that the CO i-1 group interacts with the C δ H δ i+1 group of the succeeding proline.
Can the C δ H δ i+1 group interact with CO i-1 ? Such an interaction would fall under the class of the CH···O weak hydrogen bond, a well-documented interaction in proteins [24]. Studies of the CH···O weak hydrogen bond use a distance criterion of d(H···O) < 2.8 Å [25][26][27]. Little angular dependence is found in the CH···O bond around the H atom, and an angle criterion of ∠OHX > 90° is often used. This is much more permissive than the geometry of the canonical hydrogen bond. In Table 1, we list the hydrogen bond parameters of the CO i-1 ···C δ H δ i+1 interaction in the ζ region. As proline can take on two different major conformations, the UP and DOWN pucker, measurements of the geometry of the CO i-1 ···C δ H δ i+1 interaction must also be divided in terms of the UP and DOWN pucker. The observed geometry of the CO i-1 ···C δ H δ i+1 interaction satisfies the geometric criteria of the weak hydrogen bond (Table 1).
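The two geometric criteria quoted above, d(H···O) < 2.8 Å and ∠OHX > 90°, are straightforward to test on a set of coordinates. The following is a minimal sketch with hypothetical function names and made-up coordinates; it is not the procedure used to compile Table 1.

```python
import math


def weak_hbond_geometry(x, h, o):
    """Return (d(H...O) in Angstrom, angle O-H-X in degrees) for donor atom X,
    its hydrogen H and acceptor oxygen O, each given as an (x, y, z) tuple."""
    d_ho = math.dist(h, o)
    hx = [a - b for a, b in zip(x, h)]
    ho = [a - b for a, b in zip(o, h)]
    cos_angle = sum(a * b for a, b in zip(hx, ho)) / (math.dist(x, h) * d_ho)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return d_ho, math.degrees(math.acos(cos_angle))


def is_weak_hbond(x, h, o, max_distance=2.8, min_angle=90.0):
    """Apply the distance and angle criteria for a CH...O weak hydrogen bond."""
    d_ho, angle = weak_hbond_geometry(x, h, o)
    return d_ho < max_distance and angle > min_angle


# Made-up coordinates: a linear C-H...O contact with d(H...O) = 2.41 Angstrom.
carbon = (0.0, 0.0, 0.0)
hydrogen = (1.09, 0.0, 0.0)
oxygen = (3.5, 0.0, 0.0)
print(weak_hbond_geometry(carbon, hydrogen, oxygen))
print(is_weak_hbond(carbon, hydrogen, oxygen))
```

Because the angle threshold is so permissive, the distance criterion does most of the filtering in practice, which is consistent with the remark that this geometry is much looser than that of a canonical hydrogen bond.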
As the CO i-1 ···C δ H δ i+1 weak hydrogen bond is a close contact, we need to model the interaction in order to understand its dependence on the φ-ψ angles. For the modelling, we consider strategies that have been used for the analogous CO i-1 ···HN i+1 hydrogen bond. The CO i- [...] simpler approach, which modelled the hydrogen bond with electrostatic dipole-dipole interactions, also finds a minimum in the γ region [23].
Here, we model the CO i-1 ···C δ H δ i+1 weak hydrogen bond as an electrostatic dipole-dipole interaction (see Methods). How do we model the C δ H δ i+1 group as an electrostatic dipole? Bhattacharyya and Chakrabarti [28] found that, of the CH groups in proline, the C δ H δ group forms the most CH···O hydrogen bonds. The C δ atom sits next to the electron-withdrawing N atom and thus, is more acidic than the other C atoms. Consequently, we place a small negative partial charge on the C δ atom. In our model, we find an energy minimum in the ζ region for both the UP pucker ( Figure 7B) and the DOWN pucker ( Figure 7C). We conclude that the CO i-1 ···C δ i+1 H δ1 i+1 weak hydrogen bond stabilizes the ζ region in pre-proline.

Conclusion
We have identified the interactions that determine the high-resolution Ramachandran plots of glycine and pre-proline.
For glycine, the Ramachandran plot of the glycine backbone modeled by standard force-fields fails to reproduce the observed Ramachandran plot [16]. Instead the modeled Ramachandran plot resembles the original steric map of glycine [1]. The failure of these calculations arises from the inadequate treatment of the H α atoms. We have identified a revised set of steric interactions that can reproduce the observed glycine Ramachandran plot. These are O i- [...]
Previous calculations of the pre-proline Ramachandran plot reproduced most of the observed pre-proline Ramachandran plot with the notable exception of the ζ region. Previous studies did not identify the specific steric interactions involved in defining the pre-proline Ramachandran plot. Here, we have identified them: N···C δ i+1 , O i-1 ···C δ i+1 and H···C δ i+1 (Figure 2C). We have also identified the physical mechanism that stabilizes the ζ region (purple in Figure 2C). It is the CO i-1 ···C δ H δ i+1 weak hydrogen bond, which is directly analogous to the CO i-1 ···NH i+1 hydrogen bond that stabilizes γ-turns in the generic amino acid.
Combined with the analysis of the generic Ramachandran plot [3] and the proline Ramachandran plot [13,14], we have identified the interactions that define the high-resolution Ramachandran plots of all 20 amino acids. Although our analysis uses simple modeling techniques, the interactions identified here suggest concrete ways to resolve the inadequacies in current force-fields.

Local conformations of the φ-ψ map
To calculate the model curves of the inter-atomic distances as a function of the φ-ψ angles, we modeled the glycine and pre-proline protein fragments shown in Figure 1. Covalent bond lengths and angles were fixed to CHARMM22 values [22]. Only the φ-ψ angles vary. The φ-ψ angles of the central residue were incremented in 5° steps and the corresponding distance parameters and energies of the inter-atomic interactions were calculated. We used 2 types of interactions: partial-charge electrostatics, $E_{\mathrm{elec}} = 331\, q_1 q_2 / d$ kcal·mol$^{-1}$, and Lennard-Jones 12-6 potentials, $E_{\mathrm{LJ}} = \varepsilon\left[(\sigma/d)^{12} - 2(\sigma/d)^{6}\right]$ kcal·mol$^{-1}$, where $d$ is the inter-atomic distance in Å and the parameters were taken from CHARMM22 [22]. There are no parameters in CHARMM22 for the H δ and C δ atoms. As such, we have assigned a partial charge of -0.20 to C δ and 0.10 to H δ1 and H δ2 . These are not based on any detailed arguments but are merely used to estimate the effect that such charges would have.
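The two energy terms used here can be written as a pair of small functions. This is a hedged sketch rather than the authors' implementation: the function names are hypothetical, the electrostatic term is assumed to carry the usual 1/d distance dependence, ε is taken as a positive well depth, σ as the distance of the energy minimum, and the ε and σ values in the usage lines are placeholders rather than CHARMM22 parameters. Only the partial charges of -0.20 and 0.10 come from the text.

```python
def coulomb_energy(q1, q2, d):
    """Partial-charge electrostatic energy in kcal/mol for charges q1, q2
    (in units of e) separated by d Angstrom, using the 331 prefactor."""
    return 331.0 * q1 * q2 / d


def lennard_jones_energy(d, epsilon, sigma):
    """Lennard-Jones 12-6 energy in kcal/mol, written so that the minimum
    of depth -epsilon lies at d = sigma."""
    ratio6 = (sigma / d) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


# Charges assigned in the text: -0.20 on C-delta, 0.10 on each H-delta.
print(coulomb_energy(-0.20, 0.10, 2.5))
# Placeholder epsilon and sigma, evaluated at the minimum and at close contact.
print(lennard_jones_energy(2.0, 0.1, 2.0))
print(lennard_jones_energy(1.5, 0.1, 2.0))
```

In this form the Lennard-Jones term rises steeply once d falls below σ, which is how it acts as a soft version of the hard-sphere steric limits discussed in the text.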