re-TAMD: exploring interactions between H3 peptide and YEATS domain using enhanced sampling

Background Analysis of preferred binding regions of a ligand on a protein is important for detecting cryptic binding pockets and improving the ligand selectivity. Result The enhanced sampling approach TAMD has been adapted to allow a ligand to unbind from its native binding site and explore the protein surface. This so-called re-TAMD procedure was then used to explore the interaction between the N terminal peptide of histone H3 and the YEATS domain. Depending on the length of the peptide, several regions of the protein surface were explored. The peptide conformations sampled during the re-TAMD correspond to peptide free diffusion around the protein surface. Conclusions The re-TAMD approach permitted to get information on the relative influence of different regions of the N terminal peptide of H3 on the interaction between H3 and YEATS.


Background
Docking of small ligands, chemical compounds or peptides, on proteins is quite an important problem encountered in various fields of structural bioinformatics, from drug design studies [1] to analysis of functional networks within the cell [2].
The efficiency of the docking depends on two ingredients: (i) the availability of a reliable score to select the ligand poses corresponding to the largest experimental affinity, (ii) the ability to efficiently sample the relative positions of the ligand within a given pocket or on the protein surface. Several possibilities exist for calculating scores: absolute free energy of interaction [3], QM/MM (quantum mechanics/molecular dynamics) based approaches [4] or rescoring of obtained poses [5].
Concerning the point (ii), one should notice that most of the past virtual screening approaches have focused on the docking of the ligand on a pre-defined pocket [6][7][8]. Nevertheless, several methods [9][10][11][12][13][14][15][16] were then developed to *Correspondence: therese.malliavin@pasteur.fr 1 Unité de Bioinformatique Structurale, UMR CNRS 3528 and Institut Pasteur, Paris, France Full list of author information is available at the end of the article use molecular dynamics simulations to explore the protein surface without being limited to a given spot. The development of such approaches is justified by the importance of detecting new pockets on protein surfaces: these pockets have been used for lead optimization [17,18] or, to overcome resistance problems [19,20].
In the context of molecular dynamics simulations, two main types of exploration approaches have been proposed. Firstly, molecular dynamics trajectories are recorded [9][10][11][12] on the studied protein solvated with a mixture of water and various polar and apolar small compounds representing different types of interactions. These trajectories are then analyzed to determine the most populated positions of the compounds on the protein surface, allowing to predict surface hot-spots [11,21] that should then be targeted by virtual screening studies.
Secondly, other methods have taken advantage of the growing efficiency of enhanced sampling approaches, such as metadynamics [22]. Two types of methods have been proposed: the funnel metadynamics for exploring the conformations of a ligand on a pocket, loosely-defined by a funnel [13,14], and metadynamics approaches that allow the exploration of the receptor surface by the ligand [15,16]. Both methods are effective, and permit converged estimation of the interaction free energy, but with a large computational cost.
We propose here an approach, re-TAMD (reconnaissance-TAMD) for exploring the protein surface based on the temperature-accelerated molecular dynamics (TAMD) [23,24], an enhanced sampling approach which proved its efficiency on various biological systems [25][26][27][28][29][30]. Similarly to TAMD, re-TAMD requires less computational power than metadynamics-derived approaches. Although the re-TAMD approach proposed here does not provide a formal picture of free-energy surface, it has the advantage of being specific to the studied system, unlike the methods based on fragment probes [9][10][11][12]16].
We applied this approach to the study of interactions involving post translational modifications (PTMs), which frequently occur in proteins for regulatory purposes. PTMs play an important role in histones [31], proteins which are wrapped by base pairs of DNA forming the nucleosome complex [32] and are involved in gene expression [33][34][35] and chromatin dynamics [36,37].
Recent studies have shown that lysines modified by acylations -a class of PTMs -interact with the YEATS domain (named after the Yaf9, ENL, AF9, and Sas5 family), a strongly conserved domain found in several epigenetics reader proteins across many species [38][39][40][41][42]. The study of interactions between PTMs and epigenetic readers is largely motivated by findings that show links between readers and cancer cell proliferation [43][44][45]. In the present work, we applied the re-TAMD approach to the study of the interaction between AF9's YEATS domain and the H3 histone N-t tail's acetylated lysine 18 (acK18). Using enhanced sampling, we looked at the influence of the peptide length on the interaction with the protein.

Studied systems, collective variables and trajectories
Several systems were prepared using selected atoms from the first model of the NMR (Nuclear Magnetic Resonance) structure (PDB entry: 2NDF) [38] (Fig. 1). Each molecule or complex was solvated in a water box using the Amber 14's LEaP program [46] and the Amber ff03 force field [47] along with a specific parameter file to account for residue acK18 (called ALY) [48]. The systems were then minimized, thermalized, and equilibrated using NAMD 2.9b2 [49].
The following systems were studied: (i) the complexes between the protein YEATS and the peptides 12-24 (pt13), 15-21 (pt7) and 17-19 (pt3) from the N terminal tail in histone H3, (ii) the YEATS domain in the absence of the peptides and (iii) the isolated peptides pt13, pt7 and pt3. The studied systems along with the launched simulations are given in Table 1.

Description of re-TAMD
The temperature-accelerated molecular dynamics (TAMD) approach is an enhanced sampling approach, based on the parallel evolution of the protein coordinates x in a classical MD simulation (Eq. 1) and of the target values z for the collective variables (CV) θ α (x) (Eq. 2): (2) where x are the physical variables (atomic coordinates) of the system, θ(x) are the current values of the collective variables and z the ever evolving target values of the collective variables. Several sets of collective variables were used on the peptides only (Fig. 2). M is the mass matrix, V (x) is the empirical classical potential of the system, η x,z (t) are white noises i.e. Gaussian processes with mean 0 and covariance < η It was shown in [23] that by adjusting the parameter κ so that z(t) ≈ θ (x(t)) and the friction coefficientγ so that the z move slower than x, one can generate a trajectory z(t) in z-space which effectively moves at the artificial temperatureT on the free energy hyper-surface F(z) defined at the physical temperature T. Then, usinḡ T > T in Eq. 2 accelerates the exploration of the free energy landscape by the z(t) trajectory, as energy barriers can be crossed more easily.
Along the re-TAMD trajectories, the artificial friction γ of the Langevin thermostat attached to the collective variables was set as a constant equal to 0.02 ps −1 , whereas the artificial thermal energyβ −1 was varied continuously depending on the smallest distance min(D) between the H3 peptide and the YEATS domain.
where h and k values are given in Table 2.
In order to keep the peptide close to the YEATS domain, at each simulation step, the distances D new i measured between the new target values of the i-th CV and the YEATS domain were compared to the corresponding previous distances D i . The following soft-ratcheting criterion [29,50,51] was used for accepting or rejecting the new target values of the peptide's collective variables. If at least one distance D new i is smaller than the corresponding distance D i all of the new target values are accepted as the current target values. Otherwise, the new values are accepted with a probability of min 1, f 1 f 2 · · · f N where: where c is the restraint coefficient that determines how strict is the distance restraint. The c values are given in Table 2.
For the native peptide, the values of h, k and c were chosen as values for which a complete exploration of the YEATS surface is performed by the peptide. As the peptides pt7 and pt3 have a smaller mass, they flow away from the protein surface, not allowing a satisfying exploration  of the protein surface. So, the parameter k was decreased and the parameter c increased ( Table 2) to prevent these peptides from separating from the protein.
The CPU time necessary to record one re-TAMD trajectory on the complex between the YEATS domain and the H3 peptide is between 10 and 17 days on computers with 16 cores, and GeForce GTX GPUs, using the CUDA version of NAMD 9.2b2.

Analysis of trajectories
The atomic interactions, hydrophobic and polar, between the peptides and the protein, were analyzed by calculating the number of proximities (distance smaller than 4 Å) between polar/hydrophobic groups present in the peptide and in the protein throughout the trajectory. This analysis was performed using a python script based on the MDAnalysis module [52]. The number of inter-atomic contacts was rescaled between 0 and 1 for each trajectory. The number of contacts per residue was determined as the sum of atomic contacts involving each residue divided by the largest contact value. The partial charges used for this script were taken from the AMBER ff03 force field [47]; partial charges with absolute values smaller than 0.2 were considered to correspond to hydrophobic groups. The (φ, ψ) distributions were also calculated using MDAnalysis. The protein surfaces were calculated using PyMol [53].

Results
An analysis of the acK18 position with respect to the residues of the native pocket (Table 3) reveals that, along the MD trajectories with the three different peptides, all acK18 display similar proximity with respect to most of the residues, which means that no peptide dissociation from the native site is observed. Nevertheless, pt7 and pt3 move apart from W35, I85 and L109, and get closer to H59 and S61, which is the unsurprising sign of a slight destabilization. On the other hand, during the re-TAMD trajectories, acK18 dissociates from its native binding site, which proves that the peptide moves away from this site.
Along the re-TAMD trajectories, statistics of residue contact proportions along the YEATS domain sequence (Fig. 3) are plotted for the polar (red) and hydrophobic (blue) contacts, as well as for the total number (green) of contacts. The peptides pt13 and pt3 display contact with a subset of residues while p7 displays contact with a very wide range of residues. Therefore, the specificity of contact is greater for pt13 and pt3 than it is for pt7. Noticeably, the profiles of polar and hydrophobic contacts  The partial charges come from the AMBER ff03 force field [47]. This analysis was done with a python script using the MDAnalysis module [52] are mostly superimposed, except for the residue I85 for which the hydrophobic profile dominates. The mapping of the atomic contacts on the YEATS domain's surface (Fig. 4) shows that the front surface, which contains the acK18 binding site, is much more sampled than the opposite surface in the case of pt13 and pt3. The peptide's preference for the side containing the acK18 site is certainly influenced by the peptide's initial position, but agrees with the specificity of the interaction between H3 peptide and YEATS. Also, the acK18 site is blue for all trajectories, which proves that the acK18 dissociation was complete, in agreement with Table 3, and that the peptide was mostly exploring the remaining part of the protein surface. If one compares the three different peptides, pt13 displays a more specific distribution of its atomic contact hot-spots -colored in red and corresponding to proportions greater than 80%. Indeed, most of the pt13 hot spots are located close to the acK18 binding site. Conversely, the peptide pt7 is much less specific with an almost completely blue and green surface and very few hot spots. The tripeptide pt3 displays other features as several hot spots are present, but more or less uniformly dispersed on the Fig. 4 Contacts between YEATS and peptides. Surface of YEATS domain colored according to the proportion of atomic contacts with the peptide, from blue (near 0% of maximum number of contacts) to red (over 80% maximum number of contacts). The gray atoms had strictly zero contacts with the peptide throughout the trajectory. The Connolly protein surface was calculated using PyMol. The protein surface is represented with 30% transparency in order to see YEATS' secondary structures in cartoon representation. The total number of inter-atomic contacts (between a YEATS atom and a peptide atom) for each YEATS atom was calculated along the three re-TAMD trajectories: TAMDYpt13 (a-b), TAMDYpt7 (c-d), TAMDYpt3 (e-f). The N terminal tails GSH bearing the Histidine tag were not drawn front surface. Also, pt3 is the peptide displaying the largest unsampled grey surface, which agrees with the observation that this peptide spends only 93% of the trajectory close to the protein surface, versus 100% and 99% for pt7 and pt13 respectively. Typical peptide conformations bound to the YEATS domain during re-TAMD trajectory are shown in Fig. 5.
The YEATS domain residues displaying at least one atomic contact frequency larger than 50% along re-TAMD trajectories, have been listed in Table 4. In TAMDYpt13, the number of residues with atomic contacts larger than 50% is similar to the number of contacts observed in the PDB structure 2NDF of the complex. A comparison between these two residue lists reveals that they are significantly different: the 2NDF and TAMDYpt13 lists only have about one half of their residues in common. Beside, there is a larger number of YEATS residues with atomic contacts greater than 50% for pt13 than for pt7 and pt3. The additional contacts supporting the specificity of pt13 with respect to pt7 can be clustered in three groups: (i) S7 C8 V10, located at the N terminal part on the protein side opposite to the acK18 native site, (ii) H59 A82  (Table 4), are colored in green. The peptide conformation is displayed in sticks, colored in red and the position of the acK18 native binding site is marked with an asterisk G83 F84 I85, located around the acK18 native site, and (iii) R102 D104 D106 F108 L109 L111, located on the surface of the β sheet, which may be naturally in contact with the H3 peptide when the peptide encounters the YEATS domains.
In the case of pt13, the mapping of the contacts of each peptide residue to the YEATS surface (Fig. 6) reveals quite different behavior between pre-acK18 and post-acK18 residues in the H3 peptide sequence. Indeed, the post-acK18 residues Q19, L20 and A21 along with acK18 bind mostly to the YEATS residues A82, G83, F84 and I85, which are located at the entrance of the acK18 binding site. Conversely, the pre-acK18 residues G13, K14, A15, P16 and R17 bind mostly to I85, and then alternatively to D104, D106 and H119, which are more dispersed on the protein surface. Thus, the QLA motif located in the The N terminal tails GSH bearing the Histidine tag were not considered post-acK18 region seems to play an important role in the peptide sampling on the surface. Twenty four sequences of YEATS domains were aligned (Fig. 7) using T-Coffee [54]. The residue contact proportions are plotted below the sequence alignments. Along MD trajectories, the protein sequence 80-84, on which the acK18 binding site is centered, displays strong residue contacts. In other sequence regions, the proportion of contacts are more spread out and somehow more intense for the native peptide pt13 than for the shortened peptides pt7 and pt3. Along re-TAMD trajectories, the residue contacts are much reduced along the sequence 80-84, and spread out on other protein regions, as observed in Fig. 4 and Table 4. Among the residues displaying strong atomic contacts with pt13 (Table 4), V10, H59, P72, A82, G83, F84, P98, L109, L120, C122 are conserved in the YEATS sequence alignment (Fig. 7). The mapping obtained by re-TAMD on the AF-9 YEATS domain can thus be related to global sequence features of the YEATS family.
The distributions of φ and ψ backbone angles (Fig. 8), determined on the different peptides, reveal that the sampling of the peptides in complex with the YEATS domain along re-TAMD trajectories (red) is similar to the sampling along the MD trajectories of isolated peptides (green). The re-TAMD procedure thus induces a maximal sampling of the peptide conformations, which is certainly a very positive aspect in the search of alternative binding conformations and of cryptic binding sites. Also, the large sampling observed along re-TAMD trajectories is a sign that the peptide along the surface is in a free-diffusion state along the protein surface, which is the first step of the interactions between biomolecules Fig. 6 Contacts between YEATS and peptide residues. Surface of YEATS domain colored according to the proportion of atomic contacts with a given pt13 peptide residue. The total number of contacts for each YEATS atom was calculated along the TAMDYpt13 trajectory. The labels on the YEATS surface are YEATS residues displaying at least one large atomic contact with the peptide. The N terminal tails GSH bearing the Histidine tag were not drawn before the formation of the close-encounter complex [55]. In that way, the re-TAMD trajectories, which were obtained here by dissociating the peptide/protein complex, converge to the first steps of the peptide/protein association. The sampling of the peptide along the MD trajectories of the peptide/YEATS complex (blue) is more reduced than in other trajectories and this difference is most prominent in the case of MDYpt3 (Fig. 8c), in which the peptide does not sample the region of negative values.  [54]. The columns corresponding to gaps in the human AF9 sequence were removed in order to reduce the figure width. Under the alignment, the first horizontal bar represents the secondary structure of the AF9 YEATS NMR structure (PDB 2NDF: α helices in red and β strands in orange). The following six bars represent the proportion of contact of each YEATS residue with the peptide, from blue (near 0% of maximum number of contacts) to red (over 80% maximum number of contacts). There is one bar for each trajectory: MDYpt13, MDYpt7, MDYpt3, TAMDYpt13, TAMDYpt7, TAMDYpt3. Contacts were computed with a python script using the MDAnalysis module [52]

Discussion
Here, the re-TAMD approach, based on the TAMD enhanced sampling method, has been presented for performing the exploration of a receptor surface by a ligand. This approach presents several advantages. First, the energy is calculated using an all-atoms classical empirical force field, which allows a precise evaluation of intermolecular interactions. Second, the choice of collective variables is fully open to the user, and additional collective variables could be put on the receptor, in order to study the interplay between internal dynamics and intermolecular interaction [56,57]. In a similar way, competition of two ligands for a binding site could be studied using re-TAMD with an appropriate soft-ratcheting criterion [29]. The effects of water molecules and/or ions could also be observed. During the present work, the starting point of all re-TAMD simulations contains the peptide bound to the native site of acK18. But, starting from any point of the protein surface should be possible using a ligand pose obtained by molecular docking.
The region around the native binding site of acK18 is mainly populated for the full peptide pt13, whereas quite different surfaces are explored for shorter peptides. This agrees with the specificity of the H3 N terminal peptide, supported by the strict conservation of the primary sequence in this region. Then, according to the (φ, ψ) distributions, the conformational spaces sampled by the peptides along the re-TAMD trajectories are similar to the ones sampled by the isolated peptides in MD trajectories. This is in agreement with the free-diffusion step [55] for the intermolecular interactions. Furthermore, the YEATS residues displaying the most interactions with the peptide along the re-TAMD trajectories, are conserved in the sequences alignment of the YEATS domain. One should nevertheless notice that not much is experimentally known about the first steps of interactions between YEATS domain and the N terminal peptide of histone H3, so the results presented here are more evaluated to be plausible with the general knowledge on protein/peptide interaction than with a knowledge specific to YEATS/H3 interaction.
Several protein-peptide docking approaches have been developed in the literature. Several of them are based on Fast Fourier Transform (FFT) or on Normal Mode Analysis (NMA) and uses rigid docking [58]. The Normal Mode Analysis has been also used [59][60][61][62] for determining several conformations of the receptor in order to perform docking on these conformations. However, the use of FFT or of NMA does not permit to simulate the relative effect the dynamics of each interaction partner has on the other partner. By contrast, in re-TAMD simulations, the peptide as well as the protein are free to move in the force field, the peptide motions being accelerated. In that way, the peptide explores more protein surface, and one can expect that the effect of the peptide on the protein should also be enhanced.
In the present work, 100 ns of re-TAMD trajectory was recorded for each analyzed peptide. This computational load is thus smaller than that of reconnaissance or funnel metadynamics [13,14], but much larger than that of many docking approaches [58,[63][64][65]. The re-TAMD approach can thus be considered to be efficient on small datasets of several dozens to a hundred ligands. This makes this method possible for virtual screening approaches focused on a ligand family. Given that for some studies, only dozens of ligands can be tested due to stringent experimental constraints, it is worth noting re-TAMD as an efficient method to map surface contacts in the context of an all-atom force field.

Conclusions
Based on the enhanced sampling approach TAMD, the method re-TAMD has been proposed to induce a free diffusion of a peptide around a protein surface. This approach has been tested on the interaction between the N terminal peptide of the histone H3 and the reader domain YEATS. Several contact distributions are obtained on the protein surface, depending on the peptide length. Less significant contact distribution has been observed as the peptide gets shorter, putting in evidence the importance of the 13 residue peptide for the reader/histone interaction. Furthermore, the most often observed contacts involved protein residues located in conserved regions of the YEATS sequence alignment.