Structural deformation upon protein-protein interaction: A structural alphabet approach
© Martin et al; licensee BioMed Central Ltd. 2008
Received: 15 June 2007
Accepted: 28 February 2008
Published: 28 February 2008
In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding.
In this study, we explore induced-fit modifications in a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins.
Our study provides qualitative information about induced fit. These results could be of help for flexible docking.
Most biochemical reactions inherent to the life of a cell are mediated by protein-protein interactions, e.g., the recognition of a substrate by an enzyme, or of an antigen by an antibody. Protein-protein interaction is influenced by several factors, such as the size and shape of the interface, shape complementarity between interacting proteins, or hydrophobicity [1, 2]. Interfaces between interacting proteins have been extensively studied for decades [3, 4]. It has been shown that they have distinct features when compared to non-specific interfaces observed in protein crystals [5–9], or when compared to the rest of the protein surface [10–16]. Different models have been proposed for the protein binding process. The first was the 'lock and key' model, stating that interacting proteins bind to each other thanks to shape complementarity, without structural modification. Another model was then suggested: the induced fit, in which the protein structure is modified upon binding. Finally, it is thought that unbound proteins exist as an ensemble of conformations, some of them being more favorable for the interaction; this is the pre-existing equilibrium model. As the number of experimental 3D protein structures increases, evidence of induced fit and pre-existing equilibrium is now available.
The prediction of protein-protein interactions is a current challenge. Some bioinformatic methods have been developed to predict whether or not two proteins interact [20–25]. When it is known that two proteins interact, docking methods are employed to predict the 3D structure of the resulting complex, given the structures of the interacting partners [26, 27]. The performance of docking methods is monitored by the Critical Assessment of Predicted Interactions (CAPRI), a blind prediction experiment where structural biologists provide unpublished experimental complex structures as targets for docking programs. Induced fit introduces a supplementary difficulty to the challenging task of docking. Slight modifications involve the rearrangement of side chains that change their conformations to accommodate the interaction with the interacting protein. Stronger modifications can also alter the backbone conformation. Flexible protein-protein docking methods are thus developed to account for these conformational changes.
The extent of induced fit modification in protein-protein complexes has been previously studied. A study by Betts and Sternberg in 1999 revealed that, in a dataset of 39 protein-protein complexes, half exhibited substantial movements when compared to pairs of similar proteins solved by different groups. It was later shown that the structural changes upon protein-protein binding correlate well with the theoretical displacements derived from normal mode analysis. Recently, this was further explored on antibodies that bind different antigens. The case of enzymes has also been addressed: the conformational modification induced by the binding appears to be small in most enzymes (less than 1 Å rmsd), but residues belonging to the binding site exhibit larger backbone motions. Recently, Daily and Gray have used control sets to distinguish between enzyme induced fit modifications and experimental error or intrinsic flexibility of proteins. They found that about 20% of the residues exhibit substantial conformational changes and noted a significant bias toward weakly constrained regions, e.g., loops.
In this study, we propose an investigation of structural changes in protein complexes, from a local point of view, via a structural alphabet developed in our lab. We consider a set of protein-protein complexes for which the crystallographic structures of both the complex and the free partners are available, and quantify the structural changes in terms of structural letter modifications. We also use a control set of 14 protein pairs whose structures have been independently determined by different teams. The correlation between global change and the number of local changes is investigated. We then study the preference for particular letters in the interface regions, and analyze the structural letter substitutions that occur at the interfaces. We also use this new approach to detect common binding motifs in unrelated proteins.
Proteins from 68 complexes are represented as sequences of structural letters using our structural alphabet called HMM-SA [35–37]. We then analyze the differences between bound and unbound structural letter sequences. For clarity, we briefly present here the structural alphabet HMM-SA; more details can be found in [35–37].
HMM-SA is a library of 27 structural prototypes of four residues, called structural letters, established using the hidden Markov model formalism. Thanks to HMM-SA, the 3D structure of a protein backbone is simplified into a sequence of structural prototypes. The simplification relies on Cα positions only: each four-residue fragment of the protein structure is described by four inter-Cα distances. The resulting distances are the input of a hidden Markov model, and the structure is translated into a sequence of structural letters. This encoding is made using the Viterbi algorithm and takes into account both the similarity of the fragments with the 27 structural letters and the preferred transitions between structural letters. A protein structure of N residues is then encoded as a sequence of N - 3 structural letters. The model has been trained on 1429 X-ray structures of globular proteins, presenting less than 30% sequence identity, with a resolution better than 2.5 Å and longer than 30 residues. These structures were taken from the PDB, irrespective of their quaternary structures. They represent a total of 332,493 four-residue fragments.
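The encoding principle can be illustrated with a minimal Python sketch. It is not the HMM-SA implementation: the two-letter alphabet, the transition probabilities, and the emission probabilities below are hypothetical toy values chosen only to show how Viterbi decoding balances fragment similarity against preferred transitions, and how N residues yield N - 3 overlapping fragments.

```python
import math

def four_residue_windows(n_residues):
    """(first, last) residue indices of the overlapping four-residue fragments."""
    return [(i, i + 3) for i in range(n_residues - 3)]

def viterbi(emission_logp, start_logp, trans_logp):
    """Most probable state path.

    emission_logp[t][s]: log-likelihood of fragment t under letter s.
    start_logp[s]: log-probability of starting in letter s.
    trans_logp[r][s]: log-probability of the transition r -> s.
    """
    n_states = len(start_logp)
    score = [start_logp[s] + emission_logp[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(emission_logp)):
        prev = score
        score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + trans_logp[r][s])
            pointers.append(best)
            score.append(prev[best] + trans_logp[best][s] + emission_logp[t][s])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: two letters with "sticky" transitions.
letters = ["A", "D"]
log = math.log
start = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
# The middle fragment looks slightly more like D when taken in isolation.
emissions = [[log(0.8), log(0.2)], [log(0.4), log(0.6)], [log(0.8), log(0.2)]]

encoded = "".join(letters[s] for s in viterbi(emissions, start, trans))
print(encoded)
print(len(four_residue_windows(10)))
```

In this toy case the middle fragment is still assigned to the first letter, because the transition probabilities outweigh its slightly better fit to the second one; a position-by-position assignment would have switched letters.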
A set of 14 protein pairs solved by different groups is used as a control set to assess the local structural changes observed in a data set of 68 protein-protein complexes, denoted as the complex set. In the control set, 1,128 structural letters are modified out of 4,072 (28%). The total number of structural letter pairs considered in the complex set is 32,356. Overall, 20,679 structural letters are unchanged (64%), and 11,677 are changed (36%). This proportion is significantly greater than the proportion in the control set, as assessed by a Chi-square test (p-value < 2 × 10⁻¹⁶).
If we consider the 8 groups of structural letters with a 1 Å rmsd_dev threshold (described above and shown in Figure 1) and ignore structural letter changes within the same group, we obtain the following results: 5% of the control set is modified (216 modifications out of 4,072) and 11% of the complex set is modified (3,666 modifications out of 32,356). These proportions are significantly different, as assessed by a Chi-square test (p-value < 2 × 10⁻¹⁶).
The global compositions of bound and unbound chains, in terms of structural letters, are similar (data not shown), except for helical letters [A] and [a]: bound conformations have more [a] and fewer [A] than unbound conformations.
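As an illustration, the comparison of the two proportions can be reproduced from the counts given above with a Pearson Chi-square test on a 2 × 2 contingency table. The sketch below is in Python and uses only the standard library; the absence of a continuity correction is an assumption, since the exact test variant is not stated in the text.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Counts from the text: (modified, unchanged) letters in each set.
control_modified, control_total = 1128, 4072
complex_modified, complex_unchanged = 11677, 20679

chi2, p_value = chi_square_2x2(
    control_modified, control_total - control_modified,
    complex_modified, complex_unchanged,
)
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}")
```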
Table 1. Percentage of modified structural letters in the complex set, by type of complex (rows) and Cα rmsd range (columns: less than 1 Å, 1 to 2 Å, 2 to 3 Å, more than 3 Å).
Antibody-antigen complexes undergo 29% of structural letter modifications, a number similar to that obtained on the control set. Thus, on the limited number of structures available (10 antibody-antigen complexes), this class of proteins shows only moderate modifications upon protein-protein binding. The 'other' class experiences the highest percentage of structural letter changes (40%). This class encompasses different kinds of complexes (transport proteins, signaling proteins, viral capsid). The enzyme class has an intermediate behavior, with 32% of modifications. In their study, Daily et al. found that 20% of the residues in enzymes are significantly modified upon binding. Here, we find 32% of changes in the structural letter sequences. As we show later, some of these changes replace a structural letter by a similar one. Table 1 also reports the percentages of modified structural letters according to the root mean square deviation of the Cα atoms (Cα rmsd). For all types of complexes, the global tendency is a correlation between the percentage of modified structural letters and the Cα rmsd, the exception being the antibody-antigen class, in which a low percentage of modified letters (29%) is obtained at high rmsd (more than 3 Å) for 2 chains.
Some structures undergo minor global modifications, and other structures are significantly modified, as assessed by the Cα rmsd ranging from 0.2 to 14.0 Å. For comparison, the Cα rmsd on the control set ranges from 0.20 to 0.38 Å, with a mean value equal to 0.30 Å (the same rmsd is obtained for allosteric and non-allosteric protein pairs). The percentage of modified structural letters for different rmsd ranges is indicated in Table 1. As expected, the percentage of modified structural letters is higher for the structures with high rmsd.
Two chains exhibit a high Cα rmsd, around 5 Å, but a moderate fraction of structural letter changes, around 30%. These two chains are chain A of the receptor part of complex 2VIS (rmsd = 4.9 Å, percentage of change = 26% in a total of 207 structural letters) and chain B of the receptor part of the same complex (rmsd = 5.0 Å, percentage of change = 32% in a total of 218 structural letters). The examination of these structures shows that they are made of two domains that undergo large motions upon binding, as can be seen in Figure 2b. The relative orientation of the two domains is significantly modified, hence leading to a high Cα rmsd. The local structures, however, remain similar, as assessed by the moderate percentage of modified structural letters. If they are superimposed separately, the two domains have low Cα rmsd: 0.6 Å and 0.9 Å for domains 1–109 and 110–210 of chain A.
On the contrary, some structures exhibit slight global modifications but a high proportion of local changes: chains A and B of the receptor part of complex 1I4D (respective rmsd are 0.88 and 0.89 Å, respective percentages of modified structural letters are 75 and 68%), and chain B of the receptor part of complex 1F51 (rmsd equal to 1.45 Å, 69% of modified structural letters). These structures, shown in Figure 2c, have a good conservation of their global structures, but the structural letter sequences capture some subtle differences in helix structures. The unbound helices are encoded by runs of helical letter [A], alternating with less regular letters [V] and [W], whereas bound helices are encoded by homogeneous runs of [a], suggesting a higher regularity of bound helices.
It thus appears that the structural alphabet approach is complementary to the global rmsd, as few local changes can be associated with a drastic global change, and conversely.
The structural alphabet provides a simplified but detailed description of the protein backbone. As shown in Figure 1, some structural letters have very similar conformations, e.g., [a] and [A], whereas others are clearly different, e.g., [D] and [S]. This disparity has been quantified by the rmsd_dev: 0.15 Å between [A] and [a], and 1.6 Å between [D] and [S]. Furthermore, the structural letters have different intrinsic variability, as measured by the rmsd_intra. The rmsd_intra of the structural alphabet varies from 0.08 Å for letter [A] to 0.91 Å for letter [F]. The consequences are that (i) different structural encodings can be observed for similar conformations (e.g., a run of [A] replaced by a run of [a]), and (ii) the same structural letter can encode relatively dissimilar fragments, e.g., the most variable letter [F]. It is then desirable to check for consistency between the structural alphabet approach and classical external measures to assess the extent of the local deformations. The aim is to see if the structural alphabet, used for structure description, can also be used to detect significant local deformations.
A consequence is that the rmsd_dev alone cannot be used to quantify the structural change. This analysis tells us that although the structural alphabet offers an original tool to detect and qualitatively describe structural deformation, this information has to be combined with the local rmsd in order to properly measure the deformation.
The interface regions are defined using Voronoi tessellations. Among the 32,356 residues in the complex set, 3,746 are thus defined as interface residues. Interface residues then represent 12% of the whole dataset (i.e., both surface and core residues).
A total of 3,746 structural letters are involved in the interface: 2,217 (59%) are unchanged and 1,529 (41%) are changed. If we consider the 8 groups of Figure 1, 604 structural letters are changed (16%). This is significantly greater than the results obtained for the whole structures (36% of structural letter changes and 11% with the 8 groups), as assessed by Chi-square tests (p-values < 2 × 10⁻¹⁶).
To assess whether each individual letter undergoes more substitutions in the interface set than in the control set, we compute Z-scores (data not shown). All the structural letters, except [a], are more modified in the interface set than in the control set. The difference is significant for 19 letters out of 27, and particularly high for letters [I], [Q], [J], and [K].
Table 2. Number of possible substitutions (Nsub) in the different data sets. The numbers between parentheses are the differences between the Nsub and the Nsub of the control set. Each row corresponds to one structural letter; the first column gives the complex set and the second column the interface set.
Complex set    Interface set
4.3 (+1.3)    5.1 (+2.1)
5.2 (+1.1)    4.6 (+0.5)
5.7 (+1.0)    6.2 (+1.5)
5.6 (+1.6)    6.3 (+2.3)
5.2 (+1.5)    5.9 (+2.2)
4.1 (+0.6)    4.4 (+0.9)
3.4 (+1.5)    3.7 (+1.8)
2.4 (+0.9)    2.3 (+0.8)
4.9 (+2.6)    5.1 (+2.8)
4.7 (+1.5)    3.9 (+0.7)
2.6 (+1.3)    2.9 (+1.6)
1.9 (+0.7)    2.0 (+0.8)
2.7 (+1.1)    3.1 (+1.5)
3.3 (+1.7)    3.3 (+1.7)
2.7 (+0.9)    3.8 (+2.0)
3.8 (+1.7)    3.5 (+1.4)
5.1 (+2.5)    5.5 (+2.9)
4.8 (+3.2)    5.9 (+4.3)
2.8 (+1.0)    3.4 (+1.6)
3.2 (+1.0)    3.8 (+1.6)
3.1 (+1.4)    4.2 (+2.5)
2.7 (+0.8)    2.8 (+0.9)
3.5 (+1.2)    4.7 (+2.4)
3.5 (+1.3)    4.8 (+2.6)
2.7 (+0.7)    3.2 (+1.2)
3.0 (+1.0)    4.0 (+2.0)
2.8 (+1.0)    4.3 (+2.5)
The same global tendency is observed in the control and the interface sets: high Nsub for helical letters, some of the extended letters and a few coil letters. However, the Nsub computed from the complex set are higher than the Nsub computed from the control set. The interface region analysis results, in a majority of cases, in higher Nsub than in the complex set, confirming that interface regions undergo more varied structural changes. The Nsub are one to two points greater in the interface set than in the control set, except for letters [J] (+4.3), [R] (+2.9) and [E] (+2.8), resulting in Nsub greater than 5 for these letters. On the contrary, letter [D] has the lowest Nsub, equal to 2. This analysis thus reveals that some structural letters are particularly affected by the binding (i.e., [E,R,J]).
The quantitative measurement of structural letter changes is assessed using the local rmsd. In the control set, 5% of the fragments show a local rmsd greater than 0.2 Å. We will then use 0.2 Å as a threshold to select significant local deformations. In the complex set, 25% of the fragments have local rmsd greater than 0.2 Å, and 35% if we restrict to the interface fragments. It thus appears that interface regions undergo more severe local changes than the rest of the structure.
It thus appears that the severity of local deformation is not uniform among the structural letters, in particular among structural letters describing coils. Some structural letters are more likely to be affected by the formation of protein-protein complex.
Now that we have shown that some structural letters are preferentially affected upon binding, the next step is to analyze the resulting conformation after binding, namely the structural letter substitutions. Figure 6 is an illustration of the probabilities of structural letter substitution in the interface region. The unbound form is taken as the reference for this computation. To take into account only significant changes, we restrict the analysis to the pairs of structural letters that correspond to a local rmsd greater than 0.2 Å. The number of structural letter pairs with local rmsd greater than 0.2 Å is 1309, including 488 cases of structural letter identity. Among the 729 possible substitution probabilities (27 × 27), 312 are non-null and 139 are greater than 5%. It must be noted that the substitution probability matrix is highly asymmetrical.
For example, helical letters [A,a,V,W,Z,B,C] display high probabilities to be substituted into letters [Z,B] upon binding. The probability for letter [Z] to be transformed into [V] in the interface region upon binding is 8.8%, whereas it is 28.6% for the inverse transformation from [V] to [Z]. This arises from the normalization with respect to the unbound form needed for the probability computation. The substitution count table is nearly symmetrical, as shown in additional file 1, but because the number of structural letters in each class is unequal (see Figure 4), the substitution probabilities are asymmetrical. To facilitate the global examination of Figure 6, let us separate the 27 structural letters into the 3 main groups associated with classical secondary structure elements: [a,A,V,W,Z,B,C] for helix and helix borders, [J,K,L,M,N,T,X] for strands and strand borders, and the remaining [D,E,O,S,R,Q,I,F,U,P,H,G,Y] for coils.
The structural alphabet thus provides a new way to describe local structural changes as the substitution of a structural letter by another one. It is the first time, to our knowledge, that such a qualitative description is reported.
Structural modifications occur in both partners: region 186–197 in chymotrypsin and region 38–50 in eglin. The global Cα rmsd for chymotrypsin (the receptor) is 1.75 Å and the percentage of structural letter substitution is 37%. For eglin (the ligand), the global Cα rmsd is 1.5 Å and 45% of the structural letters are modified. Both modifications involve letters [I,J,Q], which are significantly more modified in the interface set than in the control set. 1DE4, shown in Figure 7d, is a complex between beta2-microglobulin and a transferrin receptor. The structural modifications highlighted here occur in region 13–27 of the beta2-microglobulin and region 519–534 of the transferrin receptor (the ligand). The local structures of both partners are modified where the contact occurs. Beta2-microglobulin (the receptor) has a global Cα rmsd equal to 1.65 Å and 49% of structural letter substitution. The transferrin receptor (the ligand) has a global Cα rmsd equal to 1.6 Å and 41% of its structural letters are modified upon binding. Both regions involve letters [I,J,K].
Table 3. Description of the complex set (PDB ids of the complexes, by class)
Enzyme/substrate: 1ACB, 1AVX, 1AY7, 1BVN, 1CGI, 1D6R, 1DFJ, 1E6E, 1EAW, 1EWY, 1EZU, 1F34, 1HIA, 1KKL, 1MAH, 1PPE, 1TMQ, 1UDI, 2MTA, 2PCC, 2SIC, 2SNI, 7CEI
Antibody/antigen: 1AHW, 1BGX, 1BVK, 1DQJ, 1E6J, 1JPS, 1MLC, 1VFB, 1WEJ, 2VIS
Other: 1A2K, 1AK4, 1AKJ, 1ATN, 1B6C, 1BUH, 1DE4, 1E96, 1EER, 1F51, 1FC2, 1FQ1, 1FQJ, 1GCQ, 1GP2, 1GRN, 1H1V, 1HE1, 1HE8, 1I2M, 1I4D, 1IB1, 1IBR, 1IJK, 1KLU, 1KTZ, 1KXP, 1M10, 1ML0, 1N2C, 1QA9, 1RLB, 1SBB, 1WQ1, 2BTF
Common binding motifs are searched for using the following criteria:
we look for structural motifs at least four structural letters long (i.e., seven residues);
the motif should be present in the bound forms of at least two complexes from different classes. We consider the 3 classes from Table 3, namely enzyme/substrate, antibody/antigen, and other;
the motif should be located entirely at the protein-protein interfaces of the complexes;
we do not consider runs of helical letters (A,a,V,W,Z,B,C) or extended letters (L,M,N,T,X,J,K). Helices and strands being highly abundant in 3D structures, these motifs may not be significant;
a significant local deformation should be seen, at the considered motif, in at least one complex;
the local rmsd between the bound fragments covered by the motif should be lower than the local rmsd between the unbound fragments.
Using these criteria, we extracted common bound motifs from proteins with unrelated functions. With the rmsd criterion, we select cases where the conformational change induced by the binding brings the bound structures closer to each other than the unbound structures, what we call "local structural convergence". Given the limited amount of data we have, and the stringent criteria we applied (in particular, we consider only 3 classes), we found only a few cases of local structural convergence. Two examples are illustrated in Figure 8. Structural motif GOIF is seen in two unrelated complexes: 1AHW, an antibody/antigen complex, and 1BVN, an enzyme/substrate complex. The local Cα rmsd for the corresponding fragment is 1.4 Å between unbound forms and only 0.7 Å between bound forms. Complex 1AHW undergoes only a minor conformational change, as assessed by the rmsd equal to 0.4 Å, and starts from a similar unbound structural motif: GOIJ. Complex 1BVN is modified by 0.8 Å, starting from a different structural motif: SGRF. The underlying amino-acid sequences are 'LQHGESP' (1AHW) and 'VIDLGGE' (1BVN). Structural motif LLGI is seen in one 'other' complex, 1GRN (complex between a G-protein and a GTPase activation domain), and one enzyme/substrate complex, 1UDI. Both complexes are significantly modified by the binding: 1GRN by 1.9 Å rmsd, from KPQL to LLGI, and 1UDI to a lesser extent, by 0.9 Å rmsd, from LNNG to LLGI. Local rmsd are equal to 2.2 and 0.8 Å before and after binding, respectively. The underlying amino-acid sequences are 'YVPTVFD' (1GRN) and 'QLVIQES' (1UDI). These examples highlight the usefulness of the structural alphabet for further studies using larger data sets.
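The sequence-based part of this search can be sketched in Python as follows. This is only an illustration under stated assumptions: the complex identifiers and letter sequences are hypothetical, and the sketch covers only the motif length, the requirement of two different classes, and the exclusion of purely helical or purely extended runs. The interface location and the rmsd criteria would require structural data and are not handled here.

```python
from collections import defaultdict

HELICAL = set("AaVWZBC")
EXTENDED = set("LMNTXJK")

def shared_motifs(bound_sequences, length=4):
    """Motifs found in the bound sequences of complexes from at least two classes.

    bound_sequences: {complex_id: (class_name, structural_letter_sequence)}
    """
    seen = defaultdict(set)
    for complex_id, (class_name, seq) in bound_sequences.items():
        for i in range(len(seq) - length + 1):
            motif = seq[i:i + length]
            # Skip runs made only of helical letters or only of extended letters.
            if set(motif) <= HELICAL or set(motif) <= EXTENDED:
                continue
            seen[motif].add((class_name, complex_id))
    return {
        motif: sorted(hits)
        for motif, hits in seen.items()
        if len({class_name for class_name, _ in hits}) >= 2
    }

# Hypothetical bound structural letter sequences (not taken from the data set).
toy = {
    "cplx1": ("antibody/antigen", "AAAALGOIFD"),
    "cplx2": ("enzyme/substrate", "AAAAMGOIFE"),
}
motifs = shared_motifs(toy)
print(motifs)
```

In this toy input, the helical run AAAA is shared by both sequences but is filtered out, so only the coil motif is reported.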
This study reveals that the structural alphabet offers a new way to investigate local deformations induced by the protein-protein interaction. Classical studies revealed that interface regions preferentially involve loops. Here, we show that two structural letters forming helix ends [B,C] are preferred at the interface and that only a part of the structural letters describing the loops, [O,H,Y,R,J,S], are preferred at the interface. Letters [E,R,J] are particularly affected by the binding (number of possible substitutions greater than 5 versus 2 in the control set). Concerning the severity of the substitutions, letters [E,F,I,Q,J] are subject to major modifications.
It is the first time that local conformation changes can be qualitatively described in such a way. The main advantage of using the structural alphabet approach, compared to the classical rmsd measure, is that it provides a description of bound and unbound conformations and, in turn, a qualitative description of the deformation. This feature opens the perspective for further studies, such as the classification of interface structural motifs and structural changes. The following questions could be addressed: are the structural modifications common to all types of complexes? Can the same structural modifications be observed in unrelated proteins? Could we use the qualitative description of structural changes to make a classification of binding movements? An example of such an analysis is illustrated in Figure 8, in which we highlight two examples of common binding structural motifs from unrelated proteins. Although the current amount of data is insufficient to draw any conclusion, the structural alphabet approach seems very promising to address such questions.
The computation of structural letter substitution probabilities highlights some preferred substitutions. Such information could be useful for flexible docking experiments and binding pocket detection at protein surfaces. Flexible docking strategies include the use of ensembles of alternate starting conformations, taken from molecular dynamics simulations [41–44] or other conformational sampling techniques, and the explicit integration of conformational changes during the docking procedure via simulated annealing refinement or a multicopy mean-field approach. In this framework, the structural letter substitution probabilities derived from the present study could be used in a conformational sampling technique. The structural letter substitution matrix could be used in a generative manner using a Markov process: starting from the unbound structural letter sequence, modifications are introduced using the matrix to generate a potential bound structural letter sequence. It is then possible to rebuild the bound backbone from the structural letter sequence [48, 49]. This would probably require some external methods to predict which region is to be modified. The strong transition rules between successive structural letters should also be taken into account in order to generate realistic structural letter sequences.
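A minimal Python sketch of this generative use is given below. The substitution probabilities are hypothetical toy values, not those derived in this study, and each position is sampled independently: the transition rules between successive structural letters, and the prediction of which region should be modified, are deliberately left out.

```python
import random

def sample_bound_sequence(unbound, substitution, rng):
    """Draw one candidate bound sequence, position by position.

    substitution[x][y]: probability that unbound letter x becomes y upon binding.
    """
    bound = []
    for letter in unbound:
        targets = list(substitution[letter])
        weights = [substitution[letter][t] for t in targets]
        bound.append(rng.choices(targets, weights=weights, k=1)[0])
    return "".join(bound)

# Hypothetical toy matrix restricted to three letters.
toy_matrix = {
    "V": {"V": 0.7, "Z": 0.3},
    "Z": {"Z": 0.9, "V": 0.1},
    "D": {"D": 1.0},
}

rng = random.Random(0)  # fixed seed so that the run is repeatable
candidates = [sample_bound_sequence("VVZD", toy_matrix, rng) for _ in range(5)]
print(candidates)
```

Each candidate would then have to be filtered or re-weighted with the transition rules of the alphabet before any backbone reconstruction.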
We use version 2.4 of the benchmark presented by Mintseris et al.: 83 crystallographic structures of protein-protein complexes (the bound structures), accompanied by the crystallographic structures of the free ligands and receptors (the unbound structures). The Mintseris dataset consists of 23 enzyme-inhibitor complexes, 21 antibody-antigen complexes (11 of them in bound/unbound conformation) and 39 complexes of other types. As we are interested in structural changes upon binding, the 11 antibody-antigen complexes in bound/unbound conformation are excluded from the analysis. Some ligands and receptors are multichain. The comparison between bound and unbound forms requires a correspondence between the residue numbering of each form. This restriction leads to the exclusion of four complexes belonging to the 'other' class. When only the ligand (or the receptor) has inconsistent residue numbering, the receptor (or ligand) is kept in the analysis. Similarly, when one chain of a multimeric protein has inconsistent residue numbering, the others are kept in the analysis. 15 chains were then further removed.
The complete dataset of 68 complexes (containing 156 chains from 124 proteins) used in this study is described in Table 3. We will refer to this data set as the complex set.
To distinguish the structural deformation induced by protein-protein binding from the experimental uncertainty and the expected variations due to protein flexibility, a control set is needed. We consider the control set of 14 protein pairs used by Daily and coworkers:
5 protein pairs independently crystallized by different groups: 2CBA/1CAM, 1VDQ/1HEL, 1UNE/1MKT, 1EY0/1STN, and 1TPH/1TPW.
9 pairs of allosteric proteins independently crystallized in the same form: 3CHY/1JBE, 1GDD/1BOF, 1GPB/8GPB, 4HHB/1A3N, 1T48/1T49, 1OIW/1YZK, 1VG8/1T91, 1XTS/1XTR, and 2TRT/2TCT.
In this study, the ligand and receptor of each complex, in bound and unbound forms, are simplified into structural letter sequences using HMM-SA and the Viterbi algorithm [35, 36]. Local conformational modifications between bound/unbound forms are studied through the structural letter sequences.
A classical measure of conformational change is the rmsd (root-mean-square deviation), i.e., the mean deviation of atom positions after optimal superimposition of two structures. The rmsd can be computed for the whole protein (a global rmsd) or for a fragment of the protein (a local rmsd). In this study, local and global rmsd are computed using Cα atoms only, with the ProFit software.
As explained in the Results section, a general distance between two different structural letters is given by the rmsd_dev. The rmsd_dev has been computed from 500 fragment pairs randomly chosen in the two structural letters. The rmsd_intra, computed in the same way, measures the intrinsic variability of each structural letter.
The structural distance between two fragments of four residues can then be measured using the local rmsd or the rmsd_dev. Note that the difference between these two measures is that the local rmsd is computed for each pair of fragments using ProFit, whereas the rmsd_dev is taken from a pre-computed table, by considering only the structural encoding of the fragments.
The probability that structural letter x in the unbound form is substituted by structural letter y in the bound form is computed as:
P(x \to y) = \frac{N_{bound}(x \to y)}{N_{unbound}(x)}
where N_bound(x → y) denotes the number of structural letters x in the unbound form that are replaced by structural letter y in the bound form, and N_unbound(x) denotes the total number of structural letters x in the unbound form. When x = y, this quantity is the probability of being unchanged. Here, we consider that the unbound state is the starting state and the bound state is the final state. The unbound state is therefore taken as the reference for the computation, and the resulting matrix may be asymmetrical.
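The computation can be sketched in Python as follows, using the definitions above: the count of x → y replacements is divided by the count of x in the unbound form. The two aligned sequences are hypothetical toy data; the sketch also shows why the resulting matrix is generally asymmetrical even when the raw counts are symmetrical.

```python
from collections import Counter, defaultdict

def substitution_probabilities(unbound, bound):
    """P(x -> y) from two aligned structural letter sequences of equal length."""
    if len(unbound) != len(bound):
        raise ValueError("sequences must be aligned position by position")
    counts = defaultdict(Counter)
    for x, y in zip(unbound, bound):
        counts[x][y] += 1
    # Normalize each row by the number of occurrences of x in the unbound form.
    return {
        x: {y: n / sum(row.values()) for y, n in row.items()}
        for x, row in counts.items()
    }

# Hypothetical aligned sequences: one V -> Z and one Z -> V replacement.
probs = substitution_probabilities("VVVZZ", "ZVVZV")
print(probs["V"]["Z"], probs["Z"]["V"])
```

Here the counts for V → Z and Z → V are both equal to 1, but the probabilities differ (1/3 versus 1/2) because V is more frequent than Z in the unbound sequence.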
The number of possible substitutions for each structural letter can then be computed from the substitution probabilities:
N_{sub}(x) = e^{H(x)}
where H(x) denotes the Shannon entropy of the substitution probabilities of letter x, H(x) = -\sum_{y} P(x \to y) \ln P(x \to y).
An Nsub equal to 1 indicates that structural letter x is integrally transformed into one structural letter (it can be itself). The maximum theoretical Nsub is 27: it means that structural letter x is transformed into all 27 structural letters with equal probabilities.
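The two limiting cases can be checked with a short Python sketch. It assumes that H(x) is the Shannon entropy, in natural logarithm, of the substitution probabilities of letter x, which is the reading consistent with the limiting values of 1 and 27 stated above.

```python
import math

def n_sub(probabilities):
    """Exponential of the Shannon entropy of one row of substitution probabilities."""
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return math.exp(entropy)

single_target = n_sub([1.0])            # letter always replaced by the same letter
uniform_27 = n_sub([1 / 27] * 27)       # all 27 letters equally likely
intermediate = n_sub([0.5, 0.25, 0.25])  # hypothetical row with three targets

print(single_target, uniform_27, intermediate)
```

For the hypothetical three-target row, the value lies between 2 and 3: Nsub behaves as an effective number of substitution targets rather than a simple count of non-zero probabilities.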
The local modifications induced by protein-protein binding are studied in more detail at the receptor-ligand binding interface. Interfaces are detected using Voronoi tessellations. Voronoi tessellations are a way to divide the space around a given set of points into cells. The Voronoi cell around a point of the set contains all the positions that are closer to this point than to any other point of the set. Voronoi tessellations are used to study contacts within proteins without the use of a threshold distance. Here, Voronoi tessellations are used to identify the residues that make contacts between the receptor and the ligand. We use the PROVAT software to compute the Voronoi cells around Cα atoms, with default parameters. Two residues are in contact if their Voronoi cells share a surface with non-zero area. A structural letter is a four-residue fragment. The correspondence is made between a four-residue fragment and its third Cα.
We will refer to the part of the data located at the interfaces as the interface set.
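The correspondence between interface residues and structural letters can be illustrated with a small Python sketch. It assumes 0-based residue indices and takes the set of interface residues as given (for instance from a contact detection step that is not reproduced here); the function name and the toy input are hypothetical.

```python
def interface_letter_indices(n_residues, interface_residues):
    """Indices of the structural letters assigned to the interface.

    Letter i covers residues i..i+3 and is attached to its third residue (i + 2).
    """
    return [i for i in range(n_residues - 3) if (i + 2) in interface_residues]

# Hypothetical chain of 8 residues with interface residues 0, 2, 5 and 7.
letters_at_interface = interface_letter_indices(8, {0, 2, 5, 7})
print(letters_at_interface)
```

With this convention, the first two residues and the last residue of a chain are never the third Cα of a fragment, so interface residues at those positions do not select any structural letter.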
Z-scores are computed to compare the structural letter compositions of interface and non-interface regions. N_obs(x) denotes the observed number of letter x in the interface set and N_exp(x) denotes the expected number of x in the interface if the compositions of interface and non-interface regions were similar:
N_{exp}(x) = f_{non-inter}(x) \times N_{inter}
where f_non-inter(x) denotes the relative frequency of x in the non-interface region and N_inter the number of structural letters of any type in the interface set.
Z-scores are similarly computed to assess the over-modification of a given structural letter, with N_obs(x) the number of structural letters x that are modified upon binding in the interface set, and N_{exp}(x) = p_{control}(x) \times N_{unbound}^{inter}(x), where p_control(x) denotes the probability for letter x to be modified in the control set, and N_unbound^inter(x) denotes the number of letters x in unbound form in the interface set.
Z-scores are expected to follow a Gaussian distribution with mean equal to zero and standard deviation equal to 1. Significance thresholds are corrected to take the multiple tests into account.
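The following Python sketch illustrates one way to compute such a Z-score and a corrected significance threshold. Two choices are assumptions made for the illustration and are not stated in the text: the standard deviation is taken from a binomial model, and the multiple-testing correction is a Bonferroni correction over the 27 structural letters. The counts are hypothetical.

```python
import math
from statistics import NormalDist

def z_score(n_obs, n_trials, p_ref):
    """Z-score of an observed count against a binomial expectation (assumed model)."""
    n_exp = n_trials * p_ref
    sd = math.sqrt(n_trials * p_ref * (1.0 - p_ref))
    return (n_obs - n_exp) / sd

def bonferroni_threshold(alpha, n_tests):
    """Two-sided Z threshold after Bonferroni correction (assumed correction)."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))

# Hypothetical letter: 400 occurrences at the interface, 60 of them modified,
# and a probability of 0.1 of being modified in the control set.
z = z_score(n_obs=60, n_trials=400, p_ref=0.1)
threshold = bonferroni_threshold(alpha=0.05, n_tests=27)
print(z, threshold, z > threshold)
```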
We are grateful to INRA for awarding a Fellowship to JM and to Ministère de l'Enseignement Supérieur et de la Recherche for awarding a Fellowship to LR. We thank two anonymous referees for their remarks that helped us to improve the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.