Deciphering the shape and deformation of secondary structures through local conformation analysis
© Baussand and Camproux; licensee BioMed Central Ltd. 2011
Received: 7 July 2010
Accepted: 1 February 2011
Published: 1 February 2011
Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation.
Using a structural alphabet, we translated the 3D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that model them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations preferred either in the core or on the surface. The analysis of the structural letter transitions occurring between protein-bound and unbound conformations shows that the deformation of secondary structures is tightly linked to the compartment preference of the local conformations.
The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons.
Our understanding of protein interaction mechanisms relies on the analysis of protein-protein complexes aiming to identify and characterize the fundamental physico-chemical and structural factors that are required for the specific recognition and functional interaction of protein partners. Considerable efforts have been made to describe protein-protein interfaces in terms of amino acid composition and evolution [1–5], and in terms of structural [6–10] and dynamical features [11–13]. The analysis of protein complexes revealed that, although specific protein-protein interfaces present distinct features compared to non-specific interfaces observed in protein crystals [14–16], their properties can differ between the different types of complexes (i.e. homocomplexes, heterocomplexes, obligate and transient complexes) [1, 10, 17–20]. The analysis of secondary structures at protein-protein interfaces emphasized the importance of non-regular secondary structures (loops) compared to the more rigid regular ones (α-helices and β-strands) preferred in the core . The secondary structure percentages at the interface are more correlated with those of the exterior residues, which suggests that the interface is structurally closer to the protein surface than to the protein core . Loops, which are more able to adjust themselves upon interaction, generally make up 40% of the interface [10, 23]. Compared to other complexes, transient complexes present a greater involvement of loops at the interface since they provide more flexibility for the protein molecules to associate and dissociate appropriately . α-helices are also well represented at protein-protein interfaces, particularly in obligatory homocomplexes, whose interfaces are mainly composed of helix-helix pairings [10, 17]. In transient heterocomplexes, binding sites have a preference for β-sheets and long non-regular structures but not for α-helices . 
The strong preference for β-sheets is probably due to their high ability to form densely packed structures when placed one against the other, thus having a higher potential for intermolecular bond formation. In addition, secondary structures appear to be under constraints to form interface scaffolds favorable to protein-protein interaction .
Besides the static structural description of protein-protein interfaces, conformational and dynamical changes upon complexation have been analysed since they have important implications for the development of docking algorithms . Both the 'induced-fit'  and the 'pre-existing equilibrium'  models for the protein binding mechanism underline structural differences between the bound and unbound states of proteins. In the former model the differences are due to conformational changes induced by the binding of the ligand, while in the latter the differences are more related to dynamical changes where the bound state corresponds to conformations that pre-exist in the ensemble of unbound conformations. Comparisons between bound and unbound structures have been mainly performed through RMSD, torsion angles [11, 28], RMSF and Principal Components Analysis calculations . Evidence for both models has been found, and the two possibly play a joint role in molecular recognition [29, 30]. Structural differences between the bound and the unbound states of a protein can be either large (monoclonal IgE antibody, RMSD ~ 7Å) or small (less than 1Å). Conformational changes are not restricted to the interface and affect around 20% of the residues in allosteric proteins [11, 28]. Interface residues generally undergo larger motions than the rest of the protein in the case of enzymes . In the case of ubiquitin, local structural variations in the region surrounding the binding site have been found to play an important functional role, allowing the protein to adapt to its several structurally diverse partners despite a low RMSD in the ensemble of the recognition dynamics [30, 32]. The importance of the local structural variation observed in the binding process of ubiquitin highlights the need for efficient local approaches to understand the mechanism of protein-protein interaction. 
In terms of dynamics, the mobility of residues at the interface is not homogeneous: core and surface interface residues are respectively less and more mobile than the rest of the surface [12, 13]. In terms of secondary structure elements, loops are more likely to experience motions than α-helices and β-strands . Although the secondary structure composition at protein-protein interfaces is similar in bound and unbound conformations , changes in secondary structures from disorder-to-order and order-to-order occur, possibly playing important functional roles .
An innovative way to analyse and characterize induced-fit conformational changes has been proposed which consists of translating the 3D protein structures into 1D structural sequences using a structural alphabet . What is the advantage of using a structural alphabet to analyse the shape of secondary structures and their induced-fit deformation? Helical secondary structures can be curved, kinked or straight . Strand geometry depends on sheet parallelism and pleat, which results in variable conformations of the β-strands. Loops are weakly constrained structures and therefore difficult to characterize and compare. The HMM-SA structural alphabet  describes the local shape of proteins and the logic of their assembly in 27 structural letters. It provides a detailed description of the protein backbone and allows the identification of conformational variations within the different secondary structure types. We call conformational variations the differences in backbone conformation (modeled by different structural letters) that lead to variation in the shape of the secondary structures. Four structural letters are associated with variation in the backbone of α-helices, five with variation in the backbone of β-strands. The 18 remaining structural letters describe local conformations forming loops. Thus the structural alphabet provides a way to distinguish among the different conformational states of each type of secondary structure, and also to characterize these states so that they can be compared. The study presented in , in which HMM-SA was used to analyse the differences in structural letter composition at the interface of bound and unbound proteins, was the first qualitative description of induced-fit structural changes. It revealed that some specific local conformations in coils are more likely to be deformed at the interface upon complexation than others, and that the severity of the structural changes may also vary.
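The partition of the 27 letters by secondary structure type can be sketched as a small lookup. This is only an illustration: the letter groups are the ones named in the Results of this article (α-letters, β-letters, loop-letters and border-letters), counting border-letters among the 18 non-regular letters is our reading of the text, and the function names and the example sequence are hypothetical.

```python
# Letter groups as named in the Results of this article.
ALPHA = set("AaVW")            # four alpha-helix letters
BETA = set("LMNTX")            # five beta-strand letters
LOOP = set("GRSOEIQPHYUDF")    # thirteen loop-letters
BORDER = set("BKZCJ")          # five border-letters (assumed to complete the 18)

def letter_class(letter: str) -> str:
    """Return the secondary structure class of one structural letter."""
    for name, group in (("alpha", ALPHA), ("beta", BETA),
                        ("loop", LOOP), ("border", BORDER)):
        if letter in group:
            return name
    raise ValueError(f"unknown structural letter: {letter!r}")

def class_composition(structural_sequence: str) -> dict:
    """Fraction of each class in a 1D structural sequence."""
    counts = {"alpha": 0, "beta": 0, "loop": 0, "border": 0}
    for letter in structural_sequence:
        counts[letter_class(letter)] += 1
    total = len(structural_sequence)
    return {name: n / total for name, n in counts.items()}

# Hypothetical structural sequence, for illustration only.
print(class_composition("AAAALLGG"))
```

Note that the lookup is case-sensitive, since [A] and [a] are distinct helix letters.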
Here we investigate the structural differences between the local conformations that can explain this variable behavior with respect to deformation upon complexation. While the previous study mainly focused on the deformation at the interface of local conformations associated with loops, here we analyse each of the three types of secondary structure in the whole proteins. We first verify that the structural alphabet is able to fit previously reported descriptions of protein interface, surface and core in terms of secondary structure for the four different types of complexes. A more detailed analysis reveals a non-uniform distribution of the structural letters within proteins, with a clear preference of particular structural letters for either surface or core and, to a lesser extent, for interface and non-interface regions. We show that structural letters with similar distribution preferences share common structural and solvent exposure features. In other words, different backbone conformations tend to be adopted by the secondary structures depending on their location in proteins: at the interface, on the surface or in the core. We revisit the analysis of the structural deformation of local conformations upon interaction proposed in  by comparing a dataset of bound and unbound proteins and show how the deformation of local conformations is related to their preferred location in proteins. Deformation tendencies for local conformations are defined and different example cases of deformation are presented.
Results and Discussion
HMM-SA encoding and secondary structures
Distribution of secondary structures within protein compartments
Secondary structures distribution at protein interface, surface and core
Distribution of local conformations within protein compartments
Distribution of loop-letters and border-letters
The first axis of the MCA plot separates loop-letters into two groups of letters (Figure 2B1, C1): [G,R,S,O,E,I,Q] and [P,H,Y,U,D,F] preferentially distributed in core and on surface respectively. In addition, some letters show a preference for interface or non-interface regions (Figure 2C1). In the first group, [E,I,Q,O] present preference for interface (positive Z interface/surface) with significant Z-score values for [E,I,Q]. In the second group, [D] is under-represented at interface (highly negative Z interface/surface) whereas [F] shows preference for interface. The KLd values associated with border-letters are all significant: [B,K] are the most preferred on surface and the least in core while [Z,C,J] display the opposite behavior.
Distribution of β-letters
Non-uniform distribution among the three protein compartments is also observed for β-letters (Figure 2B2,C2). Letter [L] obtains the most significant KLd value among the 27 structural letters and displays a clear preference for surface. Significant KLd values are obtained for β-letters [M,N,T] which are preferentially distributed in core as illustrated by the MCA plot. Letters [T,N] are clearly distinguished by the second axis of the MCA plot: letter [T] is preferred at interface compared to surface while [N] is under-represented at interface compared to both surface and core indicating its preference for non-interface regions. Letter [X] has no significant preference.
Distribution of α-letters
Letters [A,a,V] exhibit different distribution in the three compartments (Figure 2B3, C3) while letter [W] has no clear preference. Letter [A] is preferred in core while [a,V] are preferred on surface. More precisely, Z-scores show the preference of [a] for non-interface region being preferred in both core and surface compared to interface (Figure 2C3). Notice that the KLd and Z-score values obtained for α-letters are lower than the ones obtained for loop- and β-letters indicating that α-letters display weaker distribution differences than the other structural letters.
Compartment preferences in the different types of protein-protein complexes
The distribution analysis of the structural letters in the three protein compartments of the complete dataset unveils compartment preferences among local conformations belonging to the same secondary structure type. The local approach analysis reveals a tendency for secondary structures to adopt different local shapes according to their location in proteins at interface, surface or core. The analysis of homodimers, heterodimers, obligate and transient complexes separately shows similar distribution preferences for local conformations among the different types of complexes (Additional files 3, 4, 5 top and center). In particular, the distribution preference of letters for surface, core and non-interface is very strong and stable while the preference of letters for interface is more likely to vary between the different complexes. However, for transient complexes, the preference of local conformations for interface and non-interface is maintained in both bound and unbound states, suggesting a structural predisposition of binding sites for interaction (Additional files 3, 4, 5 bottom).
Percentage of secondary structures affected by the preferential distribution
The local approach reveals that some local conformations are more affected by the preferential distribution than others. For instance structural letters [L] and [M], which have been shown to be preferred on surface and in core respectively, correspond to 57% of the β-letters affected by the preferential distribution in the complete dataset (Additional file 6).
In the following, α-letters [a,V], β-letter [L], loop-letters [P,H,Y,D,U,F] and border-letters [B,K], which are local conformations preferentially distributed on surface, are grouped together as surface-letters. Strong preference for core is observed for α-letters [A], β-letters [T,M,N], loop-letters [G,R,O,I,S,E,Q] and border-letters [Z,C,J]. They are therefore grouped together as core-letters. Although the representation at interface of some letters may vary among the different types of complexes, the tendency for letters [F] and [a,N,D] to be preferred in interface and non-interface regions respectively is very stable. Letters [a,N,D] are then further characterized as non-interface-letters and letter [F] as interface-letter. The structural characteristics of these groups of local conformations are analysed.
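The grouping defined above can be written down as a lookup table, which makes it easy to check that the groups do not overlap and that only [W] and [X] are left without a surface or core label. This is a minimal sketch: the letter sets are copied from the paragraph above, while the function name and the "none" label are our own.

```python
# Letter groups copied from the paragraph above.
SURFACE_LETTERS = set("aV") | set("L") | set("PHYDUF") | set("BK")
CORE_LETTERS = set("A") | set("TMN") | set("GROISEQ") | set("ZCJ")
NON_INTERFACE_LETTERS = set("aND")
INTERFACE_LETTERS = set("F")

def compartment_preference(letter: str) -> str:
    """Return 'surface', 'core', or 'none' for one structural letter."""
    if letter in SURFACE_LETTERS:
        return "surface"
    if letter in CORE_LETTERS:
        return "core"
    return "none"  # e.g. [W] and [X], which show no clear preference

for letter in "LMWF":
    tags = [compartment_preference(letter)]
    if letter in INTERFACE_LETTERS:
        tags.append("interface")
    if letter in NON_INTERFACE_LETTERS:
        tags.append("non-interface")
    print(letter, tags)
```

The interface and non-interface labels are kept as separate sets because they refine, rather than replace, the surface/core label: [F] and [D] are both surface-letters, and [N] is both a core-letter and a non-interface-letter.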
Compartment preference and amino acids composition of local conformations
The amino acid composition of local conformations is evaluated at interface, surface and core in the complete dataset. For each structural letter, tryptophan and tyrosine are in greater or similar proportion at interface than in core while all other hydrophobic residues present a greater proportion in core. Arginine and histidine present their highest proportion at interface compared to both surface and core. These residues have been previously found to be enriched at protein interface [1, 8, 42]. The proportion of proline and glycine, two residues known to be key structural residues, is observed to vary greatly between some structural letters; however, these differences do not distinguish between surface- and core-letters (Additional files 7 and 8). Interface-letter [F] presents a high proportion of both residues (14% of proline and 22% of glycine at interface). Non-interface-letters [a,N] present a low proportion of proline (<7%) while [D] appears to be particularly enriched in glycine (55%) in agreement with . Other structural letters with different compartment preferences [J,R,U] are enriched in glycine. Thus the amino acid composition of the structural letters, analysed in the different compartments, is unlikely to explain the compartment preference of the local conformations and confirms that amino acids and local conformations give complementary and not redundant information.
Compartment preference and structural description of local conformations
Characteristics of loop-letters
By focusing on the values of descriptors P4/d3 and d1/d3 for loop-letters (Figure 3B), we observe that surface-letters associated with loops correspond to local conformations with short d3 and a tendency for low or negative P4. Non-interface-letter [D] and interface-letter [F] differ from the other surface-letters with the shortest d1. Core-letters display short d1 with positive P4 but can be separated in two groups: [I,R,S,Q] display long d3 while [G,E,O] display short d3 comparable to surface-letters. These structural differences between the loop local conformations agree with their solvent accessibility (Figure 3B right). All surface-letters as well as core-letters [I,R,S,Q] are respectively the most and least accessible to solvent while core-letters [G,E,O] present intermediate solvent accessibility. It suggests that local conformations with short d1 and long d3 are related to unfavored solvent exposure and then preferentially distributed in core, while local conformations with long d1 and short d3 are more exposed to solvent with variation according to the extent of the curvature (variation in d3 values) and its orientation. A negative P4 appears to indicate an orientation towards the protein exterior and is associated with surface-letters while positive P4 indicates an orientation towards the protein interior and is associated with core-letters. Notice that border-letters present intermediate descriptor values since they can be associated with either regular or non-regular conformations, and so are not considered here.
Characteristics of β- and α-letters
Similarly for β-letters (Figure 3C), surface-letter [L] is indicative of a curvature in β-strands (the shortest d3 and highly negative P4) and presents the highest solvent exposure on surface among all β-letters, while core-letters [T,X,M,N] are the least exposed. In particular, [T,M] correspond to straight β-strand conformations (with a large d2). The distinction between α-letters in terms of structural descriptors is not clear (Figure 3A,C), which is consistent with the fact that they also display the smallest differences in terms of distribution between the three protein compartments (Figure 2). However, their subtle differences in terms of structural descriptors in fact reflect different helix geometries: surface-letters [V,a] are associated with distortions leading to kinked and curved helices respectively while [A] forms straight helices . Non-interface-letters [a,N] also display common structural specificities, corresponding to the local conformations with the shortest d1 with respect to the other letters of the same secondary structure type. The structural specificities of letters associated with either regular or non-regular secondary structures but sharing the same compartment preference are unveiled: curved conformations appear to be preferred on the surface and straight ones in the core. Such variations in the backbone of secondary structures are associated with solvent exposure differences. Local conformations avoided at interface correspond to conformations with the shortest distance Cα 1-Cα 3. These results reveal new structural features, regarding the preferential shape of regular and non-regular secondary structures in protein compartments, which have not been appreciated before.
Revisiting the deformation of local conformations
The deformation of local conformations upon complexation previously studied in  is revisited and results are further interpreted in the light of the compartment preference and structural characteristics of the local conformations. We use a protein-protein interface definition based on solvent accessibility variation (versus contact points with Voronoi tessellation) and consider all structural letter transitions (versus only severe deformations with local RMSD greater than 0.2Å) within and between the different secondary structure types.
Deformation of local conformations
Deformation of loops and exposure to protein partner
Relative solvent accessibilities are computed for deformed local loop conformations in the interface compartment in both unbound and disjoint bound conformations, and the difference D between the two accessibilities is calculated. A negative difference indicates a deformation towards a local conformation with higher exposure to the exterior (i.e. towards the partner) while a positive one indicates a tendency for lower exposure. The average differences calculated for surface-letters deformed into surface-letters (mean D = -8.2 ± 22.6%, median = -5.0) and for core-letters deformed into core-letters (mean D = -1.5 ± 18.1%, median = -2.6) are both negative, indicating that complexation globally increases residue exposure to the protein exterior. However, deformation of surface-letters towards surface-letters tends to be associated with higher exposure than deformation towards core-letters (mean D = -4.5 ± 24.7%, median = 1.3). Consistently, deformation of core-letters towards core-letters tends to be associated with lower exposure than deformation towards surface-letters (mean D = -11.4 ± 21.8%, median = -7.7).
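The summary statistics quoted above (mean ± standard deviation and median of D) can be reproduced along the following lines. This is a sketch under stated assumptions: the sign convention D = unbound minus bound is inferred from the statement that a negative D means higher exposure after deformation, the use of the sample standard deviation is our choice, and the accessibility values are invented for illustration.

```python
from statistics import mean, median, stdev

def accessibility_differences(unbound, bound):
    """D per residue, assumed to be unbound minus disjoint-bound accessibility (%)."""
    if len(unbound) != len(bound):
        raise ValueError("unbound and bound lists must have the same length")
    return [u - b for u, b in zip(unbound, bound)]

def summarize(differences):
    """Mean, sample standard deviation and median of the differences."""
    return {
        "mean": mean(differences),
        "sd": stdev(differences),
        "median": median(differences),
    }

# Hypothetical relative accessibilities (%) for three deformed loop positions.
unbound = [30.0, 40.0, 50.0]
bound = [35.0, 50.0, 45.0]
print(summarize(accessibility_differences(unbound, bound)))
```

A mean and a median of opposite sign, as reported for one of the transition classes above, simply indicate a skewed distribution of D, which is why both statistics are worth reporting together.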
Taken together, these results suggest that, since the deformation of loops upon complexation barely modifies their exposure to the protein exterior (transitions mainly between letters sharing the same compartment preference and structural characteristics), most local loop conformations are in an optimized conformation for interaction in the unbound state. More drastic deformations of local conformations occur (transitions between letters of different compartment preference and different structural characteristics) which tend to modify the exposure of the residues towards the protein partner. Transitions from a core-letter to a surface-letter at interface would favor residue interaction between the two partners (increased exterior exposure) while the reverse transitions tend to disfavor it (decreased exterior exposure).
Local conformations are not subject to the same rate of deformation and follow some specific deformation tendencies: i) transitions from one secondary structure type to another are avoided, but deformations within each secondary structure type occur with preferences between pairs or groups of letters ([a]→ [A] for helices, [N]→ [T,M] for strands, [P,H,Y,D,U,F]→ [P,H,Y,D,U,F] and [G,R,O,I,S,E,Q]→ [G,R,O,I,S,E,Q] for loops), ii) deformation preferences between local conformations are not commutative, iii) flanking regions are the most frequently deformed local conformations. These observations are in agreement with . The analysis of the distribution of local conformations in proteins highlights new features, and their deformations are consistent with their compartment preferences. Regarding regular secondary structures, iv) the most deformed local conformations [a,N] correspond to curved conformations which tend to be avoided at interface (in both bound and unbound states), v) the least deformed ones [A,T,M] correspond to straight conformations preferentially distributed in core and vi) the most deformed local conformations tend to be preferentially deformed towards the least deformed ones. Regarding loops, vii) two groups of local conformations emerge where deformation preferentially occurs between local conformations of the same group, viii) these two groups present different compartment preferences, one being preferred in core and the other on surface, ix) deformation from one group to the other is associated with higher variation of protein exterior exposure than deformation between local conformations of the same group.
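Counting letter transitions between the two states is the basic operation behind these tendencies. The sketch below assumes that the unbound and bound structural sequences are already aligned position by position and have equal length, which the text does not spell out; the function name and the example sequences are hypothetical.

```python
from collections import Counter

def letter_transitions(unbound_seq: str, bound_seq: str) -> Counter:
    """Count (unbound letter, bound letter) pairs at positions where the letter changes.

    Assumes the two structural sequences are aligned and of equal length.
    """
    if len(unbound_seq) != len(bound_seq):
        raise ValueError("sequences must be aligned and of equal length")
    return Counter(
        (u, b) for u, b in zip(unbound_seq, bound_seq) if u != b
    )

# Hypothetical aligned structural sequences, for illustration only.
transitions = letter_transitions("aaNPG", "AANHG")
print(transitions)

# Non-commutativity (tendency ii) shows up as unequal forward and reverse counts.
forward = transitions[("a", "A")]
reverse = transitions[("A", "a")]
print(forward, reverse)
```

Keeping the pairs ordered, rather than collapsing (a, A) and (A, a) into one key, is what allows the direction of a deformation to be compared with the compartment preference of its start and end letters.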
Notice that the correlated straightening-out of regular secondary structures on each side of the interface of the complexes has been evaluated through the occurrence difference of regular straight letters [A,T,M] between the unbound and bound states of each subunit of each complex. However, the low number of observations per complex does not allow any firm conclusions to be drawn.
Illustration of deformation captured by the structural alphabet
Example cases of protein-protein interaction are selected from the bound/unbound dataset to illustrate the information that can be derived from the deformation tendencies described above. The first two examples illustrate induced-fit modifications that follow the deformation tendencies, the last four illustrate their violation.
From curved to straight regular secondary structures
All the following example cases illustrate the violation of the deformation tendencies. In these examples, it appears that the observed deformations are associated with structural constraints directly related to the function of the proteins.
From straight to curved helices
The following two examples illustrate the deformation of loops associated with transitions between surface- and core-letters, which are in violation of the deformation tendencies. Residues 18-21 ([PDB:1DE4] chain A), belonging to the α 1 domain loop of the hemochromatosis protein (HFE), are deformed upon interaction with the transferrin receptor (TfR) from a curved conformation (modeled by core-letter [O]) to a straight conformation (modeled by surface-letter [P]) (Figure 6B). This extended conformation of the loop allows the exposure of residues L20 and L22 towards the TfR and in particular the interaction of TfR-helix1 with Leu 22 . This loop plays a crucial role in the interaction of the two proteins: its substitution results in a ~ 10-fold reduction in affinity for TfR . The second example shows the deformation of residues 100-103, forming a loop at the surface of transthyretin, upon complexation with a molecule of retinol-binding protein ([PDB:1RLB] chain A). It corresponds to the transition from a straight (modeled by surface-letter [H]) to a curved conformation (modeled by core-letter [O]). It appears that this deformation is due to residue S100 being pushed towards the protein interior while interacting with the partner, inducing a rotation of P102 (Figure 6C).
From regular to irregular local conformations
The detection of local deformations in the backbone of the proteins by this local approach highlights the importance of considering not only deformations between different secondary structure types but also the conformational variations that occur within them. While deformation tendencies define general features of the induced-fit modification of secondary structures that are consistent with the compartment preference of local conformations, the example cases show more drastic structural modifications that violate the deformation tendencies due to strong structural constraints for functional reasons.
Descriptors of protein interfaces based on amino acid composition and evolution, structural features and complementarity are fundamental to the understanding, prediction and modeling of protein-protein interactions [5, 9, 50–52] and ultimately to protein functions. Recent work on ubiquitin has shown the need for efficient structural descriptors able to characterize local conformations [30, 32]. Here we use the structural alphabet HMM-SA that allows the identification of local variations in secondary structure conformations. Loops can be characterized despite their high plasticity that inhibits their description by global approaches . The straight or curved shape of regular secondary structures can be detected. Our analysis reveals new structural features, regarding the shape and induced-fit deformation of secondary structures, which have not been appreciated before. In particular, variations in the shape of secondary structures have been analysed thanks to the local approach for the different types of complexes and results are shown to be stable between homodimers, heterodimers, obligate and transient complexes. The large-scale analysis of secondary structure changes in proteins from disordered to ordered secondary structure and between different secondary structure types using a global approach has shown the importance of secondary structure modification for protein function . Here we show that conformational modification within secondary structures can be further analyzed and detailed using the local approach. We show that the local conformations associated with the different types of secondary structures are not uniformly distributed within proteins at interface, in the core and on the surface, but show compartment preferences that can be related to structural characteristics. In the light of this new structural description of protein compartments, we revisited the analysis of induced-fit modifications of local conformations proposed in .
The local conformations modeled by the 27 structural letters of HMM-SA are associated with variation in secondary structure conformation. We observed that they present preferential distributions at protein interface, surface and core which affect around 14% of the loop-letters, 23% of the β-letters and 3% of the α-letters. The greatest difference occurs between protein surface and core, where straight local conformations are preferred in core while curved ones are preferred on surface, with the particularity for some of them to be avoided at interface. The proportion of a local conformation at interface is generally intermediate between its proportion on surface and in the core, suggesting that interface scaffolds are formed by secondary structures mixing local conformations preferred on surface with ones preferred in the core. Previous analyses of amino acid composition have led to the description of protein-protein interfaces as regions displaying intermediate properties between those of the hydrophilic protein surface and the hydrophobic protein core [40, 54], with hydrophobic and polar residues organized in a core/rim interface [6, 7]. Local conformations preferentially distributed on the surface tend to be more accessible to solvent at interface than local conformations preferred in the core. This suggests a specific organisation of the local conformations in the binding site (similarly to the amino acids). However, the amino acid composition of the local conformations does not appear to be correlated with their compartment preference; the solvent exposure of residues is more likely to play a role. Moreover, the fact that some local conformations are found to be avoided at interface in both protein bound and unbound states and that local loop conformations are mainly unchanged upon complexation suggests that such organisation is prior to the interaction. Binding sites would be structurally optimized to interact with protein partners. 
This latter remark is supported by a large-scale analysis of protein-protein interfaces performed by a global approach showing that favorable interface structural scaffolds have been re-used and adapted by evolution for diverse functions . To the authors' knowledge, the analysis and results presented here have not been reported before and have been elucidated thanks to the use of a local approach able to describe the conformation of secondary structure elements in more detail than global approaches. These findings should be considered for accurate protein structure reconstruction either based on structural alphabet  or on efficient secondary structure conformation prediction .
The analysis proposed in  has opened the path to an innovative way to analyse structural modifications upon complexation and has highlighted differences between local conformations regarding deformation. By revisiting the induced-fit modifications of local conformations in the light of their compartment preference and structural characteristics, we gain further insight into the deformation properties of local conformations, and of secondary structures to a larger extent, upon protein-protein complex formation. For regular secondary structures, curved conformations (surface preference) tend to be mostly deformed at interface towards straight conformations (core preference), these deformations could be a mechanistic effect of the interaction with the partner leading to a structural adaptive flattening of the interface's surface and a decrease of solvent exposure. For loops, deformation of local conformations appears to be mainly associated with the conservation of the exterior exposure suggesting that loops adopt optimized conformations prior to the interaction. Deformations associated with a modification of the exposure to protein exterior are suggested to favor/unfavor residue interaction with the partner. The low number of this latter type of deformation fits with the fact that only few residues at interface are under strong structural/functional constraints. Interestingly, flanking regions present a different behavior compared to secondary structures being highly deformed. It highlights their important structural adaptive role in the reorganisation of secondary structures between them upon interaction. Induced-fit modification tendencies defined from this analysis should be valuable information to consider for docking tools that aim to consider proteins flexibility [25, 57] since protein deformation can be of critical importance for protein interaction. 
Finally, we present example cases where violations of the induced-fit modification tendencies derived from this analysis are associated with strong structural constraints directly related to the function of the proteins. One example illustrates transitions between local conformations associated with different secondary structure types, which characterize the deformation of a linker and of a neighboring region involved in the open/closed conformation of the protein. More globally, transitions between different secondary structure types have been shown to play an important role in protein function [58–60] and are observed in a variety of proteins . The ability to finely detect and characterize such transitions is therefore an important point of this study. Another example of a violation of the induced-fit modification tendencies is the deformation from straight to curved α-helices involved in the inhibitory conformation of a protein. The detection of such subtle deformations by the local approach highlights the importance of considering not only deformations between different secondary structure types but also the conformational variations that occur within them. Such considerations should allow a better understanding of the role of secondary structures in the functional mechanism of proteins.
Datasets of protein-protein complexes
Among the 8205 complexes with different interface scaffolds described in , we select a set of 1496 two-chain protein complexes (1283 PDB entries) that present i) structure resolution below 2.5 Å, ii) R-factor below 0.3 and iii) at least three other two-chain protein complexes in the PDB that share the same structural scaffold at the interface. This dataset is constructed to avoid biases owing to similar interface scaffolds between the proteins of the dataset.
Homo/heterodimers, transient/obligate complexes
Four other datasets previously described in the literature are used here to distinguish among the different types of protein-protein complexes. These are denoted the Homodimers (93 complexes ), Heterodimers (203 complexes ), Transient and Obligate (70 and 96 complexes respectively ) datasets. 49% (respectively 17%) of the PDB entries in the transient complexes (respectively heterocomplexes) dataset are shared with the heterocomplexes (respectively transient complexes) dataset, whereas the homodimers and obligate complexes datasets share less than 5% of their PDB entries.
Two additional datasets extracted from version 2.4 of the benchmark proposed in [63, 64] are used: 84 crystallographic structures of transient complexes (bound state) and the corresponding structures of the free proteins (unbound state).
Definition of protein compartments: Interface, surface and core
Proteins are divided into three compartments: interface, surface and core. Residues are assigned to one of the three compartments according to their relative solvent accessibility (in percent) in the disjoint bound conformation (noted A_chain), in the two-chain complex forming the interface of interest (noted A_interf) and in the higher complex considering all chains described in the PDB entry (noted A_complex). Core residues correspond to residues r that are buried in the core of the protein () and whose relative solvent accessibility is not modified when the chain is associated with the other chains of the complex (). These residues constitute the core compartment of proteins. Surface residues correspond to residues r that are exposed at the protein surface () and that do not display solvent accessibility variation in the stand-alone chain compared to the higher complex (). These residues constitute the surface compartment. Interface residues correspond to residues r that are exposed at the protein surface () and whose relative solvent accessibility is modified when the two chains forming the interface of interest are associated (). These residues constitute the interface compartment. Residues that do not fit one of these three definitions are denoted undefined and are not considered for the analysis since they cannot be assigned to a compartment. The definition of interface compartments in this work aims to take into account residues affected by the binding of the partner rather than only those which interact with it. This choice is based on previous studies which argued that interaction of protein partners may not only be due to specific interaction of residues but also to non-partner specific structural features surrounding the interacting residues (favorable interface scaffolds , convergent local structural motifs ).
Therefore, similarly to  where the interface definition also considers residues neighboring the interacting ones since they provide the interface scaffold, we define as interfacial residues those with a 1% solvent accessibility change upon interaction, in order to broadly include the residues of the secondary structures forming the interface scaffold.
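The compartment rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_compartment` and its parameters are hypothetical, the buried/exposed cutoff is not given in this section and must therefore be supplied by the caller, and the 1% change criterion stated for the interface is assumed to apply to the comparison with the higher complex as well.

```python
def assign_compartment(a_chain, a_interf, a_complex,
                       exposure_cutoff, change_cutoff=1.0):
    """Assign a residue to 'interface', 'surface', 'core' or 'undefined'.

    a_chain, a_interf, a_complex: relative solvent accessibility (%) of the
    residue in the stand-alone chain, the two-chain complex and the full
    complex of the PDB entry.
    exposure_cutoff: accessibility (%) separating buried from exposed
    residues. Hypothetical parameter: the value is not stated here.
    change_cutoff: minimal accessibility loss (%) counted as a modification;
    1.0 follows the 1% criterion quoted for interface residues (assumed to
    hold for the full-complex comparison too).
    """
    exposed = a_chain >= exposure_cutoff
    changed_by_partner = (a_chain - a_interf) >= change_cutoff
    changed_by_complex = (a_chain - a_complex) >= change_cutoff

    if exposed and changed_by_partner:
        return "interface"
    if exposed and not changed_by_complex:
        return "surface"
    if not exposed and not changed_by_complex:
        return "core"
    # Residues matching none of the three definitions are left out.
    return "undefined"
```

For instance, with an arbitrary cutoff of 25%, a residue at 40% accessibility in the free chain and 30% in the two-chain complex would be labelled as interface, while a residue whose accessibility only drops in the higher complex would stay undefined.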
Residues and structural letters
The 3D structures are described as series of overlapping four-residue fragments, each modeled by a structural letter. A residue r is therefore associated with four different fragments L1, ..., L4, where L1 corresponds to the four successive residues r - 3 → r and L4 to the four successive residues r → r + 3. Since each four-residue fragment is associated with a structural letter describing its conformation, a protein structure of N residues is encoded as a sequence of N - 3 structural letters. The physico-chemical characteristics and the compartment assignment of the structural letter encoding the fragment r - 2 → r + 1 are determined according to the properties of the residue r as in .
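The indexing described above can be made concrete with a short sketch. The function names are hypothetical and residues are numbered from 1 to N; the code only enumerates fragment windows and does not assign structural letters.

```python
def fragments(n_residues):
    """All overlapping four-residue windows (start, end), 1-based inclusive.

    A chain of N residues yields N - 3 windows, one per structural letter.
    """
    return [(start, start + 3) for start in range(1, n_residues - 2)]


def fragments_covering(r, n_residues):
    """Windows that contain residue r, from r-3 -> r up to r -> r+3.

    Residues near the chain ends belong to fewer than four windows.
    """
    return [(s, s + 3) for s in range(r - 3, r + 1)
            if s >= 1 and s + 3 <= n_residues]


def property_fragment(r, n_residues):
    """Window r-2 -> r+1, whose letter takes the properties of residue r.

    Returns None when that window does not fit inside the chain.
    """
    start = r - 2
    if start >= 1 and start + 3 <= n_residues:
        return (start, start + 3)
    return None
```

For a 10-residue chain this gives 7 windows, and residue 5 lies in the windows starting at residues 2, 3, 4 and 5.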
Qualitative statistical analysis
Multiple Correspondence Analysis
Multiple Correspondence Analysis (MCA) is a qualitative multivariate method used here for the 2D representation of the structural letters' occurrence in each of the three protein compartments . The graphical display of the MCA allows the qualitative analysis of the structural letters' preference for the interface, surface or core compartments of proteins.
Principal Component Analysis
Principal Component Analysis (PCA) is a multivariate method used here for the representation of the structural descriptors of the structural letters. The PCA transforms the variables into a smaller number of uncorrelated variables (principal components) .
Quantitative statistical analysis
where cp is a compartment, sl is a given structural letter, ss is the set of letters of the same secondary structure type as sl, p_sl,cp is the frequency of sl in compartment cp (i.e. the occurrence of sl in cp over N_sl, the occurrence of sl in the three compartments) and p_ss,cp is the frequency of ss in compartment cp (i.e. the occurrence of ss in cp over the occurrence of ss in the three compartments). The KLd values can be assessed by a χ2 test, since the quantity 2N_sl × KLd(sl) (denoted KLd quantities) follows a χ2 distribution.
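The defining equation is not reproduced in this section. A minimal sketch, assuming that KLd has the standard Kullback-Leibler form KLd(sl) = Σ_cp p_sl,cp × ln(p_sl,cp / p_ss,cp), is given below. The function names are hypothetical, and the p-value assumes 2 degrees of freedom (three compartments minus one), which is not stated in the text.

```python
import math


def kld(letter_counts, class_counts):
    """Kullback-Leibler divergence of a letter's compartment distribution
    from that of its secondary structure class (assumed standard form).

    letter_counts: dict compartment -> occurrence of the letter sl
    class_counts:  dict compartment -> occurrence of the letter set ss
    """
    n_sl = sum(letter_counts.values())
    n_ss = sum(class_counts.values())
    total = 0.0
    for cp, count in letter_counts.items():
        if count == 0:
            continue  # a zero frequency contributes nothing to the sum
        p_sl = count / n_sl
        p_ss = class_counts[cp] / n_ss
        total += p_sl * math.log(p_sl / p_ss)
    return total


def kld_quantity(letter_counts, class_counts):
    """The quantity 2 * N_sl * KLd(sl) compared with a chi-square law."""
    n_sl = sum(letter_counts.values())
    return 2 * n_sl * kld(letter_counts, class_counts)


def p_value_two_df(statistic):
    """Chi-square upper tail for 2 degrees of freedom: exp(-x / 2).

    Assumption: 2 degrees of freedom (three compartments minus one).
    """
    return math.exp(-statistic / 2)
```

A letter distributed over the compartments exactly like its secondary structure class gives KLd = 0; the more its distribution departs from that of the class, the larger the KLd quantity and the smaller the p-value.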
where sl is a given structural letter, the observed term is the occurrence of sl in compartment cp1, and the expected term is the occurrence of sl expected in compartment cp1 if the distributions in cp1 and cp2 were similar. The expected occurrence equals Ncp1(sl) × fcp2(sl), where Ncp1(sl) is the occurrence of sl in cp1 and fcp2(sl) the relative frequency of sl in cp2. The expected occurrence has to be > 5 for the Z-score to be statistically meaningful. A Bonferroni correction is applied to each test to determine the significance threshold T: Zcp1/cp2(sl) > T indicates a significant preference of sl for compartment cp1, and Zcp1/cp2(sl) < -T indicates a significant preference for cp2.
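The Z-score equation itself is not reproduced in this section. The sketch below is therefore only one plausible reading: it assumes the common form Z = (observed - expected) / sqrt(expected), a two-sided test and an overall risk of 5% for the Bonferroni threshold. The function names and the parameters `n_cp1`, `f_cp2` and `alpha` are hypothetical; `n_cp1` and `f_cp2` stand for the quantities written Ncp1(sl) and fcp2(sl) in the text.

```python
import math
from statistics import NormalDist


def z_score(observed, n_cp1, f_cp2):
    """Compare the observed occurrence in cp1 with the one expected from cp2.

    Assumed form: (observed - expected) / sqrt(expected), with
    expected = n_cp1 * f_cp2. Returns None when the expected occurrence
    is not above 5, in which case the score is not meaningful.
    """
    expected = n_cp1 * f_cp2
    if expected <= 5:
        return None
    return (observed - expected) / math.sqrt(expected)


def bonferroni_threshold(n_tests, alpha=0.05):
    """Two-sided normal threshold T after Bonferroni correction.

    Assumption: overall risk alpha shared equally among n_tests tests.
    """
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))


def preference(z, threshold):
    """Interpret a Z-score: 'cp1', 'cp2' or None if not significant."""
    if z is None:
        return None
    if z > threshold:
        return "cp1"
    if z < -threshold:
        return "cp2"
    return None
```

With a single test the threshold is close to 1.96; it grows as more letters are tested, so that fewer preferences are declared significant.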
Relative solvent accessibility calculation
Relative solvent accessibilities of residues are calculated using NACCESS 2.1.1  with a probe size of 1.4 Å. Relative accessibilities are calculated for each residue in a protein by expressing the summed residue accessible surfaces as a percentage of that observed in an ALA-X-ALA tripeptide in extended conformation, built using the QUANTA molecular graphics package.
Quantification of structural letters deformation at interface
where P_interf(sl1, sl2) is calculated for letters at the protein interface and P_surf(sl1, sl2) for letters at the protein surface. The idea here is that deformations which differ the most between interface and surface (ΔP(sl1, sl2) >> 0) are more likely to be induced by the interaction.
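As an illustration, the comparison can be sketched as follows, assuming that ΔP(sl1, sl2) is the difference P_interf(sl1, sl2) - P_surf(sl1, sl2) and that P(sl1, sl2) is estimated as the fraction of positions with unbound letter sl1 whose bound letter is sl2. Both assumptions, the function names and the toy letters "A" and "B" are hypothetical and not taken from the source.

```python
from collections import Counter


def transition_probabilities(pairs):
    """Estimate P(sl1, sl2) from (unbound_letter, bound_letter) pairs.

    Assumption: P(sl1, sl2) = count(sl1 -> sl2) / count(sl1).
    """
    pairs = list(pairs)
    pair_counts = Counter(pairs)
    start_counts = Counter(sl1 for sl1, _ in pairs)
    return {pair: count / start_counts[pair[0]]
            for pair, count in pair_counts.items()}


def delta_p(interface_pairs, surface_pairs):
    """Difference P_interf - P_surf for every observed transition."""
    p_interf = transition_probabilities(interface_pairs)
    p_surf = transition_probabilities(surface_pairs)
    transitions = set(p_interf) | set(p_surf)
    return {t: p_interf.get(t, 0.0) - p_surf.get(t, 0.0)
            for t in transitions}


# Toy data: letter "A" changes to "B" more often at the interface.
interface_pairs = [("A", "B"), ("A", "B"), ("A", "A"), ("A", "A")]
surface_pairs = [("A", "A")] * 9 + [("A", "B")]
differences = delta_p(interface_pairs, surface_pairs)
```

In this toy case the transition from "A" to "B" gets a clearly positive ΔP, which is the kind of deformation the text attributes to the interaction rather than to intrinsic flexibility.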
We are grateful to University Denis Diderot-Paris7 for awarding a fellowship to JB. Thanks to Leslie Regad for discussion on the paper. We are grateful to Michael Sadowski for carefully reading the manuscript.
- Ofran Y, Rost B: Analysing six types of protein-protein interfaces. J Mol Biol 2003, 325: 377–387. 10.1016/S0022-2836(02)01223-8
- Lo Conte L, Chothia C, Janin J: The atomic structure of protein-protein recognition sites. J Mol Biol 1999, 285: 2177–2198. 10.1006/jmbi.1998.2439
- Glaser F, Steinberg D, Vakser I, Ben-Tal N: Residue frequencies and pairing preference at protein-protein interfaces. Proteins 2001, 43: 89–102. 10.1002/1097-0134(20010501)43:2<89::AID-PROT1021>3.0.CO;2-H
- Res I, Lichtarge O: Character and evolution of protein-protein interfaces. Phys Biol 2005, 2: S36-S43. 10.1088/1478-3975/2/2/S04
- Guharoy M, Chakrabarti P: Conserved residue clusters at protein-protein interfaces and their use in binding site identification. BMC Bioinformatics 2010, 11: 286. 10.1186/1471-2105-11-286
- Chakrabarti P, Janin J: Dissecting protein-protein recognition sites. Proteins 2002, 15: 334–343. 10.1002/prot.10085
- Bahadur R, Chakrabarti P, Rodier F, Janin J: Dissecting subunit interfaces in homodimeric proteins. Proteins 2003, 53: 708–719. 10.1002/prot.10461
- Neuvirth H, Raz R, Schreiber G: ProMate: a structure based prediction program to identify the location of protein-protein binding site. J Mol Biol 2004, 338: 181–199. 10.1016/j.jmb.2004.02.040
- Hoskins J, Lovell S, Blundell T: An algorithm for predicting interaction sites: abnormally exposed amino acid residues and secondary structure elements. Protein Sci 2006, 5: 1017–1029. 10.1110/ps.051589106
- Guharoy M, Chakrabarti P: Secondary structures based analysis and classification of biological interfaces: identification of binding motifs in protein-protein interactions. Bioinformatics 2007, 23: 1909–1918. 10.1093/bioinformatics/btm274
- Betts M, Sternberg M: An analysis of conformational changes on protein-protein association: implications for predictive docking. Protein Eng 1999, 12: 271–283. 10.1093/protein/12.4.271
- Smith G, Sternberg M, Bates P: The relationship between the flexibility of proteins and their conformational states on forming protein-protein complexes with an application to protein-protein docking. J Mol Biol 2005, 347: 1077–1101. 10.1016/j.jmb.2005.01.058
- Yogurtcu O, Erdemli S, Nussinov R, Turkay M, Keskin O: Restricted mobility of conserved residues in protein-protein interfaces in molecular simulations. Biophys J 2008, 94: 3475–3485. 10.1529/biophysj.107.114835
- Valdar W, Thornton J: Conservation helps to identify biologically relevant crystal contacts. J Mol Biol 2001, 313: 399–416. 10.1006/jmbi.2001.5034
- Mintseris J, Weng Z: Atomic contacts vectors in protein-protein recognition. Proteins 2003, 53: 629–639. 10.1002/prot.10432
- Jefferson E, Walsh T, Barton G: Biological units and their effects upon the properties and prediction of protein-protein interactions. J Mol Biol 2006, 364: 1118–1129. 10.1016/j.jmb.2006.09.042
- De S, Krishnadev O, Srinivasan N, Rekha N: Interaction preferences across protein-protein interfaces of obligatory and non-obligatory components are different. BMC Struct Biol 2005, 16: 15. 10.1186/1472-6807-5-15
- Zhanhua C, Gah-Kok Gan J, Lei L, Sakharkar M, Kangueane P: Protein subunit interfaces: heterodimers versus homodimers. Bioinformation 2005, 2: 28–39.
- Mintseris J, Weng Z: Structure, function and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci 2005, 102: 10930–10935. 10.1073/pnas.0502667102
- Vacic V, Uversky V, Dunker A, Lonardi S: Composition Profiler: a tool for discovery and visualization of amino acid composition difference. BMC Bioinformatics 2007, 8: 211. 10.1186/1471-2105-8-211
- Jones S, Thornton J: Protein-protein interactions: a review of protein dimer structures. Prog Biophys Molec Biol 1995, 63: 31–65. 10.1016/0079-6107(94)00008-W
- Argos P: An investigation of protein subunit and domain interfaces. Protein Eng 1998, 2: 101–113. 10.1093/protein/2.2.101
- Miller S: The structure of interfaces between subunits of dimeric and tetrameric proteins. Protein Eng 1989, 3: 77–83. 10.1093/protein/3.2.77
- Keskin O, Nussinov R: Favorable scaffolds: proteins with different sequence, structure and function may associate in similar ways. PEDS 2005, 18: 11–24.
- May A, Zacharias M: Accounting for global protein deformability during protein-protein and protein-ligand docking. Biochim Biophys Acta 2005, 30: 225–231.
- Koshland D: Application of a theory of enzyme specificity to protein synthesis. Proc Natl Acad Sci 1958, 44: 98–104. 10.1073/pnas.44.2.98
- Tsai C, Kumar S, Ma B, Nussinov R: Folding funnels, binding funnels and protein function. Protein Sci 1999, 8: 1181–1190. 10.1110/ps.8.6.1181
- Daily M, Gray J: Local motions in a benchmark of allosteric proteins. Proteins 2007, 67: 385–399. 10.1002/prot.21300
- Goh CS, Milburn D, Gerstein M: Conformational changes associated with protein-protein interactions. Curr Op Struct Biol 2004, 14: 104–109. 10.1016/j.sbi.2004.01.005
- Wlodarski T, Zagrovic B: Conformational selection and induced fit mechanism underlie specificity in non-covalent interactions with ubiquitin. Proc Natl Acad Sci 2009, 106: 19346–19351. 10.1073/pnas.0906966106
- Gutteridge A, Thornton J: Conformational changes observed in enzyme crystal structures upon substrate binding. J Mol Biol 2005, 346: 21–28. 10.1016/j.jmb.2004.11.013
- Perica T, Chothia C: Ubiquitin - molecular dynamics for recognition of different structures. Curr Op Struct Bio 2010, 20: 367–376. 10.1016/j.sbi.2010.03.007
- Dan A, Ofran Y, Kliger Y: Large-scale analysis of secondary structure changes in proteins suggests a role for disorder-to-order transitions in nucleotide binding proteins. Proteins 2009, 78: 236–248. 10.1002/prot.22531
- Martin J, Regad L, Lecornet H, Camproux A: Structural deformation upon protein-protein interaction: a structural alphabet approach. BMC Struct Biol 2008, 18: 12. 10.1186/1472-6807-8-12
- Kumar S, Bansal M: Geometrical and sequence characteristics of alpha-helices in globular proteins. Biophys J 1998, 75: 1935–1944. 10.1016/S0006-3495(98)77634-9
- Camproux A, Gauthier R, Tuffery P: A hidden Markov model derived structural alphabet for proteins. J Mol Biol 2004, 339: 591–605. 10.1016/j.jmb.2004.04.005
- Camproux A, Tuffery P: Hidden Markov Model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity. Biochim Biophys Acta 2005, 1724: 394–403.
- Regad L, Martin J, Camproux A: Identification of non-random motifs in loops using a structural alphabet. IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology 2006, 1–9.
- Miller S, Janin J, Lesk A, Chothia C: Interior and surface of monomeric proteins. J Mol Biol 1987, 196: 641–656. 10.1016/0022-2836(87)90038-6
- Jones S, Thornton J: Principles of protein-protein interactions. Proc Natl Acad Sci 1996, 93: 13–20. 10.1073/pnas.93.1.13
- Pal A, Chakrabarti P, Bahadur R, Rodier F, Janin J: Peptide segments in protein-protein interfaces. J Biosci 2007, 32: 101–111. 10.1007/s12038-007-0010-7
- Bogan A, Thorn K: Anatomy of hot spots in protein interfaces. J Mol Biol 1998, 280: 1–9. 10.1006/jmbi.1998.1843
- Kawabata T: MATRAS: a program for protein 3D structure comparison. Nuc Ac Res 2003, 31: 3367–3369. 10.1093/nar/gkg581
- Huse M, Chen YG, Massague J, Kuriyan J: Crystal structure of the cytoplasmic domain of the type I TGF-beta receptor in complex with FKBP12. Cell 1999, 96: 425–436. 10.1016/S0092-8674(00)80555-3
- Huse M, Muir T, Chen YG, Kuriyan J, Massague J: The TGF-beta receptor activation process: an inhibitor- to substrate-binding switch. Molecular Cell 2001, 8: 671–682. 10.1016/S1097-2765(01)00332-X
- Bennett M, Lebron J, Bjorkman P: Crystal structure of the hereditary haemochromatosis protein HFE complexed with transferrin receptor. Nature 2000, 403: 46–53. 10.1038/47417
- Lebron J, Bjorkman P: The transferrin receptor binding site on HFE, the class I MHC-related protein mutated in hereditary hemochromatosis. J Mol Biol 1999, 289: 1109–1118. 10.1006/jmbi.1999.2842
- Pike A, Brzozowski A, Roberts S, Olsen O, Persson E: Structure of human factor VIIa and its implications for the triggering of blood coagulation. Proc Natl Acad Sci 1999, 96: 8925–8930. 10.1073/pnas.96.16.8925
- Zhang E, Charles RS, Tulinsky A: Structure of extracellular tissue factor complexed with factor VIIa inhibited with a BTPi mutant. J Mol Biol 1999, 285: 2089–2104. 10.1006/jmbi.1998.2452
- Ban Y, Edelsbrunner H, Rudolph J: Interface surfaces for protein-protein complexes. J ACM 2006, 53: 361–378. 10.1145/1147954.1147957
- Darnell S, Page D, Mitchell J: An automated decision-tree approach to predicting protein interaction hot spots. Proteins 2007, 68: 813–823. 10.1002/prot.21474
- Yu J, Guo M: Prediction of protein-protein interactions from secondary structures in binding motifs using the statistic method. In Proceedings of the 2008 Fourth International Conference on Natural Computation 2008.
- Regad L, Martin J, Nuel G, Camproux A: Mining protein loops using a structural alphabet and statistical exceptionality. BMC Bioinformatics 2010, 11: 75. 10.1186/1471-2105-11-75
- Korn A, Burnett R: Distribution and complementarity of hydropathy in multisubunit proteins. Proteins 1991, 9: 37–55. 10.1002/prot.340090106
- Tuffery P, Guyon F, Derreumaux P: Improved greedy algorithm for protein structure reconstruction. J Comput Chem 2005, 26: 506–513. 10.1002/jcc.20181
- Podtelezhnikov AD, Wild D: Reconstruction and stability of secondary structure elements in the context of protein structure prediction. Biophys J 2009, 96: 4399–4408. 10.1016/j.bpj.2009.02.057
- B-Rao C, Subramanian J, Sharma S: Managing protein flexibility in docking and its applications. Drug Discovery Today 2009, 14: 394–400. 10.1016/j.drudis.2009.01.003
- Kim Y, Rose C, Liu Y, Ozaki Y, Datta G, Tu A: FT-IR and near-infrared FT-Raman studies of the secondary structure of insulinotropin in the solid state: alpha-helix to beta-sheet conversion induced by phenol and/or by high shear force. J Pharm Sci 1994, 83: 1175–1180. 10.1002/jps.2600830819
- Jiao W, Qian M, Li P, Zhao L, Chang Z: The essential role of the flexible termini in the temperature-responsiveness of the oligomeric state and chaperone-like activity for the polydisperse small heat shock protein IbpB from Escherichia coli. J Mol Biol 2005, 347: 871–884. 10.1016/j.jmb.2005.01.029
- Guo J, Jaromczyk J, Xu Y: Analysis of chameleon sequences and their implications in biological processes. Proteins 2007, 67: 548–558. 10.1002/prot.21285
- Tuncbag N, Gursoy A, Guney E, Nussinov R, Keskin O: Architectures and functional coverage of protein-protein interfaces. J Mol Biol 2008, 381: 785–802. 10.1016/j.jmb.2008.04.071
- Teyra J, Pisabarro M: Characterization of interfacial solvent in protein complexes and contributions of wet spots to the interface description. Proteins 2007, 67: 1087–1095. 10.1002/prot.21394
- Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, Weng Z: Protein-protein docking benchmark 2.0: an update. Proteins 2005, 60: 214–216. 10.1002/prot.20560
- Hwang H, Pierce B, Mintseris J, Janin J, Weng Z: Protein-protein docking benchmark version 3.0. Proteins 2008, 73: 705–709. 10.1002/prot.22106
- Le Roux B, Rouanet H: Geometric Data Analysis, From Correspondence Analysis to Structured Data Analysis. Dordrecht: Kluwer; 2004.
- Jolliffe I: Principal Component Analysis, Springer Series in Statistics. 2nd edition. New York: Springer; 2002.
- Hubbard SJ, Thornton JM: NACCESS. Tech. rep., Computer Program, Department of Biochemistry and Molecular Biology, University College London 1993. [http://www.bioinf.manchester.ac.uk/naccess/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.