Exploring allosteric coupling in the α-subunit of Heterotrimeric G proteins using evolutionary and ensemble-based approaches

Background
Allosteric coupling, which can be defined as the propagation of a perturbation at one region of a protein molecule (such as ligand binding) to distant sites in the same molecule, constitutes the most general mechanism of regulation of protein function. However, unlike the molecular details of ligand binding, the structural elements involved in allosteric effects are difficult to identify. Here, we identified allosteric linkages in the α-subunits of heterotrimeric G proteins, which evolved to transmit membrane receptor signals by allosteric mechanisms, using two approaches that rely on fundamentally different and independent information.

Results
We analyzed: 1) correlated mutations in the family of G protein α-subunits, and 2) cooperativity of the native state ensemble of Gαi1 or transducin. The combination of these approaches not only recovered already-known details, such as the switch regions that change conformation upon nucleotide exchange and the regions involved in receptor, effector or Gβγ interactions (indicating that the predictions of the analyses can be viewed with a measure of confidence), but also predicted new sites that are potentially involved in allosteric communication in the Gα protein. A summary of the new sites found in the present analysis, which were not apparent in crystallographic data, is given along with known functional and structural information. Implications of the results are discussed.

Conclusion
A set of residues and/or structural elements that are potentially involved in allosteric communication in Gα is presented. This information can be used as a guide to structural, spectroscopic, mutational, and theoretical studies on the allosteric network in Gα proteins, which will provide a better understanding of G protein-mediated signal transduction.

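\[
\Delta\Delta G_{i,j}^{stat} = kT^{*} \sqrt{\sum_{x} \left( \ln\frac{P_{i|\delta j}^{x}}{P_{MSA|\delta j}^{x}} - \ln\frac{P_{i}^{x}}{P_{MSA}^{x}} \right)^{2}} \tag{1}
\]

where the site indicated by i is represented by a 20-element vector of binomial probabilities $P_{i}^{x}$, calculated from the observed counts for each amino acid (x = 1..20) and their mean frequencies in all proteins. kT* is an arbitrary energy unit. The subscript |δj indicates that the relevant probabilities are calculated after perturbing site j, and the subscript MSA signifies a hypothetical site where the 20 amino acids occur with their mean frequencies for a given number of samples. The relevant binomial probabilities are explicitly given as follows:

\[
P_{i}^{x} = \frac{n!}{k_x!\,(n-k_x)!}\, q_x^{k_x} (1-q_x)^{n-k_x}, \qquad
P_{i|\delta j}^{x} = \frac{n'!}{k'_x!\,(n'-k'_x)!}\, q_x^{k'_x} (1-q_x)^{n'-k'_x}
\]

with $P_{MSA}^{x}$ and $P_{MSA|\delta j}^{x}$ obtained from the same expressions by setting $k_x = n q_x$ and $k'_x = n' q_x$, respectively. Here q_x is the mean relative frequency of the amino acid of kind x in all proteins (given in table 2), k_x is the observed number of occurrences of the relevant amino acid at site i, n is the total number of samples at that site in the MSA, and primes indicate the numbers observed in the "perturbed" subset of the MSA (selected samples that contain a specific amino acid at site j).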
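The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original implementation: it takes eq. 1 to be kT* times the square root of the summed squared differences between the perturbed and unperturbed log probability ratios, it evaluates the hypothetical MSA site at k_x = n q_x (using the gamma function so that non-integer counts are allowed), and the four-letter alphabet, its frequencies and the counts are hypothetical toy values rather than the entries of table 2.

```python
import math

def log_binom(n, k, q):
    """Natural log of the binomial probability of k occurrences in n samples
    when the mean frequency is q (lgamma allows non-integer k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1 - q))

def ddg_stat(counts, counts_perturbed, q, kT=1.0):
    """Energy-like coupling score of eq. 1 for one site.

    counts           -- amino acid -> count at site i in the full MSA
    counts_perturbed -- amino acid -> count at site i in the perturbed subset
    q                -- amino acid -> mean frequency (assumed, 0 < q < 1)
    """
    n = sum(counts.values())
    n_p = sum(counts_perturbed.values())
    total = 0.0
    for x, qx in q.items():
        # ln(P_i / P_MSA) in the full alignment
        full = log_binom(n, counts.get(x, 0), qx) - log_binom(n, n * qx, qx)
        # ln(P_i|dj / P_MSA|dj) in the perturbed subset
        pert = (log_binom(n_p, counts_perturbed.get(x, 0), qx)
                - log_binom(n_p, n_p * qx, qx))
        total += (pert - full) ** 2
    return kT * math.sqrt(total)

# Hypothetical toy alphabet and counts, for illustration only.
q_toy = {"L": 0.25, "R": 0.25, "T": 0.25, "W": 0.25}
full_counts = {"L": 40, "R": 30, "T": 20, "W": 10}
subset_counts = {"L": 18, "R": 1, "T": 1, "W": 0}
score = ddg_stat(full_counts, subset_counts, q_toy)
```

Passing the same counts for the full and the perturbed set gives a score of zero, which is the only case in which the perturbation leaves both the distribution and the sample size unchanged.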
The expression given in eq. 1 hardly measures the statistical dependence of two sites, for the following reasons. The binomial probabilities used in eq. 1 are sensitive to sample size, and this sensitivity increases steeply when the observed amino acid frequencies deviate from their mean frequencies, which is generally the case in a given family of proteins. Note that the sample size inevitably decreases after perturbation, and the resulting effect cannot be compensated by the normalizing terms in the denominators of eq. 1. This behavior of the binomial probabilities dominates the analysis and results in artificially high scores for conserved sites, where the deviation of amino acid frequencies from their mean values tends to be very high. Thus, ∆∆G i,j stat is expected to be highly correlated with ∆G i stat, which is another energy-like measure proposed by Lockless and Ranganathan for the degree of conservation of site i. The artifactual nature of ∆∆G is shown in figure 1 by analyzing the same MSA for G protein-coupled receptors used by Süer et al. 2 (available from http:\\www\ghf\ghf.d). In figure 1, the mean ∆∆G value for each "affected" site is plotted against the ∆G value of that site. It is obvious that the expected degree of coupling of a given site to all other sites can be predicted with 99% certainty simply by looking at the conservation status of the relevant site. Therefore, eq. 1 hardly provides information about statistical coupling in general. This behavior of SCA has also been discussed by Dekker et al. 3.
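The sample-size effect described above can be made concrete with a small numerical sketch. It assumes the binomial form of the probabilities and evaluates the hypothetical MSA site at k = n q; the mean frequency 0.09 and the observed frequencies 0.9 and 0.1 are illustrative values chosen here, not data from the paper. The relative frequency at the site is held fixed while only the number of samples shrinks, as it would after a perturbation.

```python
import math

def log_binom(n, k, q):
    """Natural log of the binomial probability of k occurrences in n samples
    when the mean frequency is q (lgamma allows non-integer k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1 - q))

def log_ratio(n, freq, q):
    """ln(P_i / P_MSA) for an amino acid seen at relative frequency freq
    among n samples, against a mean frequency q."""
    return log_binom(n, n * freq, q) - log_binom(n, n * q, q)

Q_MEAN = 0.09  # hypothetical mean frequency of the amino acid

# Same relative frequencies, different sample sizes (1000 -> 200 samples).
shift_conserved = abs(log_ratio(1000, 0.9, Q_MEAN) - log_ratio(200, 0.9, Q_MEAN))
shift_near_mean = abs(log_ratio(1000, 0.1, Q_MEAN) - log_ratio(200, 0.1, Q_MEAN))

print("shift for a conserved site:", shift_conserved)
print("shift for a near-mean site:", shift_near_mean)
```

Although the amino acid distribution at the site is identical before and after the subsampling, the log ratio of the conserved site shifts by orders of magnitude more than that of the near-mean site, which is the behavior that inflates the scores of conserved sites.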
Dependence of the amino acid distributions at two sites is actually a prerequisite for a perturbation experiment to yield a positive result. Hence, a reasonable alternative to the perturbation strategy described above is to assess directly the dependence of the amino acid distributions at the pair of sites, as we did in the accompanying paper. In the following, we show that the dependence measure we propose can recover the relevant information embedded in an MSA, whereas the one suggested by Lockless and Ranganathan 1 fails to do so. To demonstrate this, we constructed a theoretical MSA consisting of 15 sites with 1000 samples of peptides. Table 1 and figure 2A show the structure of the simulated MSA. Results of the analysis are summarized in figure 2B-2E. The following points are apparent in figure 2: 1) The structure given in table 1 was fully recovered when Nχ² was used as a measure of coupling between sites (figure 2E). 2) The sites (1,10,11), (7,13), (8,14), (9,15) and (3,4,5,6) form almost equally distant coupling groups, while (2,12), being identical, form a separate group in the cluster analysis of Nχ² values, which is consistent with the simulated structure. 3) Couplings, and thus their clustering, are asymmetric in the case of ∆∆G (compare the group structure in columns and rows in figure 2D). 4) ∆∆G values tend to be high as the affected sites become conserved (compare the bar graphs with the coupling matrix in figure 2B). For example, perturbation of site 1 results in high coupling with the conserved sites (5,6,7,8,9,13,14,15), whereas its coupling to sites 10 and 11, to which it is actually coupled, is low. 5) As a result, the overall picture with ∆∆G is inconsistent with the simulated coupling structure (compare figure 2A and 2D).
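The idea of assessing dependence directly can be illustrated with a plain Pearson chi-square statistic computed on the contingency table of two alignment columns. This is a sketch under assumptions and not the Nχ² measure of the accompanying paper: no normalization is applied, amino acids are drawn uniformly rather than with their mean frequencies in all proteins, and the three simulated columns only mimic the rules for sites 1, 3 and 10 described in the text.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def chi_square(col_a, col_b):
    """Pearson chi-square statistic for the contingency table of two columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    stat = 0.0
    for a, na in count_a.items():
        for b, nb in count_b.items():
            expected = na * nb / n          # count expected under independence
            observed = count_ab.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_samples = 1000

# Site 1 and site 3: unconserved, drawn independently (uniform draw is an assumption).
site1 = [rng.choice(AMINO_ACIDS) for _ in range(n_samples)]
site3 = [rng.choice(AMINO_ACIDS) for _ in range(n_samples)]
# Site 10: L if site 1 is L, random otherwise (weak coupling).
site10 = ["L" if a == "L" else rng.choice(AMINO_ACIDS) for a in site1]

coupled = chi_square(site1, site10)
uncoupled = chi_square(site1, site3)
```

In this setting the coupled pair scores clearly higher than the uncoupled pair, and swapping the two columns leaves the statistic unchanged, in contrast to the asymmetry noted for ∆∆G.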

Table 1 List of the properties of 15 sites of a simulated MSA of 1000 samples.
In the simulation, two sites were fully conserved (5, 6), six sites were moderately conserved and coupled (7, 8, 9 to 13, 14, 15, respectively), two sites were unconserved but fully coupled (2 and 12), three sites were unconserved and weakly coupled (1, 10, 11), and two sites were unconserved and uncoupled (3, 4). Unless indicated otherwise, amino acids were assigned randomly to each site according to their mean frequencies in all proteins. The amino acid frequencies used in the calculation of ∆∆G are given in table 2.

Site  Property
7     Random with 50% L, 25% R, 25% T
8     Random with 60% R, 20% I, 20% P
9     Random with 60% W, 20% N, 20% D
10    L if site 1 is L, random otherwise
11    W if site 1 is L, random otherwise
12    Identical with site 2
13    L if site 7 is L, random otherwise
14    W if site 8 is R, random otherwise
15    W if site 9 is W, random otherwise

In conclusion, the analysis proposed by Lockless and Ranganathan fails to extract the covariance information from an MSA, and any measure of statistical dependence, including the one proposed here, seems to be a better alternative to the energy-like measure proposed by Lockless and Ranganathan.