A structural model of the E. coli PhoB Dimer in the transcription initiation complex
© Tung and McMahon; licensee BioMed Central Ltd. 2012
Received: 23 March 2011
Accepted: 20 March 2012
Published: 20 March 2012
More than 78,000 protein and/or nucleic acid structures have been determined experimentally, but only a small portion of them are structures of protein complexes. While homology modeling is able to exploit knowledge-based potentials of side-chain rotamers and backbone motifs to infer structures for new proteins, no such general method exists to extend our understanding of protein interaction motifs to novel protein complexes.
We use a Motif Binding Geometries (MBG) approach to infer the structure of a protein complex from the database of complexes of homologous proteins taken from other contexts (such as the helix-turn-helix motif binding double-stranded DNA), and demonstrate its utility on one of the more important regulatory complexes in biology: the RNA polymerase initiating transcription under conditions of phosphate starvation. The modeled PhoB/RNAP/σ-factor/DNA complex is stereo-chemically reasonable, has sufficient interfacial Solvent Excluded Surface Areas (SESAs) to provide adequate binding strength, is physically meaningful for transcription regulation, and is consistent with a variety of known experimental constraints.
Based on a straightforward and easy-to-comprehend concept, "proteins and protein domains that fold similarly could interact similarly", a structural model of the PhoB dimer in the transcription initiation complex has been developed. This approach could be extended to enable structural modeling and prediction of other bio-molecular complexes. Just as models of individual proteins provide insight into molecular recognition, catalytic mechanism, and substrate specificity, models of protein complexes will provide insight into the combinatorial rules of cellular regulation and signaling.
Solving structures of complexes is inherently more difficult than solving those of individual proteins. As a result, significantly fewer structures of protein complexes than of individual proteins have been determined experimentally. In recent years, homology modeling [2, 3] has proved successful when the target protein has a sequence similar to those of proteins with known structures. However, the lack of a sufficiently large database of reference complexes makes the method unsuitable for structural modeling of protein complexes. A conceptually simple and straightforwardly applicable approach for modeling structures of bio-molecular complexes is highly desirable. When new protein complexes are proposed, the models should be checked against the following criteria: they should be stereo-chemically sound, have sufficient interfacial Solvent Excluded Surface Areas (SESAs) to provide adequate binding strength, be physically meaningful for transcription regulation, and be consistent with the known experimental data.
PhoB is a response regulator of the two-component signaling system that is activated under phosphate starvation conditions. It activates more than 30 genes of the pho regulon. Structurally similar to many other response regulators, PhoB has two domains: an N-terminal Receiver Domain (RD) and a C-terminal Effector Domain (ED). The ED of PhoB adopts a winged-helix structure that consists of three α-helices flanked by two sets of β-sheets. The PhoB RD adopts a β-α structure that can be classified as a flavodoxin-like fold according to SCOP. The flavodoxin-like fold can be found in RDs of other response regulators as well as in flavodoxins, cytochrome-P450 oxidoreductase and Toll/Interleukin Receptor (TIR) domains. These protein domains share the same structural fold with little or no sequence homology.
While PhoB has long been known to regulate the expression of the pho regulon, the specific geometry of the transcription initiation complex remains undetermined. In recent years, a significant amount of work has been dedicated to solving structures of RNAP complexes (see review articles [13–15]). The bacterial RNA polymerase (RNAP) is a multi-molecular complex consisting of five subunits: two α-subunits, a β-subunit, a β'-subunit and an ω-subunit. To start transcription, the RNAP first has to bind a σ-subunit. This RNAP/σ-subunit complex then recognizes and binds to a targeted DNA operator site to begin the transcription process. In 2002, the low-resolution (6.5 Å) structure of the Thermus aquaticus RNAP holoenzyme in complex with a fork-junction promoter DNA (PDB accession code: 1L9Z) was solved. Since then, crystal structures of different RNAP holoenzymes have been solved at higher resolution [17, 18] (e.g., PDB accession codes: 1ZYR, 2A6E). More recently, an electron microscopy (EM)-derived structure of a Catabolite Gene Activator Protein (CAP)-dependent transcription initiation complex has been reported (PDB accession code: 3IYD). The structural information available so far provides a knowledge base for modeling the transcription initiation complex together with the response regulator PhoB. In particular, the structure of the CAP-dependent transcription initiation complex (3IYD) provides an ideal template for modeling the structure of the PhoB-dependent transcription initiation complex.
Results and discussion
Combining the information of RD/ED MBGs with the structure of the ED/ED dimeric complex (1GXP), we explore the potential solutions for the PhoB dimeric complex. Of the RD/ED conformations, only that of DrrB (1P2F, shown as the red and the blue molecules in Figure 1), a PhoB/OmpR homolog, provides a satisfactory solution in which the two RDs are in contact but do not overlap. Combining the structural information of ED/ED (1GXP), RD/ED (1P2F), ED (1GXP) and RD (2JB9), the model of the PhoB dimeric complex is developed (shown as the white and magenta molecules bound to DNA in Figure 1b). This model structure has appealing features: good stereochemistry (no clashes between domains, stable interface surface area), protein-like structure (contents of secondary structures, density, etc.) and the presence of several of the known MBGs.
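The selection criterion used above ("in contact but not overlapping") can be sketched as a simple distance test between two sets of Cα coordinates. This is only an illustrative sketch: the function name `classify_domain_pair` and the two cut-off values (3.0 Å for a clash, 8.0 Å for a contact) are assumptions chosen for the example and are not taken from this work.

```python
import numpy as np

def classify_domain_pair(coords_a, coords_b, clash_cutoff=3.0, contact_cutoff=8.0):
    """Label two domains as 'overlapping', 'in contact' or 'separated'.

    coords_a, coords_b: (N, 3) and (M, 3) arrays of C-alpha coordinates in Angstrom.
    clash_cutoff and contact_cutoff are illustrative assumptions, not published values.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # All pairwise distances between atoms of domain A and atoms of domain B.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if (dist < clash_cutoff).any():
        return "overlapping"   # at least one pair is closer than the clash cut-off
    if (dist < contact_cutoff).any():
        return "in contact"    # no clash, but at least one pair is within contact range
    return "separated"

# Toy example: a short "domain" and copies of it shifted along y.
domain = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
for shift in (1.0, 5.0, 50.0):
    print(shift, classify_domain_pair(domain, domain + np.array([0.0, shift, 0.0])))
```

In practice one would apply such a test to every candidate RD/ED geometry after placing it on the ED/ED dimer, and keep only the candidates whose two RDs are classified as "in contact".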
However, because of the tandem head-to-tail orientation, the PhoB dimer in the modeled complex contains a previously unseen interface between the RDs, one that differs from the two-fold symmetry observed in the PhoB RD/RD dimer (2JB9). The next question is whether the new MBG between the two RDs in the model exists in other protein domains of a similar fold. To answer this question, we searched for interfaces between domains that have the flavodoxin-like fold and are related by a tandem symmetry. Interestingly, in the crystal asymmetric unit that contains two CheY-P2 heterodimers (PDB accession code: 1FFG), the two flavodoxin-like molecules of CheY (a chemotaxis protein) follow a tandem symmetry. This contact between the two CheYs (1FFG) in the crystal is very similar to that of the PhoB dimeric RDs, as shown in Figure 1c. While this particular CheY dimeric arrangement may not be functionally relevant for the CheY-CheA interaction, it does provide a potential MBG for the interaction of flavodoxin-like molecules.
In addition to the difference in the binding sites, changes to the DNA from 3IYD are required because the CAP dimer bends the DNA promoter much more than the PhoB dimer does. Therefore, the promoter region of the DNA in the PhoB transcription initiation complex has to be remodeled from the template structure (3IYD). Using a "motif modeling approach" as described in our earlier work, the structure of the DNA upstream of this overlapping region (including the PhoB binding sites) can be modeled using the structure of DNA from the PhoB ED/DNA complex (from 1GXP). This promoter DNA is extended upstream with a piece of canonical DNA duplex to accommodate the binding of the α-subunit C-terminal domain (CTD). As a comparison, we have modeled the same piece of DNA upstream of this overlapping region using only a piece of canonical B-form DNA duplex. The template DNA (from 3IYD), the remodeled promoter DNA for the PhoB transcription initiation complex, and the upstream DNA in a canonical B-duplex conformation are shown in Figure 2b in white, magenta, and cyan, respectively.
Off-the-shelf software exists that allows docking of proteins or protein domains into complexes or full proteins (e.g., ZDOCK, AutoDock, RosettaDock). These programs apply different sampling approaches and scoring functions with various degrees of success (e.g., see the CAPRI assessments). These docking procedures seem to work best if the interaction between the components is strong and/or there exists a global binding minimum. As a quick comparison, we downloaded one of these programs, ZDOCK, and generated 2,000 structures (MBGs) docking the two domains RD (2JB9, residues 3-123) and ED (1GXP, residues 127-229) to derive the PhoB structure. The two domains (RD and ED) of the PhoB molecule are separated by a loop of four peptide groups, and there is a physical limit to the distance that a four-residue loop can span. If the cut-off length for a four-residue loop is set to 14 Å (approximately corresponding to a fully extended conformation), only about 2% (43) of the 2,000 MBGs satisfy the connection criterion. Within the set of the top 100 MBGs, structures 21 and 96 are the only two that allow the RD-ED connection. When PhoB-PhoB dimer structures are modeled from these two ED-RD MBGs and the structure of the ED-ED-DNA complex (1GXP), neither is stereochemically feasible because the domains overlap, producing protein-protein and protein-DNA clashes. If all the MBGs of the two domains from the docking study are compared to the MBG from our model, the closest is structure 1,934, with an RMSD of 4.0 Å (based on Cα atoms only). Overall, the docking procedure is inefficient for this problem (only ~2% of the docked structures satisfy the connectivity constraint). We also found that selecting the relevant PhoB structure from the large pool of potential MBGs produced by the docking study is a non-trivial task.
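The connectivity filter applied to the docked structures can be sketched as follows. The 14 Å cut-off is the value quoted above; everything else is an assumption made for illustration: the function names are hypothetical, and the distance is assumed to be measured between the Cα atom of the last RD residue (123) and the Cα atom of the first ED residue (127).

```python
import numpy as np

def linker_distance(rd_cterm_ca, ed_nterm_ca):
    """Distance (Angstrom) between the C-alpha atoms that the linker must bridge."""
    diff = np.asarray(rd_cterm_ca, dtype=float) - np.asarray(ed_nterm_ca, dtype=float)
    return float(np.linalg.norm(diff))

def connectable_poses(poses, cutoff=14.0):
    """Return the 1-based ranks of docked poses whose termini can be joined.

    poses: sequence of (rd_cterm_ca, ed_nterm_ca) coordinate pairs, ordered by
           docking rank. The input layout is an assumption for this sketch.
    cutoff: maximum span of the four-residue linker, in Angstrom.
    """
    return [rank for rank, (rd, ed) in enumerate(poses, start=1)
            if linker_distance(rd, ed) <= cutoff]

# Toy example with three made-up poses (coordinates are not real data).
poses = [((0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),   # 10 A apart: connectable
         ((0.0, 0.0, 0.0), (20.0, 0.0, 0.0)),   # 20 A apart: too far
         ((0.0, 0.0, 0.0), (0.0, 14.0, 0.0))]   # exactly at the cut-off: kept
kept = connectable_poses(poses)
print(kept, f"{100.0 * len(kept) / len(poses):.1f}% of poses pass")
```

A filter of this kind only removes poses that cannot be connected; as noted above, the surviving poses must still be checked for clashes once the full dimer and the DNA are in place.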
We have demonstrated that Motif Binding Geometry (MBG) can be used to model the structure of the PhoB dimer as it interacts with the transcription initiation complex (PhoB/RNAP/DNA) of E. coli. While the limited space available for the targeted protein in the molecular complex makes the modeling of the protein structure more challenging, it also provides a stringent test for choosing the relevant structure from the pool of potential conformations. Because the two domains (ED and RD) of PhoB adopt different symmetries when crystallized, it is not obvious how to assemble the PhoB dimer from its domain structures alone. Using the excluded volume information and known MBGs between the ED and RD, we are able to develop a structural model for the PhoB dimeric complex in which the two RDs follow a tandem symmetry similar to that seen in the two flavodoxin-like folds of CheY, a chemotaxis protein. The modeled PhoB dimer can bind to the direct repeat Pho box in the promoter region and interact directly with the α-, β- and σ-subunits of the RNAP.
Just as protein structures serve to integrate a variety of biochemical information and advance our understanding of the enzymatic reactions and molecular machines that enable life to continue, modeling of protein complexes will shed light on the protein interaction networks responsible for the regulatory and signaling processes of cells. While our approach has not yet been tested with other protein complexes, we hope that the reader will see our methodology as a way of integrating evolutionary, physical, and biological experimental data to produce new, testable hypotheses.
Motif Binding Geometry (MBG) used for complex homology modeling
Upon binding, the folds of proteins often remain unchanged while the specifics of the surface may adjust to accommodate the interactions. Therefore, while docking molecules by matching surface shape is an attractive method in principle, significant errors can be introduced into the overall binding geometry if induced fit at the interface is involved in the binding process. Here, we introduce a structure-based concept for bio-molecular docking that matches the scaffolds (secondary structural motifs) of the interacting molecules to those with homologous folds and known MBGs. This approach is useful in structural modeling both for arranging stable folded domains in the intact protein and for finding the geometries of individual molecules in a complex. The method can readily provide a manageable set of potential solutions for further study and/or refinement.
Motif structural matching
Protein motifs consist of secondary structural elements (α-helices and β-sheets) arranged with a specific geometry in space. In cases where sequence homology is low (e.g., < 20% identity), it is difficult to discern structural alignments using sequence alignments alone, and a general approach based on structural information is required for motif structural matching. We use the secondary structural elements to align the motifs. When each secondary structural element is represented by a line vector, the structural matching can be accomplished by minimizing the angles (θ) and the minimum distances (d) between corresponding line vectors. A Metropolis Monte Carlo simulation is used for the minimization procedure.
Molecular graphics images were produced using the UCSF Chimera package from the Resource for Biocomputing, Visualization and Informatics at the University of California, San Francisco.
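As a minimal sketch of the matching procedure described above, the code below fits a line vector to each secondary structural element, scores a pair of motifs by summing the angle θ and the line-to-line distance d over corresponding elements, and minimizes that score over rigid-body moves with a Metropolis acceptance rule. All function names, the equal weighting of θ (in degrees) and d (in Å), the step sizes and the temperature are assumptions made for illustration; they are not the parameters used in this work.

```python
import numpy as np

def fit_axis(ca_coords):
    """Least-squares line through C-alpha atoms: returns (centroid, unit direction)."""
    pts = np.asarray(ca_coords, dtype=float)
    center = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - center)
    direction = vt[0]
    if np.dot(direction, pts[-1] - pts[0]) < 0:   # orient from first to last residue
        direction = -direction
    return center, direction

def angle_deg(u, v):
    """Angle (degrees) between two unit vectors."""
    return float(np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))))

def line_distance(p1, u, p2, v):
    """Minimum distance between two infinite lines given as point + unit direction."""
    w = p2 - p1
    n = np.cross(u, v)
    norm = np.linalg.norm(n)
    if norm < 1e-8:                                # parallel lines
        return float(np.linalg.norm(w - np.dot(w, u) * u))
    return float(abs(np.dot(w, n)) / norm)

def score(axes_a, axes_b, w_angle=1.0, w_dist=1.0):
    """Sum of weighted theta and d over corresponding line vectors (weights assumed)."""
    return sum(w_angle * angle_deg(u, v) + w_dist * line_distance(p, u, q, v)
               for (p, u), (q, v) in zip(axes_a, axes_b))

def transform(axes, rot, shift):
    """Apply a rigid-body move to a list of (point, direction) line vectors."""
    return [(rot @ p + shift, rot @ u) for p, u in axes]

def random_rotation(rng, max_angle):
    """Small random rotation matrix built with Rodrigues' formula."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    ang = rng.uniform(-max_angle, max_angle)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(ang) * k + (1.0 - np.cos(ang)) * (k @ k)

def metropolis_match(mobile, target, steps=2000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo search for the rigid-body move that best matches the motifs."""
    rng = np.random.default_rng(seed)
    rot, shift = np.eye(3), np.zeros(3)
    current = score(transform(mobile, rot, shift), target)
    best = (current, rot, shift)
    for _ in range(steps):
        rot_new = random_rotation(rng, 0.1) @ rot
        shift_new = shift + rng.normal(scale=0.5, size=3)
        trial = score(transform(mobile, rot_new, shift_new), target)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if trial < current or rng.random() < np.exp((current - trial) / temperature):
            rot, shift, current = rot_new, shift_new, trial
            if current < best[0]:
                best = (current, rot, shift)
    return best
```

A typical use would be to call `fit_axis` on the Cα atoms of each helix or strand in the two motifs, pair up the resulting line vectors, and pass the two lists to `metropolis_match`, which returns the best score together with the rotation matrix and translation vector that produced it.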
This work was supported by LANL's Laboratory Directed Research & Development program. The coordinate files of the PhoB dimer together with the RNA polymerase, σ-factor and the DNA are available from the supplementary website.
- Berman HM, et al.: The protein data bank. Nucleic Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235
- Marti-Renom MA, et al.: Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 2000, 29: 291–325. 10.1146/annurev.biophys.29.1.291
- Ginalski K, et al.: Comparative modeling for protein structure prediction. Curr Opin Struct Biol 2006, 16: 172–177. 10.1016/j.sbi.2006.02.003
- Hubbard SJ, Thornton JM: NACCESS, Version 2.1. Department of Biochemistry and Molecular Biology, University College, London.
- Makino K, et al.: Nucleotide sequence of the PhoB gene, the positive regulatory resolution. J Mol Biol 1986, 190: 37–44.
- Kim SK, et al.: Dual transcriptional regulation of the Escherichia coli phosphate-starvation-inducible psi gene of the phosphate regulon by PhoB and the cyclic AMP (cAMP)-cAMP receptor protein complex. J Bacteriol 2000, 182: 5596–5599. 10.1128/JB.182.19.5596-5599.2000
- Blanco AG, et al.: Tandem DNA recognition by PhoB, a two-component signal transduction transcriptional activator. Structure 2002, 10: 701–703. 10.1016/S0969-2126(02)00761-X
- Arribas-Bosacoma R, et al.: The x-ray crystal structures of two constitutively active mutants of the Escherichia coli PhoB receiver domain give insights into activation. J Mol Biol 2007, 366: 626–641. 10.1016/j.jmb.2006.11.038
- Murzin AG, et al.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247: 536–540.
- Fukuyama K, Matsubara H: Crystal structure of oxidized flavodoxin from a red alga Chondrus crispus refined at 1.8-Å resolution. J Mol Biol 1992, 225: 775–789. 10.1016/0022-2836(92)90400-E
- Hubbard PA, et al.: NADPH-cytochrome P450 oxidoreductase. J Biol Chem 2001, 276: 29163–29170. 10.1074/jbc.M101731200
- Tao X, et al.: An extensively associated dimer in the structure of the C713S mutant of the TIR domain of human TLR2. Biochem Biophys Res Comm 2002, 299: 216–221. 10.1016/S0006-291X(02)02581-0
- Borukhov S, Nudler E: RNA polymerase holoenzyme: structure, function and biological implications. Curr Opin Microbiol 2003, 6: 93–100. 10.1016/S1369-5274(03)00036-5
- Browning DF, Busby SJW: The regulation of bacterial transcription initiation. Nat Rev Microbiol 2004, 2: 57–65. 10.1038/nrmicro787
- Vassylyev DG, Artsimovitch I: Tracking RNA polymerase one step at a time. Cell 2005, 123: 977–979. 10.1016/j.cell.2005.11.030
- Murakami KS, et al.: Structural basis of transcription initiation: an RNA polymerase holoenzyme-DNA complex. Science 2002, 296: 1285–1290. 10.1126/science.1069595
- Tuske S, et al.: Inhibition of bacterial RNA polymerase by streptolydigin: Stabilization of a straight-bridge-helix active-center conformation. Cell 2005, 122: 541–552. 10.1016/j.cell.2005.07.017
- Artsimovitch I, et al.: Allosteric modulation of the RNA polymerase catalytic reaction is an essential component of transcription control by rifamycins. Cell 2005, 122: 351–363. 10.1016/j.cell.2005.07.014
- Hudson BP, et al.: Three-dimensional EM structure of an intact activator-dependent transcription initiation complex. Proc Natl Acad Sci USA 2009, 106: 19830–19835.
- Robinson VL, Wu T, Stock AM: Structural analysis of the domain interface in DrrB, a response regulator of the OmpR/PhoB subfamily. J Bacteriol 2003, 185: 4186–4194. 10.1128/JB.185.14.4186-4194.2003
- Gouet P, et al.: Further insights into the mechanism of function of the response regulator CheY from crystallographic studies of the CheY-CheA 124–257 complex. Acta Crystallogr sect D 2001, 57: 44–45. 10.1107/S090744490001492X
- Huerta AM, et al.: RegulonDB: A database on transcription regulation in Escherichia coli. Nucleic Acids Res 1998, 26: 55–60. 10.1093/nar/26.1.55
- Tung CS, et al.: All-atom homology model of the Escherichia coli 30S ribosomal subunit. Nat Struct Mol Biol 2002, 9: 750–755. 10.1038/nsb841
- Byu K, et al.: Modulation of high affinity hormone binding. J Biol Chem 1998, 273: 6285–6291. 10.1074/jbc.273.11.6285
- Case DA, et al.: The Amber biomolecular simulation programs. J Comput Chem 2005, 26: 1668–1688. 10.1002/jcc.20290
- Makino K, et al.: DNA binding of PhoB and its interaction with RNA polymerase. J Mol Biol 1996, 259: 15–26. 10.1006/jmbi.1996.0298
- Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci USA 1996, 93: 13–20. 10.1073/pnas.93.1.13
- Pierce B, et al.: M-ZDOCK: A Grid-based approach for Cn Symmetric Multimer Docking. Bioinformatics 2005, 21: 1472–1476. 10.1093/bioinformatics/bti229
- Trott O, et al.: AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. J Comput Chem 2010, 31: 455–461.
- Gray JJ, et al.: Protein-Protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J Mol Biol 2003, 331: 281–299. 10.1016/S0022-2836(03)00670-3
- Janin J, et al.: CAPRI: a critical assessment of predicted interactions. Proteins 2003, 52: 2–9. 10.1002/prot.10381
- Metropolis N, et al.: Equation of state calculations by fast computing machines. J Chem Phys 1953, 21: 1087–1092. 10.1063/1.1699114
- Pettersen EF, et al.: UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 2004, 25: 1605–1612. 10.1002/jcc.20084
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.