Crystal structure of Escherichia coli protein ybgI, a toroidal structure with a dinuclear metal site



The protein encoded by the gene ybgI was chosen as a target for a structural genomics project emphasizing the relation of protein structure to function.


The structure of the ybgI protein is a toroid composed of six polypeptide chains forming a trimer of dimers. Each polypeptide chain binds two metal ions on the inside of the toroid.


The toroidal structure is comparable to that of some proteins that are involved in DNA metabolism. The di-nuclear metal site could imply that the specific function of this protein is as a hydrolase-oxidase enzyme.

Background


The protein encoded by the ybgI gene of Escherichia coli is 247 residues in length and has a molecular weight of 27 kDa. It belongs to the DUF34 family of proteins [1]. No biological function is known for members of this sequence-related family of, at present, 67 proteins. One member of this family, yeast NIF3, which has 22% identity with ybgI, is reported to interact with the yeast transcriptional coactivator NGG1p, but the exact function of this interaction is not known [2]. It has been suggested that the product of the human gene, NIF3L1, and its mouse ortholog, Nif3l1, which have 22% identity with ybgI and 37% identity with yeast NIF3, inhibits Ngg1p from translocation to the nucleus or that NIF3 binds to Ngg1 in the cytoplasm and enters the nucleus by cotransport [3]. Analysis of gene expression levels in Escherichia coli under conditions of genotoxic stress caused by mitomycin C DNA damage showed that the expression of ybgI was significantly induced [4]. This protein has been included as a structural genomics target [5, 6] for a study focusing on proteins that have no known function. The initial targets for this project were selected from the first completely sequenced bacterial genome, that of Haemophilus influenzae [7]. The protein ybgI is a sequence homolog of Haemophilus influenzae HI0105, with a sequence identity of 59%. The ybgI gene was cloned, the protein was expressed, and the crystal structure was determined to 2.2-Å resolution.

Results and Discussion

The ybgI protein consists of two similar interlinked α/β domains; both are 3-layer sandwiches (alpha-beta-alpha) as shown in Figure 1. The first domain has a 5-stranded mixed β-sheet with two α-helices on one side and three α-helices on the other side. Two of the three α-helices are approximately parallel to the β-strands of the β-sheet and the third is shorter, approximately perpendicular to the β-strands and leads over to the second domain. The order of the β-strands is 1-4-3-2-11. The second domain also has a central mixed β-sheet but has 6 β-strands with the order 5-6-8-9-10-7; the β-sheet is flanked on each side by two α-helices and there is an additional short α-helix leading back to domain 1. The crystallographic asymmetric unit contains three dimers. The application of the three-fold crystal symmetry reveals that the quaternary structure is a toroid formed by three crystallographically related dimers. In the crystals, these toroids stack forming long tubes. The toroidal structure is shown in Figures 2A and 2B.

Figure 1

The crystal structure of ybgI from E. coli. (A) Stereo view of the secondary structure cartoon showing the fold of the polypeptide chain. Domain 1 is shown with helices in blue and β-strands in red and domain 2 is shown with helices in cyan and β-strands in rose. The strands and helices are numbered sequentially from the N-terminus. This figure was prepared using MOLSCRIPT [31] and Raster3D [32, 33]. (B) Topology diagram of the secondary structure. Helices are represented as rectangles and β-strands are represented as arrows.

Figure 2

(A) Side view of the toroidal structure. The dimers are colored slate blue and cyan, chocolate brown and tan, and green and lime green. (B) Top view of the toroidal structure with the same coloring as A. Secondary structure cartoons are included inside transparent surface representations. These figures were prepared using PyMOL [34].

Searching with CE [8, 9], DALI [10, 11] and SCOP [12, 13] yielded no other polypeptides with the particular arrangement of mixed β-sheets and α-helices observed in either domain.

The toroid is composed of six polypeptide chains generated by the application of 3-fold symmetry on the dimer. The 2-fold noncrystallographic symmetry operators of the dimers are perpendicular to the three-folds. The inside diameter of the toroid is approximately 30 Å and the outside diameter is approximately 90 Å; the height of the toroid is 57 Å. Due to the 2-fold, the toroid appears the same when approached from either direction along a 3-fold symmetry axis. The superposition of the native subunit structure and selenomethionine subunit structure gives RMSDs for the Cα atoms of 0.2-0.3 Å. The z positions of the toroids on the crystallographic 3-folds differ; in other words, the non-crystallographic 2-folds are not coplanar. It is also of interest to note that the relative positions of the toroids differ between the native and selenomethionine crystals. For instance, the E chain selenomethionine/methionine at position 135 is packed up against the C chain region of 138–141 in the selenomethionine structure and against the C chain region of 140–144 in the native structure.
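
The symmetry argument in this paragraph can be illustrated with a small, idealized sketch. It assumes an exact 3-fold axis along z and an exact 2-fold axis along x; in the crystal only the 3-fold is crystallographic and the 2-fold is approximate. The reference coordinate and the function names are hypothetical and are not taken from the deposited structures. The sketch generates six symmetry copies of one point (three dimers of two chains) and checks that turning the ring over with the 2-fold maps it onto itself, which is why the toroid looks the same from either end.

```python
import math

def rot_z(point, degrees):
    """Rotate a point about the z axis (taken here as the 3-fold axis)."""
    x, y, z = point
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y, z)

def twofold_x(point):
    """Rotate a point by 180 degrees about the x axis (a 2-fold perpendicular to z)."""
    x, y, z = point
    return (x, -y, -z)

def d3_copies(point):
    """Return six symmetry copies of a point: one dimer pair at each of three rotations."""
    dimer = [point, twofold_x(point)]
    return [rot_z(p, angle) for angle in (0.0, 120.0, 240.0) for p in dimer]

def same_set(a, b, tol=1e-9):
    """True if two point lists contain the same positions, in any order."""
    if len(a) != len(b):
        return False
    unmatched = list(b)
    for p in a:
        hit = next((q for q in unmatched if math.dist(p, q) < tol), None)
        if hit is None:
            return False
        unmatched.remove(hit)
    return True

# Hypothetical reference position (in angstroms) for one atom of one chain.
reference = (30.0, 5.0, 12.0)
ring = d3_copies(reference)

# Turning the whole ring upside down with the 2-fold maps it onto itself.
flipped = [twofold_x(p) for p in ring]
print(len(ring), same_set(ring, flipped))  # 6 True
```

A ring built from the 3-fold alone (without the perpendicular 2-fold) would not pass this check, which is the distinction drawn later between ybgI and the tapered λ-exonuclease toroid.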

The most likely region for the active site is a group of conserved residues that includes four histidines (63, 64, 97, 215), two glutamic acids (194, 219), one aspartic acid (101), one asparagine (108), one cysteine (171), one tyrosine (22) and one tryptophan (68). Bound by this cluster of residues are two metal ions, 3.3 Å apart in the selenomethionine protein and 2.5 Å apart in the native protein. In the early refinement of the selenomethionine structure these were treated as 'water' molecules, and the B-values became very low, indicating that they must be something heavier than oxygen. The anomalous Fourier map of the selenomethionine data shows a significant anomalous signal at these positions, though much lower than that of selenium. X-ray fluorescence identified the presence of Fe in the protein sample. One metal ion is coordinated by H64 Nε2, H215 Nε2 and E219 Oε1; the other metal ion is coordinated by D101 Oδ1 and Oδ2, E219 Oε2 and H63 Nε2. This grouping is set back into the inside wall of the toroid and includes residues from both domains. The metal ion sites of the dimer are at opposite ends of a cavity that extends across the dimer interface. This cavity is separated from the center of the toroid by the Y22 residues of the dimer chains. The Y22 residues narrow the access to the cavity from the center of the toroid to approximately 14 Å. The distances between the metal ions of the dimer chains are 21.9 and 25.6 Å. The distances between 3-fold related metal ions are 45.0 and 42.5 Å. One of the six putative active sites is shown in Figure 3. In the native protein, the metal positions may be filled or partially filled by magnesium ions, which are present in both the growth medium and the crystallization solution. An anomalous Fourier map calculated from the native data does not reveal any anomalous signal at these positions, and negative results from X-ray fluorescence eliminate the presence of Fe, Zn, Cu, Ni, and Co.

In this structure, only 11 ion sites were included. The electron density at these positions tends to be somewhat smeared. The appearance of the electron density and the refinement of the B factors were used as guides to include or exclude ion sites. The protein structure around the sites is well defined. The presence of iron in the selenomethionine protein sample may indicate the adventitious uptake of iron during preparation, since the procedure includes the addition of iron sulfate as a component of the growth medium [14]. The intrinsic metal ions for this protein are not known. The inclusion of histidine, glutamic acid, and aspartic acid in the putative active site, with a bridging glutamic acid between the ions, is in keeping with cocatalytic sites in a number of proteins where the metal ions are Zn or Zn and Fe, Mn, or Mg [15]. The constancy of the protein structure around these sites supports the view that these are catalytic rather than structural sites.

Figure 3

The putative active site with the metal ions shown as silver balls and four water molecules shown as cyan balls. This figure was prepared using MOLSCRIPT [31] and Raster3D [32, 33].

An E. coli operon has been identified that includes the nei gene, which codes for endonuclease VIII, and four other genes: ybgI, ybgJ, ybgK and ybgL [16]. Endonuclease VIII is an oxidative base excision repair protein. The proteins encoded by ybgJ and ybgK are putative carboxylases and the protein encoded by ybgL is a putative lactam utilization protein. The inclusion of ybgI in the nei operon is not well conserved in other bacteria.

The highly conserved residues of the DUF34 family are concentrated in two regions of the ybgI structure: at the putative active site and on the side of a groove between the polypeptide chains of the trimer. Figure 4 shows the conserved residues mapped onto the surface of the molecule.

Figure 4

A view looking down into the toroid at the putative active site. A gap between the dimers and a trough lead down toward the site. Where conserved residues contact the surface, the surface has been colored red. Again the metal ions are depicted as silver balls. This figure was prepared using PyMOL [34].

The toroidal ring quaternary structure brings to mind many proteins that are involved in DNA metabolism. In a recent review [17], Hingorani and O'Donnell examine these proteins and speculate that the convergence on the toroidal shape provides an enclosed environment for otherwise chemically unfavorable reactions. These proteins include sliding clamps, helicases that catalyze ATP-fuelled DNA unwinding, and exonucleases and topoisomerases that chemically modify DNA. For instance, the exonuclease of λ bacteriophage is a trimer and forms a toroid with an inner diameter of 30 Å at one end and 15 Å at the opposite end. The exonuclease encircles the double-stranded DNA and processively hydrolyzes one of the two strands. The enzyme moves with a specific orientation and degrades the 5' strand so that the product is the 3' strand [18]. The ybgI structure, in contrast, is a symmetric toroid: the inner diameter is the same whether approached from above or below.

In a review of di-iron-carboxylate proteins (proteins with di-iron centers bridged by carboxylate residues and oxide/hydroxide groups) [19], the authors grouped the known structures into four structural categories. The first three categories are all variations on helix bundles. The fourth class is the α/β sandwich category which includes purple acid phosphatases. These proteins have di-metal centers (Fe and Zn) that catalyze the hydrolysis of phosphate esters. There is an active site tyrosine radical that is responsible for the purple color and the OH is 2.2 Å from the iron atom. In ybgI, the closest tyrosine is 11 Å away from the metal ions.

Conclusions


The quaternary structure, taken together with the upregulated expression in response to DNA damage, the inclusion in the operon with endonuclease VIII, and the sequence homology with the yeast NIF3 protein, appears consistent with a function that involves DNA repair or the transcription process. Comparison of the active site with known structures has not yet yielded a definitive clue concerning the specific biological function. Biochemical studies to further profile the function of the ybgI protein are in progress.

The atomic coordinates and structure factors of the selenomethionine and native structures of ybgI are deposited in the Protein Data Bank [20] as 1NMO and 1NMP, respectively.

Methods


Cloning, expression, and purification

The ybgI gene was amplified by polymerase chain reaction (PCR) from Escherichia coli MG1655 genomic DNA and subcloned into the pDONR201 plasmid using Gateway Technology (Invitrogen). For expression, the coding sequence was transferred into the pDEST14 plasmid using site-specific recombination (Invitrogen). The protein was produced in E. coli strain BL21 Star (DE3) (Invitrogen) that was transformed with pDEST14. Cells were grown in LB medium containing 100 µg/µL ampicillin at 37°C to an A600 of 0.6 and induced with 1 mM isopropyl β-D-thiogalactoside for 3 hours. The protein was purified by column chromatography in two steps using Source 30Q (Pharmacia) and Butyl-560M (Toyopearl).

Crystallization and structure determination

Crystals were obtained by the vapor diffusion method in hanging drops at room temperature for the native protein and the selenomethionine derivative. The reservoir solution for the native protein included 0.1 M cacodylate buffer at pH 7.5, 0.1 M magnesium acetate, 15% (w/v) polyethylene glycol 8000 and 5% (v/v) polyethylene glycol 400. The reservoir solution for the selenomethionine protein included 0.1 M imidazole buffer pH 8.0, 0.2 M calcium acetate and 15% (w/v) polyethylene glycol 3350. The hanging drops were formed by combining equal volumes of protein solution and reservoir solution. The protein concentrations were 4.7 mg/mL for the native protein and 8.2 mg/mL for the selenomethionine protein. For data collection the crystals were passed through a solution made of equal volumes of reservoir solution and saturated lithium formate for the native crystals and 2 volumes of reservoir solution and one volume of saturated lithium formate for the selenomethionine derivative [21].

Diffraction data were collected at the Advanced Photon Source (APS) South East Regional Collaborative Access Team (SER-CAT) beam line 22ID-D at Argonne National Laboratory. All data were collected at 100 K. Data were collected at three wavelengths for the selenomethionine derivative crystal (0.9795 Å, 0.9793 Å and 0.9780 Å) and at 0.9793 Å for the native crystal. The data were processed using d*TREK [22].

The selenium sites were found with Shake-N-Bake [23, 24]. The polypeptide has four methionine residues and there are three dimers (six monomers) in the asymmetric unit. The 18 highest-ranked sites were entered into SOLVE [25]. SOLVE chose the opposite hand and gave a solution with 21 sites. RESOLVE [26] was not able to find the correct noncrystallographic symmetry, but once this was determined by visual and vector examination of the sites, RESOLVE was able to build the backbone for 911 of the 1482 residues and place 491 side chains. By superimposing the partial models for the six copies of the polypeptide chain, a nearly complete tracing was determined. CNS [27] was used to refine this model against the data. As the refinement progressed, the noncrystallographic symmetry restraints were reduced. XTALVIEW [28] was used to visualize the structure and to make manual adjustments of the coordinates to improve their agreement with the electron density map. REDUCE and PROBE [29] were used to guide rebuilding and to help resolve side chain conformations, and PROCHECK [30] was used to validate the structures.
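
The counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (247 residues per chain, six chains and four methionines per chain, 911 backbone residues and 491 side chains built). The expected number of selenium sites is an upper bound that assumes every methionine is substituted and ordered, which is an assumption made here and not a claim from the paper; the variable names are illustrative.

```python
# Values taken from the text.
residues_per_chain = 247   # chain length given in the introduction
chains_per_asu = 6         # three dimers in the asymmetric unit
met_per_chain = 4          # methionine residues per polypeptide
backbone_built = 911       # residues with backbone built by RESOLVE
sidechains_placed = 491    # side chains placed by RESOLVE

# Totals for the asymmetric unit.
total_residues = residues_per_chain * chains_per_asu
# Upper bound on Se sites, assuming all methionines are substituted and ordered.
expected_se_sites = met_per_chain * chains_per_asu

print(total_residues)                                         # 1482
print(expected_se_sites)                                      # 24
print(round(100 * backbone_built / total_residues, 1))        # 61.5
print(round(100 * sidechains_placed / total_residues, 1))     # 33.1
```

The total of 1482 residues matches the figure quoted for RESOLVE, and the 18 sites entered into SOLVE and the 21 sites it returned both fall below the upper bound of 24.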

The selenomethionine data and the native data are not isomorphous. The cells differ by greater than 1% in the a and b unit cell dimensions. Consequently, the native structure was solved by molecular replacement using CNS. The dimer unit was used as the search molecule. Refinement against the diffraction data was also accomplished using the CNS package. As in the selenomethionine structure, noncrystallographic symmetry restraints were used throughout the refinement, but the weighting was reduced after the initial rounds of refinement. The data and refinement statistics are shown in Table 1.

Table 1 X-Ray Data Processing and Refinement Statistics

Metal ion determination

X-ray fluorescence scans were performed at the absorption edges for Zn, Cu, Ni, Co and Fe at the Advanced Photon Source (APS) Industrial Macromolecular Crystallography Association Collaborative Access Team (IMCA-CAT) beam line 17-ID at Argonne National Laboratory. Solution samples of the native and SeMet proteins were used for the scans. The scans indicated the presence of Fe in the SeMet protein and no Zn, Cu, Ni, or Co, and found none of these metals present in the native protein solution.

References


  1. Pfam: Protein Families database of alignments and HMMs

  2. Martens JA, Genereaux J, Saleh A, Brandl CJ: Transcriptional Activation by Yeast PDR1p Is Inhibited by Its Association with NGG1p/ADA3p. J. Biol. Chem. 1996, 271: 15884–15890. 10.1074/jbc.271.16.9298

  3. Tascou S, Uedelhoven J, Dixkens C, Nayernia K, Engel W, Burfeind P: Isolation and characterization of a novel human gene, NIF3L1, and its mouse ortholog, Nif3l1, highly conserved from bacteria to mammals. Cytogenet. Cell Genet. 2000, 90: 330–336. 10.1159/000056799

  4. Khil PP, Camerini-Otero RD: Over 1000 genes are involved in the DNA damage response of Escherichia coli. Mol. Microbiol. 2002, 44: 89–105. 10.1046/j.1365-2958.2002.02878.x

  5. Eisenstein E, Gilliland GL, Herzberg O, Moult J, Orban J, Poljak RJ, Banergei L, Richardson D, Howard AJ: Biological function made crystal clear - annotation of hypothetical proteins via structural genomics. Curr Opin in Biotechnol 2000, 11: 25–30. 10.1016/S0958-1669(99)00063-4

  6. Structure2Function Project

  7. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb J-F, Dougherty BA, Merrick JM, McKenney K, Sutton G, FitzHugh W, Fields C, Gocayne JD, Scott J, Shirley R, Liu L-I, Glodek A, Kelley JM, Weidman JF, Phillips CA, Spriggs T, Hedblom E, Cotton MD, Utterback TR, Hanna MC, Nguyen DT, Saudek DM, Brandon RC, Fine LD, Fritchman JL, Fuhrmann JL, Geoghagen NSM, Gnehm CL, McDonald LA, Small KV, Fraser CM, Smith HO, Venter JC: Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd. Science 1995, 269: 496.

  8. Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering 1998, 11: 739–747. 10.1093/protein/11.9.739

  9. Finding 3-D Similarities in Protein Structures

  10. Holm L, Sander C: Mapping the protein universe. Science 1996, 273: 595–602.

  11. The DALI server

  12. Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995, 247: 536–540. 10.1006/jmbi.1995.0159

  13. Structural Classification of Proteins

  14. Hendrickson WA, Horton JR, LeMaster DM: Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD); a vehicle for direct determination of three-dimensional structure. EMBO J. 1990, 9: 1665–1672.

  15. Auld DS: Zinc coordination sphere in biochemical zinc sites. Biometals 2001, 14: 271–313. 10.1023/A:1012976615056

  16. Gifford CM, Wallace SS: The genes encoding endonuclease VIII and endonuclease III in Escherichia coli are transcribed as the terminal genes in operons. Nucleic Acids Research 2000, 28: 762–769. 10.1093/nar/28.3.762

  17. Hingorani MM, O'Donnell M: A tale of toroids in DNA metabolism. Nat Rev Mol Cell Biol 2000, 1: 22–30. 10.1038/35036044

  18. Kovall R, Matthews BW: Toroidal structure of lambda-exonuclease. Science 1997, 277: 1824–1827. 10.1126/science.277.5333.1824

  19. Nordlund P, Eklund H: Di-iron-carboxylate proteins. Current Opinion in Structural Biology 1995, 5: 758–766. 10.1016/0959-440X(95)80008-5

  20. Protein Data Bank

  21. Rubinson KA, Ladner JE, Tordova M, Gilliland GL: Cryosalts: suppression of ice formation in macromolecular crystallography. Acta Crystallog. 2000, D56: 996–1001.

  22. Pflugrath JW: The finer things in X-ray diffraction data collection. Acta Crystallog. 1999, D55: 1718–1725.

  23. Blessing RH, Smith GD: Difference structure-factor normalization for heavy-atom or anomalous-scattering substructure determinations. J. Appl. Cryst. 1999, 32: 664–670. 10.1107/S0021889899003416

  24. Weeks CM, Miller R: The design and implementation of SnB v2.0. J. Appl. Cryst. 1999, 32: 120–124. 10.1107/S0021889898010504

  25. Terwilliger TC, Berendzen J: Automated MAD and MIR structure solution. Acta Crystallog. 1999, D55: 849–861.

  26. Terwilliger TC: Automated structure solution, density modification and model building. Acta Crystallog. 2002, D58: 1937–1940.

  27. Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang J-S, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL: Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallog. 1998, D54: 905–921.

  28. McRee DE: Practical Protein Crystallography. 2nd edition. San Diego, Academic Press 1999, 477.

  29. Word JM, Lovell SC, LaBean TH, Taylor HC, Zalis ME, Presley BK, Richardson JS, Richardson DC: Visualizing and Quantifying Molecular Goodness-of-Fit: Small-probe Contact Dots with Explicit Hydrogen Atoms. J. Mol. Biol. 1999, 285: 1711–1733. 10.1006/jmbi.1998.2400

  30. Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 1993, 26: 283–291. 10.1107/S0021889892009944

  31. Kraulis PJ: MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J. Applied Crystallography 1991, 24: 946–950. 10.1107/S0021889891004399

  32. Bacon DJ, Anderson WF: A Fast Algorithm for Rendering Space-filling Molecule Pictures. J. of Molecular Graphics 1988, 6: 219–220. 10.1016/S0263-7855(98)80030-1

  33. Merritt EA, Bacon DJ: Raster3D: Photorealistic Molecular Graphics. Methods in Enzymology (Edited by: Sweet RM and Carter CW Jr). San Diego, Academic Press 1997, 277: 505–524.

  34. DeLano WL: The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA 2002.



Acknowledgements

We would like to acknowledge the consultations with Celia Chen on the crystallographic packing and selenomethionine substructure. This work was supported by the National Institutes of Health grant No. P01-GM57890. This work was also supported in part by an award from the W.M. Keck Foundation. Diffraction data were collected at the Southeast Regional Collaborative Access Team (SER-CAT) 22-ID beamline at the Advanced Photon Source, Argonne National Laboratory. Use of the Advanced Photon Source was supported by the U.S. Department of Energy, Office of Basic Energy Sciences, under Contract No. W-31-109-Eng-38.

Certain commercial materials, instruments, and equipment are identified in this manuscript in order to specify the experimental procedure as completely as possible. In no case does such identification imply that the materials, instruments, or equipment identified is necessarily the best available for the purpose.

The accepted SI units of concentration, mol/L, and of unified atomic mass unit, u, have been represented by the symbol M and by the symbol Da, respectively, in order to conform to the conventions of the journal.


Corresponding author

Correspondence to Jane E Ladner.

Additional information

Authors' contributions

JEL grew the crystals used for data collection, collected the diffraction data, and solved and refined the molecular structure. GO expressed and purified the native and SeMet proteins and produced the original crystals. AT contributed to the selection of the protein and performed the X-ray fluorescence experiments. AJH provided us with access to the synchrotron and helped with the X-ray fluorescence experiments. PPK and RDC-O performed the gene expression experiments and contributed to the selection of the protein. GLG conceived the study, participated in the coordination, and provided financial support.


Ladner, J.E., Obmolova, G., Teplyakov, A. et al. Crystal structure of Escherichia coli protein ybgI, a toroidal structure with a dinuclear metal site. BMC Struct Biol 3, 7 (2003).
