Open Access

Crystal structure of Escherichia coli protein ybgI, a toroidal structure with a dinuclear metal site

  • Jane E Ladner1,
  • Galina Obmolova1,
  • Alexey Teplyakov1,
  • Andrew J Howard3,
  • Pavel P Khil2,
  • R Daniel Camerini-Otero2 and
  • Gary L Gilliland1
BMC Structural Biology 2003, 3:7

DOI: 10.1186/1472-6807-3-7

Received: 19 June 2003

Accepted: 30 September 2003

Published: 30 September 2003

Abstract

Background

The protein encoded by the gene ybgI was chosen as a target for a structural genomics project emphasizing the relation of protein structure to function.

Results

The structure of the ybgI protein is a toroid composed of six polypeptide chains forming a trimer of dimers. Each polypeptide chain binds two metal ions on the inside of the toroid.

Conclusions

The toroidal structure resembles those of some proteins involved in DNA metabolism. The dinuclear metal site suggests that this protein may function as a hydrolase or oxidase.

Background

The protein encoded by the ybgI gene of Escherichia coli is 247 residues long and has a molecular weight of 27 kDa. It belongs to the DUF34 family of proteins [1]. No biological function is known for any member of this sequence-related family, which at present comprises 67 proteins. One member of the family is yeast NIF3, which shares 22% identity with ybgI. NIF3 has been reported to interact with the yeast transcriptional coactivator Ngg1p, but the function of this interaction is not known [2]. The human gene NIF3L1 and its mouse ortholog Nif3l1 encode proteins with 22% identity to ybgI and 37% identity to yeast NIF3. It has been suggested that the NIF3L1 product prevents translocation of Ngg1p to the nucleus, or that NIF3 binds Ngg1p in the cytoplasm and enters the nucleus by cotransport [3]. Gene expression was analyzed in Escherichia coli under genotoxic stress caused by the DNA-damaging agent mitomycin C, and ybgI expression was significantly induced [4]. The protein was included as a target [5, 6] of a structural genomics project focused on proteins of unknown function. The initial targets for this project were selected from the first completely sequenced bacterial genome, that of Haemophilus influenzae [7]. The ybgI protein is a sequence homolog of Haemophilus influenzae HI0105, with 59% sequence identity. The ybgI gene was cloned and expressed, and the crystal structure of the protein was determined at 2.2 Å resolution.
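The percent identities quoted above (22%, 37%, 59%) depend on how identity is normalized, and the text does not state which convention was used. The sketch below is a minimal illustration that assumes a pairwise alignment is already available, with gaps written as `-`. It counts identity over the columns where neither sequence has a gap. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Assumes both strings come from the same alignment (equal length, '-' for gaps).
    Other conventions divide by the full alignment length or by the length of
    the shorter sequence, which gives different numbers for gapped alignments.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not ybgI or its homologs).
    print(round(percent_identity("MKT-AHL", "MRTQAHL"), 1))  # 83.3
```

Choosing the gap-free denominator makes the value insensitive to terminal overhangs. If the alignment has long gaps, it will therefore be higher than a full-length normalization.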

Results and Discussion

The ybgI protein consists of two similar, interlinked α/β domains. Both are three-layer sandwiches (alpha-beta-alpha), as shown in Figure 1. The first domain has a five-stranded mixed β-sheet with two α-helices on one side and three α-helices on the other. Two of these three α-helices are approximately parallel to the β-strands of the sheet. The third is shorter, runs approximately perpendicular to the β-strands and leads over to the second domain. The order of the β-strands is 1-4-3-2-11. The second domain also has a central mixed β-sheet, but with six β-strands in the order 5-6-8-9-10-7. This sheet is flanked on each side by two α-helices, and an additional short α-helix leads back to domain 1. The crystallographic asymmetric unit contains three dimers. Application of the crystallographic 3-fold symmetry reveals that the quaternary structure is a toroid formed by three crystallographically related dimers. In the crystals, these toroids stack to form long tubes. The toroidal structure is shown in Figures 2A and 2B.
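A 3-fold axis combined with 2-fold axes perpendicular to it generates the dihedral point group D3, which has exactly six operators, one per chain of the toroid. This is the dimer arrangement described later in this section. The sketch below assumes the 3-fold lies along z and one dimer 2-fold lies along x. This orientation is hypothetical and is not the crystal frame. The code generates the group from these two generators and applies it to a hypothetical subunit centre. In the crystal the 3-fold is crystallographic and the 2-folds are noncrystallographic, so this describes the idealized assembly only.

```python
import math

Matrix = tuple[tuple[float, float, float], ...]


def rot_z(angle_deg: float) -> Matrix:
    """Rotation about the z axis (taken here as the 3-fold axis)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


# A 2-fold about x, i.e. perpendicular to the 3-fold axis (hypothetical orientation).
TWO_FOLD_X: Matrix = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))
IDENTITY: Matrix = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def _key(m: Matrix) -> tuple:
    # Round so that floating-point noise does not create spurious "new" operators.
    return tuple(round(x, 6) for row in m for x in row)


def close_group(generators: list[Matrix]) -> list[Matrix]:
    """All distinct products of the generators (the point group they generate)."""
    group = {_key(IDENTITY): IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        new = []
        for g in frontier:
            for h in generators:
                p = matmul(g, h)
                if _key(p) not in group:
                    group[_key(p)] = p
                    new.append(p)
        frontier = new
    return list(group.values())


def transform(m: Matrix, v: tuple[float, float, float]) -> tuple[float, ...]:
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


if __name__ == "__main__":
    ops = close_group([rot_z(120.0), TWO_FOLD_X])
    print(len(ops))  # 6 operators: the dihedral group D3
    # One copy per chain of a hypothetical subunit centre (angstroms).
    centre = (30.0, 10.0, 12.0)
    for op in ops:
        print(tuple(round(x, 1) for x in transform(op, centre)))
```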
Figure 1

The crystal structure of ybgI from E. coli. (A) Stereo view of the secondary structure cartoon showing the fold of the polypeptide chain. Domain 1 is shown with helices in blue and β-strands in red, and domain 2 with helices in cyan and β-strands in rose. The strands and helices are numbered sequentially from the N-terminus. This figure was prepared using MOLSCRIPT [31] and Raster3D [32, 33]. (B) Topology diagram of the secondary structure. Helices are represented as rectangles and β-strands as arrows.

Figure 2

(A) Side view of the toroidal structure. The three dimers are colored slate blue and cyan, chocolate brown and tan, and green and lime green. (B) Top view of the toroidal structure, with the same coloring as in A. Secondary structure cartoons are shown inside transparent surface representations. These figures were prepared using PyMOL [34].

Searches with CE [8, 9], DALI [10, 11] and SCOP [12, 13] found no other polypeptides with the particular arrangement of mixed β-sheets and α-helices observed in either domain.

The toroid is composed of six polypeptide chains generated by applying the 3-fold symmetry to the dimer. The noncrystallographic 2-fold axes of the dimers are perpendicular to the 3-fold axes. The inside diameter of the toroid is approximately 30 Å, the outside diameter approximately 90 Å, and the height 57 Å. Because of the 2-fold axes, the toroid looks the same when viewed from either direction along the 3-fold axis. Superposition of the native and selenomethionine subunit structures gives RMSDs of 0.2–0.3 Å for the Cα atoms. The toroids sit at different heights (z positions) on the crystallographic 3-fold axes; in other words, the noncrystallographic 2-fold axes are not coplanar. The relative positions of the toroids also differ between the native and selenomethionine crystals. For instance, residue 135 of chain E (selenomethionine or methionine) packs against residues 138–141 of chain C in the selenomethionine structure, but against residues 140–144 of chain C in the native structure.
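The Cα RMSD quoted above is the residual deviation after an optimal rigid-body superposition of matched atom pairs. The text does not say which program performed the superposition. The sketch below uses the standard Kabsch (SVD) method instead. It assumes NumPy and two (N, 3) arrays of Cα coordinates already paired residue by residue; `kabsch_rmsd` is a hypothetical helper name.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between paired (N, 3) coordinate sets after optimal superposition.

    Kabsch algorithm: remove translation, find the rotation that best maps
    p onto q from the SVD of their covariance matrix, then measure the
    remaining deviation.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Centre both coordinate sets on their centroids.
    p0 = p - p.mean(axis=0)
    q0 = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(p0.T @ q0)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Apply the rotation to p and compute the root-mean-square deviation.
    diff = p0 @ rotation.T - q0
    return float(np.sqrt((diff ** 2).sum() / len(p0)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ca_native = rng.normal(scale=10.0, size=(247, 3))  # stand-in for 247 C-alpha positions
    angle = np.radians(30.0)
    rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                    [np.sin(angle), np.cos(angle), 0.0],
                    [0.0, 0.0, 1.0]])
    # Rotated, shifted copy with 0.15 A per-coordinate noise (hypothetical data).
    ca_semet = ca_native @ rot.T + 4.0 + rng.normal(scale=0.15, size=(247, 3))
    print(round(kabsch_rmsd(ca_native, ca_semet), 2))
```

The reflection correction through `d` matters because an unconstrained SVD solution can return an improper rotation. A mirror image is never a valid superposition of a protein chain.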

The most likely location of the active site is a cluster of conserved residues. It comprises four histidines (63, 64, 97 and 215), two glutamic acids (194 and 219), one aspartic acid (101), one asparagine (108), one cysteine (171), one tyrosine (22) and one tryptophan (68). This cluster also binds two metal ions, which are 3.3 Å apart in the selenomethionine protein and 2.5 Å apart in the native protein. In the early refinement of the selenomethionine structure, these ions were modeled as water molecules. Their B-values became very low, indicating that they must be heavier than oxygen. The anomalous Fourier map calculated from the selenomethionine data shows a significant anomalous signal at these positions, though much weaker than that of selenium. X-ray fluorescence identified Fe in the protein sample. One metal ion is coordinated by H64 Nε2, H215 Nε2 and E219 Oε1. The other is coordinated by D101 Oδ1 and Oδ2, E219 Oε2 and H63 Nε2. This grouping is set back into the inside wall of the toroid and includes residues from both domains. The metal sites of the dimer lie at opposite ends of a cavity that extends across the dimer interface. The Y22 residues of the two dimer chains separate this cavity from the center of the toroid. They narrow the access from the center of the toroid to approximately 14 Å. The distances between the metal ions of the two chains of a dimer are 21.9 and 25.6 Å. The distances between 3-fold-related metal ions are 45.0 and 42.5 Å. One of the six putative active sites is shown in Figure 3. In the native protein, the metal positions may be fully or partially occupied by magnesium ions, which are present in both the growth medium and the crystallization solution. An anomalous Fourier map calculated from the native data shows no anomalous signal at these positions. In addition, X-ray fluorescence gave negative results for Fe, Zn, Cu, Ni and Co.
In the native structure, only 11 of the 12 metal sites (two per chain) were included. The electron density at these positions tends to be somewhat smeared. The appearance of the density and the refined B-factors were therefore used as guides to include or exclude each ion site. The protein structure around the sites is well defined. The iron in the selenomethionine protein sample may reflect adventitious uptake during preparation, since the procedure adds iron sulfate to the growth medium [14]. The intrinsic metal ions of this protein are not known. The putative active site contains histidine, glutamic acid and aspartic acid, with a glutamic acid bridging the two ions. This arrangement is in keeping with the cocatalytic sites of a number of proteins in which the metal ions are Zn, or Zn together with Fe, Mn or Mg [15]. The constancy of the protein structure around these sites supports the view that they are catalytic rather than structural sites.
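A rotation about the 3-fold axis preserves both the distance r of a site from the axis and its height along the axis. The three 3-fold-related copies of a site therefore form an equilateral triangle with side 2r sin 60° = r√3, independent of height. Suppose each of the 3-fold separations quoted above (45.0 and 42.5 Å) refers to copies of the same ion. The ions would then lie about 26.0 and 24.5 Å from the axis. This is between the inner radius (about 15 Å) and outer radius (about 45 Å) implied by the toroid diameters given earlier. The sketch below checks the relation numerically; the helper names and the test site are hypothetical.

```python
import math


def rotate_about_z(point, angle_deg):
    """Rotate a point (x, y, z) about the z axis, taken here as the 3-fold axis."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)


def threefold_separation(radius):
    """Distance between two 3-fold-related copies of a site `radius` from the axis."""
    return radius * math.sqrt(3.0)


def radius_from_separation(separation):
    """Distance from the 3-fold axis implied by a copy-to-copy separation."""
    return separation / math.sqrt(3.0)


if __name__ == "__main__":
    site = (25.0, 0.0, 8.0)  # hypothetical site, 25 A from the axis, 8 A above the origin
    copy = rotate_about_z(site, 120.0)
    print(round(math.dist(site, copy), 3), round(threefold_separation(25.0), 3))
    for d in (45.0, 42.5):  # 3-fold separations quoted in the text
        print(d, round(radius_from_separation(d), 1))
```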
Figure 3

The putative active site, with the metal ions shown as silver balls and four water molecules as cyan balls. This figure was prepared using MOLSCRIPT [31] and Raster3D [32, 33].
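The coordination described above can be checked directly from deposited coordinates such as entry 1NMO. In that assignment, one ion binds H64 Nε2, H215 Nε2 and E219 Oε1, and the other binds D101 Oδ1/Oδ2, E219 Oε2 and H63 Nε2. The sketch below is a minimal, hypothetical helper, not the procedure used for this structure. It reads ATOM/HETATM records using the fixed PDB column layout and lists protein atoms within an assumed 2.6 Å cutoff of each metal. It then flags residues that contribute ligands to more than one ion, as the bridging E219 does. The cutoff and the metal residue names (`FE`, `MG`) are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    record: str                      # "ATOM" or "HETATM"
    name: str                        # atom name, e.g. "NE2"
    res_name: str                    # residue name, e.g. "HIS" or "FE"
    chain: str
    res_seq: int
    xyz: tuple[float, float, float]


def parse_pdb_atoms(lines):
    """Parse ATOM/HETATM records using the fixed-column PDB format."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return atoms


def coordination_shells(atoms, metal_names=("FE", "MG"), cutoff=2.6):
    """Map each metal ion to the protein atoms within `cutoff` angstroms of it.

    Only ATOM records count as ligands, so waters (HETATM HOH) are ignored here.
    """
    metals = [a for a in atoms if a.record == "HETATM" and a.res_name in metal_names]
    protein = [a for a in atoms if a.record == "ATOM"]
    shells = {}
    for m in metals:
        ligands = []
        for p in protein:
            d = math.dist(m.xyz, p.xyz)
            if d <= cutoff:
                ligands.append((f"{p.chain}:{p.res_name}{p.res_seq} {p.name}", round(d, 2)))
        shells[(m.chain, m.res_name, m.res_seq)] = sorted(ligands)
    return shells


def bridging_residues(shells):
    """Residues that supply ligand atoms to more than one metal ion."""
    metals_per_residue = {}
    for metal, ligands in shells.items():
        for label, _ in ligands:
            residue = label.split()[0]  # e.g. "A:GLU219"
            metals_per_residue.setdefault(residue, set()).add(metal)
    return sorted(r for r, ms in metals_per_residue.items() if len(ms) > 1)
```

For a local copy of an entry, `coordination_shells(parse_pdb_atoms(open(path)))` returns one ligand list per metal ion, keyed by chain, residue name and residue number. Passing that result to `bridging_residues` then identifies shared ligands.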

An E. coli operon has been identified that includes the nei gene, which encodes endonuclease VIII, and four other genes: ybgI, ybgJ, ybgK and ybgL [16]. Endonuclease VIII is a base excision repair enzyme for oxidative DNA damage. The proteins encoded by ybgJ and ybgK are putative carboxylases, and the protein encoded by ybgL is a putative lactam utilization protein. The association of ybgI with the nei operon is not well conserved in other bacteria.

The highly conserved residues of the DUF34 family are concentrated in two regions of the ybgI structure: at the putative active site and on the side of a groove between the polypeptide chains of the trimer. Figure 4 shows the conserved residues mapped onto the surface of the molecule.
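The text does not state how conservation across the DUF34 family was scored. One simple measure is the fraction of sequences in a multiple sequence alignment that carry the most common residue at each column. The sketch below is a hypothetical illustration of that measure. Columns at or above a chosen threshold (0.9 here, an arbitrary assumption) are mapped to residue numbers through a reference row, which would be ybgI for this structure, before being painted onto the surface.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of all sequences sharing the most common residue.

    Gaps ('-') never count as a residue, so gapped columns score lower.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        _, count = Counter(residues).most_common(1)[0]
        scores.append(count / len(alignment))
    return scores


def column_to_residue(reference_row, first_residue=1):
    """Map 1-based alignment columns to residue numbers of the reference sequence."""
    mapping, number = {}, first_residue - 1
    for column, aa in enumerate(reference_row, start=1):
        if aa != "-":
            number += 1
            mapping[column] = number
    return mapping


def conserved_residues(alignment, reference_index=0, threshold=0.9):
    """Reference residue numbers at columns whose conservation meets `threshold`."""
    mapping = column_to_residue(alignment[reference_index])
    scores = column_conservation(alignment)
    return [mapping[c] for c, s in enumerate(scores, start=1)
            if s >= threshold and c in mapping]
```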
Figure 4

A view looking down into the toroid at the putative active site. A gap between the dimers and a trough lead down toward the site. Surface regions contributed by conserved residues are colored red, and the metal ions are again depicted as silver balls. This figure was prepared using PyMOL [34].

The toroidal quaternary structure brings to mind many proteins involved in DNA metabolism. In a recent review [17], Hingorani and O'Donnell examine these proteins. They speculate that convergence on the toroidal shape provides an enclosed environment for otherwise chemically unfavorable reactions. The functions of these proteins include sliding clamps, helicases that catalyze ATP-fuelled DNA unwinding, and exonucleases and topoisomerases that chemically modify DNA. For instance, the exonuclease of bacteriophage λ is a trimer that forms a toroid with an inner diameter of 30 Å at one end and 15 Å at the other. The exonuclease encircles double-stranded DNA and processively hydrolyzes one of the two strands. It moves with a specific orientation and degrades the 5' strand, so that the product is the 3' strand [18]. By contrast, the ybgI toroid is symmetric, and its inner diameter is the same whether approached from above or below.

In a review of di-iron-carboxylate proteins (proteins with di-iron centers bridged by carboxylate residues and oxide/hydroxide groups) [19], the authors grouped the known structures into four structural classes. The first three classes are all variations on helix bundles. The fourth is the α/β sandwich class, which includes the purple acid phosphatases. These proteins have dimetal centers (Fe and Zn) that catalyze the hydrolysis of phosphate esters. An active-site tyrosine bound to the iron is responsible for the purple color; its oxygen is 2.2 Å from the iron atom. In ybgI, the closest tyrosine is 11 Å from the metal ions.

Conclusions

Several observations point in the same direction: the quaternary structure, the induced expression in response to DNA damage, the inclusion of ybgI in an operon with endonuclease VIII, and its sequence homology with the yeast NIF3 protein. Taken together, they appear consistent with a function in DNA repair or transcription. Comparison of the active site with known structures has not yet yielded a definitive clue to the specific biological function. Biochemical studies to further characterize the function of the ybgI protein are in progress.

The atomic coordinates and structure factors of the selenomethionine and native structures of ybgI have been deposited in the Protein Data Bank [20] with accession codes 1NMO and 1NMP, respectively.

Methods

Cloning, expression, and purification

The ybgI gene was amplified by the polymerase chain reaction (PCR) from Escherichia coli MG1655 genomic DNA. It was subcloned into the pDONR201 plasmid using Gateway Technology (Invitrogen). For expression, the coding sequence was transferred into the pDEST14 plasmid by site-specific recombination (Invitrogen). The protein was produced in E. coli strain BL21 Star (DE3) (Invitrogen) transformed with the pDEST14 construct. Cells were grown in LB medium containing 100 µg/mL ampicillin at 37°C to an A600 of 0.6 and induced with 1 mM isopropyl β-D-thiogalactoside for 3 hours. The protein was purified in two column chromatography steps using Source 30Q (Pharmacia) and Butyl-650M (Toyopearl).

Crystallization and structure determination

Crystals of the native protein and of the selenomethionine derivative were grown by hanging-drop vapor diffusion at room temperature. The reservoir solution for the native protein contained 0.1 M cacodylate buffer at pH 7.5, 0.1 M magnesium acetate, 15% (w/v) polyethylene glycol 8000 and 5% (v/v) polyethylene glycol 400. The reservoir solution for the selenomethionine protein contained 0.1 M imidazole buffer at pH 8.0, 0.2 M calcium acetate and 15% (w/v) polyethylene glycol 3350. The hanging drops were formed by combining equal volumes of protein solution and reservoir solution. The protein concentrations were 4.7 mg/mL for the native protein and 8.2 mg/mL for the selenomethionine protein. For data collection, the crystals were passed through a cryoprotectant mixture of reservoir solution and saturated lithium formate [21]. The native crystals used equal volumes of each. The selenomethionine crystals used two volumes of reservoir solution to one volume of saturated lithium formate.
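The mixing ratios above fix the compositions of the drops and cryosolutions by simple dilution. The worked illustration below assumes that volumes are additive. It also ignores the gradual concentration of the drops during vapor equilibration. Under these assumptions, each stock component in a mixture of volumes V_1 and V_2 is diluted by its volume fraction:

```latex
\begin{aligned}
c_{\text{mix}} &= c_{\text{stock}}\,\frac{V_{\text{stock}}}{V_{1}+V_{2}}\\[4pt]
\text{native cryosolution (1:1):}\quad f_{\text{sat}} &= \tfrac{1}{2}\ \text{of saturated lithium formate},\quad
c_{\text{PEG 8000}} = 15\% \times \tfrac{1}{2} = 7.5\%\ (\text{w/v})\\
\text{SeMet cryosolution (2:1):}\quad f_{\text{sat}} &= \tfrac{1}{3}\ \text{of saturated lithium formate},\quad
c_{\text{PEG 3350}} = 15\% \times \tfrac{2}{3} = 10\%\ (\text{w/v})\\
\text{drops (1:1), initial:}\quad c_{\text{protein}} &= \tfrac{4.7}{2} = 2.35\ \text{mg/mL (native)},\quad
\tfrac{8.2}{2} = 4.1\ \text{mg/mL (SeMet)}
\end{aligned}
```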

Diffraction data were collected at the Advanced Photon Source (APS) Southeast Regional Collaborative Access Team (SER-CAT) beamline 22ID-D at Argonne National Laboratory. All data were collected at 100 K. Data for the selenomethionine crystal were collected at three wavelengths (0.9795, 0.9793 and 0.9780 Å), and data for the native crystal at 0.9793 Å. The data were processed using d*TREK [22].
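Wavelengths are related to photon energies by E = hc/λ, with hc ≈ 12.3984 keV·Å. The short sketch below converts the three wavelengths used for the selenomethionine crystal; the helper names are hypothetical. All three lie close to the selenium K absorption edge near 12.66 keV, which is what makes a multiwavelength anomalous dispersion experiment possible. The text does not say which wavelength served as peak, inflection point or remote, so none is assigned here.

```python
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV * angstrom


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Photon energy (keV) for an X-ray wavelength in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def kev_to_wavelength(energy_kev: float) -> float:
    """X-ray wavelength (angstroms) for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


if __name__ == "__main__":
    for wl in (0.9795, 0.9793, 0.9780):  # SeMet wavelengths from the text
        print(f"{wl:.4f} A -> {wavelength_to_kev(wl):.3f} keV")
```

This prints approximately 12.658, 12.660 and 12.677 keV for the three wavelengths.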

The selenium sites were found with Shake-and-Bake (SnB) [23, 24]. The polypeptide has four methionine residues, and there are three dimers (six monomers) in the asymmetric unit. The 18 highest-ranked sites were entered into SOLVE [25], which chose the opposite hand and gave a solution with 21 sites. RESOLVE [26] could not find the correct noncrystallographic symmetry. Once the symmetry had been determined by visual and vector examination of the sites, RESOLVE built the backbone for 911 of the 1482 residues and placed 491 side chains. Superimposing the partial models of the six copies of the polypeptide chain yielded a nearly complete tracing. CNS [27] was used to refine this model against the data, and the noncrystallographic symmetry restraints were relaxed as refinement progressed. XtalView [28] was used to visualize the structure and to adjust the coordinates manually to improve their agreement with the electron density map. REDUCE and PROBE [29] guided rebuilding and helped resolve side-chain conformations, and PROCHECK [30] was used to validate the structures.
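The counts in this paragraph can be cross-checked with simple bookkeeping. Four methionines in each of six chains give 24 potential selenium sites. The 18 highest-ranked sites and the 21 sites in the SOLVE solution can be compared against that number. Likewise, 1482 residues in the asymmetric unit correspond to six chains of 247 residues, the chain length given earlier. The sketch below does this arithmetic with hypothetical helper names; it is not part of any of the cited programs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsuContents:
    chains: int              # polypeptide chains in the asymmetric unit
    met_per_chain: int       # methionines per chain (potential Se sites)
    residues_total: int      # residues in the asymmetric unit


def expected_se_sites(asu: AsuContents) -> int:
    return asu.chains * asu.met_per_chain


def residues_per_chain(asu: AsuContents) -> float:
    return asu.residues_total / asu.chains


def fraction(part: int, whole: int) -> float:
    return part / whole


if __name__ == "__main__":
    ybgi = AsuContents(chains=6, met_per_chain=4, residues_total=1482)  # values from the text
    print(expected_se_sites(ybgi), "potential Se sites")
    print(residues_per_chain(ybgi), "residues per chain")
    print(f"{fraction(911, 1482):.1%} of residues built by RESOLVE")
    print(f"{fraction(21, expected_se_sites(ybgi)):.1%} of potential sites in the SOLVE solution")
```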

The selenomethionine and native crystals are not isomorphous: their unit cells differ by more than 1% in the a and b dimensions. Consequently, the native structure was solved by molecular replacement with CNS, using the dimer as the search model. Refinement against the diffraction data was also carried out with CNS. As in the selenomethionine structure, noncrystallographic symmetry restraints were used throughout the refinement, but their weight was reduced after the initial rounds. The data and refinement statistics are shown in Table 1.
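The non-isomorphism criterion above is a difference of more than 1% in the a and b cell edges, and it can be written as a small check. The cell values in the example are hypothetical placeholders, not the ybgI cells. The 1% tolerance is the figure used in the text rather than a universal cutoff. How much cell change a phasing method tolerates depends on resolution and cell size.

```python
def relative_cell_differences(cell_1, cell_2):
    """Fractional differences of the cell edges (a, b, c), relative to cell_1."""
    return tuple(abs(y - x) / x for x, y in zip(cell_1, cell_2))


def non_isomorphous_edges(cell_1, cell_2, tolerance=0.01):
    """Names of the cell edges whose fractional difference exceeds `tolerance`."""
    diffs = relative_cell_differences(cell_1, cell_2)
    return [name for name, d in zip("abc", diffs) if d > tolerance]


if __name__ == "__main__":
    # Hypothetical cells in angstroms (not the measured ybgI values).
    native = (150.0, 150.0, 100.0)
    semet = (152.0, 152.0, 100.4)
    print([f"{d:.2%}" for d in relative_cell_differences(native, semet)])
    print(non_isomorphous_edges(native, semet))
```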
Table 1

X-Ray Data Processing and Refinement Statistics

Columns: ybgI native, ybgI SeMet

Diffraction data
space group
cell (a, b, c) (Å)
resolution (Å)
wavelength (Å)
no. of measured intensities
no. of unique reflections
mean redundancy
Rmerge (all/high res.)
completeness (all/high res.)
I/σ average (all/high res.)

Refinement
resolution limits used (Å)
R-factor (95% data)
Rfree (5% data)
amino acid residues/atoms
non-protein atoms: 11 Mg ions (native); 12 Fe ions (SeMet)
no. of water molecules
bond length rms deviation (Å)
angle rms deviation (°)
average B (main/side chain) (Å²)
average B, water (Å²)

Metal ion determination

X-ray fluorescence scans were performed at the absorption edges of Zn, Cu, Ni, Co and Fe. They were run at the Advanced Photon Source (APS) Industrial Macromolecular Crystallography Association Collaborative Access Team (IMCA-CAT) beamline 17-ID at Argonne National Laboratory, using solution samples of the native and SeMet proteins. The scans showed Fe, but no Zn, Cu, Ni or Co, in the SeMet protein, and none of these metals in the native protein solution.


Acknowledgements

We thank Celia Chen for consultations on the crystallographic packing and the selenomethionine substructure. This work was supported by National Institutes of Health grant No. P01-GM57890 and in part by an award from the W.M. Keck Foundation. Diffraction data were collected at the Southeast Regional Collaborative Access Team (SER-CAT) 22-ID beamline at the Advanced Photon Source, Argonne National Laboratory. Use of the Advanced Photon Source was supported by the U.S. Department of Energy, Office of Basic Energy Sciences, under Contract No. W-31-109-Eng-38.

Certain commercial materials, instruments, and equipment are identified in this manuscript in order to specify the experimental procedure as completely as possible. In no case does such identification imply that the materials, instruments, or equipment identified are necessarily the best available for the purpose.

The accepted SI units of concentration, mol/L, and of unified atomic mass unit, u, have been represented by the symbol M and by the symbol Da, respectively, in order to conform to the conventions of the journal.

Authors’ Affiliations

(1) Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute and the National Institute of Standards and Technology
(2) Genetics and Biochemistry Branch, NIDDK, National Institutes of Health
(3) Physical Sciences Department, Illinois Institute of Technology

References

  1. Pfam: Protein Families database of alignments and HMMs
  2. Martens JA, Genereaux J, Saleh A, Brandl CJ: Transcriptional Activation by Yeast PDR1p Is Inhibited by Its Association with NGG1p/ADA3p. J. Biol. Chem. 1996, 271: 15884–15890. 10.1074/jbc.271.16.9298
  3. Tascou S, Uedelhoven J, Dixkens C, Nayernia K, Engel W, Burfeind P: Isolation and characterization of a novel human gene, NIF3L1, and its mouse ortholog, Nif3l1, highly conserved from bacteria to mammals. Cytogenet. Cell Genet. 2000, 90: 330–336. 10.1159/000056799
  4. Khil PP, Camerini-Otero RD: Over 1000 genes are involved in the DNA damage response of Escherichia coli. Mol. Microbiol. 2002, 44: 89–105. 10.1046/j.1365-2958.2002.02878.x
  5. Eisenstein E, Gilliland GL, Herzberg O, Moult J, Orban J, Poljak RJ, Banergei L, Richardson D, Howard AJ: Biological function made crystal clear - annotation of hypothetical proteins via structural genomics. Curr Opin Biotechnol 2000, 11: 25–30. 10.1016/S0958-1669(99)00063-4
  6. Structure2Function Project
  7. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb J-F, Dougherty BA, Merrick JM, McKenney K, Sutton G, FitzHugh W, Fields C, Gocayne JD, Scott J, Shirley R, Liu L-I, Glodek A, Kelley JM, Weidman JF, Phillips CA, Spriggs T, Hedblom E, Cotton MD, Utterback TR, Hanna MC, Nguyen DT, Saudek DM, Brandon RC, Fine LD, Fritchman JL, Fuhrmann JL, Geoghagen NSM, Gnehm CL, McDonald LA, Small KV, Fraser CM, Smith HO, Venter JC: Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd. Science 1995, 269: 496.
  8. Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering 1998, 11: 739–747. 10.1093/protein/11.9.739
  9. Finding 3-D Similarities in Protein Structures
  10. Holm L, Sander C: Mapping the protein universe. Science 1996, 273: 595–602.
  11. The DALI server
  12. Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995, 247: 536–540. 10.1006/jmbi.1995.0159
  13. Structural Classification of Proteins
  14. Hendrickson WA, Horton JR, LeMaster DM: Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD); a vehicle for direct determination of three-dimensional structure. EMBO J. 1990, 9: 1665–1672.
  15. Auld DS: Zinc coordination sphere in biochemical zinc sites. Biometals 2001, 14: 271–313. 10.1023/A:1012976615056
  16. Gifford CM, Wallace SS: The genes encoding endonuclease VIII and endonuclease III in Escherichia coli are transcribed as the terminal genes in operons. Nucleic Acids Research 2000, 28: 762–769. 10.1093/nar/28.3.762
  17. Hingorani MM, O'Donnell M: A tale of toroids in DNA metabolism. Nat Rev Mol Cell Biol 2000, 1: 22–30. 10.1038/35036044
  18. Kovall R, Matthews BW: Toroidal structure of lambda-exonuclease. Science 1997, 277: 1824–1827. 10.1126/science.277.5333.1824
  19. Nordlund P, Eklund H: Di-iron-carboxylate proteins. Current Opinion in Structural Biology 1995, 5: 758–766. 10.1016/0959-440X(95)80008-5
  20. Protein Data Bank
  21. Rubinson KA, Ladner JE, Tordova M, Gilliland GL: Cryosalts: suppression of ice formation in macromolecular crystallography. Acta Crystallog. 2000, D56: 996–1001.
  22. Pflugrath JW: The finer things in X-ray diffraction data collection. Acta Crystallog. 1999, D55: 1718–1725.
  23. Blessing RH, Smith GD: Difference structure-factor normalization for heavy-atom or anomalous-scattering substructure determinations. J. Appl. Cryst. 1999, 32: 664–670. 10.1107/S0021889899003416
  24. Weeks CM, Miller R: The design and implementation of SnB v2.0. J. Appl. Cryst. 1999, 32: 120–124. 10.1107/S0021889898010504
  25. Terwilliger TC, Berendzen J: Automated MAD and MIR structure solution. Acta Crystallog. 1999, D55: 849–861.
  26. Terwilliger TC: Automated structure solution, density modification and model building. Acta Crystallog. 2002, D58: 1937–1940.
  27. Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang J-S, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL: Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallog. 1998, D54: 905–921.
  28. McRee DE: Practical Protein Crystallography. 2nd edition. San Diego, Academic Press 1999, 477.
  29. Word JM, Lovell SC, LaBean TH, Taylor HC, Zalis ME, Presley BK, Richardson JS, Richardson DC: Visualizing and Quantifying Molecular Goodness-of-Fit: Small-probe Contact Dots with Explicit Hydrogen Atoms. J. Mol. Biol. 1999, 285: 1711–1733. 10.1006/jmbi.1998.2400
  30. Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 1993, 26: 283–291. 10.1107/S0021889892009944
  31. Kraulis PJ: MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J. Applied Crystallography 1991, 24: 946–950. 10.1107/S0021889891004399
  32. Bacon DJ, Anderson WF: A Fast Algorithm for Rendering Space-filling Molecule Pictures. J. of Molecular Graphics 1988, 6: 219–220. 10.1016/S0263-7855(98)80030-1
  33. Merritt EA, Bacon DJ: Raster3D: Photorealistic Molecular Graphics. Methods in Enzymology (Edited by: Sweet RM and Carter CW Jr). San Diego, Academic Press 1997, 277: 505–524.
  34. DeLano WL: The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA 2002.


© Ladner et al; licensee BioMed Central Ltd 2003

This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.