Crystal structure of THEP1 from the hyperthermophile Aquifex aeolicus: a variation of the RecA fold
BMC Structural Biology volume 5, Article number: 7 (2005)
aaTHEP1, the gene product of aq_1292 from Aquifex aeolicus, shows sequence homology to proteins from most thermophiles, hyperthermophiles, and higher organisms such as man, mouse, and fly. In contrast, there are almost no homologous proteins in mesophilic unicellular microorganisms. aaTHEP1 is a thermophilic enzyme exhibiting both ATPase and GTPase activity in vitro. Although annotated as a nucleotide kinase, such an activity could not be confirmed for aaTHEP1 experimentally and the in vivo function of aaTHEP1 is still unknown.
Here we report the crystal structure of selenomethionine-substituted, nucleotide-free aaTHEP1 at 1.4 Å resolution, solved using a multiwavelength anomalous dispersion (MAD) phasing protocol. The protein is composed of a single domain that belongs to the family of 3-layer (α/β/α)-structures, consisting of nine central strands flanked by six helices. The closest structural homologue as determined by DALI is the RecA family. In contrast to the latter proteins, aaTHEP1 possesses an extension of the β-sheet consisting of four additional β-strands.
We conclude that the structure of aaTHEP1 represents a variation of the RecA fold. Although the catalytic function of aaTHEP1 remains unclear, structural details indicate that it does not belong to the group of GTPases, kinases or adenosyltransferases. A mainly positive electrostatic surface indicates that aaTHEP1 might be a DNA/RNA modifying enzyme. The resolved structure of aaTHEP1 can serve as a paradigm for the complete THEP1 family.
Comparative genomics led to the definition of 4873 clusters of orthologous groups of proteins (COGs) by comparing protein sequences encoded in (currently 66) completely sequenced genomes. Aimed at finding thermophile-specific proteins among bacteria, extended phylogenetic pattern searches based on the COG database were performed. Using this strategy, COG1618 was detected as a cluster containing proteins from all thermophilic and hyperthermophilic organisms but only one mesophilic organism [2–4]. Surprisingly, although also absent from unicellular eukaryotes, COG1618 homologues are present in many higher multicellular organisms such as Homo sapiens, Mus musculus, Danio rerio and Rattus norvegicus. Because of this unusual phylogenetic distribution, aaTHEP1, the gene product of aq_1292 from the hyperthermophilic bacterium Aquifex aeolicus, was characterised biochemically as the first member of the COG1618 proteins. The analysis revealed that aaTHEP1 is an NTPase catalyzing ATP and GTP hydrolysis at turnover rates of 5 × 10⁻³ s⁻¹ and 9 × 10⁻³ s⁻¹, respectively, with a Km in the micromolar range and a temperature optimum between 70 and 80°C. Although COG1618 proteins are annotated as "predicted nucleotide kinases", such an activity could not be confirmed for aaTHEP1 experimentally and its in vivo function remains unknown. To further characterize the function of aaTHEP1, we solved its three-dimensional structure by X-ray crystallography.
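To put the reported turnover rates into perspective, the short sketch below converts them into the mean time per catalytic cycle and the number of cycles per hour. It only restates the two values quoted above; the function names are illustrative, and the calculation assumes substrate saturation.

```python
# Turnover rates quoted in the text, in s^-1.
K_CAT_ATP = 5e-3
K_CAT_GTP = 9e-3


def seconds_per_turnover(k_cat: float) -> float:
    """Mean duration of one catalytic cycle, assuming substrate saturation."""
    if k_cat <= 0:
        raise ValueError("k_cat must be positive")
    return 1.0 / k_cat


def turnovers_per_hour(k_cat: float) -> float:
    """Number of catalytic cycles completed per hour at the given rate."""
    if k_cat <= 0:
        raise ValueError("k_cat must be positive")
    return k_cat * 3600.0


if __name__ == "__main__":
    for label, k_cat in (("ATP", K_CAT_ATP), ("GTP", K_CAT_GTP)):
        print(label,
              round(seconds_per_turnover(k_cat), 1), "s per turnover,",
              round(turnovers_per_hour(k_cat), 1), "turnovers per hour")
```

On these numbers, one molecule of aaTHEP1 hydrolyses roughly one ATP every 200 s and one GTP every 110 s.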
Results and discussion
Overall structure, domain class and architecture
Selenomethionine-substituted aaTHEP1 was purified as described earlier and eluted as a monomer from the final gel filtration column. Analysis of its nucleotide loading state using HPLC revealed that it was partially loaded with ADP (approx. 30%, data not shown). It was crystallized using PEG3350 as precipitant in the presence of KH2PO4 (see Methods) and crystals diffracted up to 1.4 Å using synchrotron radiation (see Table 1). Initial phases were obtained using a MAD phasing protocol (see Methods) and a model was built and refined. The final model has an Rcryst of 16.8% and an Rfree of 20.8% and contains one aaTHEP1 molecule in the asymmetric unit. The model includes 172 amino acid residues, 249 water molecules, one phosphate, one magnesium and two sodium ions. No electron density was found for residues D38-K43, which are part of a disordered loop.
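For readers less familiar with the quality indicators quoted above, the following minimal sketch shows how a crystallographic R-factor is conventionally computed from observed and calculated structure factor amplitudes. Rcryst and Rfree share this formula; Rfree is evaluated on a small test set of reflections excluded from refinement. The amplitudes in the usage example are invented for illustration and are not data from this study, and the scale factor is assumed to have been determined beforehand.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             scale: float = 1.0) -> float:
    """Conventional R-factor: sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - scale * abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


if __name__ == "__main__":
    # Invented toy amplitudes, for illustration only.
    observed = [100.0, 50.0, 25.0]
    calculated = [90.0, 55.0, 25.0]
    print(f"R = {100 * r_factor(observed, calculated):.1f}%")
```

A gap of a few percentage points between Rfree and Rcryst, as reported here (20.8% versus 16.8%), is common for well-refined structures.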
aaTHEP1 consists of a single compact domain, confirming the gel filtration experiments as well as the resistance of aaTHEP1 to limited proteolysis. It is built up of nine strands and six helices in the sequential order βαβββββαβααβαβα (Figures 1, 2, 3), which is in agreement with previously recorded CD spectra showing an equal ratio of β-sheets and α-helices. All nine strands form a single sheet in topological order 918723465 wherein a five-stranded parallel and a four-stranded antiparallel region can be distinguished (Figure 3). Whereas the parallel part of the sheet lies almost in a plane, its antiparallel region is curved, defining a convex (outer) and a concave (inner) side of the β-structure (Figure 2). Spatially restricted to the parallel region, two α-helices are located on the outer side of the sheet. In contrast, a set of four helices is distributed over the whole bent sheet on its inner side. This set consists of three large parallel α-helices in identical N- to C-orientation, which are accompanied by a further, much smaller 3₁₀-helix arranged perpendicular to them and located near their N-terminal ends. The edge of the antiparallel region of the sheet forms a small bent lid that covers this 3₁₀-helix.
In summary, the overall topology of aaTHEP1 is a central sheet with helical structures on each side. According to the CATH protein structure classification, aaTHEP1 is assigned to class 3.40.50.300, i.e. "P-loop containing nucleotide triphosphate hydrolases, homologous superfamilies with Rossmann fold topology", which are mixed alpha-beta proteins with 3-layer (α/β/α) sandwich architecture.
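The two topology strings given above can be checked mechanically. The sketch below counts the secondary structure elements in the sequential order and lists which strands lie next to each other in the sheet according to the topological order. It is an illustrative consistency check on the strings quoted in the text, not part of the original analysis; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Strings quoted in the text.
SSE_ORDER = "βαβββββαβααβαβα"   # sequential order of strands and helices
STRAND_ORDER = "918723465"       # spatial order of the strands in the sheet


def count_elements(sse_order: str) -> Tuple[int, int]:
    """Return (number of strands, number of helices)."""
    counts = Counter(sse_order)
    return counts["β"], counts["α"]


def sheet_neighbours(strand_order: str) -> List[Tuple[int, int]]:
    """Return pairs of strand numbers that are adjacent in the sheet."""
    strands = [int(ch) for ch in strand_order]
    if sorted(strands) != list(range(1, len(strands) + 1)):
        raise ValueError("strand order must use each strand number exactly once")
    return list(zip(strands, strands[1:]))


if __name__ == "__main__":
    print(count_elements(SSE_ORDER))
    print(sheet_neighbours(STRAND_ORDER))
```

The counts agree with the nine strands and six helices stated in the text, and each of the nine strand numbers occurs exactly once in the topological order.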
Structural alignments and fold classification
For comparison with other structures in the PDB, the DALI algorithm was employed. The closest homologue of aaTHEP1 was found to be cob(I)alamin adenosyltransferase (pdb-code: 1G5R, Z-score = 9.9), which catalyzes the final step in the conversion of vitamin B(12) to coenzyme B(12) and has a RecA-like protein fold. A comparison between the topologies of aaTHEP1, cob(I)alamin adenosyltransferase and RecA clearly shows the structural similarity (Fig. 3) despite only 9% sequence identity in the aligned region. In contrast to cob(I)alamin adenosyltransferase and RecA, aaTHEP1 contains an extension of its β-sheet consisting of strands β3-β6. We conclude that the structure of aaTHEP1 represents a variation of the RecA protein fold.
Topology of the P-loop
Although it is the closest DALI homologue, cob(I)alamin adenosyltransferase (CobA) differs significantly from aaTHEP1 within the P-loop (Figure 4). Whereas aaTHEP1 bears a P-loop typical of P-loop hydrolases, the P-loop of CobA is shorter by one amino acid, which flattens its structure. This is an essential feature for the adenosyl transfer reaction. Thus, we do not expect aaTHEP1 to catalyze an adenosyl transfer. A survey comparing sequences and structures of all P-loop-fold proteins led to the definition of two major divisions, the GK- and the ASCE-class of NTPases. Whereas the GK-class includes all GTPases and kinases, the ASCE-class includes all other NTPases. Structurally, the GK-class enzymes contain adjacent P-loop and Walker B strands. In contrast, the ASCE-proteins contain an additional strand between these two strands and a catalytically essential glutamate within the Walker B motif. Both features are found in aaTHEP1 (E107) and in the RecA superfamily, indicating that aaTHEP1 belongs neither to the GTPases nor to the kinase family.
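As an aside, the P-loop (Walker A motif) discussed here is usually recognised at the sequence level by a short consensus pattern. The sketch below scans a sequence for the commonly quoted form [AG]-x(4)-G-K-[ST]; other variants of the pattern are in use, so this is one reasonable choice rather than a definitive definition. The example sequence is made up for illustration and is not the aaTHEP1 sequence.

```python
import re
from typing import List, Tuple

# Commonly quoted Walker A (P-loop) consensus: [AG]-x(4)-G-K-[ST].
WALKER_A = re.compile(r"[AG].{4}GK[ST]")


def find_walker_a(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched segment) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, invented for this example.
    toy_sequence = "MKLAITGPPGSGKSTLAE"
    print(find_walker_a(toy_sequence))
```

A match of this kind only flags a candidate P-loop; as the comparison with CobA shows, the structural details of the loop determine which reaction it can support.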
The catalytic centre
No electron density for an ADP molecule was found, indicating that only the nucleotide-free protein crystallized. However, we found electron density for a phosphate ion in the putative nucleotide binding site where the β-phosphate of the nucleotide is expected. This is a common phenomenon, since negatively charged ions are often found in empty nucleotide binding sites.
In other ATPases and GTPases, the aspartate residue of the consensus site DxxG (D106 in aaTHEP1) is involved in positioning a water-bridged magnesium ion presumably important for nucleotide hydrolysis [10, 11]. In the nucleotide-free aaTHEP1, there is also a magnesium ion at the corresponding position, which is octahedrally coordinated by the hydroxyl group of T14 of the P-loop, a phosphate oxygen and four water molecules. One of these water molecules (W24) makes a hydrogen bond to D106. Thus, the arrangement of the magnesium ion is similar to that found in the nucleotide-bound conformation of other ATPases and GTPases.
To determine possible orientations of the nucleotide, which was biochemically shown to undergo hydrolysis, we constructed a superposition of aaTHEP1 with Ras complexed with GppNHp (pdb-code: 5P21) and RecA complexed with ADP (pdb-code: 1MO3) by aligning the P-loop including the preceding β-strand for spatial orientation (Figure 5). We then analyzed the resulting position of the nucleotides (GppNHp from Ras and ADP from RecA) relative to the aaTHEP1 surface (Figure 5). In both cases, the nucleotide would be located in a cleft of the aaTHEP1 surface and would not clash sterically with residues of aaTHEP1. The position of the phosphates is rather similar, whereas the orientation of the ribose and especially the position of the base differ markedly between ADP and GppNHp, although the base would be close to conserved residues in both orientations. We cannot predict the exact base orientation of the nucleotide bound to aaTHEP1, but it is very likely that the overall orientation of the nucleotide and the position of the phosphates are correctly predicted. Consequently, the large remaining cleft located adjacent to the predicted position of the γ-phosphate is unoccupied. The pocket itself is rather nonpolar but it is lined by a highly conserved patch of basic residues (Figure 5) to which a negatively charged cosubstrate, e.g. DNA/RNA, could bind.
The protein surface
The location of conserved residues in a protein structure often points to sites which are functionally important, e.g. the catalytic centre or conserved binding sites. To detect putative binding sites of aaTHEP1, we colour-coded the surface of aaTHEP1 with respect to the conservation of exposed amino acids. As can be seen in Figure 5, there is only one highly conserved region, located in and around a cleft of the protein surface which includes the Walker A motif (P-loop). We conclude that this particular region represents the functionally most important site, i.e. the nucleotide and cosubstrate binding site of aaTHEP1. Not a single amino acid residue conserved in all species aligned in Figure 1 can be detected on the remaining protein surface of aaTHEP1. For that reason, we conclude that binding of the physiological cosubstrate is restricted to the neighbourhood of the nucleotide binding pocket.
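The notion of a residue "conserved in all species aligned" can be made concrete with a simple column-wise test on a multiple sequence alignment. The sketch below returns the alignment columns in which every sequence carries the same residue and none has a gap. It is a simplified illustration: the alignment shown is a made-up toy example, not the alignment of Figure 1, and graded conservation scores, as typically used for surface colouring, are not modelled.

```python
from typing import List


def fully_conserved_columns(alignment: List[str], gap: str = "-") -> List[int]:
    """Return 0-based indices of columns identical in all sequences (no gaps)."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and gap not in column:
            conserved.append(i)
    return conserved


if __name__ == "__main__":
    # Hypothetical toy alignment, invented for this example.
    toy_alignment = [
        "GSGKT-LA",
        "GAGKSWLV",
        "GPGKTYLI",
    ]
    print(fully_conserved_columns(toy_alignment))
```

Mapping such column indices back onto solvent-exposed residues of the structure would then highlight conserved surface patches of the kind shown in Figure 5.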
Analysis of the electrostatic surface potential of aaTHEP1 revealed a striking number of positively charged clusters, whereas almost no negatively charged regions can be found (Figure 5). This is in agreement with the strong binding of aaTHEP1 to cation exchangers and its theoretical pI of 9.9. The largest positively charged spot is located in a conserved region close to the nucleotide binding cleft. Based on this observation and the similarity to the RecA protein, we speculate that aaTHEP1 may be a DNA or RNA modifying enzyme. Gene functions can be predicted by searching for the conservation of operons and gene orders because genes found in gene strings, particularly in multiple genomes, can be assumed to be functionally linked. For THEP1, we detected four genomes (Aeropyrum pernix K1, Archaeoglobus fulgidus DSM 4304, Thermoplasma acidophilum DSM 1728 and Thermoplasma volcanium GSS1) where the THEP1 gene is immediately followed by a COG1867 protein on the same strand. In Pyrococcus furiosus, this protein is characterized as an N2,N2-dimethylguanosine tRNA methyltransferase. Thus, aaTHEP1 may also play a role in tRNA modification. Furthermore, both COG1867 proteins and THEP1 proteins can be considered to belong to the group of PACE proteins (proteins from Archaea without assigned function that are conserved in Eukarya). PACE proteins are described as being involved in fundamental cellular functions and several of them are apparently related to RNA metabolism.
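The gene-order argument above amounts to asking whether a gene of one COG is immediately followed, on the same strand, by a gene of another COG. A minimal sketch of such a test is given below. It assumes each genome is available as a list of genes in ascending coordinate order, annotated with COG identifier and strand; the gene records are invented toy data, and the function name is hypothetical.

```python
from typing import List, Tuple

# (locus name, COG identifier, strand "+" or "-"), in ascending coordinate order.
Gene = Tuple[str, str, str]


def has_downstream_neighbour(genes: List[Gene], first: str, second: str) -> bool:
    """True if a `first` gene is directly followed by a `second` gene on the same strand."""
    for i, (_, cog, strand) in enumerate(genes):
        if cog != first:
            continue
        # Downstream means the next gene on "+" and the previous gene on "-".
        j = i + 1 if strand == "+" else i - 1
        if 0 <= j < len(genes):
            _, neighbour_cog, neighbour_strand = genes[j]
            if neighbour_cog == second and neighbour_strand == strand:
                return True
    return False


if __name__ == "__main__":
    # Invented toy genomes, for illustration only.
    genome_a = [("a1", "COG0001", "+"), ("a2", "COG1618", "+"), ("a3", "COG1867", "+")]
    genome_b = [("b1", "COG1867", "+"), ("b2", "COG1618", "+"), ("b3", "COG0002", "-")]
    genome_c = [("c1", "COG1867", "-"), ("c2", "COG1618", "-")]
    for name, genome in (("a", genome_a), ("b", genome_b), ("c", genome_c)):
        print(name, has_downstream_neighbour(genome, "COG1618", "COG1867"))
```

Finding the same adjacency in several unrelated genomes, as reported for the four archaea above, is what makes a functional link plausible.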
The human homologue
The human homologue MGC13186 (hsTHEP1) shows 39% sequence identity to aaTHEP1 (Figure 1) and was first described in a study aiming at identifying full-length ORFs for all human and mouse genes. No function has yet been described for this protein. However, gene profiling data from UniGene are available. hsTHEP1 is widely expressed in most of the examined tissues including brain, heart, lymph node, skin and pancreas, whereas no expression was found in blood, thymus, bladder and spleen. It is especially highly expressed in embryonic and various tumour tissues. From these data we conclude that hsTHEP1 has a general function in many human tissues.
The crystal structure of aaTHEP1 uncovered a modified RecA-like fold. Although the function of aaTHEP1 remains unclear, the structure led us to conclude that the enzyme does not belong to the group of GTPases, kinases or adenosyltransferases. Analysis of the electrostatic surface potential revealed several positively charged clusters indicating the presence of putative nucleic acid binding sites. Since aaTHEP1 has homologues in thermophilic bacteria and vertebrates, it can serve as a model for the complete COG1618 protein family.
To aid a consistent nomenclature of the THEP1 protein family, we propose to adopt the name THEP1 for all members across species, e.g. hsTHEP1 for the human protein, mmTHEP1 for the mouse protein, etc.
Crystallization, data collection, processing, structure solution, refinement and validation
Recombinant aaTHEP1 was purified from Escherichia coli as described earlier. Bacteria were grown in minimal media without methionine containing 50 mg/l L-selenomethionine. Crystals of the dimensions 250 × 80 × 35 μm³ were obtained by the hanging drop method after mixing equal volumes of 13 mg/ml aaTHEP1 with reservoir buffer containing 15% PEG-3350 and 0.1 M potassium dihydrogen phosphate. For cryo-protection, crystals were soaked for 10 s in 30% PEG-3350, 200 mM potassium dihydrogen phosphate and flash-frozen in liquid nitrogen. The diffraction data were collected at the Swiss Light Source (SLS) from a single crystal. Data were processed and scaled using XDS and XSCALE. The positions of the three selenium sites in the asymmetric unit were determined using SHELXD. Those positions were refined and the electron density of the protein was calculated by SHARP. Solvent flattening and histogram matching were done by SOLOMON and DM. ARP/WARP was used to automatically build 85% of the backbone and sidechains. For further model interpretation, XFIT (XtalView) was used. Refinements were made with Refmac. PROCHECK and Whatcheck were used to validate the structure. Secondary structures were calculated using DSSP [32, 33]. DALI and GRATH searches were carried out, and further structural comparisons were done using SSAP. BLAST searches were also performed. Figure 1 was prepared using GeneDoc. All figures depicting structures were prepared using PyMol or Swiss pdb-viewer [42, 43]. The X-ray coordinates and structure factors have been deposited in the PDB database under pdb-code 1YE8.
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4(1):41.
Klinger C, Rossbach M, Howe R, Kaufmann M: Thermophile-specific proteins: the gene product of aq_1292 from Aquifex aeolicus is an NTPase. BMC Biochem 2003, 4(1):12.
Makarova KS, Wolf YI, Koonin EV: Potential genomic determinants of hyperthermophily. Trends Genet 2003, 19(4):172–176.
Meereis F, Kaufmann M: PCOGR: Phylogenetic COG ranking as an online tool to judge the specificity of COGs with respect to freely definable groups of organisms. BMC Bioinformatics 2004, 5(1):150.
Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH--a hierarchic classification of protein domain structures. Structure 1997, 5(8):1093–1108.
Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol 1993, 233(1):123–138.
Bauer CB, Fonseca MV, Holden HM, Thoden JB, Thompson TB, Escalante-Semerena JC, Rayment I: Three-dimensional structure of ATP:corrinoid adenosyltransferase from Salmonella typhimurium in its free state, complexed with MgATP, or complexed with hydroxycobalamin and MgATP. Biochemistry 2001, 40(2):361–374.
Leipe DD, Koonin EV, Aravind L: Evolution and classification of P-loop kinases and related proteins. J Mol Biol 2003, 333(4):781–815.
Ghosh A, Uthaiah R, Howard J, Herrmann C, Wolf E: Crystal structure of IIGP1: a paradigm for interferon-inducible p47 resistance GTPases. Mol Cell 2004, 15(5):727–739.
Walker JE, Saraste M, Runswick MJ, Gay NJ: Distantly related sequences in the alpha- and beta-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. Embo J 1982, 1(8):945–951.
Pai EF, Krengel U, Petsko GA, Goody RS, Kabsch W, Wittinghofer A: Refined crystal structure of the triphosphate conformation of H-ras p21 at 1.35 A resolution: implications for the mechanism of GTP hydrolysis. Embo J 1990, 9(8):2351–2359.
Pai EF, Kabsch W, Krengel U, Holmes KC, John J, Wittinghofer A: Structure of the guanine-nucleotide-binding domain of the Ha-ras oncogene product p21 in the triphosphate conformation. Nature 1989, 341(6239):209–214.
Datta S, Ganesh N, Chandra NR, Muniyappa K, Vijayan M: Structural studies on MtRecA-nucleotide complexes: insights into DNA and nucleotide binding and the structural signature of NTP recognition. Proteins 2003, 50(3):474–485.
Ma B, Elkayam T, Wolfson H, Nussinov R: Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc Natl Acad Sci U S A 2003, 100(10):5772–5777.
Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res 2001, 11(3):356–372.
Constantinesco F, Motorin Y, Grosjean H: Characterisation and enzymatic properties of tRNA(guanine 26, N (2), N (2))-dimethyltransferase (Trm1p) from Pyrococcus furiosus. J Mol Biol 1999, 291(2):375–392.
Matte-Tailliez O, Zivanovic Y, Forterre P: Mining archaeal proteomes for eukaryotic proteins with novel functions: the PACE case. Trends Genet 2000, 16(12):533–536.
Armengaud J, Urbonavicius J, Fernandez B, Chaussinand G, Bujnicki JM, Grosjean H: N2-methylation of guanosine at position 10 in tRNA is catalyzed by a THUMP domain-containing, S-adenosylmethionine-dependent methyltransferase, conserved in Archaea and Eukaryota. J Biol Chem 2004, 279(35):37142–37152.
Strausberg RL, Feingold EA, Grouse LH, Derge JG, Klausner RD, Collins FS, Wagner L, Shenmen CM, Schuler GD, Altschul SF, Zeeberg B, Buetow KH, Schaefer CF, Bhat NK, Hopkins RF, Jordan H, Moore T, Max SI, Wang J, Hsieh F, Diatchenko L, Marusina K, Farmer AA, Rubin GM, Hong L, Stapleton M, Soares MB, Bonaldo MF, Casavant TL, Scheetz TE, Brownstein MJ, Usdin TB, Toshiyuki S, Carninci P, Prange C, Raha SS, Loquellano NA, Peters GJ, Abramson RD, Mullahy SJ, Bosak SA, McEwan PJ, McKernan KJ, Malek JA, Gunaratne PH, Richards S, Worley KC, Hale S, Garcia AM, Gay LJ, Hulyk SW, Villalon DK, Muzny DM, Sodergren EJ, Lu X, Gibbs RA, Fahey J, Helton E, Ketteman M, Madan A, Rodrigues S, Sanchez A, Whiting M, Madan A, Young AC, Shevchenko Y, Bouffard GG, Blakesley RW, Touchman JW, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Butterfield YS, Krzywinski MI, Skalska U, Smailus DE, Schnerch A, Schein JE, Jones SJ, Marra MA: Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci U S A 2002, 99(26):16899–16903.
Pontius JU, Wagner L, Schuler GD: UniGene: a unified view of the transcriptome. In The NCBI Handbook. Bethesda (MD) , National Center for Biotechnology Information; 2003.
Van Duyne GD, Standaert RF, Karplus PA, Schreiber SL, Clardy J: Atomic structures of the human immunophilin FKBP-12 complexes with FK506 and rapamycin. J Mol Biol 1993, 229(1):105–124.
Kabsch W: Automatic Processing of Rotation Diffraction Data from Crystals of Internally Unknown Symmetry and Cell Constants. J Appl Cryst 1993, 26: 795–800.
Schneider TR, Sheldrick GM: Substructure solution with SHELXD. Acta Crystallogr D Biol Crystallogr 2002, 58(Pt 10 Pt 2):1772–1779.
de La Fortelle E, Bricogne G: Maximum-Likelihood Heavy-Atom Parameter Refinement in the MIR and MAD Methods. In Methods in Enzymology, Macromolecular Crystallography. Volume 276. Edited by: Sweet RM, Carter CW. New York , Academic Press; 1997:472–494.
Abrahams JP, Leslie AWG: Methods used in structure determination of bovine mitochondrial F1 ATPase. Acta Cryst 1996, D52: 30–42.
Collaborative Computational Project, Number 4: The CCP4 Suite: Programs for Protein Crystallography. Acta Cryst 1994, D50: 760–763.
Lamzin VS, Perrakis A: Current state of automated crystallographic data analysis. Nat Struct Biol 2000, 7 Suppl: 978–981.
McRee DE: XtalView/Xfit--A versatile program for manipulating atomic coordinates and electron density. J Struct Biol 1999, 125(2–3):156–165.
Murshudov GN: Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr 1997, 53(Pt 3):240–255.
Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 1993, 26: 283–291.
Hooft RW, Vriend G, Sander C, Abola EE: Errors in protein structures. Nature 1996, 381(6580):272.
Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637.
EMBL Dali: email server for 3-D protein structure database searches [http://www.ebi.ac.uk/dali/]
Harrison A, Pearl F, Sillitoe I, Slidel T, Mott R, Thornton J, Orengo C: Recognizing the fold of a protein structure. Bioinformatics 2003, 19(14):1748–1759.
Taylor WR, Orengo CA: Protein structure alignment. J Mol Biol 1989, 208(1):1–22.
PyMOL Home Page [http://pymol.sourceforge.net/]
DeepView - Swiss PDB Viewer Home Page [http://www.expasy.org/spdbv/]
Guex N, Peitsch MC: SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 1997, 18(15):2714–2723.
We are grateful to the machine and beamline groups whose outstanding efforts have made these experiments possible. We would like to thank Dr. John Doe for his support in setting up the beamline and Dr. Karin Muller for her help in analyzing our data. We also wish to thank Dr. Ilme Schlichting for collecting the data at the SLS, Drs Michael Weyand and Ingrid Vetter for their help in data analysis and Astrid Böhm for carrying out the fermentations.
MR carried out the protein expression, purification and crystallization experiments and participated in structure determination and data analysis. OD participated in structure determination and data analysis. CK participated in data analysis. AW and MK conceived of the study, and participated in its design and coordination. MK drafted the manuscript. All authors read and approved the final manuscript.
Michael Roßbach, Oliver Daumke contributed equally to this work.