- Research article
- Open Access
Crystal structure of THEP1 from the hyperthermophile Aquifex aeolicus: a variation of the RecA fold
- Michael Roßbach†1,
- Oliver Daumke†2,
- Claudia Klinger1,
- Alfred Wittinghofer2 and
- Michael Kaufmann1
© Roßbach et al; licensee BioMed Central Ltd. 2005
Received: 18 December 2004
Accepted: 20 March 2005
Published: 20 March 2005
aaTHEP1, the gene product of aq_1292 from Aquifex aeolicus, shows sequence homology to proteins from most thermophiles, hyperthermophiles, and higher organisms such as man, mouse, and fly. In contrast, there are almost no homologous proteins in mesophilic unicellular microorganisms. aaTHEP1 is a thermophilic enzyme exhibiting both ATPase and GTPase activity in vitro. Although annotated as a nucleotide kinase, such an activity could not be confirmed for aaTHEP1 experimentally and the in vivo function of aaTHEP1 is still unknown.
Here we report the crystal structure of selenomethionine-substituted, nucleotide-free aaTHEP1 at 1.4 Å resolution, solved using a multi-wavelength anomalous dispersion (MAD) phasing protocol. The protein is composed of a single domain that belongs to the family of 3-layer (α/β/α)-structures consisting of nine central strands flanked by six helices. The closest structural homologue as determined by DALI is the RecA family. In contrast to the latter proteins, aaTHEP1 possesses an extension of the β-sheet consisting of four additional β-strands.
We conclude that the structure of aaTHEP1 represents a variation of the RecA fold. Although the catalytic function of aaTHEP1 remains unclear, structural details indicate that it does not belong to the group of GTPases, kinases or adenosyltransferases. A mainly positive electrostatic surface indicates that aaTHEP1 might be a DNA/RNA modifying enzyme. The resolved structure of aaTHEP1 can serve as a paradigm for the complete THEP1 family.
Comparative genomics led to the definition of 4873 clusters of orthologous groups of proteins (COGs) by comparing protein sequences encoded in (currently 66) completely sequenced genomes. To find thermophile-specific proteins among bacteria, extended phylogenetic pattern searches based on the COG database were performed. Using this strategy, COG1618 was detected as a cluster containing proteins from all thermophilic and hyperthermophilic but only one mesophilic organism [2–4]. Surprisingly, although also absent from unicellular eukaryotes, COG1618 homologs are present in many higher multicellular organisms such as Homo sapiens, Mus musculus, Danio rerio, Rattus norvegicus, etc. Because of this unusual phylogenetic distribution, aaTHEP1, the gene product of aq_1292 from the hyperthermophilic bacterium Aquifex aeolicus, was characterised biochemically as the first member of the COG1618 proteins. The analysis revealed that aaTHEP1 is an NTPase catalyzing ATP and GTP hydrolysis at turnover rates of 5 × 10⁻³ s⁻¹ and 9 × 10⁻³ s⁻¹, respectively, with a Km in the micromolar range and a temperature optimum between 70 and 80°C. Although COG1618 proteins are annotated as "predicted nucleotide kinases", such an activity could not be confirmed for aaTHEP1 experimentally and its in vivo function remains unknown. To further characterize the function of aaTHEP1, we solved its three-dimensional structure by X-ray crystallography.
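To put the reported turnover rates in perspective, the reciprocal of a turnover rate gives the mean time one enzyme molecule needs per hydrolysis event at substrate saturation. The following Python sketch performs only this arithmetic; the function name is ours and purely illustrative.

```python
def seconds_per_turnover(k_cat_per_s: float) -> float:
    """Mean time (s) for one catalytic cycle at substrate saturation."""
    if k_cat_per_s <= 0:
        raise ValueError("turnover rate must be positive")
    return 1.0 / k_cat_per_s


# Turnover rates reported in the text (s^-1).
K_CAT = {"ATP": 5e-3, "GTP": 9e-3}

for nucleotide, k_cat in K_CAT.items():
    print(f"{nucleotide}: {seconds_per_turnover(k_cat):.0f} s per hydrolysis event")
```

With the values above, one aaTHEP1 molecule would hydrolyse roughly one ATP every 200 s and one GTP every 111 s, which illustrates how slow the in vitro activity is.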
Results and discussion
Overall structure, domain class and architecture
Data collection and refinement statistics. A summary of all relevant crystallographic parameters during data collection and the refinement procedures is shown.
Beamline: Swiss Light Source X06SA
Unit cell: a = 35.0 Å, b = 64.2 Å, c = 39.6 Å; α = 90.0°, β = 105.2°, γ = 90.0°
Matthews coefficient: 2.17 Å³/Da
Wavelengths: λpeak = 0.97625 Å, λinfl = 0.97980 Å
Resolution: 20 - 1.4 Å, 20 - 1.4 Å
R symm a,b total
R symm a,b last shell
I/σ(I) last shell
Anomalous phasing power: λinfl = 1.7, λpeak = 1.8
Anomalous phasing power, last shell: λinfl = 0.46, λpeak = 0.48
FOM last shell
FOM after solvent flattening
FOM after solvent flattening, last shell
Refinement resolution: 18.2 - 1.4 Å
Reflections unique (test set)
Number of amino acids
Number of atoms
Number of water molecules
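As a plausibility check on the cell parameters and the 2.17 Å³/Da value listed above, the unit-cell volume of a cell with α = γ = 90° is V = a·b·c·sin(β), and the Matthews coefficient is V divided by the total protein mass in the cell. The Python sketch below is a minimal illustration; the number of molecules per cell (Z_ASSUMED) is our assumption and is not given in the table.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Volume (Å^3) of a unit cell with α = γ = 90°: V = a·b·c·sin(β)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def protein_mass_per_cell(volume: float, matthews_coefficient: float) -> float:
    """Total protein mass (Da) per unit cell, from V_M = V / (Z · M)."""
    return volume / matthews_coefficient


volume = monoclinic_cell_volume(35.0, 64.2, 39.6, 105.2)
mass_per_cell = protein_mass_per_cell(volume, 2.17)

Z_ASSUMED = 2  # assumption: two molecules per unit cell, not stated in the table
print(f"cell volume: {volume:.0f} Å^3")
print(f"mass per molecule if Z = {Z_ASSUMED}: {mass_per_cell / Z_ASSUMED / 1000:.1f} kDa")
```

Under this assumption the cell volume is close to 8.6 × 10⁴ Å³ and the implied mass is about 20 kDa per molecule, which is in the range expected for a small single-domain protein.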
In summary, the overall topology of aaTHEP1 is a central sheet with helical structures on each side. According to the CATH protein structure classification, aaTHEP1 is assigned to class 3.40.50.300, i. e. "P-loop containing nucleotide triphosphate hydrolases, homologous superfamilies with Rossmann fold topology", which are mixed alpha-beta proteins with 3-layer (α/β/α) sandwich architecture.
Structural alignments and fold classification
For comparison with other structures in the pdb-database, the DALI algorithm was employed. The closest homologue of aaTHEP1 was found to be cob(I)alamin adenosyltransferase (pdb-code: 1G5R, Z-score = 9.9), which catalyzes the final step in the conversion of vitamin B(12) to coenzyme B(12) and has a RecA-like protein fold. A comparison between the topologies of aaTHEP1, cob(I)alamin adenosyltransferase and RecA clearly shows the structural similarity (Fig. 3) despite only 9% sequence identity in the aligned region. In contrast to cob(I)alamin adenosyltransferase and RecA, aaTHEP1 contains an extension of its β-sheet consisting of strands β3-β6. We conclude that the structure of aaTHEP1 represents a variation of the RecA protein fold.
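Sequence identity over an aligned region, as quoted above, is commonly computed as the share of identical residue pairs among the aligned (non-gap) columns. The Python sketch below illustrates that definition on a toy alignment; the two sequences are hypothetical and the choice of denominator is an assumption, since different tools normalise differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns that are not aligned residue pairs
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / compared


# Hypothetical toy alignment, not taken from the structures discussed here.
print(percent_identity("MKIT-GEPGV", "MRLTAGQ-GV"))
```

In this toy case eight columns are compared and five are identical, giving 62.5%.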
Topology of the P-loop
The catalytic centre
No electron density for an ADP molecule was found, indicating that only the nucleotide-free protein crystallized. However, we found electron density for a phosphate ion in the putative nucleotide binding site where the β-phosphate of the nucleotide is expected. This is a common phenomenon, since negatively charged ions are often found in empty nucleotide binding sites.
In other ATPases and GTPases, the aspartate residue of the consensus site DxxG (D106 in aaTHEP1) is involved in positioning a water-bridged magnesium ion presumably important for nucleotide hydrolysis [10, 11]. In the nucleotide-free aaTHEP1, there is also a magnesium ion at the corresponding position, which is octahedrally coordinated to the hydroxyl group of T14 of the P-loop, a phosphate oxygen and four water molecules. One of these water molecules (W24) makes a hydrogen bond to D106. Thus, the arrangement of the magnesium ion is similar to that found in the nucleotide-bound conformation of other ATPases and GTPases.
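Sequence motifs such as the P-loop (Walker A, commonly written GxxxxGK[S/T]) and DxxG can be located with simple pattern matching. The Python sketch below is a minimal illustration; the regular expressions encode the usual consensus forms, and the example sequence is a hypothetical toy string, not the aaTHEP1 sequence.

```python
import re

# Consensus patterns; "." stands for any residue (x in the motif notation).
MOTIFS = {
    "Walker A": r"G.{4}GK[ST]",
    "DxxG": r"D..G",
}


def find_motifs(sequence: str) -> dict:
    """Return {motif name: [(1-based start, matched residues), ...]}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
    return hits


# Hypothetical toy sequence used only to exercise the function.
toy_sequence = "MKIIITGEPGVGKTTLVKKIAELVIDEAGKM"
print(find_motifs(toy_sequence))
```

A pattern hit only suggests a candidate site; whether the residues actually coordinate the nucleotide or the magnesium ion has to be judged from the structure.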
The protein surface
The location of conserved residues in a protein structure often points to sites which are functionally important, e. g. the catalytic centre or conserved binding sites. To detect putative binding sites of aaTHEP1, we colour coded the surface of aaTHEP1 with respect to the conservation of exposed amino acids. As can be seen in Figure 5, there is only one highly conserved region, located in and around a cleft of the protein surface which includes the Walker A motif (P-loop). We conclude that this particular region represents the functionally most important site, i. e. the nucleotide and cosubstrate binding site of aaTHEP1. On the remaining protein surface of aaTHEP1, not a single amino acid residue is conserved in all species aligned in Figure 1. For that reason, we conclude that binding of the physiological cosubstrate is restricted to the neighbourhood of the nucleotide binding pocket.
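Conservation-based colouring of this kind starts from a per-position conservation score derived from a multiple sequence alignment. The Python sketch below shows one simple scoring scheme, the fraction of sequences sharing the most common residue in each column; this scheme and the toy alignment are our assumptions and not necessarily the procedure used for Figure 5.

```python
from collections import Counter


def column_conservation(alignment: list) -> list:
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ("-") never count towards the most common residue.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [res for res in column if res != "-"]
        top_count = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top_count / len(column))
    return scores


def invariant_positions(alignment: list) -> list:
    """1-based positions at which every sequence carries the same residue."""
    return [i + 1 for i, score in enumerate(column_conservation(alignment)) if score == 1.0]


# Hypothetical toy alignment of a P-loop-like segment.
toy_alignment = ["GEPGVGKT", "GPPGSGKT", "GLPGVGKS"]
print(invariant_positions(toy_alignment))
```

Mapping such scores onto the solvent-exposed residues of the structure then highlights conserved surface patches like the one around the P-loop.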
Analysis of the electrostatic surface potential of aaTHEP1 revealed a striking number of positively charged clusters, whereas almost no negatively charged regions can be found (Figure 5). This is in agreement with the strong binding of aaTHEP1 to cation exchangers and its theoretical pI of 9.9. The largest positively charged spot is located in a conserved region close to the nucleotide binding cleft. Based on this observation and the similarity to the RecA protein, we speculate that aaTHEP1 may be a DNA or RNA modifying enzyme. Gene functions can be predicted by searching for the conservation of operons and gene orders, because genes found in gene strings, particularly in multiple genomes, can be assumed to be functionally linked. For THEP1, we detected 4 genomes (Aeropyrum pernix K1, Archaeoglobus fulgidus DSM 4304, Thermoplasma acidophilum DSM 1728 and Thermoplasma volcanium GSS1) where the THEP1 gene is immediately followed by a COG1867 protein on the same strand. In Pyrococcus furiosus, this protein is characterized as an N2,N2-dimethylguanosine tRNA methyltransferase. Thus, aaTHEP1 may also play a role in tRNA modification. Furthermore, both COG1867 proteins and THEP1 proteins can be considered to belong to the group of PACE proteins (proteins from Archaea without assigned function that are conserved in Eukarya). PACE proteins are described as being involved in fundamental cellular functions, and several of them are evidently related to RNA metabolism.
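The gene-order criterion used here, one gene immediately followed by another on the same strand, can be expressed as a scan over adjacent gene pairs. The Python sketch below is a minimal illustration; the Gene record, the function name and the toy genomes are hypothetical and only the COG identifiers come from the text.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    cog: str
    strand: str  # "+" or "-"


def is_followed_by(genes: list, upstream: str, downstream: str) -> bool:
    """True if `upstream` is directly followed by `downstream` on the same strand.

    `genes` must be ordered by position along the chromosome.
    """
    for left, right in zip(genes, genes[1:]):
        if left.strand != right.strand:
            continue
        # On the minus strand, transcription runs from right to left.
        first, second = (left, right) if left.strand == "+" else (right, left)
        if first.cog == upstream and second.cog == downstream:
            return True
    return False


# Hypothetical toy genomes, not real annotations.
toy_genomes = {
    "genome_A": [Gene("COG0001", "+"), Gene("COG1618", "+"), Gene("COG1867", "+")],
    "genome_B": [Gene("COG1867", "-"), Gene("COG1618", "-")],
    "genome_C": [Gene("COG1618", "+"), Gene("COG1867", "-")],
}
linked = [name for name, genes in toy_genomes.items()
          if is_followed_by(genes, "COG1618", "COG1867")]
print(linked)
```

The more genomes in which such a pair recurs, the stronger the case for a functional link between the two genes.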
The human homologue
The human homologue MGC13186 (hsTHEP1) shows 39% sequence identity to aaTHEP1 (Figure 1) and was first described in a study aiming at identifying full-length ORFs for all human and mouse genes. No function has yet been described for this protein. However, gene profiling data from UniGene are available. hsTHEP1 is expressed in most of the examined tissues, including brain, heart, lymph node, skin and pancreas, whereas no expression was found in blood, thymus, bladder and spleen. It is especially highly expressed in embryonic and various tumour tissues. From these data we conclude that hsTHEP1 has a general function in many human tissues.
The crystal structure of aaTHEP1 uncovered a modified RecA-like fold. Although the function of aaTHEP1 remains unclear, the structure led us to conclude that the enzyme does not belong to the group of GTPases, kinases or adenosyltransferases. Analysis of the electrostatic surface potential revealed several positively charged clusters indicating the presence of putative nucleic acid binding sites. Since aaTHEP1 has homologues in thermophilic bacteria and vertebrates, it can serve as a model for the complete COG1618 protein family.
To aid a consistent nomenclature of the THEP1 protein family, we propose to adopt the name THEP1 for all members across the species, e.g. hsTHEP1 for the human protein, mmTHEP1 for the mouse protein, etc.
Crystallization, data collection, processing, structure solution, refinement and validation
Recombinant aaTHEP1 was purified from Escherichia coli as described earlier. Bacteria were grown in minimal media without methionine containing 50 mg/l L-selenomethionine. Crystals with dimensions of 250 × 80 × 35 μm³ were obtained by the hanging drop method after mixing equal volumes of 13 mg/ml aaTHEP1 with reservoir buffer containing 15 % PEG-3350 and 0.1 M potassium dihydrogen phosphate. For cryo-protection, crystals were soaked for 10 sec in 30 % PEG-3350, 200 mM potassium dihydrogen phosphate and flash-frozen in liquid nitrogen. The diffraction data were collected at the Swiss Light Source (SLS) from a single crystal. Data were processed and scaled using XDS and XSCALE. The positions of the three selenium sites in the asymmetric unit were determined using SHELXD. Those positions were refined and the electron density of the protein calculated by SHARP. Solvent flattening and histogram matching were done by SOLOMON and DM. ARP/wARP was used to automatically build 85% of the backbone and sidechains. For further model interpretation XtalView/Xfit was used. Refinements were made with Refmac. PROCHECK and Whatcheck were used to validate the structure. Secondary structures were calculated using DSSP [32, 33]. DALI searches were carried out at the EMBL Dali server (http://www.ebi.ac.uk/dali/), GRATH searches at the GRATH server (http://www.biochem.ucl.ac.uk/cgi-bin/cath/Grath.pl), and further structural comparisons using SSAP were done at the SSAP server (http://www.biochem.ucl.ac.uk/cgi-bin/cath/GetSsapRasmol.pl). BLAST was performed at NCBI (http://www.ncbi.nlm.nih.gov/BLAST/). Figure 1 was prepared using GeneDoc, available at http://www.psc.edu/biomed/genedoc/. All figures depicting structures were prepared using PyMol or Swiss pdb-viewer [42, 43]. The X-ray coordinates and structure factors have been deposited in the PDB database under pdb-code 1YE8.
We are grateful to the machine and beamline groups whose outstanding efforts have made these experiments possible. We would like to thank Dr. John Doe for his support in setting up the beamline and Dr. Karin Muller for her help in analyzing our data. We also wish to thank Dr. Ilme Schlichting for collecting the data at the SLS, Drs Michael Weyand and Ingrid Vetter for their help in data analysis and Astrid Böhm for carrying out the fermentations.
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4(1):41.
- Klinger C, Rossbach M, Howe R, Kaufmann M: Thermophile-specific proteins: the gene product of aq_1292 from Aquifex aeolicus is an NTPase. BMC Biochem 2003, 4(1):12.
- Makarova KS, Wolf YI, Koonin EV: Potential genomic determinants of hyperthermophily. Trends Genet 2003, 19(4):172–176.
- Meereis F, Kaufmann M: PCOGR: Phylogenetic COG ranking as an online tool to judge the specificity of COGs with respect to freely definable groups of organisms. BMC Bioinformatics 2004, 5(1):150.
- Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH--a hierarchic classification of protein domain structures. Structure 1997, 5(8):1093–1108.
- Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol 1993, 233(1):123–138.
- Bauer CB, Fonseca MV, Holden HM, Thoden JB, Thompson TB, Escalante-Semerena JC, Rayment I: Three-dimensional structure of ATP:corrinoid adenosyltransferase from Salmonella typhimurium in its free state, complexed with MgATP, or complexed with hydroxycobalamin and MgATP. Biochemistry 2001, 40(2):361–374.
- Leipe DD, Koonin EV, Aravind L: Evolution and classification of P-loop kinases and related proteins. J Mol Biol 2003, 333(4):781–815.
- Ghosh A, Uthaiah R, Howard J, Herrmann C, Wolf E: Crystal structure of IIGP1: a paradigm for interferon-inducible p47 resistance GTPases. Mol Cell 2004, 15(5):727–739.
- Walker JE, Saraste M, Runswick MJ, Gay NJ: Distantly related sequences in the alpha- and beta-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. Embo J 1982, 1(8):945–951.
- Pai EF, Krengel U, Petsko GA, Goody RS, Kabsch W, Wittinghofer A: Refined crystal structure of the triphosphate conformation of H-ras p21 at 1.35 A resolution: implications for the mechanism of GTP hydrolysis. Embo J 1990, 9(8):2351–2359.
- Pai EF, Kabsch W, Krengel U, Holmes KC, John J, Wittinghofer A: Structure of the guanine-nucleotide-binding domain of the Ha-ras oncogene product p21 in the triphosphate conformation. Nature 1989, 341(6239):209–214.
- Datta S, Ganesh N, Chandra NR, Muniyappa K, Vijayan M: Structural studies on MtRecA-nucleotide complexes: insights into DNA and nucleotide binding and the structural signature of NTP recognition. Proteins 2003, 50(3):474–485.
- Ma B, Elkayam T, Wolfson H, Nussinov R: Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc Natl Acad Sci U S A 2003, 100(10):5772–5777.
- Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res 2001, 11(3):356–372.
- Constantinesco F, Motorin Y, Grosjean H: Characterisation and enzymatic properties of tRNA(guanine 26, N (2), N (2))-dimethyltransferase (Trm1p) from Pyrococcus furiosus. J Mol Biol 1999, 291(2):375–392.
- Matte-Tailliez O, Zivanovic Y, Forterre P: Mining archaeal proteomes for eukaryotic proteins with novel functions: the PACE case. Trends Genet 2000, 16(12):533–536.
- Armengaud J, Urbonavicius J, Fernandez B, Chaussinand G, Bujnicki JM, Grosjean H: N2-methylation of guanosine at position 10 in tRNA is catalyzed by a THUMP domain-containing, S-adenosylmethionine-dependent methyltransferase, conserved in Archaea and Eukaryota. J Biol Chem 2004, 279(35):37142–37152.
- Strausberg RL, Feingold EA, Grouse LH, Derge JG, Klausner RD, Collins FS, Wagner L, Shenmen CM, Schuler GD, Altschul SF, Zeeberg B, Buetow KH, Schaefer CF, Bhat NK, Hopkins RF, Jordan H, Moore T, Max SI, Wang J, Hsieh F, Diatchenko L, Marusina K, Farmer AA, Rubin GM, Hong L, Stapleton M, Soares MB, Bonaldo MF, Casavant TL, Scheetz TE, Brownstein MJ, Usdin TB, Toshiyuki S, Carninci P, Prange C, Raha SS, Loquellano NA, Peters GJ, Abramson RD, Mullahy SJ, Bosak SA, McEwan PJ, McKernan KJ, Malek JA, Gunaratne PH, Richards S, Worley KC, Hale S, Garcia AM, Gay LJ, Hulyk SW, Villalon DK, Muzny DM, Sodergren EJ, Lu X, Gibbs RA, Fahey J, Helton E, Ketteman M, Madan A, Rodrigues S, Sanchez A, Whiting M, Madan A, Young AC, Shevchenko Y, Bouffard GG, Blakesley RW, Touchman JW, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Butterfield YS, Krzywinski MI, Skalska U, Smailus DE, Schnerch A, Schein JE, Jones SJ, Marra MA: Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci U S A 2002, 99(26):16899–16903.
- Pontius JU, Wagner L, Schuler GD: UniGene: a unified view of the transcriptome. In The NCBI Handbook. Bethesda (MD), National Center for Biotechnology Information; 2003.
- Van Duyne GD, Standaert RF, Karplus PA, Schreiber SL, Clardy J: Atomic structures of the human immunophilin FKBP-12 complexes with FK506 and rapamycin. J Mol Biol 1993, 229(1):105–124.
- Kabsch W: Automatic Processing of Rotation Diffraction Data from Crystals of Internally Unknown Symmetry and Cell Constants. J Appl Cryst 1993, 26: 795–800.
- Schneider TR, Sheldrick GM: Substructure solution with SHELXD. Acta Crystallogr D Biol Crystallogr 2002, 58(Pt 10 Pt 2):1772–1779.
- de La Fortelle E, Bricogne G: Maximum-Likelihood Heavy-Atom Parameter Refinement in the MIR and MAD Methods. In Methods in Enzymology, Macromolecular Crystallography. Volume 276. Edited by: Sweet RM, Carter CW. New York, Academic Press; 1997:472–494.
- Abrahams JP, Leslie AWG: Methods used in structure determination of bovine mitochondrial F1 ATPase. Acta Cryst 1996, D52: 30–42.
- Collaborative Computational Project N: The CCP4 Suite: Programs for Protein Crystallography. Acta Cryst 1994, D50: 760–763.
- Lamzin VS, Perrakis A: Current state of automated crystallographic data analysis. Nat Struct Biol 2000, 7 Suppl: 978–981.
- McRee DE: XtalView/Xfit--A versatile program for manipulating atomic coordinates and electron density. J Struct Biol 1999, 125(2–3):156–165.
- Murshudov GN: Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr 1997, 53(Pt 3):240–255.
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 1993, 26: 283–291.
- Hooft RW, Vriend G, Sander C, Abola EE: Errors in protein structures. Nature 1996, 381(6580):272.
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637.
- EMBL Dali: email server for 3-D protein structure database searches[http://www.ebi.ac.uk/dali/]
- Harrison A, Pearl F, Sillitoe I, Slidel T, Mott R, Thornton J, Orengo C: Recognizing the fold of a protein structure. Bioinformatics 2003, 19(14):1748–1759.
- GRATH Server[http://www.biochem.ucl.ac.uk/cgi-bin/cath/Grath.pl]
- Taylor WR, Orengo CA: Protein structure alignment. J Mol Biol 1989, 208(1):1–22.
- SSAP Server[http://www.biochem.ucl.ac.uk/cgi-bin/cath/GetSsapRasmol.pl]
- NCBI BLAST[http://www.ncbi.nlm.nih.gov/BLAST/]
- GeneDoc HomePage[http://www.psc.edu/biomed/genedoc/]
- PyMOL Home Page[http://pymol.sourceforge.net/]
- DeepView - Swiss PDB Viewer Home Page[http://www.expasy.org/spdbv/]
- Guex N, Peitsch MC: SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 1997, 18(15):2714–2723.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.