Structural and phylogenetic analysis of a conserved actinobacteria-specific protein (ASP1; SCO1997) from Streptomyces coelicolor
© Gao et al; licensee BioMed Central Ltd. 2009
Received: 26 January 2009
Accepted: 10 June 2009
Published: 10 June 2009
The Actinobacteria phylum represents one of the largest and most diverse groups of bacteria, encompassing many important and well-characterized organisms including Streptomyces, Bifidobacterium, Corynebacterium and Mycobacterium. Members of this phylum are remarkably diverse in terms of life cycle, morphology, physiology and ecology. Recent comparative genomic analysis of 19 actinobacterial species determined that only 5 genes of unknown function uniquely define this large phylum. The cellular functions of these actinobacteria-specific proteins (ASP) are not known.
Here we report the first characterization of one of the 5 actinobacteria-specific proteins, ASP1 (Gene ID: SCO1997) from Streptomyces coelicolor. The X-ray crystal structure of ASP1 was determined at 2.2 Å. The overall structure of ASP1 retains a similar fold to the large NP-1 family of nucleoside phosphorylase enzymes; however, the function is not related. Further comparative analysis revealed two regions expected to be important for protein function: a central, divalent metal ion binding pore, and a highly conserved elbow-shaped helical region at the C-terminus. Sequence analyses revealed that ASP1 is paralogous to another actinobacteria-specific protein, ASP2 (SCO1662 from S. coelicolor), and that both proteins likely carry out a similar function.
Our structural data in combination with sequence analysis support the idea that two of the 5 actinobacteria-specific proteins, ASP1 and ASP2, mediate a similar function. This function is predicted to be novel since the structures of these proteins do not match any known protein with or without known function. Our results suggest that this function could involve divalent metal ion binding/transport.
Actinobacteria constitute one of the main phyla within the Bacteria and they are highly diverse in terms of their morphology, physiology and ecology [2–5]. These bacteria are characterized by high G+C content (greater than 55 mol%) [3, 4] and a monoderm cell structure (i.e. bounded by a single membrane) [6, 7]. They include Streptomyces, the major antibiotic producers in the pharmaceutical industry, as well as many important human, animal and plant pathogens, such as Mycobacterium, Tropheryma, Nocardia, Propionibacterium and Leifsonia. However, except for their clustering in the 16S rRNA tree, no molecular, biochemical or physiological characteristics are known that can clearly distinguish species belonging to the phylum Actinobacteria from other bacteria [8, 9].
Comparative analyses of genomic sequences are enabling identification of novel genetic characteristics that are unique to different groups of bacteria. Large numbers of proteins and conserved indels (inserts and deletions) that are specific for various prokaryotic groups such as Archaea, Chlamydiae, Bacteroidetes-Chlorobi, Proteobacteria, etc. have been identified [10–14]. Our recent comparative genomic studies on available actinobacterial genomes have identified a large number of proteins that are either specific for all actinobacterial species or certain subgroups within this phylum. Blast searches with these proteins show no significant hits or similarity to any other protein in the databases. These proteins thus provide novel and useful molecular markers for this diverse group of bacteria. Among these actinobacteria-specific proteins, five proteins (corresponding to ML1009, ML1306, ML1029, ML0257 and ML0642 from the genome of Mycobacterium leprae TN) were found in every sequenced actinobacterial species, including those from the deepest branch, Rubrobacter xylanophilus, and also from intracellular pathogens such as Tropheryma whipplei, which have highly reduced genomes [9, 15]. All five of these proteins are conserved within actinobacteria but have no known function. These five actinobacteria-specific proteins are referred to in this work as ASP-1, 2, 3, 4 and 5. The simplest and most logical explanation for the persistence of these proteins in only actinobacteria is that their genes evolved only once in a common ancestor of all actinobacteria and were subsequently passed on to all of its descendants. These genes/proteins therefore provide some of the very few molecular characteristics known to be distinctive of the Actinobacteria phylum [1, 8, 16, 17]. In view of their actinobacteria-specificity, it is of great interest to determine the cellular functions of these proteins and the cellular processes in which they participate. These studies are expected to provide novel insights into biochemical processes and physiological characteristics that are unique to actinobacteria.
In an attempt to gain insight into the cellular functions of these proteins, we have initiated structural work on these 5 actinobacteria-specific proteins. We report here the crystal structure of SCO1997 from S. coelicolor, which corresponds to the protein ML1009 from M. leprae (ASP1). Structural and phylogenetic analysis indicates that although ASP1 retains a similar overall fold compared to members of the hydrolase superfamily such as purine nucleoside phosphorylase, the active site region and therefore function of ASP1 are distinct [18, 19]. Comparison of the most highly conserved sequences of ASP1 from different actinobacteria with their positions in the crystal structure reveals a potential role for ASP1 in binding and transport of divalent metal ions. Interestingly, additional sequence and structural analyses show that another actinobacteria-specific protein, ASP2 (SCO1662; ML1306), is evolutionarily and functionally related to ASP1.
Results and discussion
Crystal Structure of ASP1 from S. coelicolor
The protein ASP1 is of hypothetical or unknown function. Genes involved in related functions (e.g. those that are part of an operon) are generally clustered together within a species and in closely related species. Thus, genetic linkage studies can often provide valuable clues regarding the possible cellular function of a given gene/protein [20–22]. Hence, we have examined the neighboring genes of ASP1 in various sequenced actinobacteria. The genes flanking ASP1 in different actinobacterial genomes are either of unknown function or perform unrelated functions. The information for these flanking genes is presented in Additional file 1, and it provides no clue regarding the possible cellular function of this protein.
Crystallographic data and model refinement statistics.
a, b, c (Å): 135.1, 135.1, 135.1; 135.4, 135.4, 135.4
α, β, γ (°): 90, 90, 90; 90, 90, 90
Bond lengths (Å)
Bond angles (°)
Assembly of the ASP1 trimer results in the formation of a roughly globular complex (diameter ~70 Å) with three notable features (Figure 3). First, one side of the trimer adopts a very flat surface, forming what could perhaps function as a large docking interface. The electrostatic potential on this surface is quite neutral, having only a small amount of basic potential. A second unusual feature of the ASP1 trimer is the presence of a large internal cavity (~7500 Å³) surrounded by a three-pronged claw-like structure. Given the size of this cavity and the overall claw-like structure that surrounds it, it is quite possible that this region acts as a binding surface for another protein(s) and/or substrate. The electrostatic surface potential of each claw is negative, creating an overall acidic surface on the internal cavity region of ASP1.
Although it is tempting to speculate that the presence of two Mg2+ ions in the central pore region of ASP1 suggests a role for ASP1 in metal transport, there is no direct evidence to support this idea. Furthermore, a structural comparison of ASP1 with CorA, a well characterized Mg2+ transporter whose homologs are present in S. coelicolor and various actinobacteria [23, 24], shows no obvious similarity between these proteins (results not shown). Therefore, if ASP1 function does involve some aspect of Mg2+ binding and/or transport, it does not appear to be similar to that conducted by CorA.
Structural Comparisons of ASP1
To further characterize the structure of ASP1 and gain insight into its possible function, we performed a comparative structural analysis using the program DaliLite version 3. This analysis revealed significant structural similarity to a homologue from Corynebacterium glutamicum (GeneID: Ncgl1848) [PDB: 2P90], as well as several bacterial purine nucleoside phosphorylases and a number of other glycosidic hydrolases from the larger NP-1 family.
Comparison of ASP1 from S. coelicolor and C. glutamicum
As expected, structural comparison of ASP1 from S. coelicolor and C. glutamicum showed a high degree of conservation (root mean square deviation (RMSD): 1.6 Å). Importantly, the structure of ASP1 from C. glutamicum crystallized as a trimer that is identical to the trimer reported here for ASP1 from S. coelicolor. This finding, along with our gel filtration data, provides additional support for the trimeric structure of ASP1 generated through crystallographic symmetry. Another important observation from the comparison with the structure from C. glutamicum is the structural conservation of the metal binding pore despite the absence of bound metal ion. The fact that the pore region adopts an identical structure even when a metal ion is not present provides strong evidence that the binding of metal is not simply required for the structural integrity of the ASP1 trimer.
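As an illustration of the RMSD figure quoted above, the following is a minimal sketch of how an RMSD is computed between two sets of paired atomic coordinates. It assumes the two structures have already been superimposed and that atoms are matched one-to-one; the function name and the toy coordinates are hypothetical and are not taken from the deposited models (3E35, 2P90) or from DaliLite, which performs its own alignment.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)
    points. Assumes the points are already superimposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in Å (illustration only).
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_b = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(round(rmsd(model_a, model_b), 3))
```

In this toy case every atom is displaced by exactly 1 Å, so the RMSD is 1 Å. A real comparison would first require an optimal superposition of the equivalent Cα atoms, which this sketch deliberately leaves out.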
Comparison of ASP1 and PNP
In contrast, the following evidence strongly suggests that ASP1 does not function as a nucleoside phosphorylase. First, a large region of PNP responsible for forming an entire side of its active site cleft (residues ~100–180 encompassing β7–8–9 and αC-D-E; Figure 7B) is completely missing in the ASP1 structure, rendering ASP1 incapable of binding nucleoside. Second, a sequence alignment (Additional file 3) of ASP1 homologues fails to identify any of the highly conserved residues involved in substrate binding or catalysis within the NP-1 family. Furthermore, from sequence and structural alignments it is equally clear that those regions of ASP1 which are most highly conserved are not present within NP-1 family members. Finally, a PNP homologue in the S. coelicolor genome has already been identified (SCO4917) and shows no significant similarity to ASP1. Taken together, the observations from both sequence and structural comparison indicate that while ASP1 and PNP share similar overall structure and topology, their functions are different.
Phylogenetic Analysis of ASP1 and ASP2
Of the five actinobacteria-specific genes previously identified through comparative genomic analysis of 19 actinobacterial species, two genes, ASP1 (SCO1997; ML1009) and ASP2 (SCO1662; ML1306), appear to encode structurally related proteins. These proteins have comparable length and share significant sequence similarity (25% identity and 43% similarity). The question remains: are these two conserved actinobacteria-specific proteins functionally related?
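For readers unfamiliar with how identity and similarity percentages of this kind are obtained, the sketch below computes both from a pair of already aligned sequences. The similarity groups, the choice to ignore gapped columns, and the toy sequences are all assumptions made for illustration; the text does not state which scoring scheme or alignment program produced the 25% and 43% values.

```python
# Assumed groups of "similar" amino acids; the scheme behind the reported
# values is not stated in the text.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def are_similar(a, b):
    """True if two different residues fall in the same assumed group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(seq_a, seq_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped. Similarity
    counts identical positions as well as conservative substitutions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif are_similar(a, b):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Hypothetical aligned fragments, not real ASP1/ASP2 sequences.
ident, simil = identity_and_similarity("MKV-LDE", "MRIALEQ")
print(round(ident, 2), round(simil, 2))
```

Different conventions (for example, dividing by the full alignment length instead of the ungapped columns) give different percentages, so values from different tools are not directly comparable.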
Sequence alignment of ASP1 and ASP2 homologues demonstrates that important residues which are highly conserved in ASP1 homologues and likely involved in protein function are also conserved in ASP2 homologues (Additional file 4). As highlighted in Additional file 4, 8 of the 15 absolutely conserved residues from ASP1 homologues are also absolutely conserved amongst ASP2 homologues. The remaining 7 are still highly conserved and are only substituted with similar amino acids (Additional file 4). This finding further underscores the importance of these residues in mediating the function of both paralogs. As stated earlier, amino acids that fall within the category of absolutely conserved and solvent exposed are particularly predictive of regions important for mediating interactions with other functionally important molecules [26–29]. D71 is most interesting in this regard because it not only fits this category, but is also found bound to two magnesium ions in the ASP1 structure. We know that the binding of magnesium is not required for overall structural stability since the structure of ASP1 from C. glutamicum does not contain metal ion. The precise function of this region within ASP1 and ASP2 will require further investigation.
The Actinobacteria phylum represents one of the largest groups of bacteria. Amazingly, this diverse collection of bacteria can be characterized genetically to a first approximation by the presence of only 5 unique genes. All 5 of these genes are of unknown function, but they are expected to encode function(s) that ultimately control important actinobacteria-specific biological process(es). Understanding the cellular function of a protein of unknown function is not a straightforward task [20, 30]. However, structure determination often provides the most useful information in this regard [20, 30]. In this work, we report the structure of the first actinobacteria-specific protein. Our structural data in combination with sequence analysis further support the idea that this protein carries out a novel function. This function is novel in the sense that the structure of this protein does not match any known protein, with or without known function. Given the immense number of structures that are now available and the wide coverage of function, it is reasonable to propose that ASP1 may mediate a function highly specific to Actinobacteria. Although it is unclear from the structural data alone, it seems possible that ASP1 function may involve some aspect of divalent metal ion interaction. It will be intriguing to determine what contribution, if any, this highly conserved 'pore' region makes toward ASP1 function. Our phylogenetic analysis also shows that another actinobacteria-specific protein, ASP2, which is a paralogue of ASP1, may also have similar structure and function. Future genetic and biochemical studies of these proteins are therefore of great interest in linking the conserved biology of actinobacteria to their 5 unique genes.
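The kind of comparison described above (residues absolutely conserved in one group of homologues and also in the other) can be sketched as follows. This is an illustrative outline only: it assumes both groups come from one common multiple alignment so that column indices correspond, and the function name and toy sequences are hypothetical, not the alignment in Additional file 4.

```python
def conserved_columns(aligned_seqs):
    """Return {column index: residue} for columns in which every sequence
    carries the same non-gap residue (absolutely conserved columns)."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    conserved = {}
    for i in range(length):
        column = {seq[i] for seq in aligned_seqs}
        if len(column) == 1 and "-" not in column:
            conserved[i] = next(iter(column))
    return conserved


# Hypothetical toy groups sharing one alignment (same column numbering).
group_one = ["MDAKL", "MDGKL", "MDSKL"]
group_two = ["MDARI", "MDAHI", "MDAKI"]

in_one = conserved_columns(group_one)
in_two = conserved_columns(group_two)

# Columns absolutely conserved in both groups with the same residue.
shared = sorted(i for i, res in in_one.items() if in_two.get(i) == res)
print(in_one, in_two, shared)
```

Columns conserved in the first group but substituted in the second (columns 3 and 4 in the toy data) would then be inspected separately to see whether the substitutions are conservative.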
Protein Expression and Purification
The ASP1 gene (SCO1997) from S. coelicolor A3(2) was cloned into the pET-22b vector and expressed in E. coli BL21(DE3) as a full-length recombinant protein with a C-terminal (His)6-tag. SeMet protein was expressed in the methionine auxotroph E. coli B834 using a previously described method. For expression of both native and SeMet derivatized ASP1, cells were grown at 37°C to an OD600 of ~0.6; induced with 1 mM isopropyl beta-D-thiogalactopyranoside (IPTG); harvested after 4 h; resuspended in a binding buffer containing 20 mM Tris, pH 7.4, 500 mM NaCl and 10 mM imidazole; lysed in a French pressure cell; and clarified by centrifugation. Supernatant was loaded on a 1 mL Ni-column, washed with 200 mL binding buffer along with 36 mM imidazole, and finally eluted at 300 mM imidazole. The eluted proteins were diluted 5-fold with buffer A (20 mM Tris, pH 7.5) and loaded onto a 5 mL HiTrap Q HP anion exchange column (Amersham) for further purification. Proteins were eluted with a 120 mL linear gradient from 50 to 500 mM NaCl. ASP1 eluted as a single peak at ~260 mM NaCl. Individual fractions from across the peak were pooled and buffer exchanged into a low-salt buffer (25 mM KCl, 10 mM HEPES, pH 7.5) for crystallization. The buffer used for gel filtration chromatography contained 20 mM Tris (pH 7.4) and 200 mM KCl.
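As a worked example of the anion exchange step, the short sketch below estimates how far into the linear gradient the ~260 mM NaCl elution point falls. It assumes an ideal linear gradient with no system delay volume or mixing effects; the function name is hypothetical and the calculation is an illustration, not part of the original protocol.

```python
def gradient_volume_at(conc_mM, start_mM, end_mM, gradient_mL):
    """Volume into an ideal linear gradient at which a given salt
    concentration is reached (delay volume and mixing are ignored)."""
    low, high = min(start_mM, end_mM), max(start_mM, end_mM)
    if not low <= conc_mM <= high:
        raise ValueError("concentration lies outside the gradient range")
    return (conc_mM - start_mM) / (end_mM - start_mM) * gradient_mL


# Values from the text: 120 mL gradient, 50 to 500 mM NaCl, peak at ~260 mM.
slope_mM_per_mL = (500 - 50) / 120
elution_mL = gradient_volume_at(260, 50, 500, 120)
print(round(slope_mM_per_mL, 2), round(elution_mL, 1))
```

Under these assumptions the gradient rises by 3.75 mM NaCl per mL, and the ASP1 peak would appear roughly 56 mL into the 120 mL gradient.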
Crystallization and Data Collection of ASP1
All crystals were grown at 17°C using the hanging drop/vapour diffusion method. Hanging drops containing 1 μL of protein solution (5 mg/mL) and 1 μL of mother liquor (0.1 M MES, 0.55 M magnesium formate, pH 6.5–6.8, 0.25–0.5% n-octyl-beta-D-glucoside, 0–1.5% glycerol) were dehydrated over a reservoir containing 800 μL of 1.5 M (NH4)2SO4. Cubic crystals (100 × 100 × 100 μm), suitable for data collection, grew after approximately 3 days of incubation. Crystals were flash frozen directly in a nitrogen cold stream (100 K) with no further cryo-protection. Diffraction data sets for native and SeMet crystals were collected at wavelengths of 1.1 and 0.979 Å, respectively. All data were collected at the X25 beamline using an ADSC Q315 CCD X-ray detector (NSLS, Brookhaven, NY).
Structure Determination and Model Refinement
SAD data collected to 2.0 Å were processed using d*TREK. All 5 of the expected SeMet sites were located using HYSS [33, 34]. Phasing and density modification were carried out using CNS. Iterative rounds of manual model building and refinement were performed with Coot and REFMAC5 until R and Rfree values converged and could no longer be improved [36, 37]. The coordinates of the final ASP1 model were deposited in the Protein Data Bank under accession code 3E35. Surface area calculations were performed using the program PISA version 1.15. Structure similarity searches were performed with the DaliLite program v3. Structural illustrations presented in figures were generated with PyMOL.
Phylogenetic analyses were carried out based on sequence alignments for ASP1 and ASP2 homologous genes from 18 actinobacterial species. Among these selected species, only 8 contain one of the two genes, while the others contain both gene copies. Multiple sequence alignments were created using ClustalX version 1.83. The alignment was then imported into TREE-PUZZLE version 5.2 for maximum-likelihood (ML) analysis using the WAG+F model with gamma distribution of evolutionary rates with four categories [41, 42].
This work was supported by research grants from the Canadian Institutes of Health Research to RSG (MOP-19391) and to MSJ (MOP-89903). Salary support for BG was provided by a research grant to RSG. SSM was supported by a scholarship from the Natural Sciences and Engineering Research Council of Canada.
- Gao B, Paramanathan R, Gupta RS: Signature proteins that are distinctive characteristics of Actinobacteria and their subgroups. Antonie van Leeuwenhoek 2006, 90: 69–91.
- Garrity GM, Bell JA, Lilburn TG: The Revised Road Map to the Manual. In Bergey's Manual of Systematic Bacteriology, Part A, Introductory Essays. Volume 2. Edited by: Brenner DJ, Krieg NR, Staley JT. New York: Springer; 2005:159–220.
- Ventura M, Canchaya C, Tauch A, Chandra G, Fitzgerald GF, Chater KF, van Sinderen D: Genomics of Actinobacteria: Tracing the evolutionary history of an ancient phylum. Microbiol Mol Biol Rev 2007, 71(3):495–548.
- Stackebrandt E, Schumann P: Introduction to the taxonomy of actinobacteria. In Prokaryotes. Edited by: Dworkin M. Springer New York; 2006:297–321.
- Ventura M, Canchaya C, Fitzgerald GF, Gupta RS, van Sinderen D: Genomics as a means to understand bacterial phylogeny and ecological adaptation: the case of bifidobacteria. Antonie van Leeuwenhoek 2007, 91: 351–372.
- Gupta RS: Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes. Microbiol Mol Biol Rev 1998, 62: 1435–1491.
- Gupta RS: The natural evolutionary relationships among prokaryotes. Crit Rev Microbiol 2000, 26: 111–131.
- Gao B, Gupta RS: Conserved indels in protein sequences that are characteristic of the phylum Actinobacteria. Int J Syst Evol Microbiol 2005, 55: 2401–2412.
- Stackebrandt E, Rainey FA, Ward-Rainey NL: Proposal for a new hierarchic classification system, Actinobacteria classis nov. Int J Syst Bacteriol 1997, 47: 479–491.
- Gupta RS, Griffiths E: Chlamydiae-specific proteins and indels: novel tools for studies. Trends Microbiol 2006, 14: 527–535.
- Gao B, Gupta RS: Phylogenomic analysis of proteins that are distinctive of Archaea and its main subgroups and the origin of methanogenesis. BMC Genomics 2007, 8: 86.
- Gupta RS, Lorenzini E: Phylogeny and molecular signatures (conserved proteins and indels) that are specific for the Bacteroidetes and Chlorobi species. BMC Evolutionary Biology 2007, 7: 71.
- Gao B, Mohan R, Gupta RS: Phylogenomics and protein signatures elucidating the evolutionary relationships among the Gammaproteobacteria. Int J Syst Evol Microbiol 2009, 59: 234–247.
- Gupta RS, Mok A: Phylogenomics and signature proteins for the alpha Proteobacteria and its main groups. BMC Microbiol 2007, 7: 106.
- Raoult D, Ogata H, Audic S, Robert C, Suhre K, Drancourt M, Claverie JM: Tropheryma whipplei twist: A human pathogenic Actinobacteria with a reduced genome. Genome Res 2003, 13: 1800–1809.
- Roller C, Ludwig W, Schleifer KH: Gram-positive bacteria with a high DNA G+C content are characterized by a common insertion within their 23S rRNA genes. J Gen Microbiol 1992, 138(6):1167–1175.
- Zhi XY, Li WJ, Stackebrandt E: An update of the structure and 16S rRNA gene sequence-based definition of higher ranks of the class Actinobacteria, with the proposal of two new suborders and four new families and emended descriptions of the existing higher taxa. Int J Syst Evol Microbiol 2009, 59: 589–608.
- Pugmire MJ, Ealick SE: Structural analyses reveal two distinct families of nucleoside phosphorylases. Biochemical Journal 2002, 361: 1–25.
- Mao C, Cook WJ, Zhou M, Koszalka GW, Krenitsky TA, Ealick SE: The crystal structure of Escherichia coli purine nucleoside phosphorylase: a comparison with the human enzyme reveals a conserved topology. Structure 1997, 5: 1373–1383.
- Danchin A: From protein sequence to function. Curr Opin Struct Biol 1999, 9: 363–367.
- Galperin MY, Koonin EV: Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol 2000, 18: 609–613.
- Doerks T, von Mering C, Bork P: Functional clues for hypothetical proteins based on genomic context analysis in prokaryotes. Nucleic Acids Res 2004, 32: 6321–6326.
- Lunin VV, Dobrovetsky E, Khutoreskaya G, Zhang R, Joachimiak A, Doyle DA, Bochkarev A, Maguire ME, Edwards AM, Koth CM: Crystal structure of the CorA Mg2+ transporter. Nature 2006, 440: 833–837.
- Payandeh J, Pai EF: A structural basis for Mg2+ homeostasis and the CorA translocation cycle. EMBO J 2006, 25: 3762–3773.
- Holm L, Kaariainen S, Rosenstrom P, Schenkel A: Searching protein structure databases with DaliLite v.3. Bioinformatics 2008, 24: 2780–2781.
- Schueler-Furman O, Baker D: Conserved residue clustering and protein structure prediction. Proteins 2003, 52: 225–235.
- George RA, Spriggs RV, Bartlett GJ, Gutteridge A, MacArthur MW, Porter CT, Al Lazikani B, Thornton JM, Swindells MB: Effective function annotation through catalytic residue conservation. Proc Natl Acad Sci USA 2005, 102: 12299–12304.
- Livingstone CD, Barton GJ: Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Comput Appl Biosci 1993, 9: 745–756.
- Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257: 342–358.
- Galperin MY, Koonin EV: 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res 2004, 32: 5452–5463.
- Hendrickson WA, Horton JR, Lemaster DM: Selenomethionyl Proteins Produced for Analysis by Multiwavelength Anomalous Diffraction (MAD) – A Vehicle for Direct Determination of 3-Dimensional Structure. EMBO Journal 1990, 9: 1665–1672.
- Pflugrath JW: The finer things in X-ray diffraction data collection. Acta Crystallogr D Biol Crystallogr 1999, 55: 1718–1725.
- Adams PD, Grosse-Kunstleve RW, Hung LW, Ioerger TR, McCoy AJ, Moriarty NW, Read RJ, Sacchettini JC, Sauter NK, Terwilliger TC: PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr D Biol Crystallogr 2002, 58(Pt 11):1948–1954.
- Grosse-Kunstleve RW, Adams PD: Substructure search procedures for macromolecular structures. Acta Crystallogr D Biol Crystallogr 2003, 59(Pt 11):1966–1973.
- Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL: Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr 1998, 54: 905–921.
- Emsley P, Cowtan K: Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 2004, 60(Pt 12 Pt 1):2126–2132.
- Vagin AA, Steiner RA, Lebedev AA, Potterton L, McNicholas S, Long F, Murshudov GN: REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta Crystallogr D Biol Crystallogr 2004, 60(Pt 12 Pt 1):2184–2195.
- Krissinel E, Henrick K: Inference of macromolecular assemblies from crystalline state. Journal of Molecular Biology 2007, 372: 774–797.
- DeLano WL: The PyMOL User's Manual. Palo Alto, CA: DeLano Scientific; 2002.
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 1997, 25: 4876–4882.
- Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 2002, 18: 502–504.
- Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 2001, 18: 691–699.