Characterization of the family of Mistic homologues
© Roosild et al; licensee BioMed Central Ltd. 2006
Received: 07 March 2006
Accepted: 16 May 2006
Published: 16 May 2006
Mistic is a unique Bacillus subtilis protein with virtually no detectable homologues in GenBank, which appears to integrate into the bacterial membrane despite an overall hydrophilic composition. These unusual properties have been shown to be useful for high-yield recombinant expression of other membrane proteins through fusion to the C-terminus of Mistic. To better understand the structure and function of Mistic, we systematically searched for and characterized homologous proteins among closely related bacteria.
Three homologues of Mistic were found with 62% to 93% residue identity, all only 84 residues in length, corresponding to the C-terminal residues of B. subtilis Mistic. In every case, the Mistic gene was found partially overlapping a downstream gene for a K+ channel protein. Residue variation amongst these sequences is restricted to loop regions of the protein's structure, suggesting that secondary structure elements and overall fold have been conserved. Additionally, all three homologues retain the functional ability to chaperone fusion partners to the membrane.
The functional core of Mistic consists of 84 moderately conserved residues that are sufficient for membrane targeting and integration. Understanding the minimal structural and chemical complexity of Mistic will lead to insights into the mechanistic underpinnings of Mistic-chaperoned membrane integration, as well as how to optimize its use for the recombinant heterologous expression of other integral membrane proteins of interest.
Integral membrane (IM) proteins constitute nearly a third of the proteins of sequenced genomes and play critical roles in intercellular signaling, homeostasis and metabolite transport. Additionally, they are the target of a majority of therapeutic pharmaceuticals. However, our understanding of this class of proteins has lagged behind that of soluble proteins due to inherent difficulties in their recombinant production and their structural analysis. A new method to overcome the first obstacle recently emerged with the discovery of Mistic, a unique hydrophilic protein from Bacillus subtilis that associates with the bacterial membrane and, when fused to the N-terminus of other IM proteins, can chaperone their expression in E. coli at high yields. It has been proposed that Mistic is able to integrate autonomously, in a Sec-independent manner, into the lipid bilayer. This proposal rests on the indirect evidence that the protein lacks a stretch of hydrophobic amino acids that could be interpreted, mechanistically, as a signal sequence by the bacterial translocon machinery. Additionally, high-level expression of Mistic and Mistic fusions can be achieved without the toxicity normally observed when recombinant expression of IM proteins saturates the secretory system. More recently, it was shown that Mistic fused to GFP partitioned to liposomes in a cell-free expression system lacking a translocon system. Nevertheless, the physical mechanism by which Mistic accomplishes its chaperoning function remains unclear, and it seems almost unavoidable to hypothesize that the highly hydrophilic surface seen in the NMR structure of Mistic undergoes a substantial, dynamic conformational transition in order to associate with the membrane.
Results and discussion
Mistic gene structure
Genomic DNA from four species of Bacillus closely related to B. subtilis was amplified using two 'MisticSeeker' oligonucleotides complementary to conserved regions of the upstream gene, YugP, and the downstream gene, YugO-b (Figure 1b). For B. licheniformis, B. mojavensis and B. atrophaeus, amplified PCR fragments produced nucleotide sequence data of 900–950 base pairs, similar to that expected based on the positioning of the MisticSeeker primers with respect to the B. subtilis genome. In contrast, the primers produced only a 651 base pair fragment from B. pumilus genomic DNA. Alignment of the base sequences reveals general conservation of this region of the Bacillus chromosome, with higher degrees of conservation within the predicted open reading frames of the flanking genes, as expected. Of the variations observed, the most notable is that B. pumilus has a large deletion (~250 bp) in the region encoding Mistic in B. subtilis (Figure 1c). Also, while there are no gaps or insertions in the 252 base pairs that encode the C-terminal 84 residues of Mistic in B. subtilis, there are ample frame-shifting mutations of this kind in the region that aligns with the coding region for the first 26 residues of the original Mistic. Our data for B. licheniformis are consistent with those recently published for its sequenced genome [9, 10].
Mistic homologues have only 84 residues
Mistic homologues conserve structurally intrinsic and acidic residues
The degree of conservation of a residue (determined as described in Methods) was mapped to the NMR structure of Mistic (PDB id: 1YGM). Unconserved residues are generally restricted to loop regions or the C-terminus of Mistic, leaving a subset of highly conserved residues forming the core of the Mistic structure (Figure 2b). This pattern is consistent with the retention of secondary structural elements, as well as the overall protein fold, amongst the homologues. The first forty residues of Mistic are significantly more conserved (30 of 40 strictly conserved) than the C-terminal half (15 of 44). This aspect of the conservation pattern is even more striking in the comparison of the two interhelical loops: L2-3 has 9 of 13 residues strictly conserved, whereas L3-4 retains only a single glycine out of seven residues. Perhaps not coincidentally, the N-terminal 40 residues also contain the two most prominently distinct structural elements of Mistic, namely a substantial kink located at the centre of the longest helix (α2) and a partially re-entrant loop (L2-3) that buries several consecutive residues in the core of the helical bundle. These elements may play critical roles in modulating Mistic's conformational changes. Buried residues are conserved to a greater degree than accessible residues (27 of 41 vs. 18 of 43), again supporting the notion that the interhelical interactions that stabilize the folded bundle are retained within the family of Mistic proteins. Most intriguing, however, is the nearly strict conservation of the distribution of acidic residues (aspartate, glutamate) over the surface of Mistic's structure. Discounting the three highly flexible C-terminal residues, there are only five cases (out of 19) where a negatively charged residue is not strictly conserved in sequence space. In each of these instances, a compensating mutation can be found structurally proximate to the site of the alteration (Figure 2).
Four of these pairs move the negative charge to an adjacent residue, one or two turns along the face of a helix, or to an adjacent alternate loop, respectively, while retaining a single negative charge. In the fifth case, the compensating loss of an adjacent cationic lysine residue maintains locally a similar overall net surface charge distribution. These observations suggest that the highly anionic nature of Mistic may be central to its mechanism of targeting and associating with the lipid bilayer.
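The conservation counts quoted above can be tabulated as percentages and checked for internal consistency. The following is a minimal sketch in Python, not part of the original analysis; the variable and function names are illustrative, and it assumes that the buried/accessible counts refer to strictly conserved positions, which the text does not state explicitly.

```python
# Counts quoted in the text, as (conserved, total) pairs.
# Labels and names here are illustrative, not taken from the paper's methods.
counts = {
    "N-terminal 40 residues": (30, 40),
    "C-terminal 44 residues": (15, 44),
    "loop L2-3": (9, 13),
    "loop L3-4": (1, 7),
    "buried residues": (27, 41),      # assumed to be strictly conserved counts
    "accessible residues": (18, 43),  # assumed to be strictly conserved counts
}


def percent(conserved, total):
    """Return the conserved fraction as a percentage."""
    return 100.0 * conserved / total


def totals(pairs):
    """Sum conserved and total counts over a list of (conserved, total) pairs."""
    return (sum(c for c, _ in pairs), sum(t for _, t in pairs))


for label, (conserved, total) in counts.items():
    print(f"{label}: {conserved}/{total} = {percent(conserved, total):.1f}%")

# Both ways of partitioning the protein should cover the same 84 residues
# and the same number of conserved positions.
halves = [counts["N-terminal 40 residues"], counts["C-terminal 44 residues"]]
burial = [counts["buried residues"], counts["accessible residues"]]
print("by sequence half:", totals(halves))
print("by burial:", totals(burial))
```

Under that assumption, both partitions account for 45 conserved positions out of 84, so the two sets of figures are mutually consistent.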
Mistic homologues retain chaperone function
Mistic is a protein that assists the integration into the lipid bilayer of covalently-linked, recombinantly expressed IM proteins, but the mechanisms underlying this ability are unknown. We have shown here that the functional core of Mistic consists of the C-terminal 84 residues. This core conserves to a substantial degree residues critical to the formation of helical secondary structural elements as well as residues important for interhelical interactions. Even more strictly conserved is the surface distribution of an abundance of acidic residues. While this characteristic may seem incompatible with membrane association, in many respects it chemically and structurally resembles amphiphilic, anionic fusogenic peptides like the synthetic 20-amino acid peptide with 5 glutamates derived from the amino-terminal segment of hemagglutinin of influenza virus [11, 12]. Analogous functional mechanisms may also exist here, as both proteins play roles in membrane targeting of downstream proteins.
The existence of functional, truncated Mistic homologues raises the question as to what role the first helix (α1) of the four-helical bundle structure plays in Mistic's membrane association mechanism or its membrane topology. One plausible explanation is that the hydrophobic core of the Mistic bundle, which is exposed with removal of the N-terminal helix (α1), is in fact the hydrophobic surface that forms the interface for association with the lipid bilayer. This model is consistent with the previous observation that mutation of a core methionine within this putative hydrophobic surface to a more hydrophilic residue reduces Mistic's membrane affinity and chaperone efficacy. In this case, Mistic's peripheral association with the outer leaflet of the inner membrane would force any trailing fusion protein linked to the C-terminus of Mistic to interact with and partition between the membrane and periplasmic space, thus facilitating its integration. This mode of association is consistent with the recent analysis of the structural nature of proteins at the membrane-water interface, which concludes that the presence of charged, amphiphilic helices positioned interfacially, roughly parallel with the membrane surface, is not irregular.
Mistic's in vivo function is equally perplexing. We have demonstrated that the distribution of this gene is restricted to a very limited number of closely related soil Bacilli and is always linked to the bacterium's K+ channel gene as overlapping reading frames. However, these observations have not yet produced a testable hypothesis as to Mistic's natural function. So far, the simple gene knockout produces no obvious phenotype (unpublished data). Nonetheless, the sequence conservation pattern observed within the family of Mistic proteins provides an additional clue towards elucidating the molecular mechanism of Mistic-facilitated membrane association, uncovering its natural function, and potentially optimizing its sequence for the recombinant heterologous expression of other IM proteins of interest.
Cloning of Mistic homologues
Genomic DNA was obtained from the Bacillus Genetic Stock Center (BGSC) for B. licheniformis (BGSCID 5A36), B. mojavensis (BGSCID 28A1), B. atrophaeus (BGSCID 11A1) and B. pumilus (BGSCID 8A3), and was amplified using two 'MisticSeeker' oligonucleotides (MisticSeekerOligo5':ATGCTAATACGACTCACTATAGGGGCTCTTTACTTTAAATTGTGCCC; MisticSeekerOligo3':ATGGCTAGTTATTGCTCAGCGGCCGACTGWNGANACNGTNABNABNGCCCACCADATNCC). PCR was conducted for 30 cycles with one-minute incubations at each of the melting (94°C), annealing (50°C) and elongation (72°C) temperatures, using Vent DNA polymerase. The amplified product was sequenced using the same MisticSeeker oligos. Both DNA sequences and conceptually translated protein sequences were analyzed and aligned using ClustalW. Alignment positions were categorized as fully conserved (a single residue at that position), strongly conserved (STA; NEQK; NHQK; NDEQ; QHRK; MILV; MILF; HY; FYW), weakly conserved (CSA; ATV; SAG; STNK; STPA; SGND; SNDEQK; NDEQHK; NEQHRK; FVLIM; HFY), or unconserved, consistent with the positively scoring groups that occur in the Gonnet Pam250 matrix. Secondary structural boundaries and residue accessibility were calculated and drawn using PROCHECK.
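The categorization scheme described above can be sketched in a few lines of Python. This is an illustration of the grouping rule only, not the ClustalW implementation: the function names are hypothetical, the treatment of gapped columns as unconserved is an assumption, and the example sequences are made-up toy strings rather than Mistic sequences.

```python
# Residue groups as listed in the Methods text.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def classify_column(residues):
    """Classify one alignment column given as a string of one-letter codes."""
    observed = set(residues.upper())
    if "-" in observed:
        # Assumption: any column containing a gap counts as unconserved.
        return "unconserved"
    if len(observed) == 1:
        return "identical"
    # A column is strongly (or weakly) conserved when every observed residue
    # falls within a single strong (or weak) group.
    if any(observed <= set(group) for group in STRONG_GROUPS):
        return "strong"
    if any(observed <= set(group) for group in WEAK_GROUPS):
        return "weak"
    return "unconserved"


def classify_alignment(sequences):
    """Classify every column of a list of equal-length aligned sequences."""
    return [classify_column("".join(column)) for column in zip(*sequences)]


# Toy four-sequence alignment (hypothetical, not real Mistic sequences).
toy_alignment = ["MDKS", "MEKG", "MDRA", "MEWA"]
print(classify_alignment(toy_alignment))
```

Checking the strong groups before the weak ones matters, because several weak groups are supersets of strong ones (for example, NDEQ lies within SNDEQK).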
Expression of fusion proteins with Mistic homologues
Mistic was cloned by PCR into Gateway® destination (Invitrogen) vectors for expression studies on eukaryotic IM proteins. Eukaryotic target genes in Gateway® donor vectors were recombined with destination vectors to create expression vectors with the cargo protein fused downstream of Mistic with a separation linker of 19 amino acids. Freshly transformed colonies were cultured in TB and induced with 0.1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) at an O.D. of 1.0. Growth was continued overnight at 10–18°C. Cells were harvested and resuspended in 50 mM Tris pH 8.0, 300 mM KCl, 10% glycerol, 10 mM imidazole with 1 mg/ml lysozyme. Cells were disrupted by sonication on ice and membranes were pelleted by high-speed centrifugation (100,000 × g). Membranes were solubilized by sonication in the above buffer with the addition of 20 mM LDAO. Insoluble material was removed by high-speed centrifugation and the desired protein was purified from the resulting supernatant using Ni-NTA affinity chromatography (Qiagen). Purified protein was analyzed by SDS-PAGE, before and after overnight incubation with thrombin at 4°C.
We would like to thank Dan Ziegler of the Bacillus Genetic Stock Center for assistance in preparing Bacillus genomic DNA, species verification, and discussions on Bacillus phylogeny. The authors thank Matt Lundberg for sharing unpublished Mistic knock-out data. This work is supported by NIH (GM74821, GM74929). The analyzed DNA sequences from B. atrophaeus, B. mojavensis and B. pumilus have been deposited in GenBank with accession numbers DQ402366, DQ402367 and DQ402368, respectively. TPR and S. Castronovo are currently at Drug Development Division, Nevada Cancer Institute, Las Vegas, Nevada 89135, USA.
- Roosild TP, Greenwald J, Vega M, Castronovo S, Riek R, Choe S: NMR structure of Mistic, a membrane-integrating protein for membrane protein expression. Science 2005, 307: 1317–1321. 10.1126/science.1106392
- Noireaux V, Bar-Ziv R, Godefroy J, Salman H, Libchaber A: Toward an artificial cell based on gene expression in vesicles. Phys Biol 2005, 2: P1–8. 10.1088/1478-3975/2/3/P01
- Socolich M, Lockless SW, Russ WP, Lee H, Gardner KH, Ranganathan R: Evolutionary information for specifying a protein fold. Nature 2005, 437: 512–518. 10.1038/nature03991
- Panchenko AR, Kondrashov F, Bryant S: Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci 2004, 13: 884–892. Epub 2004 Mar 9. 10.1110/ps.03465504
- Ota M, Kinoshita K, Nishikawa K: Prediction of catalytic residues in enzymes based on known tertiary structure, stability profile, and sequence conservation. J Mol Biol 2003, 327: 1053–1064. 10.1016/S0022-2836(03)00207-9
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285: 751–753. 10.1126/science.285.5428.751
- Suel GM, Lockless SW, Wall MA, Ranganathan R: Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat Struct Biol 2003, 10: 59–69. 10.1038/nsb881
- Lockless SW, Ranganathan R: Evolutionarily conserved pathways of energetic connectivity in protein families. Science 1999, 286: 295–299. 10.1126/science.286.5438.295
- Rey MW, Ramaiya P, Nelson BA, Brody-Karpin SD, Zaretsky EJ, Tang M, Lopez de Leon A, Xiang H, Gusti V, Clausen IG, Olsen PB, Rasmussen MD, Andersen JT, Jorgensen PL, Larsen TS, Sorokin A, Bolotin A, Lapidus A, Galleron N, Ehrlich SD, Berka RM: Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species. Genome Biol 2004, 5: R77. Epub 2004 Sep 13. 10.1186/gb-2004-5-10-r77
- Veith B, Herzberg C, Steckel S, Feesche J, Maurer KH, Ehrenreich P, Baumer S, Henne A, Liesegang H, Merkl R, Ehrenreich A, Gottschalk G: The complete genome sequence of Bacillus licheniformis DSM13, an organism with great industrial potential. J Mol Microbiol Biotechnol 2004, 7: 204–211. 10.1159/000079829
- Murata M, Takahashi S, Kagiwada S, Suzuki A, Ohnishi S: pH-dependent membrane fusion and vesiculation of phospholipid large unilamellar vesicles induced by amphiphilic anionic and cationic peptides. Biochemistry 1992, 31: 1986–92. 10.1021/bi00122a013
- Dubovskii PV, Li H, Takahashi S, Arseniev AS, Akasaka K: Structure of an analog of fusion peptide from hemagglutinin. Protein Sci 2000, 9: 786–98.
- Granseth E, von Heijne G, Elofsson A: A study of the membrane-water interface region of membrane proteins. J Mol Biol 2005, 346: 377–85. Epub 2004 Dec 13. 10.1016/j.jmb.2004.11.036
- Thompson JD, Higgins DG, Gibson TJ: CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673–4680.
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 1993, 26: 283–291. 10.1107/S0021889892009944