The structure of pyogenecin immunity protein, a novel bacteriocin-like immunity protein from Streptococcus pyogenes

Background Many Gram-positive lactic acid bacteria (LAB) produce anti-bacterial peptides and small proteins called bacteriocins, which enable them to compete against other bacteria in the environment. These peptides fall structurally into three classes (I, II and III), with class IIa comprising pediocin-like single peptides and class IIb comprising two-peptide bacteriocins. Self-protective cognate immunity proteins are usually co-transcribed with these toxins. The structures of several cognate immunity proteins for class IIa bacteriocins have already been solved. Streptococcus pyogenes, closely related to LAB, is one of the most common human pathogens, so knowledge of how it competes against other LAB species is likely to prove invaluable. Results We have solved the crystal structure of the gene product of locus Spy_2152 from S. pyogenes (PDB:2fu2) and found it to comprise an anti-parallel four-helix bundle that is structurally similar to other bacteriocin immunity proteins. Sequence analyses indicate that this protein is a possible immunity protein protective against class IIa or IIb bacteriocins. However, given that S. pyogenes appears to lack any class IIa pediocin-like proteins but does possess class IIb bacteriocins, we suggest that this protein confers immunity to IIb-like peptides. Conclusions Combined structural, genomic and proteomic analyses have allowed the identification and in silico characterization of a new putative immunity protein from S. pyogenes, possibly the first structure of an immunity protein protective against potential class IIb two-peptide bacteriocins. We have named the two pairs of putative bacteriocins found in S. pyogenes pyogenecins 1, 2, 3 and 4.


Background
Many Gram-positive bacteria produce anti-bacterial peptides and small proteins called bacteriocins. There are three main classes produced by Gram-positive lactic acid bacteria (LAB): class I bacteriocins are the lantibiotics, small (<4 kDa), post-translationally modified peptides containing unusual amino acids such as lanthionine; class II are small, unmodified, heat-stable bacteriocins (<10 kDa); class III include larger (>30 kDa) heat-labile proteins, such as murein hydrolases [1]. Most bacteriocins are synthesized as precursors, which are matured and secreted; they then target a specific bacterium and kill it by increasing the permeability of its membrane to various small molecules. Class II bacteriocins are subdivided into IIa, pediocin-like unmodified bacteriocins; IIb, two-peptide unmodified bacteriocins; IIc (formerly class V), in which the N- and C-termini are covalently linked, resulting in a cyclic structure; and IId, non-pediocin, single, linear peptides [2]. The genetics and biosynthesis of class IIa bacteriocins have been well studied [3], and these constitute one of the most important groups of antimicrobial peptides because of their useful antibacterial properties. All known IIa bacteriocins are described as being active against Listeria, and some have already been tested as food preservatives for controlling food-borne pathogens [4]. Structurally, class IIa bacteriocins are related to each other in being unstructured in aqueous solution but forming a central amphiphilic alpha-helical region in lipid micelles or TFE [5][6][7]; they contain the characteristic conserved N-terminal YGNGVxCxxxxC sequence, though usually not the GxxxG motif(s) characteristic of IIb and IIc bacteriocins [7].
The class IIb two-peptide unmodified bacteriocins, for example plantaricin E/F [8], need the complementary action of both peptides to be active [9]. These bacteriocins contain long amphiphilic alpha-helical stretches, and the two complementary peptides interact when exposed to membrane-like entities. The GxxxG motif is conserved in many two-peptide bacteriocins, and it is postulated that the two complementary peptides dimerize via a helix-helix interaction involving this motif to form the functionally active heterodimer [10]. The dimer functions by creating a pore in the membrane through which small molecules leak out, and the genes encoding the two peptides are typically found adjacent to each other in the same operon [10].
The cyclic class IIc bacteriocins are characterized by being tryptophan-rich and lacking any GG, GxxxG or YGNGVxCxxxxC motifs [11]; the class IId bacteriocins have none of these features, but some have recently been found to be circular as well, rather than linear, with conserved AxxhhN and AhhW/F motifs [12][13][14].
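Because these classes are partly defined by short sequence motifs, a first-pass screen of candidate peptides can use regular expressions. The following is a minimal sketch, assuming Python's standard `re` module and reading each "x" as any residue; the function name `scan_motifs` and the toy peptide are hypothetical, and a motif hit on its own does not assign a bacteriocin class.

```python
import re

# Motifs named in the text. "x" is read as "any residue", so it becomes "." in a regex.
MOTIFS = {
    "YGNGVxCxxxxC (pediocin box)": r"YGNGV.C....C",
    "GxxxG": r"G...G",
}

def scan_motifs(seq: str) -> dict[str, list[int]]:
    """Return the 0-based start of every match of each motif in a protein sequence.

    Each pattern is wrapped in a zero-width lookahead so that overlapping
    matches (e.g. GxxxGxxxG) are all reported.
    """
    seq = seq.upper()
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in MOTIFS.items()
    }

# Hypothetical toy peptide, not a real bacteriocin sequence.
print(scan_motifs("AYGNGVKCAAAACGAAAG"))
# {'YGNGVxCxxxxC (pediocin box)': [1], 'GxxxG': [13]}
```

The lookahead matters for GxxxG: many two-peptide bacteriocins carry tandem GxxxG repeats, and a plain `finditer` would skip the overlapping second motif.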
To neutralize the toxic effect of the peptide on the "producing" cell, the genes encoding bacteriocins are generally co-transcribed with a cognate immunity protein. These small proteins (typically 88-115 amino acids) interact very tightly with a specific bacteriocin or pair of bacteriocins and protect the "producing" microbe from the toxic effect of its own bacteriocin [15,16]. The immunity proteins usually show high specificity for their cognate bacteriocins [17,18]. For class IIa bacteriocins there is a one-to-one relationship between each bacteriocin encoded in a genome and its cognate immunity protein, whereas each pair of class IIb two-peptide bacteriocins has a single cognate immunity protein encoded in the genome [9]. Structures of five immunity proteins have already been solved: ImB2 [19], EntA-im [20], PedB [21], PisI [22] and Mun-Im [23], all protective against IIa bacteriocins. As yet, no structures of immunity proteins protective against IIb or IIc bacteriocins have been solved.
The sequencing of bacterial genomes, including those of human pathogens, has revealed a number of genes that could code for new bacteriocins and immunity proteins, suggesting that the use of these antimicrobial peptides is more widespread than previously thought and that bacteria might be targeting several different bacterial species with these toxins. Understanding which specific immunity protein neutralizes which bacteriocin is important if these peptides are to be used as antimicrobials.
The Gram-positive bacterium Streptococcus pyogenes, closely related to LAB, is one of the most common human pathogens. It causes a wide range of diseases in both developing countries and the western world [24], from minor infections such as pharyngitis, erysipelas and pyoderma, which are readily controlled by antibiotics, to major, often lethal, conditions such as acute rheumatic fever, necrotizing fasciitis and streptococcal toxic shock syndrome. The search for new antibacterial agents effective against this species is therefore urgent.
Other streptococcal species secrete bacteriocin-like toxins: S. salivarius, for example, has been reported to produce a variety of bacteriocin-like inhibitory substances with in vitro inhibitory activity against S. pyogenes, including salivaricin A [25,26]. Such observations suggest that antibacterial toxins play an important role in controlling the size of S. pyogenes populations in human microbiomes. Bacteriocin-like toxins and antitoxins may well have an impact on the development of new antibacterial strategies and treatments. A thorough understanding of the biology of bacteriocins and their immunity proteins is important for any possible therapeutic use of bacteriocins. The complete sequences of LAB genomes provide opportunities to scan for the presence of toxins and their corresponding immunity proteins. However, sequence alone may not be sufficient to identify these proteins.
Here we present the first structure of the protein from locus Spy_2152 (gene names taken from S. pyogenes M1 GAS), which we have named pyogenecin immunity protein (Sp-PIP), determined at 2.15 Å resolution. We provide structural and sequence analyses and identify the putative corresponding bacteriocin-like toxins in the S. pyogenes genome, which are found to belong to the class IIb two-peptide bacteriocins.

Structure determination
We have expressed the pyogenecin immunity protein from S. pyogenes in Escherichia coli and purified it to homogeneity. The crystal structure of Sp-PIP was solved by the single-wavelength anomalous diffraction (SAD) method using selenomethionine (SeMet)-substituted protein. The model was refined at 2.15 Å resolution to an R-factor of 15.7% and an R-free of 25.4%. The asymmetric unit contains 78 well-ordered residues (out of 102) and 74 water molecules. The first methionine and the 23 C-terminal residues are disordered and cannot be identified in the electron density maps. The structure is of high quality: all non-glycine and non-proline residues of the model lie either in the most favored region (95.9% of residues) or in the additionally allowed region (4.1% of residues) of the Ramachandran plot. Detailed refinement statistics and crystallographic data are shown in Table 1. The crystal structure of Sp-PIP, shown in Figure 1, is a typical anti-parallel four-helix bundle. The four long helices (H1, residues 3-16; H3, 23-41; H4, 43-61; H5, 66-79) pack tightly together around a well-defined hydrophobic core. One additional short helix (H2, residues 17-22) connects helices H1 and H3. The long helices are between 14 and 19 residues in length. Helix pairs H1/H4 and H2/H3 are parallel to each other, and helix pairs H1/H2 and H3/H4 cross over each other at an angle of approximately 30°. In the structure, the N- and C-termini are very close together (7 Å). Overall, the protein is slightly acidic (calculated pI = 6.06), with two large acidic patches near each end of the bundle and one large positively charged patch near the N- and C-termini.
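A calculated pI such as the one quoted above is usually obtained by finding the pH at which the sequence's net charge, summed with the Henderson-Hasselbalch equation, crosses zero. The following is a minimal sketch of that technique, assuming one textbook set of pKa values; published tools use different sets, so their results differ by a few tenths of a pH unit. It is not necessarily the method behind the 6.06 value, and the sequence used is a hypothetical placeholder, not Sp-PIP.

```python
# Approximate pKa values (one common textbook set; an assumption, since tools differ).
PKA_POSITIVE = {"N-term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C-term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}

def net_charge(seq: str, ph: float) -> float:
    """Net charge of a linear peptide at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N-term"] = counts["C-term"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative

def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH; the net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2

# Hypothetical toy sequence (not Sp-PIP).
print(round(isoelectric_point("MKDEELLKAIEEHG"), 2))
```

Bisection is a safe choice here because the net charge is a strictly decreasing function of pH, so exactly one root lies between pH 0 and 14.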

Structural analysis
Structural comparisons using the DALI server [27] reveal that the structure of Sp-PIP matches a range of proteins, all having a four-helix bundle topology, with Z-scores of 8.7 or less. In this range of Z-scores it is not clear whether the matches are due to homology or simply to structural similarity. However, structural comparison combined with sequence analysis enabled us to relate this protein to the structures ranked 2nd (1tdp), 10th (2bl8) and 23rd (2k19) in the list of DALI matches; the results are shown in Table 2. These homologous proteins are all pediocin-like immunity proteins. Sp-PIP superimposes on ImB2 (1tdp) with an RMSD of 2.4 Å over 75 residues; using the structure comparison service SSM at the European Bioinformatics Institute, http://www.ebi.ac.uk/msd-srv/ssm [28], the RMSD is 2.11 Å.

Figure 1 Structure of 2fu2. The structure of Sp-PIP (PDB ID: 2fu2) shown as a four-helix bundle in two orthogonal views. Both N- and C-termini are labeled and the protein is colored from blue (N-terminus) to red (C-terminus).
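RMSD values like those above are computed after optimally superimposing paired Cα atoms. DALI and SSM use their own residue-matching and scoring, so the code below is not their method. It is a minimal sketch of the standard Kabsch superposition for two coordinate sets that are already paired, assuming NumPy and N×3 arrays; the synthetic coordinates in the usage example are invented.

```python
import numpy as np

def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between paired N x 3 coordinate sets after optimal rigid superposition."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of paired atoms")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))

# Usage with synthetic coordinates: a rotated and translated copy should give RMSD ~ 0.
rng = np.random.default_rng(0)
ca_a = rng.normal(size=(75, 3)) * 10.0
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
ca_b = ca_a @ rot_z.T + np.array([5.0, -3.0, 12.0])
print(kabsch_rmsd(ca_a, ca_b))  # close to 0 (floating-point noise only)
```

The determinant check prevents the fit from "improving" the RMSD by mirroring one structure, which is never a physically meaningful superposition.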

Sequence analysis
Guided by the structure of Sp-PIP, whose sequence had not been assigned to any known protein family, we sought to expand our knowledge of its sequence relatives using the Pfam database [29]. Having found that it was structurally similar to known members of the Pfam EntA_Immun family (Pfam:PF08951), we carried out an analysis with profile hidden Markov models (HMMs) to determine whether Sp-PIP should also belong to this family. Taking the EntA_Immun family from Pfam release 22.0 as a starting point, we carried out iterative searches using the HMMER package (v2.3.2). After multiple rounds of searching the HMM against the sequence database (UniProt version 12.5) with an E-value threshold of 0.04, along with careful manual inspection of the resulting matches, we were able to detect 172 sequences, compared with the 19 sequences found in Pfam release 22.0. The family has been updated in the current release of Pfam (24.0), and the alignment is now available on the Pfam web site at: http://pfam.sanger.ac.uk//family/Pf08951.
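The iterative search described above follows a general pattern: search with a model built from the current family members, keep hits at or below the E-value threshold, add them to the family, and stop when a round adds nothing new. The following is a minimal sketch of that control loop only, assuming a caller-supplied `search` function (a hypothetical interface; in this work that step was HMMER 2.3.2 followed by manual inspection, neither of which the sketch reproduces). The toy search simply looks up invented hits in a fixed table.

```python
from typing import Callable, Iterable

# Hypothetical interface: search(members) -> (sequence_id, e_value) hits for a
# profile built from the current family members.
SearchFn = Callable[[frozenset[str]], Iterable[tuple[str, float]]]

def iterate_family(seed: set[str], search: SearchFn,
                   e_threshold: float = 0.04, max_rounds: int = 10) -> set[str]:
    """Grow a family by repeated profile searches until it stops changing."""
    members = set(seed)
    for _ in range(max_rounds):
        hits = {sid for sid, e in search(frozenset(members)) if e <= e_threshold}
        new = hits - members
        if not new:        # converged: no new sequences passed the threshold
            break
        members |= new     # in practice new hits would be inspected manually first
    return members

# Toy example with invented IDs: each member "pulls in" neighbours with these E-values.
NEIGHBOURS = {"A": [("B", 1e-10), ("X", 0.5)], "B": [("C", 0.01)], "C": [], "X": []}

def toy_search(members: frozenset[str]) -> list[tuple[str, float]]:
    return [hit for m in members for hit in NEIGHBOURS.get(m, [])]

print(sorted(iterate_family({"A"}, toy_search)))  # ['A', 'B', 'C']
```

Sequence C is only found in the second round, once B has joined the family. This is why iterating the search, rather than searching once with the seed model, expands the family.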

Domain analysis
We investigated whether any other protein domains were found on immunity proteins in the EntA_Immun family. Some members of the family do carry associated domains. One protein contains an N-terminal helix-turn-helix transcriptional regulator domain (UniProt:Q88Y45), and a group of methionine sulfoxide reductases from Enterococcus faecium (UniProt:Q3Y319) and closely related species carries a C-terminal PMSR (peptide methionine sulfoxide reductase) domain (Pfam:PF01625) in addition to the N-terminal immunity protein domain.

Proteome analysis
Sp-PIP belongs to a large family of putative bacteriocin immunity proteins. To investigate the possible targets of the presumed immunity protein we carried out an analysis of the proteomes of several strains of S. pyogenes, including M1 GAS, MGAS10750, and MGAS8232, to try to identify any complementary bacteriocins and immunity proteins. We searched the proteomes with HMMs of both the known class I, antibacterial18 (Pfam:PF08130), and class II bacteriocins, bacteriocin_II (Pfam:PF01721),

Discussion
The four-helix bundle structure of Sp-PIP was found to be most closely related to those of a number of pediocin-like IIa immunity proteins. Three subgroups of pediocin-like IIa immunity proteins have been defined on the basis of common sequence motifs and phylogenetic analysis [17], denoted groups A, B and C, as shown in Figure 3:
• Group A: EntA-im, the enterocin A immunity protein [20]; PDB ID: 2bl8
• Group B: PisI, the piscicolin 126 immunity protein [22]; PDB ID: 2k19
• Group C: ImB2, the carnobacteriocin B2 immunity protein [19]; PDB ID: 1tdp
The structural comparison results suggest that Sp-PIP is most closely related to ImB2, from subgroup C. Sp-PIP is therefore likely to function as a bacteriocin immunity protein with similarities to proteins from group C.
Sequence analysis reveals that Sp-PIP belongs to the now expanded Pfam family of immunity proteins EntA_Immun, which also includes pediocin-like immunity proteins from groups A, B and C as detailed above, thus confirming its evolutionary link to known bacteriocin immunity proteins. A search for other immunity proteins of a similar nature in the S. pyogenes genome identified another gene locus encoding a protein matching the EntA_Immun family. Both of these loci were seen in all strains examined. Unlike the case in many other species, in S. pyogenes these bacteriocin immunity proteins were not found closely linked to their putative bacteriocins, which might be explained by the remnants of a transposon identified between them.

Figure: Alignment of representative immunity proteins.
Protein function can often be inferred from the combination of functional domains a protein carries. A number of domain pairings were identified on some members of the immunity family, such as an N-terminal helix-turn-helix transcriptional regulator on one protein, which might suggest that immunity proteins act as transcriptional regulators, or interact with transcriptional regulators and modulate their function. A group of methionine sulfoxide reductases carries both a C-terminal PMSR domain and the immunity domain. The PMSR domain is involved in reversing the inactivation of proteins caused by oxidation of their methionine residues. However, it is not clear what functional significance, if any, these domain pairings might have.
The search for potential bacteriocin targets for this putative immunity protein covered classes I, II and III. Our analyses identified that S. pyogenes contains a class I lantibiotic system composed of one unique type A lantibiotic sequence, called srtA, and just one unique immunity protein, srtI. These two genes are closely associated on the genome but are separated from the genes of the class II system described below, as well as from the presumed immunity proteins. This suggests that Sp-PIP is unlikely to act as an immunity protein for class I bacteriocins. Class II bacteriocins are subdivided into classes IIa, IIb, IIc and IId. We did not find any class IIa bacteriocins encoded by the S. pyogenes genome; these would carry the well-conserved N-terminal YGNGVxCxxxxC motif [30,31]. Searches were not made for bacteriocins of types IIc or IId, since the immunity proteins identified so far for these two groups, those of gassericin A [32] and circularin A [33], share neither sequence nor size similarity with the pediocin-like immunity proteins. The four putative class IIb proteins we identified in a short genomic region, which we have named pyogenecins 1-4, are found as two tandem pairs of genes separated by a gene in the antisense orientation that encodes a transposase pseudogene. The presence of this pseudogene suggests that this region might once have been part of a mobile genetic element. Finding bacteriocins in pairs is characteristic of class IIb bacteriocins [34]. As with other class IIb bacteriocins, these pyogenecins have a conserved GG leader peptide, as well as the conserved GxxxG motif necessary for helix-helix interaction between the two peptides, as illustrated in Figure 2b. All four putative pyogenecins fall into the Pfam Bacteriocin_IIc family (Pfam:PF10439), a family of bacteriocins secreted by streptococcal and other LAB species.
Based on the sequence similarity to other two-peptide bacteriocins, the genomic arrangement of the genes found, and the identification of two immunity proteins, we hypothesize that the putative pyogenecins 1 and 2 associate to form one active bacteriocin, and that pyogenecins 3 and 4 associate to form a second active bacteriocin, possibly targeting a different bacterial species. These peptides now need to be synthesized so that their biological activity can be tested in a suitable bacteriocin assay. Given that we find two immunity proteins, we suggest that each one is specific for one of the putative bacteriocin pairs, pyogenecin 1/pyogenecin 2 and pyogenecin 3/pyogenecin 4. It is also possible that one or both of these immunity proteins is an orphan immunity protein [17], so called because no cognate bacteriocin has been identified in the genome; and it has been pointed out that most other immunity proteins so far determined for IIb two-peptide bacteriocins are small transmembrane proteins closely coupled transcriptionally with their bacteriocins [9]. However, the sequences of only a handful of two-peptide immunity proteins appear to have been deduced; none of them has a determined structure, and most are only putative immunity proteins, the designation being based solely on the presence of predicted transmembrane helices [35][36][37], so comparison with the system we propose here is difficult. The exact mechanism by which immunity proteins protect their host cell or interact with bacteriocins is also unclear, but orphan immunity proteins may confer resistance to other bacteriocins in addition to their cognate bacteriocin [17,9].
Many more two-peptide-like systems may of course remain to be characterized: the Pfam family Bacteriocin_IIc (Pfam:PF10439) contains a large number of additional LAB peptides that are "pairs", in the sense that they are expressed from closely adjacent genes, and for which a cognate immunity protein has not yet been identified.
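The working definition of a "pair" used here, two bacteriocin genes expressed from closely adjacent loci, can be turned into a simple genome-scan step. The following is a minimal sketch, assuming that candidate genes are given as hypothetical (name, start, end, strand) records and that "closely adjacent" means consecutive genes on the same strand with an intergenic gap no larger than a chosen cutoff. The 200 bp default and the coordinates below are illustrative assumptions, not values from this study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeneHit:
    name: str
    start: int    # 1-based coordinate, start < end
    end: int
    strand: str   # "+" or "-"

def adjacent_pairs(hits: list[GeneHit], max_gap: int = 200) -> list[tuple[str, str]]:
    """Greedily pair consecutive same-strand hits whose intergenic gap is <= max_gap bp."""
    ordered = sorted(hits, key=lambda h: h.start)
    pairs = []
    i = 0
    while i + 1 < len(ordered):
        a, b = ordered[i], ordered[i + 1]
        gap = b.start - a.end - 1          # negative if the two ORFs overlap
        if a.strand == b.strand and gap <= max_gap:
            pairs.append((a.name, b.name))
            i += 2                         # each gene joins at most one pair
        else:
            i += 1
    return pairs

# Hypothetical coordinates: two tandem pairs separated by a larger gap
# (where, e.g., an intervening non-bacteriocin gene would sit).
hits = [GeneHit("p1", 1000, 1150, "+"), GeneHit("p2", 1170, 1320, "+"),
        GeneHit("p3", 2600, 2750, "+"), GeneHit("p4", 2760, 2900, "+")]
print(adjacent_pairs(hits))  # [('p1', 'p2'), ('p3', 'p4')]
```

Greedy pairing keeps a run of three adjacent genes from producing overlapping pairs. Real two-peptide loci would still need the sequence-level checks (GG leader, GxxxG) described above.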

Conclusions
The structure of the putative immunity protein Sp-PIP, which may protect against potential class IIb two-peptide bacteriocins, could be the first structural representative of this class. Combined structural, genomic and proteomic analyses have allowed the identification and in silico characterization of a new putative immunity protein from S. pyogenes. Structurally similar proteins appear to provide immunity against single-peptide class IIa bacteriocins. Further biological and biochemical studies are needed to verify the antibacterial activity of the putative pyogenecins and to determine the degree of resistance and cross-resistance provided by the proposed immunity protein.

Protein cloning, expression and purification
The ORF from the Spy_2152 locus was amplified from S. pyogenes M1 genomic DNA with KOD DNA polymerase using conditions and reagents provided by the vendor (Novagen, Madison, WI). The gene was cloned into the pMCSG7 vector using a modified ligation-independent cloning protocol [38] and over-expressed in E. coli BL21(DE3)-Gold (Stratagene) harboring an extra plasmid encoding three rare tRNAs (AGG and AGA for Arg, ATA for Ile). The pMCSG7 vector, which carries a tobacco etch virus (TEV) protease cleavage site, creates a construct with a cleavable His6-tag fused to the N-terminus of the target protein and adds three artificial residues (Ser-Asn-Ala) at that end. The cells were grown in SeMet-containing enriched M9 medium under conditions known to inhibit methionine biosynthesis. The cells were grown at 37°C to an OD600 of ~0.6, and protein expression was induced with 1 mM isopropyl-1-thio-β-D-galactopyranoside (IPTG). After induction, the cells were grown overnight with shaking at 20°C. The harvested cells were re-suspended in lysis buffer (50 mM HEPES pH 8.0, 500 mM NaCl, 10 mM imidazole, 10 mM β-mercaptoethanol, and 5% v/v glycerol) in the presence of lysozyme (1 mg/mL) and protease inhibitor cocktail (100 μL per 2 g of wet cells), kept on ice for 20 min, and then lysed by sonication. The lysate was clarified by centrifugation at 30,000 × g (RC5C-Plus centrifuge, Sorvall) for 20 min, followed by filtration through 0.45 μm and 0.22 μm in-line filters (Gelman). The lysate was then applied to a 5 mL HiTrap Ni-NTA column on the AKTAxpress (GE Health Systems). His6-tagged protein was eluted with buffer containing a higher concentration of imidazole (500 mM NaCl, 5% glycerol, 50 mM HEPES pH 8.0, 250 mM imidazole, 10 mM 2-mercaptoethanol), and the His6-tag was cleaved from the protein by treatment with recombinant His6-tagged TEV protease.
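Preparing the lysis buffer above from dry and neat reagents is straightforward arithmetic (mass = concentration × volume × molar mass). The following is a minimal sketch, assuming the free-acid form of HEPES and standard molar masses. Check these against the reagent labels, since salt forms such as HEPES sodium salt differ. Adjusting the pH to 8.0 is not modelled, and the function names are hypothetical.

```python
# Standard molar masses in g/mol (assumption: HEPES free acid; check reagent labels).
MOLAR_MASS = {"HEPES": 238.30, "NaCl": 58.44, "imidazole": 68.08,
              "beta-mercaptoethanol": 78.13}
BME_DENSITY_G_PER_ML = 1.114  # neat beta-mercaptoethanol, dispensed by volume

# Lysis buffer from the text: concentrations in mM, plus 5% (v/v) glycerol.
LYSIS_BUFFER_MM = {"HEPES": 50, "NaCl": 500, "imidazole": 10,
                   "beta-mercaptoethanol": 10}

def lysis_buffer_recipe(volume_l: float) -> dict[str, tuple[float, str]]:
    """Amount of each component for volume_l litres of lysis buffer (pH not adjusted)."""
    recipe = {}
    for reagent, conc_mm in LYSIS_BUFFER_MM.items():
        grams = conc_mm / 1000 * volume_l * MOLAR_MASS[reagent]  # mol/L x L x g/mol
        if reagent == "beta-mercaptoethanol":
            recipe[reagent] = (grams / BME_DENSITY_G_PER_ML, "mL")  # liquid: convert to volume
        else:
            recipe[reagent] = (grams, "g")
    recipe["glycerol"] = (volume_l * 1000 * 5 / 100, "mL")  # 5% v/v
    return recipe

def protease_inhibitor_ul(wet_cells_g: float) -> float:
    """Inhibitor cocktail volume at the stated ratio of 100 uL per 2 g of wet cells."""
    return wet_cells_g * 100 / 2

for name, (amount, unit) in lysis_buffer_recipe(1.0).items():
    print(f"{name}: {amount:.3f} {unit}")
```

Returning (amount, unit) pairs keeps solids in grams and liquids in millilitres, which matches how these reagents are actually dispensed at the bench.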
A second Ni-NTA affinity chromatography step was performed manually to remove the cleaved His6-tag and the His6-tagged TEV protease. A total of 90 mg of protein was purified for crystallization screening. The protein was then dialyzed against 20 mM Tris-HCl pH 7.1, 50 mM NaCl, 2 mM dithiothreitol (DTT) and concentrated to 110 mg/mL using a Centricon Plus-20 centrifugal concentrator (Millipore). The standard purification protocol has been described in detail previously [39].

Protein crystallization
Initial crystallization conditions were screened by the sitting-drop vapor diffusion method at 18°C using the Mosquito crystallization workstation (TTP Labtech) with the Index (Hampton Research) and Wizard I and II (Emerald Biostructures) crystallization screens. Crystals suitable for X-ray diffraction experiments were obtained within several days in the initial screen, from condition no. 23 of Wizard II. Prior to data collection, crystals were flash-frozen in liquid nitrogen in crystallization solution containing 25% (v/v) glycerol as cryoprotectant.

Data collection and structure determination
Anomalous diffraction data were collected at the selenium peak wavelength from crystals of SeMet-substituted protein. The data sets were collected on an ADSC Quantum Q315 charge-coupled device (CCD) detector at 100 K on the Structural Biology Center beamline 19ID at the Advanced Photon Source, Argonne National Laboratory. The space group was found to be C2, with cell parameters a = 76.09 Å, b = 30.28 Å, c = 35.86 Å, α = 90.00°, β = 113.26° and γ = 90.00°. The diffraction data were processed using the HKL3000 suite of programs [40].
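From the cell parameters above one can compute the unit-cell volume. Given the protein mass, one can also compute the Matthews coefficient V_M = V / (Z · M), where Z is the number of asymmetric units per cell (4 for space group C2) and M is the mass in Da of the asymmetric-unit contents, together with the approximate solvent fraction 1 − 1.23/V_M. The following is a minimal sketch. For a monoclinic cell (α = γ = 90°) the volume reduces to abc·sin β. The molecular mass is left as an input because it is not stated in this section, and the function names are hypothetical.

```python
import math

def monoclinic_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume (cubic angstroms) of a monoclinic cell with alpha = gamma = 90 deg."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews(volume: float, z: int, mass_da: float) -> tuple[float, float]:
    """Matthews coefficient V_M (A^3/Da) and approximate solvent fraction 1 - 1.23/V_M."""
    vm = volume / (z * mass_da)
    return vm, 1 - 1.23 / vm

# Cell parameters reported above for the Sp-PIP crystals (space group C2, Z = 4).
V = monoclinic_volume(76.09, 30.28, 35.86, 113.26)
print(f"unit-cell volume: {V:.0f} A^3")
# matthews(V, 4, mass_da) then needs the construct mass, which is not given here.
```

The 1.23 constant follows from assuming a protein partial specific volume of about 0.74 cm³/g, so the solvent estimate is approximate by construction.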
All procedures for single-wavelength anomalous dispersion (SAD) phasing, phase improvement by density modification, and initial protein model building were done using the structure module of the HKL3000 software package. The mean figure of merit of the phase set was 0.207 for 50-2.25 Å data and improved to 0.542 after density modification (DM). The ARP/wARP autotracing module [41] in HKL3000 built 77 out of 102 residues with fitted sequence. The initial model was rebuilt with the program COOT [42] using electron density maps based on the DM-phased reflection file. After each cycle of rebuilding, the model was refined using REFMAC5 [43] from the CCP4 suite with TLS refinement. The stereochemistry of the structure was checked with PROCHECK [44]. Atomic coordinates and experimental structure factors of Sp-PIP have been deposited in the Protein Data Bank under the accession code 2fu2.
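The R-factor and R-free reported for the refined model measure how well calculated structure-factor amplitudes agree with the observed ones: R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|. R-free is computed the same way over a test set of reflections excluded from refinement. The following is a minimal sketch, assuming NumPy arrays of amplitudes and a boolean test-set flag. The toy numbers are invented, and no scale factor is fitted (refinement programs such as REFMAC5 apply their own scaling before reporting R).

```python
import numpy as np

def r_factor(f_obs: np.ndarray, f_calc: np.ndarray) -> float:
    """Crystallographic R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    f_obs, f_calc = np.abs(f_obs), np.abs(f_calc)   # also accepts complex F_calc
    return float(np.abs(f_obs - f_calc).sum() / f_obs.sum())

def r_work_free(f_obs, f_calc, free_flag) -> tuple[float, float]:
    """R over the working set and over the test (free) set of reflections."""
    f_obs, f_calc = np.asarray(f_obs), np.asarray(f_calc)
    free = np.asarray(free_flag, dtype=bool)
    return r_factor(f_obs[~free], f_calc[~free]), r_factor(f_obs[free], f_calc[free])

# Toy data: four reflections, the last one held out as the free set.
f_obs = np.array([100.0, 200.0, 50.0, 150.0])
f_calc = np.array([90.0, 210.0, 50.0, 120.0])
r_work, r_free = r_work_free(f_obs, f_calc, [False, False, False, True])
print(f"R-work {r_work:.1%}, R-free {r_free:.1%}")  # R-work 5.7%, R-free 20.0%
```

Because the free reflections never influence refinement, a large gap between R-work and R-free is the usual warning sign of over-fitting.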