Crystal structure of the C-terminal globular domain of the third paralog of the Archaeoglobus fulgidus oligosaccharyltransferases

Background: Protein N-glycosylation occurs in the three domains of life. Oligosaccharyltransferase (OST) transfers an oligosaccharide chain to the asparagine residue in the N-glycosylation sequons. The catalytic subunits of the OST enzyme are STT3 in eukaryotes, AglB in archaea and PglB in eubacteria. The genome of a hyperthermophilic archaeon, Archaeoglobus fulgidus, encodes three paralogous AglB proteins. We previously solved the crystal structures of the C-terminal globular domains of two paralogs, AglB-Short 1 and AglB-Short 2.
Results: We determined the crystal structure of the C-terminal globular domain of the third AglB paralog, AglB-Long, at 1.9 Å resolution. Crystallization of the fusion protein with maltose binding protein (MBP) afforded high-quality protein crystals. Two MBP-AglB-L molecules formed a swapped dimer in the crystal. Since the fusion protein behaved as a monomer upon gel filtration, we reconstituted the monomer structure from the swapped dimer by exchanging the swapped segments. The C-terminal domain of A. fulgidus AglB-L includes a structural unit common to AglB-S1 and AglB-S2. This structural unit contains the evolutionarily conserved WWDYG and DK motifs. The present structure revealed that A. fulgidus AglB-L contains a variant type of the DK motif with a short insertion, and confirmed that the second signature residue, Lys, of the DK motif participates in the formation of a pocket that binds to the serine and threonine residues at the +2 position of the N-glycosylation sequon.
Conclusions: The structure of A. fulgidus AglB-L, together with the two previously solved structures of AglB-S1 and AglB-S2, provides a complete overview of the three AglB paralogs encoded in the A. fulgidus genome. All three AglBs contain a variant type of the DK motif. This finding supports a previously proposed rule: the STT3/AglB/PglB paralogs in one organism always contain the same type of Ser/Thr-binding pocket. The present structure will be useful as a search model for molecular replacement in the structural determination of the full-length A. fulgidus AglB-L.


Background
Asparagine-linked glycosylation (N-glycosylation) of proteins is the most ubiquitous post-translational modification in eukaryotes, all archaea, and some eubacteria [1,2]. Oligosaccharyltransferase (OST) catalyzes the transfer of an oligosaccharide chain from a lipid-linked oligosaccharide (LLO) donor to the asparagine residues in the N-glycosylation sequon, Asn-X-Ser/Thr (X ≠ Pro) [3,4]. In higher eukaryotes, OST is a multi-subunit, membrane-associated protein complex, whereas the OSTs from lower eukaryotes, archaea and eubacteria are single-subunit membrane proteins [5,6]. The catalytic subunit of the OST enzyme is the only subunit conserved evolutionarily across the three domains of life, and it is referred to as STT3 (Staurosporine and Temperature sensitivity 3) in eukaryotes, AglB (Archaeal Glycosylation B) in archaea, and PglB (Protein Glycosylation B) in eubacteria. The STT3/AglB/PglB proteins share a common overall architecture, consisting of an N-terminal multi-spanning transmembrane region and a C-terminal globular domain [7-9]. Despite the very low overall sequence identity, multiple sequence alignments revealed a few short conserved motifs: two diacidic DXD motifs in the N-terminal transmembrane region, and a well-conserved 5-residue WWDYG motif in the C-terminal globular domain [10-12]. We previously determined the crystal structures of the C-terminal globular domains of four AglB proteins and one PglB protein [13-16]. The structural comparison revealed the common structural unit and the unique structural units specific to each protein. In addition, structure-aided sequence alignment led to the discovery of new short motifs, the DK and MI motifs, based on the fact that the two motifs are located at spatially equivalent positions close to the WWDYG motif [14]. The consensus sequences of the DK and MI motifs are DXXKXXX(M/I) and MXXIXXX(I/V/W), respectively, where X means any amino acid residue. Since the side chains of the signature residues of the two motifs have very different chemical properties (i.e., D↔M and K↔I), the identification of the new motifs would have been almost impossible without reference to the three-dimensional structures.
In 2011, the crystal structure of full-length Campylobacter lari PglB, in a complex with an acceptor peptide, was reported at 3.4 Å resolution [17]. This structure revealed several important features of the STT3/AglB/PglB protein, including 1) the catalytically important acidic residues and a divalent metal ion in the transmembrane region, 2) the putative amide nitrogen activation mechanism of the side chain of the acceptor asparagine residue, and 3) the binding pocket in the C-terminal globular domain that recognizes the serine and threonine residues at the +2 position in the N-glycosylation sequon. The locations of the short amino acid motifs seem to correspond well with these functionally important structures. The conserved acidic residues in the two DXD motifs are involved in divalent ion coordination and amide nitrogen activation. The Trp-Trp-Asp part of the conserved WWDYG motif and the second signature residue, Ile, of the MI motif in the PglB protein constitute the Ser/Thr-binding pocket.
Based on the presence of the DK or MI motif, we classified the STT3/AglB/PglB proteins into two groups. All PglB and some AglB proteins contain the MI motif, whereas all STT3 and the remaining AglB proteins contain the DK motif or its variant type [15]. Thus, there are two types of Ser/Thr-binding pockets: the Lys-type and the Ile-type, according to the second signature residue in the DK/MI motif. Mutagenesis studies proved the essential roles of the second signature residue for the enzymatic activity. The substitution of the lysine residue with alanine in yeast STT3 resulted in a lethal phenotype [13] and the substitutions with seven different amino acid residues in P. furiosus AglB-L resulted in the reduction of the in vitro activity [12]. The replacement of the isoleucine residue by alanine also substantially decreased the in vitro activities of the C. jejuni [14] and C. lari PglBs [18].
The genome of the hyperthermophilic archaeon, Archaeoglobus fulgidus, encodes three paralogous AglB genes. We have named the AglB paralogs with a letter plus an optional number, such as L (long) or S1 (short, number 1). The long AglB (AF_0380) consists of 868 residues and is called AfAglB-L, and the other two short AglBs (AF_0329 and AF_0040) consist of 591 and 593 residues, and are called AfAglB-S1 and AfAglB-S2, respectively. AfAglB-S1 and AfAglB-S2 are the shortest among the currently known STT3/AglB/PglB proteins, and they share 68% sequence identity. In contrast, AfAglB-L shares only 25% identity with AfAglB-S1 and AfAglB-S2. It would be interesting to elucidate the distinct and complementary roles of the multiple OST enzymes in one organism. Mammalian cells have two STT3 paralogs, STT3A and STT3B, which form different OST isoforms with the other seven subunits. The STT3A-containing OST isoform is the central player in the co-translational N-glycosylation of the nascent polypeptide chains, and the STT3B-containing OST isoform mediates the co- and post-translational N-glycosylations of unmodified glycosylation sites missed by the STT3A-OST isoform [19]. The protozoan parasite Trypanosoma brucei has three STT3 paralogs, STT3A, STT3B, and STT3C, and the three STT3 proteins constitute the three single-subunit OST enzymes. These enzymes have different specificities for the oligosaccharide moieties of the LLO donors and peptide acceptor sites [20]. For example, STT3A has stricter specificity for a particular type of lipid-linked oligosaccharide donor, and for glycosylation sites flanked by acidic residues, as compared to the other STT3 paralogs. In contrast, little is known about the different roles of the AglB paralogs in archaea.
We previously determined the crystal structures of the C-terminal globular domains of AfAglB-S1 and AfAglB-S2, at 1.75 and 1.94 Å resolution, respectively [15,16]. In the present study, we determined the crystal structure of the C-terminal globular domain of AfAglB-L, as a fusion with maltose binding protein, at 1.90 Å resolution. The three structures provide a complete overview of the three AglB paralogs encoded in the A. fulgidus genome. The information about their structural similarities and differences will be helpful to elucidate the distinct roles of the AglB paralogs not only in A. fulgidus, but also in other archaea.

Results and discussion
Crystallization of the C-terminal globular domain of AfAglB-L fused to maltose binding protein
The primary sequence of the full-length AfAglB-L protein consists of the N-terminal transmembrane region (499 residues) and the C-terminal globular domain (369 residues). First, we expressed the C-terminal globular domain of AfAglB-L in E. coli cells, and purified large quantities of the soluble protein. Although the protein was crystallized in a reproducible manner, the crystals only diffracted to low resolution. Next, we tried a fusion with E. coli K12 maltose binding protein (MBP), since successful examples of structure determinations using fusion proteins with MBP have been reported [21,22]. We connected MBP to the C-terminal globular domain of AfAglB-L, without a flexible linker sequence between them. This fusion protein is referred to as MBP-sAglB. We added a His tag at the N-terminus of MBP, for Ni-affinity chromatography. Amylose-affinity chromatography was not used for purification, because we wanted to test the apo and maltose-bound forms of MBP in the crystallization screening. In fact, MBP-sAglB crystallized only in the apo form in the absence of maltose, and the crystals diffracted to high resolution.

MBP-sAglB forms a swapped dimer in the crystal
MBP-sAglB was crystallized in the monoclinic space group C2, with one monomer per asymmetric unit. The structure was solved to 1.90 Å resolution by the molecular replacement method using the structure of the maltose-free form of MBP [PDB: 1PEB] as the search model (Figure 1A). The positions of selenium atoms in the anomalous difference Fourier map calculated from the Se-SAD (selenium single-wavelength anomalous diffraction) data of the selenomethionine (SeMet)-substituted MBP-sAglB were helpful to interpret discontinuous electron densities corresponding to the C-terminal domain of AfAglB-L. The final model of MBP-sAglB was refined to R_work and R_free values of 16.4% and 20.2%, respectively. Data collection and refinement statistics are listed in Table 1. In the final model, residues 171 to 180 in MBP were missing, as well as residues 524 to 540 in AfAglB-L. Interestingly, the C-terminal α-helix of MBP and the N-terminal α-helix of AfAglB-L were fused to form a long, continuous α-helical structure, which fixed the relative orientations of MBP and AfAglB-L in the crystal. This rigid connection may have facilitated the crystal growth. Apart from the direct covalent connection, the interactions between the C-terminal domain of AfAglB-L and MBP were minimal. The contact area was as small as 160 Å², and involved two residues of MBP (K200 and K202) and two residues of AfAglB-L (W656 and D657). Thus, we concluded that no severe distortion was induced in the structure of the C-terminal domain of AfAglB-L by the extra noncovalent interactions within the fusion protein.
We found that the two molecules of MBP-sAglB exhibited an intertwined structure, related by a 2-fold rotational axis in the crystal (Figure 1A). This is not surprising, because there are many examples of intertwined structures in crystals [23,24]. The hinge loop region is defined as the segment that links the swapped segment to the rest of the protein. We found that the Asn568-Pro-Phe-Gln-Ala-Gly573 segment in the AfAglB-L portion was the hinge loop region (Figure 1B). The MBP-sAglB protein eluted as a monomer (ca. 80 kDa) in gel filtration chromatography (Figure 2). Thus, the formation of the intertwined dimer is a crystallographic artifact and lacks biological significance. We created a monomeric image of the C-terminal globular domain of AfAglB-L, by restoring the swapped segment (Figure 1C and D). The swapping was performed between A572 and G573, and the restored molecule consists of 73 residues (residues 500-572, magenta) from one molecule, and 296 residues (residues 573-868, cyan) from another molecule related by a 2-fold axis in the crystal.
In addition to the common structural unit CC (blue), the C-terminal domains of most AglB/PglB proteins contain additional unique structural units, IS (green), P1 (yellow), and/or P2 (red). In contrast, AfAglB-S1 and AfAglB-S2 consist only of the CC unit, indicating the indispensable role of the CC unit for the catalytic activity of the OST enzyme. In accordance with this notion, the CC units of the six structures share overall structural similarity. The CC unit features a mixed α/β fold, and contains the conserved WWDYG and DK/MI motifs (Figure 1D). The CC unit of AfAglB-L (243 residues) is longer than those of the other five AglB/PglB proteins (152-181 residues). The additional sequence forms three α-helices (αA, αB, and αC), which are unique among the six structures (Figure 1D).
The IS unit is referred to as an insertion, because it seemed to be inserted into the amino acid sequence of the CC unit. The IS unit is a 9-stranded β-barrel-like structure in PfAglB-L, PhAglB-L, and CjPglB, but in AfAglB-L, it is smaller and differs from the β-barrel-like structure. The unique cluster of the three α-helices in the CC unit appears to substitute for the small IS unit in the AfAglB-L structure. The P1 unit of AfAglB-L is β-sheet rich and occupies a similar spatial position to those in PfAglB-L and PhAglB-L, but the arrangement of the β-strands is also different. Thus, the other structural units besides the CC unit may have special roles in each OST enzyme. For example, they may contribute toward the increased thermal stability of the AglB proteins in the hyperthermophilic archaea, Archaeoglobus and Pyrococcus.

Kinked helix with a short insertion sequence
The DK/MI motif resides on the characteristic kinked helix in the CC structural unit (Figure 3). The kinked helix consists of the N-terminal α-helical half and the C-terminal 3₁₀-helical half. In our previous studies, we found that the AfAglB-S1 and AfAglB-S2 structures both contained an insertion sequence at the junction site of the two helical structures (Figure 4A and B). We concluded that the DK/MI motif of the two AfAglB proteins was a variant type of the DK motif with an insertion. Since this unexpected insertion separated the first and second signature residues of the DK motif in the primary structure, the identification of the variant type of DK motif would have been almost impossible without reference to the three-dimensional structures (Figure 4C). The consensus sequence of the variant type of DK motif was defined as E<>KXXX(M/I/P), where <> denotes the inserted sequence with a variable length [15]. Even with this information available, we could not clearly identify the DK/MI motif of AfAglB-L from the sequence alone, due to the presence of redundant acidic residues in this region. By reference to the present AfAglB-L structure, we concluded that the kinked helix of AfAglB-L contained a four-residue insertion between the first and second signature residues, Glu613 and Lys618, respectively, of the variant type of DK motif. The spatial arrangement of the signature residues in AfAglB-L is identical to those in AfAglB-S1 and AfAglB-S2 (green side chains in Figure 4A and B).
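The consensus sequences quoted above can be written as regular expressions to list candidate motif sites in a primary sequence. The following is a minimal sketch, not the procedure used in this study: the function names, the toy sequences in the usage note, and the bounds on the insertion length (`min_gap`, `max_gap`) are assumptions introduced for illustration only. As noted in the text, sequence-only matching is ambiguous in regions rich in acidic residues, so any hit should be treated as a candidate that requires structural confirmation.

```python
import re

# Consensus patterns from the text; "X" (any residue) is written as "." in regex syntax.
DK_MOTIF = re.compile(r"D..K...[MI]")    # DXXKXXX(M/I)
MI_MOTIF = re.compile(r"M..I...[IVW]")   # MXXIXXX(I/V/W)


def variant_dk_pattern(min_gap=1, max_gap=12):
    """Build a regex for the variant DK motif, E<>KXXX(M/I/P).

    The text only states that the insertion <> has a variable length;
    the default bounds used here are illustrative assumptions.
    """
    return re.compile(rf"E.{{{min_gap},{max_gap}}}?K...[MIP]")


def find_motifs(sequence, min_gap=1, max_gap=12):
    """Return (motif name, 1-based start position, matched segment) tuples.

    Matches are non-overlapping within each motif type.
    """
    patterns = {
        "DK": DK_MOTIF,
        "MI": MI_MOTIF,
        "DK-variant": variant_dk_pattern(min_gap, max_gap),
    }
    hits = []
    seq = sequence.upper()
    for name, pattern in patterns.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group()))
    return hits
```

For example, on hypothetical toy sequences, `find_motifs("AAADGGKAAAMAA")` reports a canonical DK candidate starting at position 4, and `find_motifs("AAEWWWWKAAAPAA")` reports a variant DK candidate with four residues between the Glu and Lys signature residues.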

The Ser/Thr-binding pocket in various AglB and PglB structures
We focused our attention on the Ser/Thr-binding pocket in the CC unit. The Ser/Thr-binding pocket was first identified in C. lari PglB, in a complex with a substrate peptide (Figure 4A). The canonical structure of the Ser/Thr pocket was also found in the previously determined AfAglB-S2 structure and the present AfAglB-L structure (Figure 4A). Since both of the Archaeoglobus AglB proteins were crystallized in the absence of peptide substrates, we concluded that the Ser/Thr pocket was formed prior to peptide binding. The PglB protein contains the MI motif, whereas the two Archaeoglobus AglB proteins contain the DK motif. Thus, we also concluded that the Ser/Thr-binding pocket is a functional structure present in all of the OST enzymes, independently of the DK or MI motif.
In contrast, AfAglB-S1 has a deformed structure of the Ser/Thr-binding pocket [15]. The side chain of the tyrosine residue in the WWDYG motif protrudes in a different direction, and the α-helix following the WWDYG motif (pink) is also oriented differently (Figure 4B). Indeed, we also found considerable conformational variation of the WWDYG motif in PfAglB-L and PhAglB-L [16]. We inferred that this variation reflects the remarkable plasticity of the WWDYG motif, and hence the flexibility of the Ser/Thr pocket. Indeed, the dynamic nature of the WWDYG motif and the following α-helix was confirmed in an NMR relaxation study of the C-terminal domain of AfAglB-S2 in the absence of substrates [16]. We speculate that the transient collapse of the Ser/Thr pocket occurs during the catalytic cycle, although the Ser/Thr pocket in the C-terminal domain has a canonical structure in the resting state and a peptide-bound state, as represented by AfAglB-L, AfAglB-S2, and CjPglB, in the absence, and by ClPglB, in the presence of a substrate acceptor peptide, respectively. The necessity of multiple conformational states in the enzymatic activity was suggested by a biochemical experiment using PfAglB-L, in which the flexibility restriction forced by an engineered disulfide bond abolished the enzymatic activity, but its cleavage fully restored the activity [16]. Interestingly, in the crystal structure of MBP-sAglB, the domain swapping site is located in the segment corresponding to the flexible region identified in AfAglB-S2 [16].
Figure caption (fragment): The TM region, which was not included in the structure determination, is outlined in gray. The CC unit is colored blue, IS is green, P1 is yellow, and P2 is red. The characteristic kinked helix in the CC unit is colored light brown, as a landmark for comparison.

Conclusions
We have determined the crystal structure of the C-terminal globular domain of one of the three oligosaccharyltransferases in the hyperthermophilic archaeon, Archaeoglobus fulgidus (Figure 1). The crystallization of the fusion protein with MBP afforded high-quality protein crystals. The C-terminal domain of AfAglB-L consists of three structural units, CC, IS, and P1 (Figure 3). Multiple sequence alignments in the region corresponding to the kinked helix in the CC unit were particularly difficult in the archaeal classes Archaeoglobi, Halobacteria and Methanomicrobia, due to the vast sequence diversity and abundance of acidic residues. The present AfAglB-L structure, together with the previously determined AfAglB-S1 and AfAglB-S2 structures, provided the structure-guided sequence alignment, which indicated that all of the AglB paralogous proteins in these archaeal classes have the kinked helix, with inserted sequences of variable lengths (Figure 4C). The insertion sequences allow the spatial arrangement of the three signature residues of the variant type of DK motif to superimpose onto those of the canonical type of DK motif, found in STT3 and most AglBs (Figures 4A and B). The finding also supported a previously proposed rule: the catalytic subunits of the OST enzymes in one organism always contain the same type of DK/MI motif, and thus the same type (either the Ile-type or Lys-type) of Ser/Thr-binding pocket [15]. This information will be useful to understand the distinct and complementary roles of the two to four STT3/AglB paralogs coexisting in one organism.

Protein expression and purification
The DNA sequence of the C-terminal globular domain (residues 500-868) of AfAglB-L [UniProt/TrEMBL: O29867_ARCFU, AF_0380] was amplified by PCR from the genomic DNA, and that of maltose binding protein (MBP, residues 1-366) was amplified from the plasmid pMAL-c5x (New England Biolabs). The two DNA fragments were combined by the SOEing PCR method [25]. The final PCR product was cloned into SmaI-XhoI digested pET-47b (Novagen), using an In-Fusion Advantage PCR Cloning Kit (Clontech). The resultant fusion protein contained an N-terminal His₆ tag. The expression plasmid was transformed into E. coli BL21 Gold (DE3) cells (Stratagene). The E. coli cells were grown at 310 K in LB medium and selenomethionine core medium (Wako) for the production of native and SeMet derivative proteins, respectively, supplemented with 50 mg L⁻¹ L-selenomethionine (Nacalai Tesque) and 30 mg L⁻¹ kanamycin. When the A₆₀₀ reached 0.6, isopropyl-1-thio-β-D-galactopyranoside was added to a final concentration of 0.5 mM. After overnight induction at 289 K, the cells were harvested by centrifugation. The cell pellets were suspended in TS buffer (50 mM Tris buffer, pH 8.0, 100 mM NaCl) and disrupted by sonication. The recombinant protein was purified by affinity chromatography on nickel Sepharose High Performance resin (GE Healthcare), and the N-terminal His tag was removed by 3C protease, leaving a Gly-Pro-Gly extension at the N-terminus. The cleaved protein was further purified by gel filtration chromatography, using a Superdex 200 10/300 GL column (GE Healthcare) in TS buffer. The eluted protein was desalted and concentrated with an Amicon Ultra-15 centrifugal filter unit (Millipore, 100 kDa NMWL) to 40 mg mL⁻¹ in 10 mM Tris-HCl buffer, pH 8.0, for crystallization.

Crystallization
Initial crystallization screening was performed by the sitting drop vapor diffusion method, using the Index crystallization screen (Hampton Research), JCSG+ Suite (Qiagen), PACT (Qiagen), and Classics (Qiagen) kits. After optimization, the native and SeMet crystals grew from a hanging drop with a 1:1 volume ratio (total volume, 2 μL) of the protein stock solution (40 mg mL⁻¹, 10 mM Tris-HCl, pH 8.0) and the reservoir solution (0.1 M CAPSO buffer, pH 9.4, 33% polyethylene glycol 3350) at 293 K. Crystals were picked up with a nylon loop (Hampton Research), and then were directly cryocooled in liquid nitrogen. It was unnecessary to add any cryoprotectants, due to the high concentration of polyethylene glycol.
Data collection, structure determination, and refinement
X-ray diffraction data were collected at beam line BL44XU of SPring-8 (Harima, Japan), and processed using the program HKL2000 [26] to resolutions of 1.90 Å and 2.30 Å for the native and SeMet crystals, respectively. Molecular replacement and the phase improvement with solvent flattening were performed using the program MR-Rosetta [27]. The initial electron density map of the native data set, obtained by molecular replacement using the structure of the maltose-free form of MBP [PDB: 1PEB] as the search model, showed discontinuous electron densities in the region corresponding to the C-terminal domain of AfAglB-L. Then, we calculated the electron density map using phases obtained by molecular replacement combined with SAD phasing, but the quality of the electron density map did not improve significantly. Further manual model rebuilding was performed with the program COOT [28], and subsequent crystallographic refinement was performed with the program PHENIX [29]. Fortunately, the positions of nine selenium atoms in selenomethionines in the C-terminal domain of AfAglB-L were clearly visible in the anomalous difference Fourier map calculated from the Se-SAD data set. The superposition of the coordinates of AfAglB-S1 [PDB: 3VGP] and AfAglB-S2 [PDB: 3VU0] onto the partially built model of the N-terminal α-helix of the C-terminal globular domain of AfAglB-L correctly placed the AfAglB-S1 and -S2 structures in the electron density maps. Because the CC unit is common in all AglB and PglB proteins, the superposed structures guided the manual model building and refinements to obtain the final model of the C-terminal domain of AfAglB-L to a resolution of 1.90 Å (Figure 1). The asymmetric unit contained one protein molecule. The calculated solvent content was 44.1% (V_M = 2.20 Å³ Da⁻¹). Data collection and refinement statistics are summarized in Table 1.
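The relationship between the Matthews coefficient and the solvent content reported above can be checked with a short calculation. This is a minimal sketch assuming the conventional Matthews approximation, in which the protein volume fraction is 1.23/V_M (with V_M in Å³ Da⁻¹); the text does not state which program or constant was used, so the function names and the default constant are assumptions. The unit cell volume, the number of molecules per cell, and the molecular weight are left as function parameters rather than filled in.

```python
def matthews_coefficient(cell_volume_A3, molecules_per_cell, mol_weight_Da):
    """V_M in A^3 per Da: unit cell volume divided by the total protein mass in the cell."""
    return cell_volume_A3 / (molecules_per_cell * mol_weight_Da)


def solvent_fraction(v_m, protein_constant=1.23):
    """Approximate solvent fraction from V_M.

    Assumes the conventional estimate that protein occupies a fraction
    protein_constant / V_M of the crystal volume (Matthews approximation).
    """
    return 1.0 - protein_constant / v_m


if __name__ == "__main__":
    # V_M value reported in the text
    print(f"{solvent_fraction(2.20) * 100:.1f}% solvent")
```

With V_M = 2.20 Å³ Da⁻¹, this approximation gives a solvent content of about 44.1%, in agreement with the value reported in the text.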
The atomic coordinates of MBP-sAglB have been deposited in the Protein Data Bank, with the accession code 3WAI.
The figures were generated with the PyMOL Molecular Graphics System, Version 1.3 (Schrödinger, LLC). The multiple sequence alignment was performed with the program MAFFT [30].