- Research article
- Open Access
Exon 6 of human JAG1 encodes a conserved structural unit
BMC Structural Biology volume 9, Article number: 43 (2009)
Notch signaling drives developmental processes in all metazoans. The receptor binding region of the human Notch ligand Jagged-1 is made of a DSL (Delta/Serrate/Lag-2) domain and two atypical epidermal growth factor (EGF) repeats encoded by two exons, exon 5 and 6, which are out of phase with respect to the EGF domain boundaries.
We determined the 1H-NMR solution structure of the polypeptide encoded by exon 6 of JAG1 and spanning the C-terminal region of EGF1 and the entire EGF2. We show that this single, evolutionary conserved exon defines an autonomous structural unit that, despite the minimal structural context, closely matches the structure of the same region in the entire receptor binding module.
In eukaryotic genomes, exon and domain boundaries usually coincide. We report a case study where this assertion does not hold, and show that the autonomously folding, structural unit is delimited by exon boundaries, rather than by predicted domain boundaries.
The Notch signaling pathway is a highly connected and tightly regulated signal transduction network that drives developmental processes in all metazoans. Notch signaling controls cell lineage decisions in tissues derived from all three primary germ lines: endoderm, mesoderm, and ectoderm thus playing an essential role in organogenesis [1–3].
Both receptors and ligands are membrane-bound proteins, which normally restricts signaling to adjacent cells. Jagged-1, one of the five Notch ligands identified in man, is a single pass type I membrane protein with a large extracellular region made of a poorly characterized N-terminal region, a DSL (Delta/Serrate/Lag-2) domain, a series of 16 epidermal growth factor (EGF) tandem repeats, and a cysteine-rich juxtamembrane region (Figure 1). The DSL domain, together with the first two atypical EGF repeats constitutes Jagged-1 receptor binding region [4, 5].
We previously showed  that a peptide corresponding to EGF2 of human Jagged-1 (residues 263–295) cannot be refolded in vitro in the standard oxidative folding conditions used for other EGFs. As exon 6 of the JAG1 gene encodes not only EGF2 but also part of EGF1, we speculated that exon 6 might encode an autonomously folding unit. We thus prepared a longer peptide encompassing the C-terminal part of EGF1 and the entire EGF2 (Figure 1). This peptide, J1ex6 (residues 252–295), could be readily refolded in vitro and was shown to yield a folded unit with a disulfide bond topology typical of EGF repeats . We concluded that exon 6 encodes an autonomously folding unit, but whether the N-terminal overhang is only required for folding, acting as an internal chaperone in the reshuffling of disulfide bonds, or it is integral part of a structural unit encompassing the EGF1 C-terminal region and EGF2 remained an open issue.
We report here the solution structure of J1ex6 determined by 1H-NMR spectroscopy and demonstrate that exon 6 actually defines an EGF-like structural unit with an additional disulfide-linked loop in the N-terminal overhang. We show that the structure of this unit, in spite of the minimal structural context, is very close to the conformation of the same region in a larger construct comprising the DSL and the first three EGF repeats, for which the crystal structure has been recently determined . The exon/intron organization of this region is very well conserved in this class of Notch ligands, which leads us to speculate on the evolution of this structurally peculiar and functionally relevant region.
The solution structure of J1ex6 was determined by 1H NMR spectroscopy (PDB: 2KB9) (Table 1, Additional files 1 and 2). Disulfide bonds were experimentally determined by targeted proteolysis and MS analysis in a three-step strategy that lead to the unambiguous assignment of the disulfide topology, and they were explicitly used in structure calculations as distance constraints. The overall fold of J1ex6 is mainly dictated by the four disulfide bonds and lacks well defined secondary structure elements, as well as a true hydrophobic core (Figure 2). The mean pairwise RMSD values for the backbone and all heavy atoms (in parenthesis) are 1.04 ± 0.24 Å (1.65 ± 0.30 Å) from the first to the last half-cystine (residues C253–C264), 1.16 ± 0.39 Å (2.11 ± 0.52 Å) for the N-terminal overhang (residues 265–293), and 0.71 ± 0.24 Å (1.10 ± 0.26 Å) for the core EGF2 repeat (residues C265–C293). The distribution of psi/phi angles in the Ramachandran map for the 20 selected models is: 50.2% in most favored regions, 47.0% in additionally allowed regions, 2.9% in generously allowed regions, and 0.0% in disallowed regions. Whereas the availability of heteronuclear NMR data would have probably improved the precision of the models, the results for the psi/phi distribution are in line with the statistics for a set of 49 NMR structures of single and tandem EGF repeats deposited at the PDB (data not shown). It is thus possible that the sub-optimal distribution of psi/phi angles in the Ramachandran map of EGF repeats is a consequence of the constraints dictated by the disulfide bonds.
It was proposed that EGF domains can be divided in two structural groups, human EGFs (hEGF) and C1r-like EGFs (cEGF), depending on the location of the last half-cystine in the structure . Using the ANBNACBCCNCC annotation to describe the disulfide bond topology, where ANAC, BNBC, CNCC are the three disulfides, these two groups also display different lengths of the CN-CC loop, of the BN-AC loop, and of the linker connecting two EGFs of the same type. A comparison between different spacings in J1ex6 and in a set of 56 EGFs of known structure (see Additional file 3) shows that J1ex6 can be clustered together with the hEGFs for certain characteristics, such as the length of the CN-CC loop (8 residues), while for others it clusters neither with cEGFs nor with hEGFs. Notably, the BN-BC loop (10 residues) is shorter than in cEGFs (most frequently 12–13 residues) and in hEGFs (14 residues or more), as well as the total spacing between the first and the last half-cystine (AN-CC loop, 27 residues vs. 30 or more in other EGFs) and the linker between EGF1 and EGF2 (2 residues, vs. 5 or 6 in cEGFs and hEGFs, respectively). Overall, this makes J1ex6 rather more constrained than cEGFs and hEGFs. An exhaustive search of structural databases with the J1ex6 structure did not produce any hit with a significant score.
Surprisingly, the N-terminal overhang was found to be conformationally restrained and packs onto the following EGF2 unit. The interaction between the N-terminal overhang and the EGF2 repeat is mediated by a series of hydrophobic residues (Y255, W257 in EGF1; I266, P279, W280 in EGF2). This suggests that, even in solution, the EGF1-2 module is quite rigid.
Recently, the crystal structure of the region encompassing the DSL and the first three EGF repeats of Jagged-1 has been reported  (PDB: 2VJ2). A comparison between the solution structure of J1ex6 and the structure of the same region in the X-ray structure shows a good agreement in the tracing of the backbone (see Additional file 4). The backbone RMSD between the X-ray structure and the 20 models of the solution structure varies between 1.97 and 2.71 Å, with an average value of 2.32 ± 0.23 Å, the largest difference being observed in the region 267–271. Interestingly, the conformation of the N-terminal overhang in the solution structure of J1ex6 is very close to that adopted by the corresponding segment in the crystal structure of the DSL/EGF1-3 tandem domains, with an average RMSD of 1.86 ± 0.24 Å for the region 253–265 (Figure 3). These results confirm that exon 6 of the human JAG1 gene is actually encoding a structural unit containing three disulfide bonds with an EGF-like topology and an additional disulfide bond that is N-terminal to the EGF domain, rather than C-terminal as in laminin EGF-like domains.
Furthermore, in the crystal structure of the DSL/EGF1-3 modules , a kink is present between EGF1 and EGF2 in an otherwise linear, rod-like structure (Figure 4). Because this construct crystallized as a dimer with several interchain contacts, it can be questioned if packing forces are responsible for the bending of the chain. On the other hand, the good agreement between the crystal structure and the solution structure, in particular in the N-terminal overhang, and despite the reduced structural context, suggests that the kink is actually a structural feature that might have some functional relevance.
To find out if the dephasing of exon boundaries with respect to predicted domain boundaries in the region comprising these two atypical EGF repeats is accidental, or might underlie some common evolutionary origin, we analyzed the exon/intron organization of human JAG1 orthologues in 26 different species including primates (5), non-primate mammals (15), birds (1), amphibians (1), and fishes (4). The exon/intron arrangement in this region of the JAG1 genes is very well conserved throughout evolution, with a single exon encoding the C-terminal region of EGF1 and the complete EGF2 (see Additional file 5). The extension of this analysis to all homologues of Notch ligands showed that the same exonic organization is found not only in JAG1 but also in the JAG2, DLL1, DLL4, DLK1, and DLK2 gene families, for a total of 112 genes in species varying from fishes to primates, and only three exceptions found, all in lower organisms (see Additional files 6 and 7). Usually, exon 6 (or its equivalent) is flanked by a phase 2 and a phase 1 intron on the 5' and 3' ends, respectively.
Early on in 1978 it was proposed that exons encode "folded protein units", emphasizing the role of a correct folding process to produce functional proteins or domains . Recent advances in genome sequencing, domain classification, and 3D structure determination confirmed this hypothesis: a strong correlation between exon boundaries and predicted domain boundaries has been found in nine eukaryotic genomes, the correlation becoming stronger as the genome complexity becomes higher . Such a high correlation lead to the suggestion that in certain cases exon boundaries can be used to predict domain limits more accurately . In particular, a survey of domain repeats in seven metazoan species showed that there is a very good correspondence between exons and EGF repeats (0.93 exon/repeat on the average) . In the case reported here, exon boundaries do not coincide with the expected EGF domain limits. Although it can be argued that in some instances domain limits cannot be defined precisely, this is not the case of EGF repeats, which are clearly recognizable by a very specific pattern of the three disulfide bonds and by the spacing between half-cystines. In this case study, the overall correspondence is maintained, with exons 5 and 6 encoding EGF1 and 2, but exon and domain boundaries are clearly out of phase, with exon 5 encoding a truncated EGF with only four half-cystines and exon 6 encoding the C-terminal half of EGF1 and the entire EGF2. Furthermore, this peculiar exon/intron organization seems to be well conserved throughout evolution. How can these results be reconciled with the experimental finding that exon 6 of human JAG1 is encoding an autonomously folding and structural unit? Although from the statistical point of view this may be one of the rare instances where the 1:1 correspondence between exons and EGF repeat does not hold, the question remains if this has any structural or functional significance. It is possible that the particular exon structure in this region is dictated by folding and structural requirements. In this specific case, the constraints in the atypically short EGF2 repeat might require the N-terminal extension as an internal chaperone and a docking template to drive the correct folding.
Furthermore, the interface between EGF1 and EGF2 drives the relative orientation of the EGF1-2 tandem repeats and may have a functional role. It was shown that deletion of the DSL domain in a Jagged-1 construct abolishes binding to Notch . Whereas the DSL domain is necessary for binding, it is not sufficient. A construct containing the MNNL region and the DSL domain binds only weakly, while addition of the EGF1-2 restores full binding . Although the structural determinants of the interaction between DSL ligands and Notch receptors are not known in detail yet, the presence of a kink at the interface between EGF1 and EGF2 observed in the crystal structure of the Jagged-1 region comprising the DSL domain and the first three EGF repeats  might not be accidental and may be required for correct binding to Notch receptors. In calcium binding EGFs, which are connected by a fairly long linker, the relative orientation of two adjacent domains is mainly determined by the geometric constraints imposed by the coordination of the calcium ion. In EGF1-2, the same objective is achieved by drastically reducing the length of the linker region and encoding the C-terminal part of EGF1 and EGF2 in a single, conserved exon.
It has been proposed that the DSL domain may have evolved from the truncation of tandemly connected, short EGF domains . In fact, J1ex6 in itself can be viewed as two truncated tandem EGFs, and the sequence and disulfide pattern similarities between the DSL domain and J1ex6 are actually significant (see Additional file 8). One might then ask whether there is any evolutionary relationship between the two or, in other words, if the DSL domain and J1ex6 might have arisen from duplication of a common ancestor followed by divergent evolution and loss of one disulfide linkage in the DSL domain. If this hypothesis is true, one should be able to identify a primitive precursor where either the DSL or J1ex6 is missing. Indeed, we identified the non-canonical Notch ligands DLK1 and DLK2 as hits sharing with JAG1 a high sequence similarity and the same exon organization in the region comprising EGF1 and 2. Interestingly, these proteins lack the DSL domain, and this makes them good candidates as precursors of canonical Notch ligands. However, DLK1 and 2 are found only in vertebrates, and not in more primitive organisms such as nematodes and insects. [Note added in proof: After acceptance of our manuscript, Dr. Anne C. Hart called our attention to a paper recently published by her group in PLOS Biology (6(8):196, 2008) in which it is proposed that the secreted C. elegans protein OSM-11 is a functional ortholog of mammalian DLK1]. Furthermore, the DSL domain is made not only of a cysteine-rich region, but also of a more variable N-terminal region that is usually encoded by the same exon. The genome of the microbal eukaryote Monosiga brevicollis, one of the closest primitive relatives of metazoans, has been recently sequenced and revealed some archetypal features of Notch signaling . Domains that are typical of Notch receptor proteins, such as Notch/Lin, ankyrin, and EGF repeats are already present, although in distinct proteins and not arranged in the same domain architecture as in metazoan Notch proteins, but not homologues of Notch ligands. We were not able as well to detect any homologue of the DSL domain in the genome of M. brevicollis, but we found several hits corresponding to short EGF repeats. In conclusion, currently available data still do not provide strong evidence of an evolutionary relationship between the DSL domain and J1ex6, but are in support of a later appearance of the DSL domain with respect to the short EGF repeats. The unusual exon architecture of the region comprising the EGF1 and EGF2 repeats might have arisen from the insertion of an intron in a common precursor encoding both EGF1 and EGF2, and then conserved during the evolution of metazoans, together with the amino acid sequence.
In eukaryotic genomes, there is an overall very good correspondence between exon boundaries and predicted domain limits [9–11]. We report a case study where this correspondence is not fulfilled, and show that the autonomously folding, structural unit is defined by exon boundaries, rather than by predicted domain boundaries. Although this conclusion cannot be taken as a general rule, this study suggests that, together with domain boundaries and predicted secondary structure, exon boundaries may also be taken into account when designing constructs for structural studies. This option should be carefully considered especially when dealing with protein regions for which no similarity with known domains can be detected. These regions, also called "orphan domains", account for as much as ~15% of the eukaryotic proteomes , while an additional ~30% is made of poorly characterized regions such as those belonging to the Pfam-B families .
J1ex6 (44 amino acid long, corresponding to residues 252–295 of human Jagged-1) was synthesized on solid phase using Fmoc/tBu chemistry as previously described . Cysteine residues were introduced by double coupling as N-α-Fmoc-S-trityl-L-cysteine pentafluorophenyl ester in order to avoid cysteine racemization. All other amino acids were introduced as double couplings using a 4× excess of amino acid (Fmoc-AA/HCTU/DIPEA = 1/1/2). After cleavage/deprotection, the peptide was precipitated with diethylether, washed and freeze-dried. The crude peptide was reduced by TCEP and purified by RP-HPLC on a Zorbax 300SB-C18 semipreparative column. The purified peptide fractions were diluted to a final peptide concentration of 0.1 mg/mL in the degassed refolding buffer (0.25 M Tris-HCl, 2 mM EDTA, 3.7 mM GSH, 3.7 mM GSSG, pH 8) and refolded for 18 hours at 4°C. After acid quenching of the folding reaction with TFA, J1ex6 was purified by RP-HPLC using a Zorbax SB300-C18 column and freeze-dried.
The complete disulfide pattern of the folded peptide was unambiguously determined by targeted proteolysis and MS analysis in three steps. In the first reaction, the purified peptide (160 μg) was dissolved in 250 μL of sodium acetate buffer (50 mM, pH 5.6) containing 5 mM CaCl2 and incubated with trypsin (8 μg) for 18–48 h at 37°C. The reaction mixture was further incubated for 48 h at 37°C in the presence of thermolysin (15 μg). A fragment corresponding to the two-disulfide-bonded core was then isolated by RP-HPLC and subjected to a further proteolysis with proline-endopeptidase (1/20 w/w) for 18 h. At each step, aliquots from the digestion mixtures were desalted by ZipTip C18 (Millipore), mixed (1:1) with MALDI matrix (10 mg/mL HCCA in 75% MeCN/25% H2O/0.1% TFA) and analyzed by MALDI-MS on an Applied Biosystems 4800 TOF/TOF Analyzer operated in reflectron positive ion mode.
The sample for NMR spectroscopy was prepared dissolving the freeze-dried peptide in H2O/D2O (90/10, v/v) for a final sample concentration of ~0.5 mM and adjusting the pH to ~4.5 with NaOH 0.1 N. Limited solubility hampered data acquisition at higher pH values. Spectra were recorded at 298 K on a Bruker Avance operating at a 1H frequency of 800.13 MHz and equipped with a triple resonance, z-axis gradient cryo-probe. 2D NOESY and TOCSY spectra were recorded using 150 ms and 80 ms mixing times, respectively. Additional spectra were recorded on the same sample dissolved in D2O. Data were transformed using X-WinNMR (Bruker) and analyzed using CARA . Chemical shifts were referenced to internal DSS. Assignment of 1H backbone and side-chain resonances was achieved from COSY, TOCSY, and NOESY spectra using standard techniques. Structure calculations were carried out in a completely automated fashion using CYANA 2.1 . Disulfide bonds were explicitly added as distance constraints, with the weight for the upper SG-SG distance set to 10. Distance constraints were derived starting from 922 peaks manually picked in NOESY spectra recorded in H2O/D2O (90/10, v/v) and in D2O, and automatically assigned in a recursive manner within the standard CYANA protocol using 0.030 and 0.040 ppm chemical shift tolerance in the detected and indirect 1H dimensions, respectively. In each calculation round, 100 structures were minimized and 20 models were finally selected according to the target function value. Coordinates were deposited at the PDB (PDB code: 2KB9). Figures were prepared using MOLMOL .
Conservation of exon boundaries in 26 orthologues of human JAG1 retrieved from ENSEMBLE was verified by a BLAST search of the human J1ex6 amino acid sequence over the entire set of translated exons. The same type of search was extended to all homologues of human Jagged-1 for a total of 112 sequences. Sequences were then aligned using CLUSTAL-W.
Hurlbut GD, Kankel MW, Lake RJ, Artavanis-Tsakonas S: Crossing paths with Notch in the hyper-network. Curr Opin Cell Biol 2007, 19: 166–75. 10.1016/j.ceb.2007.02.012
Ehebauer M, Hayward P, Arias AM: Notch, a universal arbiter of cell fate decisions. Science 2006, 314: 1414–5. 10.1126/science.1134042
Bray SJ: Notch signalling: a simple pathway becomes complex. Nat Rev Mol Cell Biol 2006, 7: 678–89. 10.1038/nrm2009
Shimizu K, Chiba S, Kumano K, Hosoya N, Takahashi T, Kanda Y, Hamada Y, Yazaki Y, Hirai H: Mouse jagged1 physically interacts with notch2 and other notch receptors. Assessment by quantitative methods. J Biol Chem 1999, 274: 32961–9. 10.1074/jbc.274.46.32961
Cordle J, Johnson S, Tay JZ, Roversi P, Wilkin MB, de Madrid BH, Shimizu H, Jensen S, Whiteman P, Jin B, et al.: A conserved face of the Jagged/Serrate DSL domain is involved in Notch trans-activation and cis-inhibition. Nat Struct Mol Biol 2008, 15: 849–57. 10.1038/nsmb.1457
Guarnaccia C, Pintar A, Pongor S: Exon 6 of human Jagged-1 encodes an autonomously folding unit. FEBS Lett 2004, 574: 156–60. 10.1016/j.febslet.2004.08.022
Wouters MA, Rigoutsos I, Chu CK, Feng LL, Sparrow DB, Dunwoodie SL: Evolution of distinct EGF domains with specific functions. Protein Sci 2005, 14: 1091–103. 10.1110/ps.041207005
Blake C: Do genes-in-pieces imply proteins-in-pieces? Nature 1978, 273: 267. 10.1038/273267a0
Liu M, Grigoriev A: Protein domains correlate strongly with exons in multiple eukaryotic genomes – evidence of exon shuffling? Trends Genet 2004, 20: 399–403. 10.1016/j.tig.2004.06.013
Liu M, Wu S, Walch H, Grigoriev A: Exon-domain correlation and its corollaries. Bioinformatics 2005, 21: 3213–6. 10.1093/bioinformatics/bti509
Bjorklund AK, Ekman D, Elofsson A: Expansion of protein domain repeats. PLoS Comput Biol 2006, 2: e114. 10.1371/journal.pcbi.0020114
King N, Westbrook MJ, Young SL, Kuo A, Abedin M, Chapman J, Fairclough S, Hellsten U, Isogai Y, Letunic I, et al.: The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 2008, 451: 783–8. 10.1038/nature06617
Moore AD, Bjorklund AK, Ekman D, Bornberg-Bauer E, Elofsson A: Arrangements in the modular evolution of proteins. Trends Biochem Sci 2008, 33: 444–51. 10.1016/j.tibs.2008.05.008
Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al.: The Pfam protein families database. Nucleic Acids Res 2008, 36: D281–8. 10.1093/nar/gkm960
Keller R: Optimizing the process of Nuclear Magnetic Resonance spectrum analysis and computer aided resonance assignment. In Ph.D. Thesis Nr. 59471. Swiss Federal Institute of Technology (ETH) Zürich; 2005.
Güntert P: Automated NMR structure calculation with CYANA. Methods Mol Biol 2004, 278: 353–78.
Koradi R, Billeter M, Wuthrich K: MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 1996, 14: 51–5. 10.1016/0263-7855(96)00009-4
We acknowledge the support of the EU (European Network of Research Infrastructures for Providing Access and Technological Advancements in Bio-NMR) for access to the CERM NMR facility, Sesto Fiorentino (FI), Italy. We thank András Perczel and András Czajlik (Eötvös Loránd University, Budapest, Hungary) for their contribution in the initial phase of the NMR work. We are grateful to Stephan Grzesiek and Navratna Vajpai (Biozentrum, University of Basel, CH) for recording additional NMR spectra.
AP conceived the study, determined the NMR structure, and wrote the manuscript. CG carried out the peptide synthesis, purification, and disulfide bond determination. SD performed the sequence analysis. SP coordinated the study, participated in its design and contributed to draft the manuscript. All authors read and approved the final manuscript.