Exon 6 of human JAG1 encodes a conserved structural unit
© Pintar et al; licensee BioMed Central Ltd. 2009
Received: 01 February 2009
Accepted: 08 July 2009
Published: 08 July 2009
Notch signaling drives developmental processes in all metazoans. The receptor-binding region of the human Notch ligand Jagged-1 consists of a DSL (Delta/Serrate/Lag-2) domain and two atypical epidermal growth factor (EGF) repeats. These repeats are encoded by two exons, exons 5 and 6, whose boundaries are out of phase with respect to the EGF domain boundaries.
We determined the 1H-NMR solution structure of the polypeptide encoded by exon 6 of JAG1, which spans the C-terminal region of EGF1 and the entire EGF2. We show that this single, evolutionarily conserved exon defines an autonomous structural unit that, despite the minimal structural context, closely matches the structure of the same region within the entire receptor-binding module.
In eukaryotic genomes, exon and domain boundaries usually coincide. We report a case study where this does not hold, and show that the autonomously folding structural unit is delimited by exon boundaries rather than by predicted domain boundaries.
The Notch signaling pathway is a highly connected and tightly regulated signal transduction network that drives developmental processes in all metazoans. Notch signaling controls cell lineage decisions in tissues derived from all three primary germ layers (endoderm, mesoderm, and ectoderm), thus playing an essential role in organogenesis [1–3].
We previously showed that a peptide corresponding to EGF2 of human Jagged-1 (residues 263–295) cannot be refolded in vitro under the standard oxidative folding conditions used for other EGFs. Because exon 6 of the JAG1 gene encodes not only EGF2 but also part of EGF1, we speculated that exon 6 might encode an autonomously folding unit. We therefore prepared a longer peptide encompassing the C-terminal part of EGF1 and the entire EGF2 (Figure 1). This peptide, J1ex6 (residues 252–295), could be readily refolded in vitro and yielded a folded unit with a disulfide bond topology typical of EGF repeats. We concluded that exon 6 encodes an autonomously folding unit. It remained an open question, however, whether the N-terminal overhang is required only for folding, acting as an internal chaperone in the reshuffling of disulfide bonds, or is an integral part of a structural unit encompassing the EGF1 C-terminal region and EGF2.
We report here the solution structure of J1ex6 determined by 1H-NMR spectroscopy and demonstrate that exon 6 indeed defines an EGF-like structural unit with an additional disulfide-linked loop in the N-terminal overhang. We show that the structure of this unit, despite the minimal structural context, is very close to the conformation of the same region in a larger construct comprising the DSL domain and the first three EGF repeats, whose crystal structure has recently been determined. The exon/intron organization of this region is highly conserved in this class of Notch ligands, which leads us to speculate on the evolution of this structurally peculiar and functionally relevant region.
Structure calculation statistics
Sequential (|i - j| = 1)
Medium-range (|i - j| < 5)
Long-range (|i - j| ≥ 5)
Upper limits (number, max value (Å))
Lower limits (number, max value (Å))
van der Waals (number, max value (Å))
Deviations from idealized geometry**
    Bond lengths, r.m.s. (Å)
    Bond angles, r.m.s. (°)
Average pairwise r.m.s. deviation*** (Å): 2.27 ± 0.39; 1.31 ± 0.30
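An average pairwise r.m.s. deviation such as the one reported in the table is conventionally obtained by superimposing every pair of models in the final ensemble and averaging the resulting RMSDs. The following is a minimal sketch of that calculation (Kabsch superposition), assuming NumPy, coordinates already extracted as (N, 3) arrays for a common atom selection, and that the ± values are standard deviations. The function names are illustrative, and the atom selections behind the two reported values are not specified here.

```python
import itertools

import numpy as np


def superposed_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid-body
    superposition (Kabsch algorithm)."""
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix between the two centered sets.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Correct for an improper rotation (reflection) if needed.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q (row vectors, hence rot.T) and measure the residual.
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def average_pairwise_rmsd(models: list[np.ndarray]) -> tuple[float, float]:
    """Mean and standard deviation of the superposed RMSD over all model pairs."""
    values = [superposed_rmsd(a, b) for a, b in itertools.combinations(models, 2)]
    return float(np.mean(values)), float(np.std(values))
```

For an ensemble of 20 conformers, `models` would hold 20 arrays and the average would run over 190 pairwise comparisons.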
It has been proposed that EGF domains can be divided into two structural groups, human EGFs (hEGFs) and C1r-like EGFs (cEGFs), depending on the location of the last half-cystine in the structure. Using the AN BN AC BC CN CC annotation to describe the disulfide bond topology, where the pairs AN/AC, BN/BC, and CN/CC form the three disulfides, these two groups also display different lengths of the CN-CC loop, of the BN-AC loop, and of the linker connecting two EGFs of the same type. A comparison of the different spacings in J1ex6 and in a set of 56 EGFs of known structure (see Additional file 3) shows that J1ex6 clusters with the hEGFs for certain characteristics, such as the length of the CN-CC loop (8 residues), whereas for others it clusters with neither cEGFs nor hEGFs. Notably, the BN-BC loop (10 residues) is shorter than in cEGFs (most frequently 12–13 residues) and in hEGFs (14 residues or more). The same holds for the total spacing between the first and the last half-cystine (AN-CC loop, 27 residues vs. 30 or more in other EGFs) and for the linker between EGF1 and EGF2 (2 residues vs. 5 or 6 in cEGFs and hEGFs, respectively). Overall, J1ex6 is therefore more constrained than both cEGFs and hEGFs. An exhaustive search of structural databases with the J1ex6 structure did not produce any hit with a significant score.
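The loop lengths compared above can be counted directly from half-cystine positions. A minimal sketch, assuming that a loop length is the number of residues strictly between its two bounding half-cystines. The positions used are hypothetical, chosen only to reproduce the J1ex6 spacings quoted in the text; they are not the actual J1ex6 sequence positions, and the additional disulfide in the N-terminal overhang is ignored. The BN-BC thresholds encode the typical ranges cited above as a simplification.

```python
# Hypothetical 0-based positions of the six EGF half-cystines, chosen only so
# that the spacings match the J1ex6 values quoted in the text.
HALF_CYSTINES = {"AN": 0, "BN": 5, "AC": 10, "BC": 16, "CN": 19, "CC": 28}


def loop_length(first: int, last: int) -> int:
    """Number of residues strictly between two half-cystines."""
    return last - first - 1


def egf_spacings(pos: dict[str, int]) -> dict[str, int]:
    """Loop lengths used to compare EGF repeats."""
    return {
        "BN-BC": loop_length(pos["BN"], pos["BC"]),
        "CN-CC": loop_length(pos["CN"], pos["CC"]),
        "AN-CC": loop_length(pos["AN"], pos["CC"]),
    }


def b_loop_class(length: int) -> str:
    """Crude classification by BN-BC loop length (typical ranges only)."""
    if length >= 14:
        return "hEGF-like"
    if 12 <= length <= 13:
        return "cEGF-like"
    return "neither"


spacings = egf_spacings(HALF_CYSTINES)
print(spacings)                         # {'BN-BC': 10, 'CN-CC': 8, 'AN-CC': 27}
print(b_loop_class(spacings["BN-BC"]))  # neither
```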
Surprisingly, the N-terminal overhang is conformationally restrained and packs onto the following EGF2 unit. The interaction between the N-terminal overhang and the EGF2 repeat is mediated by a series of hydrophobic residues (Y255 and W257 in EGF1; I266, P279, and W280 in EGF2). This suggests that the EGF1-2 module is fairly rigid even in solution.
To determine whether the phase shift of exon boundaries relative to predicted domain boundaries in the region comprising these two atypical EGF repeats is accidental or reflects a common evolutionary origin, we analyzed the exon/intron organization of human JAG1 orthologues in 26 species, including primates (5), non-primate mammals (15), birds (1), amphibians (1), and fishes (4). The exon/intron arrangement in this region of the JAG1 genes is highly conserved throughout evolution, with a single exon encoding the C-terminal region of EGF1 and the complete EGF2 (see Additional file 5). Extending this analysis to all homologues of Notch ligands showed that the same exonic organization is found not only in JAG1 but also in the JAG2, DLL1, DLL4, DLK1, and DLK2 gene families, for a total of 112 genes in species ranging from fishes to primates, with only three exceptions, all in lower organisms (see Additional files 6 and 7). Usually, exon 6 (or its equivalent) is flanked by a phase 2 intron at its 5' end and a phase 1 intron at its 3' end.
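Intron phase describes where an intron interrupts the reading frame: phase 0 between two codons, phase 1 after the first and phase 2 after the second nucleotide of a codon. It follows that an internal exon flanked by a phase 2 intron upstream and a phase 1 intron downstream must have a coding length congruent to 2 modulo 3. A minimal sketch of this arithmetic; the function names and exon lengths are hypothetical and do not correspond to the actual JAG1 exons.

```python
def intron_phases(cds_exon_lengths: list[int]) -> list[int]:
    """Phase of the intron following each coding exon (the last exon has none).

    The phase is the number of nucleotides of the interrupted codon that lie
    upstream of the intron, i.e. the cumulative coding length modulo 3.
    """
    phases = []
    total = 0
    for length in cds_exon_lengths[:-1]:
        total += length
        phases.append(total % 3)
    return phases


def required_length_mod3(upstream_phase: int, downstream_phase: int) -> int:
    """Coding length (mod 3) an internal exon needs between two given phases."""
    return (downstream_phase - upstream_phase) % 3


# Hypothetical coding lengths (nt) of three consecutive exons.
exons = [8, 131, 30]
print(intron_phases(exons))        # [2, 1]
print(required_length_mod3(2, 1))  # 2
print(131 % 3)                     # 2
```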
As early as 1978 it was proposed that exons encode "folded protein units", emphasizing the role of a correct folding process in producing functional proteins or domains. Recent advances in genome sequencing, domain classification, and 3D structure determination have confirmed this hypothesis: a strong correlation between exon boundaries and predicted domain boundaries has been found in nine eukaryotic genomes, and the correlation becomes stronger as genome complexity increases. Such a high correlation led to the suggestion that in certain cases exon boundaries can be used to predict domain limits more accurately. In particular, a survey of domain repeats in seven metazoan species showed a very good correspondence between exons and EGF repeats (0.93 exons per repeat on average). In the case reported here, exon boundaries do not coincide with the expected EGF domain limits. Although domain limits cannot always be defined precisely, this is not the case for EGF repeats, which are clearly recognizable by a very specific pattern of three disulfide bonds and by the spacing between half-cystines. In this case study, the overall correspondence is maintained, with exons 5 and 6 encoding EGF1 and EGF2, but exon and domain boundaries are clearly out of phase: exon 5 encodes a truncated EGF with only four half-cystines, and exon 6 encodes the C-terminal half of EGF1 and the entire EGF2. Furthermore, this peculiar exon/intron organization appears to be well conserved throughout evolution. How can these results be reconciled with the experimental finding that exon 6 of human JAG1 encodes an autonomously folding structural unit? Statistically, this may simply be one of the rare instances where the 1:1 correspondence between exons and EGF repeats does not hold, but the question remains whether it has any structural or functional significance.
It is possible that the particular exon structure in this region is dictated by folding and structural requirements. In this specific case, the constraints in the atypically short EGF2 repeat might require the N-terminal extension as an internal chaperone and a docking template to drive the correct folding.
Furthermore, the interface between EGF1 and EGF2 drives the relative orientation of the EGF1-2 tandem repeats and may have a functional role. Deletion of the DSL domain in a Jagged-1 construct has been shown to abolish binding to Notch. However, while the DSL domain is necessary for binding, it is not sufficient: a construct containing the MNNL region and the DSL domain binds only weakly, whereas the addition of EGF1-2 restores full binding. Although the structural determinants of the interaction between DSL ligands and Notch receptors are not yet known in detail, the kink at the interface between EGF1 and EGF2 observed in the crystal structure of the Jagged-1 region comprising the DSL domain and the first three EGF repeats might not be accidental and may be required for correct binding to Notch receptors. In calcium-binding EGFs, which are connected by a fairly long linker, the relative orientation of two adjacent domains is mainly determined by the geometric constraints imposed by calcium coordination. In EGF1-2, a similar outcome is achieved by drastically shortening the linker region and encoding the C-terminal part of EGF1 and EGF2 in a single, conserved exon.
It has been proposed that the DSL domain may have evolved from the truncation of tandemly connected short EGF domains. In fact, J1ex6 itself can be viewed as two truncated tandem EGFs, and the sequence and disulfide pattern similarities between the DSL domain and J1ex6 are significant (see Additional file 8). One might then ask whether there is any evolutionary relationship between the two; in other words, whether the DSL domain and J1ex6 might have arisen from duplication of a common ancestor followed by divergent evolution and loss of one disulfide linkage in the DSL domain. If this hypothesis were true, one should be able to identify a primitive precursor in which either the DSL domain or J1ex6 is missing. Indeed, we identified the non-canonical Notch ligands DLK1 and DLK2 as hits sharing with JAG1 a high sequence similarity and the same exon organization in the region comprising EGF1 and EGF2. Interestingly, these proteins lack the DSL domain, which makes them good candidates as precursors of canonical Notch ligands. However, DLK1 and DLK2 are found only in vertebrates, and not in more primitive organisms such as nematodes and insects. [Note added in proof: After acceptance of our manuscript, Dr. Anne C. Hart called our attention to a paper recently published by her group in PLOS Biology (6(8):196, 2008) in which it is proposed that the secreted C. elegans protein OSM-11 is a functional ortholog of mammalian DLK1]. Furthermore, the DSL domain consists not only of a cysteine-rich region but also of a more variable N-terminal region that is usually encoded by the same exon. The genome of the microbial eukaryote Monosiga brevicollis, one of the closest primitive relatives of metazoans, has recently been sequenced and revealed some archetypal features of Notch signaling.
Domains typical of Notch receptor proteins, such as Notch/Lin, ankyrin, and EGF repeats, are already present, although in distinct proteins and not arranged in the domain architecture of metazoan Notch proteins; homologues of Notch ligands, however, are absent. Nor were we able to detect any homologue of the DSL domain in the genome of M. brevicollis, although we found several hits corresponding to short EGF repeats. In conclusion, currently available data do not yet provide strong evidence of an evolutionary relationship between the DSL domain and J1ex6, but they support a later appearance of the DSL domain with respect to short EGF repeats. The unusual exon architecture of the region comprising the EGF1 and EGF2 repeats might have arisen from the insertion of an intron into a common precursor encoding both EGF1 and EGF2, and then been conserved, together with the amino acid sequence, during the evolution of metazoans.
In eukaryotic genomes, there is generally a very good correspondence between exon boundaries and predicted domain limits [9–11]. We report a case study where this correspondence is not fulfilled, and show that the autonomously folding structural unit is defined by exon boundaries rather than by predicted domain boundaries. Although this conclusion cannot be taken as a general rule, this study suggests that, together with domain boundaries and predicted secondary structure, exon boundaries may also be taken into account when designing constructs for structural studies. This option should be considered carefully, especially for protein regions with no detectable similarity to known domains. These regions, also called "orphan domains", account for as much as ~15% of eukaryotic proteomes, while an additional ~30% consists of poorly characterized regions such as those belonging to the Pfam-B families.
J1ex6 (44 amino acids long, corresponding to residues 252–295 of human Jagged-1) was synthesized on solid phase using Fmoc/tBu chemistry as previously described. Cysteine residues were introduced by double coupling as N-α-Fmoc-S-trityl-L-cysteine pentafluorophenyl ester to avoid cysteine racemization. All other amino acids were also introduced by double coupling, using a 4-fold excess of amino acid (Fmoc-AA/HCTU/DIPEA = 1/1/2). After cleavage/deprotection, the peptide was precipitated with diethyl ether, washed, and freeze-dried. The crude peptide was reduced with TCEP and purified by RP-HPLC on a Zorbax 300SB-C18 semipreparative column. The purified peptide fractions were diluted to a final peptide concentration of 0.1 mg/mL in degassed refolding buffer (0.25 M Tris-HCl, 2 mM EDTA, 3.7 mM GSH, 3.7 mM GSSG, pH 8) and refolded for 18 hours at 4°C. After acid quenching of the folding reaction with TFA, J1ex6 was purified by RP-HPLC on a Zorbax 300SB-C18 column and freeze-dried.
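The reagent quantities implied by the coupling and refolding conditions above follow from simple arithmetic. A minimal sketch, assuming the 4-fold excess is expressed relative to the synthesis scale (resin loading); the 0.1 mmol scale and the 5 mg of purified peptide in the example are hypothetical, since neither value is given in the text.

```python
def coupling_amounts(scale_mmol: float, excess: float = 4.0,
                     ratio: tuple[float, float, float] = (1.0, 1.0, 2.0)) -> dict[str, float]:
    """mmol of each reagent for one coupling (Fmoc-AA/HCTU/DIPEA = 1/1/2)."""
    aa_mmol = scale_mmol * excess
    names = ("Fmoc-AA", "HCTU", "DIPEA")
    return {name: aa_mmol * r for name, r in zip(names, ratio)}


def refolding_volume_ml(peptide_mg: float, conc_mg_per_ml: float = 0.1) -> float:
    """Buffer volume needed to dilute the peptide to the refolding concentration."""
    return peptide_mg / conc_mg_per_ml


print(coupling_amounts(0.1))     # mmol per coupling at a hypothetical 0.1 mmol scale
print(refolding_volume_ml(5.0))  # mL of refolding buffer for a hypothetical 5 mg
```

With double coupling, each of these amounts is used twice per residue.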
The complete disulfide pattern of the folded peptide was unambiguously determined by targeted proteolysis and MS analysis in three steps. In the first reaction, the purified peptide (160 μg) was dissolved in 250 μL of sodium acetate buffer (50 mM, pH 5.6) containing 5 mM CaCl2 and incubated with trypsin (8 μg) for 18–48 h at 37°C. The reaction mixture was further incubated for 48 h at 37°C in the presence of thermolysin (15 μg). A fragment corresponding to the two-disulfide-bonded core was then isolated by RP-HPLC and subjected to a further proteolysis with proline-endopeptidase (1/20 w/w) for 18 h. At each step, aliquots from the digestion mixtures were desalted by ZipTip C18 (Millipore), mixed (1:1) with MALDI matrix (10 mg/mL HCCA in 75% MeCN/25% H2O/0.1% TFA) and analyzed by MALDI-MS on an Applied Biosystems 4800 TOF/TOF Analyzer operated in reflectron positive ion mode.
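Assigning disulfide connectivity from such digests relies on matching observed [M+H]+ ions to the theoretical masses of disulfide-linked fragments, where each disulfide bond removes two hydrogen atoms (about 2.016 Da). A minimal sketch using standard monoisotopic residue masses; the function names are illustrative, and the fragment sequences in the example are hypothetical, not actual J1ex6 fragments.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per linear chain (N- and C-termini)
PROTON = 1.00728   # charge carrier in [M+H]+
H2 = 2.01565       # two hydrogen atoms lost per disulfide bond


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a linear peptide with free cysteines."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def linked_mh(chains: list[str], n_disulfides: int) -> float:
    """[M+H]+ of one or more chains joined by n_disulfides S-S bonds."""
    return sum(peptide_mass(c) for c in chains) - n_disulfides * H2 + PROTON


# Two hypothetical fragments joined by one disulfide bond.
print(round(linked_mh(["GC", "AC"], 1), 4))  # 369.0897
```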
The NMR sample was prepared by dissolving the freeze-dried peptide in H2O/D2O (90/10, v/v) to a final concentration of ~0.5 mM and adjusting the pH to ~4.5 with 0.1 N NaOH. Limited solubility hampered data acquisition at higher pH values. Spectra were recorded at 298 K on a Bruker Avance spectrometer operating at a 1H frequency of 800.13 MHz and equipped with a triple-resonance, z-axis gradient cryoprobe. 2D NOESY and TOCSY spectra were recorded with 150 ms and 80 ms mixing times, respectively. Additional spectra were recorded on the same sample dissolved in D2O. Data were transformed using X-WinNMR (Bruker) and analyzed using CARA. Chemical shifts were referenced to internal DSS. Assignment of 1H backbone and side-chain resonances was achieved from COSY, TOCSY, and NOESY spectra using standard techniques. Structure calculations were carried out in a fully automated fashion using CYANA 2.1. Disulfide bonds were explicitly added as distance constraints, with the weight for the upper SG-SG distance set to 10. Distance constraints were derived from 922 peaks manually picked in NOESY spectra recorded in H2O/D2O (90/10, v/v) and in D2O, and automatically assigned in a recursive manner within the standard CYANA protocol, using chemical shift tolerances of 0.030 and 0.040 ppm in the detected and indirect 1H dimensions, respectively. In each calculation round, 100 structures were minimized and the 20 models with the lowest target function values were finally selected. Coordinates were deposited in the PDB (PDB code: 2KB9). Figures were prepared using MOLMOL.
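The chemical shift tolerances quoted above define which assignments are considered possible for each NOESY cross-peak. The sketch below shows only that initial filtering step, assuming w2 is the detected dimension; CYANA's automated assignment then resolves the resulting ambiguities over several cycles of structure calculation using additional criteria not reproduced here. The `Peak` class, the function name, and the atom names and shifts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    w1: float  # indirect 1H dimension (ppm)
    w2: float  # detected (direct) 1H dimension (ppm)


def candidate_assignments(peak: Peak, shifts: dict[str, float],
                          tol_direct: float = 0.030,
                          tol_indirect: float = 0.040) -> list[tuple[str, str]]:
    """Proton pairs whose shifts are compatible with a 2D NOESY cross-peak."""
    indirect = [atom for atom, d in shifts.items() if abs(d - peak.w1) <= tol_indirect]
    direct = [atom for atom, d in shifts.items() if abs(d - peak.w2) <= tol_direct]
    # Exclude diagonal (self) pairs; every remaining pair is an ambiguous candidate.
    return [(i, j) for i in indirect for j in direct if i != j]


# Hypothetical proton chemical shifts (ppm).
shifts = {"X1 HA": 4.31, "X2 HN": 8.20, "X3 HN": 8.23, "X4 HB2": 2.95}
print(candidate_assignments(Peak(w1=4.33, w2=8.21), shifts))
# two candidates: X1 HA-X2 HN and X1 HA-X3 HN
```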
Conservation of exon boundaries in 26 orthologues of human JAG1 retrieved from Ensembl was verified by a BLAST search of the human J1ex6 amino acid sequence against the entire set of translated exons. The same type of search was extended to all homologues of human Jagged-1, for a total of 112 sequences. Sequences were then aligned using ClustalW.
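Comparing exon boundaries across orthologues is easier once each coding exon is mapped onto residue numbers, so that one can check whether a single exon covers the region equivalent to residues 252–295. A minimal sketch, assuming coding exon lengths in nucleotides counted from the initiator codon; residues whose codons are split by an intron are reported in both flanking exons. The function name and the exon lengths in the example are hypothetical.

```python
def exon_residue_ranges(cds_exon_lengths: list[int],
                        first_residue: int = 1) -> list[tuple[int, int]]:
    """1-based (first, last) residue touched by each coding exon.

    A residue whose codon is split by an intron appears in both exons.
    """
    ranges = []
    start = 0  # 0-based coding-nucleotide offset of the exon's first base
    for length in cds_exon_lengths:
        end = start + length - 1  # offset of the exon's last coding base
        ranges.append((first_residue + start // 3, first_residue + end // 3))
        start += length
    return ranges


# Hypothetical coding exon lengths (nt).
print(exon_residue_ranges([7, 14, 9]))  # [(1, 3), (3, 7), (8, 10)]
```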
We acknowledge the support of the EU (European Network of Research Infrastructures for Providing Access and Technological Advancements in Bio-NMR) for access to the CERM NMR facility, Sesto Fiorentino (FI), Italy. We thank András Perczel and András Czajlik (Eötvös Loránd University, Budapest, Hungary) for their contribution in the initial phase of the NMR work. We are grateful to Stephan Grzesiek and Navratna Vajpai (Biozentrum, University of Basel, CH) for recording additional NMR spectra.
- Hurlbut GD, Kankel MW, Lake RJ, Artavanis-Tsakonas S: Crossing paths with Notch in the hyper-network. Curr Opin Cell Biol 2007, 19:166–75. doi:10.1016/j.ceb.2007.02.012
- Ehebauer M, Hayward P, Arias AM: Notch, a universal arbiter of cell fate decisions. Science 2006, 314:1414–5. doi:10.1126/science.1134042
- Bray SJ: Notch signalling: a simple pathway becomes complex. Nat Rev Mol Cell Biol 2006, 7:678–89. doi:10.1038/nrm2009
- Shimizu K, Chiba S, Kumano K, Hosoya N, Takahashi T, Kanda Y, Hamada Y, Yazaki Y, Hirai H: Mouse jagged1 physically interacts with notch2 and other notch receptors. Assessment by quantitative methods. J Biol Chem 1999, 274:32961–9. doi:10.1074/jbc.274.46.32961
- Cordle J, Johnson S, Tay JZ, Roversi P, Wilkin MB, de Madrid BH, Shimizu H, Jensen S, Whiteman P, Jin B, et al.: A conserved face of the Jagged/Serrate DSL domain is involved in Notch trans-activation and cis-inhibition. Nat Struct Mol Biol 2008, 15:849–57. doi:10.1038/nsmb.1457
- Guarnaccia C, Pintar A, Pongor S: Exon 6 of human Jagged-1 encodes an autonomously folding unit. FEBS Lett 2004, 574:156–60. doi:10.1016/j.febslet.2004.08.022
- Wouters MA, Rigoutsos I, Chu CK, Feng LL, Sparrow DB, Dunwoodie SL: Evolution of distinct EGF domains with specific functions. Protein Sci 2005, 14:1091–103. doi:10.1110/ps.041207005
- Blake C: Do genes-in-pieces imply proteins-in-pieces? Nature 1978, 273:267. doi:10.1038/273267a0
- Liu M, Grigoriev A: Protein domains correlate strongly with exons in multiple eukaryotic genomes – evidence of exon shuffling? Trends Genet 2004, 20:399–403. doi:10.1016/j.tig.2004.06.013
- Liu M, Wu S, Walch H, Grigoriev A: Exon-domain correlation and its corollaries. Bioinformatics 2005, 21:3213–6. doi:10.1093/bioinformatics/bti509
- Bjorklund AK, Ekman D, Elofsson A: Expansion of protein domain repeats. PLoS Comput Biol 2006, 2:e114. doi:10.1371/journal.pcbi.0020114
- King N, Westbrook MJ, Young SL, Kuo A, Abedin M, Chapman J, Fairclough S, Hellsten U, Isogai Y, Letunic I, et al.: The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 2008, 451:783–8. doi:10.1038/nature06617
- Moore AD, Bjorklund AK, Ekman D, Bornberg-Bauer E, Elofsson A: Arrangements in the modular evolution of proteins. Trends Biochem Sci 2008, 33:444–51. doi:10.1016/j.tibs.2008.05.008
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al.: The Pfam protein families database. Nucleic Acids Res 2008, 36:D281–8. doi:10.1093/nar/gkm960
- Keller R: Optimizing the process of Nuclear Magnetic Resonance spectrum analysis and computer aided resonance assignment. Ph.D. Thesis Nr. 59471. Swiss Federal Institute of Technology (ETH) Zürich; 2005.
- Güntert P: Automated NMR structure calculation with CYANA. Methods Mol Biol 2004, 278:353–78.
- Koradi R, Billeter M, Wüthrich K: MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 1996, 14:51–5. doi:10.1016/0263-7855(96)00009-4