On the origin of the histone fold
© Alva et al; licensee BioMed Central Ltd. 2007
Received: 03 November 2006
Accepted: 28 March 2007
Published: 28 March 2007
Histones organize the genomic DNA of eukaryotes into chromatin. The four core histone subunits consist of two consecutive helix-strand-helix motifs and are interleaved into heterodimers with a unique fold. We have searched for the evolutionary origin of this fold using sequence and structure comparisons, based on the hypothesis that folded proteins evolved by combination of an ancestral set of peptides, the antecedent domain segments.
Our results suggest that an antecedent domain segment, corresponding to one helix-strand-helix motif, gave rise divergently to the N-terminal substrate recognition domain of Clp/Hsp100 proteins and to the helical part of the extended ATPase domain found in AAA+ proteins. The histone fold arose subsequently from the latter through a 3D domain-swapping event. To our knowledge, this is the first example of a genetically fixed 3D domain swap that led to the emergence of a protein family with novel properties, establishing domain swapping as a mechanism for protein evolution.
The helix-strand-helix motif common to these three folds provides support for our theory of an 'ancient peptide world' by demonstrating how an ancestral fragment can give rise to three different folds.
The organization of DNA into chromatin allows its compact and reversible packaging into the nucleus of a eukaryotic cell. The basic structural unit of chromatin is the nucleosome, which consists of 146 base pairs of double-stranded DNA wrapped around an octameric histone core complex. The core complex is composed of two copies of each of the histone proteins H2A, H2B, H3, and H4, organized as a central (H3-H4)2 tetramer flanked by two H2A-H2B dimers. Despite low sequence similarity, all core histone subunits share a common fold; they are composed of three helices separated by two short strap loops and assemble into heterodimers by interleaving the helices into the 'handshake motif' and juxtaposing the strap loops into short parallel β-bridges. This fold may have arisen through the duplication of a primordial helix-strand-helix motif [4, 5], consistent with the hypothesis that folded proteins arose by the combination of subdomain-sized peptides, the so-called antecedent domain segments [6–8].
Archaea also wrap their DNA into nucleosome-like structures; their constituent histone subunits assemble into tetramers, which may reflect an ancestral form of the central part of the eukaryotic nucleosome octamer, the (H3-H4)2 tetramer. Archaeal histone subunits are occasionally duplicated on a single polypeptide chain, a form observed in eukaryotes only in the histone-like domain of the son of sevenless protein.
Bacteria also have nucleoid proteins with histone-like properties, but these belong to a different, unrelated fold. However, a homolog of archaeal single-chain histones was recently reported from the bacterium Aquifex aeolicus (1R4V). Further homologs appear in the genomes of a few, phylogenetically diverse bacteria. It thus seems likely that the histone fold originated in the common ancestor of eukaryotes and archaea and spread into some bacteria through lateral gene transfer.
In an all-against-all application of HHsearch to the SCOP database (JS, unpublished results) we found an evolutionary relationship between histone proteins and the helical part of the extended AAA+ ATPase domain, the C-domain [16, 17]. Based on this finding, we used sequence and structure comparisons to reconstruct in detail the evolutionary events that may have shaped the histone fold. Our results point to a common origin not only with the C-domain but also with the N-terminal substrate recognition domain of Clp/Hsp100 proteins. The conserved element is a helix-strand-helix motif, which we propose gave rise divergently to these three different folds and thus represents an antecedent domain segment.
Homology between proteins is typically inferred from similarities in sequence and structure. Sequence similarity is the primary criterion for deducing a common origin, but for distant evolutionary events, sequences may have diverged beyond our ability to detect their relatedness. Structures diverge much more slowly and their similarity is therefore often used to identify such distant events. However, similar structures may have arisen convergently from different origins and their similarity thus frequently does not provide conclusive evidence of common ancestry. In this study we applied a new, highly sensitive method for sequence comparison based on profile Hidden Markov Models (HMMs) to identify distant homologs of histones on the basis of sequence similarity alone. Subsequently, we validated our findings through structure comparisons.
We found two high-scoring matches with other folds. These are an alanyl tRNA synthetase (1RIQ, a.203.1.1, identified by the histone entry 1JFI), and the zeta subunit of a plasmid maintenance system (1GVN, c.37.1.21, identified by two C-domains: 1LV7 and 1R7R). Subsequent analysis could not confirm these matches as homologs.
Analysis of sequence and structure conservation
The surprising aspect of these findings is that histones, C-domains and Clp N-domains belong to three different folds (Fig. 2A–C). Histones are dimeric, interleaved helical bundles, as described in the Background section. C-domains are four-helix bundles composed of two consecutive helix-strand-helix motifs. Clp N-domains, finally, are multihelical domains formed by the repetition of a 4-helical motif. Although these three protein families have different topologies, they all incorporate two copies of the helix-strand-helix motif, which engages in the formation of a short parallel β-bridge. In the histone dimer, the β-bridge is formed by the association of one helix-strand-helix motif from each monomer, in the C-domain by the association of the two motifs consecutive in the polypeptide chain, and in the Clp N-domains by the association of each motif with an N-terminal strand of the symmetry-related motif.
The similarities detected by HMM-to-HMM comparison are limited to these helix-strand-helix motifs. Histones and C-domains both contain two consecutive copies of the motif and can be aligned over essentially their entire length (Fig. 3A). Clp N-domains contain two motifs decorated by two helices, and each motif has its best matches to the C-terminal motif of histones and C-domains (Fig. 3A). The sequence alignment shows extensive similarity in the hydrophobic patterns of the three folds, but no highly conserved residues other than two alanines in the core of the second helix-strand-helix motif, which allow for close packing interactions at the crossover point between the helices.
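The notion of a shared hydrophobic pattern can be made concrete with a small sketch. The code below reduces aligned sequences to a hydrophobic/polar pattern and reports the fraction of ungapped columns in which two sequences agree. It is an illustration only and not the procedure used in this study: the residue set `HYDROPHOBIC`, the function names, and the toy fragments are assumptions introduced here.

```python
# Residues counted as hydrophobic; this set is an assumption, not taken from the paper.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_pattern(aligned_seq):
    """Map an aligned sequence to 'h' (hydrophobic), 'p' (any other residue) or '-' (gap)."""
    out = []
    for aa in aligned_seq.upper():
        if aa == "-":
            out.append("-")
        elif aa in HYDROPHOBIC:
            out.append("h")
        else:
            out.append("p")
    return "".join(out)


def pattern_agreement(seq_a, seq_b):
    """Fraction of ungapped alignment columns where both residues fall in the same class."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pat_a, pat_b = hydrophobic_pattern(seq_a), hydrophobic_pattern(seq_b)
    columns = [(x, y) for x, y in zip(pat_a, pat_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


# Toy aligned fragments, invented for illustration (not real histone or C-domain sequences).
toy_a = "LAEAIK-EL"
toy_b = "IAKAVRDEM"
print(hydrophobic_pattern(toy_a))
print(pattern_agreement(toy_a, toy_b))
```

Two fragments can agree in every pattern column while sharing few identical residues, which is the situation described above: similar hydrophobic patterns without highly conserved positions.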
Domain swapping as mechanism for protein evolution
The results presented here suggest an evolutionary link between histones and the C-domains of AAA+ proteins, despite differences in their topology. We propose 3D domain swapping as the mechanism that accounts for their structural differences. 3D domain swapping is a process by which two or more identical proteins exchange a domain to form interlocked oligomers, in which all of the packing interactions that stabilize the monomer are present. The swapped portions can range from a single secondary structure element to an entire domain. In the simplest case the native fold, normally constituted by a single 'closed' monomer, is reconstituted by two so-called 'open' monomers. This reciprocal swap leads to a homodimer, whereas the runaway domain swap, in which swapping propagates along an axis in an open-ended manner, has been proposed to contribute to amyloid fibril formation [23–25].
Up to now, about 40 proteins have been shown to be able to undergo 3D domain swapping, and several studies indicate a physiological role of this mechanism in allostery and signal transduction [27–29]. A precondition is the presence of a flexible loop or hinge, about which the swapped elements can rotate in order to form a pair of 'open' monomers. The primary intervention by which 3D domain swaps have been engineered into monomeric proteins is the shortening of the hinge, which prevents part of the protein from packing into its native location and forces a swap. Examples are domain 1 of lymphocyte antigen CD2, staphylococcal nuclease, single-chain Fv fragments [32, 33], and a 3-helix bundle designed by Ogihara et al.
A primordial helix-strand-helix motif
The helix-strand-helix motif, which is at the core of the similarity between histones and C-domains, is also found in Clp N-domains, which assume yet a third fold. Here, the motif is decorated with two C-terminal helices, and two copies of this extended, 4-helical motif are fused in antiparallel orientation. Thus, three different folds appear to have been built from a common helix-strand-helix motif. One theory for the origin of folded proteins proposes that they arose by fusion and recombination from an ancestral set of peptides, which emerged in the context of RNA-dependent replication and catalysis (the 'RNA world') [6–8]. The helix-strand-helix motif would be such an ancestral peptide, which gave rise divergently to the Clp N-domain and the AAA+ C-domain through two independent events of duplication and fusion (Fig. 4). The C-domain then evolved into the histone fold by 3D domain swapping. This scenario extends a previous hypothesis on the origin of eukaryotic core histones, which proposed that they evolved from the duplication of a single helix-strand-helix motif [4, 5].
In this study we have deduced homology based on similarities in sequence and structure. We are aware that homology of proteins is an assumption inferred from heuristics, of which sequence similarity is generally accepted as the best indicator. Structural similarity alone, especially of small fragments, does not necessarily imply evolutionary divergence, since it may result from general biophysical constraints. Indeed, we find a number of α-helical hairpins in the PDB with a high degree of structural similarity to the helix-strand-helix motif (RMSD values of less than 1.5 Å); some examples include hairpins from fumarate reductase (1QLA_A, residues 65–94) and tetracycline repressor-like protein (1T33_A, residues 144–173). However, none of them show detectable sequence similarity to each other or to the proteins in our study. This shows that the constraints of structure on sequence variability are not sufficient to explain the observed sequence similarity between histones, C-domains, and Clp N-domains.
We have retraced the evolutionary events that may have shaped the histone fold and have found connections to two other folds: the N-terminal substrate recognition domain of Clp/Hsp100 proteins and the helical part of the extended AAA+ ATPase domain. These three folds contain a homologous helix-strand-helix motif despite their differences in topology, leading us to propose a scenario for the origin of these folds from a common ancestral helix-strand-helix motif through events of duplication, fusion and 3D domain swapping. The short functional parallel β-bridges formed by the strands of the helix-strand-helix motifs seem to be the evolutionary driving force for the conservation of this motif. Our findings provide additional support for our previously proposed hypothesis that the diversity of today's folds might have arisen from an ancestral set of peptides.
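For readers unfamiliar with the measure, the structural similarity criterion quoted above (an RMSD below 1.5 Å) can be sketched as follows. The code assumes that the two fragments have already been superposed and that their Cα atoms are paired by index; it does not perform the superposition itself. The function names and the toy coordinates are hypothetical and serve only as illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def is_structurally_similar(coords_a, coords_b, cutoff=1.5):
    """True if the RMSD (in the units of the coordinates, here Å) is below the cutoff."""
    return rmsd(coords_a, coords_b) < cutoff


# Toy coordinates in Å, invented for illustration: every atom is displaced by 1 Å along z.
frag_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
frag_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(frag_a, frag_b))
print(is_structurally_similar(frag_a, frag_b))
```

A low RMSD by this measure says nothing about sequence similarity, which is why the hairpins mentioned above can match the motif structurally without being detectable homologs.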
We obtained histone and Clp N-domain sequences from the ASTRAL compendium as defined by the SCOP (version 1.71) folds a.22 and a.174, respectively, and reduced the set to less than 25% pairwise identity at 90% length coverage using BLASTCLUST. C-domains are not characterized as a separate fold in SCOP; we extracted their sequences from the 'extended AAA-ATPase' family (c.37.1.20) of the SCOP database by a procedure described by Ammelburg et al. and also reduced this set to less than 25% pairwise identity.
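The idea behind reducing a set to less than 25% pairwise identity at 90% length coverage can be sketched as below. This does not reproduce BLASTCLUST, which clusters on BLAST alignments; the sketch swaps in a simple greedy filter over sequences taken from one pre-computed multiple alignment. The function names, the definition of coverage relative to the shorter sequence, and the toy alignment are all assumptions made for illustration.

```python
def paired_columns(aln_a, aln_b):
    """Return the alignment columns in which both sequences have a residue."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    return [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]


def percent_identity(aln_a, aln_b):
    """Percentage of identical residues over the paired columns."""
    paired = paired_columns(aln_a, aln_b)
    if not paired:
        return 0.0
    return 100.0 * sum(a == b for a, b in paired) / len(paired)


def coverage(aln_a, aln_b):
    """Paired columns as a fraction of the shorter sequence (an assumed definition)."""
    shorter = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    if shorter == 0:
        return 0.0
    return len(paired_columns(aln_a, aln_b)) / shorter


def filter_redundant(msa, max_identity=25.0, min_coverage=0.9):
    """Greedily keep sequences that are not too similar to any sequence already kept."""
    kept = []
    for name, seq in msa.items():
        redundant = any(
            coverage(seq, msa[other]) >= min_coverage
            and percent_identity(seq, msa[other]) >= max_identity
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy alignment, invented for illustration only.
toy_msa = {
    "s1": "ACDEFGHIKL",
    "s2": "ACDEFGHIKV",  # 90% identical to s1, so it is dropped
    "s3": "WWYWWYWWYW",  # no identical positions, so it is kept
}
print(filter_redundant(toy_msa))
```

The result of a greedy filter depends on the order in which sequences are visited, so a real pipeline would normally fix that order, for instance by sequence length or structure quality.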
We used these sequences to search the SCOP25 database for homologs with HHpred [15, 19], at default parameters and a probability cutoff of 10%. The SCOP25 database is a version of SCOP filtered for a maximum of 25% pairwise sequence identity. For each group, we pooled all search results and tabulated the frequencies at which various SCOP families appeared at each probability, binned at 10% intervals.
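The tabulation step described above can be illustrated with a short sketch. It assumes the search results have already been parsed into (SCOP identifier, probability) pairs; the parsing of HHpred output is not shown, and the function names and the example probabilities are hypothetical. Only the 10% cutoff and the 10% bin width come from the text.

```python
from collections import Counter, defaultdict


def bin_lower_edge(probability, width=10):
    """Lower edge of the probability bin, e.g. 37.5 -> 30; a probability of 100 joins the top bin."""
    lower = int(probability // width) * width
    return min(lower, 100 - width)


def tabulate(hits, cutoff=10.0, width=10):
    """Count how often each SCOP identifier appears in each probability bin at or above the cutoff."""
    table = defaultdict(Counter)
    for scop_id, probability in hits:
        if probability < cutoff:
            continue
        table[bin_lower_edge(probability, width)][scop_id] += 1
    return table


# Pooled hits for one query group; the probabilities are invented for illustration.
toy_hits = [
    ("a.22", 98.2),
    ("c.37.1.20", 45.0),
    ("c.37.1.20", 41.3),
    ("a.174", 12.0),
    ("a.203.1.1", 8.5),  # below the 10% cutoff, ignored
]
table = tabulate(toy_hits)
for lower in sorted(table, reverse=True):
    print(f"{lower}-{lower + 10}%: {dict(table[lower])}")
```

Pooling the hits per query group before counting, as described above, means that a family is counted once for every query that finds it, so frequent matches across many queries stand out from isolated high-scoring hits.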
Table: Data for the superposition in Fig. 3. Columns: number of aligned residues; RMSD to HMFA [Å]. Row labels: AAA+: σ54, a) 12/16 (α1), b) 47/54 (α2–4); ClpA-N (1st half); ClpA-N (2nd half); ClpB-N (1st half); ClpB-N (2nd half).
The authors thank Nick Grishin for discussions. This work was supported by institutional funds from the Max Planck Society.
- Kornberg RD, Thomas JO: Chromatin structure; oligomers of the histones. Science 1974, 184(4139):865–868. 10.1126/science.184.4139.865
- Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ: Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature 1997, 389(6648):251–260. 10.1038/38444
- Arents G, Burlingame RW, Wang BC, Love WE, Moudrianakis EN: The nucleosomal core histone octamer at 3.1 A resolution: a tripartite protein assembly and a left-handed superhelix. Proc Natl Acad Sci USA 1991, 88(22):10148–10152. 10.1073/pnas.88.22.10148
- Arents G, Moudrianakis EN: Topography of the histone octamer surface: repeating structural motifs utilized in the docking of nucleosomal DNA. Proc Natl Acad Sci USA 1993, 90(22):10489–10493. 10.1073/pnas.90.22.10489
- Arents G, Moudrianakis EN: The histone fold: a ubiquitous architectural motif utilized in DNA compaction and protein dimerization. Proc Natl Acad Sci USA 1995, 92(24):11170–11174. 10.1073/pnas.92.24.11170
- Lupas AN, Ponting CP, Russell RB: On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? J Struct Biol 2001, 134(2–3):191–203. 10.1006/jsbi.2001.4393
- Soding J, Lupas AN: More than the sum of their parts: on the evolution of proteins from peptides. Bioessays 2003, 25(9):837–846. 10.1002/bies.10321
- Fetrow JS, Godzik A: Function driven protein evolution. A possible proto-protein for the RNA-binding proteins. Pac Symp Biocomput 1998, 485–496.
- Pereira SL, Grayling RA, Lurz R, Reeve JN: Archaeal nucleosomes. Proc Natl Acad Sci USA 1997, 94(23):12633–12637. 10.1073/pnas.94.23.12633
- Bailey KA, Chow CS, Reeve JN: Histone stoichiometry and DNA circularization in archaeal nucleosomes. Nucleic Acids Res 1999, 27(2):532–536. 10.1093/nar/27.2.532
- Fahrner RL, Cascio D, Lake JA, Slesarev A: An ancestral nuclear protein assembly: crystal structure of the Methanopyrus kandleri histone. Protein Sci 2001, 10(10):2002–2007. 10.1110/ps.10901
- Baxevanis AD, Arents G, Moudrianakis EN, Landsman D: A variety of DNA-binding and multimeric proteins contain the histone fold motif. Nucleic Acids Res 1995, 23(14):2685–2691. 10.1093/nar/23.14.2685
- Drlica K, Rouviere-Yaniv J: Histonelike proteins of bacteria. Microbiol Rev 1987, 51(3):301–319.
- Qiu Y, Tereshko V, Kim Y, Zhang R, Collart F, Yousef M, Kossiakoff A, Joachimiak A: The crystal structure of Aq_328 from the hyperthermophilic bacteria Aquifex aeolicus shows an ancestral histone fold. Proteins 2006, 62(1):8–16. 10.1002/prot.20590
- Soding J: Protein homology detection by HMM-HMM comparison. Bioinformatics 2005, 21(7):951–960. 10.1093/bioinformatics/bti125
- Neuwald AF, Aravind L, Spouge JL, Koonin EV: AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes. Genome Res 1999, 9(1):27–43.
- Ammelburg M, Frickey T, Lupas AN: Classification of AAA+ proteins. J Struct Biol 2006.
- Zeth K, Ravelli RB, Paal K, Cusack S, Bukau B, Dougan DA: Structural analysis of the adaptor protein ClpS in complex with the N-terminal domain of ClpA. Nat Struct Biol 2002, 9(12):906–911. 10.1038/nsb869
- Soding J, Biegert A, Lupas AN: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005, (33 Web Server):W244–248. 10.1093/nar/gki408
- Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res 2004, (32 Database):D226–229. 10.1093/nar/gkh039
- Maurizi MR, Xia D: Protein binding and disruption by Clp/Hsp100 chaperones. Structure 2004, 12(2):175–183. 10.1016/S0969-2126(04)00027-9
- Bennett MJ, Schlunegger MP, Eisenberg D: 3D domain swapping: a mechanism for oligomer assembly. Protein Sci 1995, 4(12):2455–2468.
- Janowski R, Kozak M, Jankowska E, Grzonka Z, Grubb A, Abrahamson M, Jaskolski M: Human cystatin C, an amyloidogenic protein, dimerizes through three-dimensional domain swapping. Nat Struct Biol 2001, 8(4):316–320. 10.1038/86188
- Guo Z, Eisenberg D: Runaway domain swapping in amyloid-like fibrils of T7 endonuclease I. Proc Natl Acad Sci USA 2006, 103(21):8042–8047. 10.1073/pnas.0602607103
- Sambashivan S, Liu Y, Sawaya MR, Gingery M, Eisenberg D: Amyloid-like fibrils of ribonuclease A with three-dimensional domain-swapped and native-like structure. Nature 2005, 437(7056):266–269. 10.1038/nature03916
- Liu Y, Eisenberg D: 3D domain swapping: as domains continue to swap. Protein Sci 2002, 11(6):1285–1299. 10.1110/ps.0201402
- Piccoli R, Di Donato A, D'Alessio G: Co-operativity in seminal ribonuclease function. Kinetic studies. Biochem J 1988, 253(2):329–336.
- Gotte G, Bertoldi M, Libonati M: Structural versatility of bovine ribonuclease A. Distinct conformers of trimeric and tetrameric aggregates of the enzyme. Eur J Biochem 1999, 265(2):680–687. 10.1046/j.1432-1327.1999.00761.x
- Schymkowitz JW, Rousseau F, Wilkinson HR, Friedler A, Itzhaki LS: Observation of signal transduction in three-dimensional domain swapping. Nat Struct Biol 2001, 8(10):888–892. 10.1038/nsb1001-888
- Murray AJ, Lewis SJ, Barclay AN, Brady RL: One sequence, two folds: a metastable structure of CD2. Proc Natl Acad Sci USA 1995, 92(16):7337–7341. 10.1073/pnas.92.16.7337
- Green SM, Gittis AG, Meeker AK, Lattman EE: One-step evolution of a dimer from a monomeric protein. Nat Struct Biol 1995, 2(9):746–751. 10.1038/nsb0995-746
- Kortt AA, Malby RL, Caldwell JB, Gruen LC, Ivancic N, Lawrence MC, Howlett GJ, Webster RG, Hudson PJ, Colman PM: Recombinant anti-sialidase single-chain variable fragment antibody. Characterization, formation of dimer and higher-molecular-mass multimers and the solution of the crystal structure of the single-chain variable fragment/sialidase complex. Eur J Biochem 1994, 221(1):151–157. 10.1111/j.1432-1033.1994.tb18724.x
- Perisic O, Webb PA, Holliger P, Winter G, Williams RL: Crystal structure of a diabody, a bivalent antibody fragment. Structure 1994, 2(12):1217–1226. 10.1016/S0969-2126(94)00123-5
- Ogihara NL, Ghirlanda G, Bryson JW, Gingery M, DeGrado WF, Eisenberg D: Design of three-dimensional domain-swapped dimers and fibrous oligomers. Proc Natl Acad Sci USA 2001, 98(4):1404–1409. 10.1073/pnas.98.4.1404
- Kinch LN, Grishin NV: Evolution of protein structures and functions. Curr Opin Struct Biol 2002, 12(3):400–408. 10.1016/S0959-440X(02)00338-X
- Botos I, Melnikov EE, Cherry S, Khalatova AG, Rasulova FS, Tropea JE, Maurizi MR, Rotanova TV, Gustchina A, Wlodawer A: Crystal structure of the AAA+ alpha domain of E. coli Lon protease at 1.9A resolution. J Struct Biol 2004, 146(1–2):113–122. 10.1016/j.jsb.2003.09.003
- Ogura T, Whiteheart SW, Wilkinson AJ: Conserved arginine residues implicated in ATP hydrolysis, nucleotide-sensing, and inter-subunit interactions in AAA and AAA+ ATPases. J Struct Biol 2004, 146(1–2):106–112. 10.1016/j.jsb.2003.11.008
- Diemand AV, Lupas AN: Modeling AAA+ ring complexes from monomeric structures. J Struct Biol 2006.
- Lee AY, Hsu CH, Wu SH: Functional domains of Brevibacillus thermoruber lon protease for oligomerization and DNA binding: role of N-terminal and sensor and substrate discrimination domains. J Biol Chem 2004, 279(33):34903–34912. 10.1074/jbc.M403562200
- Brenner SE, Koehl P, Levitt M: The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res 2000, 28(1):254–256. 10.1093/nar/28.1.254
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410.
- Guex N, Peitsch MC: SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 1997, 18(15):2714–2723. 10.1002/elps.1150181505