On the origin of the histone fold
© Alva et al; licensee BioMed Central Ltd. 2007
Received: 03 November 2006
Accepted: 28 March 2007
Published: 28 March 2007
Histones organize the genomic DNA of eukaryotes into chromatin. The four core histone subunits consist of two consecutive helix-strand-helix motifs and are interleaved into heterodimers with a unique fold. We have searched for the evolutionary origin of this fold using sequence and structure comparisons, based on the hypothesis that folded proteins evolved by combination of an ancestral set of peptides, the antecedent domain segments.
Our results suggest that an antecedent domain segment, corresponding to one helix-strand-helix motif, gave rise divergently to the N-terminal substrate recognition domain of Clp/Hsp100 proteins and to the helical part of the extended ATPase domain found in AAA+ proteins. The histone fold arose subsequently from the latter through a 3D domain-swapping event. To our knowledge, this is the first example of a genetically fixed 3D domain swap that led to the emergence of a protein family with novel properties, establishing domain swapping as a mechanism for protein evolution.
The helix-strand-helix motif common to these three folds supports our theory of an 'ancient peptide world' by demonstrating how a single ancestral fragment can give rise to three different folds.
The organization of DNA into chromatin allows its compact and reversible packaging into the nucleus of a eukaryotic cell. The basic structural unit of chromatin is the nucleosome, which consists of 146 base pairs of double-stranded DNA wrapped around an octameric histone core complex. The core complex is composed of two copies of each of the histone proteins H2A, H2B, H3, and H4, organized as a central (H3-H4)2 tetramer flanked by two H2A-H2B dimers. Despite low sequence similarity, all core histone subunits share a common fold; they are composed of three helices separated by two short strap loops and assemble into heterodimers by interleaving the helices into the 'handshake motif' and juxtaposing the strap loops into short parallel β-bridges. This fold may have arisen through the duplication of a primordial helix-strand-helix motif [4, 5], consistent with the hypothesis that folded proteins arose by the combination of subdomain-sized peptides, the so-called antecedent domain segments [6–8].
Archaea also wrap their DNA into nucleosome-like structures; their constituent histone subunits assemble into tetramers, which may reflect an ancestral form of the central part of the eukaryotic nucleosome octamer, the (H3-H4)2 tetramer. Archaeal histone subunits are occasionally duplicated on a single polypeptide chain, a form observed in eukaryotes only in the histone-like domain of the son of sevenless protein.
Bacteria also have nucleoid proteins with histone-like properties, but these belong to a different, unrelated fold. However, a homolog of archaeal single-chain histones was recently reported from the bacterium Aquifex aeolicus (1R4V). Further homologs appear in the genomes of a few, phylogenetically diverse bacteria. It thus seems likely that the histone fold originated in the common ancestor of eukaryotes and archaea and spread into some bacteria through lateral gene transfer.
In an all-against-all application of HHsearch to the SCOP database (JS, unpublished results) we found an evolutionary relationship between histone proteins and the helical part of the extended AAA+ ATPase domain, the C-domain [16, 17]. Based on this finding, we used sequence and structure comparisons to reconstruct in detail the evolutionary events that may have shaped the histone fold. Our results point to a common origin not only with the C-domain but also with the N-terminal substrate recognition domain of Clp/Hsp100 proteins. The conserved element is a helix-strand-helix motif, which we propose gave rise divergently to these three different folds and thus represents an antecedent domain segment.
Homology between proteins is typically inferred from similarities in sequence and structure. Sequence similarity is the primary criterion for deducing a common origin, but for distant evolutionary events, sequences may have diverged beyond our ability to detect their relatedness. Structures diverge much more slowly and their similarity is therefore often used to identify such distant events. However, similar structures may have arisen convergently from different origins and their similarity thus frequently does not provide conclusive evidence of common ancestry. In this study we applied a new, highly sensitive method for sequence comparison based on profile Hidden Markov Models (HMMs) to identify distant homologs of histones on the basis of sequence similarity alone. Subsequently, we validated our findings through structure comparisons.
We found two high-scoring matches with other folds. These are an alanyl tRNA synthetase (1RIQ, a.203.1.1, identified by the histone entry 1JFI), and the zeta subunit of a plasmid maintenance system (1GVN, c.37.1.21, identified by two C-domains: 1LV7 and 1R7R). Subsequent analysis could not confirm these matches as homologs.
The surprising aspect of these findings is that histones, C-domains and Clp N-domains belong to three different folds (Fig. 2A–C). Histones are dimeric, interleaved helical bundles, as described in the Background section. C-domains are four-helix bundles composed of two consecutive helix-strand-helix motifs. Clp N-domains, finally, are multihelical domains formed by the repetition of a 4-helical motif. Although these three protein families have different topologies, they all incorporate two copies of the helix-strand-helix motif, which engages in the formation of a short parallel β-bridge. In the histone dimer, the β-bridge is formed by the association of one helix-strand-helix motif from each monomer, in the C-domain by the association of the two motifs consecutive in the polypeptide chain, and in the Clp N-domains by the association of each motif with an N-terminal strand of the symmetry-related motif.
The similarities detected by HMM-to-HMM comparison are limited to these helix-strand-helix motifs. Histones and C-domains both contain two consecutive copies of the motif and can be aligned over essentially their entire length (Fig. 3A). Clp N-domains contain two motifs decorated by two helices, and each motif has its best matches to the C-terminal motif of histones and C-domains (Fig. 3A). The sequence alignment shows extensive similarity in the hydrophobic patterns of the three folds, but no highly conserved residues other than two alanines in the core of the second helix-strand-helix motif, which allow for close packing interactions at the crossover point between the helices.
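The notion of similar hydrophobic patterns can be illustrated with a minimal sketch. It is not the HMM-to-HMM comparison used in this study; it only reduces two pre-aligned sequences to hydrophobic/polar patterns and reports the fraction of aligned positions at which the patterns agree. The residue classification in `HYDROPHOBIC`, the function names and the example sequences are all assumptions made for illustration.

```python
# Assumed classification of hydrophobic residues (one of several in common use).
HYDROPHOBIC = set("AVLIMFWC")


def hydrophobic_pattern(aligned_seq):
    """Map an aligned sequence to 'h' (hydrophobic), 'p' (other) or '-' (gap)."""
    return "".join(
        "-" if residue == "-" else ("h" if residue.upper() in HYDROPHOBIC else "p")
        for residue in aligned_seq
    )


def pattern_agreement(aligned_a, aligned_b):
    """Fraction of gap-free alignment columns with the same pattern class."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(hydrophobic_pattern(aligned_a), hydrophobic_pattern(aligned_b))
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    return sum(a == b for a, b in columns) / len(columns)


# Invented toy sequences, not taken from any of the proteins in this study.
seq_a = "LKAVE-L"
seq_b = "IRGLDAF"
print(hydrophobic_pattern(seq_a), hydrophobic_pattern(seq_b))
print(round(pattern_agreement(seq_a, seq_b), 3))
```

A score of this kind ignores residue identity altogether, which is why two sequences can share a pattern without sharing any conserved residue.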
The results presented here suggest an evolutionary link between histones and the C-domains of AAA+ proteins, despite differences in their topology. We propose 3D domain swapping as the mechanism that accounts for their structural differences. 3D domain swapping is a process by which two or more identical proteins exchange a domain to form interlocked oligomers, in which all of the packing interactions that stabilize the monomer are present. The swapped portions can range from a single secondary structure element to an entire domain. In the simplest case the native fold, normally constituted by a single 'closed' monomer, is reconstituted by two so-called 'open' monomers. This reciprocal swap leads to a homodimer, whereas the runaway domain swap, in which swapping propagates along an axis in an open-ended manner, has been proposed to contribute to amyloid fibril formation [23–25].
Up to now, about 40 proteins have been shown to be able to undergo 3D domain swapping, and several studies indicate a physiological role of this mechanism in allostery and signal transduction [27–29]. A precondition is the presence of a flexible loop or hinge, about which the swapped elements can rotate in order to form a pair of 'open' monomers. The primary intervention by which 3D domain swaps have been engineered into monomeric proteins is the shortening of the hinge, which prevents part of the protein from packing into its native location and forces a swap, as in domain 1 of lymphocyte antigen CD2, staphylococcal nuclease, single-chain Fv fragments [32, 33], and a 3-helix bundle designed by Ogihara et al.
The helix-strand-helix motif, which is at the core of the similarity between histones and C-domains, is also found in Clp N-domains, which assume yet a third fold. Here, the motif is decorated with two C-terminal helices, and two copies of this extended, 4-helical motif are fused in antiparallel orientation. Thus, three different folds appear to have been built from a common helix-strand-helix motif. One theory for the origin of folded proteins proposes that they arose by fusion and recombination from an ancestral set of peptides, which emerged in the context of RNA-dependent replication and catalysis (the 'RNA world') [6–8]. The helix-strand-helix motif would be such an ancestral peptide, which gave rise divergently to the Clp N-domain and the AAA+ C-domain through two independent events of duplication and fusion (Fig. 4). The C-domain then evolved into the histone fold by 3D domain swapping. This scenario extends a previous hypothesis on the origin of eukaryotic core histones, which proposed that they evolved from the duplication of a single helix-strand-helix motif [4, 5].
In this study we have deduced homology based on similarities in sequence and structure. We are aware that homology of proteins is an assumption inferred from heuristics, of which sequence similarity is generally accepted as the best indicator. Structural similarity alone, especially of small fragments, does not necessarily imply evolutionary divergence, since it may result from general biophysical constraints. Indeed, we find a number of α-helical hairpins in the PDB with a high degree of structural similarity to the helix-strand-helix motif (RMSDs of less than 1.5 Å); some examples include hairpins from fumarate reductase (1QLA_A, residues 65–94) and tetracycline repressor-like protein (1T33_A, residues 144–173). However, none of them show detectable sequence similarity to each other or to the proteins in our study. This shows that the constraints of structure on sequence variability are not sufficient to explain the observed sequence similarity between histones, C-domains, and Clp N-domains.
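For readers unfamiliar with the measure behind the 1.5 Å threshold, the following minimal sketch computes the root-mean-square deviation between two sets of equivalent atom positions. It assumes the coordinates have already been optimally superposed and paired one-to-one; it does not perform the superposition itself, and it is not the structure comparison procedure used in this study. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units between paired, pre-superposed 3D coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented coordinates (in Å): every atom of the second set is shifted by 1 Å in z.
fragment_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
fragment_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
value = rmsd(fragment_a, fragment_b)
print(f"RMSD = {value:.2f} Å, below 1.5 Å: {value < 1.5}")
```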
We have retraced the evolutionary events which may have shaped the histone fold and have found connections to two other folds: the N-terminal substrate recognition domain of Clp/Hsp100 proteins and the helical part of the extended AAA+ ATPase domain. These three folds contain a homologous helix-strand-helix motif, despite the differences in their topology, leading us to propose a scenario for the origin of these folds from a common ancestral helix-strand-helix motif through events of duplication, fusion and 3D domain swapping. The short functional parallel β-bridges formed by the strands of the helix-strand-helix motifs seem to be the evolutionary driving force for the conservation of this motif. Our findings provide additional support for our previously proposed hypothesis that the diversity of today's folds might have arisen from an ancestral set of peptides.
We obtained histone and Clp N-domain sequences from the ASTRAL compendium as defined by the SCOP (version 1.71) folds a.22 and a.174, respectively, and reduced the set to less than 25% pairwise identity at 90% length coverage using BLASTCLUST. C-domains are not characterized as a separate fold in SCOP; we extracted their sequences from the 'extended AAA-ATPase' family (c.37.1.20) of the SCOP database by a procedure described by Ammelburg et al. and also reduced this set to less than 25% pairwise identity.
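The effect of such a redundancy filter can be sketched as follows. This is a simple greedy selection of representatives, not BLASTCLUST's own clustering procedure, and it assumes that pairwise identity and coverage values have already been computed by some alignment step. The function name, the input layout and the example identifiers and values are hypothetical.

```python
def filter_redundant(ids, similarity, max_identity=25.0, min_coverage=90.0):
    """Greedily keep sequences that are not too similar to any kept sequence.

    `similarity` maps frozenset({id_a, id_b}) to a tuple
    (percent_identity, percent_coverage); pairs absent from the mapping are
    treated as unrelated. A sequence is dropped when it reaches both
    thresholds against a representative that was kept earlier.
    """
    kept = []
    for seq_id in ids:
        redundant = False
        for representative in kept:
            identity, coverage = similarity.get(
                frozenset((seq_id, representative)), (0.0, 0.0)
            )
            if identity >= max_identity and coverage >= min_coverage:
                redundant = True
                break
        if not redundant:
            kept.append(seq_id)
    return kept


# Invented identifiers and values, for illustration only.
similarity = {
    frozenset(("seqA", "seqB")): (62.0, 95.0),  # redundant with seqA
    frozenset(("seqA", "seqC")): (18.0, 97.0),  # identity below the threshold
    frozenset(("seqB", "seqD")): (40.0, 60.0),  # coverage below the threshold
}
representatives = filter_redundant(["seqA", "seqB", "seqC", "seqD"], similarity)
print(representatives)
```

Because the selection is greedy, the result depends on the input order; sorting the identifiers by a quality criterion beforehand is one possible design choice.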
We used these sequences to search the SCOP25 database for homologs with HHpred [15, 19], at default parameters and a probability cutoff of 10%. The SCOP25 database is a version of SCOP filtered for a maximum of 25% pairwise sequence identity. For each group, we pooled all search results and tabulated the frequencies at which various SCOP families appeared at each probability, binned at 10% intervals.
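The pooling and binning step can be sketched in a few lines. The sketch assumes that the search results have already been parsed into (family, probability) pairs; parsing of HHpred output is not shown, and the function names, the input layout and the example values are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def bin_lower_edge(probability, width=10):
    """Lower edge of the bin that contains a probability given in percent."""
    # A probability of exactly 100 is placed in the top bin (90-100).
    return min(int(probability // width) * width, 100 - width)


def tabulate_hits(hits, cutoff=10.0, width=10):
    """Count how often each family appears in each probability bin.

    `hits` is an iterable of (family_id, probability_percent) pairs pooled
    over all queries of one group.
    """
    table = defaultdict(Counter)
    for family, probability in hits:
        if probability < cutoff:
            continue  # below the probability cutoff
        table[bin_lower_edge(probability, width)][family] += 1
    return table


# Invented pooled hits; the labels reuse SCOP identifiers mentioned in the text.
example_hits = [
    ("a.22", 98.2),
    ("c.37.1.20", 91.0),
    ("c.37.1.20", 45.3),
    ("a.174", 12.9),
    ("a.203.1.1", 8.0),  # discarded by the 10% cutoff
]
table = tabulate_hits(example_hits)
for edge in sorted(table, reverse=True):
    print(f"{edge}-{edge + 10}%: {dict(table[edge])}")
```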
Data for the superposition in Fig. 3
Columns: protein; no. of aligned residues; RMSD to HMFA [Å]
Rows: AAA+: σ54, a) 12/16 (α1), b) 47/54 (α2–4); ClpA-N (1st half); ClpA-N (2nd half); ClpB-N (1st half); ClpB-N (2nd half)
The authors thank Nick Grishin for discussions. This work was supported by institutional funds from the Max Planck Society.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.