The bioinformatics studies, particularly the disorder prediction algorithms, were successful in identifying the additional C-terminal region that forms helix V in ZHX1 HD4. Use of sequence alignments alone would have missed this additional apparently integral feature of ZHX1 HD4, as most of helix V lacks sequence homology with other homeodomains of known structure.
Consistent with the experience from many centres using high-throughput methods, our study of the ZHX family proteins has shown the value of making multiple constructs, including those derived from different members of the gene family. A related approach using orthologs rather than paralogs has been recognized over many years as an important strategy in the quest to obtain diffraction-quality protein crystals. The variation in surface residues between homologs is likely to have the greatest effect on solubility and formation of crystal contacts; being the least conserved residues, they will vary most between different sets of orthologs or paralogs. Indeed, the homologous domains from ZHX1, 2 and 3 did not show the same pattern of expression. For example, whilst HD4 of ZHX1 expressed as a soluble protein, the equivalent domain from either ZHX2 or ZHX3 did not. Nevertheless, this work did lead to the structure for one ZHX2 homeodomain, HD2. Removal of the His tag was important for crystallizing ZHX1 HD4. Although only a small sample set was tried as a proof of principle, it may be worth pursuing tag removal if more single-domain homeodomain crystal structures were to be tackled, particularly as the tag accounts for a relatively high percentage of residues in a 60 amino acid protein.
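The point about the tag making up a large share of a short construct can be illustrated with simple arithmetic. The sketch below is illustrative only: the function name and the two tag lengths (a bare hexahistidine of 6 residues, and a hypothetical 20-residue tag plus linker) are assumptions and are not taken from the constructs used in this work.

```python
def tag_fraction(tag_len: int, domain_len: int = 60) -> float:
    """Fraction of the expressed construct contributed by the affinity tag.

    tag_len    -- number of tag (plus linker) residues; hypothetical values
    domain_len -- number of residues in the domain itself (60 for a
                  typical homeodomain, as quoted in the text)
    """
    if tag_len < 0 or domain_len <= 0:
        raise ValueError("tag_len must be >= 0 and domain_len must be > 0")
    return tag_len / (tag_len + domain_len)


if __name__ == "__main__":
    # Both tag lengths are assumed for illustration, not measured values.
    for tag_len in (6, 20):
        print(f"{tag_len}-residue tag: {100 * tag_fraction(tag_len):.1f}% of construct")
```

For a much larger protein the same tag would contribute a far smaller fraction, which is one way to read the suggestion that tag removal matters most for single-domain constructs.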
Overall the use of high-throughput cloning and expression methodology meant that many constructs could be tested, thus allowing a number of complementary approaches to be tried in order to maximize the chances of successful structure determination. These approaches led to three-dimensional structure determination for three targets (two X-ray, one NMR). Despite attempts to crystallize protein derived from multidomain constructs, only single homeodomains yielded crystals, which may imply that the linkers between domains have too much flexibility to allow packing into a lattice. Induction of rigidity in these structures, which could improve crystallisability, may require the presence of DNA oligomers or partner proteins such as additional transcription factors in higher order complexes.
Homeodomains, with their common functional role in the regulation of gene transcription, generally show a high degree of structural conservation. As an example, certain homeodomain orthologs such as human Hox-A, although separated in evolution by half a billion years from the Antennapedia gene, show >98% sequence conservation. Much greater variation in sequence is, of course, observed in paralog homeodomains, reflecting functional diversity, yet in spite of this the three-dimensional structures are generally conserved. Thus the determination of two ZHX homeodomain structures reported here, with variations in three-dimensional structure from the canonical form, is of interest.
In ZHX1 HD4, helix IV, which is in effect a C-terminal extension of helix III, is well ordered, in contrast to the higher degree of flexibility observed in the absence of DNA. The fact that ZHX1 HD4 helix IV is ordered without the bound nucleic acid ligand could be a result of the presence of helix V, whose interactions with helix I may help to anchor helix IV residues. It is not clear what the functional consequences of this ordering, if any, may be. A lower level of induced fit of helix III/IV to the target DNA is certainly likely, which could in turn affect both the specificity and the tightness of DNA binding.
For ZHX1 HD4, the long C-terminal helix (V) appears unusual among reported homeodomain structures. Helix V makes numerous, mainly hydrophobic, contacts with the C-terminal half of helix I. It is thus likely that this additional feature will result in greater stability of the protein. Interestingly, HD2 from homez, a homeobox leucine zipper-containing transcription factor, has a short C-terminal single turn of helix that partially overlaps with helix V of ZHX1 HD4. There is no sequence identity between ZHX1 HD4 and the homez homeodomain beyond this turn of helix. The extended helix V is only present in ZHX1 HD4, thus demonstrating differences in 3-D structure from homez despite their likely common origin as part of a vertebrate-derived homeodomain gene subfamily.
Alignment of equivalent regions to ZHX1 HD4 helix V in other ZHX paralogs (Figure 1) suggests that this extended helix may also be present, particularly in the case of ZHX2. Inspection of other ZHX homeodomains whose structures have been deposited in the PDB (specifically ZHX1 HD3, ZHX2 HD3 and ZHX3 HD2), however, confirms the absence of helix V. ZHX1 HD4 shares significant amino acid sequence identity (51%) with homez HD2. Homez HD2 is suggested to have a DNA binding function based on putative side-chain/nucleic acid interactions. By analogy, ZHX1 HD4 has Trp25 and Arg53 in position to form hydrophobic and ionic interactions, respectively, with DNA. Additionally, the basic N-terminal region present in ZHX1 HD4 is positioned to bind to the minor groove of DNA. ZHX1 HD4 also has a significant structural relationship to an engrailed homeodomain [32, 40]. Engrailed is a well-characterized DNA binding homeodomain module, and its structural relatedness may hence imply a similar biological role for ZHX1 HD4 rather than, for example, a role in the formation of protein-protein interactions.
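For readers unfamiliar with how a figure such as 51% identity is obtained, a minimal sketch of a pairwise identity calculation over pre-aligned sequences is given below. It is an illustration under stated assumptions, not the procedure used in this work: the function name is hypothetical, the toy sequences are invented rather than real ZHX1 or homez sequences, and the denominator chosen here (columns where neither sequence has a gap) is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences taken from the same alignment.

    Only columns in which neither sequence has a gap are compared; other
    conventions (e.g. dividing by the shorter sequence length) give
    different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Invented toy sequences, for illustration only.
    print(f"{percent_identity('ACDE-FG', 'ACDQWFG'):.1f}% identity")
```

Because the result depends on both the alignment and the choice of denominator, identity values are best compared only when computed in the same way.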
From an early study of homeodomain sequences a number of covariant residues were noted. Sixteen strongly covariant residue pairs were identified, the most highly correlated co-occurring pair being Glu19 and Arg30. From the three-dimensional homeodomain structures it was clear that the side-chains of these two residues formed a salt bridge, as exemplified in the engrailed homeodomain, which is one of the closest related structures to ZHX1 HD4 available. For engrailed there is a network of ionic interactions formed by two sets of residues, 15/37 and 19/30, which also includes 15/19. It was suggested that the salt bridges in engrailed could provide stabilization, as the small core/surface ratio of such a short protein may require such additional interactions.
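The idea of covariant residue pairs can be made concrete with a small calculation. The statistic used in the early study is not described here, so the sketch below substitutes mutual information between alignment columns, which is one common measure of covariation; this choice, the function names and the four-sequence toy alignment are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column(alignment: list[str], idx: int) -> list[str]:
    """Return the residues found at one column of an alignment."""
    return [seq[idx] for seq in alignment]


def mutual_information(col_i: list[str], col_j: list[str]) -> float:
    """Mutual information (in bits) between two alignment columns.

    High values mean that the residue at one position is informative
    about the residue at the other, as expected for a covarying pair.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (res_a, res_b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[res_a] / n
        p_b = count_j[res_b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Invented toy alignment: columns 0 and 2 covary (E with R, Q with L),
    # whereas column 1 varies independently of column 0.
    toy = ["EAR", "EGR", "QAL", "QGL"]
    print(mutual_information(column(toy, 0), column(toy, 2)))
    print(mutual_information(column(toy, 0), column(toy, 1)))
```

In real alignments, raw mutual information is inflated by phylogenetic relatedness and small sample sizes, so corrected scores are normally preferred over this bare form.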
In the case of ZHX1 HD4, the ionic bridges are broken by substitution of apolar residues at 15 (residue 684 in full-length ZHX1) and 19 (688). There are no other residues forming replacement salt bridges elsewhere in the ZHX1 HD4 structure. Homez HD2 appears different again as, although closer to ZHX1 HD4 in evolutionary terms, it retains only a single charged side-chain in the 15/37, 19/30 set (Asp30) and as a consequence has more hydrophobic contacts in this region (e.g. via the side chains of Phe19 and Leu34). Also, the highly conserved 31/42 salt bridge, which is retained in ZHX1 HD4, is not present in homez, where glutamine replaces aspartic acid at the former position and is not even in a position to form a hydrogen bond.
Regarding the potential biological role of helix V, it appears unlikely to form a more rigid link to HD5 than those between other homeodomains in ZHX1, as there is a significant predicted region of low complexity consisting of 40 residues between HD4 and HD5. As mentioned earlier, helix V is also unlikely to directly affect DNA binding, as it points away from the expected major groove binding site. Taken together with the increased hydrophobic contacts formed by covariant residues 15/37 in ZHX1 HD4 compared to engrailed, for example, the presence of helix V could reflect the greater tendency of certain homeodomains from homeotherms to rely more on hydrophobic stabilization, as put forward in a recent pivotal study. This stabilization relates to the fact that hydrophobic interactions are entropically driven and the entropy term, in contributing to the free energy decrease, has a negative temperature coefficient, thereby strengthening such interactions at higher temperature. A corollary to this is that the presence of more ionic interactions in non-homeotherm homeodomains such as engrailed may be appropriate for stabilizing the protein over a wider temperature range. Alternatively, it could be the case that lower thermal stability is a desired property for certain transcriptional regulators, allowing a shorter protein half-life and thus a more rapid response as part of a control mechanism.
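The temperature argument for hydrophobic stabilization can be written out using the standard Gibbs relations. The symbols are introduced here for illustration only: ΔG, ΔH and ΔS denote the free energy, enthalpy and entropy changes on forming the hydrophobic contact, and T is the absolute temperature.

```latex
\Delta G = \Delta H - T\,\Delta S,
\qquad
\left(\frac{\partial\,\Delta G}{\partial T}\right)_{p} = -\Delta S .
```

For an entropically driven contact, ΔS > 0, so ΔG becomes more negative as T rises and the interaction strengthens. This is a first-order picture only: hydrophobic association typically involves a sizeable heat capacity change, so ΔH and ΔS themselves vary with temperature and the trend need not hold over a wide range.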
Naively, it might be expected that any greater stabilisation of a homeodomain could give an increased binding affinity for the DNA ligand. However, it has been observed that there is no correlation between homeodomain thermal stability and strength of DNA binding. Thus the need for greater stability of an individual homeodomain may be related to specific functional requirements, such as interactions with a particular set of regulatory proteins, rather than representing a general property.
For OPPF_2273, the in situ cleavage of HD2-3 to yield crystals of ZHX2 HD2 was unexpected, as was the non-canonical homeodomain conformation observed in the crystal structure, which involves a domain swap of helix I and thereby forms a dimer. It is not clear that the manner in which HD2 was generated, i.e. by proteolysis in the crystallization droplet, should necessarily give rise to this unusual conformation, as presumably it is merely the cleavage of a flexible loop connecting two domains. Opening out of the standard homeodomain fold, as observed for ZHX2 HD2, would likely have the effect of destabilizing the hydrophobic core of the protein. However, the dimer we observe in the crystal structure serves to bury some of the exposed hydrophobic residues at the interface between the subunits.
Previously published data indicate that HD1 of full-length ZHX1-3 is responsible for both homo- and hetero-dimer formation [6, 9–12], based on yeast two-hybrid data. Thus the possible biological significance of a dimer formed between ZHX2 HD2 subunits in this context is unclear. As the homez/ZHX gene family is specific to a vertebrate lineage and involved in complex regulatory networks, it is possible that interactions different from those detected in a yeast two-hybrid system may occur. However, a similar kind of domain swapping to that seen for ZHX2 HD2 could form the basis for the observed dimerisation of ZHX1 HD1 mentioned above.
Even if the unusual ZHX2 HD2 structure is not normally present at significant frequencies in biological systems it is of interest in the context of protein folding, where an alternative conformation exists for the same sequence. The altered homeodomain conformation could form the basis for the acquisition of potential distinctive new functional properties for this commonly occurring protein module.