Type II restriction endonucleases (REases) form one of the largest groups of biochemically characterized enzymes (reviews: [1, 2]). They usually recognize a short (4–8 bp) palindromic sequence of double-stranded DNA and catalyze the hydrolysis of phosphodiester bonds at precise positions within or close to this sequence, leaving "blunt" ends or "sticky" (5' or 3') overhangs. Together with DNA methyltransferases (MTases) of the same or a similar sequence specificity, they form restriction-modification (RM) systems; the MTase methylates the target sequence and thereby protects it against cleavage by the REase. Type II RM systems behave as selfish "toxin-antitoxin" genetic modules: they undergo rampant horizontal transfer and parasitize the cells of prokaryotic hosts to ensure the maintenance of their DNA [4–6]. The activity of RM systems manifests itself in the destruction of DNA molecules that lack the required methylation pattern, e.g. the DNA of invading phages or plasmids, or the genomic DNA of host cells that once carried the RM genes but have lost them.
The activity of REases is subject to selection pressure from several sources: their host, invading DNA molecules, and competitors, including other RM systems [7–10]. Presumably because no simple, constant selection pressure acts on REase activity, these enzymes diverge rapidly, and as a consequence different REase families exhibit very little sequence similarity (review: ). Moreover, there is strong evidence, mainly from crystallographic analyses, that these enzymes have arisen independently on several occasions during evolution.
Thus far, REases have been found to belong to at least five unrelated structural folds. Most REases belong to the PD-(D/E)XK superfamily of Mg2+-dependent nucleases, which also includes various proteins involved in DNA recombination and repair [12, 13]. Two REases with different folds have been found to be Mg2+-independent: R.BfiI belongs to the phospholipase D (PLD) superfamily of phosphodiesterases [14, 15], while R.PabI exhibits a novel "half-pipe" fold [16, 17]. A number of REases have been predicted to be related to the HNH superfamily of metal-dependent nucleases, which groups together enzymes with various activities, such as recombinases, DNA repair enzymes, and homing endonucleases [12, 18]. For some of these HNH-superfamily REases, bioinformatics predictions of the active site have been substantiated by mutagenesis; examples include R.KpnI, R.MnlI, and R.Eco31I. Finally, R.Eco29kI and its two close homologs have been predicted to belong to the GIY-YIG superfamily of nucleases, which includes, e.g., DNA repair enzymes and homing nucleases; this prediction has recently been supported by mutagenesis of the R.Eco29kI active site. Among all REase folds, the mechanisms of action of the GIY-YIG and half-pipe nucleases are the least well understood, and no co-crystal structures are available for any member of these superfamilies.
A recent large-scale bioinformatics survey of Type II REase sequences indicated that for about 81% of experimentally characterized (i.e. not putative) enzymes, the three-dimensional fold can be predicted using advanced bioinformatics analyses, mainly protein fold recognition combined with analysis of amino acid conservation patterns and secondary structure prediction (review of methodology: ). However, the remaining REases are still unassigned to any known fold, and the architectures of their active sites and their potential mechanisms of action remain obscure.
R.Hpy188I is one of the REases for which no fold prediction has been made thus far. R.Hpy188I recognizes the unique sequence TCNGA and cleaves the DNA between nucleotides N and G of its recognition sequence to generate a one-base 3' overhang. Its orthologs are found in many, but not all, strains of Helicobacter pylori that have been tested for REase activity. In this work, we present the results of a bioinformatics analysis that detected a remote relationship between R.Hpy188I and known GIY-YIG nucleases by using metagenomic sequences to generate a multiple sequence alignment with enhanced evolutionary information. We suggest that this approach could be applied to predict the structures of other proteins for which fold-recognition analyses based on standard alignments have failed.
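To make the cleavage pattern described for R.Hpy188I concrete, the following minimal Python sketch locates TCNGA sites in a DNA string and splits the top strand between N and G. It is an illustration only and not part of the analysis reported here: the function names (`site_regex`, `find_cut_positions`, `digest`) and the constant `CUT_OFFSET` are hypothetical, only the top strand is modelled, and the one-base 3' overhang on the complementary strand is not represented.

```python
import re

# Recognition sequence and cut position as stated in the text:
# TCNGA, cleaved between N and G (TCN^GA).
SITE = "TCNGA"
CUT_OFFSET = 3  # hypothetical constant: bases of the site lying 5' of the cut


def site_regex(site):
    """Translate a site containing the IUPAC code N into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in site)


def find_cut_positions(dna):
    """Return 0-based indices of the first base 3' of each top-strand cut."""
    # A zero-width lookahead reports every site start, even adjacent ones.
    pattern = re.compile("(?=" + site_regex(SITE) + ")")
    return [m.start() + CUT_OFFSET for m in pattern.finditer(dna.upper())]


def digest(dna):
    """Split the top strand at every cut position and return the fragments."""
    bounds = [0] + find_cut_positions(dna) + [len(dna)]
    return [dna[start:end] for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy sequence with one TCNGA site (TCAGA); it is split as AATCA | GATT.
    print(digest("AATCAGATT"))
```

The site is encoded once as a string so that the same helpers could be reused for another recognition sequence by changing `SITE` and `CUT_OFFSET`; supporting IUPAC codes other than N would require extending `site_regex`.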