Type II restriction endonuclease R.Hpy188I belongs to the GIY-YIG nuclease superfamily, but exhibits an unusual active site

Background Catalytic domains of Type II restriction endonucleases (REases) belong to a few unrelated three-dimensional folds. While the PD-(D/E)XK fold is most common among these enzymes, crystal structures have been also determined for single representatives of two other folds: PLD (R.BfiI) and half-pipe (R.PabI). Bioinformatics analyses supported by mutagenesis experiments suggested that some REases belong to the HNH fold (e.g. R.KpnI), and that a small group represented by R.Eco29kI belongs to the GIY-YIG fold. However, for a large fraction of REases with known sequences, the three-dimensional fold and the architecture of the active site remain unknown, mostly due to extreme sequence divergence that hampers detection of homology to enzymes with known folds. Results R.Hpy188I is a Type II REase with unknown structure. PSI-BLAST searches of the non-redundant protein sequence database reveal only 1 homolog (R.HpyF17I, with nearly identical amino acid sequence and the same DNA sequence specificity). Standard application of state-of-the-art protein fold-recognition methods failed to predict the relationship of R.Hpy188I to proteins with known structure or to other protein families. In order to increase the amount of evolutionary information in the multiple sequence alignment, we have expanded our sequence database searches to include sequences from metagenomics projects. This search resulted in identification of 23 further members of R.Hpy188I family, both from metagenomics and the non-redundant database. Moreover, fold-recognition analysis of the extended R.Hpy188I family revealed its relationship to the GIY-YIG domain and allowed for computational modeling of the R.Hpy188I structure. Analysis of the R.Hpy188I model in the light of sequence conservation among its homologs revealed an unusual variant of the active site, in which the typical Tyr residue of the YIG half-motif had been substituted by a Lys residue. Moreover, some of its homologs have the otherwise invariant Arg residue in a non-homologous position in sequence that nonetheless allows for spatial conservation of the guanidino group potentially involved in phosphate binding. Conclusion The present study eliminates a significant "white spot" on the structural map of REases. It also provides important insight into sequence-structure-function relationships in the GIY-YIG nuclease superfamily. Our results reveal that in the case of proteins with no or few detectable homologs in the standard "non-redundant" database, it is useful to expand this database by adding the metagenomic sequences, which may provide evolutionary linkage to detect more remote homologs.


Background
Type II restriction endonucleases (REases) form one of the largest groups of biochemically characterized enzymes (reviews: [1,2]). They usually recognize a short (4-8 bp) palindromic sequence of double-stranded DNA and catalyze the hydrolysis of phosphodiester bonds at precise positions within or close to this sequence, leaving "blunt" ends or "sticky" (5' or 3') overhangs. They form restriction-modification (RM) systems together with DNA methyltransferases (MTases) of the same or a similar sequence specificity, whose enzymatic activity leads to methylation of the target sequence and, consequently, its protection against the cleavage by the REase [3]. Type II RM systems behave as selfish "toxin-antitoxin" genetic modules; they undergo rampant horizontal transfer and parasitize the cells of prokaryotic hosts to ensure the maintenance of their DNA [4][5][6]. The activity of the RM systems manifests itself by destruction of DNA molecules without the required methylation patterns, e.g. DNA molecules of invading phages or plasmids, or the genomic DNA of their host cells that once had the RM genes but have lost them.
The activity of REases is the target of selection pressure involving various agents: their host, the invading DNA molecules, and their competitors including other RM systems [7][8][9][10]. Presumably because of the absence of simple constant selection pressure on the REase activity, they undergo rapid divergence, and as a consequence, different REase families exhibit very little sequence similarity (review: [11]). Besides, there is formidable evidence, mainly from crystallographic analyses, that these enzymes have originated independently in the evolution on at least several occasions.
Thus far, REases have been found to belong to at least five unrelated structural folds. Most of REases belong to the PD-(D/E)XK superfamily of Mg 2+ -dependent nucleases, which also includes various proteins involved in DNA recombination and repair [12,13]. Two REases with different folds have been found to be Mg 2+ -independent: R.BfiI belongs to the phospholipase D (PLD) superfamily of phosphodiesterases [14,15], while R.PabI exhibits a novel "half-pipe" fold [16,17]. A number of REases have been predicted to be related to the HNH superfamily of metaldependent nucleases, which groups together enzymes with various activities, such as recombinases, DNA repair enzymes, and homing endonucleases [12,18]. For some of these REases from the HNH superfamily, bioinformatics predictions of the active site have been substantiated by mutagenesis; examples include R.KpnI [19], R.MnlI [20], and R.Eco31I [21]. Finally, R.Eco29kI and its two close homologs have been predicted to belong to the GIY-YIG superfamily of nucleases that includes e.g. DNA repair enzymes and homing nucleases [22]; this predic-tion has been recently supported by mutagenesis of the R.Eco29kI active site [23]. Among of all REase folds, the mechanism of action of GIY-YIG and half-pipe nucleases is least well understood, and no co-crystal structures are available for any member of these superfamilies.
A recent large-scale bioinformatics survey of Type II REase sequences [24] indicated that for about 81% of experimentally characterized (i.e. not putative) enzymes, the three-dimensional fold can be predicted based on advanced bioinformatics analyses, mainly protein foldrecognition and analysis of amino acid conservation patterns and secondary structure prediction (review of methodology: [25]). However, the other REases remain unassigned to known folds and the architectures of their active sites and potential mechanisms of action remain obscure.
R.Hpy188I is one of the REases, for which no fold prediction have been made thus far. R.Hpy188I recognizes the unique sequence, TCNGA, and cleaves the DNA between nucleotides N and G in its recognition sequence to generate a one-base 3' overhang [26]. Its orthologs are found among many, but not all, strains of Helicobacter pylori that have been tested with respect to the REase activity [27]. In this work, we present the results of a bioinformatics analysis that has detected remote relationship between R.Hpy188I and known GIY-YIG nucleases thanks to utilization of metagenomics sequences to generate a multiple sequence alignment with enhanced evolutionary information. We suggest that this approach could be applied to predict structure of other proteins, for which fold-recognition analyses done with standard alignments have failed.

Initial bioinformatics analysis of R.Hpy188I and its homologs
The lack of overall sequence conservation among REases, the absence of invariable residues even in the active site and the presence of several alternative folds makes structure prediction and generation of multiple sequence alignments for these enzymes a non-trivial task. In order to predict the structure of R.Hpy188I, we used the Gene-Silico meta-server, which is a gateway to a number of third-party algorithms (see Methods). In particular, we predicted the secondary structure of this enzyme and carried out the fold recognition analysis to identify the structures of potentially homologous proteins in the Protein Data Bank that could serve as modeling templates. Unfortunately, querying the meta-server with R.Hpy188I sequence alone has not revealed any significant matches to proteins of known structure (for a discussion of significance thresholds of individual FR methods, see the Methods section). Of all methods used, only HHSEARCH revealed a match to GIY-YIG nucleases, albeit at the 9 th position of the ranking, with a score that did not indicate statistical significance (probability 0.113, e-value 68).
Most of fold recognition methods make their predictions not for a single sequence, but for a multiple sequence alignment generated by PSI-BLAST searches of the nonredundant (nr) NCBI database (or of a subset of sequences culled from this database). Analysis of an independently carried out PSI-BLAST run against that database (with e-value threshold of 1e-3) revealed only one nearly identical sequence, of REase R.HpyF17I that exhibits only 1 amino acid difference and 4 additional amino acids at the N terminus (Sapranauskas, R., Lubys, A. and Janulaitis, A. unpublished reference "Cloning and analysis of the TCNGA-specific restriction-modification system from Helicobacter pylori strain A17-2"). The results of fold recognition analysis starting from R.HpyF17I or from an alignment of R.Hpy188I and R.HpyF17I, were the same as those starting from R.Hpy188I alone. Thus, R.Hpy188I can be considered an "ORFan" [28], at least with respect to the nr database.
Previously, in the course of bioinformatics analysis of R.NlaIV enzyme, we found that inclusion of sequences from metagenomics projects can increase information content of a multiple sequence alignment and improve detection of remote homologies, in particular for proteins with very few homologs in the nr database [29]. Thus, we carried out a new PSI-BLAST search for R.Hpy188I (also with e-value threshold of 1e-3), of the env_nr database (protein sequences deduced from environmental DNA samples), obtained from the NCBI server database. This search revealed 9 sequences, with e-values ranging from 3e-10 to 3e-4. Again, running FR analyses for these sequences gave no significant matches to any structure. Nonetheless, a PSI-BLAST search of a database comprising both nr AND env_nr revealed an increased number of sequences. In the search, 25 sequences including 18 from marine metagenome [30] were found to exhibit significant scores (e-values < e-4) and a conserved pattern of residues (I/V)-Y-X 9 -(K)-I-G (where X indicates any amino acid residue) associated with a predicted β-hairpin structure that remotely resembled the genuine bipartite GIY-YIG motif. FR analysis of a multiple sequence alignment calculated for the sequences returned by the PSI-BLAST search revealed the relationship of these sequences to the GIY-YIG superfamily, according to the following servers: HHSEARCH (probability 0.946), FFAS (score -12.6), mGenTHREADER (probability 0.422), FUGUE (score 10.3), INUB (score 44.1). According to the Livebench evaluation, all these scores indicate higher reliability than the threshold of approximately 5% false positives (see Methods) and in our experience can be taken as reasonably confident 3D fold prediction. Further, the consensus predictor PCONS selected 1yd0 as a preferred template with score 0.665, a value almost exactly at the threshold. Thus, we estimate that a probability of incorrect fold prediction for the R.Hpy188I family is around 5%.
We conclude that utilization of evolutionary information from metagenomics sequences can greatly increase the information content of a multiple sequence alignment, to the point where a reasonably sized family can be detected for a sequence, which appears as an "ORFan" when only the nr database is considered. An extended multiple sequence alignment that includes metagenomics sequences together with proteins from the nr database can then be used as a sensitive probe in protein fold-recognition, for detection of remote homologies to proteins of known structure.

Molecular modeling of R.Hpy188I
It is well known that fold recognition methods can produce artifacts. For instance, sequence alignments to wrong templates can reveal misleading local similarity of amino acid residues, and generate structures that are completely misfolded. Thus, in order to substantiate the sequencebased prediction of membership of R.Hpy188I in the GIY-YIG superfamily (with the confidence of FR predictions estimated to be around 95%), we decided to build a model of its structure and evaluate its quality on the threedimensional level. Although the GIY-YIG domain of UvrC [31] has been identified as the preferred structure, fold recognition alignments reported by different methods exhibited differences. Thus, we used the "FRankenstein's Monster" approach to simultaneously generate a model of the protein core and optimize the target-template alignment by generation, evaluation, and recombination of alternative models [32,33]. This approach has been evaluated as one of the best template-modeling methods in CASP5 and CASP6; we have also used it to generate accurate models of REases R.SfiI [34] and R.MvaI [35], which were confirmed by independent crystallographic analyses [36,37]. The final alignment ( Figure 1) indicated that regions 1-59, 89-103, and 113-121 of R.Hpy188I lack the counterpart in GIY-YIG domains of known structure and cannot be modeled "by homology".
Initially, we attempted to fold regions 1-59 (N-terminal extension), 89-103, and 128-143 (two insertions and a structure of low sequence similarity to the template) using ROSETTA (see Methods), while keeping the rest of the model 'frozen'. However, the resulting models (lowenergy representatives of the 5 largest clusters of decoys) exhibited relatively poor packing (data not shown). Thus, we subjected these models to refinement with the REFINER method [38], using additional restraints on secondary structure, according to the consensus prediction reported by the meta-server. Recently, we have used this approach to correctly predict the structures of MiaA, MiaB, and MiaE enzymes [39]. Among all the refined R.Hpy188I models, the one with the lowest predicted deviation to the native structure (root mean square deviation from the native structure of about 4.25 Å according to the Meta-MQAP method, and LGscore of 3.536 i.e. 'very good model' according to PROQ) has been selected as the final model ( Figure 2) and subjected to further analyses.

Analysis of the R.Hpy188I model
Comparison of the R.Hpy188I model with the much smaller template structures of GIY-YIG domains of UvrC and I-TevI homing endonuclease ( Figure 2) illustrates the challenge of modeling, in particular with respect to regions that have no counterpart in the templates and have been added de novo. Nonetheless, our model obtained very good scores, which suggests that it is likely to be well-folded and that potential errors are unlikely to occur in the structurally most important regions. Parts modeled de novo do not form an autonomously folded (sub)domain. Instead, they pack against the homologymodeled GIY-YIG core. The secondary structure in the model fulfills the restraints used during model building; interestingly, a part of the N-terminal loop (residues 6-8) has formed a small β-sheet with a part of the insertion Multiple sequence alignment of 25 members of the R.Hpy188I family, together with the structurally characterized homologs from the GIY-YIG superfamily, UvrC (1yd0) and I-TevI (1mk0) Figure 1 Multiple sequence alignment of 25 members of the R.Hpy188I family, together with the structurally characterized homologs from the GIY-YIG superfamily, UvrC (1yd0) and I-TevI (1mk0). The members are indicated by the NCBI GI number followed by the original REBASE name, e.g. R.Hpy188I (for sequences available in REBASE, e.g. R.Hpy188I) or the abbreviated genus and species name with exception of Marine for the marine metagenome and Synpha for Synechococcus phage S-PM2. Amino acid residues are colored according to the similarity of their physico-chemical properties. Secondary structure (ss), as determined experimentally for UvrC and I-TevI and predicted for R.Hpy188I (taken from the final model presented in the present study), is indicated below the alignment as tubes (helices) and arrows (strands). The alternative positions of Arg residue (84 or 104 or 105 in R.Hpy188I sequence) are indicated by asterisks (*) above the alignment, whereas the position of the two Cys residues C90 and C101 is indicated by plus characters (+).
(residues 101-103). The model reveals the predicted configuration of the putative active site of R.Hpy188I, comprising amino acid residues Y63, K73, R84, Y88, E149, and Q169 ( Figure 3). In comparison with the GIY-YIG domains analyzed so far [22], R.Hpy188I and some of its homologs are the first to exhibit K (K73 for R.Hpy188I) at A structural model of R.Hpy188I (A and B) compared to its experimentally characterized homologs (C and D) position corresponding to Y29 of UvrC and Y17 of I-TevI (Y of the YIG half motif) and Q (Q169 for R.Hpy188I) at the position corresponding to N88 of UvrC and N90 of I-TevI (Figure 3 and 4).
The mechanism of phosphodiester bond hydrolysis has not been elucidated experimentally for any protein from the GIY-YIG superfamily, however a tentative mechanism has been proposed based on analysis of the crystal structure of a GIY-YIG domain from the UvrC enzyme [31]. In analogy to that tentative mechanism, the divalent metal ion may function as Lewis acid, while E149 of R.Hpy188I may be responsible for metal coordination, K73 (alternatively Y63 or Y88) may function as a general base, and R84 may stabilize the negative charge of the free 5' phosphate after DNA cleavage. The hydoxyl group of Y29 of UvrC has been proposed to accept a proton from a nucleophilic water molecule while simultaneously transferring its proton to the metal-bound hydroxide [31]. The amino group of K73 might act in a similar way.
Interestingly, among the afore-mentioned residues of the putative active site, R84 (indicated by an asterisk in Figure  1 variants with the Arg residue in these alternative locations ( Figure 4) revealed that the positively charged guanidino group at the tip of its side chain can assume spatially similar location as in the "orthodox" position. This finding suggests that R104/105 may fulfill the same role of phosphate binding as R84 despite being attached to a nonhomologous position in the protein backbone. Such a spatial "migration" of a catalytic residue has not yet been observed in enzymes from the GIY-YIG superfamily; however, it has been reported for two different residues (Glu/ Asp or Lys/Arg) in a number of nucleases from the PD-(D/ E)XK superfamily [40][41][42]. Thus, it will be interesting to test experimentally the functional significance of the swapped Arg residue in the newly discovered GIY-YIG enzymes described in this work.
In addition to potential catalytic residues, the model of R.Hpy188I (Figure 4) revealed a pair of semi-conserved cysteines (C90 and C101) in the vicinity of the alternative positions of the afore-mentioned Arg residue (84 and 104/105). The presence of these two Cys residues is strongly correlated: they co-occur in 11 sequences and a single member of this pair is present only in 2 sequences. Both Cys residues are absent from all 12 members of the R.Hpy188I family that possess a shorter variant of the intervening loop (Figure 1). It is tempting to speculate that this pair of Cys residues might have a functional role, e.g. somehow stabilize the longer variant of the loop that may be involved in protein-DNA interactions. In the model they are sufficiently close to each other to form a disulfide, which is however unlikely to happen in nature due to the generally reducing environment of the bacterial cytoplasm, which prevents oxidation of sulfhydryl groups [43]. Alternatively, if R.Hpy188I forms a dimer like most of Type II restriction endonucleases, they could form an intermolecular Zn-bindig site. Unfortunately, our model cannot provide detailed clues as to the function of C90 and C101, hence we propose them as interesting targets for experimental characterization.
Analysis of the protein surface with respect to the distribution of sequence conservation and the electrostatic potential ( Figure 5) reveals that the surface of R.Hpy188I is mostly positively charged. The predicted catalytic residues line up a bottom of a pocket with an overall neutral charge that is surrounded by a charged rim. Most of that rim exhibits positive charge (complementary to the negative charge of DNA backbone), suggesting its possible role in the DNA binding. However, one side of the rim exhibits local concentration of negative charge, suggesting potential involvement in interactions with the positively charged metal ion.

Phylogenetics and genomic context analysis of the R.Hpy188I family
In order to interpret the structural and genomic features of different R.Hpy188I homologs in the evolutionary context, we have calculated a phylogenetic tree for the entire family ( Figure 6). It reveals that the R.Hpy188I family comprises two subfamilies with different characteristic features (hereafter dubbed R.Hpy188I branch and R.HpyAORF481P branch after the representative members from REBASE). All the members of the R.Hpy188I branch contain the phosphate-binding Arg residue at the "orthodox" 84 position. All of them, except for two sequences from environmental samples, contain the R.Hpy188I and its close homolog R.HpyF17I are the only members of this protein family for which some function has been determined [26]. Like virtually all Type II restriction enzymes, their genes are closely associated with genes encoding a DNA modification methyltransferase with cognate specificity. Thus, checking whether functionally uncharacterized members of the R.Hpy188I family are also genetically linked with DNA methyltransferase homologs is a convenient way to predict whether they could constitute a restriction-modification system. Unfortunately, most of R.Hpy188I members had been identified in metagenomic sequences, which typically contain only short fragments of genomic DNA and may not necessarily include the associated MTase gene together with the REase gene. Nonetheless, we carried out analysis of DNA sequence context for R.Hpy188I homologs to identify their neighbors and attempted to predict their cellular function beyond the putative generic nuclease activity.
It turned out that 19 members of the R.Hpy188I family are flanked with DNA MTase homologs (Table 1). For the remaining 6 homologs, the existence of a flanking MTase gene could not be verified because of incomplete nucleotide sequences that did not extend beyond the REase-like gene. We identified flanking MTase genes for 9 out of 13 members of the R.Hpy188I branch and 10 out of 12 of the R.Hpy481P branch (Table 1). Conserved association of members of the R.Hpy188I family with DNA MTases suggests that all of them are or used to be a functional restriction-modification system.
All but two (7 out of 9) MTases of the R.Hpy188I branch are closely related to M.Hpy188I (BLAST E-value < 1e-9      Several systems from the HpyAORF481P branch appear to be associated with another RM system ( Table 1). The most interesting case is observed in the genome of Campylobacter upsaliensis RM3195, where another putative RM system has been inserted into the CupORF237P system comprising homologs of R.HpyAORF481P and M.HpyAORF481P. Insertion of a restriction-modification gene complex into another restriction-modification gene complex has been already suggested to have occurred in Helicobacter pylori [47].

Conclusion
Our results reveal that R.Hpy188I and its homologs are new members of the GIY-YIG superfamily, despite the fact that they exhibit two deviations from the consensus catalytic motif of the superfamily. First, R.Hpy188I exhibits Lys instead of Tyr of the "YIG" half-motif. Second, in one branch of R.Hpy188I family, a presumably catalytic Arg residue is missing at its typical position in sequence, but instead is found in a non-homologous position that nonetheless allows for spatial conservation of the guanidino group potentially involved in phosphate binding. Our discovery provides important insight into sequence diversity of GIY-YIG nucleases and suggests that other members with unusual active sites might await discovery. In this context, the theoretical model of R.Hpy188I structure developed in this work will serve as a convenient guide for experimental analyses aimed at understanding of the cleavage mechanism of GIY-YIG nucleases.
Our phylogenetic analysis shows that the R.Hpy188I family can be subdivided into two branches, one comprising close homologs of R.Hpy188I itself, and the other comprising close homologs of R.HpyAORF481P. Members of either branch are characterized by a different set of features, including localization of residues predicted to participate in the enzymatic activity and possibly in structural stability. They are also found associated with MTases from different classes. Last, but not least, sequence context analyses revealed that in the family of R.Hpy188I homologs, comprising mostly sequences detected in metagenomics data, all genes that have appropriate flanking sequences present in the database, are accompanied by a putative DNA MTase gene or its fragment, suggesting that they all are or used to be functional restriction-modification systems.

Sequence database searches, phylogenetic analyses and genomic context analyses
Searches of the non-redundant version of current sequence database (nr) and the database of environmental protein sequences (env_nr) were carried out using PSI-BLAST [48], initially separately for nr and env_nr via the websites of NCBI and MPI-Tuebingen, and finally using the local version (against the combined nr+env_nr database). The final search was carried out with the e-value threshold of 1e-3. The multiple sequence alignment of R.Hpy188I and proteins identified in nr+env_nr database was calculated using MAFFT [49] with default parameters and refined by hand to ensure that no unwarranted gaps had been introduced within α-helices and β-strands.
Finally, based on the alignment, the phylogenetic tree was calculated using MEGA 4.0 [50], employing the Minimum Evolution method with the JTT model of substitutions. The stability of individual nodes was calculated using the bootstrap test (1000 replicates) and confirmed by the interior branch test. Genomic context analyses were carried out using hmmpfam from the HMMer package [51] against the PFAM [52] and TIGRFAMs [53] databases with e-value threshold of 1e-3.

Protein fold prediction
Preliminary structure predictions were carried out via the HHPRED server [54]. As soon as we identified (by eye) sub-optimal alignments of R.Hpy188I sequence to GIY-YIG nucleases of known structure, we resubmitted it to the GeneSilico metaserver gateway [55] for secondary structure prediction and fold recognition. Structural predictions were carried out both for the R.Hpy188I sequence alone (without success), for the alignments of R.Hpy188I with sequences from the env_nr database (with somewhat better, but still statistically insignificant results), and finally for the alignment of the sequences found by searching the combined nr and env_nr database. It is important to indicate that different FR servers use completely different scoring systems, with different scales (e.g. Z-scores, e-values, percent values etc.). Moreover, the meaning of scores changes over time and may not be the same as reported in original publications describing the methods, as servers are modified and databases grow continuously. The comparable reliability thresholds for a number of servers are estimated e.g. by the Livebench benchmark [56,57] [65]. The scores were taken from the last Livebench run (Livebench-2008.2), with the sole exception of PCONS, which has not been included in Livebench-2008.2 and its score was taken from the CASP7 evaluation [66].

Protein structure modeling
Homology modeling of the catalytic core was carried out using the "FRankenstein's monster" approach (see [32,33] for a detailed description). Briefly, preliminary models were built with MODELLER [67] based on alternative sequence alignments between R.Hpy188I and template structures obtained from various fold-recognition servers with significant scores (all templates used for modeling were members of the GIY-YIG superfamily). The preliminary models were scored by MetaMQAP [68] and a "hybrid" model was generated by merging fragments with consensus alignment with those non-consensus fragments that exhibited best MetaMQAP scores. Additional evaluations of protein structure quality were carried out with PROQ [69].
The "hybrid" model obtained was used as a starting point for folding simulations of the complete sequence using ROSETTA [70]. The homology-modeled core of R.Hpy188I (residues 60-88, and 104-112) was completely "frozen" and the search of conformational space for the variable regions (residues 122-170) was restricted by the choice of fragments from known crystal structures that were compatible with the sequence and predicted secondary structure of R.Hpy188I. However, the models obtained by this protocol exhibited relatively poor packing and unsatisfactory MetaMQAP and PROQ scores (data not shown). Therefore, the final simulation of R.Hpy188I folding was conducted by the REFINER method, which uses a reduced representation of the protein chain and a statistical potential of mean force to describe intramolecular interactions [38]. REFINER is a real-space version of a lattice-based algorithm CABS [71] we have earlier successfully combined with the "FRankenstein's Monster" method in CASP6 [72] or for modeling of R.Eco29kI enzyme, another member of the GIY-YIG superfamily [23]. The folding was carried out with restraints on predicted secondary structure. Models generated during the simulation had their full-atom representation rebuilt and were scored using PROQ and MetaMQAP. The best-scoring structure (in terms of predicted root mean square deviation with respect to the unknown true structure) was selected as the final model. Mapping of sequence conservation onto the model was done for the multiple sequence alignment of the Hpy188I family, initially with COLORADO3D [73], and ultimately with the ConSurf server [74].