BMC Structural Biology

Distantly Related Lipocalins Share Two Conserved Clusters of Hydrophobic Residues: Use in Homology Modeling

Background: Lipocalins are widely distributed in nature and are found in bacteria, plants, arthropods and vertebrates. In hematophagous arthropods, they are implicated in the successful completion of the blood meal, interfering with platelet aggregation, blood coagulation and inflammation, and in the transmission of disease agents such as Trypanosoma cruzi and Borrelia burgdorferi.


Background
Lipocalins are small secreted proteins (160-200 residues), typically folded into an eight-stranded up-and-down β-barrel. A 3₁₀ helix (H1) closes one end of the barrel, and a second helix (H2) lies parallel to its surface. The cavity inside the barrel can hold a small, typically hydrophobic, molecule, and each lipocalin is usually well adapted to the recognition of its ligand. Lipocalins can also bind to receptors and form part of macromolecular complexes. They are involved in numerous functions, such as the transport of molecules implicated in homeostasis (e.g. retinoids, arachidonic acid), enzymatic synthesis, immunomodulation, olfaction, pheromone signaling and cell regulation [1]. Sequence identity within the family is low despite a well-conserved tertiary structure; for lipocalins with differing biochemical functions, pairwise identity can fall below 10% [2]. However, there is a core set of quite closely related lipocalins, called the 'kernel' lipocalins, which share three structurally conserved regions (SCRs). The more divergent lipocalins, called outliers, match no more than two of the SCRs [3].
Recognition of the SCRs permits assignment to the lipocalin family. However, for members lacking the SCR motifs, structure determination is the only safe way to decide whether they belong to the family. Another strategy is to analyze their exon-intron structure [4,5]. For instance, RaHBP2 was assigned to the lipocalin family only on the basis of its structural properties [6]. RaHBP2 is a histamine-binding lipocalin from the hard tick Rhipicephalus appendiculatus with two binding pockets. The pocket at the bottom of the barrel is the low-affinity binding site (L) and contains two negatively charged residues; the one near the mouth is the high-affinity binding site (H) and contains four negatively charged residues. Its similarity to other members of the lipocalin family is very low. Furthermore, it has an α-helix instead of a 3₁₀ helix closing the barrel.
Lipocalins are widespread across species and are found in organisms as varied as bacteria, plants, arthropods and vertebrates [1]. Up to now, they have not been found in the Archaea domain, but this might be because lipocalins that lack the SCRs are difficult to identify. Moreover, an increasing number of sequences that share around 15% identity with lipocalins but lack the lipocalin recognition motifs are found in protein databanks. In blood-sucking arthropods, many lipocalin-related sequences expressed in the salivary glands have been identified [7]. Several have been characterized, notably RaHBP2, and were found to be implicated in the completion of the blood meal, interfering with platelet aggregation, blood coagulation, activation of the complement system and inflammation. They are also implicated in the transmission of disease agents such as Trypanosoma cruzi and Borrelia burgdorferi, and in tick toxicoses [8][9][10]. However, most of the expressed sequences, among them LIR2 from the tick Ixodes ricinus, have unknown functions and share a pairwise sequence identity with experimentally characterized lipocalins that lies within or below the twilight zone [9,11,12].
The question is how to confirm that they belong to the lipocalin family, and to determine their function, without solving their structures, which could be a long and difficult process. Homology modeling has up to now been the only method available to predict the 3D structure of proteins of this size with an accuracy comparable to a low-resolution experimental structure [13]. Reliable homology modeling typically requires about 30% sequence identity between target and template: above a cut-off of 30% sequence identity, 90% of the pairs are homologous and have an equivalent structure, whereas below 25%, fewer than 10% are [12]. Below this limit, both the choice of a homologous template and the alignment between the target and template sequences become less reliable. Sequence identity between lipocalins is far below this limit. This is not exceptional: Rost determined that most similar protein structure pairs in the PDB appear to have less than 12% pairwise sequence identity [14]. Therefore, before building a 3D model from a template with a low level of identity to the target, the validity of the template must be confirmed and the alignment optimized. This can be done by comparing predicted secondary structures and solvent accessibility, as well as patterns of hydrophobic and characteristic residues.
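The identity figures quoted here depend on how identity is counted. As a minimal sketch, assuming identity is the fraction of identical residues over alignment columns where neither sequence has a gap (other conventions divide by the length of the shorter sequence and give different numbers), the computation and a coarse reading against the 30% and 25% cut-offs could look like this. The function names and zone labels are hypothetical and are not part of any tool used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def identity_zone(pid: float) -> str:
    """Coarse label using the 30% and 25% cut-offs quoted in the text (assumed boundaries)."""
    if pid >= 30.0:
        return "above"       # template and alignment usually reliable
    if pid >= 25.0:
        return "borderline"  # between the two cut-offs
    return "below"           # template and alignment need independent validation


if __name__ == "__main__":
    pid = percent_identity("ACDE-FG", "ACDQWFH")  # toy aligned pair, not real data
    print(round(pid, 1), identity_zone(pid))
```

At around 15% identity, as for LIR2 and 1QFT, such a check lands in the "below" zone, which is why the structural criteria developed in this work are needed.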
Proteins can share a similar 3D structure despite low sequence similarity only if the fold is determined not by every detail of the sequence but by key features [15,16]. When the structures of proteins with low similarity are compared, a set of clustered residues usually remains conserved; these residues form the structural core. Clarke et al. identified a structural core for the immunoglobulin-like β-sandwich proteins [16], Ptitsyn a structural core for c-type cytochromes [17], and Socolich et al. a cluster of evolutionarily linked residues for the WW fold [18]. The structural core of lipocalins has not yet been analyzed taking into account the more distant outlier lipocalins [19,20].
The aim of this study is to provide arguments for assigning outlier lipocalins to this family and to help align them with a template for homology modeling. To achieve this goal, the conserved properties of lipocalins were identified and their structural core was analyzed using a set of ten structurally aligned lipocalins. These proteins share at most 28% sequence identity and have diverse functions. Because their functions are so diverse, the characteristics identified as conserved are likely to be important for the fold rather than for function. To identify the residues implicated in the structural core, the interactions conserved among these lipocalins were determined.
The results were then used to confirm the assignment of LIR2, a tick protein, to the lipocalin family and to align it with RaHBP2. RaHBP2 is an atypical lipocalin, since it does not share the SCRs and has an α-helix closing the bottom of the barrel; nevertheless, it shares the main structural properties identified here for lipocalins. The alignment was used to build a 3D model of LIR2. Despite the low sequence similarity between LIR2 and its template, the model allowed hypotheses about histamine binding to be formulated and then tested experimentally.

Study of the lipocalin family
The lipocalins from our database (see Methods) were structurally aligned with VAST. A gap in H1 of 1PEE was removed and another was introduced before the second helix of 1QFT to improve the structural correspondence. The structural alignment is presented in Figure 1, and the pairwise sequence identities are given in a table provided as supplementary material (Additional file 1). Identity varies from 5% to 28%, with an average of around 15%, below the nominal threshold for a reliable sequence alignment [21]. Interacting residues were detected in the structures with the PEX software [22]. Interactions are classified according to the nature of the amino acids involved: hydrophobic (Ala, Cys, Phe, Gly, His, Ile, Leu, Met, Val, Trp, Tyr, Pro), hydrophilic (Asp, Glu, Arg, Lys, His, Asn, Gln, Ser, Thr, Tyr), charged (Asp, Glu, Arg, Lys, His) and aromatic (Phe, His, Tyr, Trp). A maximum of 15 interactions was taken into account for each residue, since no residue has more than 15 interactions. An interaction is considered fully conserved when it is present in every structure and the interacting residues are of the same nature (i.e. hydrophobic, hydrophilic, charged or aromatic) throughout the alignment. If the interaction is not of the same type in one protein, it is considered not strictly conserved. In some lipocalin structures (1XKI and 1A3Y), H1 is not well resolved, so the conservation of the barrel and H2 is considered separately from that of H1. Furthermore, since RaHBP2 (1QFT) is an atypical lipocalin, with an α-helix instead of a 3₁₀ helix closing the barrel, and lacks the structurally conserved regions (SCRs) of typical lipocalins, it has been considered separately in the conservation analysis.
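The conservation rule for interactions can be made concrete. The sketch below is our reading of that rule, not the PEX implementation: an interaction between two alignment positions counts as fully conserved only if a contact is seen in every structure and one residue class is shared by the interacting pair in all of them. The input format (one residue pair or `None` per structure) and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Residue classes exactly as listed in the text; His and Tyr belong to several classes.
CLASSES: Dict[str, Set[str]] = {
    "hydrophobic": set("ACFGHILMVWYP"),
    "hydrophilic": set("DERKHNQSTY"),
    "charged": set("DERKH"),
    "aromatic": set("FHYW"),
}


def pair_classes(res_a: str, res_b: str) -> Set[str]:
    """Classes shared by both residues of one interacting pair (one-letter codes)."""
    return {name for name, members in CLASSES.items()
            if res_a in members and res_b in members}


def conservation_status(per_structure: List[Optional[Tuple[str, str]]]) -> str:
    """Classify one interaction (a pair of alignment positions) across structures.

    per_structure holds, for each structure, the residue pair observed in contact
    at the two positions, or None when no contact was detected in that structure.
    """
    if not per_structure or any(pair is None for pair in per_structure):
        return "not conserved"
    # The interaction must have one common nature in every structure.
    common = set(CLASSES)
    for res_a, res_b in per_structure:
        common &= pair_classes(res_a, res_b)
    return "fully conserved" if common else "not strictly conserved"
```

Under this strict class-based reading, a Trp-Arg contact such as 48-192 shares no class, even though the text notes that the two residues interact through their hydrophobic regions; a real analysis would need to handle such cases explicitly.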
Prior to the interaction study, conservation of the nature of the residues (i.e. hydrophobic, hydrophilic, aromatic or charged; see above for definitions) was analyzed in the alignment. Residue size was also taken into account: Glu, Phe, His, Lys, Ile, Leu, Met, Gln, Arg, Trp and Tyr are considered bulky, and the others small.
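A position-level check follows the same definitions. As a minimal sketch, assuming an alignment column is given as a string with one residue per sequence and that a gap anywhere means the column is not conserved, the properties kept by every sequence in a column could be listed as below. The names `conserved_properties` and `BULKY` are illustrative only.

```python
from typing import Dict, List, Set

# Property classes and the bulky set as defined in the text (one-letter codes).
CLASSES: Dict[str, Set[str]] = {
    "hydrophobic": set("ACFGHILMVWYP"),
    "hydrophilic": set("DERKHNQSTY"),
    "charged": set("DERKH"),
    "aromatic": set("FHYW"),
}
BULKY: Set[str] = set("EFHKILMQRWY")


def conserved_properties(column: str) -> List[str]:
    """Properties shared by every residue in one alignment column.

    A gap ('-') in any sequence means the column is treated as not conserved.
    """
    if not column or "-" in column:
        return []
    props = [name for name, members in CLASSES.items()
             if all(res in members for res in column)]
    # Size is conserved only if the column is entirely bulky or entirely small.
    if all(res in BULKY for res in column):
        props.append("bulky")
    elif not any(res in BULKY for res in column):
        props.append("small")
    return props
```

For example, a column of ten Trp returns `["hydrophobic", "aromatic", "bulky"]`, while a column mixing Leu, Ile, Val, Ala, Phe and Met keeps only `["hydrophobic"]`, which illustrates why size is less often conserved than hydrophobicity.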

Barrel and helix 2

Conservation in the alignment
The positions in the alignment at which a property is conserved across all 10 sequences are indicated in Figure 1. Among the 25 conserved positions, 6 are conserved as hydrophilic, 16 as hydrophobic, one as aromatic (Trp from SCR1) and one as charged (the negative residue in SCR3). Seven positions are conserved in size: four as bulky residues and three as small, among them the conserved Gly of SCR1. Table 1 summarizes the conserved positions in all the lipocalins. Fewer than half of the conserved positions have their side chains outside the barrel (49, 52, 74, 110, 162, 165, 166, 169, 205, 214, 220). All positions conserved as hydrophobic, except residue 220, have a relative solvent-accessible surface area below 30%. For residue 220, accessibility is not conserved in 1XKI.
Owing to the absence of the C-terminal region, and consequently of the disulfide bond, residue 220 is more exposed to the solvent in that particular structure. The conserved hydrophilic positions show no conservation of their solvent accessibility.

Conserved interactions
The interactions conserved throughout the 10 structures (10/10) have been studied; they are represented in Figure 2 and summarized in Table 1. All conserved interactions involve hydrophobic residues located in the interior of the barrel, except for the interaction between 48 and 192, which involves a Trp and a basic residue (interacting through their hydrophobic regions), and the 169-205 interaction, located at the interface between H2 and the barrel (Figure 3A and 3B). The conserved interactions involve all strands of the barrel as well as the two helices.
Interactions conserved in 9 proteins out of ten (9/10) are represented in Figures 2 and 3 and summarized in Table 1. Two clusters of hydrophobic interactions clearly appear. The internal cluster involves residues on β-strands at the bottom of the barrel (Figure 3A); these residues can be seen as forming a hydrophobic belt. The internal cluster also includes H1. The external cluster involves residues on H2, strands βA, F, G and H, and the C-terminal loop (Figures 2 and 3B). Globally, it appears that the network of interactions between strands βF, G and H is more developed […] 1AQB. Some interactions that are not conserved are nevertheless compensated. For example, the hydrophobic interaction 80-93 is not conserved in 1PEE because of the hydrophilic nature of position 93 (Glu). However, this residue interacts with the hydroxyl groups of Tyr 39, 131 and 154, stabilizing the protein [23]. It is worth noting that the interaction 91-115 is not conserved in Nitrophorin 2 (1PEE), but is well conserved in its close homologue, Nitrophorin 4 [PDB: 1D2U] [23]. The interaction 170-190 is not conserved in 1AQB because of the hydrophilic nature of residue 170 (Gln). However, the lack of interaction 170-190 seems to be compensated by the external disulfide bond between Cys 173 and 182, linking βG and βH. The hydrophobic interactions 189-205 and 52-189 are not conserved in 1EXS owing to the hydrophilic nature of residue 189. Nevertheless, Phe 205 interacts with both 52 and 189 and appears to bridge these residues, its aromatic ring interacting with the Cβ of Ser 189. In addition, the interaction 52-220 is absent in 1NGL but is compensated by another interacting pair, 219-54.

Figure 1: Structural alignment obtained with VAST.

Table 1: Conserved positions based on the VAST alignment and related structural information: the position of the residue relative to the barrel, inside (in) or outside (out), and the corresponding secondary structure (2nd). The conserved properties (10/10 conservation) are also presented, i.e. hydrophobic (pho), hydrophilic (phi), charged (NEG for negative charge), aromatic (arom), bulky/small, as well as the conserved interaction partner (for 11/11, 10/10 and 9/10 lipocalins). (x) indicates the properties and interactions shared by 1QFT.

Interactions for 1QFT
1QFT is an outlier lipocalin with low similarity to the other lipocalins. It stands apart from the family, since it lacks the structurally conserved regions (SCRs) of the family, it binds hydrophilic ligands and its H1 is in α-conformation. Interactions conserved at 90% among lipocalins (9/10 study) and shared by 1QFT (9 lipocalins + 1QFT) are shown in Table 1 and Figure 2C. Several interactions conserved in the 10/10 lipocalin study are not conserved in 1QFT. For interaction 48-80, the residues are too far apart. For interaction 91-131, the lack of conservation is due to the orientation of residue 131 towards the ligand. For interactions involving residues 156 and 168, their hydrophilic nature is responsible for the non-conservation; both residues interact with histamine.

Helix 1
In the 1XKI and 1A3Y structures, residues close to or belonging to the N-terminal part of the 3₁₀ helix (H1) are absent or poorly resolved. For that reason, conservation in the N-terminal region was studied for the remaining 8 of the 10 structures, retaining interactions conserved in all 8 (8/8 study) or in 7 of them (7/8 study). Conserved interactions in H1 for 8/8 and 7/8 structures are represented in Figure 2B and summarized in Table 2.

Table 2: Study of the interactions of helix 1 in lipocalins.

Conservation of the positions in the alignment
The interactions are all hydrophobic except for the one between residues 35 and 38 (hydrophilic). Three interactions are 100% conserved and three others are conserved in 7/8 structures. Figure 2B illustrates the importance of the central residue (39) in stabilizing the bottom of the barrel: it interacts with five of the eight strands (βA, B, C, F and G) in 7/8 structures. Interactions 39-168 and 39-48 do not appear in the 8/8 conservation study because the residues are too far apart in 1NGL and 1EXS, respectively. Interaction 39-91 is missing from the 8/8 study because, in 1PEE, the side chain of residue 40 (Phe) is inserted between the two residues.

Interactions of H1 for 1QFT
Due to the orientation and the α-conformation of H1 in 1QFT, residue 39 is not central as it is in the other lipocalins and thus is not involved in conserved interactions. Hence, the way α-helix 1 interacts with the barrel has been studied separately, as shown in Figure 2D. […] with 166 in all structures except 1XKI. This interaction is not seen in the 9/10 conservation study because of the restriction that residues must be separated by at least two residues to be considered as interacting. In 1QFT, residue 162 apparently cannot form an H-bond with 166 owing to the presence of a disulfide bond linking βG to H2. Position 165 makes a hydrophilic, but poorly conserved, interaction (4/10 structures) with 38. The corresponding interaction in 1QFT is 37-165. This interaction might play an important role in folding despite its low conservation, because both positions are well conserved. Position 214 interacts with residues 54 (5/10; pho), 187 (5/10; pho) and 209 (6/10; pho), reinforcing the external cluster.
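The sequence-separation restriction mentioned above can be illustrated with a generic side-chain contact finder. This is a minimal sketch under stated assumptions, not the PEX algorithm: a contact is assumed to be any pair of residues whose closest side-chain atoms lie within a hypothetical 4.5 Å cut-off, and "separated by at least two residues" is read as |i - j| ≥ 3 in sequence numbering (alignment columns, which include gaps, are a different numbering). All names and default values are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def side_chain_contacts(side_chains: Dict[int, Sequence[Coord]],
                        cutoff: float = 4.5,
                        min_separation: int = 3) -> List[Tuple[int, int]]:
    """Residue pairs whose closest side-chain atoms lie within `cutoff` angstroms.

    side_chains maps a residue's sequence number to its side-chain atom coordinates.
    Pairs with |i - j| < min_separation are skipped; min_separation=3 keeps only
    residues with at least two residues between them (assumed interpretation).
    """
    residues = sorted(side_chains)
    contacts = []
    for k, i in enumerate(residues):
        for j in residues[k + 1:]:
            if j - i < min_separation:
                continue  # too close in sequence to count as an interaction
            if not side_chains[i] or not side_chains[j]:
                continue  # e.g. Gly has no side-chain atoms
            closest = min(math.dist(a, b)
                          for a in side_chains[i] for b in side_chains[j])
            if closest <= cutoff:
                contacts.append((i, j))
    return contacts
```

With toy coordinates `{1: [(0, 0, 0)], 2: [(1, 0, 0)], 5: [(0, 1, 0)]}`, residues 1 and 2 touch but are dropped by the separation filter, so only (1, 5) and (2, 5) are reported.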

Homology modeling for LIR2
LIR2 is a protein from Ixodes ricinus. PSI-BLAST was used to scan the PDB for a homologous protein [24]. The only structure found after four iterations with an E-value below the threshold was that of RaHBP2 (1QFT). LIR2 shares around 15% identity with 1QFT and has no lipocalin recognition motifs. The ClustalW alignment between LIR2 and 1QFT is shown in Figure 4. As with the PSI-BLAST alignment, some aberrations are noticeable. The secondary structure of LIR2 (predicted with the PROF method [25]) does not correspond to that of 1QFT in the N-terminal region. Furthermore, the region of LIR2 corresponding to H1 (in 1QFT) contains three prolines, which disfavor the helical conformation. In the region corresponding to βA, position 48 (referring to the lipocalin alignment of Figure 1) does not correspond to an aromatic amino acid in LIR2. This residue is aromatic in all lipocalins, including 1QFT, and several mutational studies have demonstrated its importance for the structural stability of lipocalins [26][27][28]. Position 49 is not bulky and hydrophobic as in the other lipocalins. The region corresponding to βB is not predicted as β-strand. Position 80 corresponds to an Arg, whereas it is a hydrophobic residue in lipocalins. The cysteine from βB, involved in a disulfide bridge between the C-terminal part and βB in 1QFT, is also not conserved.

Table: Positions in H1 that are conserved and related structural information, i.e. the position of the residue relative to the barrel, inside (in) or outside (out). The conserved properties (10/10 and 8/10 conservation) are also presented (see Table 1 […]).

Modification of the alignment
To correct these misalignments, the alignment was manually modified, taking into account the predicted secondary structures, the conserved interactions and positions of lipocalins, and the cysteines involved in disulfide bridges in 1QFT.
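The manual corrections that follow amount to checking candidate alignments against a few anchor constraints, for instance requiring an Asn opposite template position 129 and hydrophobic residues opposite 131 and 133 when placing βE. As a minimal sketch, assuming anchors are expressed as template residue numbers (1-based, counting template residues only, not the lipocalin alignment numbering) mapped to sets of allowed target residues, a candidate alignment could be scored as below. The toy sequences and the `ANCHORS` values are hypothetical and are not taken from LIR2 or 1QFT.

```python
from typing import Dict, List, Set

HYDROPHOBIC: Set[str] = set("ACFGHILMVWYP")  # hydrophobic class as defined in the text


def target_residue_at(template_aln: str, target_aln: str, template_pos: int) -> str:
    """Target residue aligned with template residue number `template_pos`
    (1-based, template residues only). Returns '-' if the target has a gap there."""
    count = 0
    for t_res, q_res in zip(template_aln, target_aln):
        if t_res != "-":
            count += 1
            if count == template_pos:
                return q_res
    raise IndexError(f"template position {template_pos} is beyond the alignment")


def anchor_violations(template_aln: str, target_aln: str,
                      anchors: Dict[int, Set[str]]) -> List[int]:
    """Template positions whose aligned target residue is not in the allowed set."""
    return [pos for pos in sorted(anchors)
            if target_residue_at(template_aln, target_aln, pos) not in anchors[pos]]


# Hypothetical anchors in the spirit of the βE case: an Asn, then two hydrophobic positions.
ANCHORS = {3: {"N"}, 5: HYDROPHOBIC, 7: HYDROPHOBIC}
```

With the toy template `"KTNAVSL"`, the candidate `"RSNGMTF"` satisfies every anchor, whereas the same target shifted by one column (`"SNGMTF-"`) violates all three, which is the kind of comparison used below to choose between two gap placements.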

H1
To align H1, the conserved interaction between residues 34 and 37 was used (see the window in Figure 4). The corresponding residues in lipocalins and 1QFT are both hydrophilic. Positions 35 and 39 were also used: in 1QFT, residue 35 (Ala) is small, which reduces the steric constraints between H1 and strands βD and βE; a bulkier residue at that position would interfere with the interaction with βD-βE. Residue 39 makes a hydrophobic interaction with 91. In the modified alignment, residues 34, 35, 37 and 39 now correspond to Asn, Ala, Arg and Val, respectively.
For the N-terminal region, neither the secondary structure nor the disulfide bond information helped the alignment, since this region of LIR2 is predicted as β-strand by PROF, though with low reliability, and as α-helix by NPSA (data not shown).

βA
Residues 48 and 49 are implicated in conserved interactions: position 48 is a conserved Trp interacting with 80, 190 and 192, and 49 is a bulky hydrophobic residue interacting with 194. Both were used, together with secondary structure predictions, to align βA (see the window in Figure 4). The predictions help to obtain a global alignment but are not sufficient to avoid ambiguity, since the strand can be aligned in different ways. Adding restraints for residues 48 […]

βB
To align βB, positions 75 and 80 were used together with secondary structure predictions. Residue 75 is a conserved Cys in arthropod lipocalins, and residue 80 makes a conserved hydrophobic interaction with 93. Secondary structure predictions give a global alignment, and residues 75 and 80 remove the ambiguity.

βC and βD
Strands βC and βD were not realigned, since the secondary structure predictions correspond well to those of 1QFT and positions 91, 93 and 115 are conserved hydrophobic residues. The corresponding residues in LIR2 are Phe, Tyr and Leu, respectively. Residues 91 and 115 make conserved interactions with 80, 115 and 131. All of these correspond to hydrophobic residues in the modified alignment.

βE
In strand βE, the secondary structure predictions correspond well to those of 1QFT. Furthermore, residues 131 and 133, both implicated in conserved hydrophobic interactions, are conserved in the ClustalW alignment. Nevertheless, some uncertainty remains about the alignment of this region: if a gap located between the N-terminus and βE is removed, the secondary structures still correspond well and residues 131 and 133 are still hydrophobic.
To decide between the two alignments, position 129 was used. In 1QFT, Asn 129 forms two H-bonds with the backbone NH and CO groups of the twisted βD-βE loop. This loop has the same length in 1QFT and LIR2 and is longer than in other lipocalins. When the N-terminal gap is removed, residue 129 corresponds to an Asn in LIR2; this alignment was therefore chosen. In that case, residues 131 and 133 correspond to Met and Phe, respectively.

βF, βG and βH
Strands βF, βG and βH were not realigned, since the secondary structures correspond well and residues 158, 167, 169, 170, 189 and 190 (Val, Cys, Ile, Leu, Leu and Trp, respectively, in LIR2) are potentially able to make the conserved interactions.

H2 and βI
H2 and βI were not realigned, since positions 205 and 227 are both Cys, as in 1QFT. 1QFT contains two disulfide bridges. One joins the C-terminal part to βB and is conserved in arthropod lipocalins (75-227; 1QFT, 1PEE, 1I4U). The other links H2 to βG (167-205; also present in 1I4U). The corresponding Cys of LIR2 are conserved; furthermore, LIR2 has two additional cysteines that could form a disulfide bridge between H2 and βH.
In the modified alignment, almost all residues of the internal hydrophobic cluster are conserved in LIR2; only residues 156 (Asn) and 168 (Thr) are hydrophilic, as in 1QFT. In the external cluster, positions 52 and 171 are Arg. Despite their hydrophilic nature, they can make hydrophobic interactions through the aliphatic part of their side chains [29]. The LIR2 sequence corresponding to the βF-βG loop is similar to the SCR2 motif of lipocalins: residues Thr-Asp-Tyr in 1AQB are equivalent to Ser-Asn-Tyr in LIR2 [3].
The fairly good correspondence between the secondary structures of 1QFT and LIR2, combined with the conservation of the residues involved in the two conserved hydrophobic clusters and of the Cys involved in disulfide bridges in 1QFT, supports the assignment of LIR2 to the lipocalin family.

3D model
A 3D model was built with Modeller [30], using the refined alignment and the 1QFT structure as template. Its stereochemical validity was checked with Procheck [31]: only one residue lies in the disallowed phi/psi region of the Ramachandran plot, and three others, located in loops, lie in the generously allowed region. In the model, the two cysteines located on βH and H2, which have no counterpart in 1QFT, face each other (residues 187 and 213 in Figure 3D). The distance between the Cα atoms of the two residues is 6 Å, compatible with a disulfide bridge. A model in which Cys 187 and Cys 213 were restrained to form a disulfide bridge was therefore also calculated; it is similar to the model built without this restraint (data not shown). In the model, all interactions conserved among lipocalins (9/10 study) are found in LIR2, except the one between the residues equivalent to positions 48 and 80; as in 1QFT, residue 48 interacts with 78. Neither of the interactions involving residue 91 (Phe) is conserved, because its side chain points outwards. Interactions involving residues 156 and 168 are not conserved in LIR2 because of their hydrophilic nature. As in 1QFT, the residue equivalent to 52 (Arg) interacts through its hydrophobic tail with 189 (Leu) and also interacts with 221 (Thr).
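The disulfide argument above rests on a simple geometric test. As a minimal sketch, assuming a typical Cα-Cα range of roughly 4.4 to 6.8 Å for disulfide-bonded cysteines (an assumed value from structural surveys; substitute the range from your preferred reference), the check could be written as below. It only screens candidate pairs; it does not reproduce how Modeller imposes the disulfide restraint, and Cβ and side-chain geometry would still need to be verified. The coordinates in the usage note are hypothetical.

```python
import math
from typing import Tuple

Coord = Tuple[float, float, float]

# Assumed typical C-alpha to C-alpha separation (angstroms) for disulfide-bonded cysteines.
CA_CA_MIN, CA_CA_MAX = 4.4, 6.8


def disulfide_compatible(ca_1: Coord, ca_2: Coord,
                         lo: float = CA_CA_MIN, hi: float = CA_CA_MAX) -> bool:
    """True if the C-alpha distance of two cysteines falls within [lo, hi] angstroms."""
    return lo <= math.dist(ca_1, ca_2) <= hi
```

For two Cα atoms 6 Å apart, as reported for Cys 187 and Cys 213 in the model, `disulfide_compatible((0.0, 0.0, 0.0), (6.0, 0.0, 0.0))` returns `True`.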

Experimental measurement of the secondary structure
FTIR measurements were used to determine the secondary structure of LIR2. The FTIR spectrum has a maximum at 1632 cm⁻¹, characteristic of β-structure (data not shown). After deconvolution, LIR2 contains 22% α-helix, 48% β-strand, 17% turns and 13% coil. This is typical of lipocalins, notably 1QFT, which has 19% α-helix, 43% β-strand, 24% turns and 13% coil, as determined from its X-ray structure.

Prediction of ligand binding
Analysis of the internal cavity of LIR2 reveals that the bottom of the barrel is more hydrophobic than in 1QFT and that the upper part contains almost all the hydrophilic residues of the cavity. As shown in Figure 4, the hydrophobic residues at the bottom of the barrel are conserved between LIR2 and 1QFT, i.e. Trp (48) […]. When the residues of the L site that participate in histamine binding in 1QFT are compared with the corresponding residues in LIR2, it appears that the negative residues Asp 42 and 168 (see Figure 4) are not conserved. In 1QFT, these residues have been shown to interact with histamine. The corresponding residues in LIR2 are Asn (42) and Thr (168). As no negative residue is conserved in the L site of LIR2, no histamine binding is predicted at that site.
Concerning the H site of LIR2, the negative residues are fairly well conserved; only residue 156 (Asn) is not. However, LIR2 contains a positive residue (Lys 50) in the cavity and two others (Arg 69 and 111) that may belong to the ligand-binding pocket; these would repel histamine, which is positively charged at physiological pH. Furthermore, the aromatic residues (Trp 69 and Phe 154) that lie parallel to the histamine ring in 1QFT are not conserved in LIR2 (Arg and Thr, respectively). In RaHBP1 (a close homologue of RaHBP2), such a substitution (Phe 154 replaced by Leu) causes a significant decrease in affinity for histamine [6]. Furthermore, in the loops surrounding the entrance of the H site, the ratio of negative to positive residues is 7:1 in 1QFT and 2:3 in LIR2. In 1QFT, these negative residues in the loops were proposed to help attract histamine to the binding site [6].
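The negative-to-positive ratio in the entrance loops is a simple count. As a minimal sketch, assuming the loop segments are supplied as one-letter strings and that only Asp/Glu count as negative and Lys/Arg as positive (His is left out here because its charge depends on pH, unlike the broader "charged" class used elsewhere in this work), the count could be done as below. The segments in the usage note are hypothetical, not the actual 1QFT or LIR2 loops.

```python
from typing import Iterable, Tuple

NEGATIVE = set("DE")
POSITIVE = set("KR")  # His deliberately excluded (assumption, see lead-in)


def charge_counts(segments: Iterable[str]) -> Tuple[int, int]:
    """(negative, positive) residue counts over loop segments given as one-letter strings."""
    neg = pos = 0
    for segment in segments:
        neg += sum(res in NEGATIVE for res in segment)
        pos += sum(res in POSITIVE for res in segment)
    return neg, pos
```

For the toy segments `["GDEN", "SDK"]`, the function returns `(3, 1)`, i.e. a 3:1 ratio of negative to positive residues.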
For the H site, although most of the negative histamine-binding residues are conserved, the absence of the aromatic residues and of one negative residue should prevent high-affinity histamine binding by LIR2.

Experimental determination of the affinity of LIR2 for histamine
LIR2 and RaHBP2 were expressed in 293T cells in serum-free medium. The ability of LIR2 to bind histamine was tested by incubating concentrated cell supernatant containing LIR2 with ³H-histamine. RaHBP2 was used as a positive control, and concentrated supernatant of untransfected cells as a negative control. These binding assays show high affinity for RaHBP2 and no affinity for LIR2 (cpm values similar to those of the untransfected-cell supernatant), confirming that LIR2 is unable to bind histamine, as predicted from the model (Figure 5).

Discussion
The aim of the present work is to provide information to help build 3D models of the weakly related proteins of the lipocalin family. The members of this family have a wide variety of functions and are hence of biological importance. The identity between lipocalins can fall below 10%. Building a 3D model by homology modeling for proteins with an identity below the 25-30% cut-off is risky and requires further validation of the template selection and of the alignment with the target. This can be done by comparing predicted secondary structures and solvent accessibility, as well as patterns of hydrophobic and characteristic residues. In this work, information about the structural core of lipocalins was extracted and used to build a 3D model of LIR2, a protein from the tick Ixodes ricinus. For that purpose, a set of lipocalin structures was analyzed and conserved properties were identified. To capture the widest diversity, we tried to find a structure for each clade identified in the phylogenetic study of lipocalins (Ganfornina et al., 2000). Nine structures were collected. Nitrophorin 2 [PDB: 1PEE] and RaHBP2 [PDB: 1QFT] were also included in the study. The latter was studied separately owing to the unusual α-conformation of its first helix and its hydrophilic binding sites. The lipocalins were structurally aligned with the VAST method.

Conserved positions of the alignment
To analyze the conserved properties of the alignment, the amino acids were classified as hydrophobic, hydrophilic, aromatic, charged, bulky or small. For the 10 lipocalins, which have a mean length of 170 amino acids, only 25 positions are conserved: two are strictly identical (Gly 42 and Trp 48 from SCR1) and one is negative (192 from SCR3). The ratio of conserved hydrophobic to hydrophilic positions is nearly three to one. All conserved hydrophobic positions have a relative solvent-accessible surface below 30%. Residue size is less conserved than hydrophobicity: only 7 positions are conserved in size, which is not unusual [32].
Side chain-side chain interactions were studied for each structure. Interactions were divided into four classes according to the nature of the residues involved, i.e. hydrophobic, hydrophilic, charged or aromatic. First, RaHBP2 was left out of the analysis, and the conservation of interactions in 10/10 and 9/10 structures was analyzed. In both studies, no conserved electrostatic or aromatic interactions were found, and almost all conserved interactions were hydrophobic. The pattern of hydrophobic interactions suggests the existence of two clusters, one internal to the barrel and one external. In the 9/10 study, the internal cluster is composed of residues 39, 48, 80, 91, 93, 115, 131, 133, 156, 158, 168, 170, 190 and 192 (Figure 3A), and the external cluster of residues 52, 159, 169, 171, 189, 205 and 220 (Figure 3B).
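The split into an internal and an external cluster can be viewed as finding connected components in a graph whose nodes are alignment positions and whose edges are conserved interactions. The sketch below uses a union-find structure on the subset of interacting pairs that the Results section names explicitly (mixing the 10/10, 9/10 and H1 7/8 studies); it is not the full content of Table 1, so positions such as 133, 156, 158, 159 and 171 do not appear. The function name and the `PAIRS` list are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def clusters(pairs: Iterable[Tuple[int, int]]) -> List[Set[int]]:
    """Connected components of the graph whose edges are interacting position pairs."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups: Dict[int, Set[int]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=min)


# Pairs named in the Results text (a subset of the conserved interactions).
PAIRS = [(39, 48), (39, 91), (39, 168), (48, 80), (48, 190), (48, 192),
         (80, 93), (91, 115), (91, 131), (170, 190),
         (52, 189), (52, 220), (169, 205), (189, 205)]
```

On this subset, `clusters(PAIRS)` yields two components, {39, 48, 80, 91, 93, 115, 131, 168, 170, 190, 192} and {52, 169, 189, 205, 220}, consistent with the internal and external clusters described above.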
In the internal cluster, the β-strands are linked by 14 hydrophobic interactions (Figures 2A and 3A), forming what can be seen as a hydrophobic belt. A similar belt was detected in the ten-stranded β-barrel of the lipid-binding protein family [33,34]. This belt is linked to the central residue (39) of H1 by 5 interactions coming from 5 different strands (Figure 2B). The central residue therefore appears important in the structural core of lipocalins. The helix is further stabilized by an internal hydrophilic interaction. Positions 34, 38 and 165 are conserved in the alignment but do not appear among the conserved interactions (9/10 and 7/10 studies; Figure 1); they are involved in less conserved interactions between the barrel and H1.
It should be noted that all conserved hydrophobic positions in the alignment are involved in conserved interactions, except positions 49 and 214, whose interactions are conserved in fewer structures. These two positions belong to the external cluster. Among the 6 conserved hydrophilic positions in the alignment, three are part of SCR3 and one (110) seems to be involved in stabilizing the interaction between strands βB and βC on the external surface of the barrel.
Our results were compared with those of Ragona et al. [20], who identified by NMR the interacting residues in partially folded bovine β-lactoglobulin at pH 2. The residues located in the cavity of the barrel correspond to positions 39, 48, 77, 80, 91, 131, 156, 158, 168, 170 and 190 in our alignment. Residues 93, 95, 115 and 133 could not be unambiguously detected by NMR, but the authors assigned them to the internal cluster using the X-ray structure. All these residues correspond well to those identified in the present study. The interacting residues in the external cluster (detected by NMR) correspond to positions 52, 167, 169, 204, 205 and 208. Residues 51 and 77 were also assigned to this cluster using the X-ray data. Residue 77 was not detected as making a conserved interaction in our study, owing to the presence of a β-bulge in the 77-80 region. Among the structures of our study, this bulge is present in β-lactoglobulin [PDB: 1EXS], the odorant-binding lipocalin from pig nasal mucosa [PDB: 1A3Y] and the mouse major urinary protein [PDB: 1DF3]. In the NMR study, residues 204 and 208 interact with residues 167 and 169, respectively. In the present work, residues 204 and 167 do not appear to interact, because the orientation of H2 relative to the barrel differs in β-lactoglobulin. Residues 208 and 169 were not detected as involved in a strictly conserved interaction: they interact in 6/10 structures, and two structures have an Arg at position 208 interacting with residue 169 through its hydrophobic region. Likewise, residue 208 (Lys) of RaHBP2 interacts with 169 through its hydrophobic region.

Figure 5: Binding assay of LIR2 with histamine. The binding assay was performed with 40 µl of concentrated 293T cell culture supernatant. The negative control was 10-fold concentrated serum-free medium from untransfected cells. The supernatants were incubated with 100 nM ³H-histamine for 2 hours at 37°C. Protein precipitation with polyethylene glycol 8000 was used to separate bound from free histamine.
It has been suggested that β-lactoglobulin at acidic pH is in a molten globule state, similar to retinol binding protein [26]. Since the residues implicated in interactions in the β-lactoglobulin molten globule correspond well with those conserved in native lipocalin structures, this supports the hypothesis, proposed by Ragona et al. [35], that residues essential to the native structure of lipocalins are also important for folding. Clarke et al. reached a similar conclusion for the immunoglobulin-like proteins, a highly diverse protein family with no conservation of function and little or no sequence identity [16].
Greene et al. studied the evolutionarily conserved residues (ECRs) in 32 lipocalins [19]. Many of these residues are hydrophobic and equivalent to those highlighted in this work (positions 34, 39, 42, 48, 49, 52, 54, 129, 131, 158, 161, 162, 163, 167, 189, 191, 192, 200, 203 and 205 in our alignment). However, Greene et al. found no conserved residues in βB, βC or βD. Even though fewer interactions are conserved for these strands, both our study and that of Ragona et al. [20] suggest that some residues of these strands also participate in the hydrophobic internal cluster, closing the belt. This discrepancy could arise because our alignment is based on the structures rather than on the sequences alone. Some other residues, such as 161 and 163 in the βF-G loop (SCR2), are described as ECRs but are not found to be conserved in this study, because outlier lipocalins were included in our alignment. Since residues 133 and 190 are implicated in conserved interactions in 11/11 structures, it is surprising that they are not among the ECRs; again, this could be due to the way the lipocalins were aligned.

RaHBP2
RaHBP2 [PDB: 1QFT] was added to our alignment. It is an outlier lipocalin with low similarity to the other lipocalins. It lies apart from the family because it does not share the family's conserved regions, binds a hydrophilic ligand and has an H1 in α-conformation. When RaHBP2 is included in the conservation analysis, only four interactions are conserved across the 11 lipocalins: three in the internal cluster and one in the external cluster. The interactions in the barrel link the two sheets (ABCD and EFGH) together. Comparing the interactions conserved in 9/10 lipocalins with those of RaHBP2 shows that the belt is not fully conserved, owing to the hydrophilic nature of two residues (156 and 168) and to the orientation of residue 131, which is involved in histamine binding. The 48-80 interaction is absent: because of the α-conformation of H1, residue 48 moves away from residue 80. Nevertheless, neighboring interactions can compensate for these losses. H1 of RaHBP2 has no central residue equivalent to residue 39 interacting with strands βA, βB, βC, βF and βG, although it still interacts with those strands through different residues.
Thus, even though H1 has a different conformation and there are two hydrophilic binding sites, the hydrophobic internal cluster of RaHBP2 is fairly well conserved. The external cluster of RaHBP2 is also conserved, except for interactions 52-189 and 52-220, which are lost because of the hydrophilic nature of residue 52; however, a careful analysis reveals that residue 52 interacts with residue 189 through its hydrophobic moiety. In RaHBP2 (as in α-crustacyanin [PDB: 1I4U]), a disulfide bond bridges βG to H2.

Modeling of LIR2
To determine the homology and build a 3D model of LIR2, a protein with only 15% identity to RaHBP2 [PDB: 1QFT], information was combined from the analysis of the structural core of lipocalins, the positions of the cysteines implicated in disulfide bridges in RaHBP2, and the secondary structure. As a first approach, LIR2 and RaHBP2 were aligned with ClustalW. This alignment showed inconsistencies in the Cys bonding pattern, the secondary structures and the conserved hydrophobic residues, and it was corrected for H1 and strands βA, βB and βE using the information obtained from the comparative analysis. Because of the low similarity between the two sequences, the secondary structure prediction of LIR2 and the cysteine bridge conserved among arthropod lipocalins did not provide enough information to obtain an unambiguous alignment. The same holds for a PSI-BLAST alignment (not shown). The information from the conserved interactions made it possible to obtain a coherent alignment. It is important to note that the analysis of the structural core is not intended to outperform PSI-BLAST (or ClustalW), but rather to resolve ambiguities and assess the alignments obtained with those methods.
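One of the consistency checks mentioned above, whether the cysteines that form disulfide bridges in the template are aligned with cysteines in the target, can be sketched as follows. This is a minimal illustration, assuming the pairwise alignment is given as two gapped strings of equal length and the template bridges as pairs of 1-based residue numbers; the helper names and toy sequences are hypothetical, and the sketch is not the procedure used by the authors.

```python
from typing import Dict, List, Optional, Tuple


def _column_to_residue(aligned: str) -> Dict[int, int]:
    """Map alignment column -> 1-based residue number for one gapped sequence."""
    mapping, resnum = {}, 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            resnum += 1
            mapping[col] = resnum
    return mapping


def check_disulfides(
    template_aln: str, target_aln: str, template_bridges: List[Tuple[int, int]]
) -> List[Tuple[Tuple[int, int], Tuple[Optional[int], Optional[int]], bool]]:
    """For each template bridge (i, j), report the aligned target residues and
    whether both of them are Cys (None means the column is a gap in the target)."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    t_cols = _column_to_residue(template_aln)
    q_cols = _column_to_residue(target_aln)
    # Template residue -> target residue, only where both sequences have a residue.
    t2q = {t_cols[c]: q_cols[c] for c in t_cols if c in q_cols}
    target_seq = target_aln.replace("-", "")
    report = []
    for i, j in template_bridges:
        qi, qj = t2q.get(i), t2q.get(j)
        ok = (qi is not None and qj is not None
              and target_seq[qi - 1] == "C" and target_seq[qj - 1] == "C")
        report.append(((i, j), (qi, qj), ok))
    return report


# Toy alignment (not LIR2/RaHBP2): template Cys 2 and Cys 5 both map to target Cys.
print(check_disulfides("AC-DEC", "ACGD-C", [(2, 5)]))  # [((2, 5), (2, 5), True)]
```

A `False` entry flags a region where the alignment may need manual correction, in the spirit of the corrections described for H1 and strands βA, βB and βE.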
The corrected alignment enabled a 3D model of LIR2 to be built. The model shows a potential disulfide bridge, not present in RaHBP2, which supports both the fold assignment and the alignment (Figure 3D). This is further supported by FTIR measurements indicating a secondary structure compatible with the lipocalin fold, and by the conservation in LIR2 of most of the conserved hydrophobic interactions. Despite its homology with RaHBP2, analysis of the LIR2 model does not suggest histamine binding, which was confirmed experimentally. A more detailed study of the cavity should help to identify its natural ligand.
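A potential disulfide bridge in a model can be flagged with a simple geometric test on the cysteine SG atoms. The sketch below reads ATOM/HETATM records using the fixed-column PDB layout and reports Cys pairs whose SG-SG distance is at most a cutoff. The 2.5 Å default is an assumption (an ideal S-S bond is about 2.05 Å), the function names are hypothetical, and the text does not state how the bridge in the LIR2 model was identified.

```python
import math
from itertools import combinations
from typing import List, Tuple

Atom = Tuple[str, int, Tuple[float, float, float]]


def cys_sg_atoms(pdb_text: str) -> List[Atom]:
    """Collect (chain, residue number, xyz) for every Cys SG atom in PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed PDB columns (1-based): atom name 13-16, residue name 18-20,
        # chain 22, residue number 23-26, x/y/z 31-38/39-46/47-54.
        if line[17:20] == "CYS" and line[12:16].strip() == "SG":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


def potential_disulfides(pdb_text: str, max_dist: float = 2.5):
    """Return Cys pairs whose SG-SG distance is <= max_dist (assumed cutoff, in Å).

    Uses math.dist, available from Python 3.8.
    """
    pairs = []
    for (c1, r1, p1), (c2, r2, p2) in combinations(cys_sg_atoms(pdb_text), 2):
        d = math.dist(p1, p2)
        if d <= max_dist:
            pairs.append(((c1, r1), (c2, r2), round(d, 2)))
    return pairs
```

Alternate locations and insertion codes are ignored here for brevity; a fuller check would also consider the CB-SG-SG-CB dihedral.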

Conclusion
Lipocalins form a protein superfamily with a low level of pairwise similarity, which makes homology modeling difficult. This study showed that determining the residues implicated in the hydrophobic core of lipocalins, by analyzing the conserved interactions, made it possible to assess the assignment of a lipocalin-like protein and to improve the "classical" alignment in ambiguous regions. The information obtained should help in modeling other lipocalin-like proteins. The approach could also be applied to other protein families with low pairwise similarity, such as the structurally related fatty acid binding proteins or avidins.

Lipocalin analysis
To study the lipocalin family, a bank of structurally aligned lipocalins with low mutual similarity was assembled. For this purpose, we sought a structure for each of the clades identified in the phylogenetic analysis of Ganfornina et al. [36]. A 3D structure was found in the Protein Data Bank (PDB) for nine of the 13 clades. Since nitrophorin [23] from Rhodnius prolixus and RaHBP2 from Rhipicephalus appendiculatus were not included in the phylogenetic tree owing to their low similarity, they were added to the bank. The structural alignment was generated by the VAST algorithm [37] and includes the odorant binding lipocalin from nasal mucosa of pig

Study of the interactions
The interactions were computed from the PDB files with the PEX software [22]. Beforehand, the PDB files were renumbered so that spatially equivalent residues (i.e. those occupying the same position in the structural alignment) have the same number. Two amino acids are considered to interact when the center-to-center distance between their closest atoms is less than 4.5 Å. Interacting residues must be separated by at least two residues in the sequence. The accessible surface area (ASA) was calculated using the method of Shrake and Rupley [38]. A residue is considered accessible (or inaccessible) to the solvent when its ASA is greater (or less) than 30% of its total surface area.
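The two geometric criteria above can be sketched in a few lines. This is a minimal illustration, not the PEX implementation: residues are given as a hypothetical mapping from sequence residue number to atom coordinates, "separated by at least two residues" is interpreted as |i - j| >= 3 (at least two intervening residues), and the relative ASA uses a per-residue maximum surface supplied by the caller, since the reference values for the "total surface" are not stated.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_contacts(
    residues: Dict[int, Sequence[Coord]],
    cutoff: float = 4.5,
    min_separation: int = 3,
) -> List[Tuple[int, int]]:
    """Residue pairs whose closest atoms are less than `cutoff` Å apart.

    `residues` maps a sequence residue number to its atom coordinates.
    Pairs closer than `min_separation` in sequence are skipped; the default 3
    encodes the assumed reading of "separated by at least two residues".
    """
    contacts = []
    for (i, atoms_i), (j, atoms_j) in combinations(sorted(residues.items()), 2):
        if abs(j - i) < min_separation:
            continue
        closest = min(math.dist(a, b) for a in atoms_i for b in atoms_j)
        if closest < cutoff:
            contacts.append((i, j))
    return contacts


def solvent_class(asa: float, max_asa: float, threshold: float = 0.30) -> str:
    """Classify a residue from its ASA relative to an assumed maximum surface."""
    return "accessible" if asa / max_asa > threshold else "inaccessible"
```

The all-atom pairwise minimum is quadratic in the number of atoms, which is acceptable for single lipocalin-sized structures; larger sets would typically use a spatial grid or k-d tree.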

Modeling of LIR2
The PROF secondary structure prediction was obtained through the PredictProtein server [25,39]. The ClustalW algorithm was used to generate the unrefined alignment between LIR2 and the sequence of 1QFT [40]. The 3D model of LIR2, comprising residues 22 to 196, was generated with Modeller [30] from the refined alignment and then evaluated with Procheck [31].
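Among other stereochemical checks, Procheck examines the backbone φ/ψ dihedral angles of a model (the Ramachandran plot). As an illustration of the underlying geometry only, the sketch below computes a dihedral angle from four atom positions (for φ of residue i: C of residue i-1, then N, CA and C of residue i) using the IUPAC sign convention. It is not Procheck's code, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[k] - b[k] for k in range(3)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(a[k] * b[k] for k in range(3))


def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees (between -180 and 180) for atoms p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement gives 0°, a trans arrangement ±180°, and a counterclockwise quarter turn of the far bond (viewed along the central bond) gives -90°.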

Experimental methods
Sequencing LIR2
mRNAs from the salivary glands of 30 engorged Ixodes ricinus females were extracted using the Micro-FastTrack 2.0 mRNA Isolation Kit (Invitrogen, Carlsbad, USA). The complete cDNA sequence of LIR2 was recovered by RACE-PCR (Gene Racer Kit, Invitrogen) according to the manufacturer's recommended procedure.

Characterizing LIR2
The molecular weight of LIR2 is 24.2 kDa and its isoelectric point is 8.97, as determined by "pepstats". The signal peptide is predicted by "SignalP" to be 20 amino acids long.
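Values such as the molecular weight and isoelectric point reported by pepstats can be approximated from the sequence alone. The sketch below sums approximate average residue masses plus one water molecule, and finds the pI by bisection on a Henderson-Hasselbalch net-charge model. The pKa table is an assumption (one commonly used set; pepstats and other tools use their own tables), so results need not match the reported values exactly, and the function names are hypothetical.

```python
# Approximate average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153

# Assumed pKa values; other tools use different sets.
PKA_POSITIVE = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight in Da: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge; every Cys is treated as a free thiol."""
    seq = seq.upper()
    pos = 1 / (1 + 10 ** (ph - PKA_POSITIVE["N_term"]))
    neg = 1 / (1 + 10 ** (PKA_NEGATIVE["C_term"] - ph))
    for aa in "KRH":
        pos += seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
    for aa in "DECY":
        neg += seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH at which net_charge is zero, found by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Whether the signal peptide is included changes both values; with the 20-residue SignalP prediction, the mature chain of a sequence `seq` would be `seq[20:]`.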