- Open Access
LRRML: a conformational database and an XML description of leucine-rich repeats (LRRs)
- Tiandi Wei†1,
- Jing Gong†1,
- Ferdinand Jamitzky1, 2,
- Wolfgang M Heckl1, 3,
- Robert W Stark1 and
- Shaila C Rössle1
© Wei et al; licensee BioMed Central Ltd. 2008
- Received: 05 June 2008
- Accepted: 05 November 2008
- Published: 05 November 2008
Leucine-rich repeats (LRRs) are present in more than 6000 proteins. They are found in organisms ranging from viruses to eukaryotes and play an important role in protein-ligand interactions. To date, more than one hundred crystal structures of LRR-containing proteins have been determined. This knowledge has increased our ability to use the crystal structures as templates to model LRR proteins with unknown structures. Since the individual three-dimensional LRR structures are not directly available from the established databases and since there are only a few detailed annotations for them, a conformational LRR database useful for homology modeling of LRR proteins is desirable.
We developed LRRML, a conformational database and an extensible markup language (XML) description of LRRs. Release 0.2 contains 1261 individual LRR structures, which were identified from 112 PDB structures and annotated manually. An XML structure was defined to exchange and store the LRRs. LRRML provides a source for homology modeling and structural analysis of LRR proteins. To demonstrate the capabilities of the database, we modeled the mouse Toll-like receptor 3 (TLR3) by multiple-template homology modeling and compared the result with the crystal structure.
LRRML is an information source for investigators involved in both theoretical and applied research on LRR proteins. It is available at http://zeus.krist.geo.uni-muenchen.de/~lrrml.
- Protein Data Bank
- Homology Modeling
- Document Type Definition
- Protein Data Bank Entry
- Protein Data Bank Structure
Leucine-rich repeats (LRRs) are arrays of 20 to 30 amino acid long protein segments that are unusually rich in the hydrophobic amino acid leucine. They are present in more than 6000 proteins in different organisms ranging from viruses to eukaryotes. The structure of the LRRs and their arrangement in repetitive stretches of variable length generate a versatile and highly evolvable framework for the binding of manifold proteins and non-protein ligands. The crystal structure of the ribonuclease inhibitor (RI) yielded the first insight into the three-dimensional molecular basis of LRRs. It has a horseshoe-shaped solenoid structure with a parallel β-sheet lining the inner circumference and α-helices flanking its outer circumference. To date, there are over one hundred crystal structures available. All known LRR domains adopt an arc or horseshoe shape.
The LRR sequences can be divided into a highly conserved segment (HCS) and a variable segment (VS). The highly conserved segment consists of an 11 or 12 residue stretch with the consensus sequence LxxLxLxxN(Cx)xL. Here, the letter L stands for Leu, Ile, Val or Phe forming the hydrophobic core, N stands for Asn, Thr, Ser or Cys, and x is any amino acid. The variable segment is quite diverse in length and consensus sequence; accordingly, eight classes of LRRs have been proposed [4, 5]: 'RI-like (RI)', 'Cysteine-containing (CC)', 'Bacterial (S)', 'SDS22-like (SDS22)', 'Plant-specific (PS)', 'Typical (T)', 'Treponema pallidum (Tp)' and 'CD42b-like (CD42b)'.
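The HCS consensus above can be expressed as a regular expression. The following is a minimal sketch, not the identification procedure used for LRRML (which was manual and also considers conformation). It assumes that "N(Cx)" means either a single N-class residue or a Cys followed by one extra residue, which gives the 11 or 12 residue lengths mentioned in the text. The names `HCS_PATTERN`, `is_hcs` and `find_hcs_candidates`, and the example sequence, are illustrative only.

```python
import re

# L = Leu/Ile/Val/Phe, N = Asn/Thr/Ser/Cys, x = any residue.
# Assumption: "N(Cx)" is read as "N-class residue, or Cys plus one extra residue".
HCS_PATTERN = re.compile(r"[LIVF]..[LIVF].[LIVF]..(?:[NTSC]|C.).[LIVF]")


def is_hcs(segment: str) -> bool:
    """Return True if the whole segment fits the HCS consensus (11 or 12 residues)."""
    return HCS_PATTERN.fullmatch(segment.upper()) is not None


def find_hcs_candidates(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for every start position that fits.

    A lookahead is used so that overlapping candidates are all reported.
    """
    lookahead = re.compile(f"(?=({HCS_PATTERN.pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence for illustration, not a database entry.
    print(find_hcs_candidates("AALRSLDLSNNRLAA"))
```

A sequence match only yields candidates; whether a candidate is a real LRR still depends on its three-dimensional conformation.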
The discrepancy between the numbers of structure-known LRR proteins and the structure-unknown ones triggered studies focusing on the homology modeling of LRR proteins [6–8]. Homology modeling is a computational method, which is widely used to identify structural features defining molecular interactions [8–10]. The modeling results are an important input for the design of biochemical experiments. The first step of homology modeling is the selection of a structure-known protein, which serves as a template for the unknown target structure. In practice, however, it is difficult to find a complete template which has a high enough sequence identity to the target repetitive protein (single template modeling), due to different repeat numbers and varying arrangements. This limitation can be overcome by combining multiple templates. First, the most similar structure-known LRRs are found for each LRR in the target sequence as a local template. Second, all local templates are combined to generate the multiple sequence alignments for the entire target sequence. Thus, it is possible to construct a start model for further investigation, even if no adequate single template is available. Such an approach, however, requires a comprehensive database of LRRs to extract adequate template candidates. So far, the individual three-dimensional LRR structures are not directly available from the established databases and there are only a few detailed annotations for them. Additional information such as sequence insertions and types is missing. In order to consolidate this information and to provide a source for homology modeling and structural analysis of LRR proteins, we developed LRRML, a database and an extensible markup language (XML) description of LRR structures.
An LRR begins at the start of the highly conserved segment (HCS) and ends at the end of the variable segment (VS), just before the HCS of the next LRR.
The HCS of an LRR must adopt a typical conformation, i.e. a short β-sheet begins at about position 3 and a hydrophobic core is formed by the four L residues at positions 1, 4, 6, and 11.
Typical type (T)
Bacterial type (S)
Ribonuclease inhibitor-like type (RI)
SDS22-like type (SDS22)
Cysteine-containing type (CC)
Plant-specific type (PS)
Treponema pallidum type (Tp)
CD42b type (CD42b)
During the LRR identification and classification all sequence insertions longer than 3 residues were annotated. About one tenth of entries have insertions longer than 3 residues while few entries have deletions, which suggests that the evolution of LRRs may prefer insertion to deletion.
Numbers of LRR and PDB entries (release 0.2) in the eight LRR classes.
Coverage of LRR proteins with PDB structures of different databases.
Numbers of LRR proteins with PDB structures
Numbers of identified LRRs
Comparison of LRR numbers of different LRR proteins by different databases.
Each database entry is an individual three-dimensional LRR structure, which was identified with high accuracy.
Extensive annotations, such as systematic classification, secondary structures, HCS/VS partitions and sequence insertion, are provided.
LRRs were extracted from all structure-known LRR protein structures from PDB.
The sequence information (XML tag <l:Sequence>): amino acid sequence and sequence length.
The classification information (XML tag <l:Type>): class name and consensus sequences.
The sequence partitions (XML tag <l:Regions>): amino acid sequence, position, length and insertion of HCS and VS.
The corresponding PDB sources (XML tag <l:Sources>): ID, chain, LRR number and classification of the source PDB entries; serial number, position, DSSP secondary structure and three-dimensional coordinates of the current LRR in these source PDB entries.
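As an illustration of how such an entry might be read programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Only the tag names `<l:Sequence>` and `<l:Type>` come from the description above. The namespace URI, the `<l:LRR>` root element, the flat child layout and the sample values are all hypothetical; the real schema is defined by the LRRML document type definition and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one is defined by the LRRML schema.
NS = {"l": "http://example.org/lrrml"}

# Made-up entry for illustration only; not taken from the database.
SAMPLE_ENTRY = """\
<l:LRR xmlns:l="http://example.org/lrrml">
  <l:Sequence>LRSLDLSNNRL</l:Sequence>
  <l:Type>T</l:Type>
</l:LRR>
"""


def read_entry(xml_text: str) -> dict:
    """Extract sequence, sequence length and class from one (hypothetical) entry."""
    root = ET.fromstring(xml_text)
    sequence = root.find("l:Sequence", NS).text.strip()
    lrr_type = root.find("l:Type", NS).text.strip()
    return {"sequence": sequence, "length": len(sequence), "type": lrr_type}


if __name__ == "__main__":
    print(read_entry(SAMPLE_ENTRY))
```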
The entire database can be browsed by LRR IDs or by PDB IDs. When browsing, the entries appear in a summary table that initially lists ID, type and sequence. Clicking on an ID opens an HTML page, converted from the XML by an XSL transformation (XSLT), that presents the entry in detail. The original XML file and the coordinates file in PDB format can also be downloaded. The XSLT file used is provided as Additional file 2. Aside from the textual view, an LRR structure can be visualized with the online molecular viewer Jmol. After loading, users can change the view settings themselves. LRRML provides several search functions: a PDB ID search, which returns all LRRs contained in that PDB structure; a class search, which returns all LRRs of that class; and a length search, which returns all LRRs with that sequence length. To simplify homology modeling, a similarity search was implemented. It returns the structures of the most similar LRRs for an LRR of unknown structure. The target LRR sequence can be searched against the entire database, a certain LRR class or LRRs with a certain length. First, a global pairwise sequence alignment is generated between the target LRR and each LRR in the user-selected set, and its sequence identity is computed. Then, the most similar LRRs are returned as template candidates, ranked by sequence identity.
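The similarity search described above can be sketched as a global (Needleman-Wunsch) alignment followed by ranking on sequence identity. This is a minimal illustration under stated assumptions, not the LRRML implementation: the scoring scheme (match +1, mismatch -1, linear gap -1) and the definition of identity as identical columns divided by alignment length are choices made here, since the text does not specify them. The function names and candidate IDs are hypothetical.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of one optimal global alignment of a and b.

    Assumption: identity = identical columns / alignment length (gaps included).
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns.
    i, j, same, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * same / columns if columns else 0.0


def rank_templates(target: str, candidates: dict[str, str], top: int = 3) -> list[tuple[str, float]]:
    """Return up to `top` (candidate id, identity %) pairs, most similar first."""
    scored = [(cid, global_identity(target, seq)) for cid, seq in candidates.items()]
    return sorted(scored, key=lambda pair: -pair[1])[:top]


if __name__ == "__main__":
    # Made-up sequences and IDs for illustration only.
    library = {"cand_1": "LKSLDLSNNKL", "cand_2": "LRSLDLSNNRL"}
    print(rank_templates("LRSLDLSNNRL", library))
```

Restricting `candidates` to one LRR class or one sequence length before ranking would correspond to the class- and length-limited searches mentioned in the text.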
The DBMS provides a REST-style application programming interface (API) through HTTP, which supports GET and POST requests. A uniform resource identifier (URI) of the form 'http://zeus.krist.geo.uni-muenchen.de:8081/exist/rest/...' is treated by the server as the path to a database collection. In addition, request parameters can be used to select any required elements. For example, '_query' executes a specified XPath/XQuery expression; the URL "http://zeus.krist.geo.uni-muenchen.de:8081/exist/rest/db/lrrml?_query=//LRR [.//TAbbr='S']" returns all the S type LRRs.
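A client can assemble such a request URL as follows. This is a minimal sketch: the base URL and the `_query` parameter are taken from the text above, while the helper name `build_query_url` and the decision to percent-encode the XPath expression (standard practice for URLs) are assumptions made here. The sketch only builds the URL and does not contact the server, whose availability is not guaranteed.

```python
from urllib.parse import quote, urlencode

# Collection path quoted in the text above.
REST_BASE = "http://zeus.krist.geo.uni-muenchen.de:8081/exist/rest/db/lrrml"


def build_query_url(xpath: str, base: str = REST_BASE) -> str:
    """Return a GET URL that passes an XPath/XQuery expression via '_query'.

    The expression is percent-encoded so that brackets, quotes and slashes
    are safe to transmit inside the query string.
    """
    return f"{base}?{urlencode({'_query': xpath}, quote_via=quote)}"


if __name__ == "__main__":
    # All LRRs whose type abbreviation is 'S' (bacterial type), as in the text.
    print(build_query_url("//LRR[.//TAbbr='S']"))
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen`, and the XML response parsed as usual.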
Application in homology modeling
LRRML was designed as a tool for template selection in homology modeling of LRR proteins. Traditionally, the template used in homology modeling is one or more full length protein structures obtained via similarity search. Nevertheless, due to the different repeat numbers and arrangements of LRRs, the sequence identity between the target and the full length template is usually not high enough for homology modeling. With LRRML the most similar structure-known LRR can be found for each LRR in the target sequence as a local template. The combination of all local templates through multiple alignments helps to achieve a high sequence identity to the target.
Sequence identities (%) of target-template LRR pairs.
A specialised conformational database of leucine-rich repeats, LRRML, has been developed. It is supported by an XML database management system and can be searched and browsed through either an easy-to-use web interface or a REST-like interface. The web interface is suitable for most graphical web browsers and has been tested on the Windows, Mac and Linux operating systems. LRRML contains individual three-dimensional LRR structures with manual structural annotations. It provides a useful source for homology modeling and structural analysis of LRR proteins. Since the number of LRR proteins with determined structures is constantly increasing, we plan to update LRRML every 2 to 3 months.
This database is freely available at http://zeus.krist.geo.uni-muenchen.de/~lrrml.
This work was supported by Graduiertenkolleg 1202 of the Deutsche Forschungsgemeinschaft.
- Matsushima N, Tanaka T, Enkhbayar P, Mikami T, Taga M, Yamada K, Kuroki Y: Comparative sequence analysis of leucine-rich repeats (LRRs) within vertebrate toll-like receptors. BMC Genomics 2007, 8: 124–143.
- Dolan J, Walshe K, Alsbury S, Hokamp K, O'Keeffe S, Okafuji T, Miller SFC, Guy Tear G, Mitchell KJ: The extracellular Leucine-Rich Repeat superfamily; a comparative survey and analysis of evolutionary relationships and expression patterns. BMC Genomics 2007, 8: 320–343.
- Kobe B, Deisenhofer J: Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Nature 1993, 366: 751–756.
- Kobe B, Kajava AV: The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol 2001, 11: 725–732.
- Bell JK, Mullen GE, Leifer CA, Mazzoni A, Davies DR, Segal DM: Leucine-rich repeats and pathogen recognition in Toll-like receptors. Trends Immunol 2003, 24: 528–533.
- Kajava AV: Structural Diversity of Leucine-rich Repeat Proteins. J Mol Biol 1998, 277: 519–527.
- Stumpp MT, Forrer P, Binz HK, Plückthun A: Designing Repeat Proteins: Modular Leucine-rich Repeat Protein Libraries Based on the Mammalian Ribonuclease Inhibitor Family. J Mol Biol 2003, 332: 471–487.
- Kubarenko A, Frank M, Weber AN: Structure-function relationships of Toll-like receptor domains through homology modelling and molecular dynamics. Biochem Soc Trans 2007, 35: 1515–1518.
- Rössle SC, Bisch PM, Lone YC, Abastado JP, Kourilsky P, Bellio M: Mutational analysis and molecular modeling of the binding of Staphylococcus aureus enterotoxin C2 to a murine T cell receptor Vbeta10 chain. Eur J Immunol 2002, 32: 2172–2178.
- Hazai E, Bikádi Z: Homology modeling of breast cancer resistance protein (ABCG2). J Struct Biol 2008, 162: 63–74.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242.
- Matsushima N, Kamiya M, Suzuki N, Tanaka T: Super-Motifs of Leucine-Rich Repeats (LRRs) Proteins. Genome Inform 2000, 11: 343–345.
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut JS, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam protein families database. Nucleic Acids Res 2008, 36: D281–288.
- Mulder NJ, Apweiler R: InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol Biol 2007, 396: 59–70.
- Schultz J, Copley RR, Doerks T, Ponting CP, Bork P: SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res 2000, 28: 231–234.
- Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, Suzek B: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 2006, 34: D187–191.
- Heida N, Hasegawa Y, Mochizuki Y, Hirosawa K, Konagaya A, Toyoda T: TraitMap: an XML-based genetic-map database combining multigenic loci and biomolecular networks. Bioinformatics 2004, 20 Suppl 1: i152-i160.
- Kunz H, Derz C, Tolxdorff T, Bernarding J: XML knowledge database of MRI-derived eye models. Comput Methods Programs Biomed 2004, 73: 203–208.
- Jiang K, Nash C: Application of XML database technology to biological pathway datasets. Conf Proc IEEE Eng Med Biol Soc 2006, 1: 4217–4220.
- eXist-db: an open source database management system [http://exist-db.org]
- The World Wide Web Consortium [http://www.w3.org]
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22: 2577–2637.
- Jmol: an open-source Java viewer for chemical structures in 3D [http://www.jmol.org]
- Fiser A, Do RK, Sali A: Modeling of loops in protein structures. Protein Sci 2000, 9: 1753–1773.
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 1993, 26: 283–291.
- Liu L, Botos I, Wang Y, Leonard JN, Shiloach J, Segal DM, Davies DR: Structural basis of Toll-like receptor 3 signaling with double-stranded RNA. Science 2008, 320: 379–381.
- Maiti R, Van Domselaar GH, Zhang H, Wishart DS: SuperPose: a simple server for sophisticated structural superposition. Nucleic Acids Res 2004, 32: W590–594.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.