- Research article
- Open Access
Structural motif screening reveals a novel, conserved carbohydrate-binding surface in the pathogenesis-related protein PR-5d
© Doxey et al; licensee BioMed Central Ltd. 2010
- Received: 25 April 2010
- Accepted: 3 August 2010
- Published: 3 August 2010
Aromatic amino acids play a critical role in protein-glycan interactions. Clusters of surface aromatic residues and their features may therefore be useful in distinguishing glycan-binding sites as well as predicting novel glycan-binding proteins. In this work, a structural bioinformatics approach was used to screen the Protein Data Bank (PDB) for coplanar aromatic motifs similar to those found in known glycan-binding proteins.
The proteins identified in the screen were significantly associated with carbohydrate-related functions according to gene ontology (GO) enrichment analysis, and predicted motifs were found frequently within novel folds and glycan-binding sites not included in the training set. In addition to numerous binding sites predicted in structural genomics proteins of unknown function, one novel prediction was a surface motif (W34/W36/W196) in the tobacco pathogenesis-related protein, PR-5d. Phylogenetic analysis revealed that the surface motif is exclusive to a subfamily of PR-5 proteins from the Solanaceae family of plants, and is completely absent in more distant homologs. To confirm PR-5d's insoluble-polysaccharide binding activity, a cellulose-pulldown assay of tobacco proteins was performed and PR-5d was identified in the cellulose-binding fraction by mass spectrometry.
Based on the combined results, we propose that the putative binding site in PR-5d may be an evolutionary adaptation of Solanaceae plants including potato, tomato, and tobacco, towards defense against cellulose-containing pathogens such as species of the deadly oomycete genus, Phytophthora. More generally, the results demonstrate that coplanar aromatic clusters on protein surfaces are a structural signature of glycan-binding proteins, and can be used to computationally predict novel glycan-binding proteins from 3D structure.
- Gene Ontology
- Linear Discriminant Analysis
- Protein Data Bank
- Aromatic Residue
- Insoluble Cellulose
Carbohydrate-binding proteins (CBPs) are highly diverse in terms of their sequences, structures, binding sites, and evolutionary histories. Sequence-based classifications (e.g., as used in the CAZy database) are an attempt to organize this diversity, and do so by grouping CBPs into evolutionarily related families and subfamilies. Many of these families have a common function and mechanism, while in others functions have diversified. Prediction of novel CBPs with unique binding sites and mechanisms that are unrelated to known cases is a more difficult task, as there is no single sequence profile or pattern that defines a carbohydrate-binding site. Thus, while sequence-based carbohydrate-binding site prediction methods have been shown to be moderately successful, structural information will be key to achieving higher prediction accuracies.
Structure-based algorithms are a promising approach for prediction and analysis of binding sites in proteins from first principles. Just as sequence profiles and patterns can be used to infer function in uncharacterized sequences, the existence of specific structural patterns in incompletely characterized structures may provide clues into their functions [4, 5]. As binding site residues and other functional motifs may be close in 3D space but non-contiguous in the amino acid sequence, structural patterns are inherently better at representing protein functions than primary sequence alone. A number of structure-based approaches have been applied to carbohydrate-binding site prediction, and have achieved reasonable prediction accuracy [6–8]. However, even using structural information, not all carbohydrate-binding sites can be correctly predicted (e.g., false negative rates are roughly 30%). Structure-based prediction of CBPs with novel folds and binding sites has also not been performed and validated experimentally. Given their enormous potential in biotechnological applications, computational prediction of novel CBPs is a worthwhile goal.
It is unlikely that general feature-detection approaches will be able to identify all types of carbohydrate-binding sites. Carbohydrate ligands are diverse in size, geometry and other physicochemical characteristics, and this diversity is mirrored in the features of carbohydrate-binding sites in proteins. A few recent studies have developed more targeted approaches that apply structure-based methods to specific classes of CBPs [10, 11]. At a cost of lower generality, approaches that focus on structural motifs of particular functional classes of CBPs may achieve predictions with better ligand specificities and greater overall accuracies.
A useful structural and functional classification of CBPs is described by Boraston et al. Carbohydrate-binding modules (CBMs) were divided into three main types (type A, B and C) based on their structural and functional characteristics, where members of each class are not necessarily related and do not share a common sequence pattern. Type A CBMs, which bind insoluble carbohydrates, possess a unique structural signature of three surface aromatic residues whose side-chains are arranged in a coplanar orientation to dock to a crystalline carbohydrate surface. In the binding sites of type B (glycan-chain binding) CBMs, there are typically two coplanar aromatic residues which form a "sandwich" or "clamp" around the glycan ligand. Through hydrophobic stacking (CH-π) interactions [12, 13], aromatic sidechains of type A and B CBMs bind to their respective glycan ligands, which are polysaccharides or oligosaccharides. Smaller monosaccharides, however, are the targets of type C CBMs, which do not necessarily possess coplanar aromatic motifs. The use of aromatic motifs as structural signatures of CBPs is consistent with computational and experimental analyses of carbohydrate-binding sites. Malik et al., who scored amino acid propensities in known carbohydrate-binding sites, found that Trp is extremely overrepresented (331%). Mutations of aromatic residues such as Trp have also been shown to significantly decrease carbohydrate-binding activities [14–17]. Ultimately, these studies indicate that Trp and Tyr are highly prevalent in carbohydrate-binding sites, while Phe and His are found less frequently. The abundance of Trp is partially due to it having the largest surface area of all amino acids for potential hydrophobic interactions. The relative abundance of Tyr over Phe is explained by the ability of Tyr to form additional H-bonds and electrostatic interactions because of its hydroxyl group.
Thus, even within aromatic residues, subtle sidechain differences have the potential to affect carbohydrate-recognition.
While binding sites like those found in type C CBMs may be more structurally diverse and thus harder to identify using a structural signature, aromatic motifs found in type A and B CBM binding have the potential to be used as 3D motifs in structural database screening to identify novel carbohydrate-binding sites. In this work, we expand on a previous 3D-motif approach to perform a comprehensive PDB-wide screen for coplanar aromatic surface motifs. The primary goal is to determine whether such motifs are significantly enriched in carbohydrate-related proteins and can be used to identify novel CBPs and binding sites not found in existing CBP families. A novel prediction (pathogenesis-related protein, PR-5d) is then analyzed computationally and tested experimentally.
PR-5d refers to the tobacco pathogenesis-related protein, which is a member of the larger PR-5 family, including the proteins thaumatin and osmotin. Anti-fungal activity has been demonstrated for PR-5d and related PR-5 proteins [19, 20], but the structural basis of this activity is still unclear. Membrane pore-formation has been suggested as one possible anti-fungal mechanism. Carbohydrate-binding and hydrolytic functions have also been observed for a number of PR-5 proteins (e.g., β-1,3-glucan interactions in thaumatin-like proteins [21, 22]). Carbohydrate interactions are consistent with structural modelling studies of PR-5 proteins, which have demonstrated that PR-5 proteins contain highly acidic clefts suitable for carbohydrate hydrolytic function [22, 23]. This suggests that the mechanism may involve interactions between PR-5 proteins and pathogen cell wall carbohydrates. The PR-5d surface motif predicted in this study is therefore of particular interest because pathogen-specific insoluble-carbohydrate binding may represent a previously unknown mechanism by which PR-5d acts on specific pathogens of tobacco and related species. Furthermore, the results of this work highlight a critical surface region on which subtle mutations may underlie important functional novelties in the PR-5 family. In a broader sense, this study highlights the potential of structural motif screening approaches to predict novel functions using large-scale structural data.
Linear discriminant analysis of coplanar aromatic surface motifs
The coefficients of linear discriminants, which provide an indication of the contribution and relationship of each variable to the discrimination, are shown in Figure 2. The sign of each coefficient is as expected: the distance parameter is negative, indicating a preference for aromatic pairs in closer proximity; solvent accessibility (asaavg) is positive, indicating a preference for greater surface exposure; and theta is negative, indicating that lower angles (higher coplanarity) are preferred. The residue-type parameter resscore was the strongest discriminating variable between aromatic motifs found in glycan-binding sites and random aromatic surface motifs.
PDB screening reveals a significant association between aromatic surface motifs and carbohydrate related functions
After parameter fitting, the algorithm was applied to 15,970 non-redundant structures from the nrPDB dataset, with an initial 229,919 aromatic pairs from 15,047 different structures that passed the parameter thresholds. These were screened for pairs that received a raw LDA score greater than the 99th percentile score (~3.40) of all sites analyzed in the initial training set. This resulted in 1,304 high-scoring aromatic pairs from a total of 994 unique structures.
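The percentile-based filtering step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tuple layout `(pdb_id, residue_a, residue_b, score)`, the function names, and the linear-interpolation percentile are all hypothetical choices, since the text does not say how the 99th percentile was computed.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def screen(candidates, training_scores, q=99.0):
    """Keep candidate pairs scoring above the q-th percentile of training scores.

    candidates: iterable of (pdb_id, residue_a, residue_b, score) tuples
                (a hypothetical layout chosen for this sketch).
    Returns the cutoff, the retained pairs, and the set of structures hit.
    """
    cutoff = percentile(training_scores, q)
    hits = [c for c in candidates if c[3] > cutoff]
    structures = {c[0] for c in hits}
    return cutoff, hits, structures


if __name__ == "__main__":
    training = list(range(101))  # toy scores, not data from the study
    pairs = [("1abc", "W10", "W20", 99.5), ("2xyz", "Y5", "W9", 98.0)]
    print(screen(pairs, training))
```

Counting retained pairs and unique structures separately mirrors how the text reports both numbers (pairs and structures), because one structure can contribute several high-scoring pairs.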
Table 1: Enriched GO functions in structures identified by motif screening. Columns: GO term, # in screen, # in dataset.
- hydrolase activity, hydrolyzing O-glycosyl compounds
- carbohydrate metabolic process
- chitin catabolic process
- cellulose catabolic process
- glycerol ether metabolic process
- polysaccharide catabolic process
- siderophore-iron transmembrane transporter activity
Some enriched terms are not directly related to carbohydrate binding (e.g., 'cation binding'), but they are commonly linked with carbohydrate-related enzymes and thus exhibit significant enrichment in the dataset. For example, 1LWJ (T. maritima 4-alpha-glucanotransferase/acarbose complex) is tagged with GO:0005975 (carbohydrate metabolic process) as well as GO:0003824 (catalytic activity) and GO:0043169 (cation binding). All three of these terms showed significant enrichment (Table 1).
Also shown are 17 predicted aromatic-motif binding sites occurring in structural genomics proteins whose functions are incompletely characterized or unknown (Figure 3, bottom panel). While carbohydrates/glycans are likely targets of the predicted motifs, they may also be binding sites for other ligands (e.g., some nucleotide-binding sites also consist of an aromatic sandwich motif).
The detected motifs are similar to those found in type A and B CBMs, and are predominantly Trp-rich coplanar strip or sandwich motifs. All examples shown in Figure 3 contain Trp/Trp motifs, as this was the most highly scored residue combination and thus occurs most frequently in the top percentile of scoring. All of the structural genomics proteins shown have different folds from the structures in the training set, with one exception. PDB ID 3hnm (a domain from a putative chitobiase) structurally aligned well with the CBM 22 xylan-binding domain (1dyo) according to VAST (E-value = 0.0453), but has a very low sequence identity (5.2%) to this protein.
In several of these cases, sequence or structural similarities also support the possibility of carbohydrate-binding functions. PDB ID 1u9c is in the same SCOP superfamily ('Class I glutamine amidotransferase-like') as A4 beta-galactosidase middle domain (PDB ID 1kwk). A BLAST search of 2i9i detected similarity to "neuraminyllactose-binding hemagglutinin" proteins (e.g., PDB ID 3bgh). PDB ID 2b06 has a nudix fold also found in GDP-mannose mannosyl hydrolase NudD (e.g., PDB ID 1rya). PDB ID 1pbj has a CBS-domain pair fold, and a similar site in PDB ID 2rif binds AMP. Lastly, 3cbw is a structure of beta-mannanase BsMan26A from Bacillus subtilis.
The motifs shown in Figure 3 were identified using an LDA score threshold equivalent to the 99th percentile score. However, it is important to note that other known and candidate carbohydrate-binding structures had motifs scoring below this threshold. For example, the structural genomics proteins, PDB ID 3e5z (a putative gluconolactonase; predicted residues: W29, Y135) and PDB ID 3dsm (B. uniformis surface layer protein; predicted residues: W117, W162, Y207, W240, Y284, Y303) are likely to interact with carbohydrates and had predicted binding sites scoring highly (>95% score) but below the threshold.
A high-scoring aromatic motif on the surface of pathogenesis-related protein, PR-5d
One of the top predictions was a surface motif in the pathogenesis-related protein PR-5d from tobacco. The putative binding site in PR-5d received a raw LDA score of ~3.80 (99.6th percentile) for the W34/W36 pair, ~3.69 (99.5th percentile) for the W36/W196 pair, and ~3.15 (98.6th percentile) for the W34/W196 pair. Thus, the motifs in PR-5d received extremely high scores, scoring higher than many of the glycan-binding sites from the training set. Compared to all proteins in the nrPDB analyzed in screening, this putative binding site had scores greater than 99.7% of cases.
In addition to PR-5d, only 51 other structures out of 15,970 (~0.3%) were found to contain type A CBM-like triplets with scores greater than or equal to that of PR-5d. Such binding sites composed of three or more coplanar aromatic motifs were identified by grouping together pairs of high-scoring coplanar aromatics that share a common residue.
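One way to read the grouping step is as finding connected components in a graph whose nodes are residues and whose edges are high-scoring pairs. The sketch below takes that reading as an assumption; the exact grouping rule used in the study is not spelled out in the text, and the function name and residue labels are illustrative.

```python
from collections import defaultdict


def group_pairs(pairs, min_size=3):
    """Merge aromatic pairs that share a residue into clusters.

    pairs: iterable of (residue_a, residue_b) labels from one structure.
    Returns a list of residue sets with at least min_size members.
    Connected components are an assumed interpretation of "share a
    common residue", not a rule confirmed by the source.
    """
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting every residue reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return [c for c in clusters if len(c) >= min_size]


if __name__ == "__main__":
    print(group_pairs([("W34", "W36"), ("W36", "W196"), ("F91", "F96")]))
```

With the PR-5d residue labels from the text, the two pairs sharing W36 merge into a single three-residue cluster, while the isolated F91/F96 pair is dropped by the size filter.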
Given previous associations between the hydrolytic cleft and glycan binding [21–23], it is possible that motifs within or near the hydrolytic cleft of PR-5d also contribute to cellulose binding. In addition to W34/W36/W196, a weaker scoring pair was detected near the hydrolytic cleft (F91 and F96). This site received an LDA score of 0.505, placing it in the 86th percentile of all scores. Though this site and other motifs in the hydrolytic cleft are suitable for binding glycan chains, they do not possess the common pattern of type A CBM binding sites, and are much less likely to be capable of binding to a crystalline-cellulose surface.
An insoluble cellulose-pulldown assay of tobacco proteins identifies PR-5d
The structural analysis indicates that PR-5d possesses a significant structural signature of carbohydrate-binding proteins: the W34/W36/W196 motif, which is similar to insoluble-carbohydrate binding motifs found in known type A CBMs. Thus, the binding activity of PR-5d towards insoluble cellulose was tested experimentally using a cellulose pulldown assay of tobacco (Nicotiana tabacum) proteins, followed by mass spectrometry. A similar experiment with insoluble chitin was performed in a previous study, which identified a chitinase CBM but not PR-5d. Tobacco plants were first treated with salt to cause an ethylene-induced stress response that induces PR-5d gene expression (previously shown in Sato et al.) and thereby obtain a larger PR-5d yield than that expected under normal conditions. In the cellulose-pulldown assay, tobacco protein extract was mixed with insoluble cellulose, and the mixture was washed stringently in order to remove non-cellulose-binding proteins and purify only proteins with strong binding activity towards insoluble cellulose. The cellulose-binding fraction was then analyzed with SDS-PAGE, stained with both Coomassie and silver stain, and analyzed by mass spectrometry. The stringent conditions resulted in several faint bands on the Coomassie-stained gel, which were more easily visualized using silver stain (Figure 4B). A band at the expected size (~23 kDa) of PR-5d was present, which was then excised and analyzed by mass spectrometry (MS). MS analysis identified two peptides, which were the top-scoring match to the sequence of the tobacco PR-5d protein (Figure 4B). The identification of PR-5d in the cellulose-binding fraction demonstrates that PR-5d has insoluble-cellulose-binding activity and validates the computational prediction.
Phylogenetic analysis of the PR-5d W34/W36/W196 motif
To further support the functional importance of the cellulose-binding motif, patterns of residue conservation across species were investigated. Close homologs of PR-5d were retrieved via a BLAST search of the NCBI nr protein database, and a multiple alignment and phylogenetic tree were constructed. The predicted aromatic motif residues and corresponding residues from other related PR-5 proteins were then mapped onto the phylogeny in order to trace the origin of the W34/W36/W196 motif in PR-5d.
A proposed function of the Solanaceae-specific PR-5d subfamily
Solanaceae plants are common targets of the deadly plant pathogen, Phytophthora (Greek for "plant-destroyer"). The clade-defining, putative binding site towards insoluble polysaccharides in the plant-defense protein, PR-5d, may be an evolutionary adaptation towards defense against Phytophthora. Unlike fungi, which are commonly studied targets of PR-5 proteins, a distinguishing feature of Phytophthora infestans and other oomycetes is that they possess a cell wall containing insoluble cellulose [30, 31]. Previous studies have shown that PR-5 related proteins such as tobacco osmotin are active against Phytophthora infestans [19, 32]. According to the tree, tobacco osmotin and related osmotin-like proteins are indeed the most closely related sequences to the PR-5d subfamily (several of these proteins are labelled in Figure 5). The occurrence of the W34/W36/W196 motif may therefore represent a functional innovation in response to Phytophthora invasion and infection, providing additional indirect support for the mechanistic importance of this motif.
Through this work, it has been shown that key features of surface aromatic motifs (residue type, distance, solvent-accessibility, and coplanarity) can be used to structurally distinguish known and novel glycan-binding proteins and their binding sites from random protein structures. This provides further support for approaches that use physicochemical and geometric features of protein surfaces to analyze and predict specific protein functions [33–35]. By performing a large-scale screen of the PDB using the 3D pattern, existing carbohydrate-binding sites were correctly predicted as well as novel motifs in structural genomics proteins of unknown function. In addition, aromatic surface motifs were found to be significantly associated with carbohydrate-related functions. A high-scoring prediction (PR-5d) was studied structurally, phylogenetically and experimentally. The combined results suggest that the W34/W36/W196 surface motif in PR-5d may function as an insoluble-polysaccharide binding site that targets polysaccharides such as cellulose in pathogen cell walls. A likely target is the major plant pathogen, Phytophthora, which has a cell wall made of cellulose, and which commonly infects the Solanaceae species in which the PR-5d W34/W36/W196 motif is present.
Structural dataset construction
Type A and B CBMs: Representative structures of six type A CBMs and 12 type B CBMs were retrieved from the Protein Data Bank (PDB), yielding a total of 18 diverse carbohydrate-binding structures. All structures have distinct folds or low pairwise sequence identity (BLASTp E-values greater than 0.01). Type A CBMs included 1cbh (CBM1), 1exg (CBM2), 1nbc (CBM3), 1aiw (CBM5), 1e8r (CBM10), and 2bh0, a recently solved structure of a bacterial expansin with similarity to plant beta-expansins (group 1 grass pollen allergens). This structure also has a binding site and activity analogous to type A CBMs. Type B CBMs included 2xbd (CBM2), 1gui (CBM4), 1uxx (CBM6), 1gny (CBM15), 1j84 (CBM17), 1ac0 (CBM20), 1dyo (CBM22), 1of4 (CBM27), 1uww (CBM28), 1gwk (CBM29), 1uh2 (CBM34), and 1ux7 (CBM36). These structures were selected based on the scheme presented in Boraston et al., a comprehensive review on CBMs.
Nh3d reference dataset: The Nh3d version 3.0 dataset was retrieved for use as the reference dataset. This dataset contains 806 structurally dissimilar protein chains from the PDB, and was built using the hierarchical CATH database classification. Nh3d was chosen because it was carefully constructed to contain structurally dissimilar protein chains without recognizable common ancestry, and so it lends itself to statistical, structural analysis. In addition, none of the proteins in this dataset are closely related (as determined through BLAST searches) to the sequences in the type A and B CBM dataset.
nrPDB: For the purposes of screening, a large non-redundant database of 15,970 PDB structures was generated by retrieving a precomputed list of structures from the PISCES server. The percentage identity cutoff was 90%, the resolution cutoff was 3.0 Å, and all R-factors were allowed. Homologs of the type A and B CBM structures were not included in this dataset.
Structural motif analysis and screening
Aromatic residue pairs were selected within known structures from the type A and B CBM dataset based on previous literature and manual inspection (aromatic pairs listed in Figure 1). For type B CBMs, one pair of aromatic residues was used to describe a single ligand-binding site, and for type A CBMs composed of three aromatic residues, the two pairs with the shortest inter-residue distance were used. As a comparison reference dataset containing random protein structures, the Nh3d dataset was used. All aromatic pairs passing initial parameter thresholds (identical to those later described in Screening) were selected, and assumed to be non-binding sites for algorithm training.
Here, $Z_{ip}$ are the values of the discriminating variables; $B_p$ are the coefficients; and $D_i$ is the score for the $i$th data point (in this case, putative binding site). We used the raw LDA score in subsequent searches for novel binding sites. The following features were used to train the parameters in LDA:
Coplanarity: measured as the angle (0 to 90 degrees) between the normal vectors of both aromatic rings.
Residue type: The score of each residue pair was set to the frequency of the pair plus a pseudocount of 0.5 (except for Phe because Phe was not observed at all). The scores for each residue pair were divided by the sum of total scores, resulting in (Trp/Trp = 0.431; Trp/Tyr = 0.293; Tyr/Tyr = 0.155; Trp/His = 0.052; His/His = 0.017; Any/Phe = 0).
Distance: the Euclidean distance, $\sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}$, between the centroids [$(x_1,y_1,z_1)$ and $(x_2,y_2,z_2)$] of each aromatic ring.
Solvent accessibility: the solvent accessible surface area (ASA) was calculated from a Voronoi tessellation. To account for differences in inherent sidechain solvent accessibility between different aromatic residues, each value was divided by the maximum observed in the background dataset for that residue. The average relative ASA value of the two aromatic sidechains was then used as a final feature.
Thresholding: aromatic pairs with feature values outside the allowed range were removed. The allowed range was simply defined based on the minimum and/or maximum values observed for known binding sites in the training set (6.03 Å ≤ Distance ≤ 21.03 Å, Fractional solvent-accessibility relative to residue type ≥ 0.21, Coplanarity ≤ 83.55 degrees). The ASA cutoff, for instance, removed internal aromatic residues incapable of forming external interactions.
Scoring: The LDA scoring function was used to score all remaining candidate binding sites. The score of any aromatic pair can be compared to the "background" distribution of scores, which reflects its potential for being a glycan- or other type of ligand-binding site.
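The feature, thresholding and scoring steps above can be illustrated with a small sketch. Several things here are assumptions rather than the authors' implementation: ring atoms are passed as (x, y, z) tuples in ring order, the ring normal is estimated by summing cross products of consecutive atom vectors, and the score is taken to have the usual linear form (an intercept plus the sum of coefficient times feature value). The coefficient values themselves are not given in the text, so the caller must supply them. The residue-pair scores and threshold values are the ones quoted above.

```python
import math

# Normalized residue-pair scores as listed in the text. Pairs not listed
# there (e.g. Tyr/His) are deliberately left out rather than guessed.
RESIDUE_PAIR_SCORE = {
    ("TRP", "TRP"): 0.431,
    ("TRP", "TYR"): 0.293,
    ("TYR", "TYR"): 0.155,
    ("HIS", "TRP"): 0.052,
    ("HIS", "HIS"): 0.017,
}


def centroid(ring):
    """Mean position of the ring atoms, each an (x, y, z) tuple."""
    n = len(ring)
    return tuple(sum(atom[k] for atom in ring) / n for k in range(3))


def ring_normal(ring):
    """Unit normal of a roughly planar ring whose atoms are given in ring order."""
    c = centroid(ring)
    rel = [tuple(a[k] - c[k] for k in range(3)) for a in ring]
    nx = ny = nz = 0.0
    for u, v in zip(rel, rel[1:] + rel[:1]):
        nx += u[1] * v[2] - u[2] * v[1]
        ny += u[2] * v[0] - u[0] * v[2]
        nz += u[0] * v[1] - u[1] * v[0]
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def pair_features(ring_a, ring_b):
    """Return (centroid distance, coplanarity angle in degrees, 0 to 90)."""
    distance = math.dist(centroid(ring_a), centroid(ring_b))
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    # The absolute value folds the angle into 0..90 degrees.
    cos_theta = min(1.0, abs(sum(p * q for p, q in zip(na, nb))))
    return distance, math.degrees(math.acos(cos_theta))


def residue_score(res_a, res_b):
    """Residue-type score; any pair involving Phe scores 0, as in the text."""
    if "PHE" in (res_a, res_b):
        return 0.0
    return RESIDUE_PAIR_SCORE[tuple(sorted((res_a, res_b)))]


def passes_thresholds(distance, rel_asa, theta):
    """Apply the allowed ranges quoted in the Thresholding step."""
    return 6.03 <= distance <= 21.03 and rel_asa >= 0.21 and theta <= 83.55


def lda_score(features, coefficients, intercept=0.0):
    """Linear score: intercept + sum of coefficient * feature (assumed form)."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)
```

Solvent accessibility is left as an input (`rel_asa`) because the Voronoi-based ASA calculation is outside the scope of this sketch.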
Gene Ontology (GO) analysis
Since this test was done for all N = 501 GO terms, we used false-discovery rates to correct for multiple statistical tests. The 501 p-values were first ranked in increasing order, and significant p-values were those for which the raw p-value is less than (rank × alpha/N). An FDR alpha value of 0.05 was used. The fold enrichment (fraction of the GO term observed in the screen/fraction of the GO term observed in all structures) is also reported in Table 1.
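The ranking rule above matches the general shape of the Benjamini-Hochberg procedure, which can be sketched as follows. This is a sketch under assumptions: it uses the standard step-up form (every p-value ranked at or below the largest rank that meets its threshold is called significant) and a less-than-or-equal comparison, which may differ in edge cases from the per-value rule as worded in the text. Function and argument names are illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking p-values significant at FDR alpha.

    Standard step-up procedure: find the largest rank k such that
    p_(k) <= k * alpha / N, then call ranks 1..k significant.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / n:
            largest_rank = rank
    significant = [False] * n
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[idx] = True
    return significant


def fold_enrichment(term_in_screen, total_screen, term_in_all, total_all):
    """Fraction of structures with the term in the screen over the fraction overall."""
    return (term_in_screen / total_screen) / (term_in_all / total_all)


if __name__ == "__main__":
    # Toy p-values, not results from the study.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
    print(fold_enrichment(1, 4, 1, 8))
```

Returning flags in the input order keeps each decision aligned with its GO term, which is convenient when the p-values are stored alongside term identifiers.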
Cellulose pulldown assay
Ten grams of root tissues from 3-week old tobacco plants were ground with liquid nitrogen and homogenized with 10 mL of extraction buffer (20 mM HEPES pH 8.0, 0.5 M NaCl, 0.1 mM EDTA pH 8.0, 0.1% Triton-X100). The suspension was sonicated 3 × 30 seconds with 30-second pauses between pulses at 200-300 W. The cell lysate was centrifuged at 10000 × g at 4°C for 10 minutes. A protein concentration of 1 μg/μL was determined using the Bradford assay. 0.5 grams of Avicel® microcrystalline cellulose (~1 mL) that was purified from fibrous plants (FMC Corporation, Newark, DE) was equilibrated with 10 mL of extraction buffer. The whole 10 mL of lysate was then mixed with the equilibrated cellulose and incubated at 4°C for 3 hours. The mixture was applied to a Poly-Prep® chromatography column (Bio-Rad Laboratories, Hercules, CA). Loosely bound proteins were removed by washing with 30 mL (~30 column volumes) of extraction buffer. The cellulose-binding proteins were eluted with 4 × 1 mL of 0.1% SDS, and the eluate was collected as the cellulose-binding fraction.
The cellulose-binding fraction was separated by 12% SDS-PAGE and stained with Coomassie blue G-250 (Bio-Rad Laboratories, Hercules, CA). The strongest bands were excised and digested with trypsin as previously described. The peptides were extracted from gel pieces by vortexing and dried in a SpeedVac (Instruments Inc., Hicksville, NY). The peptides were resuspended in 50% acetonitrile with 0.1% formic acid. Mass spectrometry was performed on an Applied Biosystems Q-TRAP system. Peak lists were generated and processed using Analyst software version 1.4.1 (Applied Biosystems). The protein was identified using PEAKS version 4.5 (Bioinformatics Solutions Inc., Waterloo, ON). The parental and fragment mass error tolerances were 0.2 Da and 0.1 Da, respectively. Fragments were predicted based on digestion with trypsin (one missed cleavage site allowed), with carbamidomethylation and methionine oxidation as fixed and variable modifications, respectively.
As the Coomassie gel produced faint bands, a second SDS-PAGE gel was prepared and silver stained. Silver staining followed the instructions of the PlusOne Protein Silver Staining Kit from GE Healthcare (cat # 17-1150-01).
Phylogenetic and sequence analysis of PR-5d
A BLAST search of tobacco PR-5d (PDB ID 1aun) was used to identify related sequences. All sequences with E < 0.001 that aligned to the query with sequence coverage > 90% were used to build a second alignment using MUSCLE. Conserved regions of the alignment were used to generate a midpoint-rooted neighbor-joining tree using Seaview. A major clade of PR-5d proteins containing the PR-5d subclade was then selected for further analysis.
This work is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through grants to BJM, and a Canada Graduate Scholarship (CGS) to ACD. We also thank Tony Facciuolo for his involvement in related work on PR-5d.
- Boraston AB, Bolam DN, Gilbert HJ, Davies GJ: Carbohydrate-binding modules: fine-tuning polysaccharide recognition. Biochem J 2004, 382: 769–81. 10.1042/BJ20040892PubMed CentralView ArticlePubMedGoogle Scholar
- Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B: The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res 2009, 37: D233–8. 10.1093/nar/gkn663PubMed CentralView ArticlePubMedGoogle Scholar
- Malik A, Ahmad S: Sequence and structural features of carbohydrate binding in proteins and assessment of predictability using a neural network. BMC Struct Biol 2007, 7: 1. 10.1186/1472-6807-7-1PubMed CentralView ArticlePubMedGoogle Scholar
- Jones S, Thornton JM: Searching for functional sites in protein structures. Curr Opin Chem Biol 2004, 8: 3–7. 10.1016/j.cbpa.2003.11.001View ArticlePubMedGoogle Scholar
- Rigden DJ: Understanding the cell in terms of structure and function: insights from structural genomics. Curr Opin Biotechnol 2006, 17: 457–64. 10.1016/j.copbio.2006.07.004View ArticlePubMedGoogle Scholar
- Taroni C, Jones S, Thornton JM: Analysis and prediction of carbohydrate binding sites. Protein Eng 2000, 13: 89–98. 10.1093/protein/13.2.89View ArticlePubMedGoogle Scholar
- Shionyu-Mitsuyama C, Shirai T, Ishida H, Yamane T: An empirical approach for structure-based prediction of carbohydrate-binding sites on proteins. Protein Eng 2003, 16: 467–78. 10.1093/protein/gzg065View ArticlePubMedGoogle Scholar
- Kulharia M, Bridgett SJ, Goody RS, Jackson RM, InCa-SiteFinder: A method for structure-based prediction of inositol and carbohydrate binding sites on proteins. J Mol Graph Model 2009, 28: 297–303. 10.1016/j.jmgm.2009.08.009View ArticlePubMedGoogle Scholar
- Shoseyov O, Shani Z, Levy I: Carbohydrate binding modules: biochemical properties and novel applications. Microbiol Mol Biol Rev 2006, 70: 283–95. 10.1128/MMBR.00028-05PubMed CentralView ArticlePubMedGoogle Scholar
- Doxey AC, Cheng Z, McConkey BJ: Discrimination of insoluble-carbohydrate binding proteins and their binding sites using a 3 D motif detection method. IEEE International Conference on Bioinformatics and Biomedicine 2008, 207–213. full_textGoogle Scholar
- Nassif H, Al-Ali H, Khuri S, Keirouz W: Prediction of protein-glucose binding sites using support vector machines. Proteins 2009, 77: 121–32. 10.1002/prot.22424View ArticlePubMedGoogle Scholar
- Laughrey ZR, Kiehna SE, Riemen AJ, Waters ML: Carbohydrate-pi interactions: what are they worth? J Am Chem Soc 2008, 130: 14625–33. 10.1021/ja803960xPubMed CentralView ArticlePubMedGoogle Scholar
- Vandenbussche S, Díaz D, Fernández-Alonso MC, Pan W, Vincent SP, Cuevas G, Cañada FJ, Jiménez-Barbero J, Bartik K: Aromatic-carbohydrate interactions: an NMR and computational study of model systems. Chemistry 2008, 14: 7570–8. 10.1002/chem.200800247View ArticlePubMedGoogle Scholar
- Bray MR, Johnson PE, Gilkes NR, McIntosh LP, Kilburn DG, Warren RA: Probing the role of tryptophan residues in a cellulose-binding domain by chemical modification. Protein Sci 1996, 5: 2311–2318. 10.1002/pro.5560051117PubMed CentralView ArticlePubMedGoogle Scholar
- McLean BW, Bray MR, Boraston AB, Gilkes NR, Haynes CA, Kilburn DG: Analysis of binding of the family 2a carbohydrate-binding module from Cellulomonas fimi xylanase 10a to cellulose: specificity and identification of functionally important amino acid residues. Protein Eng 2000, 13: 801–809. 10.1093/protein/13.11.801View ArticlePubMedGoogle Scholar
- Uchiyama T, Katouno F, Nikaidou N, Nonaka T, Sugiyama J, Watanabe T: Roles of the exposed aromatic residues in crystalline chitin hydrolysis by chitinase A from Serratia marascens 2170. J Biol Chem 2001, 276: 41343–41349. 10.1074/jbc.M103610200View ArticlePubMedGoogle Scholar
- Lehtio J, Sugiyama J, Gustavsson M, Fransson L, Linder M, Teeri TT: The binding specificity and affinity determinants of family 1 and family 3 cellulose binding modules. Proc Natl Acad Sci USA 2003, 100: 484–489. 10.1073/pnas.212651999
- Koiwa H, Kato H, Nakatsu T, Oda J, Yamada Y, Sato F: Purification and characterization of tobacco pathogenesis-related protein PR-5d, an antifungal thaumatin-like protein. Plant Cell Physiol 1997, 38: 783–91.
- Woloshuk CP, Meulenhoff JS, Sela-Buurlage M, van den Elzen PJ, Cornelissen BJ: Pathogen-induced proteins with inhibitory activity toward Phytophthora infestans. Plant Cell 1991, 3: 619–28. 10.1105/tpc.3.6.619
- Vigers AJ, Wiedema S, Roberts WK, Legrand M, Selitrennikoff CP, Fritig B: Thaumatin-like pathogenesis-related proteins are antifungal. Plant Science 1992, 83: 155–161. 10.1016/0168-9452(92)90074-V
- Grenier J, Potvin C, Trudel J, Asselin A: Some thaumatin-like proteins hydrolyse polymeric beta-1,3-glucans. Plant J 1999, 19: 473–80. 10.1046/j.1365-313X.1999.00551.x
- Osmond RIW, Hrmova M, Fontaine F, Imberty A, Fincher GB: Binding interactions between barley thaumatin-like proteins and (1,3)-beta-D-glucans: Kinetics, specificity, structural analysis and biological implications. FEBS Journal 2001, 268: 4190–4199.
- Koiwa H, Kato H, Nakatsu T, Oda J, Yamada Y, Sato F: Crystal structure of tobacco PR-5d protein at 1.8 Å resolution reveals a conserved acidic cleft structure in antifungal thaumatin-like proteins. J Mol Biol 1999, 286: 1137–45. 10.1006/jmbi.1998.2540
- Thiruv B, Quon G, Saldanha SA, Steipe B: Nh3d: a reference dataset of nonhomologous protein structures. BMC Struct Biol 2005, 5: 12. 10.1186/1472-6807-5-12
- Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R: The GOA database in 2009: an integrated Gene Ontology Annotation resource. Nucleic Acids Res 2009, 37(Database issue): D396–403. 10.1093/nar/gkn803
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B 1995, 57: 289–300.
- Gibrat JF, Madej T, Bryant SH: Surprising similarities in structure comparison. Curr Opin Struct Biol 1996, 6: 377–85. 10.1016/S0959-440X(96)80058-3
- Tailford LE, Ducros VM, Flint JE, Roberts SM, Morland C, Zechel DL, Smith N, Bjørnvad ME, Borchert TV, Wilson KS, Davies GJ, Gilbert HJ: Understanding how diverse β-mannanases recognize heterogeneous substrates. Biochemistry 2009, 48: 7009–18. 10.1021/bi900515d
- Sato F, Kitajima S, Koyama T, Yamada Y: Ethylene-induced gene expression of osmotin-like protein, a neutral isoform of tobacco PR-5, is mediated by the AGCCGCC cis-sequence. Plant Cell Physiol 1996, 37: 249–55.
- Zevenhuizen LPTM, Bartnicki-Garcia S: Structure of the insoluble hyphal wall glucan of Phytophthora cinnamomi. Biochemistry 1969, 8: 1496–1502. 10.1021/bi00832a027
- Latijnhouwers M, de Wit PJ, Govers F: Oomycetes and fungi: similar weaponry to attack plants. Trends Microbiol 2003, 11: 462–9. 10.1016/j.tim.2003.08.002
- Liu D, Raghothama KG, Hasegawa PM, Bressan RA: Osmotin overexpression in potato delays development of disease symptoms. Proc Natl Acad Sci USA 1994, 91: 1888–92. 10.1073/pnas.91.5.1888
- Bagley SC, Altman RB: Characterizing the microenvironment surrounding protein sites. Protein Sci 1995, 4: 622–35. 10.1002/pro.5560040404
- Jones S, Thornton JM: Searching for functional sites in protein structures. Curr Opin Chem Biol 2004, 8: 3–7. 10.1016/j.cbpa.2003.11.001
- Doxey AC, Yaish MW, Griffith M, McConkey BJ: Ordered surface carbons distinguish antifreeze proteins and their ice-binding regions. Nat Biotechnol 2006, 24: 852–5. 10.1038/nbt1224
- Kerff F, Amoroso A, Herman R, Sauvage E, Petrella S, Filée P, Charlier P, Joris B, Tabuchi A, Nikolaidis N, Cosgrove DJ: Crystal structure and activity of Bacillus subtilis YoaJ (EXLX1), a bacterial expansin that promotes root colonization. Proc Natl Acad Sci USA 2008, 105: 16876–81. 10.1073/pnas.0809382105
- Wang G, Dunbrack RL Jr: PISCES: a protein sequence culling server. Bioinformatics 2003, 19: 1589–91. 10.1093/bioinformatics/btg224
- McConkey BJ, Sobolev V, Edelman M: Quantification of protein surfaces, volumes and atom-atom contacts using a constrained Voronoi procedure. Bioinformatics 2002, 18: 1365–1373. 10.1093/bioinformatics/18.10.1365
- Cheng Z, Wei Y-YC, Sung WW, Glick BR, McConkey BJ: Proteomic analysis of the response of the plant growth-promoting bacterium Pseudomonas putida UW4 to nickel stress. Proteome Science 2009, 7: 18. 10.1186/1477-5956-7-18
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32: 1792–7. 10.1093/nar/gkh340
- Gouy M, Guindon S, Gascuel O: SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010, 27: 221–224. 10.1093/molbev/msp259
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.