Deciphering structure and topology of conserved COG2042 orphan proteins
© Armengaud et al; licensee BioMed Central Ltd. 2005
Received: 13 October 2004
Accepted: 08 February 2005
Published: 08 February 2005
The cluster of orthologous group COG2042 has members in all sequenced Eukaryota as well as in many Archaea. The cellular function of these proteins of ancient origin remains unknown. PSI-BLAST analysis does not indicate a possible link with even remotely-related proteins that have been functionally or structurally characterized. As a prototype among COG2042 orthologs, SSO0551 protein from the hyperthermophilic archaeon Sulfolobus solfataricus was purified to homogeneity for biophysical characterization.
The untagged protein is thermostable and behaves as a monomer in gel filtration experiments. Several mass spectrometry-based strategies were combined to obtain a set of low-resolution structural information. Kinetic data from limited proteolysis with various endoproteases concur in showing that region Glu73-Arg78 is hyper-sensitive, and thus accessible and flexible. Lysine labeling with NHS-biotin and cross-linking with DTSSP revealed that the 35 amino acid RLI motif at the N terminus is solvent exposed. Cross-links between Lys10-Lys14 and Lys23-Lys25 indicate that these residues are spatially close and in a conformation suitable for cross-linking. These experimental data were used to rank multiple three-dimensional models generated by a de novo procedure.
Our data indicate that COG2042 proteins may share a novel fold. Combining biophysical and mass-spectrometry data with molecular models is a useful strategy to obtain structural information and to help in prioritizing targets in structural genomics programs.
Comparative genomic studies of completely sequenced genomes from the three domains of life, i.e. Bacteria, Archaea and Eukaryota, showed that proteins involved in the organization or processing of genetic information (structures of ribosome and chromatin, translation, transcription, replication and DNA repair) display a closer relationship between Archaea and Eukaryota than between Bacteria and Eukaryota [2–4]. To identify new proteins involved in such important cellular mechanisms, an exhaustive inventory of proteins of unknown function common to Eukaryota and Archaea but absent from Bacteria has been compiled [5–7]. Among such proteins, the Cluster of Orthologous Group COG2042 comprises proteins ubiquitously present in Eukaryota and present in many, but not all, Archaea, a hallmark of their ancient origin. The corresponding ancestral protein should have been present in the common ancestor of these two domains of life. Some partial experimental data are available for the Saccharomyces cerevisiae COG2042 homolog. Deletion of the Yor006c gene was shown to result in a viable phenotype, but some apparent moderate growth defects were noticed on a fermentable carbon source [8, 9]. Two putative protein partners for Yor006c were identified through a high-throughput two-hybrid study: Ydl017w, a serine/threonine kinase also known as the cell division control protein 7 (Cdc7), and Yil025c, a hypothetical ORF. However, the cellular function of COG2042 proteins remains unknown.
A polar region, named RLI, is conserved at the N terminus of COG2042 proteins as well as at the N terminus of another cluster of orthologous proteins, namely COG1245. The latter, exemplified by SSO0287 in Sulfolobus solfataricus, are large proteins (about 600 residues) that encompass four different domains: an RLI domain, a [4Fe-4S] ferredoxin domain, and two ATPase domains of the kind usually found in ABC transporters. Their putative function is currently under discussion [12, 13] but could be related to rRNA metabolism. Indeed, four of the eleven proteins shown to interact with the yeast COG1245 homolog (Ydr091c) were identified as involved in rRNA metabolism (Ymr047c, Ydl213c, Ylr340w, Ylr192c). Experimental data on the human homolog of Ydr091c indicated that this protein reversibly associates with RNase L, and thus COG1245 proteins were named RNase L inhibitor.
Because knowledge of protein structure is of high importance for understanding protein function, huge efforts have recently been invested in high-throughput protein structure determination programs. Recent reports indicate that only a relatively small percentage of expressed and purified proteins are amenable to full 3D structure determination by NMR or by crystallography and X-ray diffraction [16, 17]. In silico modeling (homology modeling, fold recognition, ab initio and de novo modeling) is the alternative for quickly obtaining the fold of a protein. However, such an approach sometimes remains ambiguous in reliably identifying correct structures for protein sequences only remotely related to those found in the PDB database. A promising strategy is the use of experimental data (if possible easily obtained) for model discrimination or refinement [18–20]. For example, the tertiary structure of the bovine basic fibroblast growth factor (FGF)-2 was probed with a lysine-specific cross-linking agent and subjected to tryptic peptide mapping by mass spectrometry to identify the sites of cross-linking. The low-resolution interatomic distance information obtained experimentally allowed the authors to distinguish among threading models in spite of a relatively low sequence similarity (13 % of identical residues). Interestingly, the constant development of novel cross-linking reagents suitable for mass spectrometry enables enrichment of cross-linked peptides, which facilitates such a strategy. A chemical modification approach [23–26], in combination with limited proteolysis procedures [27, 28], can also provide useful structural constraints for model refinement.
A step further is to attempt such approaches with proteins having no detectable homologs. In order to gain insight into the topology of COG2042 members and, if possible, to use these experimental data to discriminate among structural protein templates, we combined limited proteolysis, lysine labeling and cross-linking strategies. The protein SSO0551 from the hyperthermophilic archaeon Sulfolobus solfataricus was chosen as a prototype because of its thermostability and the probable absence of post-translational modifications when produced in recombinant form in Escherichia coli. The SSO0551 protein is monomeric with a low molecular mass (19 kDa). This size is easily amenable to characterization by mass spectrometry. Our results reveal that the polar RLI motif at the N terminus is probably structured and solvent exposed, pointing to a common trait between COG2042 and COG1245 proteins, the latter group also being conserved in Eukaryota and Archaea but absent in Bacteria. The accessible and flexible regions defined by limited proteolysis, combined with lysine accessibility assessed by NHS-biotin labeling and DTSSP cross-linking, allowed us to discriminate among the ten top-ranking de novo three-dimensional (3D) models.
COG2042 comprises members exclusively from Eukaryota and Archaea
Expression in E. coli of two engineered SSO0551 constructs
Fingerprint identification of recombinant products from pSBTN-AB31 and pSBTN-AB30 constructs. Columns: [MH]+ observed (in amu), Δmass (in ppm), [MH]+ observed (in amu), Δmass (in ppm), [MH]+ expected (in amu).
Recombinant SSO0551 is structured, thermostable and monomeric
Native molecular mass of SSO0551 was determined by size-exclusion chromatography on a Superdex 200 HR10/30 calibrated column. Pure protein eluted as a peak centered at 39.1 mL in the assay conditions corresponding to an apparent molecular mass lower than 20 kDa. This elution profile indicates that this structured protein behaves as a compact monomer.
Limited proteolysis defines Glu73-Arg78 as a hyper-sensitive region
During the earliest events of trypsin proteolysis, analyzed under various conditions for detection of both large products and smaller peptides, monocharged cations with the following m/z: 8614.6 amu, 10603.4 amu, 6489.8 amu, and 12724.1 amu, were attributed to fragments [1–75] (Δmass: -178 ppm), [76–166] (+89 ppm), [1–56] (+145 ppm), and [57–166] (+200 ppm), respectively (data not shown). These data clearly indicate that Lys75 and Arg56 are two sites of early cleavage by trypsin. Identification of peptides Val32-Lys166 (15478.7 amu, -53 ppm) and Gly35-Lys166 (15191.2 amu, +153 ppm) also indicates that Arg31 and Lys34 could be two other initial nick-sites.
Similar experiments with endoproteinase Arg-C resulted in the observation of two pairs of complementary peptides with m/z of 1920.9 amu ([1–15], +70 ppm) and 17296.8 amu ([16–166], -95 ppm) on one hand, and 8998.1 amu ([1–78], -176 ppm) and 10217.5 amu ([79–166], +332 ppm) on the other hand. These data indicate that Arg78 and Arg15 are the main proteolyzed sites when the Arg-C enzyme was used. Chymotrypsin attacks native SSO0551 mainly at Phe74, because two complementary peptides with m/z of 8487.0 amu ([1–74], -249 ppm) and 10734.2 amu ([75–166], -157 ppm) were clearly detected. Glu73 is the main proteolyzed site when the Glu-C protease was used, as peptides with m/z of 8338.7 amu ([1–73], -118 ppm) and 10880.0 amu ([74–166], +28 ppm) were detected. For all these analyses, smaller peptide fragments that accumulated over time could be attributed to further proteolysis of the products arising from the initial attacks (data not shown). All these results concur in showing that Glu73-Arg78 and Glu28-Arg31 are two accessible, solvent-exposed regions of the protein, as they can be proteolyzed by several endopeptidases, the first being clearly hyper-sensitive. Local unfolding, not just surface exposure, is necessary for efficient in vitro proteolysis because the polypeptide segment being cleaved must form a specific structure with the associated protease. For this reason, the Glu73-Arg78 region should also correspond to a flexible region, i.e. a protruding loop.
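The Δmass values quoted above are relative deviations between observed and theoretical masses. The short Python sketch below illustrates that arithmetic. It assumes the sign convention Δmass = (observed − expected) / expected, which is not stated explicitly in the text, and the function names are illustrative only.

```python
def ppm_error(observed: float, expected: float) -> float:
    """Relative mass deviation in parts per million.

    Assumed convention: positive when the observed mass is above the
    theoretical one.
    """
    return (observed - expected) / expected * 1e6


def expected_from_ppm(observed: float, ppm: float) -> float:
    """Invert ppm_error: theoretical mass implied by an observed mass
    and its reported deviation."""
    return observed / (1.0 + ppm * 1e-6)


# Observed m/z and reported deviation for tryptic fragment [1-75] (from the text).
observed_1_75 = 8614.6
reported_ppm = -178.0

# Theoretical mass implied by these two numbers under the assumed convention.
implied_expected = expected_from_ppm(observed_1_75, reported_ppm)

# Feeding the implied mass back in recovers the reported deviation.
roundtrip_ppm = ppm_error(observed_1_75, implied_expected)
```

With a negative deviation, the implied theoretical mass lies slightly above the observed value, as expected for a fragment measured a little too light.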
Lysine labeling with NHS-biotin and DTSSP cross-linking confirm that the N terminus is rather solvent-exposed
Monoisotopic [M+H]+ peptides generated by various proteases after NHS-biotin labeling of SSO0551.
Peptide: VYIIDYHKDDPKR; lysines: K10 and K14
Peptide: KLVKLK; lysines: K20 and K23
Peptide: LVKLKIAEFTR; lysines: K23 and K25
Our initial objective was to obtain as much low-resolution structural information as possible about SSO0551 in order to discriminate among putative three-dimensional models representing the COG2042 protein structure. However, currently available threading tools applied to SSO0551 failed to detect any structurally related proteins. As an alternative, we obtained ten different ab initio models of SSO0551 using the fully automated ROBETTA server based on ROSETTA procedures. To these ten models, we applied all the low-resolution structural information gathered in this work. For every model, we predicted the location of preferential proteolytic sites using the NickPred software. Models M1, M2 and M6 on one hand, and M9 and M10 on the other, show hypersensitive regions in the RLI motif or C terminus, respectively. These features do not correspond to our experimental data. Only models M4, M7 and M8 predict that the loop Glu73-Arg78 is solvent exposed (data not shown). Among these three models, M4 and M8 respect the ranking of preferential nick-sites for trypsin, chymotrypsin, ArgC and GluC proteases. Solvent accessibility of lysine side chains was evaluated for models M4, M7 and M8 and compared with experimental data (data not shown). All the lysine residues labeled with NHS-biotin are found solvent-exposed in model M8. Manual inspection of cross-linked lysines (Lys10-Lys14 and Lys23-Lys25) revealed that model M4 is not valid because of the opposite orientation of Lys10 and Lys14. Figure 7 (Panels B & C) shows cartoon views of the M8 model, which fulfills all our experimental constraints. For this model, the distances between the two reactive amine groups of the Lys10-Lys14 and Lys23-Lys25 pairs are 12.7 Å and 13.3 Å, respectively. A DALI search for structural homologs using model M8 did not yield significant scores with any known PDB structure. This is consistent with the PSI-BLAST results and may indicate that COG2042 proteins share a novel fold.
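The cross-link test applied to the models amounts to a distance check between lysine side-chain amine nitrogens. The Python sketch below is a minimal illustration of such a filter, not the procedure actually used in this work. The coordinates are invented, and the 15 Å cutoff is an assumed tolerance that would have to be chosen from the spacer length of the cross-linker and the expected side-chain flexibility.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]


def crosslink_distances(nz: Dict[int, Coord], pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """Distance (in Å) between the amine nitrogens of each lysine pair."""
    return {(i, j): math.dist(nz[i], nz[j]) for i, j in pairs}


def model_passes(nz: Dict[int, Coord], pairs: Iterable[Pair], max_span: float) -> bool:
    """True when every cross-linked pair lies within the allowed span."""
    return all(d <= max_span for d in crosslink_distances(nz, pairs).values())


# Hypothetical NZ coordinates (Å), invented so that the two distances
# reproduce the 12.7 Å and 13.3 Å quoted for model M8.
toy_model = {
    10: (0.0, 0.0, 0.0),
    14: (12.7, 0.0, 0.0),
    23: (0.0, 20.0, 0.0),
    25: (0.0, 20.0, 13.3),
}
observed_pairs = [(10, 14), (23, 25)]

MAX_SPAN = 15.0  # assumed cutoff in Å, not a value taken from the paper

distances = crosslink_distances(toy_model, observed_pairs)
accepted = model_passes(toy_model, observed_pairs, MAX_SPAN)
```

A purely geometric cutoff ignores side-chain orientation, which is why a model such as M4 still required manual inspection.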
COG2042 proteins are thus a target of choice for genomic structural studies.
In conclusion, we have presented a strategy that consists of obtaining low-resolution structural information (determination of nick-sites, solvent-exposed residues, and residue-residue distances) that can be used to distinguish among a large set of theoretical molecular models. The lack of remotely related structural templates, or the lack of agreement between experimental data and most theoretical models, indicates that such a family of proteins should become a priority in structural genomics projects.
Chemical and biological reagents
Most chemicals used in this study were obtained from Sigma and were of analytical grade. Oligonucleotide primers were purchased from Genset. N-hydroxysuccinimide-biotin (NHS-biotin) and 3,3'-dithio-bis [sulfosuccinimidyl-propionate] (DTSSP) were obtained from Pierce. Matrices for Matrix-assisted Laser Desorption Ionization-Time of Flight (MALDI-TOF) mass spectrometry and calibration standards were purchased from Bruker Daltonics. Sequencing grade proteolytic enzymes were from Roche Applied Science.
Cloning and overexpression of SSO0551
Two constructs were designed to overexpress the SSO0551 ORF (starting with an ATG codon at nucleotide 484790 on the Crick strand of S. solfataricus P2 genome (NC_002754)) and an N-terminal extended version of SSO0551 (starting with an ATG codon at nucleotide 484916). For both proteins, an N-terminal 6His tag was added to make the purification of the recombinant products easier. For this purpose, synthetic oligonucleotide primers were oAB22 (5'-gctagc ATGAAGCCCAAACCC-3') and oAB49 (5'-gctagc ATGAAGGTATATATTATAGAC-3') that both contain an engineered NheI site, oAC34 (5'-cggatcct acTCATTTTTCAAGTATTTTC-3') and oAE62 (5'-ggatcc tcaTCATTTTTCA AGTATTTTCTC-3') that both contain an engineered BamHI site (restriction sites underlined in the primer sequences and nucleotides not present in the original sequence shown by lower case). Oligonucleotide pairs oAB22/oAC34 and oAB49/oAC34 were used for two distinct PCR amplifications of SSO0551 with S. solfataricus total DNA as template. A 643-bp fragment (N-ter 6His-tag extended version of SSO0551) and a 517-bp fragment (N-ter 6His-tag SSO0551) were obtained, respectively. They were cloned into pCRScript-cam (Stratagene), resulting in plasmids pSBTN-AB36 and pSBTN-AB37, respectively. The two inserts were removed by digestion with NheI and BamHI and ligated with T4 DNA ligase into plasmid pSBTN-AB23 (Armengaud J. & Chaumont V., unpublished data), a derivative of pCR T7/NT-topo (Invitrogen) containing a T7 promoter and 6 His-tag, previously digested with the same endonucleases. The resulting plasmids pSBTN-AB30 and pSBTN-AB31, respectively, were verified by DNA sequencing in order to ascertain the integrity of the nucleotide sequence. Hyperexpression of the recombinant SSO0551 constructs was achieved with E. coli Rosetta(DE3)pLysS strain (Novagen), freshly transformed with the plasmids described above. Cultures were carried out at 30°C as described earlier.
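As a quick sanity check on primer design, the engineered restriction sites can be located programmatically. The Python sketch below searches the four primers for the NheI (GCTAGC) and BamHI (GGATCC) recognition sequences. The primer strings are transcribed from the text with the internal spaces removed, which is an assumption about the original typesetting, and the helper name is illustrative.

```python
# Recognition sequences of the two restriction enzymes used for cloning.
SITES = {"NheI": "GCTAGC", "BamHI": "GGATCC"}

# Primers from the text (5'->3'); lower case marks nucleotides absent from
# the genomic sequence. Internal spaces were removed (assumed to be
# typesetting residue).
PRIMERS = {
    "oAB22": "gctagcATGAAGCCCAAACCC",
    "oAB49": "gctagcATGAAGGTATATATTATAGAC",
    "oAC34": "cggatcctacTCATTTTTCAAGTATTTTC",
    "oAE62": "ggatcctcaTCATTTTTCAAGTATTTTCTC",
}


def find_sites(primer: str) -> dict:
    """Map each enzyme whose site occurs in the primer to the 0-based
    position of the first occurrence (case-insensitive)."""
    seq = primer.upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
```

Each forward primer should report only an NheI site and each reverse primer only a BamHI site; any other result would point to a transcription error in the sequence.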
Purification of recombinant SSO0551 protein
The purification of recombinant SSO0551 was performed from 44 g (wet material) of packed cells. Buffer A consisted of 50 mM K2HPO4/KH2PO4 buffer (pH 7.2) containing 400 mM K-glutamate. The pellet was thawed on ice and resuspended in 120 mL of buffer A. The cells were disrupted by sonication with a total delivered energy of 71 kJ. The cell extract was then centrifuged at 30,000 g for 20 min at 4°C to remove cellular debris and aggregated proteins. The supernatant was subjected to a 20 min heat treatment using a water bath maintained at 70°C, and immediately centrifuged a second time at 30,000 g for 20 min at 4°C. Chromatographic steps were performed at room temperature using an Äkta Purifier FPLC system (Amersham Biosciences). The 135 mL supernatant was applied at a flow rate of 2.8 mL/min onto an XK 26 × 20 column (Amersham Biosciences) containing 50 mL of Chelating Sepharose Fast Flow (Amersham Biosciences), previously loaded with 200 mM NiSO4, washed with milliQ water and equilibrated with Buffer A containing 50 mM imidazole. The fraction collected during the IMAC loading was shown to contain the SSO0551 protein. This 222 mL fraction was concentrated to a volume of 56 mL by means of Centricon Plus-20 filtration units (Millipore) and then dialyzed overnight at 4°C against 20 mM K2HPO4/KH2PO4 buffer (pH 7.2) containing 20 mM NaCl (buffer B). The 78 mL supernatant obtained after centrifugation at 30,000 g for 10 min at 4°C was divided and applied in two separate runs onto a 6 mL Resource-S ion-exchange column (30 mm × 16 mm, 15 μm) from Amersham Biosciences, previously equilibrated with buffer B and operated at a flow rate of 3 mL/min. After a 10 column volume wash with buffer B, proteins were resolved with a 25 column volume linear gradient from 20 to 500 mM NaCl in buffer B. Recombinant SSO0551 was eluted at approximately 250 mM NaCl and desalted by overnight dialysis against Buffer B.
The resulting 20 mL protein solution was concentrated to a volume of 8 mL by means of Centricon Plus-20 filtration units (Millipore). The sample was again divided and applied in two separate runs onto a Superdex 75 gel filtration matrix packed into a HR 16/50 column at a flow rate of 1.5 mL/min in 20 mM K2HPO4/KH2PO4 buffer (pH 7.2) containing 100 mM NaCl. The fractions obtained from the two runs were pooled and dialyzed overnight at 4°C against 10 mM HEPES buffer (pH 7.2). After dialysis, the fraction was centrifuged at 26,000 g for 20 min at 4°C and the protein concentration was measured by spectrophotometry using a molar absorption coefficient of 19060 M⁻¹ cm⁻¹ at 280 nm. The purified protein was flash frozen in liquid nitrogen and stored at -80°C at a concentration of 0.48 mg/mL.
Far- and near-UV circular dichroism spectra were recorded at 20°C between 200 and 300 nm on a J-810 Jasco spectropolarimeter equipped with a PTC-424S Jasco Peltier, using a quartz cuvette of 1 mm path length, with a 20 nm/min scanning speed and a bandwidth of 1 nm. Three spectra of purified SSO0551 at 1.92 μM in 10 mM HEPES buffer (pH 7.2) were averaged and baseline-corrected for the buffer contribution. Experimental data were analyzed using the program K2D described by Andrade et al.
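The spectrophotometric concentration measurement relies on the Beer-Lambert law, A = ε · c · l. The Python sketch below shows the conversion between absorbance, molar concentration and mass concentration. The molar absorption coefficient is the one given above; the molar mass of 19,000 g/mol is a nominal value assumed from the "19 kDa" quoted for SSO0551, and the 1 cm path length is likewise an assumption.

```python
EPSILON_280 = 19060.0   # M^-1 cm^-1, molar absorption coefficient from the text
MOLAR_MASS = 19000.0    # g/mol, nominal value assumed from "19 kDa"


def molar_from_absorbance(a280: float, path_cm: float = 1.0,
                          epsilon: float = EPSILON_280) -> float:
    """Beer-Lambert law solved for concentration (mol/L)."""
    return a280 / (epsilon * path_cm)


def absorbance_from_molar(molar: float, path_cm: float = 1.0,
                          epsilon: float = EPSILON_280) -> float:
    """Beer-Lambert law: A = epsilon * c * l."""
    return epsilon * molar * path_cm


def molar_from_mg_per_ml(mg_per_ml: float, molar_mass: float = MOLAR_MASS) -> float:
    """mg/mL equals g/L, so dividing by g/mol gives mol/L."""
    return mg_per_ml / molar_mass


# Stock concentration quoted in the text.
stock_molar = molar_from_mg_per_ml(0.48)
stock_micromolar = stock_molar * 1e6

# Absorbance such a stock would show in a 1 cm cuvette (assumed path length).
stock_a280 = absorbance_from_molar(stock_molar)
```

Under these assumptions the 0.48 mg/mL stock corresponds to roughly 25 μM, in line with the 25.2 μM sample injected for gel filtration.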
Determination of native molecular mass by gel filtration
The native molecular mass of SSO0551 was estimated by gel filtration chromatography on a Superdex 200 gel packed into a HR10/30 column (Amersham Biosciences) with a final bed volume of 24 mL. The column was equilibrated at room temperature at a flow rate of 0.5 mL/min with 50 mM Tris/HCl buffer, pH 8.3, containing 50 mM NaCl and eluted with the same buffer. Protein standards used to calibrate the column were ribonuclease A (15.8 kDa), chymotrypsinogen A (21.2 kDa), ovalbumin (49.4 kDa), albumin (69.8 kDa), aldolase (191 kDa) and catalase (215 kDa), all from Amersham Biosciences. The exclusion limit was evaluated with dextran blue 2000 (Amersham Biosciences). A sample consisting of 90 μL of SSO0551 at 25.2 μM was injected and the absorbance at 280 and 266 nm was monitored.
Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass measurements were performed using a Biflex IV instrument (Bruker Daltonics) in positive ionization mode. Protein samples and large peptide fragments (>3500 Da) were applied to the target using sinapinic acid, prepared as a saturated solution in 30 % acetonitrile, 70 % milli-Q water and 0.1 % TFA, as matrix. Samples were prepared using the dried droplet method and measured in linear mode. Small peptide samples were measured in reflectron mode using α-cyano-4-hydroxycinnamic acid in 30 % acetonitrile containing 0.1 % trifluoroacetic acid as matrix. Mass spectra were obtained by summation of 100–210 laser shots. The instrument was calibrated for determination of entire protein masses using either a mixture of chymotrypsin and bovine serum albumin, or apomyoglobin and aldolase. For peptides, the instrument was calibrated using a pepmix calibration kit (Bruker Daltonics). When necessary, the mass spectrometer was also internally calibrated using some of the theoretical peptide masses.
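A common way to convert an elution volume into an apparent molecular mass is to fit log10(mass) against elution volume for the standards and interpolate. The Python sketch below illustrates this with an ordinary least-squares line; it is not necessarily the calibration used here. The standard masses are those listed above, whereas the elution volumes are invented placeholders, since the measured values are not reported.

```python
import math
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(standards: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit log10(mass in kDa) against elution volume (mL)."""
    volumes = [v for v, _ in standards]
    log_masses = [math.log10(m) for _, m in standards]
    return fit_line(volumes, log_masses)


def apparent_mass(volume_ml: float, slope: float, intercept: float) -> float:
    """Apparent mass (kDa) for a given elution volume."""
    return 10 ** (slope * volume_ml + intercept)


# (elution volume in mL, mass in kDa). Masses are from the text; the
# volumes are HYPOTHETICAL placeholders for illustration only.
STANDARDS = [
    (17.5, 15.8),   # ribonuclease A
    (17.0, 21.2),   # chymotrypsinogen A
    (15.6, 49.4),   # ovalbumin
    (15.0, 69.8),   # albumin
    (13.3, 191.0),  # aldolase
    (13.1, 215.0),  # catalase
]

slope, intercept = calibrate(STANDARDS)
```

Larger proteins elute earlier, so the fitted slope is negative; a sample eluting later than the smallest standard would be assigned an apparent mass below that standard.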
Limited protease digestion
For in-solution partial digestion, 0.2 nmol of pure SSO0551 was diluted into buffer D1 (20 mM TRIS/HCl, pH 7.8), buffer D2 (20 mM NH4HCO3, pH 7.8) or buffer D3 (20 mM TRIS/HCl, pH 7.8, containing 10 mM CaCl2 and 5 mM DTT). Trypsin or chymotrypsin was added to SSO0551 diluted into buffer D1, whereas Glu-C or Arg-C was added to the protein diluted into buffer D2 or D3, respectively. Several enzyme/protein ratios (1:50 (w/w), 1:20 (w/w) and 1:2 (w/w)) were tested for each endoprotease. The digestions were performed at room temperature and aliquots were analyzed at time points ranging from 30 s to between 10 and 240 min. Digested samples were desalted using ZipTipC18 or ZipTipC4 pipette tips (Millipore) according to the protocol specified by the manufacturer, and their masses were directly evaluated by MALDI-TOF. Where needed, partially proteolyzed mixtures of larger quantities (10 nmol of SSO0551) were fractionated by reverse-phase HPLC using an Aquapore RP-300 column (PerkinElmer; 100 × 1.0 mm, 7 μm, 300 Å pore size) developed at 200 μL/min with a linear gradient from 5 to 90 % acetonitrile in 0.1 % TFA over 45 min. The elution was monitored at 220 nm with an Agilent 1100 Series HPLC system equipped with a G1315 diode array detector. Individual fractions were concentrated by evaporation in a SpeedVac (Savant) and directly analyzed by MALDI-TOF.
Lysine labeling by NHS-biotin
N-hydroxysuccinimide-biotin (NHS-biotin) was used to label ε-amino groups of SSO0551 lysines. After reaction, the biotin label is coupled to the lysine through a stable amide bond. The increase in mass for each label (C10H14N2O2S1) should be 226.293 amu if average mass is considered or 226.078 amu in monoisotopic mode. Modification of lysine residues was carried out by incubating 1.25 nmol of SSO0551 in 20 mM HEPES, pH 7.2, with various amounts of freshly prepared NHS-biotin reagent dissolved in anhydrous dimethylsulfoxide. After 30 min of incubation at room temperature, the excess reagent was removed by a 30 min micro-dialysis against 20 mM HEPES, pH 7.2. Samples were directly desalted using ZipTipC4 (Millipore) prior to MALDI-TOF analysis. Where required, they were digested overnight with an endoprotease (trypsin, GluC or ArgC) and desalted using ZipTipC18 pipette tips (Millipore) prior to mass analysis.
Lysine cross-linking with DTSSP
3,3'-Dithio-bis [sulfosuccinimidyl-propionate] (DTSSP) was used to cross-link two ε-amino groups of SSO0551 lysines. The mass increase (in monoisotopic mode) for each label should be 191.991 amu (C6H8O3S2) or 87.998 amu (C3H4O1S1) when DTT treated. The increase in mass for an intramolecular cross-link between two lysines should be 173.981 amu (C6H6O2S2) or 175.997 amu (2 × C3H4O1S1) when DTT treated. Therefore, after reduction of the disulfide bridge by DTT, an additional increase of 2.016 amu should be measured. The reaction was carried out by incubating 0.25 nmol of SSO0551 in 20 mM NaH2PO4/Na2HPO4, pH 7.5, containing 150 mM NaCl, with various amounts of DTSSP reagent (molar ratios of 20, 35, and 50 mol of DTSSP per mol of polypeptide). After 30 min of incubation at room temperature, the excess reagent was removed by a 30 min micro-dialysis against 20 mM NaH2PO4/Na2HPO4, pH 7.5, containing 150 mM NaCl. Prior to overnight trypsin proteolysis, urea (330 mM final concentration) was added to each sample. Where required, the digested peptide mixture was treated with 50 mM DTT for 30 minutes at 37°C to cleave the disulfide bond of the linker before being desalted using ZipTipC18 pipette tips (Millipore).
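The mass shifts quoted for the DTSSP and NHS-biotin modifications follow directly from their elemental compositions. The Python sketch below recomputes them from standard monoisotopic atomic masses. The small formula parser is an illustrative helper, not the web-interfaced calculator used in this work, and it handles only simple formulas without parentheses.

```python
import re

# Standard monoisotopic masses of the most abundant isotopes (in amu).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C6H8O3S2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


dtssp_label = monoisotopic_mass("C6H8O3S2")               # single label, intact
dtssp_label_reduced = monoisotopic_mass("C3H4O1S1")       # single label, DTT treated
dtssp_crosslink = monoisotopic_mass("C6H6O2S2")           # intramolecular cross-link
dtssp_crosslink_reduced = 2 * monoisotopic_mass("C3H4O1S1")  # cross-link, DTT treated
biotin_label = monoisotopic_mass("C10H14N2O2S1")          # NHS-biotin label

# Reducing the disulfide adds two hydrogen atoms to the cross-linked pair.
reduction_shift = dtssp_crosslink_reduced - dtssp_crosslink
```

The computed values agree with the figures in the text to three decimal places, including the 2.016 amu shift expected upon DTT reduction of a cross-link.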
In silico analysis
Sequence searching was performed using PSI-BLAST with default parameters. Multiple sequence alignments were performed using the VectorNTI software package (Informax Inc). Secondary structure predictions were obtained through the PSIPRED v2.4 web-interfaced facilities described by McGuffin et al. The molar absorption coefficient at 280 nm for SSO0551 was calculated from the amino acid composition of the recombinant protein [40, 41]. Isotopic and average masses of both the DTSSP cross-linker and NHS-biotin were calculated using a web-interfaced molecular weight calculator. The peptide assignment and the first attempt at identifying the labeled products and cross-linking products were performed using the FindMod package at ExPASy. If no match was found, a more detailed search for multiple labels or combinatorial cross-linkable peptide pairs was carried out. Partially proteolyzed products were assigned using the FindPept tool. Tertiary structure predictions were carried out using publicly available online services, including 3D-PSSM, FUGUE and PSIPRED. Ab initio modeling was performed using the ROBETTA server [34, 47]. Each model was analyzed in terms of proteolytic sensitivity using the NICKPRED software [35, 48, 49]. Residue accessibility was calculated using a modified version of Connolly's MS program (Pellequer JL, unpublished results). Structural homologs were searched using the DALI web server from the European Bioinformatics Institute. Model views were obtained with the MOLSCRIPT program and rendered using RASTER3D.
List of abbreviations
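- amu: atomic mass unit
- COG: Cluster of Orthologous Group
- DTSSP: 3,3'-dithio-bis[sulfosuccinimidyl-propionate]
- IPTG: isopropyl β-D-1-thiogalactopyranoside
- HPLC: high performance liquid chromatography
- EDTA: ethylenediaminetetraacetic acid
- HEPES: 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
- HEPPS: 4-(2-hydroxyethyl)-1-piperazinepropanesulfonic acid
- IMAC: immobilized metal ion adsorption chromatography
- MALDI-TOF: Matrix-assisted Laser Desorption/Ionization Time-of-Flight
- NHS-biotin: N-hydroxysuccinimide-biotin
- PSI-BLAST: Position-Specific Iterated Blast
- Tris: tris(hydroxymethyl)aminomethane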
We gratefully acknowledge Yvan Zivanovic (CNRS-IGM, Orsay, France) for the kind gift of S. solfataricus total genomic DNA and Patrick Forterre (Université d'Orsay, Orsay, France) for initial discussions on the interest of characterizing the SSO0551 protein. We thank our enthusiastic technical assistants (CEA-VALRHO): Valérie Chaumont for performing the cloning and overexpression experiments, Charles Marchetti for operating the fermenter facilities, Bernard Fernandez for assistance with chromatography and recording circular dichroism signals, Isabelle Dany for initial fingerprint mass characterization of overproduced SSO0551, and Pascale Richard for technical support.
- Woese CR, Kandler O, Wheelis ML: Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A 1990, 87(12):4576–4579.PubMed CentralView ArticlePubMedGoogle Scholar
- Dennis PP: Ancient ciphers: translation in Archaea. Cell 1997, 89(7):1007–1010. 10.1016/S0092-8674(00)80288-3View ArticlePubMedGoogle Scholar
- Olsen GJ, Woese CR: Archaeal genomics: an overview. Cell 1997, 89(7):991–994. 10.1016/S0092-8674(00)80284-6View ArticlePubMedGoogle Scholar
- Makarova KS, Koonin EV: Comparative genomics of Archaea: how much have we learned in six years, and what's next? Genome Biol 2003, 4(8):115. 10.1186/gb-2003-4-8-115PubMed CentralView ArticlePubMedGoogle Scholar
- Matte-Tailliez O, Zivanovic Y, Forterre P: Mining archaeal proteomes for eukaryotic proteins with novel functions: the PACE case. Trends Genet 2000, 16(12):533–536. 10.1016/S0168-9525(00)02137-5View ArticlePubMedGoogle Scholar
- Armengaud J, Fernandez B, Chaumont V, Rollin-Genetet F, Finet S, Marchetti C, Myllykallio H, Vidaud C, Pellequer JL, Gribaldo S, et al.: Identification, purification, and characterization of an eukaryotic-like phosphopantetheine adenylyltransferase (coenzyme A biosynthetic pathway) in the hyperthermophilic archaeon Pyrococcus abyssi . J Biol Chem 2003, 278(33):31078–31087. 10.1074/jbc.M301891200View ArticlePubMedGoogle Scholar
- Armengaud J, Urbonavicius J, Fernandez B, Chaussinand G, Bujnicki JM, Grosjean H: N2-methylation of guanosine at position 10 in tRNA is catalyzed by a THUMP domain-containing, S-adenosylmethionine-dependent methyltransferase, conserved in Archaea and Eukaryota. J Biol Chem 2004, 279(35):37142–37152. 10.1074/jbc.M403845200View ArticlePubMedGoogle Scholar
- Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, et al.: Functional profiling of the Saccharomyces cerevisiae genome. Nature 2002, 418(6896):387–391. 10.1038/nature00935View ArticlePubMedGoogle Scholar
- Steinmetz LM, Scharfe C, Deutschbauer AM, Mokranjac D, Herman ZS, Jones T, Chu AM, Giaever G, Prokisch H, Oefner PJ, et al.: Systematic screen for human disease genes in yeast. Nat Genet 2002, 31(4):400–404.PubMedGoogle Scholar
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al.: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403(6770):623–627. 10.1038/35001009View ArticlePubMedGoogle Scholar
- She Q, Singh RK, Confalonieri F, Zivanovic Y, Allard G, Awayez MJ, Chan-Weiher CC, Clausen IG, Curtis BA, De Moors A, et al.: The complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc Natl Acad Sci U S A 2001, 98(14):7835–7840. 10.1073/pnas.141222098PubMed CentralView ArticlePubMedGoogle Scholar
- Gabaldon T, Huynen MA: Prediction of protein function and pathways in the genome era. Cell Mol Life Sci 2004, 61(7–8):930–944.View ArticlePubMedGoogle Scholar
- Kerr ID: Sequence analysis of twin ATP binding cassette proteins involved in translational control, antibiotic resistance, and ribonuclease L inhibition. Biochem Biophys Res Commun 2004, 315(1):166–173. 10.1016/j.bbrc.2004.01.044View ArticlePubMedGoogle Scholar
- Bisbal C, Martinand C, Silhol M, Lebleu B, Salehzada T: Cloning and characterization of a RNAse L inhibitor. A new component of the interferon-regulated 2–5A pathway. J Biol Chem 1995, 270(22):13308–13317. 10.1074/jbc.270.22.13308View ArticlePubMedGoogle Scholar
- Marx V: Quest: solve elusive. Genomics & Proteomics 2004, (1):22–28.
- Vincentelli R, Bignon C, Gruez A, Canaan S, Sulzenbacher G, Tegoni M, Campanacci V, Cambillau C: Medium-scale structural genomics: strategies for protein expression and crystallization. Acc Chem Res 2003, 36(3):165–172. 10.1021/ar010130sView ArticlePubMedGoogle Scholar
- Goulding CW, Perry LJ: Protein production in Escherichia coli for structural studies by X-ray crystallography. J Struct Biol 2003, 142(1):133–143. 10.1016/S1047-8477(03)00044-3View ArticlePubMedGoogle Scholar
- Fancy DA: Elucidation of protein-protein interactions using chemical cross-linking or label transfer techniques. Curr Opin Chem Biol 2000, 4(1):28–33. 10.1016/S1367-5931(99)00047-2View ArticlePubMedGoogle Scholar
- Back JW, de Jong L, Muijsers AO, de Koster CG: Chemical cross-linking and mass spectrometry for protein structural modeling. J Mol Biol 2003, 331(2):303–313. 10.1016/S0022-2836(03)00721-6View ArticlePubMedGoogle Scholar
- Sinz A: Chemical cross-linking and mass spectrometry for mapping three-dimensional structures of proteins and protein complexes. J Mass Spectrom 2003, 38(12):1225–1237. 10.1002/jms.559View ArticlePubMedGoogle Scholar
- Young MM, Tang N, Hempel JC, Oshiro CM, Taylor EW, Kuntz ID, Gibson BW, Dollinger G: High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry. Proc Natl Acad Sci U S A 2000, 97(11):5802–5806. 10.1073/pnas.090099097PubMed CentralView ArticlePubMedGoogle Scholar
- Trester-Zedlitz M, Kamada K, Burley SK, Fenyo D, Chait BT, Muir TW: A modular cross-linking approach for exploring protein interactions. J Am Chem Soc 2003, 125(9):2416–2425. 10.1021/ja026917aView ArticlePubMedGoogle Scholar
- Zappacosta F, Ingallinella P, Scaloni A, Pessi A, Bianchi E, Sollazzo M, Tramontano A, Marino G, Pucci P: Surface topology of Minibody by selective chemical modifications and mass spectrometry. Protein Sci 1997, 6(9):1901–1909.
- Leite JF, Cascio M: Probing the topology of the glycine receptor by chemical modification coupled to mass spectrometry. Biochemistry 2002, 41(19):6140–6148. 10.1021/bi015895m
- Back JW, Sanz MA, De Jong L, De Koning LJ, Nijtmans LG, De Koster CG, Grivell LA, Van Der Spek H, Muijsers AO: A structure for the yeast prohibitin complex: Structure prediction and evidence from chemical crosslinking and mass spectrometry. Protein Sci 2002, 11(10):2471–2478. 10.1110/ps.0212602
- Schulz DM, Ihling C, Clore GM, Sinz A: Mapping the topology and determination of a low-resolution three-dimensional structure of the calmodulin-melittin complex by chemical cross-linking and high-resolution FTICRMS: direct demonstration of multiple binding modes. Biochemistry 2004, 43(16):4703–4715. 10.1021/bi036149f
- Zappacosta F, Pessi A, Bianchi E, Venturini S, Sollazzo M, Tramontano A, Marino G, Pucci P: Probing the tertiary structure of proteins by limited proteolysis and mass spectrometry: the case of Minibody. Protein Sci 1996, 5(5):802–813.
- Leite JF, Amoscato AA, Cascio M: Coupled proteolytic and mass spectrometry studies indicate a novel topology for the glycine receptor. J Biol Chem 2000, 275(18):13683–13689. 10.1074/jbc.275.18.13683
- D'Ambrosio C, Talamo F, Vitale RM, Amodeo P, Tell G, Ferrara L, Scaloni A: Probing the dimeric structure of porcine aminoacylase 1 by mass spectrometric and modeling procedures. Biochemistry 2003, 42(15):4430–4443. 10.1021/bi0206715
- Hubbard SJ, Eisenmenger F, Thornton JM: Modeling studies of the change in conformation required for cleavage of limited proteolytic sites. Protein Sci 1994, 3(5):757–768.
- Glocker MO, Borchers C, Fiedler W, Suckau D, Przybylski M: Molecular characterization of surface topology in protein tertiary structures by amino-acylation and mass spectrometric peptide mapping. Bioconjug Chem 1994, 5(6):583–590. 10.1021/bc00030a014
- Bennett KL, Kussmann M, Bjork P, Godzwon M, Mikkelsen M, Sorensen P, Roepstorff P: Chemical cross-linking with thiol-cleavable reagents combined with differential mass spectrometric peptide mapping – a novel approach to assess intermolecular protein contacts. Protein Sci 2000, 9(8):1503–1518.
- Kim DE, Chivian D, Baker D: Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res 2004, 32(Web Server):W526–531.
- Hubbard SJ, Beynon RJ, Thornton JM: Assessment of conformational parameters as predictors of limited proteolytic sites in native protein structures. Protein Eng 1998, 11(5):349–359. 10.1093/protein/11.5.349
- Andrade MA, Chacon P, Merelo JJ, Moran F: Evaluation of secondary structure of proteins from UV circular dichroism spectra using an unsupervised learning neural network. Protein Eng 1993, 6(4):383–390.
- McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics 2000, 16(4):404–405. 10.1093/bioinformatics/16.4.404
- Gill SC, von Hippel PH: Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem 1989, 182(2):319–326. 10.1016/0003-2697(89)90602-7
- Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000, 299(2):499–520. 10.1006/jmbi.2000.3741
- Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310(1):243–257. 10.1006/jmbi.2001.4762
- Hubbard SJ: The structural aspects of limited proteolysis of native proteins. Biochim Biophys Acta 1998, 1382(2):191–206.
- Connolly ML: Solvent-accessible surfaces of proteins and nucleic acids. Science 1983, 221:709–713.
- Kraulis PJ: MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J Appl Cryst 1991, 24:946–950. 10.1107/S0021889891004399
- Merritt EA, Bacon DJ: Raster3D: Photorealistic molecular graphics. Meth Enzymol 1997, 277:505–524.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.