- Research article
- Open Access
Molecular models of NS3 protease variants of the Hepatitis C virus
BMC Structural Biology volume 5, Article number: 1 (2005)
Hepatitis C virus (HCV) currently infects approximately three percent of the world population. In view of the lack of vaccines against HCV, there is an urgent need for an efficient treatment of the disease by an effective antiviral drug. Rational drug design has not been the primary way for discovering major therapeutics. Nevertheless, there are reports of success in the development of inhibitor using a structure-based approach. One of the possible targets for drug development against HCV is the NS3 protease variants. Based on the three-dimensional structure of these variants we expect to identify new NS3 protease inhibitors. In order to speed up the modeling process all NS3 protease variant models were generated in a Beowulf cluster. The potential of the structural bioinformatics for development of new antiviral drugs is discussed.
The atomic coordinates of crystallographic structure 1CU1 and 1DY9 were used as starting model for modeling of the NS3 protease variant structures. The NS3 protease variant structures are composed of six subdomains, which occur in sequence along the polypeptide chain. The protease domain exhibits the dual beta-barrel fold that is common among members of the chymotrypsin serine protease family. The helicase domain contains two structurally related beta-alpha-beta subdomains and a third subdomain of seven helices and three short beta strands. The latter domain is usually referred to as the helicase alpha-helical subdomain. The rmsd value of bond lengths and bond angles, the average G-factor and Verify 3D values are presented for NS3 protease variant structures.
This project increases the certainty that homology modeling is an useful tool in structural biology and that it can be very valuable in annotating genome sequence information and contributing to structural and functional genomics from virus. The structural models will be used to guide future efforts in the structure-based drug design of a new generation of NS3 protease variants inhibitors. All models in the database are publicly accessible via our interactive website, providing us with large amount of structural models for use in protein-ligand docking analysis.
After the development of serological tests for hepatitis A and B viruses in the 1970s it became clear that an additional agent accounted for approximately 90% of transfusion-associated hepatitis (non-A non-B hepatitis, NANBH) .
The novel agent, hence termed hepatitis C virus (HCV), currently infects approximately 3% of the world's population and it was classified within the Flavivirideae family. Diagnostic tests for anti-HCV antibodies developed thereafter proved that HCV was indeed the predominant cause of NANBH . In view of the lack of vaccines against HCV, there is an urgent need for a treatment of the disease by an effective antiviral drug. This necessity has boosted research on the structural biology of HCV with the primary focus being to identify possible targets for pharmaceutical intervention .
Rational drug design has not been the primary way for discovering major therapeutics. However, recent successes in the area give reason to expect that drug discovery projects will increasingly be structure based. One of the possible targets for drug development against HCV is the NS3 protease variants. HCV RNA is translated into a polyprotein that during maturation is cleaved into functional components. One component, nonstructural protein 3 (NS3), is a 631-residue bifunctional enzyme with protease and helicase activities.
The N-terminal portion of the NS3 protein was predicted to contain a serine protease domain as judged from conserved sequence patterns and by homology to Flavi- and Pestiviruses [4–6]. The NS3 serine protease processes the HCV polyprotein by both cis and trans mechanisms. The interative refinement and optimization of drug leads is an effective strategy for generating potent preclinical candidate [7, 8]. Ongoing genome sequencing efforts have led to the identification of hundreds of potential therapeutic targets, many of which represent possible sources of crossover pharmacology. Homology or comparative modeling is a key feature of an integrated drug discovery effort because it allows this genomics information to be utilized early in the development of target ligands or in the engineering of ligand specificity .
Genome sequencing efforts are providing us with complete genetic blueprints for hundreds of organisms, including humans. We are now faced with assigning, understanding and modifying the functions of proteins encoded by these genomes. This task is generally facilitated by 3D structures , which are best determined by experimental methods such as X-ray crystallography and NMR spectroscopy. The theoretical approaches  can be divided into physical and empirical methods. The physical prediction methods are based on interactions between atoms and include molecular dynamics and energy minimization , whereas the empirical methods depend on the protein structures that have been already determined by experiment. They include combinatorial  and comparative modeling [14, 15].
Comparative modeling uses experimentally determined protein structures to predict conformation of other proteins with similar amino acid sequences. For modeling of proteins was used restrained-based modeling implemented in the program MODELLER . The models consist of coordinates for all non-hydrogen atoms in the modeled part of a protein. Models are generated entirely automatically in a four-step procedure : (i) fold assignment, (ii) sequence-structure alignment, (iii) model building, and (iv) model evaluation. This procedure was applied to variants of NS3 protease using Perl-CGI, C and MPI programming.
We modeled the structure of variants of NS3 protease variants available in the National Center for Biotchnology Information (Genbank), using structural bioinformatics tools. Knowledge of the three-dimensional structure variants will undoubtedly aid the design of useful inhibitors that may be used as a drug against hepatitis C virus. In order to speed up the modeling process all NS3 models were generated in a Beowulf cluster (BioComp, S.J. Rio Preto, Brazil). The potential of the structural bioinformatics for development of new antiviral drugs is discussed.
Results and discussion
Primary sequence comparasion
The identity between the sequences of a bifunctional protease structure (PDB access codes:1CU1, 1DY9) [31, 38] (templates) and NS3 protease variants (targets) is shown in Table 1. The secondary structural elements are indicated in the Figure 2 without inhibitor and in the Figure 3 with inhibitor. The sequence from crystallographic structure 1CU1 shows more than 79.1% identity with the sequences of NS3 protease variants, which provide high accuracy for the models (Table 1).
Quality of the models
The atomic coordinates of crystallographic structure 1CU1 solved to resolution of the 2.5 Å were used as starting model for modeling of the NS3 protease variant structures, and the structure of NS3 complexed with an inhibitor (PDB access code: 1DY9) was used to generate homology models for docking studies. Binding of an inhibitor to the active site of an enzyme is typically connected with local and possibly also global structural rearrangement of the enzyme (induced-fit mechanism). Therefore structure-based drug design preferentially relies on the crystal structures of enzyme-inhibitor complexes containing bound inhibitors of similar chemical structures to the compounds being designed. Such complexes offer more detailed and accurate picture of the inhibitor-enzyme interactions and structural complementarity between the inhibitor and the active site. The homology models of the variants of NS3 protease which used the NS3 complexed with an inhibitor are more adequate to docking simulations. The atomic coordinates of all water molecules were removed from the templates.
The analysis of the Ramachandran diagram φ-ψ plots of the 1CU1 structure (template) were used to compare the overall stereochemical quality of the NS3 protease variants structures against template solved by biocrystallography (Table 1). They present over 94.0% of the residues in the most favorable regions. The same analysis for crystallographic structure (1CU1) present 88.9% of residues in the most favorable, 10.5% additional allowed regions, 0.6% generously allowed regions, and 0.0% disallowed regions, which strongly indicates that the molecular models present good overall stereochemical quality.
The NS3 protease variant structures are composed of six subdomains, which occur in sequence along the polypeptide chain (Figure 2 and 3). The protease domain exhibits the dual β-barrel fold that is common among members of the chymotrypsin serine protease family. The helicase domain contains two structurally related β-α-β subdomains and a third subdomain of seven helices and three short β strands. The latter domain is usually referred to as the helicase α-helical subdomain. The 13-residue protease activation domain of NS4A contributes one strand to the N-terminal protease β-barrel and is considered to be the sixth subdomain .
Differences in subdomain structure in the NS3 protease variant molecule and in the structures of the isolated protease and helicase domains were assessed in several ways. Inspection of the molecule revealed that the subdomain folds are similar. Overall preservation of structure is also apparent when the subdomains from the various structures are superposed .
The rmsd value of bond lengths and bond angles, the average G-factor and Verify 3D values are shown in Table 2 for NS3 protease variants structures. The same analysis for crystallographic structure (1CU1) present rmsd values of the 0.013Å bond lengths and 1.67°, the average G-fator values of the 0.14 torsion angles and 0.28 covalent geometry, and Verify 3D values of the 321.53 score total and 1.09S quality.
Database design, access, and interface
A MySQL database based on relational database management system (RDBMS) was developed to archive protein structure identified in infectious agents such as NS3 protease variants from hepatitis C virus. All supporting data related to the 3D structures modeling, such as protein codes, atomic coordinates in PDB format from modeled proteins, fasta sequence, links to others databases and various information about the protein were arranged in the MySQL  database under a master table. The aim this database is to provide access to a collection of annotated models generated by automated homology modelling of NS3 protease variants from hepatitis C virus. All models in the database are publicly accessible via our interactive website (Figure 1) . The database user interface provides user friendly menus, so that all information can be printed in one step from any standard web browser. A small ribbon representation is included to obtain a first impression of the model structure (Figure 2 and 3). Atomic coordinates for the homology models can be downloaded in PDB format and their primary sequence in fasta format. The fields are defined with links to the target sequence, the template structure entries in PDB , structural information and analysis. There are two homology models for each sequence in the database, one obtained using 1CU1 as template and other using 1DY9 as template. The second model is adequade for docking simulation, since it was used as template a structure complexed with an inhibitor (PDB access code: 1DY9).
Large scale protein homology modeling, in which whole sequence databases or whole genomes are used as input into automated modeling algorithms, have been reported by several groups . By utilizing powerful computer systems with multiple processors, these efforts have allowed the creation of large databases of homology models of proteins. This project increases the certainty that homology modeling is an useful tool in structural biology and that it can be very valuable in annotating genome sequence information and contributing to structural and functional genomics from virus, bacteria and other organisms.
Inhibition studies have shown that NS3 is only modestly inactivated by classic serine protease inhibitors such as chloromethylketones or phenylmethyl sylfonylfluoride . The structural models will be used to guide future efforts in the structure based design of a new generation of NS3 protease variants inhibitors. This database is freelly available for all users on the Web, providing us with large amount of structural models for use in protein-ligand docking analysis.
Molecular modeling is usually the method of choice when there is a clear relationship of homology between the sequence of a target protein and at least one known structure. This computational technique is based on the assumption that tertiary structures of two proteins will be similar if their sequences are related, and it is the approach most likely to give accurate results . There are two main approaches to homology modeling: (1) fragment-based comparative modeling [14, 19] and (2) restrained-based modeling . For modeling of NS3 protease variants from hepatitis C virus we used the second approach. Model building of NS3 protease variants was carried out using the program MODELLER . MODELLER is an implementation of an automated approach to comparative modeling by satisfaction of spatial restraints [20–22]. The modeling procedure begins with an alingment of the sequence to be modeled (target) with related known three-dimensional structure (templates). This alignment is usually the input to the program. The output is a three-dimensional model for the target sequence containing all main-chain and sidechain non-hydrogen atoms.
Next, the spatial restraints and CHARMM energy terms enforcing proper stereochemistry  were combined into an objective function. Finally, the model is obtained by optimizing the objective function in Cartesian space. The optimization is carried out by the use of the variable target function method  employing methods of conjugate gradients and molecular dynamics with simulated anneling. Several slightly different models can be calculated by varying the initial structure. A total of 1000 models were generated for each enzyme and the final models were selected based on stereochemical quality. All optimization process was performed on a Beowulf cluster with 16 nodes (BioComp, AMD Athlon XP 2100+).
Analysis of the models
The overall stereochemical quality of the final models for each NS3 protease variants from hepatitis C virus was acessed by the program PROCHECK . The root mean square deviations (rmsd) differences from ideal geometries for bond lengths and bond angles were calculated with X-PLOR [26, 27]. G-factor value is essentially just log-odds score based on the observed distributions of the stereochemical parameters. It was computed for the following properties: torsion angles (the analyses provided the observed distributions of φ-δ, χ1-χ2, χ-1, χ-3, χ-4 and ω values for each of the 20 amino acid types) and covalent geometry (for the main-chain, bond lengths and bond angles) these average values were calculated using PROCHECK . The Verify-3D measures the compatibility of a protein model with its sequence, these values were calculated using 3D profile [28–30].
Houghton M: Hepatitis C viruses. In "Fields Virology". 3rd edition. Edited by: Fields BN, Knipe DM, Howley PM. Philadelphia, New York; 1996:1035–1058.
Kuo G, Choo QL, Alter HJ, Gitnick GL, Redeker AG, Purcell RH, Miyamura T, Dienstag JL, Alter MJ, Stevens CE, Tagtmeyer GE, Bonino F, Colombo M, Lee WS, Kuo C, Berger K, Shister JR, Overby LR, Brandley DW, Houghton M: An assay for circulating antibodies to a major etiologic virus of human non-A, non-B hepatitis. Science 1989, 244(4902):362–364.
Urbani A, de Francesco R, Steinkuhler C: Proteases of the Hepatitis C Virus. In Proteases of Infectious Agents. Edited by: Ben M. Dunn, Academic Press, San Diego; 1999:61–91.
Miller RH, Purcell RH: Hepatitis C Virus Shares Amino Acid Sequence Similarity with Pestiviruses and Flaviviruses as Well as Members of Two Plant Virus Supergroups. PNAS 1990, 87: 2057–2061.
Francki RIB, Fauquet CM, Knudson DL, Brown F: Classification ans nomentclature of viruses: Fifth report of the International Committee on Taxonomy of Viruses. Arch Virol Suppl 1991, 2: 223–233.
Rice CM: Flavivirideae: The viruses and their replication. In "Fields Virology". 3rd edition. Edited by: Fields BN, Knipe DM, Howley PM. Philadelphia, New York; 1996:931–959.
de Azevedo WF, Mueller-Dieckmann HJ, Schulze-Gahmen U, Worland PJ, Sausville E, Kim SH: Structural basis for specificity and potency of a flavonoid inhibitor of human CDK2, a cell cycle kinase. Proc Natl Acad Sci USA 1996, 93: 2735–2740. 10.1073/pnas.93.7.2735
de Azevedo WF, Leclerc S, Meijer L, Havlicek L, Strnad M, Kim SH: Inhibition of cyclin-dependent kinases by purine analogues: crystal structure of human cdk2 complexed with roscovitine. Eur J Biochem 1997, 243: 518–526. 10.1111/j.1432-1033.1997.0518a.x
Veerapandian P: Structure-based drug design. Edited by: Marcel Dekker. New York, INC; 1997.
Brenner SE, Levitt M: Expectations from structural genomics. Proteins Sci 2000, 9: 197–200.
Fasman GD: Prediction of Protein Sturcture and the Principles of Protein Conformation. New York, Plenum Press; 1989.
Brooks CL III, Karplus M, Pettit BM: Proteins: A Theoretical Perspective of Dynamics, Structure and Thermodynamics. John Wiley & Sons, New York; 1988.
Cohen FE, Kuntz ID: Tertiary structure prediction. In Prediction of Protein Structutre and the Principles of Protein Conformation. Edited by: Fasman GD. Plenum Press, New York; 1989:647–705.
Blundell TL, Sibanda BL, Sternberg MJE, Thornton JM: Kownledge-based prediction of protein structures and the design of novel molecules. Nature (London) 1987, 326: 347–352. 10.1038/326347a0
Sali A, Blundell TL: Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated anneling and dynamic programming. J Mol Biol 1990, 212: 403–428.
Sali A, Blundell TL: Comparative Protein Modelling by Satisfaction of Spatial Restraints. J Mol Biol 1993, 234: 779–815. 10.1006/jmbi.1993.1626
Sánchez R, Sali A: ModBase: A database of comparative protein structure models. Bioinformatics 1999, 15: 1060–1061. 10.1093/bioinformatics/15.12.1060
Kroemer RT, Doughty SWW, Robinson AJ, Richards WG: Prediction of the three-dimensional structure of human interleukin-7 by homology modeling. Protein Eng 1996, 9(6):493–498.
Blundell TL, Carney D, Gardner S, Hayes F, Howlin B, Hubbard T, Overington J, Singh DA, Sibanda BL, SutCliffe M: 18th Krebs, Hans Lecture knowledge-based protein modeling and design. Eur J Biochem 1988, 172(3):513–520.
Sali A, Overington JP: Derivation of rules for comparative protein modeling from a database of protein structure alignments. Proteins Sci 1994, 3: 1582–1596.
Sali A, Potterton L, Yuan F, van Vlijmen H, Karplus M: Evaluation of comparative protein modeling by MODELLER. Proteins 1995, 23(3):318–326.
Sali A: Modeling mutations and homologous proteins. Curr Opin Biotechnol 1995, 6(4):437–451. 10.1016/0958-1669(95)80074-3
Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M: CHARMM: A program for macromolecular energy minimization and dynamics calculations. J Comp Chem 1983, 4: 187–217. 10.1002/jcc.540040211
Braun W, Go N: Calculation of protein conformations by proton-proton distance constraints. A new efficient algorithm. J Mol Biol 1985, 186(3):611–626.
Laskowski RA, MacArthurm MW, Smith DK, Jones DT, Hutchinson EG, Morris AL, Naylor D, Moss DS, Thornton JM: PROCHECK v.3.0 – Program to check the stereochemistry quality of protein structures – Operating instructions. 1994.
Schwieters CD, Kuszewski JJ, Tjandra N, Clore GM: The Xplor-NIH NMR Molecular Structure Determination Package. J Magn Res 2003, 160: 66–74. 10.1016/S1090-7807(02)00014-9
Brünger AT: X-PLOR, A System for Crystallography and NMR,. Yale Univ Press, New Haven, CT, Version 3.1; 1992.
Bowie JU, Luthy R, Eisenberg D: A Method to Identify Protein Sequences That Fold into a Known Three-Dimensional Structure. Science 1991, 253: 164–170.
Luthy R, Bowie JU, Eisenberg K: Assessment of protein models with three-dimensional profiles. Nature 1992, 356: 83–85. 10.1038/356083a0
Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637. 10.1002/bip.360221211
Yao N, Reichert P, Taremi SS, Prosise WW, Weber PC: Molecular Views of Viral Polyprotein Processing Revealed by the Crystal Structure of the Hepatitis C Virus Bifunctional Protease-Helicase. Structure (London) 1999, 7: 1353–1363.
DuBois: MySQL,. New Riders, Indianapolis, IN, USA; 2000.
Database of NS3 Protease from Hepatitis C Virus[http://www.biocristalografia.df.ibilce.unesp.br/virus]
Westbrook J, Feng Z, Chen L, Yang H, Berman HM: The Protein Data Bank and structural genomics. Nucleic Acids Res 2003, 31: 489–491. 10.1093/nar/gkg068
Foster MJ: Molecular modeling in structural biology. Micron 2002, 33: 365–384. 10.1016/S0968-4328(01)00035-X
Bouffard P, Bartenschalager R, Ahlborn-Laake L, Mous J, Roberts N, Jacobsen H: An in vitro assay for hepatitis C virus NS3 serine proteinase. Virology 1995, 209: 52–59. 10.1006/viro.1995.1229
Koradi R, Billeter M, Wuthrich K: MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graphics 1996, 14: 51–55. 10.1016/0263-7855(96)00009-4
Di Marco S, Rizzi M, Volpari C, Walsh M, Narjes F, Colarusso S, De Francesco R, Matassa VG, Sollazzo M: Inhibition of the Hepatitis C Virus Ns3/4A Protease the Crystal Structures of Two Protease-Inhibitor Complexes. J Biol Chem 2000, 275: 7152. 10.1074/jbc.275.10.7152
Uchôa HB, Jorge GE, da Silveira NJF, Camera JC Jr, Canduri F, de Azevedo WF Jr: Parmodel: a web server for automated comparative modeling of proteins. Biochem Biophys Res Commun 2004, 325: 1481–1486. 10.1016/j.bbrc.2004.10.192
This work was supported by grants from FAPESP (SMOLBNet 01/07532-0, 02/04383-7, 04/00217-0), CNPq, CAPES and Instituto do Milênio (CNPq-MCT). WFA (CNPq, 300851/98-7) is researcher for the Brazilian Council for Scientific and Technological Development.
NJFS carried out the molecular modeling and participated in the building and design of the database tables, HAA and CEB carried of analysis of the models and created database interface, FPS selected the primary sequences of NS3 protease variants of the hepatitis C virus, IMVGCM participated of the study, PR conceived the study and suggestions in this manuscript, JRRP participated in the study and suggestions in this manuscript, WFA conceived the study, and participated in its analysis and coordination. All authors read and approved the final manuscript.