After removal of a 19-residue signal peptide, the mature fVIII zymogen circulates in blood as a single-chain protein of 2332 residues (A1-A2-B-A3-C1-C2), of which the B-domain comprises 573 residues. Factor VIII has been experimentally shown to possess eight disulfide bonds. Each of the A-domains contains two disulfide bonds and one free cysteine residue, while the C1 and C2 domains each contain one disulfide bond. The function of fVIII has long been thought to depend critically on the binding of calcium and copper ions to the protein [29, 54–57]. Several biochemical studies proposed that the A1 domain binds one high-affinity calcium ion, while a low-affinity calcium ion might be bound to the other A-domains, though the specific details were not known [58, 59]. In addition, the binding of a Type-I copper ion in each of the A1 and A3 domains and of a Type-II copper ion at the interface of the A1 and A3 domains was proposed on the basis of comparison with the copper-binding networks of several copper-binding proteins as well as experimental mutagenesis studies.
The B-domain of fVIII has no known structural homolog in the protein database. Consequently, modeling of the B-domain is not tractable with current computational methods. The single-chain fVIII protein without the B-domain (741-1668) appeared to retain full fVIII functionality, as investigated by Donath et al. In their study, the A1-A2 domains (Ala1-Arg740) were directly fused with the A3-C1-C2 domains (Ser1669-Tyr2332) to generate a single-chain B-domainless fVIII protein. The fusion site, Arg740-Ser1669, was proposed to be functionally similar to other cleavage sites and was readily cleaved by thrombin and fXa. Thus, exclusion of the B-domain of zymogen fVIII may be considered to have no effect on structure-function studies of the fVIII protein and its activation pathways. The lack of experimental structural data for the missing linker peptides between A1:A2 and A2:A3 poses a major challenge to computational modeling, as the peptide sequences are 40-50 residues long. In this study, we applied a step-wise modeling approach to generate the full models of fVIII and fVIIIa in several stages, as described below.
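The B-domain deletion described above amounts to a renumbering: residues Ala1-Arg740 keep their positions and residues Ser1669-Tyr2332 follow directly after Arg740. The Python sketch below illustrates that mapping. The function name `construct_index` and the use of a 1-based contiguous index are illustrative assumptions and are not part of the published protocol.

```python
# Residue ranges retained in the B-domainless construct (mature fVIII numbering).
A1_A2 = range(1, 741)          # Ala1-Arg740
A3_C1_C2 = range(1669, 2333)   # Ser1669-Tyr2332


def construct_index(resnum):
    """Map a mature-fVIII residue number to a 1-based position in the
    contiguous B-domainless chain; return None for deleted residues."""
    if resnum in A1_A2:
        return resnum
    if resnum in A3_C1_C2:
        # Ser1669 is bonded directly to Arg740, so it takes position 741.
        return len(A1_A2) + (resnum - A3_C1_C2.start) + 1
    return None


total_length = len(A1_A2) + len(A3_C1_C2)
print(construct_index(740), construct_index(1669), total_length)
```

Under this numbering, the fused chain is 1404 residues long and the fusion site Arg740-Ser1669 occupies positions 740 and 741.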
Step1: Refinement of the core structure of fVIII zymogen
The full model of single-chain B-domainless fVIII was generated from the 3.7 Å resolution X-ray structure of the fVIII zymogen (PDB:2R7E). The crystal structure coordinates reported for the B-domainless protein include: Ala1-Asn214; Ala222-Gln334; Phe360-Asp725; Arg1689-Tyr2332. We modeled the missing loop residues between Asn214 and Ala222 using the MODLOOP server (http://modbase.compbio.ucsf.edu/modloop/modloop.html). The initial X-ray crystal structure was processed to include one calcium ion in the A1-domain and one copper (I) ion in each of the A1 and A3 domains. The coordinates of the ions corresponded to those in the crystal structure. The calcium ion in the crystal structure was not well positioned, as it was found to coordinate with only two atoms: one of the side-chain carboxylate oxygen atoms of Asp126 and the backbone oxygen atom of Ala110, both within the A1-domain. The other likely coordinating atoms in the acidic calcium-binding loop were positioned at least 5 Å away from the calcium ion. The copper (I) ions within the A1 and A3 domains were located within coordinating distance of His267/Cys310/Met320/His315 and His1954/Cys2000/His2005/Met2010, respectively. In addition, we introduced one copper (II) ion at the A1-A3 domain interface, with coordinating residues His99/His161 of the A1-domain and His1957 of the A3-domain, consistent with the reported experimental studies as well as structural comparison with other copper-binding proteins [29, 63–66]. The starting model used for MD refinement contained the following residues: A1 (Ala1-Gln334); A2 (Lys376-Thr715); A3-C1-C2 (Arg1689-Tyr2332). The crystal structure, together with the copper and calcium ions, was immersed in a periodic box of water molecules to set up the MD simulation. The model was subjected to an initial refinement of 15 ns of MD simulation in explicit water in order to relax the structure and to improve its overall stereochemistry.
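The observation that the crystallographic calcium ion contacts only two atoms can be expressed as a simple distance screen. The sketch below is illustrative only: the 3.0 Å cutoff, the function name `coordinating_atoms`, the coordinates, and the two generic loop-atom labels are all assumptions and are not taken from the crystal structure.

```python
import math


def coordinating_atoms(ion_xyz, atoms, cutoff=3.0):
    """Return the labels of atoms lying within `cutoff` angstroms of the ion.

    The 3.0 A default is an assumed coordination cutoff, not a value
    reported in the text.
    """
    return [label for label, xyz in atoms if math.dist(ion_xyz, xyz) <= cutoff]


# Hypothetical coordinates (angstroms), chosen only to illustrate the check.
calcium = (0.0, 0.0, 0.0)
candidate_atoms = [
    ("Asp126:OD1", (2.4, 0.0, 0.0)),      # side-chain carboxylate oxygen
    ("Ala110:O", (0.0, 2.5, 0.0)),        # backbone carbonyl oxygen
    ("loop_atom_3", (5.2, 1.0, 0.0)),     # more than 5 A from the ion
    ("loop_atom_4", (0.0, 4.0, 4.0)),     # more than 5 A from the ion
]

ligands = coordinating_atoms(calcium, candidate_atoms)
print(ligands)
```

With these toy coordinates only the Asp126 and Ala110 oxygen atoms pass the cutoff, mirroring the two-atom coordination described for the unrefined calcium site.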
Analysis of the structure obtained after 15 ns of MD simulation showed an RMS deviation of ~7 Å from the starting structure when the backbone atoms of the two structures were superimposed. The resulting structure was subsequently used to model the missing linker peptides connecting the A1/A2 and A2/A3 domains, as described below.
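A backbone RMS deviation after superposition is commonly computed with the Kabsch algorithm. The sketch below assumes NumPy is available and that matched backbone coordinates have already been extracted as N x 3 arrays. The text does not state which software was used for this analysis, so this is one standard way to obtain such a number and not the authors' implementation.

```python
import numpy as np


def kabsch_rmsd(mobile, reference):
    """RMSD (same units as the input) after optimal rigid-body superposition
    of `mobile` onto `reference`; both are N x 3 arrays of matched atoms."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("coordinate arrays must have the same shape")

    # Remove translation by centering both coordinate sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Optimal rotation from the SVD of the covariance matrix.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: a rotated and translated copy should give an RMSD of ~0.
ref = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                [0.0, 1.5, 1.0], [2.0, 0.5, 3.0]])
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ref @ Rz.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(moved, ref))
```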
Step2: Generation of acidic linker structure between A1 and A2 domains
The 46-residue linker peptide between Glu332 and Pro379 is predominantly acidic, with 18 negatively charged residues. The sequence from 341 to 349 (EEAEDY*DDD) has a high density of acidic residues. The residue Tyr346 (Y*) in this sequence is a sulfated tyrosine. Attempts to generate a possible structure for the linker peptide using homology modeling failed because no corresponding template structure exists in the PDB. Consequently, the structure of the peptide was modeled using MD simulations for a total of more than 200 ns. Initially, the two beta-sheets that connect the linker peptide to the core residues of the A1 (Ala322-Glu332) and A2 (Pro379-Ala388) domains were selected from the solvent-equilibrated model (step1), as shown in Figure 7a. Analysis of the structure showed that the distance between the backbone Cα atoms of Glu332 and Pro379 was 30.4 Å. A possible structure for the missing linker peptide between the two connecting residues was generated by starting from a linear conformation of the sequence Glu332-Pro379. To let the structure fold into a stable conformation, it was first subjected to 100 ns of unconstrained MD simulation in explicit water. Then, with a distance constraint of 30 Å imposed between the Cα atoms of Glu332 and Pro379, the structure was simulated for an additional 50 ns. The resulting linker peptide structure was then manually superimposed on the coordinates of the two beta-sheet structures such that the Glu332 and Lys377 residues form peptide bonds with Pro333 and His378, respectively. The conformations of the peptide bonds Glu332-Pro333 and Lys377-His378 were adjusted to acceptable Phi-Psi values. With the residues Ala322-Cys329 of the A1-domain and Thr381-Ala388 of the A2-domain constrained, the resulting model was subjected to 100 ns of MD simulation in explicit water.
These residues were constrained in order to be able to insert, subsequently, the optimized loop structure back into the model of fVIII from step1. The last 30 ns of the stabilized MD trajectories were clustered to generate 10 possible conformations for the linker peptide as shown in Figure 7b. All conformations were superimposed against the MD equilibrated model of fVIII from step1. Since the two beta-sheets of A1 and A2 domains were constrained during the optimization of the linker peptide, we were able to align the coordinates of the beta-sheets with the corresponding residues in the MD equilibrated model (from step1). Out of the ten structures, only two conformations satisfied the spatial constraints such that the peptide could be inserted, with no steric clashes, into the model of fVIII. The rest of the structures did not fit into the fVIII structure as they overlapped with the core residues of A1 and A2 domains. We chose one of the two conformations to fit between the A1 and A2 domains to complete the missing linker peptide.
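The selection of linker conformations that fit without steric clashes can be sketched as a pairwise distance screen against the core atoms. In the Python sketch below, the 2.5 Å clash cutoff, the function names, and the toy coordinates are assumptions made for illustration. The text does not state the criterion that was actually used.

```python
import math


def has_clash(linker_atoms, core_atoms, cutoff=2.5):
    """True if any linker atom lies closer than `cutoff` angstroms to a core atom.

    The 2.5 A default is an assumed clash threshold.
    """
    return any(math.dist(a, b) < cutoff
               for a in linker_atoms for b in core_atoms)


def clash_free(conformations, core_atoms, cutoff=2.5):
    """Return the names of the conformations that do not overlap the core."""
    return [name for name, atoms in conformations.items()
            if not has_clash(atoms, core_atoms, cutoff)]


# Toy coordinates (angstroms); real input would be the heavy atoms of each
# clustered linker conformation and of the A1/A2 core.
core = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
candidates = {
    "conf_1": [(0.5, 0.5, 0.0), (4.0, 4.0, 0.0)],   # overlaps the core
    "conf_2": [(6.0, 0.0, 0.0), (7.0, 0.0, 0.0)],   # clear of the core
}
accepted = clash_free(candidates, core)
print(accepted)
```

The all-pairs loop is adequate for a handful of conformations. A spatial grid or neighbor list would be the usual choice for a full protein.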
Step3: Generation of linker peptide structure between A2 and A3 domains
The structure prediction for the linker peptide between the A2 and A3 domains followed an approach similar to that adopted in step2. MD simulations of the fVIII model from step1 revealed that the C-terminal residues 712 to 725 of the A2-domain were flexible and did not settle into a stable conformation. Consequently, these residues were deleted from the fVIII model and rebuilt as part of the linker peptide between Cys711 and Arg1689. The linker peptide from Cys711 to Arg740 of the A2-domain and Ser1669-Arg1689 forms a contiguous sequence in the B-domainless fVIII model considered in the present study. The Cα-Cα distance between Cys711 (A2-domain) and Ser1690 (A3-domain) in the solvent-equilibrated model of fVIII (from step1) is ~43 Å. To generate a possible structure for the linker peptide using MD simulations, a reduced model of the A2 and A3 domains was extracted from the MD-equilibrated model of fVIII (from step1). The model consisted of the following residues: Val621-Ile639/Ser650-His660/Val678-Asp712 of the A2-domain and Ser1690-Arg1705/Gly1760-Pro1809/Glu1811-Lys1827 of the A3-domain. We chose the reduced model in place of the full structure of fVIII to speed up the MD simulations and to reduce the computational cost. Initially, a random conformation was created for the missing linker peptide, corresponding to the residues Asp712-Arg740 and Ser1669-Ser1690 as a contiguous sequence (Figure 8a). The residues Arg740 (A2-domain) and Ser1669 (A3-domain) were directly bonded to represent the contiguous B-domainless fVIII structure. The tyrosine residues at positions 718, 719 and 723 are post-translationally modified in fVIII. Accordingly, the side-chain hydroxyl group of each of the three residues was modified to carry a sulfate group.
Positional constraints were applied during the simulation such that only the linker peptide region (Asp712-Arg740-Ser1669-Ser1690) moved while the rest of the protein remained constrained. The model was subsequently refined by 100 ns of MD simulation to generate a possible conformation for the linker peptide.
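The partition into a mobile linker and a restrained scaffold follows directly from the residue ranges listed above. The sketch below builds both sets in Python. The helper `residue_set` and the variable names are illustrative assumptions, and how the restraints were passed to the MD engine is not described in the text.

```python
def residue_set(*ranges):
    """Expand inclusive (first, last) residue ranges into a set of numbers."""
    residues = set()
    for first, last in ranges:
        residues.update(range(first, last + 1))
    return residues


# Scaffold residues of the reduced A2/A3 model (mature fVIII numbering).
reduced_model = residue_set(
    (621, 639), (650, 660), (678, 712),        # A2-domain segments
    (1690, 1705), (1760, 1809), (1811, 1827),  # A3-domain segments
)

# Linker built as one contiguous chain: Asp712-Arg740 bonded to Ser1669-Ser1690.
linker = residue_set((712, 740), (1669, 1690))

mobile = linker                       # free to move during the simulation
restrained = reduced_model - linker   # held in place by positional restraints

print(len(mobile), len(restrained))
```

Asp712 and Ser1690 appear in both the scaffold and the linker ranges. In this sketch they are treated as mobile, in line with the linker region given in the text.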
Step4: Full model of B-domainless Factor VIII zymogen
The solvent-equilibrated models of the A1-A2 and A2-A3 linker peptides, derived from steps 2 and 3, were inserted into the equilibrated model from step1 to generate the full structure of fVIII. Since the non-loop regions of the models used for A1-A2 and A2-A3 linker peptide modeling were constrained during the MD simulations, both models superimposed on the solvent-equilibrated model (step1) with no steric clashes. Sequence characterization studies showed that the asparagine (ASN) residues at positions 41, 239, 582, 1810 and 2118 are post-translational glycosylation sites. Accordingly, these sites were modified to include N-linked acetyl glucosamine (N-GlcNAc) carbohydrate attachments using the GLYCAM server (http://glycam.ccrc.uga.edu/). In addition, the tyrosine (TYR) residue at position 346 (within the A1-A2 linker region) was modified to a sulfated tyrosine. The other three tyrosine sulfation sites, at positions 718, 719 and 723 at the C-terminus of the A2-domain, had already been modified in step3. The final model was subsequently refined by 70 ns of MD simulation in explicit water.
Step5: Full model of activated Factor VIII
The domain organization of the fVIII zymogen and that of its activated form, fVIIIa, are similar, except that the activated form does not possess the B-domain. The residues involved in the inter-domain interactions between the A1:A2 and A2:A3 domains appear to be comparable, as suggested by recent site-specific mutagenesis studies [6, 68, 69]. In those studies, a panel of 30 residues that appeared to be important for A1:A2 and A2:A3 interactions was subjected to alanine mutagenesis in both fVIII and fVIIIa. The reported protein decay rates of most of the mutants were comparable in the active and zymogen forms, suggesting that the residues involved in the inter-domain interactions might be identical. Thus, it may be reasonable to model the activated form from the zymogen structure.
In the activated form, the Arg372-Ser373 peptide bond is cleaved during proteolysis by thrombin and/or fXa. To build the activated fVIII model, the solvent-equilibrated fVIII zymogen structure, derived from the 70 ns of MD simulation, was modified by explicitly breaking the Arg372-Ser373 bond and creating a hetero-trimeric complex of the A1 (Ala1-Arg372), A2 (Ser373-Arg740) and A3-C1-C2 (Ser1690-Tyr2332) domains. The full model of fVIIIa thus obtained was refined by a further 75 ns of MD simulation to obtain the solution structure of fVIIIa.
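The split of the single-chain zymogen model into the three fVIIIa chains can be sketched as a lookup on residue number. The chain boundaries below are those given in the text, while the names `CHAINS`, `assign_chain` and `chain_lengths` are illustrative assumptions. Residues Ser1669-Arg1689 are not listed among the three chains, so this sketch leaves them unassigned.

```python
# Chain boundaries of the fVIIIa heterotrimer (mature fVIII numbering, inclusive).
CHAINS = {
    "A1": (1, 372),            # Ala1-Arg372
    "A2": (373, 740),          # Ser373-Arg740
    "A3-C1-C2": (1690, 2332),  # Ser1690-Tyr2332
}


def assign_chain(resnum):
    """Return the fVIIIa chain holding `resnum`, or None if the residue is
    not part of any of the three chains."""
    for name, (first, last) in CHAINS.items():
        if first <= resnum <= last:
            return name
    return None


def chain_lengths():
    """Number of residues in each chain."""
    return {name: last - first + 1 for name, (first, last) in CHAINS.items()}


print(assign_chain(372), assign_chain(373), assign_chain(1689))
print(chain_lengths())
```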