Crystal structures of the human Dysferlin inner DysF domain

Background Mutations in dysferlin, the first protein linked with the cell membrane repair mechanism, cause a group of muscular dystrophies called dysferlinopathies. Dysferlin is a type II anchored membrane protein, with a single C terminal trans-membrane helix and most of the protein lying in the cytoplasm. Dysferlin contains several C2 domains and two DysF domains, which are nested one inside the other. Many pathogenic point mutations fall in the DysF domain region. Results We describe the crystal structure of the human dysferlin inner DysF domain at a resolution of 1.9 Å. Most of the pathogenic mutations affect residues that form part of the aromatic/arginine stacks that hold the domain in a folded conformation. The high resolution of the structure shows that these interactions are a mixture of parallel ring/guanidinium stacking, perpendicular H bond stacking and aliphatic chain packing. Conclusions The high resolution structure of the dysferlin DysF domain gives a template on which to interpret in detail the pathogenic mutations that lead to disease.


Background
Dysferlinopathies are a group of autosomal recessive, late onset, progressive muscular dystrophies caused by malfunction of the dysferlin protein. Mutations in the dysferlin protein cause three phenotypes called limb girdle muscular dystrophy type 2B [1], Miyoshi myopathy [2], and distal anterior compartment myopathy [3]. Dysferlin is not a part of the dystrophin-glycoprotein complex, but its function is linked with calcium-activated membrane repair, in which aggregated intracellular vesicles fuse with the sarcolemma at the site of injury [4][5][6]. The mechanism of membrane repair is not yet determined in detail and the specific role of dysferlin needs to be defined at the structural level.
Dysferlin, a member of the ferlin protein family, is a type II anchored membrane protein with a single C terminal helix buried in the membrane. Ferlin proteins are defined as containing four or more C2 domains and a C terminal trans-membrane helix. There are 6 ferlin proteins expressed in humans: dysferlin, myoferlin, otoferlin, Fer1L4, Fer1L5, and Fer1L6 [7]. Myoferlin is the most similar paralogue to dysferlin. Both proteins are predicted to have the same domain composition, with an overall sequence identity of 56%. Dysferlin is expressed in most tissues but is found in abundance in skeletal muscle, heart, brain and placenta.
The multiple domain architecture of dysferlin was analysed by a combination of the Gene3D [8], SMART [9] and Pfam [10] domain family resources. The domain architecture is predicted to consist of seven C2 domains (C2A to C2G), three Fer domains (FerA, FerB and FerI), two DysF domains, one nested inside the other, and a C terminal trans-membrane domain (Figure 1a). C2 domains are found in hundreds of proteins and many are known to bind to phospholipids or proteins, often in a calcium dependent manner [6]. In dysferlin and myoferlin, C2A binds to phospholipids in a calcium dependent manner [11,12]. The other C2 domains do not show calcium dependent binding to lipids, but do show some calcium independent binding to phospholipids [11,12]. The other C2 domains are believed to interact with dysferlin binding proteins or to be involved in dimerisation [13]. Recently it has been shown that there is a minor variant of the dysferlin C2A domain, C2Av1, that does not bind via calcium [14]. Crystal structures of the canonical and variant forms of C2A show conservation of structure, even where the sequence is not conserved. Biophysical characterisation of the interactions with phospholipids and calcium indicates that the C2A and C2Av1 domains are highly conformationally flexible [14]. The Fer domains (FerA, FerB, and FerI) are short conserved regions found only in the ferlin protein family and have not yet been shown to be folded domains.
Dysferlinopathy causing mutations are dispersed throughout the length of the protein, but many fall in the DysF domains [15,16]. One DysF domain is inserted into the other DysF domain, by gene duplication, forming an inner DysF domain and a two part (N terminal and C terminal) outer DysF domain [17]. The function of the DysF domain is unknown. The structure of the inner DysF domain of human myoferlin (the dysferlin paralogue) was solved by Nuclear Magnetic Resonance (NMR) and showed a novel fold. This consists of two long beta strands connected by a long loop that caps the sheet edges in certain sections [18]. The structure contains arginine/tryptophan stacks that hold the fold together and are largely conserved throughout DysF domain sequences (Figure 1b).
In this study, we have determined the three dimensional structure of the human dysferlin inner DysF domain by X-ray crystallography at 1.9 Å resolution. This is the first DysF domain crystal structure.

Structure characterisation
The structure was solved by molecular replacement with the NMR structure of the inner DysF domain of myoferlin [18]. The sequence identity between the inner DysF domains of dysferlin and myoferlin is 61% (Figure 1b). Most datasets were collected, processed and refined in space group P2₁3 (cubic), with the best resolution dataset diffracting to 1.9 Å. A single dataset was collected in a different space group, P2₁2₁2₁ (orthorhombic), to 2.2 Å. The crystal packing is almost the same in the two forms, but the distortions of the perfect cube in the orthorhombic structure mean that the monomers of the crystallographic trimer in the cubic crystals are no longer identical (Figure 2a, b). This leads to some of the flexible regions (e.g. residues 965-971) being more visible in some chains with the orthorhombic data. Except where stated otherwise, the analysis is based on the single chain in the 1.9 Å structure. Statistics for both datasets are summarised in Table 1.
The inner DysF domain construct starts at residue Met942 and ends at Gln1052 and has a Ser on the N terminus from the TEV protease site. The additional Ser, Met942 and Gln1052 are not visible in the structures. The main secondary structure consists of two long antiparallel β-strands, one at each terminus (N terminus 946-958, C terminus 1036-1049). These β-strands are connected by a long loop (77 residues). The loop caps the edge of the N terminal β-strand with main chain hydrogen bonds from residues 966, 971 and 973. The C terminal β-strand is capped with main chain hydrogen bonds from residues 993, 995, 996, 1000, 1013 and 1015. These give short β-strands within the linking loop. There is also a single turn of 3₁₀ helix from 983 to 986 (Figures 1b, 2a). This secondary structure is also conserved in the NMR structure of the myoferlin inner DysF domain [18].
The structure is highly conserved between independent copies in the crystals. The crystallographic trimer of the P2₁3 crystal superimposes onto the trimer in the P2₁2₁2₁ asymmetric unit with a Calpha RMSD of 0.89 Å over 326 residues of the trimer. The four individual chains from the two asymmetric units superimpose with a Calpha RMSD no worse than 0.79 Å over 108 residues for any pair. The conservation with the NMR structure of myoferlin is also extremely high, with a Calpha RMSD of 1.78 Å over 106 residues being the worst and 1.45 Å over 103 residues being the best of the 20 models (Figure 2c). The biggest variation between dysferlin models is between 965 and 971 and between 1018 and 1021 (Figure 2d). These are also the regions with the highest temperature (B) factors in dysferlin, the regions with the largest RMS deviation between NMR models in myoferlin, as can be seen from the CING database [20,21], and the regions that differ most between dysferlin and myoferlin. However, the region from 958 to 960 also has high B factors, while conserving the backbone trace between structures quite well.
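As an illustration of the quantity being compared, the following is a minimal sketch of a Calpha RMSD calculation. It assumes the two chains have already been superimposed and that their Calpha atoms are paired in order; the superposition step itself is not shown, and the function name and toy coordinates are hypothetical rather than taken from the deposited structures.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) Calpha positions.

    Assumes the coordinate sets are already superimposed and paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between paired atoms
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in Angstroms: chain_b is chain_a shifted by 1 A in z
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

print(f"Calpha RMSD: {ca_rmsd(chain_a, chain_b):.2f} A")
```

Because the RMSD is averaged over the paired atoms, values quoted over different numbers of residues (for example 103 versus 106) are not strictly comparable without also noting which residues were included.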
The trimer (with 9 phosphates) is predicted to be the stable assembly by PDBePISA [22]. Formation of the assembly buries a total of 5250 Å². The interface between the protein chains (repeated 3 times in the trimer) forms between 5 (PDBePISA) and 7 (CCP4MG) hydrogen bonds and 3 salt bridges. Three of the hydrogen bonds are mainchain-mainchain, with β3 forming intermolecular β-strand links to β1 as well as the intramolecular H bonds to β6. Two H bonds are mainchain-sidechain and the others are sidechain-sidechain. Indeed, the formation of β3 is probably driven by the trimerisation. The relatively low B factor on this side of the loop may be partially caused by this packing. We have no evidence from gel filtration or NMR spectroscopy (data not shown) for the existence of a trimer, nor is there any likely way to invoke a trimer in vivo. There are two DysF domains in dysferlin and the reported oligomeric state of the full length protein is a dimer. Therefore assemblies of two or four DysF domains seem more likely as higher order assemblies, although extending the beta sheet, either with other DysF domains or with other proteins, does seem to be a likely method of interaction.

Arginine/tryptophan (R/W) stacks
As reported for the myoferlin inner DysF domain [18], the dysferlin DysF domain is held together by arginine/aromatic sidechain stacking. Superficially there is a stack of arginines and tryptophans which runs the entire length of one face of the beta sheet, and a single small group on the other face (Figure 3a). However, at the good resolution of these crystal structures, we can give a much more confident description of the exact nature of the interactions. In particular, arginines can interact with aromatics either in a stacked arrangement (i.e. with the guanidinium parallel to the aromatic ring and above the six membered ring) [23] or in an H bond arrangement (with the amino group pointing at the ring and the guanidinium plane perpendicular to the aromatic ring). Theoretical calculations favour the H bond arrangement in vacuum but the stacked arrangement in water [24]. We do see both these types of interaction, but a number of the arginine guanidinium groups do not lie above the centre of the aromatic ring. In fact the atlas of sidechain interactions [25,26] does not have the Arg-Trp pairs above the centre of the ring in most of the clusters. Only one cluster lies above the six membered ring. In four of the six cases the guanidiniums lie above or beyond the NE-CZ containing edge of the ring, and in one case beyond the opposite edge of the indole. In these cases there is still a hydrophobic interaction of the aliphatic sidechain with the ring, and the guanidinium is available to form other H bonds. A working definition of aliphatic stacking is that at least two side chain atoms are within 3.8 Å of Trp ring atoms (the CH2-CHar van der Waals radii add up to 3.74 Å [27]).
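The distinction between the parallel (stacked) and perpendicular (H bond) arrangements can be sketched as an angle between two planes. The example below is only an illustration: each plane is defined by three atoms, and the 30° and 60° thresholds, the function names and the toy coordinates are assumptions made here, not values used in the analysis. It tests plane orientation only and does not check whether the guanidinium lies above the six membered ring.

```python
import math


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three non-collinear points."""
    u = [b - a for a, b in zip(p1, p2)]
    v = [b - a for a, b in zip(p1, p3)]
    n = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("points are collinear; plane is undefined")
    return tuple(c / length for c in n)


def interplanar_angle(plane_a, plane_b):
    """Angle in degrees (0 to 90) between two planes, each given as three points."""
    na = plane_normal(*plane_a)
    nb = plane_normal(*plane_b)
    # abs() folds the angle into 0..90 since a plane normal has no preferred sign
    cos_angle = min(1.0, abs(sum(a * b for a, b in zip(na, nb))))
    return math.degrees(math.acos(cos_angle))


def classify(angle, parallel_max=30.0, perpendicular_min=60.0):
    """Label the geometry; both thresholds are illustrative assumptions."""
    if angle <= parallel_max:
        return "parallel (stacked)"
    if angle >= perpendicular_min:
        return "perpendicular (H bond like)"
    return "intermediate"


# Hypothetical toy coordinates: an aromatic ring lying in the xy plane
ring = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
guan_parallel = [(0.0, 0.0, 3.5), (1.0, 0.0, 3.5), (0.0, 1.0, 3.5)]
guan_perpendicular = [(0.0, 0.0, 3.5), (1.0, 0.0, 3.5), (0.0, 0.0, 4.5)]

for label, guan in [("A", guan_parallel), ("B", guan_perpendicular)]:
    angle = interplanar_angle(ring, guan)
    print(label, round(angle, 1), classify(angle))
```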
The details of the R/W interactions are laid out in Table 2 and examples are shown in Figure 3. The analysis shows that there is in fact not one continuous stack, but three on one face of the sheet. R1046 is to one side of W994, with only one non-H atom within 3.8 Å. Instead Cys988 forms an aromatic H bond interaction with W994. The second break occurs between R1040 and W1042, where both side chains hydrogen bond to E955, but there is no direct interaction between the two. On the other face R1039 stacks with both F954 and W1027. Most but not all of the stacking interactions are conserved in the myoferlin inner domain, but the NMR structure does not define which mode of packing is taking place. By sequence homology most of these interactions would be found in the dysferlin outer DysF domain (sequence identity 32%). The inner DysF domain of myoferlin lacks equivalents of K983 and W1042, although the overall fold is well conserved; conversely the outer dysferlin DysF domain lacks an equivalent of C988 and R1048, so in both cases the stacks will be somewhat shorter.
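The working definition of aliphatic stacking (at least two side chain atoms within 3.8 Å of Trp ring atoms) can be expressed as a simple contact count. The sketch below is a hedged illustration: the function names, the dictionary layout and the toy coordinates are hypothetical and are not taken from the deposited structure.

```python
import math

CUTOFF = 3.8           # Angstroms, from the working definition in the text
MIN_CONTACT_ATOMS = 2  # at least two side chain atoms must be in contact


def atoms_in_contact(side_chain, ring, cutoff=CUTOFF):
    """Names of side chain atoms within `cutoff` of any ring atom.

    Both arguments map atom names to (x, y, z) coordinates of non-H atoms.
    """
    return [
        name
        for name, xyz in side_chain.items()
        if any(math.dist(xyz, ring_xyz) <= cutoff for ring_xyz in ring.values())
    ]


def is_aliphatic_stack(side_chain, ring):
    """True if the contact count meets the working definition."""
    return len(atoms_in_contact(side_chain, ring)) >= MIN_CONTACT_ATOMS


# Hypothetical toy coordinates for part of a Trp ring lying in the z = 0 plane
trp_ring = {"CD2": (0.0, 0.0, 0.0), "CE3": (1.4, 0.0, 0.0), "CZ3": (2.1, 1.2, 0.0)}

# Side chain with two atoms close to the ring: counts as aliphatic stacking
packed = {"CB": (0.0, 0.0, 3.6), "CG": (1.4, 0.0, 3.7), "CD": (2.1, 1.2, 5.0)}

# Side chain with only one atom within the cutoff: does not meet the definition
offset = {"CB": (0.0, 0.0, 3.6), "CG": (1.4, 0.0, 4.5)}

print(atoms_in_contact(packed, trp_ring), is_aliphatic_stack(packed, trp_ring))
print(atoms_in_contact(offset, trp_ring), is_aliphatic_stack(offset, trp_ring))
```

The second toy case mirrors the situation described for R1046 and W994, where a single non-H atom within 3.8 Å is not counted as a stack.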

Dysferlinopathy mutations
There are 15 missense mutations in the inner dysferlin DysF domain reported in the Leiden dysferlin mutation database [15,16]. These are summarised, together with their positions in the structure, in Table 3. Figure 4 shows the positions of all the point mutations and close-up views of the three most reported mutations. The three most frequent mutations disrupt the R/W stacks and are likely to lead to a less stable or possibly unfolded domain. An unfolded domain may lead to degradation of the entire dysferlin protein. It is also notable that several surface residues are mutated, which may indicate that the DysF domain is involved in protein-protein interactions, although in this very flat domain nearly every residue contributes to the surface.
Table 1 notes: Outliers (%) 0.9, 1.9. Values in parentheses refer to the highest resolution shell. $R_{merge} = \sum |I - \langle I \rangle| / \sum \langle I \rangle$. $R_{work} = \sum \big| |F_{obs}| - |F_{calc}| \big| / \sum |F_{obs}|$ for 95% of the data. $R_{free}$ is the same equation for the 5% of the data excluded from refinement.

Conclusions
The structure of the inner DysF domain of dysferlin confirms the unusual fold of this domain first seen in the myoferlin homologue. The high resolution of the structure allows detailed analysis of the interactions forming the R/W stacking seen in this domain. It also provides a better model for understanding the disease causing point mutations in this domain seen in dysferlin patients. The most common mutations will disrupt the R/W stacking, making the domain less stable or inherently unfolded, probably enhancing the degradation of the protein. Some of the dysferlin mutations map to the surface, implying that protein-protein interactions may have a role in the function of this domain.

Protein preparation
Human dysferlin cDNA (Jain Foundation) was used to amplify the DNA encoding residues 943-1052. The human dysferlin inner DysF domain was cloned into the pNic28Bsa4 plasmid (supplied by Dr Opher Gileadi of the Structural Genomics Consortium), which is a modified pET28a plasmid that allows ligation independent cloning [48]. The vector contains an N terminal histidine tag followed by a TEV protease cleavage site. The cloned sequence was confirmed by DNA sequencing and the plasmid was transformed into E. coli Rosetta 2 (DE3) cells. One fresh colony was inoculated into 100 ml of Luria Broth (LB) medium and the culture was grown overnight. 8 × 500 ml of LB were inoculated with 1% of overnight culture and incubated at 37°C until the optical density reached 0.5 at a wavelength of 600 nm. The cultures were then induced with 0.5 mM IPTG and incubated at 18°C for 20 hours. Cells were collected by centrifugation at 4000 rpm for 20 minutes at 4°C and resuspended in binding buffer (20 mM Tris, pH 7.5, 500 mM NaCl, 20 mM imidazole, 10 mM 2-mercaptoethanol). The cells were stored at -20°C. After incubation with EDTA free protease inhibitor cocktail (Roche Applied Science, Switzerland), 0.1% Triton X-100 and DNAseI on ice for 30 minutes, the cells were sonicated at 20 W output for 4 minutes of 4 second on/off pulses for three cycles on ice and then centrifuged at 48,000 g for 1 hour. The protein lysate was applied to a HisTrap (GE Healthcare) affinity column. The column was washed with 10 column volumes of binding buffer and the protein was eluted in 20 mM Tris, pH 7.5, 500 mM NaCl, 500 mM imidazole, 10 mM 2-mercaptoethanol buffer. The eluents containing the
Table 3 (fragment): LGMD2B & MM; 25; [28][29][30][31][32][33][34]; In Stack 3; 0.522

Protein crystallisation and structure determination
The inner DysF domain was screened for crystallisation at 16°C by the sitting-drop method in 96-well crystallisation plates (Molecular Dimensions), using the PACT screen [49]. After 11 days, small crystals appeared in a drop with mother liquor consisting of 0.2 M NaBr and 20% PEG 3350. This crystallisation could not be reproduced initially. 0.2 M NaBr was then added to the protein sample and the crystallisation screen was repeated. Bigger crystals were produced in one day, where the reservoir contained 0.04 M potassium dihydrogen phosphate, 16% w/v PEG 8000 and 20% v/v glycerol. Cubic crystals of 0.3 mm were grown by the hanging drop method based on this condition. The crystal was cryoprotected in crystallisation buffer with the glycerol concentration increased to 25%. Initial diffraction data were collected to 2.3 Å in house using a MicroMax™-007 rotating anode X-ray generator (λ = 1.54 Å) and a Saturn 944+ CCD detector with Varimax optics. Further data were collected at SOLEIL beamline PROXIMA 1 and at beamline ID29 at the ESRF. Diffraction images were processed using the XDS software package [50], and scaled using AIMLESS in the CCP4 program suite [51]. The initial phases of the dysferlin inner DysF domain were determined by molecular replacement with the program MrBUMP [52] using the human myoferlin inner DysF domain [18] [PDB:2K2O] as the search model. The model was manually rebuilt in COOT [53] and refined initially in Refmac [54] and then in PHENIX [55]. Data collection and refinement statistics are summarized in Table 1.
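The R work and R free values reported among the refinement statistics follow the definitions given with Table 1. As a minimal sketch of that arithmetic, the example below computes an R factor from paired observed and calculated structure factor amplitudes and sets aside 5% of reflections as a free set. The function names and toy amplitudes are hypothetical, and the deterministic choice of every twentieth reflection is a simplification made here; it is not how the refinement programs cited above select the free set.

```python
def r_factor(reflections):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over (f_obs, f_calc) pairs."""
    numerator = sum(abs(abs(f_obs) - abs(f_calc)) for f_obs, f_calc in reflections)
    denominator = sum(abs(f_obs) for f_obs, _ in reflections)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


def split_work_free(reflections, free_every=20):
    """Put every `free_every`-th reflection in the free set (5% when 20).

    Simplifying assumption: a deterministic split is used for reproducibility.
    """
    work = [r for i, r in enumerate(reflections) if i % free_every != 0]
    free = [r for i, r in enumerate(reflections) if i % free_every == 0]
    return work, free


# Hypothetical toy amplitudes (arbitrary units), not data from this study
toy = [(100.0, 90.0), (80.0, 85.0), (60.0, 60.0), (40.0, 44.0)]
print(f"R over toy set: {r_factor(toy):.4f}")

many = [(100.0 + i, 95.0 + i) for i in range(40)]
work_set, free_set = split_work_free(many)
print(len(work_set), len(free_set))
print(f"R work: {r_factor(work_set):.4f}  R free: {r_factor(free_set):.4f}")
```

The free set is excluded from refinement so that R free gives a check on overfitting that R work alone cannot provide.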
No human subjects were directly used in this study. Human mutation data was taken from publicly accessible databases.