Dataset of experimental PfM18AAP inhibitors
A dataset of 32 compounds known as inhibitors of PfM18AAP was extracted from National Center for Biotechnology Information PubChem bioassay (AID 743024) (https://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=743024). Another high throughput screened dataset of 3502 known bioactive inhibitors of PfM18AAP was extracted from AID 1822 used for docking studies against PfM18AAP (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=1822). A library of 153,873 compounds was obtained from the ChEMBL antimalarial database used for finding novel inhibitors against PfM18AAP metalloproteinase [https://www.ebi.ac.uk/chembl/]. Additionally, 27 antimalarial drugs described by WHO, 32 analogous of quinine compounds(QN) (AID 660170), 24 analogous of chloroquine (CQ) (AID 404780), and 17 analogous of 8-aminoquinoline(8-AmQN) (AID 554037) were also extracted for molecular docking, 3D QSAR model and pharmacophore similarity search. 2D structures were converted to 3D structures using Corina 2.64v [13] and open babel [14].
Molecular descriptors
The molecular descriptors were calculated by VLifeMDS version 4.3 using Gasteiger-Marsili charge [15, 16]. The PfM18AAP inhibitors compounds along with their activity pIC50 values were given as input for force field calculation. The steric and electrostatic interaction energies are computed using a methyl probe of charge +1.
Development of 3D QSAR models
The biological activity (pIC50) of inhibitors was selected as dependent variables and descriptors as independent variables. The 60 % data for the training set and 40 % for test set were manually selected. The unicolumn statistics were calculated to validate training and test sets. The 3D QSAR models were built using PLSR, PCR, and kNN-MFA by stepwise forward-backward method [17].
3D QSAR Model validation
Internal validation
To perform internal validation (cross validation), a compound is eliminated from the training set and then its biological activity is predicted to validate model accuracy. This step is repeated until the biological activity of every compound in the training set is predicted once. The cross-validated coefficient, q2 is calculated using the given Eq. (1):
$$ {q}^2=1-\frac{{\displaystyle \sum }{\left({y}_i-{\widehat{y}}_i\right)}^2}{{\displaystyle \sum }{\left({y}_i-{y}_{means}\right)}^2} $$
(1)
Where, yi and \( {\hat{y}}_i \) are the actual and predicted activities of the ith molecule in the training set respectively, and ymeans is the average activity of all the molecules in the training set [18, 19].
External validation
External validation (pred_r2) is carried out by calculating predicted correlation coefficient (pred_r2) value using following Eq. (2):
$$ pred\_{r}^2=1-\frac{{\displaystyle \sum }{\left({y}_i-{\widehat{y}}_i\right)}^2}{{\displaystyle \sum }{\left({y}_i-{y}_{means}\right)}^2} $$
(2)
Where, yi and \( {\hat{y}}_i \) are the actual and predicted activities of the ith molecule in the test set, respectively, and ymeans is the average activity of all the molecules in the training set.
A Z-score value is calculated by the following Eq. (3):
$$ Zscore=\frac{\left(h-\mu \right)}{\sigma } $$
(3)
Where, h is the q2 value calculated for the actual dataset, μ is the average q2and σ is the standard deviation calculated for various models built on different random datasets [20].
F-test is Fisher value which indicates statistical significance, a value greater than 30 is considered good, which gives an idea of the chances of failure of the model. On the other hand, q2_se is the standard deviation of cross validated prediction and r2_se is standard deviation is a measure of the absolute quality of a model.
Pharmacophore modeling
The pharmacophore model was built using the Phase module of Schrodinger maestro [21]. The same set of inhibitors of PfM18AAP was subjected to LigPrep module which produces high-quality, all-atom 3D structures with correct chirality. Some pharmacophore hypotheses were generated along with their respective set of aligned conformations. These hypotheses were generated by a systematic variation of many sites and a number of matching active compounds. These selected features were used to build a series of pharmacophore hypotheses by selecting find the common pharmacophore option in phase. The common pharmacophore hypotheses were analyzed using the survival score to yield the best alignment of the active ligands using a maximum overall root mean square deviation (RMSD) value of 2 Å for distance tolerance. Finally, several pharmacophore hypotheses were generated along with their respective set of aligned conformations. All pharmacophore hypotheses were scored for active survival, inactive survival, site, vector, volume, the number of matches, selectivity, energy, active, and inactive terms. Survival score secured by each hypothesis is the measure of the quality of alignment for a particular hypothesis [22].
Docking and scoring
Molecular docking
To understand the nature of the interaction of inhibitors described above [23] with PfM18AAP, molecular docking was performed using GOLD v5.2 (Genetic Optimization for Ligand Docking) [24] and GLIDE module of Schrödinger using [21] against the PfM18AAP. The crystal structure of PfM18AAP (4EME) was obtained from protein data bank (www.rcsb.org/pdb/explore/explore.do?structureId=4eme). Since PfM18AAP requires cofactors for enzymatic activity, Zn was retained during docking analysis [25]. In GOLD docking, the 10 best docked complexes were ranked based on their GOLD fitness score. In GLIDE docking, the top 10 compounds were selected based on G-score. The binding affinity of docked complex was calculated using X-Score v1.2.1 [26]. Protein-ligand interaction was analyzed by using Pymol version 1.1r. www.pymol.org/ and LigPlot + v1.4.5 [27].
Screening of PfM18AAP inhibitors
In this work, High Throughput Virtual Screening (HTVS) used Glide module of the Schrodinger software suite [21]. The ligand libraries were first prepared by adding hydrogen and generating conformations through the LigPrep module. This LigPrep module generated tautomer with the OPLS2005 force field, the total no. of 411,766 output structures were obtained. Then grid on the protein active site was generated. Firstly, HTVS for every ligand library was done and the top 1000 ranked compounds from every library were subjected to Extra-Precision (XP) screening. In both the cases, the structures were flexibly docked on the protein structure. The non-planar conformations were penalized. Structures were having more than 200 atoms or more than 35 rotatable bonds were not docked. Also, the Van Der Waal’s radius scaling factor was set to 0.8, and the partial charge cutoff was set to 0.15. From these 1000 compounds, the top 10 compounds from every library were extracted as target-bound complexes. These complexes were re-scored, and their binding affinity was calculated using X-score software.