Structuprint: a scalable and extensible tool for two-dimensional representation of protein surfaces
© Kontopoulos et al. 2016
Received: 10 November 2015
Accepted: 2 February 2016
Published: 24 February 2016
The term ‘molecular cartography’ encompasses a family of computational methods for two-dimensional transformation of protein structures and analysis of their physicochemical properties. The underlying algorithms comprise multiple manual steps, whereas the few existing implementations typically restrict the user to a very limited set of molecular descriptors.
We present Structuprint, a free standalone software tool that fully automates the rendering of protein surface maps, given, at the very least, a directory with a PDB file and an amino acid property. The tool comes with a default database of 328 descriptors, which can be extended or substituted by user-provided ones. The core algorithm comprises the generation of a mould of the protein surface, which is subsequently converted to a sphere and mapped to two dimensions, using the Miller cylindrical projection. Structuprint is partly optimized for multicore computers, making the rendering of animations of entire molecular dynamics simulations feasible.
Structuprint is an efficient application, implementing a molecular cartography algorithm for protein surfaces. According to the results of a benchmark, its memory requirements and execution time are reasonable, allowing it to run even on low-end personal computers. We believe that it will be of use, primarily but not exclusively, to structural biologists and computational biochemists.
Keywords: Molecular cartography, Protein surfaces, Visualization, Surface comparison, Structural biology
Over the last two decades, the growth rate of the Protein Data Bank has been exponential. As structural data for biomolecules are increasingly made available, the study of homologous proteins can be performed not only at the level of sequence, but also at the level of three-dimensional structure. This has led to the development of numerous sophisticated methods, concerning, among others, the analysis of structural evolution and the structure-based design of new drugs.
For the comparison of protein surfaces in particular, a family of methods is based on the reduction of the dimensionality of the system. The concept of projecting a three-dimensional protein structure to two dimensions was first introduced by Fanning et al. under the term ‘molecular cartography’. They presented this notion as a novel method for studying the entire surface of a protein, with emphasis on the topography of antigenic sites. It involved conversion of the protein structure into a triaxial ellipsoid, followed by its transformation into a graticule (a latitude/longitude grid). Pawłowski and Godzik later expanded on this approach by annotating protein surface maps according to the physicochemical properties of the exposed residues (e.g., charge or hydrophobicity), as a means to compare evolutionarily related proteins.
Even though a number of modifications to the aforementioned methodologies for two-dimensional protein representation have been proposed [5–7], molecular cartography has not found much use in the literature. This may be partly due to the significant amount of effort that is required to manually convert the atomic coordinates of a PDB file first into a spherical structure and then into a map. Visualizing the distribution of a particular physicochemical property on the surface further increases the complexity, and the overall approach becomes increasingly tedious. A few applications that implement molecular cartography algorithms are available (SURF’S UP!, PST, Udock), but the range of supported physicochemical descriptors for visualization is typically limited to charge and hydrophobicity. Integrating other predictors is either unfeasible or not straightforward for the end user, creating an obstacle for specialized analyses. Moreover, an application that harnesses the power of multiprocessor systems to simultaneously render multiple protein surface maps is, to this day, not available. Such an application would be very useful, for example, when visualizing entire molecular dynamics simulations or comparing the members of a large protein family.
Amino acid properties database
Values for 328 properties/descriptors were calculated for the 20 common amino acids with MOE 2010.10 and were stored within an SQLite database. In particular, the database contains 11 categories of descriptors: i) 33 adjacency and distance matrix descriptors [12–16] (e.g., Balaban’s connectivity topological index); ii) 41 atom/bond count descriptors [17, 18] (e.g., the number of double bonds); iii) 18 conformation dependent charge descriptors (e.g., the water accessible surface area of polar atoms); iv) the 16 Kier and Hall connectivity and kappa shape indices [20, 21] (e.g., the Zagreb index); v) 21 MOPAC descriptors (e.g., the ionization potential); vi) 48 partial charge descriptors (e.g., the total positive partial charge); vii) 12 pharmacophore feature descriptors (e.g., the number of hydrophobic atoms); viii) 11 potential energy descriptors (e.g., the solvation energy); ix) 16 physical properties [18, 23–27] (e.g., the molecular weight); x) 18 subdivided surface areas; xi) 94 surface area, volume, and shape descriptors (e.g., globularity). A detailed explanation of each descriptor is provided in the properties codebook which accompanies the tool. By drawing values from this database, Structuprint can visualize the distribution of a property across protein surfaces. Users can extend it by adding measurements for more chemical components or provide their own custom SQLite database in order to incorporate novel descriptors.
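As a rough illustration of how per-residue descriptor values could be stored in and drawn from an SQLite database, the sketch below builds a small in-memory example. The table name `amino_acid_properties`, its columns, and the property label `weight` are hypothetical and do not reflect the schema that Structuprint actually uses. The values are the standard molecular weights (g/mol) of the free amino acids, included only as sample data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical properties table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE amino_acid_properties ("
        "residue TEXT NOT NULL, property TEXT NOT NULL, value REAL NOT NULL, "
        "PRIMARY KEY (residue, property))"
    )
    # Sample rows: molecular weights of free glycine and alanine (g/mol).
    conn.executemany(
        "INSERT INTO amino_acid_properties VALUES (?, ?, ?)",
        [("GLY", "weight", 75.07), ("ALA", "weight", 89.09)],
    )
    return conn


def property_values(conn, prop):
    """Return {residue: value} for one property; empty dict if unknown."""
    cursor = conn.execute(
        "SELECT residue, value FROM amino_acid_properties WHERE property = ?",
        (prop,),
    )
    return dict(cursor.fetchall())


conn = build_demo_db()
# Extending the database with a measurement for one more component (serine).
conn.execute("INSERT INTO amino_acid_properties VALUES ('SER', 'weight', 105.09)")
weights = property_values(conn, "weight")
```

In this layout, adding a novel descriptor amounts to inserting rows with a new `property` label, which mirrors the kind of extension described above without requiring any change to the table structure.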
Generation of a mould of the surface of a protein
The main steps of the algorithm implemented by Structuprint are shown in Fig. 1. The tool first produces a mould of the protein structure’s surface in two steps. The structure is initially placed within a 3D grid with cell dimensions of 1 × 1 × 1 Å. Then, one dummy atom is inserted in each empty grid cell that neighbours a single protein atom. This process was previously described by Vlachakis et al. and is extended here, with dummy atoms being assigned the identity of the amino acid to which their neighbouring protein atom belongs. This results in a fairly accurate approximation of the underlying protein surface at the level of residue atoms.
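The two steps above can be sketched in a few lines of Python. This is a simplified illustration rather than Structuprint's own code, and it rests on several assumptions that the text does not specify: a cell counts as a neighbour if it touches an occupied cell on a face, edge, or corner (26-neighbourhood); an empty cell receives a dummy atom if it neighbours at least one occupied cell; and when several residues compete for the same empty cell, the first one encountered wins. The function name `build_mould` and the input tuple layout are likewise hypothetical.

```python
import math
from itertools import product


def build_mould(atoms):
    """Return {grid_cell: residue_id} for dummy atoms around a structure.

    atoms: iterable of (x, y, z, residue_id), coordinates in angstroms.
    """
    # Step 1: place the structure in a grid of 1 x 1 x 1 angstrom cells.
    occupied = {}
    for x, y, z, residue in atoms:
        cell = (math.floor(x), math.floor(y), math.floor(z))
        occupied.setdefault(cell, residue)

    # Step 2: put one dummy atom in every empty cell next to an occupied one,
    # labelled with the residue of the neighbouring protein atom.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    mould = {}
    for (i, j, k), residue in occupied.items():
        for di, dj, dk in offsets:
            neighbour = (i + di, j + dj, k + dk)
            if neighbour not in occupied:
                mould.setdefault(neighbour, residue)
    return mould


one_atom = build_mould([(0.5, 0.5, 0.5, "ALA1")])
two_atoms = build_mould([(0.5, 0.5, 0.5, "ALA1"), (1.5, 0.5, 0.5, "GLY2")])
```

Note that this naive version would also fill empty cells in internal cavities, so a real implementation would need an additional criterion to keep only the outer shell.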
Transformation of the mould into a sphere
Projection of the sphere onto a map
The Miller cylindrical projection was selected on the basis of its simplicity and ease of understanding. It is one of the most popular projections in cartography, as it can depict the entirety of the sphere, including the poles. Latitude and longitude lines are parallel and straight. Projection-induced distortion is zero at the equator, increases gradually towards higher latitudes, and becomes maximal at the poles. This leads to significant overestimation of the distance among atoms at the upper and lower parts of the figure (Fig. 1), similar to the areal exaggeration of Greenland and Antarctica on world maps. Nevertheless, the Miller cylindrical projection introduces less polar distortion than the Mercator projection, on which it is based.
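For reference, the standard forward equations of the Miller cylindrical projection on a unit sphere are x = λ and y = (5/4) ln[tan(π/4 + 2φ/5)], where φ is the latitude and λ the longitude in radians. The sketch below implements these textbook formulas alongside the Mercator ordinate for comparison. The function names are illustrative, and the units and scaling that Structuprint applies to its own maps are not assumed here.

```python
import math


def miller(lat_deg, lon_deg):
    """Forward Miller cylindrical projection on a unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = lon
    # Mercator evaluated at 4/5 of the latitude, then stretched by 5/4.
    y = 1.25 * math.log(math.tan(math.pi / 4 + 0.4 * lat))
    return x, y


def mercator_y(lat_deg):
    """Mercator ordinate on a unit sphere (diverges at the poles)."""
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))


pole_x, pole_y = miller(90.0, 180.0)   # finite, so the poles can be drawn
miller_80 = miller(80.0, 0.0)[1]
mercator_80 = mercator_y(80.0)         # larger than the Miller value
```

Because the latitude is compressed before the Mercator formula is applied, the ordinate stays finite at the poles (about 2.30 on a unit sphere), whereas the Mercator ordinate grows without bound. This is the reduced polar distortion mentioned above.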
The previous step resulted in a map of the protein surface with data points coloured by a property of choice. However, this ‘primary’ map is not suitable for detecting areas with an overall concentration of atoms with high or low property values, which is one of the main benefits of this cartographic approach. For instance, a small area with both negatively and positively charged residue atoms would not appear as almost neutrally charged, but as a tiny dipole. To prevent the appearance of small ‘hot spots’ and redistribute the property values among neighbouring data points, the algorithm includes a smoothing step. The map is iteratively divided into grid squares of varying dimensions, from 0.001° × 0.001° to 0.5° × 0.5°, with a step increase of 0.001°. In each iteration of this process, grid cells are assigned the average value of all data points within them. Finally, the value of every data point is defined as the average value of its corresponding grid cell across all iterations. This smoothing method ensures that areas with pronounced accumulation of high or low values are easily discernible from those with a mixed population.
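A minimal sketch of this multi-resolution averaging is given below. It assumes that grid cells are aligned to the map origin and that every iteration contributes equally to the final average, neither of which is stated explicitly in the text. The function names are illustrative.

```python
import math
from collections import defaultdict


def default_cell_sizes():
    """Cell sizes from 0.001 to 0.5 degrees in steps of 0.001 (500 values)."""
    return [i / 1000 for i in range(1, 501)]


def smooth(points, cell_sizes):
    """Return smoothed values for points given as (x, y, value) tuples."""
    totals = [0.0] * len(points)
    for size in cell_sizes:
        sums = defaultdict(float)
        counts = defaultdict(int)
        keys = []
        # Assign every data point to a square grid cell of the current size.
        for x, y, value in points:
            key = (math.floor(x / size), math.floor(y / size))
            keys.append(key)
            sums[key] += value
            counts[key] += 1
        # Each point takes the mean value of its cell in this iteration.
        for idx, key in enumerate(keys):
            totals[idx] += sums[key] / counts[key]
    # Final value: mean of the per-iteration cell averages.
    return [total / len(cell_sizes) for total in totals]


# Two opposite values close together and one isolated value further away.
points = [(0.12, 0.12, 1.0), (0.38, 0.38, -1.0), (5.5, 5.5, 3.0)]
smoothed = smooth(points, [0.1, 1.0])
```

In this toy example the two neighbouring points of opposite sign share a cell only at the coarser grid size, so their values are pulled halfway towards zero (0.5 and -0.5), while the isolated point keeps its value of 3.0. This is the dipole-damping behaviour described above.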
The default interface of Structuprint is a cross-platform, command-line interface (CLI). It consists of two executables: structuprint_frame and structuprint. The structuprint_frame executable produces a TIFF figure from a single input PDB file, using the R package ggplot2 for plotting. The structuprint executable is responsible for processing multiple superimposed PDB files, either serially or in parallel, generating a TIFF figure per input file and a final GIF animation, rendered with the Imager Perl module. Most parameters of the underlying algorithms can be modified by the user, such as the delay between animation frames, the background colour, and the appearance of ID numbers on final figures. A full descriptive list of the available parameters for both executables can be found in Structuprint’s manual, distributed along with the application and also available from its website.
On Unix-like systems (e.g., GNU/Linux, OS X), Structuprint supports task parallelism when generating animations. Using the Parallel::ForkManager Perl module, Structuprint can take advantage of multiple CPU cores by assigning each input PDB file to a different processor. The simultaneous rendering of multiple individual frames considerably reduces the total execution time, allowing for visualization of entire molecular dynamics simulations within a reasonable time frame.
Results and discussion
Examples of usage
To illustrate the utility of this tool, we present three different examples of usage in this section. Two-dimensional visualization with Structuprint enhances the representation of protein surfaces and facilitates the interpretation of the results in all three cases.
Visualization of molecular dynamics simulations
A seldom explored application of molecular cartography involves the generation of 2D animations from a series of PDB files. Here, we visualized a portion of a folding simulation of a variant of the chicken villin headpiece subdomain (HP-35 NleNle) from the Folding@Home project. The part of the input simulation was 50 ps long, with one frame being extracted every 0.25 ps. Each frame was structurally superimposed to the previous one with UCSF Chimera’s MatchMaker tool. Then, two separate animations were produced: one of the simulation frames in ribbon representation and one of the corresponding 2D maps, with the topological polar surface area (a measure of polarity) as the property of choice. For comparison purposes, these two animations are jointly shown in Additional file 2. This approach simplifies the detection of conformational changes during the course of the simulation, along with fluctuations in the distribution of physicochemical variables.
Depiction of surface conservation
Comparison of conformational changes, e.g., due to mutations
We have developed a user-friendly application for two-dimensional visualization of protein surfaces, optionally supporting multicore processing and user-provided physicochemical descriptors. Structuprint provides an alternative view of molecular surfaces, which, as shown in the previous section, could be of great use to a variety of researchers, including biochemists, structural biologists, and biophysicists.
Availability and requirements
Project name: Structuprint
Project home page: http://dgkontopoulos.github.io/Structuprint/
Operating systems: Prebuilt packages and installers are available for GNU/Linux distributions (Ubuntu 14.04, Debian 8, Fedora 22, CentOS 7, openSUSE 13.2), Windows, and OS X. For all other operating systems, installation from the source code is required. The GUI is available by default only for GNU/Linux systems.
Programming languages: Perl 5, R
License: GNU GPLv3+
Any restrictions to use by non-academics: None
Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional files.
Abbreviations
- A31P: ala31 → pro mutant
- CPU: central processing unit
- FASA_H: fractional water accessible surface area of hydrophobic atoms over all atoms
- GUI: graphical user interface
- HP-35 NleNle: villin headpiece subdomain double norleucine mutant (Lys24Nle/Lys29Nle)
- MOE: molecular operating environment
- MOPAC: molecular orbital package
- PDB: protein data bank
- RAxML: randomized axelerated maximum likelihood
The authors express their gratitude to two anonymous reviewers for helpful comments, and to all researchers who made their data publicly available on the Protein Data Bank, the UniProt database, or on Simtk.org. No funding was received for this project.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Orengo CA, Thornton JM. Protein families and their evolution - a structural perspective. Annu Rev Biochem. 2005;74:867–900. doi:10.1146/annurev.biochem.74.082803.133029.
- Cheng T, Li Q, Zhou Z, Wang Y, Bryant SH. Structure-based virtual screening for drug discovery: a problem-centric review. AAPS J. 2012;14(1):133–41. doi:10.1208/s12248-012-9322-0.
- Fanning DW, Smith JA, Rose GD. Molecular cartography of globular proteins with application to antigenic sites. Biopolymers. 1986;25(5):863–83. doi:10.1002/bip.360250509.
- Pawłowski K, Godzik A. Surface map comparison: studying function diversity of homologous proteins. J Mol Biol. 2001;309(3):793–806. doi:10.1006/jmbi.2001.4630.
- Chirgadze Y, Kurochkina N, Nikonov S. Molecular cartography of proteins: surface relief analysis of the calf eye lens protein gamma-crystalin. Protein Eng. 1989;3(2):105–10. doi:10.1093/protein/3.2.105.
- Badel-Chagnon A, Nessi J, Buffat L, Hazout S. “Iso-depth contour map” of a molecular surface. J Mol Graph. 1994;12(3):162–8. doi:10.1016/0263-7855(94)80082-0.
- Yang H, Qureshi R, Sacan A. Protein surface representation and analysis by dimension reduction. Proteome Sci. 2012;10(Suppl 1):S1. doi:10.1186/1477-5956-10-S1-S1.
- Sasin JM, Godzik A, Bujnicki JM. SURF’S UP! - protein classification by surface comparisons. J Biosci. 2007;32(1):97–100. doi:10.1007/s12038-007-0009-0.
- Koromyslova AD, Chugunov AO, Efremov RG. Deciphering fine molecular details of proteins’ structure and function with a Protein Surface Topography (PST) method. J Chem Inf Model. 2014;54(4):1189–99. doi:10.1021/ci500158y.
- Levieux G, Montes M. Towards real-time interactive visualization modes of molecular surfaces: examples with Udock. IEEE VR 2015 Workshop on Virtual and Augmented Reality dedicated to Molecular Science (VARMS). 2015.
- Molecular Operating Environment (MOE). 2010.10. 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7: Chemical Computing Group Inc; 2010. https://www.chemcomp.com/MOE-Molecular_Operating_Environment.htm. Accessed 19 Feb 2016.
- Wiener H. Structural determination of paraffin boiling points. J Am Chem Soc. 1947;69(1):17–20.
- Balaban AT. Five new topological indices for the branching of tree-like graphs. Theor Chim Acta. 1979;53:355–75.
- Balaban AT. Highly discriminating distance-based topological index. Chem Phys Lett. 1982;89(5):399–404. doi:10.1016/0009-2614(82)80009-2.
- Petitjean M. Applications of the radius-diameter diagram to the classification of topological and geometrical shapes of chemical compounds. J Chem Inf Comput Sci. 1992;32(4):331–7. doi:10.1021/ci00008a012.
- Pearlman RS, Smith KM. Novel software tools for chemical diversity. In: Kubinyi H, Folkers G, Martin YC, editors. 3D QSAR in drug design: three-dimensional quantitative structure activity relationships. Volume 2. Netherlands: Springer; 1998. p. 339–53. doi:10.1007/0-306-46857-3_18.
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 1997;23(1–3):3–25. doi:10.1016/S0169-409X(96)00423-1.
- Oprea TI. Property distribution of drug-related chemical databases. J Comput Aided Mol Des. 2000;14(3):251–64. doi:10.1023/A:1008130001697.
- Stanton DT, Jurs PC. Development and use of charged partial surface area structural descriptors in computer-assisted quantitative structure-property relationship studies. Anal Chem. 1990;62(21):2323–9. doi:10.1021/ac00220a013.
- Kier LB, Hall LH. The nature of structure-activity relationships and their relation to molecular connectivity. Eur J Med Chem. 1977;12:307–12.
- Hall LH, Kier LB. The molecular connectivity chi indexes and kappa shape indexes in structure-property modeling. In: Lipkowitz KB, Boyd DB, editors. Reviews in Computational Chemistry. Volume 2. Hoboken, New Jersey: John Wiley & Sons, Inc.; 1991. p. 367–422. doi:10.1002/9780470125793.ch
- Stewart JJP. MOPAC manual. 7th ed. 1993.
- Lide DR, editor. CRC handbook of chemistry and physics. Boca Raton: CRC Press; 1994.
- Wildman SA, Crippen GM. Prediction of physicochemical parameters by atomic contributions. J Chem Inf Comput Sci. 1999;39(5):868–73. doi:10.1021/ci990307l.
- Ertl P, Rohde B, Selzer P. Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J Med Chem. 2000;43(20):3714–7. doi:10.1021/jm000942e.
- Hou TJ, Xia K, Zhang W, Xu XJ. ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach. J Chem Inf Comput Sci. 2004;44(1):266–75. doi:10.1021/ci034184n.
- Kazius J, McGuire R, Bursi R. Derivation and validation of toxicophores for mutagenicity prediction. J Med Chem. 2005;48(1):312–20. doi:10.1021/jm040835a.
- Vlachakis D, Kontopoulos DG, Kossida S. Space constrained homology modelling: the paradigm of the RNA-dependent RNA polymerase of dengue (type II) virus. Comput Math Methods Med. 2013;2013:108910. doi:10.1155/2013/108910.
- Snyder JP. Map projections - a working manual, U.S. Geological survey professional paper 1395. Washington, DC: United States Government Printing Office; 1987.
- Hammer E. Über die Planisphäre von Aitow und verwandte Entwürfe, insbesondere neue flächentreue ähnlicher Art. Petermanns Geogr Mitt. 1892;38(4):85–7.
- Miller OM. Notes on cylindrical world map projections. Geogr Rev. 1942;32(3):424–30.
- Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer; 2009.
- Cook T. Imager - Perl extension for generating 24 bit images. https://metacpan.org/pod/Imager. Accessed 27 Sep. 2015.
- Champoux Y. Parallel::ForkManager - A simple parallel processing fork manager. https://metacpan.org/pod/Parallel::ForkManager. Accessed 27 Sep. 2015.
- Box GEP, Cox DR. An analysis of transformations. J R Stat Soc Series B Stat Methodol. 1964;26(2):211–52.
- Ensign DL, Kasson PM, Pande VS. Heterogeneity even at the speed limit of folding: large-scale molecular dynamics study of a fast-folding variant of the villin headpiece. J Mol Biol. 2007;374(3):806–16. doi:10.1016/j.jmb.2007.09.069.
- Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera - a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605–12. doi:10.1002/jcc.20084.
- Shibata N, Inoue T, Nagano C, Nishio N, Kohzuma T, Onodera K, Yoshizaki F, Sugimura Y, Kai Y. Novel insight into the copper-ligand geometry in the crystal structure of Ulva pertusa plastocyanin at 1.6-Å resolution: structural basis for regulation of the copper site by residue 88. J Biol Chem. 1999;274(7):4225–30. doi:10.1074/jbc.274.7.4225.
- Do CB, Mahabhashyam MSP, Brudno M, Batzoglou S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 2005;15(2):330–40. doi:10.1101/gr.2821705.
- Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3. doi:10.1093/bioinformatics/btu033.
- Glykos NM, Cesareni G, Kokkinidis M. Protein plasticity to the extreme: changing the topology of a 4-α-helical bundle with a single amino acid substitution. Structure. 1999;7(6):597–603. doi:10.1016/S0969-2126(99)80081-1.