It is recognized that small regions of proteins tend to fold independently and are then stabilized by interactions between these distinct subunits or modules. The dissection of proteins into structurally independent and functionally distinct subunits led to the idea that proteins can be considered as collections of smaller units such as domains. Several definitions of a domain exist. Some define a domain as a recognizable substructure within a protein that is connected to other domains by very few structural elements, such as a loop or a helix; others define domains as parts of a protein molecule that behave in a quasi-independent manner and act as cooperative units in protein folding [3, 4]. A further definition describes a domain as a relatively compact part of a protein characterized by its own pattern of intramolecular collective dynamics, distinguishable from that of other domains [5–8]. In the belief that an unequivocal definition of a module must be based on the most fundamental property of protein 3D structure, namely the adjacency matrix of inter-residue contacts, we adopted a network representation of the protein.
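As an illustration of the network representation, the sketch below builds a binary contact adjacency matrix from residue coordinates. The function name `contact_matrix`, the use of one representative point per residue, the 8 Å distance cutoff, and the `min_seq_sep` parameter are all assumptions made for this example; the text does not specify the contact criterion.

```python
import numpy as np

def contact_matrix(coords, cutoff=8.0, min_seq_sep=1):
    """Binary residue-residue contact matrix (hypothetical helper).

    coords      : (N, 3) array, one representative point per residue
                  (e.g. C-alpha positions); this choice is an assumption.
    cutoff      : distance threshold in angstroms (assumed value).
    min_seq_sep : minimum sequence separation for a contact; 1 only
                  excludes self-contacts.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sequence separation |i - j| for every residue pair.
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    # 1 denotes a contact, 0 a non-contact.
    return ((dist <= cutoff) & (sep >= min_seq_sep)).astype(int)

# Toy input: four pseudo-residues on a line, 5 angstroms apart.
toy_coords = [[0, 0, 0], [5, 0, 0], [10, 0, 0], [15, 0, 0]]
A_toy = contact_matrix(toy_coords)
```

With these toy coordinates only sequence neighbours fall within the cutoff, so the resulting matrix describes a simple chain.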
In an earlier work, we used a well-established global method for identifying modules in networks. The algorithm converges towards the maximum modularity of the given protein network, which is defined by an adjacency matrix in which a 1 denotes a residue-residue contact and a 0 denotes a non-contact. Maximizing the modularity score, as defined by Guimera et al., maximizes intra-module contacts while minimizing inter-module contacts. The modularity M is given by
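\[ M = \sum_{s=1}^{N_M} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right] \]

where $N_M$ is the number of modules, $L$ is the number of links in the network, $l_s$ is the number of links between nodes in module $s$, and $d_s$ is the sum of the degrees of the nodes in module $s$. The resulting partition allows each residue of the protein to be represented in terms of its intra-module degree, $z$, and participation coefficient, $P$, which are given by

\[ z_i = \frac{\kappa_i - \bar{\kappa}_{s_i}}{\sigma_{\kappa_{s_i}}} \]

\[ P_i = 1 - \sum_{s=1}^{N_M} \left( \frac{\kappa_{is}}{k_i} \right)^2 \]

where $\kappa_i$ is the number of links of residue $i$ to other residues in its module $s_i$, $\bar{\kappa}_{s_i}$ is the average of $\kappa$ over all residues in module $s_i$, $\sigma_{\kappa_{s_i}}$ is the standard deviation of $\kappa$ in module $s_i$, $\kappa_{is}$ is the number of links of node $i$ to nodes in module $s$, and $k_i$ is the total degree of node $i$.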
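The three quantities defined above (M, z and P) can be computed directly from an adjacency matrix and a given assignment of residues to modules. The following is a minimal sketch under stated assumptions, not the implementation used in the earlier work: the function names are hypothetical, the population standard deviation is assumed for z, and z is set to 0 when all residues in a module have the same intra-module degree. It only evaluates a given partition; it does not perform the modularity maximization itself.

```python
import numpy as np

def modularity(A, modules):
    """M = sum over modules s of [ l_s / L - (d_s / 2L)^2 ]."""
    A, modules = np.asarray(A), np.asarray(modules)
    L = A.sum() / 2.0                      # each link is counted twice in A
    M = 0.0
    for s in np.unique(modules):
        in_s = modules == s
        l_s = A[np.ix_(in_s, in_s)].sum() / 2.0   # links inside module s
        d_s = A[in_s].sum()                       # summed degree of module s
        M += l_s / L - (d_s / (2.0 * L)) ** 2
    return M

def within_module_z(A, modules):
    """z_i = (kappa_i - mean kappa in module) / std of kappa in module."""
    A, modules = np.asarray(A), np.asarray(modules)
    z = np.zeros(len(A))
    for s in np.unique(modules):
        in_s = modules == s
        kappa = A[np.ix_(in_s, in_s)].sum(axis=1)  # intra-module degrees
        sd = kappa.std()                           # population std (assumed)
        # Assumed convention: z = 0 when the module has no degree spread.
        z[in_s] = (kappa - kappa.mean()) / sd if sd > 0 else 0.0
    return z

def participation(A, modules):
    """P_i = 1 - sum over modules s of (kappa_is / k_i)^2."""
    A, modules = np.asarray(A), np.asarray(modules)
    k = A.sum(axis=1).astype(float)                # total degree of each node
    P = np.ones(len(A))
    for s in np.unique(modules):
        kappa_is = A[:, modules == s].sum(axis=1).astype(float)
        frac = np.divide(kappa_is, k, out=np.zeros_like(k), where=k > 0)
        P -= frac ** 2
    P[k == 0] = 0.0                                # assumed: isolated nodes get P = 0
    return P

# Toy network: two triangles (nodes 0-2 and 3-5) joined by the link 2-3.
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
modules = np.array([0, 0, 0, 1, 1, 1])

M = modularity(A, modules)
z = within_module_z(A, modules)
P = participation(A, modules)
```

In this toy network only the two bridging nodes (2 and 3) have links outside their own module, so they are the only nodes with a nonzero participation coefficient.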
We demonstrated that labeling residues in terms of these invariants yields information-rich representations of the studied proteins and sketches a new way to link the sequence, structure and dynamical properties of proteins. We discovered a strong invariant character of protein molecules in terms of P/z characterization, pointing to a common topological design of all protein structures. This invariant representation, applied to different protein systems, enabled us to identify the possible functional role of high P/z residues during the folding process. Effectively, this invariance is a cartographic representation of the protein contact network, given by the plot of the residues in the P-z space. Since it is identical for all proteins, it does not embed any structural peculiarities or information for discriminating between different protein folds [11, 12].
We also observed that the modules identified using the procedure outlined above correlated well with early folding units or "foldons"; knowledge of the modules in a given protein can therefore help to identify residues that are critical for folding.
A significant use of the modules identified with our methodology is in the development of algorithms for protein 3D structure determination. In addition, knowing the modules of a protein can help in understanding its folding pathway, since residues with high |P/z| values tend to be protected in the transition state and hence are fixed early in the folding process.
The modules can also be used for engineering new enzymes, which is typically carried out by building a chimera of multiple proteins by cutting and pasting sequences from the respective proteins. Knowledge of the modules can guide the cuts in order to obtain chimeras that can fold in vitro. In addition, models of such early folding units can be invaluable for understanding the biochemical pathways of diseases in which partially folded forms of proteins are pathological, which in turn can support the development of therapeutics.