Accurate descriptions of the different non-covalent interactions involved in protein folding and stability are essential for a number of related problems. Potential energy functions based on such terms have been widely used to facilitate fold recognition [1–3], homology modelling [4, 5], docking, ab-initio structure prediction [7–9], sequence design and the analysis of protein folding kinetics [11, 12]. In each case, the purpose of the potential function is to discriminate between a variety of alternative conformations, selecting the most energetically favourable (assumed to be the most native-like) for further analysis. Different potential energy functions have been defined at different levels of structural resolution. At the atomic level, various pairwise inter-atom potentials (force-fields) have been developed from the detailed analysis of small, protein-like compounds. These include ECEPP [15, 16], MM [17, 18], AMBER [19, 20], CHARMM [21–23] and GROMOS. Potential functions between distinct groups of atoms have also been defined, typically between pairs of residues [8, 25–28] or idealised elements of secondary structure [9, 29–34]. These 'potentials of mean force' (mean-fields) have the nature of free energies [27, 35], and may be derived by conformational averaging or, more commonly, by empirical methods as described below.
There are two commonly used methods for deriving empirical potential energy functions. The first method employs a statistical analysis of the observed 'interactions' [8, 25, 26, 37, 38]. In this method, the observed occurrence of a particular interaction is weighted by its expected occurrence in a given reference state [27, 39]. The resulting statistical interaction propensities can be converted either into energies using the Boltzmann distribution [8, 25, 26, 38] or into log-odds scores [40, 41]. However, it has been shown that these two types of propensity are essentially the same. In the second method, a potential function is optimised directly in order to discriminate between native and near-native (decoy) structures. This technique resembles machine learning, and has been applied in a variety of different ways, usually by maximising the discrimination between an average decoy and the native structure [43–46]. Either of the above two methods may be applied to any feature of the protein structure that can be parameterised. In the current work, we focus on the statistical analysis of residue interaction propensities. Previously, a variety of different methods have been applied to derive empirical residue-residue interaction potentials, often yielding remarkably consistent results. However, the physical basis of the empirically derived potentials remains ambiguous. Specifically, it has been shown that protein structures are inconsistent with the assumptions that underlie the use of the Boltzmann distribution [28, 48].
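The statistical method described above can be illustrated with a minimal sketch. It assumes a simple reference state in which residues pair at random according to their overall composition; this reference state, the function name `interaction_propensities`, the toy data and the choice of kT = 1 are all hypothetical and are not taken from any of the cited methods. The sketch shows why the two conversions are essentially equivalent: under these assumptions the Boltzmann-derived energy is simply the log-odds score multiplied by -kT.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def interaction_propensities(contacts, composition, kT=1.0):
    """Return {(a, b): (log_odds, energy)} for each observed residue pair type.

    contacts    -- list of (residue_a, residue_b) observed interactions
    composition -- dict mapping residue type to its count in the data set
    kT          -- energy scale (assumed value; arbitrary units)

    Assumed reference state: residues pair at random according to composition.
    """
    # Count observed pairs, ignoring the order of the two residues.
    observed = Counter(tuple(sorted(pair)) for pair in contacts)
    total_contacts = sum(observed.values())
    total_residues = sum(composition.values())

    scores = {}
    for a, b in combinations_with_replacement(sorted(composition), 2):
        obs = observed.get((a, b), 0)
        if obs == 0:
            continue  # the log of zero is undefined; skip unobserved pairs
        fa = composition[a] / total_residues
        fb = composition[b] / total_residues
        # Expected fraction of contacts of this type under random pairing.
        expected = fa * fb if a == b else 2.0 * fa * fb
        log_odds = math.log((obs / total_contacts) / expected)
        energy = -kT * log_odds  # inverse Boltzmann relation
        scores[(a, b)] = (log_odds, energy)
    return scores


# Toy data: two leucines (L) and two lysines (K), four observed contacts.
toy_contacts = [("L", "L"), ("L", "L"), ("L", "K"), ("K", "K")]
toy_composition = {"L": 2, "K": 2}
scores = interaction_propensities(toy_contacts, toy_composition)
for pair, (log_odds, energy) in sorted(scores.items()):
    print(pair, round(log_odds, 3), round(energy, 3))
```

In this toy data set, L-L contacts are over-represented relative to random pairing and therefore receive a positive log-odds score and a negative (favourable) energy. In practice the choice of reference state is the critical design decision, as the criticisms discussed in the text make clear.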
The major criticism of empirical residue-residue interaction potentials is that they ignore the protein/solvent boundary [27, 28, 48]. Consequently, there is an apparent attractive force between residues that co-segregate into the protein surface or core regions. To address this, several groups have developed residue-specific environment potentials. These potentials are usually correlated with hydrophobicity, measuring the extent to which each residue is buried in the protein core. In this way, these single-body environment potentials capture information about the protein/solvent boundary. Such potentials have been combined with residue-residue interaction potentials in several ways: as a 'solvent correction factor' [49, 50], as an ad-hoc repulsive term, and within a Bayesian framework to avoid over-counting.
The above combination of two-body, residue-residue interaction potentials with single-body, residue-specific environment potentials raises the question of which type of potential is more specific for the native protein structure. To address this question, we separated statistical residue interaction propensities into two different types of score: a two-body, residue-residue 'contact-type' score, and a single-body, residue 'contact-count' score.
These two types of score can be expected to capture qualitatively different kinds of residue interaction propensities. The resulting propensities can be understood in terms of biophysical properties of protein structure. For example, the contact-type score can encode the fact that hydrophobic residues tend to interact with other hydrophobic residues in preference to hydrophilic residues. In contrast, the contact-count score can encode the fact that bulky hydrophobic residues tend to have more residue-residue interactions than small hydrophilic residues.
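The distinction between the two kinds of information can be sketched in code. The example below is a hypothetical illustration only and is not the scoring scheme used in this work: the contact list is taken as given (no contact definition is assumed), and the function names `split_contacts` and `contact_count_scores`, the toy sequence and the log-odds form of the count score are all assumptions made for the sake of the example.

```python
import math
from collections import Counter, defaultdict


def split_contacts(sequence, contacts):
    """Separate a contact list into two-body and single-body tallies.

    sequence -- string of one-letter residue codes
    contacts -- list of (i, j) index pairs of residues deemed to be in contact
    Returns (pair_types, counts): a Counter of unordered residue-type pairs,
    and a list giving the number of contacts made by each residue.
    """
    pair_types = Counter()
    counts = [0] * len(sequence)
    for i, j in contacts:
        pair_types[tuple(sorted((sequence[i], sequence[j])))] += 1
        counts[i] += 1
        counts[j] += 1
    return pair_types, counts


def contact_count_scores(sequence, counts):
    """Log-odds that a residue type makes n contacts, relative to all residues."""
    overall = Counter(counts)          # how many residues make n contacts
    by_type = defaultdict(Counter)     # the same histogram, per residue type
    for aa, n in zip(sequence, counts):
        by_type[aa][n] += 1
    total = len(sequence)
    scores = {}
    for aa, hist in by_type.items():
        n_aa = sum(hist.values())
        for n, obs in hist.items():
            scores[(aa, n)] = math.log((obs / n_aa) / (overall[n] / total))
    return scores


# Toy example: two leucines (L) followed by two lysines (K).
toy_sequence = "LLKK"
toy_contacts = [(0, 1), (0, 2), (1, 3)]
pair_types, counts = split_contacts(toy_sequence, toy_contacts)
count_scores = contact_count_scores(toy_sequence, counts)
print(pair_types)    # which residue types are in contact (two-body)
print(counts)        # how many contacts each residue makes (single-body)
print(count_scores)
```

The pair tallies carry the information that a contact-type score would draw on (which residue types are paired), whereas the per-residue counts carry the information relevant to a contact-count score (how many partners a residue has, regardless of their identity).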
Here we report a comparison of two-body, residue-residue 'contact-type' scores and single-body, residue 'contact-count' scores, as described below.