Error analysis in the determination of the electron microscopical contrast transfer function parameters from experimental power spectra
 Carlos Oscar S Sorzano^{1, 2},
 Abraham Otero^{1},
 Estefanía M Olmos^{1} and
 José María Carazo^{2}
DOI: 10.1186/1472-6807-9-18
© Sorzano et al; licensee BioMed Central Ltd. 2009
Received: 18 September 2008
Accepted: 26 March 2009
Published: 26 March 2009
Abstract
Background
The transmission electron microscope is used to acquire structural information of macromolecular complexes. However, like any other imaging device, it introduces optical aberrations that must be corrected if high-resolution structural information is to be obtained. The set of all aberrations is usually modeled in Fourier space by the so-called Contrast Transfer Function (CTF). Before correcting for the CTF, we must first estimate it from the electron micrographs. This is usually done by estimating a number of parameters specifying a theoretical model of the CTF. This estimation is performed by minimizing some error measure between the theoretical Power Spectrum Density (PSD) and the experimentally observed PSD. The high noise present in the micrographs, the possible local minima of the error function for estimating the CTF parameters, and the cross-talking between CTF parameters may cause errors in the estimated CTF parameters.
Results
In this paper, we explore the effect of these estimation errors on the theoretical CTF. For the CTF model proposed in [1] we show which are the most sensitive CTF parameters as well as the most sensitive background parameters. Moreover, we provide a methodology to reveal the internal structure of the CTF model (which parameters influence which other parameters) and to estimate the accuracy of each model parameter. Finally, we explore the effect of the variability in the detection of the CTF on CTF phase and amplitude correction.
Conclusion
We show that the estimation errors of the CTF detection methodology proposed in [1] do not cause a significant deterioration of the CTF correction capabilities of subsequent algorithms. Altogether, the methodology described in this paper constitutes a powerful tool for the quantitative analysis of CTF models that can be applied to models other than the one analyzed here.
Background
The transmission electron microscope distorts the structural information contained in the electron micrographs by changing the amplitude of the Fourier coefficients at all spatial frequencies and flipping their phase at certain annular regions [2]. This effect is usually modeled in Fourier space by the Contrast Transfer Function (CTF), which in turn has to be estimated from the electron micrographs. Normally, a theoretical model of the CTF is assumed and the parameters defining this model are optimized so that the experimentally observed PSD and the theoretically predicted PSD coincide as much as possible [1, 3–10]. Therefore, the PSD has to be estimated first. This step is traditionally performed by the fast, although less accurate, periodogram averaging [10–13] or parametric methods, more accurate but much slower to compute [8, 12]. The estimated periodogram can be further enhanced [14] to highlight the Thon rings and, therefore, facilitate the task of fitting the parameters of the theoretical model.
Fully two-dimensional models multiply by three the number of parameters needed since each parameter is allowed to vary in two dimensions. For instance, the defocus is assumed to vary elliptically, thus three parameters are needed for its full description (major axis, minor axis, and the angle between the major axis and the horizontal coordinate axis); the same applies to all parameters varying in 2D. Moreover, rich physical models like the ones in [1, 9, 10] need many parameters to account for a plethora of physical effects. Finally, as shown in [1], rich two-dimensional background models are needed to fully account for the astigmatism introduced not only by the electron microscope but also by the film scanner.
Overall, theoretical CTF models may end up with many parameters, demanding robust optimization algorithms that avoid local minima. Cross-talking between parameters cannot be avoided, i.e., sometimes the same CTF can be obtained with two different sets of CTF parameters. Moreover, the noise present in the electron micrographs propagates to the PSD estimates, no matter how much averaging is performed, and errors in the estimates of the CTF parameters are to be expected. Sensitivity analysis [15] is a branch of mathematics studying how errors at the input of a mathematical model translate into errors at its output.
In this paper, we propose to use sensitivity analysis to identify those CTF parameters that have the strongest influence on the estimation of the CTF. Knowing this list of "sensitive" parameters, simpler models for the CTF can be proposed (as will be shown in the Results Section, two parameters of the CTF model analyzed can be safely removed). We also propose the use of bootstrap resampling to estimate the accuracy in the estimation of each individual parameter and to reveal the internal structure of the model: which model parameters influence a given parameter; for instance, which are the model parameters influencing the defoci? The bootstrap resampling also allows us to estimate the experimental distribution of each CTF parameter. Confidence intervals for each CTF parameter can be computed using these experimental distributions. We use these confidence intervals to identify which parameters are not significantly different from zero, and therefore can be omitted from the CTF model. Finally, we use Factor Analysis in order to further clarify the internal structure of the CTF model (thanks to this analysis, the CTF parameters can be divided into different groups, each one accounting mainly for a different part of the CTF). In this work we apply the general principles of sensitivity analysis to the identification of the most relevant parameters in the CTF model introduced by [1], as well as their internal relationships and the accuracy of their estimates.
Moreover, we explore the effect of the variability in the estimation of the CTF parameters on subsequent algorithms for CTF correction. In particular, we analyzed its effects on CTF phase correction and CTF amplitude correction using the Iterative Data Refinement (IDR) [16]. We show that, in our experiments, the estimation errors of the CTF detection performed in [1] do not significantly deteriorate the CTF correction.
Results and discussion
As described in the Methods section, the average sensitivity of the CTF with respect to a given parameter (Eq. 21) is a measure of how variations in that parameter translate into variations of the CTF. In a similar manner we also define the average sensitivity of the PSD and the average sensitivity of the first zero of the CTF with respect to a given parameter. In the following sections we present and discuss our results.
Results
To estimate the sensitivity of the CTF to the model parameters we used two sets of experimental images (LTag and GltS) corresponding to samples embedded in ice with no carbon. The LTag images correspond to the Large T antigen [17, 18], while the GltS images correspond to the Glutamate synthase [19]. The LTag images had a sampling rate of 5.6 Å per pixel while the GltS images were digitized with a pixel size of 1.59 Å. The two datasets used in this paper are the same ones used in [1]. In all, we studied the sensitivity of the parameters using a total of 217 micrographs. The rationale for employing two distinct datasets is to avoid biasing the statistical analysis by using a single type of micrograph. The fact that both datasets have very different sampling rates helps in the analysis of the effect of the sampling rate. The average sensitivity of the CTF with respect to a given parameter (Eq. 21) was evaluated as follows. For each micrograph and parameter, the CTF was studied with the value estimated by the CTF fitting program [1]. Then, we perturbed each parameter individually by a small amount, as described in the Methods section, in order to test its influence on the CTF. In particular, we studied variations of -20%, -10%, -5%, -2%, -1%, 1%, 2%, 5%, 10%, and 20% (the expected value in Eq. 21 was computed for each variation and the resulting sensitivities were averaged). For a particular variation of the parameter, we computed the integral in Eq. 21. The set of all perturbations and all micrographs empirically defined the statistical distribution of the sensitivity over which an ensemble average was taken.
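The perturbation procedure just described can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of [1]: the one-dimensional function `toy_ctf`, the helper `average_sensitivity`, the parameter values, and the choice of measuring the change as the integrated absolute difference divided by the relative perturbation are all hypothetical stand-ins for the model and for Eq. 21.

```python
import numpy as np

# Relative perturbations listed in the text: -20% ... +20%
PERTURBATIONS = np.array([-0.20, -0.10, -0.05, -0.02, -0.01,
                          0.01, 0.02, 0.05, 0.10, 0.20])

def toy_ctf(R, params):
    """Hypothetical 1D CTF: sine of a defocus plus spherical-aberration phase."""
    lam, df, cs = params["lam"], params["defocus"], params["cs"]
    phase = np.pi * lam * df * R**2 + 0.5 * np.pi * cs * lam**3 * R**4
    return np.sin(phase)

def average_sensitivity(model, params, name, R):
    """Average over all perturbations of
    integral |model(perturbed) - model(reference)| dR / |relative perturbation|.
    (Assumed measure; the paper's Eq. 21 is not reproduced here.)"""
    reference = model(R, params)
    dR = R[1] - R[0]
    values = []
    for eps in PERTURBATIONS:
        perturbed = dict(params)
        perturbed[name] = params[name] * (1.0 + eps)   # perturb one parameter only
        diff = np.abs(model(R, perturbed) - reference)
        values.append(diff.sum() * dR / abs(eps))
    return float(np.mean(values))

R = np.linspace(0.0, 0.1, 2001)                        # spatial frequency, 1/Å
params = {"lam": 0.025, "defocus": 20000.0, "cs": 2.0e7}   # Å (placeholder values)
sens = {name: average_sensitivity(toy_ctf, params, name, R) for name in params}
```

Ranking the entries of `sens` gives the kind of ordering reported in the sensitivity tables; with real data the average would also run over all micrographs.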
CTF Sensitivity
Parameter  CTF sensitivity (% of maximum)  Parameter  PSD sensitivity (% of maximum)  Parameter  First-zero sensitivity (% of maximum)

V  61.23 (100.00)  s _{ m }  0.4890 (100.00)  V  0.2901 (100.00)
Δf_{ M }  37.93 (61.95)  s _{ M }  0.4613 (94.34)  Δf_{ M }  0.2507 (86.42)
Δf_{ m }  37.45 (61.16)  b  0.2447 (50.04)  Δf_{ m }  0.2497 (86.07)
C _{ a }  6.132 (10.01)  K _{ G }  0.2441 (49.92)  Q _{0}  0.0331 (11.41)
ΔV/V  5.32 (8.69)  V  0.0371 (7.59)  θ  0.0053 (1.83)
θ  1.51 (2.47)  Δf_{ m }  0.0288 (5.89)  T _{ m }  0.0010 (0.34)
Q _{0}  0.96 (1.57)  Δf_{ M }  0.0285 (5.83)  C _{ s }  0.0001 (0.03)
α  0.80 (1.31)  K  0.0096 (1.96)  ΔV/V  NA
C _{ s }  0.78 (1.27)  C _{ a }  0.0061 (1.25)  C _{ a }  NA
K  0.72 (1.18)  ΔV/V  0.0058 (1.19)  K  NA
ΔR  0.24 (0.39)  K _{ s }  0.0037 (0.76)  α  NA
ΔF  0.03 (0.05)  C _{ m }  0.0025 (0.51)  ΔR  NA
T _{ m }  NA  K _{ g }  0.0022 (0.45)  ΔF  NA
-  -  G _{ M }  0.0021 (0.43)  -  -
-  -  C _{ M }  0.0019 (0.39)  -  -
-  -  G _{ m }  0.0018 (0.37)  -  -
-  -  Q _{0}  0.0015 (0.31)  -  -
-  -  θ _{ G }  0.0013 (0.27)  -  -
-  -  c _{ m }  0.0013 (0.27)  -  -
-  -  α  0.0011 (0.22)  -  -
-  -  g _{ M }  0.0010 (0.20)  -  -
-  -  g _{ m }  0.0008 (0.16)  -  -
-  -  θ  0.0007 (0.14)  -  -
-  -  c _{ M }  0.0004 (0.08)  -  -
-  -  ΔR  0.0002 (0.04)  -  -
-  -  θ _{ s }  0.0002 (0.04)  -  -
-  -  C _{ s }  0.0001 (0.02)  -  -
-  -  θ _{ g }  0.0001 (0.02)  -  -
-  -  ΔF  NA  -  -
-  -  T _{ m }  NA  -  -
The highest values of sensitivity are found in the sensitivity of the CTF. Interestingly, within this measure, the most significant parameter is the microscope voltage (V), which is a user-supplied parameter. Not surprisingly, the energy spread of the electrons (ΔV/V) is a related magnitude and is also a parameter with a high sensitivity. The next two most sensitive parameters are the defoci and the chromatic aberration.
Considering the sensitivity of the PSD, parameters from the background PSD (s_{ m }, s_{ M }, b and K_{ G }) are by far the most sensitive. Of the CTF parameters, only the microscope voltage and the defoci have a significant weight. As expected, in the case of the sensitivity of the first zero, the most sensitive values defining the first zero of the CTF are the microscope voltage V, the defoci and the fraction of electrons being scattered Q_{0}.
CTF Overall sensitivity and accuracy
Parameter  Symbol  Overall Sensitivity  Accuracy (%) 

Microscope voltage  V  207.59  NA 
Major defocus  Δf_{ M }  154.19  1.18 
Minor defocus  Δf_{ m }  153.13  1.07 
Background PSD  s _{ m }  100.00  5.50 
Background PSD  s _{ M }  94.34  4.38 
Background PSD  b  50.04  5.26 
Background PSD  K _{ G }  49.92  11.91 
Fraction of scattered electrons  Q _{0}  13.28  21.86 
Chromatic aberration  C _{ a }  11.26  14.29 
Energy spread  ΔV/V  9.87  52.95 
Defocus azimuthal angle  θ  4.44  NA 
CTF Gain  K  3.14  7.41 
Aperture semiangle  α  1.53  54.23 
Spherical aberration  C _{ s }  1.33  NA 
Background PSD  K _{ s }  0.76  14.70 
Background PSD  C _{ m }  0.51  40.61 
Background PSD  K _{ g }  0.45  11.22 
Focal plane displacement  ΔR  0.43  NA 
Background PSD  G _{ M }  0.43  23.58 
Background PSD  C _{ M }  0.39  38.18 
Background PSD  G _{ m }  0.37  29.72 
Sampling rate  T _{ m }  0.34  NA 
Background PSD  c _{ m }  0.27  40.60 
Background PSD  θ _{ G }  0.27  NA 
Background PSD  g _{ M }  0.20  17.51 
Background PSD  g _{ m }  0.16  13.29 
Background PSD  c _{ M }  0.08  19.07 
Perpendicular displacement  ΔF  0.05  NA 
Background PSD  θ _{ s }  0.04  NA 
Background PSD  θ _{ g }  0.02  NA 
1. Simplified model 1: Since the CTF is usually computed on non-drifted images and, as shown by the sensitivity analysis, the PSD is not very sensitive to the drift parameters (ΔF and ΔR), our simplified model 1 does not estimate these two parameters and sets them to 0. This model has a total of 25 parameters to be estimated.
2. Simplified model 2: The last step of the PSD estimation is the computation of the subtractive Gaussian parameters (K_{ g }, c_{ M }, c_{ m }, θ_{ g }, g_{ M } and g_{ m }). In our experience this last Gaussian helps to accurately fit the low frequencies. However, as shown by the sensitivity analysis, the theoretical PSD is not too sensitive to these parameters, so we also estimated a simplified model without this last Gaussian (and without the parameters already removed in the Simplified model 1). There is a total of 19 parameters to be estimated.
3. Simplified model 3: In this simplified model we forced the remaining Gaussian (the one with positive sign in the background PSD) to be symmetric (G_{ M } = G_{ m }, C_{ M } = C_{ m } and θ_{ G } = 0), besides all the simplifications already made in the Simplified model 2. This leaves only 16 parameters to be estimated.
To evaluate the estimation accuracy of each of the model parameters, the bootstrap resampling strategy described in the "Accuracy of the CTF estimates" Section was followed. One thousand random samples were extracted from the dataset of a single micrograph of the LTag group. All fitted models had the same user-supplied parameters (microscope voltage, sampling rate and spherical aberration). In view of the results of our previous experiment, we removed the perpendicular and focal plane displacements from the model. After solving the corresponding one thousand regression problems, the ensemble of all model parameters was collected. 3.6% of these regression problems were considered failures of the algorithm to correctly estimate the model parameters, and the corresponding models were deleted from the dataset. The accuracy of an estimate was measured as the ratio between the median of absolute deviations (MAD, a robust equivalent of the standard deviation) and the median of the absolute value of the parameter being considered (a robust equivalent of the mean). Working with medians is a robust way of estimating the central position of a distribution. The accuracy of the value of the goal function being minimized in [1] of the remaining 97.4% bootstrapped samples was 0.5%. This low value indicates that the remaining bootstrapped models were quite homogeneous with respect to the regression error. The accuracy of each parameter was estimated and the resulting values are listed in Table 2. Entries with NA indicate that the accuracy was not available because the parameter is supplied by the user, or the parameter has not been estimated (displacements), or the parameter is meaningless in this case (the image used for the example was not astigmatic and therefore the angles of the ellipses involved in the model can take any value).
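The accuracy measure defined above (MAD divided by the median of the absolute value) can be written in a few lines. The function name `robust_accuracy` is hypothetical, and the sketch assumes the MAD is used without any normalizing constant, since the text does not mention one.

```python
import numpy as np

def robust_accuracy(samples):
    """Accuracy in percent: median absolute deviation / median of |value|.
    Assumes the unscaled MAD (no 1.4826 consistency factor)."""
    samples = np.asarray(samples, dtype=float)
    mad = np.median(np.abs(samples - np.median(samples)))
    return 100.0 * mad / np.median(np.abs(samples))

# Example: bootstrapped estimates of one parameter scattered around 100
accuracy = robust_accuracy([98.0, 99.0, 100.0, 101.0, 102.0])
```

For this small example the MAD is 1 and the median absolute value is 100, so the accuracy is 1%.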
Correlation of each model parameter with the rest of model parameters
Parameter  Symbol  Correlated parameters 

Major defocus  Δf_{ M }  Q_{0} (0.87), Δf_{ m }(0.86), b (0.15), K_{ g }(0.19), c_{ m }(0.13), c_{ M }(0.10), g_{ m }(0.17), g_{ M }(0.10), K_{ s }(0.15), s_{ m }(0.15), s_{ M }(0.11) 
Minor defocus  Δf_{ m }  Q_{0} (0.88), Δf_{ M }(0.86), b (0.21), K_{ g }(0.22), c_{ m }(0.18), c_{ M }(0.10), g_{ m }(0.17), g_{ M }(0.10), K_{ s }(0.21), s_{ m }(0.20), s_{ M }(0.16) 
Background PSD  s _{ m }  K_{ s }(0.96), s_{ M }(0.98), b (0.74), K_{ G }(0.67), C_{ M }(0.31), G_{ M }(0.31), G_{ m }(0.26), K_{ g }(0.19), c_{ m }(0.16), g_{ M }(0.27), g_{ m }(0.11), Q_{0} (0.11), Δf_{ m }(0.20), Δf_{ M }(0.15), α (0.20), K (0.12) 
Background PSD  s _{ M }  K_{ s }(0.95), s_{ m }(0.98), b (0.67), K_{ G }(0.62), C_{ M }(0.27), G_{ M }(0.30), G_{ m }(0.26), K_{ g }(0.22), c_{ m }(0.19), g_{ M }(0.28), Δf_{ m }(0.16), Δf_{ M }(0.11), α (0.22), K (0.19) 
Background PSD  b  K_{ s }(0.83), s_{ m }(0.74), s_{ M }(0.67), K_{ G }(0.47), C_{ m }(0.13), G_{ M }(0.55), G_{ m }(0.49), K_{ g }(0.23), c_{ m }(0.14), g_{ M }(0.15), g_{ m }(0.10), Δf_{ m }(0.21), Δf_{ M }(0.15), α (0.12) 
Background PSD  K _{ G }  K_{ s }(0.67), s_{ m }(0.67), s_{ M }(0.62), b (0.47), C_{ M }(0.58), C_{ m }(0.35), G_{ M }(0.13), c_{ M }(0.33), α (0.12) 
Fraction of scattered electrons  Q _{0}  Δf_{ m }(0.88), Δf_{ M }(0.87), K_{ g }(0.29), c_{ m }(0.29), c_{ M }(0.22), g_{ m }(0.16), G_{ m }(0.17), G_{ m }(0.14), c_{ M }(0.12), K_{ s }(0.12), s_{ m }(0.11) 
Chromatic aberration  C _{ a }  ΔV/V (0.71) 
Energy spread  ΔV/V  C_{ a }(0.71), K (0.19), α (0.14) 
CTF Gain  K  α (0.74), ΔV/V (0.19), s_{ M }(0.19), s_{ m }(0.12), K_{ g }(0.10), g_{ m }(0.11) 
Aperture semiangle  α  K (0.74), ΔV/V (0.14), K_{ s }(0.20), s_{ M }(0.22), s_{ m }(0.20), K_{ g }(0.30), K_{ G }(0.12), b (0.12) 
Background PSD  K _{ s }  s_{ m }(0.96), s_{ M }(0.95), b (0.83), K_{ G }(0.67), C_{ M }(0.27), G_{ M }(0.34), G_{ m }(0.29), K_{ g }(0.31), c_{ m }(0.24), g_{ M }(0.23), Δf_{ m }(0.21), Δf_{ M }(0.15), α (0.20), Q_{0} (0.12) 
Background PSD  C _{ m }  K_{ G }(0.35), C_{ M }(0.68), G_{ M }(0.60), G_{ m }(0.56), b (0.13), K_{ g }(0.20), c_{ M }(0.21), g_{ M }(0.15) 
Background PSD  K _{ g }  c_{ M }(0.60), c_{ m }(0.55), K_{ s }(0.31), s_{ M }(0.22), s_{ m }(0.19), b (0.23), K (0.10), α (0.30), Q_{0} (0.29), Δf_{ M }(0.19), Δf_{ m }(0.22), C_{ m }(0.20), G_{ M }(0.10) 
Background PSD  G _{ M }  K_{ G }(0.13), C_{ M }(0.66), C_{ m }(0.60), G_{ m }(0.86), b (0.55), K_{ s }(0.34), s_{ M }(0.30), s_{ m }(0.31), K_{ g }(0.10), c_{ m }(0.11), Q_{0} (0.14) 
Background PSD  C _{ M }  K_{ G }(0.58), C_{ m }(0.68), G_{ M }(0.66), G_{ m }(0.66), K_{ s }(0.27), s_{ M }(0.27), s_{ m }(0.31), c_{ M }(0.21), c_{ m }(0.17), g_{ M }(0.27), g_{ m }(0.14), Q_{0} (0.13) 
Background PSD  G _{ m }  C_{ M }(0.66), C_{ m }(0.56), G_{ M }(0.86), b (0.49), K_{ s }(0.29), s_{ M }(0.26), s_{ m }(0.26), c_{ m }(0.17), Q_{0} (0.17) 
Background PSD  c _{ m }  K_{ g }(0.55), c_{ M }(0.71), g_{ m }(0.31), g_{ M }(0.27), b (0.14), K_{ s }(0.24), s_{ M }(0.19), s_{ m }(0.16), C_{ M }(0.17), G_{ M }(0.11), G_{ m }(0.17), Q_{0} (0.29), Δf_{ m }(0.17), Δf_{ M }(0.13) 
Background PSD  g _{ M }  c_{ M }(0.34), c_{ m }(0.27), g_{ m }(0.71), b (0.15), K_{ s }(0.23), s_{ M }(0.28), s_{ m }(0.27), C_{ M }(0.27), c_{ m }(0.15), Δf_{ m }(0.10), Δf_{ M }(0.10) 
Background PSD  g _{ m }  c_{ M }(0.36), c_{ m }(0.31), g_{ M }(0.71), b (0.10), s_{ m }(0.11), C_{ M }(0.14), K (0.11), Q_{0} (0.16), Δf_{ m }(0.17), Δf_{ M }(0.17) 
Background PSD  c _{ M }  K_{ g }(0.60), c_{ m }(0.71), g_{ m }(0.36), g_{ M }(0.34), K_{ G }(0.33), C_{ M }(0.21), C_{ m }(0.21), Q_{0} (0.22), Δf_{ m }(0.10), Δf_{ M }(0.10) 
Factor loadings greater than 0.5 for the first seven factors of a factor analysis with ten factors of the bootstrapped ensemble of model parameters
Factor  Loadings greater than 0.5 

Factor 1  s_{ M }(0.98), s_{ m }(0.97), K_{ s }(0.96), K_{ G }(0.75), b (0.73) 
Factor 2  C_{ M }(0.90), G_{ M }(0.87), G_{ m }(0.84), C_{ m }(0.72) 
Factor 3  Q_{0} (0.94), Δf_{ M }(0.92), Δf_{ m }(0.92) 
Factor 4  c_{ M }(0.83), c_{ m }(0.80), K_{ g }(0.76) 
Factor 5  K (0.94), α (0.87) 
Factor 6  g_{ M }(0.96), g_{ m }(0.74) 
Factor 7  C_{ a }(0.99), ΔV/V (0.72) 
Discussion
From the experiments performed, it turns out that the most important parameter when dealing with the CTF is the microscope voltage. This fact will certainly not come as a surprise to any practitioner in the field, but it clearly stresses the point that small inaccuracies in its provision (most CTF estimation algorithms rely on the user providing this value manually rather than calculating it automatically) result in large variations in the CTF-related quantities. Since the CTF estimation algorithms try to fit the theoretically predicted PSD to the experimentally observed PSD as closely as possible, this probably means that there is the possibility of a strong cross-talking between the microscope voltage and the rest of the CTF parameters. Fortunately, the microscope readings of the voltage are valid up to a few tens of Volts (2 ppm/minute in a JEOL 3011), meaning that the accuracy in the estimation of this parameter is well below 0.01%.
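As a rough illustration of how a voltage error propagates, the sketch below evaluates the standard relativistic electron wavelength, through which the acceleration voltage usually enters CTF models, and the relative wavelength change caused by a 0.01% voltage error. The function names are hypothetical, and the code is not taken from [1]; it only uses the textbook de Broglie relation.

```python
import math

H = 6.62607015e-34        # Planck constant, J s
M0 = 9.1093837015e-31     # electron rest mass, kg
E = 1.602176634e-19       # elementary charge, C
C = 299792458.0           # speed of light, m/s

def electron_wavelength_angstrom(voltage):
    """Relativistic de Broglie wavelength (in Å) for an acceleration voltage in volts."""
    energy = E * voltage
    momentum = math.sqrt(2.0 * M0 * energy * (1.0 + energy / (2.0 * M0 * C**2)))
    return H / momentum * 1e10

def relative_wavelength_change(voltage, relative_error):
    """Relative change of the wavelength when the voltage is off by relative_error."""
    reference = electron_wavelength_angstrom(voltage)
    perturbed = electron_wavelength_angstrom(voltage * (1.0 + relative_error))
    return (perturbed - reference) / reference

lam_200kv = electron_wavelength_angstrom(200e3)       # about 0.0251 Å
change = relative_wavelength_change(200e3, 1e-4)      # 0.01% voltage error
```

A 0.01% error in the voltage changes the wavelength by less than 0.01%, which gives a feeling for the size of the perturbation discussed above.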
The second set of most sensitive parameters are the ones modeling the defoci, which are estimated by all CTF estimation programs. Defoci alone are not enough to correct for any aberration caused by the CTF. For a 1D correction of the phase, at least Q_{0} is needed (which also has a relatively large weight on the sensitivity of the CTF zeros, although not as large as those of the microscope voltage and the defoci). Some programs estimate Q_{0}, although in other programs it is directly input by the user. Again, due to its medium sensitivity, small errors in the user estimation of Q_{0} probably turn into medium errors in the estimate of the zeros, or into a medium cross-talking with the other CTF parameters. As shown by the factor analysis, Q_{0} changes correlate well with changes in the defoci. Thus, the defoci values are strongly affected by the estimation of Q_{0}. However, as shown by the bootstrap analysis, the accuracy of our algorithm in the estimation of the defoci values in the experiment run was around 1%, meaning that this estimate is rather stable. Note that Q_{0} has to be estimated mostly at low frequencies. In this region of the spectrum there is an important contribution of the amplitude contrast, and the high background arising from the direct electron beam and inelastic scattering makes the estimation difficult. Therefore, it might be good to estimate Q_{0} by some other means [22, 23]. A 2D phase correction also needs the estimation of the azimuthal angle θ. Although the micrograph dataset of our experiment was not perfectly non-astigmatic, there was no micrograph with large astigmatism. This resulted in a relatively low sensitivity to θ in the three measured quantities. However, if strongly astigmatic images had been recorded, the sensitivity to this parameter might have been much larger. The next most sensitive parameters are related to the background PSD (s_{ m }, s_{ M }, b and K_{ G }).
s_{ m } and s_{ M } take care of the PSD shape at low frequencies, K_{ G } takes care of the medium frequency range, and b explains the background PSD at high frequencies. This means that it is important to achieve a good fit over the whole spectrum. Of course, these parameters are only important if the full experimental PSD is to be fitted. Fitting of the background is absolutely crucial if an accurate amplitude correction is to be performed. The high sensitivity of the PSD to the background PSD highlights the importance of a good background fitting or background subtraction. Those programs that estimate the CTF zeros by first subtracting the background need to be sure that the subtracted background is not modifying the positions of the zeros. As shown by the internal structure of the regression model revealed by bootstrap resampling, there is a significant "cross-talking" between the background parameters and the defoci. Finally, the two most important parameters of the CTF envelope decay are the chromatic aberration and the energy spread of the electrons at the source. Both parameters affect the E_{ spread } term, which depends on the fourth power of the frequency (R^{4}) and not on the second power as a Gaussian does. However, the coherence envelope E_{ coherence } has a Gaussian dependence on frequency and is governed by the sensitivity to the defoci, which is much larger than that of the chromatic aberration or the energy spread. The coefficient of R^{2} in E_{ coherence } is π^{2}α^{2}Δf(R)^{2}, which means that the envelope is also astigmatic for astigmatic images. Any program that does not fit an astigmatic Gaussian envelope cannot properly correct for the amplitude decay of astigmatic images. On the other hand, E_{ spread } is not astigmatic. According to the relative sensitivities, E_{ coherence } is more important than E_{ spread }.
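The direction dependence of the coherence envelope can be illustrated numerically. In the sketch below the coefficient π^{2}α^{2}Δf(R)^{2} is taken from the text, while the elliptical form of Δf as a function of direction (`elliptical_defocus`) is an assumed parameterisation chosen for illustration; the ellipse equations of the model in [1] are not reproduced here, and all numerical values are placeholders.

```python
import numpy as np

def elliptical_defocus(phi, df_major, df_minor, theta):
    """Assumed elliptical defocus: radius of an ellipse with semi-axes
    df_major and df_minor, rotated by theta, along direction phi (radians)."""
    a = phi - theta
    return 1.0 / np.sqrt((np.cos(a) / df_major) ** 2 + (np.sin(a) / df_minor) ** 2)

def coherence_coefficient(phi, alpha, df_major, df_minor, theta):
    """Coefficient pi^2 alpha^2 Δf(R)^2 multiplying R^2 in the Gaussian exponent."""
    return (np.pi * alpha * elliptical_defocus(phi, df_major, df_minor, theta)) ** 2

phi = np.linspace(0.0, np.pi, 181)                      # direction of R, radians
astigmatic = coherence_coefficient(phi, 1e-4, 22000.0, 18000.0, 0.0)
isotropic = coherence_coefficient(phi, 1e-4, 20000.0, 20000.0, 0.0)
anisotropy = astigmatic.max() / astigmatic.min()        # (22000 / 18000)^2
```

With equal defoci the coefficient is the same in every direction, whereas with different major and minor defoci it changes with the direction of R, so a rotationally symmetric Gaussian cannot reproduce it.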
We also explored a methodology to determine the accuracy in the estimation of each parameter. For this we made use of bootstrap resampling to build an empirical distribution of each parameter. From this distribution we were able to estimate the accuracy of each parameter. It is interesting to see that under similar fitting conditions (the accuracy of the goal function of the regression was 0.5%, meaning that all the bootstrapped models were similar in explanatory power), the most important parameters are very precisely estimated (1% in the case of the defoci, and about 5% in the case of s_{ m }, s_{ M } and b). The rest of the parameters are much less concentrated around a central value and can vary much more (some of them, like the aperture semiangle, can vary up to 54% without much affecting the regression goal function).
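The bootstrap idea can be sketched on a toy regression. Everything here is an assumption made for illustration: a straight-line fit stands in for the CTF fitting program of [1], the data are synthetic, and the percentile interval is one common way of building the confidence intervals mentioned in the Background.

```python
import numpy as np

def bootstrap_fit(x, y, fit, n_resamples=500, seed=0):
    """Refit the model on data resampled with replacement; one row per resample."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)        # sample n points with replacement
        estimates.append(fit(x[idx], y[idx]))
    return np.array(estimates)

def line_fit(x, y):
    """Toy regression standing in for the CTF fit: returns [slope, intercept]."""
    return np.polyfit(x, y, 1)

# Synthetic data: y = 2 x + 1 plus noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=x.size)

ensemble = bootstrap_fit(x, y, line_fit)                  # empirical distribution
low, high = np.percentile(ensemble, [2.5, 97.5], axis=0)  # 95% percentile interval
differs_from_zero = (low > 0) | (high < 0)                # interval excludes zero
```

Each column of `ensemble` is the empirical distribution of one parameter; a parameter whose interval contains zero would be a candidate for removal from the model.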
The ensemble of models stemming from the bootstrap resampling also allows us to identify which model parameters influence a given parameter by identifying statistically significant correlations. For each model parameter, a set of significantly correlated parameters is computed and shown in Tables 3 and 4. It is interesting to see that there is a non-negligible correlation between all the components of the background PSD, meaning that similar explanatory power can be attained simply by shifting part of the information from one background component to another. The sign of the correlation indicates whether a given parameter must be increased or decreased if another parameter is increased. It is also important to recognize the non-negligible correlation between the two most important parameters (defoci) and the PSD model at low frequencies (explained by Q_{0}, the baseline b, the term headed by K_{ s } and the term headed by K_{ g }). Most of these terms correspond to the background estimation. This implies that the background must be carefully estimated rather than simply subtracted after a rough estimation provided by a low-pass filter of the experimental PSD. A theoretical model for the background PSD is lacking in electron microscopy, and instead we use an arbitrary model that has been shown to perform well with micrographs. However, more research should be carried out in this direction to correctly identify a physically justified model for the background PSD.
The use of Factor Analysis with the bootstrapped ensemble of models allowed us to identify the main components of the regression. Seven groups of parameters were identified by keeping only those factors whose associated eigenvalue was larger than 1. These groups of parameters explain different aspects of the regression, and within each group the parameters are strongly correlated with each other. The groups identified were:
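Given a bootstrapped ensemble (one row per fitted model, one column per parameter), the correlated parameters of each model parameter can be listed as in the sketch below. The helper name, the synthetic data and the use of a plain threshold on the absolute correlation are assumptions; the significance test actually used for the tables is not specified in this section.

```python
import numpy as np

def correlated_parameters(ensemble, names, threshold=0.1):
    """For each parameter, list (partner, correlation) pairs with |r| >= threshold,
    sorted by decreasing absolute correlation."""
    corr = np.corrcoef(ensemble, rowvar=False)     # columns are parameters
    table = {}
    for i, name in enumerate(names):
        partners = [(names[j], round(float(corr[i, j]), 2))
                    for j in range(len(names))
                    if j != i and abs(corr[i, j]) >= threshold]
        table[name] = sorted(partners, key=lambda p: -abs(p[1]))
    return table

# Synthetic ensemble: p2 compensates p1 (negative correlation), p3 is independent
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
ensemble = np.column_stack([a,
                            -a + 0.2 * rng.normal(size=1000),
                            rng.normal(size=1000)])
table = correlated_parameters(ensemble, ["p1", "p2", "p3"], threshold=0.5)
```

The sign kept in each pair tells whether one parameter must decrease or increase when its partner increases.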

Oscillatory behavior of the CTF: through the parameters Q_{0}, Δf_{ M }and Δf_{ m }

Amplitude and coherence decay of the CTF: controlled by the parameters K and α.

Energy spread decay of the CTF: controlled by the parameters C_{ a }and ΔV/V.

General fitting of the background PSD: particularly through the parameters s_{ M }, s_{ m }, K_{ s } (low-frequency), K_{ G } (medium-frequency), b (high-frequency). There is a strong "cross-talking" between all the components.

Fitting of the background PSD at medium frequencies: through the parameters C_{ M }, C_{ m }, G_{ M }and G_{ m }.

Fitting of the background PSD at low frequencies: with the parameters c_{ M }, c_{ m }and K_{ g }controlling the amplitude and location of this low frequency model.

Fitting of the background PSD at low frequencies: with the parameters g_{ M }and g_{ m }controlling the width of this low frequency model. Note that this set of parameters is not so much correlated to the previous set controlling different features of the same part of the model.
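The rule used above to decide how many factors to keep (eigenvalue larger than 1) can be sketched as follows. This covers only the eigenvalue screening of the correlation matrix, not the full Factor Analysis with its loadings, and the synthetic ensemble with two hidden groups of three parameters is an assumption made for illustration.

```python
import numpy as np

def count_factors_eigenvalue_rule(ensemble):
    """Number of eigenvalues of the correlation matrix that exceed 1,
    together with the eigenvalues sorted in decreasing order."""
    corr = np.corrcoef(ensemble, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    return int(np.sum(eigenvalues > 1.0)), eigenvalues

# Synthetic ensemble: six parameters driven by two independent latent factors
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 1000))
noise = 0.3 * rng.normal(size=(1000, 6))
ensemble = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

n_factors, eigenvalues = count_factors_eigenvalue_rule(ensemble)
```

In this toy case two eigenvalues exceed 1, one per group of mutually correlated parameters, which mirrors how groups of cross-talking parameters show up as separate factors.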
The Factor Analysis reveals the internal structure of the cross-talking between parameters. As can be seen in the following example, cross-talking between parameters is unavoidable. Let us consider, for instance, the group formed by C_{ a } and ΔV/V. It participates exclusively in the envelope due to the beam energy spread. It can be easily seen in Eq. (7) that increases in C_{ a } can be compensated by decreases in ΔV/V and vice versa (this also explains the different signs of these two parameters with respect to Factor 7 in Table 5).
Our analysis of the effect of the variability of the CTF estimation on the CTF correction, either through CTF phase correction or CTF amplitude correction, shows that in the experiment performed there is no significant difference between the FSC of the volume corrected with the truly applied CTF and the FSC of the volume corrected using the bootstrap ensemble. This suggests that the different estimates around the true value obtained with the algorithm of [1] can be successfully used for CTF correction.
Finally, although not considered in this work, we would like to comment on the effect of the micrograph recording support (film and film scanner, or CCD camera). To the best of our knowledge, none of the CTF models published so far considers the effect of the Modulation Transfer Function (MTF) of the recording support. Recording supports are usually considered to behave as low-pass filters with a relatively flat band-pass region within which the microscopic information is supposed to fit. If the MTF actually modulated the amplitudes of the microscopic information, this would translate into variations of the areas of the CTF and the PSD analyzed in this paper, but not into variations in the positions of the zeros. This means that the MTF has no effect on the sensitivity analysis performed for the first zero of the CTF. The effect of a monotonically decaying MTF on the analysis performed in this paper would be a decrease in the overall sensitivity of all the parameters (since all the areas under the CTF and PSD would be smaller).
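The compensation between C_{ a } and ΔV/V can be checked numerically. Since Eq. (7) is not reproduced in this section, the sketch assumes an envelope of the form exp(-(π/4 · C_a · λ · ΔV/V)² R⁴ / ln 2), which is of the kind used in related CTF models; the only property that matters for the argument is that the two parameters enter through their product. All numerical values are placeholders.

```python
import numpy as np

def spread_envelope(R, ca, dv_over_v, lam):
    """Assumed energy-spread envelope; depends on ca and dv_over_v only
    through their product and decays with the fourth power of R."""
    k = (0.25 * np.pi * ca * lam * dv_over_v) ** 2 / np.log(2.0)
    return np.exp(-k * R**4)

R = np.linspace(0.0, 0.3, 301)                    # spatial frequency, 1/Å
lam = 0.025                                        # wavelength, Å (placeholder)
e1 = spread_envelope(R, 2.0e7, 1.0e-5, lam)       # reference parameters
e2 = spread_envelope(R, 4.0e7, 0.5e-5, lam)       # Ca doubled, ΔV/V halved
identical = bool(np.allclose(e1, e2))
```

Because both parameter sets give the same envelope, a fit based on the PSD alone cannot distinguish them, which is the cross-talking captured by Factor 7.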
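The statement about the MTF can be illustrated with a toy example. The one-dimensional CTF, the exponential MTF and all numerical values below are hypothetical; the sketch only checks that multiplying by a strictly positive, decaying function leaves the zero crossings where they were while reducing the area under the curve.

```python
import numpy as np

def zero_crossings(values):
    """Indices where the sign of consecutive samples changes."""
    return np.where(np.diff(np.sign(values)) != 0)[0]

R = np.linspace(0.0, 0.1, 2001)                       # spatial frequency, 1/Å
ctf = np.sin(np.pi * 0.025 * 20000.0 * R**2)          # toy CTF (defocus term only)
mtf = np.exp(-R / 0.05)                                # hypothetical positive, decaying MTF

same_zeros = np.array_equal(zero_crossings(ctf), zero_crossings(ctf * mtf))
area_ratio = np.abs(ctf * mtf).sum() / np.abs(ctf).sum()   # < 1: smaller area
```

The smaller area corresponds to the lower overall sensitivity mentioned above, while the unchanged zero positions explain why the first-zero analysis is unaffected.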
Conclusion
In this article we have devised a mathematical methodology to quantitatively analyze CTF models. This mathematical framework gives a clue about the sensitivity of each CTF parameter, the origin of cross-talking between parameters and which parameters are more likely to induce cross-talking. At the same time, the use of our methodology also permits the estimation of the accuracy in the determination of each CTF parameter. For the CTF and PSD model of [1] we have shown that the most important parameters are the microscope voltage and the defoci, then a few parameters determining the background PSD (revealing the importance of a good background fitting), and finally Q_{0} (representing the mixture of amplitude and phase contrast) and the chromatic aberration, so that amplitude correction can be performed.
The bootstrap analysis performed has revealed the accuracy achieved in the estimation of each parameter. Generally speaking, the most sensitive parameters identified in the previous section are estimated with higher accuracy. In particular, the most important parameters, voltage and defoci, are estimated with accuracies on the order of 0.01% and 1%, respectively. The bootstrap analysis also allowed us to identify the internal structure of the model (which parameters influence which). Applying Factor Analysis to the bootstrapped data, we have been able to divide the PSD parameters into seven groups, each one accounting for a different aspect of the final PSD fitting.
We have also checked that if the PSD is less sensitive to a parameter, it does not mean that it can be safely removed from the model (in fact the hypothesis tests performed with the experimental parameter distribution estimated by bootstrapping indicate that they cannot be removed from the regression without losing modeling power). It rather means that we are allowed to commit a bigger error in its estimation without affecting too much the final result. Through the estimation of the empirical joint distribution of the model parameters we have shown that the background PSD model is crucial in order to have meaningful estimates of the CTF parameters.
Finally, we have checked whether the variability observed in the CTF detection affects or not the quality of the CTF correction, either phase or amplitude correction. In our experiments, the different estimates of the CTF do not significantly hinder the posterior CTF correction algorithms.
Although we have applied the sensitivity analysis to a single CTF model, the idea is general and can be applied to other CTF models in order to reveal their most sensitive parameters as well as the internal structure of the model as described through the factor analysis and the correlation between model parameters.
Methods
In order to make the paper self-contained we briefly summarize the CTF model of [1], and then we proceed with the sensitivity analysis and the accuracy of the CTF estimates.
CTF model
where R ∈ ℝ^{2} denotes the spatial frequency in Å^{-1}. This PSD is formed by two terms. The first one is the PSD of the noise colored by the CTF (represented by H(R)). The second one is the PSD after the CTF and is referred to as the "background" PSD.
where V is the acceleration voltage of the microscope.
where C_{ a }is the chromatic aberration coefficient, and is the energy spread of the emitted electrons represented as a fraction of the nominal acceleration voltage.
where α is the semiangle of aperture.
where ΔF is the mechanical displacement perpendicular to the focal plane and ΔR, the displacement in the focal plane (drift).
The first term provides a constant baseline; the second term is a decaying exponential representing the background PSD behavior; the third and fourth terms of the model are intended to provide more flexibility in the PSD modeling process. All terms are assumed to be elliptically symmetric, accounting for a possible anisotropy of the spectrum after convolution with the Point Spread Function (the real-space counterpart of the CTF). Parametric models of the corresponding ellipses are given in Eq. 12. This model for the background was established on a purely empirical basis, without any theoretical support. To the best of the authors' knowledge there is no well-established physical model for the background noise, and the merit of the proposed models lies in their ability to fit the experimentally observed PSDs.
Sensitivity analysis
The CTF function H(R) depends only on R, assuming that the estimated CTF parameters, Θ̂, are fixed. However, if we consider the CTF parameters to be variables as well, then we can define a new function H̃(R, Θ) such that H(R) = H̃(R, Θ̂). Because of the noise, we assume that the estimated parameters are not exactly the true parameters, Θ*, but a close approximation, i.e., Θ̂ = Θ* + ΔΘ, where ΔΘ is a small displacement around the true parameters.
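To make the role of the small displacement explicit, a first-order Taylor expansion around the true parameters can be written as below. This is only a sketch under the assumption that the CTF is differentiable with respect to its parameters. The symbols \tilde{H} (the CTF seen as a function of both frequency and parameters), \hat{\Theta} (the estimated parameters, equal to \Theta^{*} + \Delta\Theta) and \theta_i (the i-th component of \Theta) are notation chosen here for illustration, and the expansion is not necessarily the exact derivation followed by the authors.

```latex
\tilde{H}(\mathbf{R}, \hat{\Theta})
  = \tilde{H}(\mathbf{R}, \Theta^{*} + \Delta\Theta)
  \approx \tilde{H}(\mathbf{R}, \Theta^{*})
  + \sum_{i} \left. \frac{\partial \tilde{H}(\mathbf{R}, \Theta)}{\partial \theta_i}
    \right|_{\Theta = \Theta^{*}} \Delta\theta_i
```

Under this approximation, the error that a misestimated parameter induces in the CTF is governed by the magnitude of the corresponding partial derivative, which is what a sensitivity measure is meant to capture.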
where E{·} is the expectation operator with respect to the distribution of the estimated parameters.
We propose to use this sensitivity to sort all CTF parameters. Parameters with low sensitivity may be estimated more roughly, while the estimation of more sensitive parameters has to be more careful. The sensitivity also reflects indirectly which are the most important parameters defining the characteristics of a given CTF. The more sensitive a given parameter is, the more important it is to estimate it correctly.
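The idea of ranking parameters by sensitivity can be illustrated numerically. The sketch below is not the CTF model of [1]: it assumes a simplified one-dimensional CTF without envelopes or background, and it uses a hypothetical sensitivity measure (the root mean square, over frequency, of the derivative of the CTF with respect to the logarithm of each parameter, computed by central finite differences). The function names, parameter values and frequency range are all illustrative assumptions.

```python
import math

def electron_wavelength(voltage):
    """Relativistic electron wavelength (Angstrom) for a voltage in volts."""
    return 12.2643247 / math.sqrt(voltage * (1.0 + 0.978466e-6 * voltage))

def toy_ctf(freq, p):
    """Simplified 1D CTF without envelopes (assumption, not the model of [1])."""
    lam = electron_wavelength(p["voltage"])
    # Phase shift from defocus and spherical aberration (lengths in Angstrom).
    chi = math.pi * lam * freq**2 * (p["defocus"] - 0.5 * p["cs"] * lam**2 * freq**2)
    return -(math.sin(chi) + p["q0"] * math.cos(chi))

def relative_sensitivity(p, name, freqs, rel_step=1e-4):
    """RMS over frequency of dH/d(ln theta), by central finite differences."""
    up, down = dict(p), dict(p)
    up[name] = p[name] * (1.0 + rel_step)
    down[name] = p[name] * (1.0 - rel_step)
    total = 0.0
    for f in freqs:
        derivative = (toy_ctf(f, up) - toy_ctf(f, down)) / (2.0 * rel_step)
        total += derivative**2
    return math.sqrt(total / len(freqs))

# Illustrative values: 200 kV, 2 um defocus, Cs = 2 mm, 10% amplitude contrast.
params = {"voltage": 200e3, "defocus": 20000.0, "cs": 2.0e7, "q0": 0.1}
freqs = [0.002 * k for k in range(1, 101)]  # 0.002 to 0.2 1/Angstrom

sensitivity = {name: relative_sensitivity(params, name, freqs) for name in params}
ranking = sorted(sensitivity, key=sensitivity.get, reverse=True)
print(ranking)
```

Scaling each derivative by the parameter value makes parameters with very different units comparable, which is one possible design choice among several; the ranking obtained for this toy CTF should not be read as a result for the full model.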
Accuracy of the CTF estimates
The problem solved in [1] can be regarded as a regression of the experimentally observed PSD as a function of the frequency. The model parameters are the PSD parameters described in the previous section. To determine the accuracy of each parameter in the model, an empirical distribution of each parameter can be constructed through bootstrap resampling of the measured data (the pairs of frequency and experimental PSD) [24]. For each resampled dataset, the PSD model parameters are estimated, thus producing an ensemble of parameter estimates from which the empirical distribution of each parameter is easily estimated. An important property of bootstrap resampling is that the distribution of the model parameters of the resampled datasets around the model parameters estimated from the whole dataset approximates the distribution of the model parameters from the whole dataset around the true parameters. This allows us to estimate, from the bootstrapped distribution, many statistics of the unknown distribution of the model parameters estimated from the whole dataset. In particular, we concentrate on two aspects: the estimation of the accuracy of each model parameter (computed as the percentage of variation of that parameter with respect to its nominal value); and the computation of the confidence interval for each model parameter to test the hypothesis that each one is significantly different from zero (if they are, they cannot be removed from the model without losing part of the modeling power).
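The bootstrap procedure described above can be sketched on a toy regression. The example below does not fit the PSD model of [1]; it assumes a straight-line model fitted by ordinary least squares to synthetic (frequency, value) pairs. The accuracy is taken here as the bootstrap standard deviation expressed as a percentage of the nominal estimate, and the confidence interval is a simple percentile interval; both definitions, like all function names, are assumptions made for illustration.

```python
import random
import statistics

def fit_line(pairs):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx
    return my - b * mx, b

def bootstrap_estimates(pairs, fit, n_boot=500, seed=0):
    """Refit the model on n_boot datasets resampled with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    return [fit([pairs[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]

def summarize(samples, nominal, level=0.95):
    """Relative spread (%) and percentile confidence interval of one parameter."""
    ordered = sorted(samples)
    lower = ordered[int((1.0 - level) / 2.0 * (len(ordered) - 1))]
    upper = ordered[int((1.0 + level) / 2.0 * (len(ordered) - 1))]
    accuracy = 100.0 * statistics.stdev(samples) / abs(nominal)
    return accuracy, (lower, upper)

# Synthetic data: a noisy straight line standing in for the measured pairs.
rng = random.Random(1)
xs = [0.01 + 0.19 * k / 199 for k in range(200)]
pairs = [(x, 2.0 - 5.0 * x + rng.gauss(0.0, 0.05)) for x in xs]

nominal = fit_line(pairs)                       # estimate from the whole dataset
estimates = bootstrap_estimates(pairs, fit_line)
slopes = [b for _, b in estimates]
slope_accuracy, slope_ci = summarize(slopes, nominal[1])
print(slope_accuracy, slope_ci)
```

If the confidence interval of a parameter excludes zero, the hypothesis that the parameter could be dropped from the model is rejected at the chosen level, which mirrors the test described in the text.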
The empirical joint distribution of all parameters can also be computed using bootstrapping, and it can be used to estimate the possible cross-talking between model parameters through the computation of the correlation matrix of the bootstrapped ensemble. Statistically significant correlations show which parameters have an influence on other parameters: the larger the correlation coefficient in absolute value, the stronger the influence. In this way, for any model parameter we can construct a list of the other variables in the model that influence it.
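A minimal sketch of how such influence lists could be built from a bootstrapped ensemble is given below. The ensemble is synthetic, the parameter names are hypothetical, and significance is judged with the Fisher z-transform under a large-sample normal approximation, which is an assumption of this example rather than the test used by the authors.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def influence_lists(ensemble, z_crit=1.96):
    """Map each parameter to its significantly correlated partners.

    ensemble: dict mapping a parameter name to its bootstrapped estimates.
    Assumes |r| < 1 for every pair (atanh is undefined at +/-1).
    """
    names = list(ensemble)
    n = len(ensemble[names[0]])
    lists = {name: [] for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(ensemble[a], ensemble[b])
            # Fisher z: approximately standard normal when the true r is 0.
            z = math.atanh(r) * math.sqrt(n - 3)
            if abs(z) > z_crit:
                lists[a].append((b, r))
                lists[b].append((a, r))
    for name in names:
        lists[name].sort(key=lambda item: -abs(item[1]))  # strongest first
    return lists

# Synthetic ensemble: theta_b depends on theta_a, theta_c is independent.
rng = random.Random(2)
theta_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
theta_b = [-0.8 * a + 0.3 * rng.gauss(0.0, 1.0) for a in theta_a]
theta_c = [rng.gauss(0.0, 1.0) for _ in range(500)]
ensemble = {"theta_a": theta_a, "theta_b": theta_b, "theta_c": theta_c}

lists = influence_lists(ensemble)
print(lists["theta_a"])
```

Sorting each list by the absolute value of the correlation puts the strongest influences first, following the reading of the correlation coefficient given in the text.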
Careful observation of the influence lists reveals groups of variables in which all of them influence all the others, as shown in the Results section. However, it is not straightforward to identify these groups manually. For this purpose, we propose the use of factor analysis [25] to identify the underlying factors explaining the bootstrapped ensemble. The elements of the loading matrix provide an estimate of the correlation between the model parameters and the identified factors. Only statistically significant correlations are considered. As shown in the Results section, each factor mainly corresponds to a group of variables that are strongly interrelated, plus a few variables with low, although significant, correlations.
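The grouping of parameters through a loading matrix can be sketched as follows. This example uses principal-component factoring of a correlation matrix (eigenvectors scaled by the square root of their eigenvalues, obtained by power iteration with deflation) as a simple stand-in for a full factor analysis; it is not the procedure of [25], and the correlation matrix and the number of factors are made-up inputs.

```python
import math

def mat_vec(m, v):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dominant_eigenpair(m, iterations=1000):
    """Power iteration for a symmetric positive semi-definite matrix."""
    v = [1.0 / (i + 1) for i in range(len(m))]  # fixed, non-symmetric start
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v

def principal_loadings(corr, n_factors):
    """Return loadings[k][i]: loading of variable i on factor k."""
    residual = [row[:] for row in corr]
    size = len(corr)
    loadings = []
    for _ in range(n_factors):
        lam, v = dominant_eigenpair(residual)
        loadings.append([math.sqrt(max(lam, 0.0)) * x for x in v])
        # Deflation: remove the factor just extracted from the matrix.
        residual = [[residual[i][j] - lam * v[i] * v[j] for j in range(size)]
                    for i in range(size)]
    return loadings

def group_by_factor(loadings):
    """Assign each variable to the factor with the largest absolute loading."""
    n_vars = len(loadings[0])
    return [max(range(len(loadings)), key=lambda k: abs(loadings[k][i]))
            for i in range(n_vars)]

# Made-up correlation matrix with two blocks of interrelated variables.
corr = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]
loadings = principal_loadings(corr, n_factors=2)
groups = group_by_factor(loadings)
print(groups)
```

In this toy case each factor loads on exactly one block of variables. With real bootstrapped ensembles the loadings are rarely this clean, which is why the text restricts attention to statistically significant correlations.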
Declarations
Acknowledgements
The authors would like to thank Dr. Jonic from the Institut de Minéralogie et de Physique des Milieux Condensés (IMPMC, CNRS) in Paris for providing us with the GltS micrographs, and Dr. Núñez from the Centro de Investigaciones Biológicas (CSIC) for providing us with the LTag micrographs.
This work was funded by the European Union (projects FP6502828 and UE512092), the 3DEM European network (LSHGCT2004502828) and the ANR (PCV06142771), the Spanish Ministerio de Educación y Ciencias (CSD20060023, BIO200767150C01 and BIO200767150C03), the Spanish Fondo de Investigación Sanitaria (04/0683), Univ. San Pablo CEU (USPPPC 04/07) and the Comunidad de Madrid (SGEN01662006). The project described was supported by Award Number R01HL070472 from the National Heart, Lung, And Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, And Blood Institute or the National Institutes of Health.
References
1. Sorzano COS, Jonic S, Núñez-Ramírez R, Boisset N, Carazo JM: Fast, robust and accurate determination of transmission electron microscopy contrast transfer function. J Struct Biol. 2007, 160(2):249–262. 10.1016/j.jsb.2007.08.013
2. Frank J: Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State. New York, USA: Oxford Univ. Press; 2006.
3. Huang Z, Baldwin PR, Mullapudi S, Penczek PA: Automated determination of parameters describing power spectra of micrograph images in electron microscopy. J Struct Biol. 2003, 144(1–2):79–94. 10.1016/j.jsb.2003.10.011
4. Mallick SP, Carragher B, Potter CS, Kriegman DJ: ACE: Automated CTF Estimation. Ultramicroscopy 2005, 104: 8–29. 10.1016/j.ultramic.2005.02.004
5. Mindell JA, Grigorieff N: Accurate determination of local defocus and specimen tilt in electron microscopy. J Struct Biol. 2003, 142(3):334–347. 10.1016/S1047-8477(03)00069-8
6. Saad A, Ludtke S, Jakana J, Rixon F, Tsuruta H, Chiu W: Fourier amplitude decay of electron cryomicroscopic images of single particles and effects on structure determination. J Struct Biol. 2001, 133(1):32–42. 10.1006/jsbi.2001.4330
7. Sander B, Golas MM, Stark H: Automatic CTF correction for single particles based upon multivariate statistical analysis of individual power spectra. J Struct Biol. 2003, 142(3):392–401. 10.1016/S1047-8477(03)00072-8
8. Velázquez-Muriel JA, Sorzano COS, Fernández JJ, Carazo JM: A method for estimating the CTF in electron microscopy based on ARMA models and parameter adjusting. Ultramicroscopy 2003, 96: 17–35. 10.1016/S0304-3991(02)00377-7
9. Zhou ZH, Hardt S, Wang B, Sherman MB, Jakana J, Chiu W: CTF determination of images of ice-embedded single particles using a graphics interface. J Struct Biol. 1996, 116(1):216–222. 10.1006/jsbi.1996.0033
10. Zhu J, Penczek PA, Schröder R, Frank J: Three-Dimensional Reconstruction with Contrast Transfer Function Correction from Energy-Filtered Cryoelectron Micrographs: Procedure and Application to the 70S Escherichia coli Ribosome. J Struct Biol. 1997, 118(3):197–219. 10.1006/jsbi.1997.3845
11. Avila-Sakar AJ, Guan TL, Arad T, Schmid MF, Loke TW, Yonath A, Piefke J, Franceschi F, Chiu W: Electron cryomicroscopy of bacillus stearothermophilus 50S ribosomal subunits crystallized on phospholipid monolayers. J Molecular Biology 1994, 239: 689–697. 10.1006/jmbi.1994.1406
12. Fernández JJ, Sanjurjo J, Carazo JM: A spectral estimation approach to contrast transfer function detection in electron microscopy. Ultramicroscopy 1997, 68: 267–295. 10.1016/S0304-3991(97)00032-6
13. Welch PD: The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoustics 1967, AU-15: 70–73. 10.1109/TAU.1967.1161901
14. Jonic S, Sorzano COS, Cottevieille M, Larquet E, Boisset N: A novel method for improvement of visualization of power spectra for sorting cryo-electron micrographs and their local areas. J Struct Biol. 2007, 157(1):156–167. 10.1016/j.jsb.2006.06.014
15. Saltelli A, Chan K, Scott EM: Sensitivity analysis. Hoboken, New Jersey, USA: Wiley; 2000.
16. Sorzano COS, Marabini R, Herman GT, Censor Y, Carazo JM: Transfer function restoration in 3D electron microscopy via iterative data refinement. Phys Med Biol. 2004, 49(4):509–522. 10.1088/0031-9155/49/4/003
17. Gómez-Lorenzo M, Valle M, Frank J, Gruss C, Sorzano COS, Chen XS, Donate LE, Carazo JM: Large T antigen on the simian virus 40 origin of replication: a 3D snapshot prior to DNA replication. EMBO Journal 2003, 22: 6205–6213. 10.1093/emboj/cdg612
18. Valle M, Chen XS, Donate LE, Fanning E, Carazo JM: Structural Basis for the Cooperative Assembly of Large T Antigen on the Origin of Replication. J Mol Biol. 2006, 357(4):1295–1305. 10.1016/j.jmb.2006.01.021
19. Cottevieille M, Larquet E, Jonic S, Petoukhov MV, Caprini G, Paravisi S, Svergun DI, Vanoni MA, Boisset N: The subnanometer resolution structure of the glutamate synthase 1.2-MDa hexamer by cryoelectron microscopy and its oligomerization behavior in solution: functional implications. J Biol Chem 2008, 283(13):8237–8249. 10.1074/jbc.M708529200
20. Braig K, Adams PD, Brunger AT: Conformational Variability in the Refined Structure of the Chaperonin GroEL at 2.8 Å Resolution. Nature Structural Biology 1995, 2: 1083–1094. 10.1038/nsb1295-1083
21. Harauz G, van Heel M: Exact filters for general geometry three dimensional reconstruction. Optik 1986, 73: 146–156.
22. Toyoshima C, Unwin NTP: Contrast transfer for frozen-hydrated specimens: Determination from pairs of defocused images. Ultramicroscopy 1988, 25: 279–292. 10.1016/0304-3991(88)90003-4
23. Toyoshima C, Yonekura K, Sasabe H: Contrast transfer for frozen-hydrated specimens II: Amplitude contrast at very low frequencies. Ultramicroscopy 1993, 48: 165–176. 10.1016/0304-3991(93)90179-2
24. Efron B, Tibshirani R: An introduction to the bootstrap. Boca Raton, Florida, USA: Chapman & Hall; 1993.
25. Dillon WR, Goldstein M: Multivariate analysis: Methods and applications. New York, USA: John Wiley; 1984.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.