Introduction

We now have at least three methods of simulating ALMA imaging with different configurations (Heddle's in AIPS, Viallefond/Guilloteau's in GILDAS, and Holdaway/Morita's in SDE), with more possibly on the horizon (AIPS++, hopefully). I may have left some out (miriad, e.g.), but the point is that we now have the capability to do some real imaging simulations and compare the results.

The question now is: how do we make comparisons between the different arrays, given the results of the simulations? One way, already used to some extent on the results of Heddle's simulations, is simply to examine the images and differenced images qualitatively, and make statements about the relative "quality" of the different configurations based on that. All along, we have also had the idea that eventually we would come up with some standard _metrics_ which would be quantitatively based and would be used for true comparison. We seem, however, to have gotten bogged down in the details, and have failed to come to any consensus as to which metric (or combination of them) to use. This is partly because nobody has been willing to assume the leadership in this respect, and so we have waffled along with little progress.

Separately from this, there is the fact that a better metric of this type does not necessarily imply a better overall configuration design - most importantly because of what we have been calling "operational issues." I don't know how to rationally address that issue, because it quickly degenerates into a subjective and political one, and is hard to get a good handle on. However, I will note that we have had no good exposition of exactly what "operational issues" really entails in its entirety (the one issue that gets stressed is how often we move antennas, but what are the others?), much less whether these issues favor one configuration type over another.
Even after such an exposition, it is still useful to know which is the better configuration "scientifically" (I use the term broadly), so that some weighting of the relative importance of operational vs. scientific quality can be specified, and a final decision on configuration type can be made. I note that there are other issues which affect the scientific quality of a configuration design which cannot be strictly set down as a quantitative measure in the way that I'm discussing here (as a simple e.g., repeatability of antenna positions for monitoring observations), and they will have to be folded into the discussion in the end. At any rate, I think it still behooves us to come up with the quantitative metrics as soon as possible, and hence I will address some of them here, and then make a real suggestion at the end.

Types of Metrics

They generally fall into 2 categories: uv-based metrics and image-based metrics. We have recently been concentrating on the image-plane ones, but I don't think we should necessarily forget about the uv-based ones completely.

uv-based Metrics

In my opinion, the attractiveness of uv-based metrics is that they do not depend on the source structure, but only on the source declination and the hour angle range of the observation. Also, they tend to be simple to calculate.

1. fraction of occupied cells

Simply count the number of occupied cells in the uv plane for a given observation. The selection of the uv cell size is somewhat arbitrary, but should probably be roughly half the antenna diameter. The earliest place I can find where this metric is actually explicitly calculated is Bob Hjellming's MMA Memo 30, where he tabulates Nocc/Ntheo, which is equivalent to the fraction of occupied cells.
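As a concrete illustration, the occupied-cell fraction might be computed along the following lines. This is only a sketch: the function name, the square gridding convention, and the inclusion of Hermitian-conjugate points are my own choices, not anything specified in the memos cited.

```python
import numpy as np

def occupied_cell_fraction(u, v, cell, uv_max):
    """Fraction of uv-plane grid cells touched by at least one sample.

    u, v   : baseline coordinates (same units as cell and uv_max)
    cell   : grid cell size (e.g. roughly half the antenna diameter)
    uv_max : half-width of the square uv region considered
    """
    n = int(np.ceil(2 * uv_max / cell))           # cells per axis
    # Include Hermitian-conjugate points, since V(-u,-v) = V*(u,v).
    uu = np.concatenate([np.asarray(u, float), -np.asarray(u, float)])
    vv = np.concatenate([np.asarray(v, float), -np.asarray(v, float)])
    i = np.floor((uu + uv_max) / cell).astype(int)
    j = np.floor((vv + uv_max) / cell).astype(int)
    ok = (i >= 0) & (i < n) & (j >= 0) & (j < n)   # drop samples off the grid
    occupied = len(set(zip(i[ok].tolist(), j[ok].tolist())))
    return occupied / float(n * n)
```

For example, a single baseline at (u,v) = (10,10) on a 4x4 grid occupies two cells (the sample and its conjugate), giving a fraction of 2/16.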
However, the idea is certainly older than that, i.e., that one desires "complete" uv coverage in observations (Cornwell maybe instigated this in MMA thinking, but it is probably much older than that, harkening back to the old Ryle idea of complete coverage...). More recently, Holdaway & Morita have used this measure.

2. large-scale "smoothness" of cell population

(I borrowed the terminology [and description] from Ed Fomalont.)

from Ed's email:

I suggest 'gridding' the uv plane into large cells, say 10x10 original cells for the smaller configurations and 100x100 for the larger configurations, and determining the data weight in each of these big cells. By data weight I mean the integration time of the data in each of these cells, but other weighting schemes could be used. For a 'good' array, this average uv coverage should be a smoothly decreasing function of distance from zero spacing, the smoother the better. Fit the distribution of these uv densities to the best elliptical Gaussian (maybe something else is better, like a density related to the inverse distance from the center). The rms deviation of the average uv distribution from this best fitting Gaussian is a measure of the overall smoothness of the actual uv coverage. A normalized metric which measures this overall smoothness of the uv coverage could be:

M2 = SUM(i){ [W(i)-E(i)]**2 } / (NT*NT)

W(i) is the weight of data in the ith big cell
E(i) is the weight of the best fitting elliptical Gaussian to the distribution of W(i) over the uv plane (any tapering of the data included before this gridding)
NT is the total weight of data.

Of course, all array uv coverage will have a central hole with a size of the diameter of the array telescope. This hole may or may not be included in the calculation of this metric, whatever is felt most appropriate.

end of this topic in Ed's email.
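Ed's M2 can be sketched with a deliberately simplified fit: a circular (rather than elliptical) Gaussian, with the width found by a coarse search and the amplitude by linear least squares at each trial width. The function name and search range are my own choices.

```python
import numpy as np

def smoothness_metric(W, u, v):
    """Simplified version of Ed's M2 smoothness metric.

    W    : 1-D array of big-cell data weights W(i)
    u, v : big-cell center coordinates matching W
    """
    W = np.asarray(W, float)
    r2 = np.asarray(u, float) ** 2 + np.asarray(v, float) ** 2
    NT = W.sum()
    rmax = np.sqrt(r2.max())
    best = np.inf
    for sigma in np.linspace(0.05 * rmax, 2.0 * rmax, 200):
        g = np.exp(-r2 / (2.0 * sigma ** 2))
        a = (W * g).sum() / (g * g).sum()     # best amplitude for this width
        best = min(best, ((W - a * g) ** 2).sum())
    return best / NT ** 2                     # squared residuals / NT^2
```

A weight distribution that really is Gaussian should yield a value near zero; lumpy coverage drives the metric up.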
I note here that rather than fitting an elliptical Gaussian, one might wish to use some other function, e.g., uniform (flat), Blackman-Harris, Kaiser-Bessel, etc. Also, as a simplification, one might wish to do azimuthal binning in uv space, to obtain a single radial profile, rather than doing the full 2-D fit and deviation.

3. detailed (smaller-scale) "lumpiness" of cell population

(terminology and description again borrowed from Ed)

from Ed's email:

For each of the big uv cells which were used in the above smoothness calculation, calculate the following:

M3 = SUM(i){ SUM(j){ [W(i)/n - w(i,j)]**2 } } / (NT*NT)

w(i,j) is the weight of the jth little uv cell in the ith big cell
n is the number of little cells in each big cell

end of this topic in Ed's email.

Another way of defining this is, e.g., calculating something like what they did for the VLBA "quality metric" - calculate for each uv cell the distance to the nearest uv data point, square it, and sum this over all cells. Do this for various declinations, then sum over those for the overall metric. An even simpler proxy for this might be to calculate the size of the largest "hole" in the uv plane. IIRC, Adrian Webster did some of this when looking at designs of the most compact configuration.

4. the Visibility SNR (VSNR) curve

Defined in Cornwell et al. (1993) [but see also Holdaway 1990]. Take the FT of the difference image, average in radial bins, and divide this into the radially binned FT of the model. This is really a hybrid between uv- and image-based metrics, but since what comes out is defined in the uv plane, I put it here with the uv-based metrics. Note that this metric *is* explicitly source structure dependent, unlike the previous 3, and it is a bit trickier/more complicated to compute.

image-based Metrics

I would divide these into 2 subclasses: beam-based and true image-based.
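Before moving on, the lumpiness metric M3 above is also straightforward once the little-cell weights are gridded. A sketch, assuming a square grid whose dimensions are divisible by the big-cell size (the block-reshaping convention is mine):

```python
import numpy as np

def lumpiness_metric(w, nbig):
    """Ed's M3 on a square grid of little-cell weights.

    w    : 2-D array of little-cell weights, shape divisible by nbig
    nbig : little cells per big cell along each axis
    """
    w = np.asarray(w, float)
    NT = w.sum()
    ny, nx = w.shape
    n = nbig * nbig                        # little cells per big cell
    # Reshape so each big cell's little cells sit on their own trailing axes.
    blocks = w.reshape(ny // nbig, nbig, nx // nbig, nbig).transpose(0, 2, 1, 3)
    W = blocks.sum(axis=(2, 3))            # big-cell weights W(i)
    m3 = ((W[:, :, None, None] / n - blocks) ** 2).sum()
    return m3 / NT ** 2
```

Perfectly uniform little-cell weights give M3 = 0; concentrating all the weight in one little cell gives the maximum lumpiness for a given total weight.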
beam-based

As with the uv-based metrics, the attractiveness of beam-based metrics is that they do not depend on the source structure, and they are also simple to calculate.

1. amplitude of maximum positive sidelobe

Here, there can be a distinction between "near-in" and "far-out" sidelobes if one wishes (and there probably should be). Note that the amplitude of the maximum negative sidelobe is defined to be 1/N for N antennas - at least in the case of natural weighting.

2. beam sidelobe rms

This can also be defined analytically in uv space, so it is something of a hybrid between a uv- and a beam-based metric (see, e.g., Cornwell 1984). As for the maximum sidelobe, the "near-in" vs. "far-out" distinction can be made.

3. how close is the central lobe of the beam to a Gaussian?

4. what is the relative amount of "power" in the central lobe compared to that in the sidelobes?

image-based

In my opinion, the attractiveness of image-based metrics is that they are more intuitive to us as astronomers, i.e., we are accustomed to dealing with images, and used to seeing the errors associated with, e.g., incomplete uv plane sampling. We are also accustomed to dealing with some of these metrics (the dynamic range, e.g.) directly. Also, at least some of them tend to be simple to calculate.

1. Dynamic Range

Usually the astronomer defines this as the ratio of the peak in the image to the off-source rms. One could also do this using the on-source rms, but that would be done separately from the off-source calculation (it makes no sense to me to combine the two). We have had some discussion about how to define "on-source" vs. "off-source". I must admit that I don't see the problem here - we have the models which went into the simulations, so just pick some level which is less than the desired final dynamic range (IIRC, we've spec'ed this at 10^6) and define those pixels in the model with flux density > that cutoff as "on-source".
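That on-source/off-source split and the resulting dynamic range could look something like this (a sketch; the cutoff convention follows the text, while the function name and argument layout are mine):

```python
import numpy as np

def dynamic_range(image, model, cutoff):
    """Peak over off-source rms.

    'On-source' is defined as those model pixels with flux density above
    `cutoff` (e.g. peak / 10**6); everything else is 'off-source'.
    """
    image = np.asarray(image, float)
    off = image[np.asarray(model, float) <= cutoff]   # off-source pixels
    return image.max() / np.sqrt((off ** 2).mean())
```

For instance, an image peaking at 1.0 whose off-source pixels have rms 0.1 has a dynamic range of 10.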
This is no more arbitrary than the definition of the models which go into the simulation in the first place, IMHO. As a subclass of this, it might also be interesting to find the peak (both positive and negative) off-source, in addition to the rms (since [at least with VLA data] the off-source noise is often non-Gaussian). This gives some indication of the possibility of "false detections."

2. Fidelity

A generic description of this quantity: it is defined for a given pixel as the ratio of the flux density in the input model at that pixel to the absolute value of the flux density in the difference image (the input model [convolved to the correct resolution] minus the simulated/restored image) at that pixel. In general, this only makes sense for "on-source" pixels, and in practice, this might involve some lower level cutoff in the difference image.

Combinations over pixels can then be formed, in order to come up with one (or a few) numbers which attempt to quantify the whole image. In the simplest case, one might take all of the "on-source" pixel fidelities and take the median. In more complicated cases, one could consider taking only pixels above some flux density (probably ratioed to the peak), and calculating the median of that set of pixels - repeat this for many different levels and a histogram can be constructed. I would also suggest that in each of these histogram bins we calculate the minimum fidelity as well as the median.

Mark Holdaway has suggested a possible variant of this which he calls "moment fidelity" - in this case, the fidelity is weighted by the flux density at that pixel in the convolved model:

f_i = w_i * model_i / abs( model_i - reconstruction_i )

where w_i = F_i / sum_j{ F_j }, i.e., w_i is the flux density in pixel i normalized by the total flux density in all pixels.
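The median fidelity and the moment-fidelity variant could be computed along these lines. This is a sketch; the `floor` argument, which caps the difference from below so that a perfectly reconstructed pixel does not produce infinite fidelity, is my own addition.

```python
import numpy as np

def median_fidelity(model, recon, cutoff, floor=1e-6):
    """Median per-pixel fidelity over 'on-source' pixels (model > cutoff)."""
    model = np.asarray(model, float)
    diff = np.abs(model - np.asarray(recon, float))
    on = model > cutoff
    return np.median(model[on] / np.maximum(diff[on], floor))

def moment_fidelity(model, recon, cutoff, floor=1e-6):
    """Flux-weighted variant: each on-source pixel's fidelity is weighted
    by its share of the total on-source flux density."""
    model = np.asarray(model, float)
    diff = np.abs(model - np.asarray(recon, float))
    on = model > cutoff
    w = model[on] / model[on].sum()        # normalized flux weights w_i
    return (w * model[on] / np.maximum(diff[on], floor)).sum()
```

With a single on-source pixel the two measures coincide; they differ once faint and bright on-source pixels reconstruct with different accuracy.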
Another possibility is to take the mean fidelity at different spatial scales - this allows you to find, e.g., striping (as the logical conclusion of this, just take the FT of the difference image and analyze that).

3. fractional error

Just take, for each "on-source" pixel, the fractional error as the inverse of the fidelity. Then quantities as described above for fidelity can be calculated in a similar way. This avoids a divide-by-zero problem when the reconstructed image is exactly equal to the convolved model (infinite fidelity).

4. Dave Woody's linear fit

Dave Woody has made a suggestion: what about doing a simple linear fit of (diff-map)^2 to A + B*(original simulation image)^2? 1/sqrt(B) would be interpreted as the fidelity, i.e., the errors in the map that are proportional to the image, and 1/sqrt(A) would be the "off-source" dynamic range. This fit should not be computationally time consuming or difficult to code.

5. ability to distinguish nearby multiple sources

This is a metric that we haven't really discussed before, but it is a very standard one in the discussion of filters in signal processing. The point is that even though two configurations may have the same "resolution" (which we generally take as the full width of the best-fit Gaussian to the central lobe of the synthesized beam), one may still be better than the other at distinguishing two very nearby point sources. One might be able to define this analytically in uv space, but it would be relatively easy to make a simulation image which would test this property (a modification of John Conway's DOTS image, with more well-defined [rather than random] point source placement). The metric which comes out of this is the minimum detectable separation for two point sources. A modification of this test is to have one of the point sources be much stronger than the other (10000:1 or even more?). Also, the issue of whether the point sources are centered on pixels could be explored.

6.
random uv generation

Mel Wright used a method where he randomly sampled the uv plane with some number of data points, then compared that to the original image (with both point source and eye chart models). This suggests to me a possible metric, where a given configuration is compared against either completely random uv sampling, or possibly random antenna placement.

Recommendations

My recommendation is to use the following set of metrics:

1 - All of the uv-based metrics except the VSNR, plus numbers 1 and 2 of the beam-based. I think we should do these because they are easy, and they are source independent. We can sort out the details after agreeing that we actually want to calculate them (e.g., exactly how to pick the size and extent of the uv cells, details of the "smoothness" and "lumpiness" metrics, where to set the cutoff between near-in and far-out sidelobes if we want to, etc.). The real reason to include these is to see if there is a uv-based metric that is a good proxy for the image-based metrics. In this way, a uv-based metric might be identified that is as good at distinguishing configurations as any of the image-based ones, and the benefit of source structure independence would then be retained through its use.

2 - Dynamic range (using the off-source rms). The reason to include this is that it is something astronomers are used to, and it is the only metric here that means anything in real observations (because we cannot measure true "fidelity" in reality, e.g.). We can decide on details after agreeing to calculate this beast (specifically how to specify "on-source" vs. "off-source", e.g.).

3 - Histogram of fractional error vs. pixel flux density. The reason to include this is that it is probably the best indicator of true imaging quality. The problem is that the metric is heavily biased toward the particular source being modeled (we have already discussed the impact of this WRT short spacings). Note that I prefer fractional error to fidelity, for reasons we have already discussed.
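In practice, recommendation 3 might look something like the following sketch, which reports the median and maximum fractional error per flux bin (the maximum corresponding to the minimum fidelity suggested earlier). The bin-edge convention, expressed as fractions of the model peak, is one choice among several.

```python
import numpy as np

def fractional_error_histogram(model, recon, levels):
    """Median and max fractional error |model - recon| / model for
    on-source pixels in each flux bin.

    levels : bin edges as fractions of the model peak,
             e.g. [0.001, 0.01, 0.1, 1.0]
    Returns a list of (lo, hi, median_fe, max_fe) tuples, skipping
    empty bins.
    """
    model = np.asarray(model, float)
    err = np.abs(model - np.asarray(recon, float))
    peak = model.max()
    out = []
    for lo, hi in zip(levels[:-1], levels[1:]):
        sel = (model > lo * peak) & (model <= hi * peak)
        if sel.any():
            fe = err[sel] / model[sel]
            out.append((lo, hi, np.median(fe), fe.max()))
    return out
```

Note that faint bins will typically dominate the worst-case fractional error, which is exactly the source-structure bias mentioned above.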
We can decide on details after agreeing to calculate this beast (how many bins, how to specify them, etc.).

We should use these in a first iteration, then see if there is one (or a few) that is particularly good at indicating "quality". This is a bit slippery, as it is unclear how to absolutely define quality (which is why we are having all of this extended discussion in the first place), but I think we should start with a larger set of possible metrics and gain some experience with them, and then narrow it down at a (not so far away) future date.

References

Ryle & Hewish, The Synthesis of Large Radio Telescopes, MNRAS, 1960
Harris, On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform, Proc. IEEE, 66, 51-83, 1978
Mutel & Gaume, A Design Study for a Dedicated VLBI Array, VLBA Memo 84, 1982
Walker, Fast Quality Measure, VLBA Memo 144, 1982
Cornwell, Quality Indicators for the MM Array, MMA Memo 18, 1984
Hjellming, The 90 meter Configuration of the Proposed NRAO mm Array, MMA Memo 30, 1985
Cornwell, Crystalline Antenna Arrays, MMA Memo 38, 1986
Holdaway, Imaging Characteristics of a Homogeneous Millimeter Array, MMA Memo 61, 1990
Holdaway, Evaluating the MMA Compact Configuration Designs, MMA Memo 81, 1992
Cornwell, Holdaway, & Uson, Radio-interferometric Imaging of Very Large Objects: Implications for Array Design, A&A, 271, 697-713, 1993
Holdaway, Foster, & Morita, Fitting a 12km Configuration on the Chajnantor Site, MMA Memo 153, 1996
Holdaway, What Fourier Plane Coverage is Right for the MMA?, MMA Memo 156, 1996
Keto, The Shapes of Cross-Correlation Interferometers, ApJ, 475, 843-852, 1997
Holdaway, Effects of Pointing Errors on Mosaic Images with 8m, 12m, and 15m Dishes, MMA Memo 178, 1997
Helfer & Holdaway, Design Concepts for Strawperson Antenna Configurations for the MMA, MMA Memo 198, 1998
Holdaway, Hour Angle Ranges for Configuration Optimization, MMA Memo 201, 1998
Wright, Image Fidelity, BIMA Memo 73, 1999

There are a couple of
more obscure references to Morita's work that I couldn't get proper references for, or full copies of, but which probably contain relevant information:

Morita, Array Configuration of Large Radio Interferometers for Astronomical Observations, National Astronomical Observatory, NRO-TR-56, 1997
Morita, Ishiguro, & Holdaway, Array Configuration of the Large Millimeter and Submillimeter Array (LMSA), URSI-GA, 1996