ASAC Submission, Computing (Software) Division
2001-04-09

B.E. Glendenning, G. Raffi

In this submission we briefly outline our position on some issues that were posed to us by A. Wootten and E. van Dishoeck on behalf of the ASAC, or that were mentioned in the February 2001 (Florence) ASAC report.


  1. ALMA Pipeline

    The ALMA Science Software Requirements (SSR) committee is presently working on detailed post-processing requirements, including pipeline aspects.

    There are two aspects of a pipeline:

    1. The computational "engines" that perform the calculations on the data; and
    2. The "infrastructure" that stages the data from the archive and moves it through the appropriate engines.

    It is clearly desirable that the "engines" be implemented in a post-processing system that observers have access to at their home institutions (for example, so that they can modify the pipeline scripts to attempt to improve their images). It is also important, and practical, to assume that post-processing software can be re-used here wherever it is suitable or applicable.

    For the "infrastructure" we anticipate that an existing system (e.g., the STScI "OPUS" system), not necessarily a post-processing system, should be investigated and might be adopted and adapted as necessary. This decision will be made during the detailed design of the pipeline system.
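
    As a purely schematic illustration of this separation (a sketch only; all names and interfaces below are hypothetical and do not represent an ALMA design), the "infrastructure" can be thought of as code that stages a dataset from the archive, passes it through an ordered list of "engines", and archives the products, while the engines remain ordinary post-processing functions that an observer could modify and rerun at a home institution:

        # Purely illustrative sketch; all names and interfaces are hypothetical,
        # not an ALMA design.

        class InMemoryArchive:
            """Stand-in for the archive that the infrastructure stages data from."""
            def __init__(self):
                self.datasets = {"obs-001": {"uv_data": []}}

            def fetch(self, dataset_id):
                return dict(self.datasets[dataset_id])

            def store(self, dataset_id, product):
                self.datasets[dataset_id + "-products"] = product

        def calibrate(dataset):
            # "Engine": would call the chosen post-processing package.
            dataset["calibrated"] = True
            return dataset

        def make_image(dataset):
            # "Engine": imaging step applied to the calibrated data.
            dataset["image"] = "image placeholder"
            return dataset

        # The pipeline "script" is just an ordered list of engines; an observer
        # could modify it, or the engines themselves, and rerun the pipeline.
        DEFAULT_RECIPE = [calibrate, make_image]

        def run_pipeline(dataset_id, archive, recipe=DEFAULT_RECIPE):
            # "Infrastructure": stage the data, move it through the engines,
            # and archive the resulting products.
            dataset = archive.fetch(dataset_id)
            for engine in recipe:
                dataset = engine(dataset)
            archive.store(dataset_id, dataset)
            return dataset

        print(run_pipeline("obs-001", InMemoryArchive()))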

  2. Post-Processing System

    The ALMA construction budget does not have provision for the development of an off-line data reduction (post-processing) system, but only for the definition of interfaces to a system that should fulfill ALMA reduction needs. In particular, the requirement that the fundamental ALMA data product be written in a package-neutral FITS format ensures that ALMA data can be processed by more than one package.

    At present, the ALMA Science Software Requirements (SSR) committee is working on a document setting out the requirements that the ALMA array will place on a post-processing system. Once these requirements are in hand, a "reuse" analysis of existing packages will be undertaken to "score" how far each package is from meeting ALMA needs. The fulfillment of ALMA requirements will also be tracked over time by ALMA management.
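
    As a purely schematic illustration of what such a scoring exercise might look like (the requirements, weights and package scores below are invented for the example and are not SSR output), a weighted score per package could be computed along these lines:

        # Schematic illustration only; requirements, weights and packages are invented.
        requirements = {                  # requirement -> weight (importance to ALMA)
            "single-dish / interferometer combination": 3,
            "pipeline scripting": 2,
            "mosaicing": 3,
        }
        fulfilment = {                    # package -> requirement -> degree met (0..1)
            "Package A": {"single-dish / interferometer combination": 0.5,
                          "pipeline scripting": 1.0,
                          "mosaicing": 0.8},
            "Package B": {"single-dish / interferometer combination": 0.2,
                          "pipeline scripting": 0.6,
                          "mosaicing": 1.0},
        }

        maximum = sum(requirements.values())
        for package, scores in fulfilment.items():
            total = sum(weight * scores[req] for req, weight in requirements.items())
            print(f"{package}: {total:.1f}/{maximum} "
                  f"({100 * total / maximum:.0f}% of weighted needs)")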

    Additionally, a proposal from Europe for a demonstration project to assess the suitability of AIPS++ for ALMA has been submitted to NRAO. The project involves various calibration and imaging operations (including combining single-dish and interferometric data) on IRAM interferometer data. The proposal has provisionally been accepted, although discussions of the timescale and success criteria have not yet taken place.

    Section 12.7 of the ALMA Project Book states the baseline assumption adopted so far, namely:

    The AIPS++ package (http://www.aips2.nrao.edu) will be developed as required to cope with general ALMA data processing needs. The ALMA data products will be written in a FITS-based format so that other packages may be used for special-processing or user preference. This section will be expanded in a later revision.

    Note that the extra AIPS++ development is assumed here to be funded entirely from the operational budgets of the members of the AIPS++ consortium (ATNF, BIMA, JBO/MERLIN, NFRA, NRAO).

    Post-processing data reduction, and perhaps the development of new algorithms, should be seen as falling within the scope of the activities of the Regional Data Centers. While these activities should also be coordinated, such coordination does not fall under the direct responsibilities or budget of the computing division for ALMA Phase 2. The "imaging and calibration" group is, of course, funded to investigate imaging and calibration techniques for ALMA.

  3. Data Rates

    The current specification on data rates (Project Book, section 12.1) is:

    Sustained data rate, science data

    6 MB/s (Average)
    60 MB/s (Peak sustained)

    This rate essentially comes from ALMA (MMA) Memo #164 (Scott et al. 1996), corrected for the number of baselines in ALMA and further doubled to account for image (pixel) data as well as UV data. Note that the baseline correlator is capable of much higher rates; accommodating them would be accomplished through computer upgrades, without requiring any interface or other changes to the correlator.
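
    For orientation only, simple unit arithmetic applied to the figures above gives roughly 0.5 TB of science data per day (about 0.2 PB per year) at the average rate and about 5 TB per day at the peak sustained rate:

        # Order-of-magnitude illustration based on the rates quoted above.
        MB = 1.0e6                   # bytes
        SECONDS_PER_DAY = 86400

        average_rate = 6 * MB        # bytes/s, average sustained science-data rate
        peak_rate = 60 * MB          # bytes/s, peak sustained rate

        daily_average = average_rate * SECONDS_PER_DAY / 1.0e12   # TB per day
        daily_peak = peak_rate * SECONDS_PER_DAY / 1.0e12         # TB per day
        yearly_average = daily_average * 365 / 1.0e3              # PB per year

        print(f"average:        ~{daily_average:.1f} TB/day, ~{yearly_average:.2f} PB/year")
        print(f"peak sustained: ~{daily_peak:.1f} TB/day")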

    ALMA Memo #164 can be found at:

    http://www.alma.nrao.edu/memos/html-memos/abstracts/abs164.html

    A more recent discussion by Scott and Myers (2000) can be found in the following software note:

    http://www.alma.nrao.edu/development/computing/docs/joint/notes/DataRates.pdf

    It is not presently known whether data will flow from Chile to the other data centers via a network connection or by shipment of magnetic or optical media. ESO has a contract with a Chilean consulting firm to determine the answer to this question. In any event we expect there to be a network connection sufficient for observers to follow the course of their observations (quick-look images and data inspection).

    We agree that, once decisions about the ACA and future correlator developments have been made, it would be prudent to redetermine the data rates. Roughly speaking, increasing the average data rate affects costs (how much storage and network capacity must be purchased), whereas increasing the peak rate affects technology. For example, doubling the current peak data rate would mean that we could not use Gigabit Ethernet in a straightforward way. This is an area where there is an interplay between the scientifically desired data rate and the available technology.
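
    The Gigabit Ethernet remark can be quantified with a back-of-the-envelope check (illustrative arithmetic only, ignoring protocol overhead and other traffic): the current 60 MB/s peak corresponds to roughly half of a nominal 1 Gbit/s link, while a doubled peak would consume essentially all of it.

        # Back-of-the-envelope check of the Gigabit Ethernet remark; illustration only.
        def link_utilisation(rate_mb_per_s, link_gbit_per_s=1.0):
            # Convert MB/s to Mbit/s and compare with the nominal link capacity
            # (protocol overhead and other traffic are ignored here).
            return (rate_mb_per_s * 8) / (link_gbit_per_s * 1000)

        for peak in (60, 120):       # current peak and a doubled peak, in MB/s
            print(f"{peak} MB/s -> {100 * link_utilisation(peak):.0f}% "
                  f"of a nominal 1 Gbit/s link")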


  4. Archive

    Connected to the data rates is the concept of the archive. We assume that there will be a master archive in Chile (probably in Santiago) and archive copies at Regional Centers in Europe, the US and Japan. These centers should also combine the functions of user support for pipeline reduction (if repeated or improved) and for off-line data processing. The archive should be built using state-of-the-art technology and following the most recent interface concepts (such as Astronomy Virtual Observatory definitions), allowing interoperability independent of the physical location of the data. This work is fully foreseen for Phase 2.

    The archive system should be open to the possibility of re-processing data. While this capability should be built into the archive and tested with the pipelines, we understand that ALMA development responsibility will end with the archival of the data. The Regional Data Centers should play a role both in the data-distribution phase and in support for archival research. They might, of course, extend the scope of the archive system by contributing search and reduction algorithms.
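
    As a purely conceptual sketch of the interoperability goal (all site names, dataset identifiers and interfaces below are hypothetical; the eventual interfaces would follow whatever Virtual Observatory style definitions are adopted), a user might query a single logical archive and be directed transparently to whichever physical copy holds the data:

        # Conceptual sketch only; site names, dataset identifiers and interfaces
        # are hypothetical.
        ARCHIVE_SITES = {                    # physical copy -> datasets held
            "Santiago (master)": {"obs-001", "obs-002", "obs-003"},
            "European centre":   {"obs-001", "obs-002"},
            "US centre":         {"obs-001", "obs-003"},
        }

        def locate(dataset_id, preferred_order):
            # Return the first preferred site holding the data; the user never
            # needs to know where the data physically live.
            for site in preferred_order:
                if dataset_id in ARCHIVE_SITES.get(site, set()):
                    return site
            return "Santiago (master)"       # the master archive holds everything

        print(locate("obs-002", ["US centre", "European centre"]))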

  5. Operational Model

R. Lucas, on behalf of the SSR, is preparing a document outlining the assumptions the SSR has made in its work to date, together with some open questions.

Some questions about these assumptions have been raised within the analysis group. They concern the feasibility and cost of some requirements, and also the development strategy over time. Features such as fully automatic operation (implying full checking of scripts in advance), data quality checks, interaction with users, modification of scripts at break-points, etc., might in fact require a very large effort and might not be implementable in exactly the requested form within the limits of a reasonable software development budget. These questions may result in a separate document.

We would like some agenda time in an upcoming ASAC meeting to discuss these issues.