
Report of Working Group on Data Exchange Formats

Workshop Goals

  1. Determine the feasibility of adopting a common file standard for data exchange
    1. within the U.S. neutron scattering community
    2. within the international neutron scattering community
    3. between the synchrotron and neutron scattering communities
  2. If feasible, begin development of such a standard.

Feasibility

This working group was extremely fortunate to include among its members Mark Koennecke, who has been developing a data exchange standard for neutron scattering for the European Community [1], and Jon Tischler, who has been developing a data exchange standard for the Advanced Photon Source CATs [2]. Building on their work, Przemek Klosowski (chairman of this working group) produced a document [3] that combines some of these ideas to give a further indication of how such a standard might work for neutron scattering data. These three documents formed the basis for discussion in the working group.

There was a strong consensus among the workshop participants that a standard for data exchange would be an important development, and would greatly ease many of the problems encountered in visualizing and analyzing data from many sources. This would be particularly important to outside users who are not affiliated with any of the neutron sources and who frequently collect data from similar instruments at different sources. Furthermore, the large amount of preparatory work already done by Tischler, Koennecke, and Klosowski indicated that it should be practical to develop such a data exchange standard. The working group decided to proceed with this development, since it appeared to be technically feasible and to have the support of the US neutron scattering community representatives present. Also, since representatives of the European neutron scattering community and the US X-ray scattering community were included in the working group, it appeared likely that a standard could be developed that would be accepted by the neutron and X-ray communities worldwide.


Four levels of standardization

The working group identified four levels of standardization which need to be developed:

  1. Underlying file format

    The file format must satisfy the following conditions:

    This last requirement means in essence that the data format must be extensible and self-describing.

    A number of different standard file formats were initially considered, but of these, only two seemed to merit more serious consideration. These were the Network Common Data Format (netCDF) developed and supported by Unidata and the Hierarchical Data Format (HDF) developed and supported by the National Center for Supercomputing Applications (NCSA). The initial choice for the European neutron scattering standard was netCDF, since the earlier versions of HDF had some deficiencies in the ability to describe data [1]. However, starting with version 3.3, release 3 (HDF3.3r3), HDF provides the necessary capability to set dimension scales, attributes (like units), etc. for the data [2]. Since the hierarchical arrangement of HDF offers some additional advantages (e.g., the ability to put multiple scans in one file), HDF has now become the standard of choice, and was the standard initially suggested for use by the APS CATs [2]. Furthermore, the current version of HDF incorporates essentially all of netCDF as a subset.

    As a consequence, the working group decided to adopt the HDF standard. HDF is an open, non-proprietary standard for storing self-describing data. Its files can be transferred transparently between different hardware platforms, and it is widely used and well supported. It is available on all operating systems in common use, is supported by a large number of visualization applications, and has a number of utilities to facilitate its implementation.

  2. Structures used for representing data

    If the underlying file format (HDF) defines the syntax of the data standard, then the next level addresses the issue of its semantics. The data organization must be sufficiently defined so that both the data files and the programs written at different locations will agree on the packaging of like information. This additional structure is just an agreement on how various types of information would be included in an HDF file, and in no way removes the basic portability of HDF files. HDF-aware visualization tools, file content listing and editing programs, etc., would still work with the file regardless of the nature of this secondary structure.

    The time available at the workshop did not permit much discussion of the structure at this level. Therefore it was agreed that a committee consisting of Klosowski, Koennecke, and Tischler would develop a draft document defining the necessary structures based on a combination of those proposed in the supporting documents [1-3].

  3. Dictionary of variable names

    In keeping with the linguistic analogy, the next requirement is for a common vocabulary or glossary that can be used in referring to the data stored in the standard files. Since the files are to be self-describing, it is essential that all terms used to locate data within the files have unique definitions. For example, if a program is looking for sample temperatures as a variable named "temperature" it would probably not recognize temperatures stored with a variable name "temp". Similarly disastrous consequences might follow if one laboratory used the variable name "temperature" to represent sample temperatures while another used the variable name "temperature" to represent the ambient temperature around the instrument components. Therefore it is important to develop an accepted set of unique definitions for all variables which are anticipated to be stored in these files.

    Significant efforts in this direction are already available in the work of Klosowski, Koennecke, and Tischler [1-3]. Therefore these three agreed to try to reconcile any conflicts between the dictionaries they have so far developed, and to extend the resulting master dictionary as necessary to encompass any additional variables identified.

  4. Lists of required variables

    As the final level of the standard, specifications will be provided to define what variables must be provided if the file is to be used for a specific purpose. For example, if the only intended use of the file is to provide a simple x-y plot of a data set, then only a minimal amount of information need be present (e.g., name and address of the file owner, some identification of the data, and the x and y data sets) although additional information might be recommended (e.g., units for the data, choice of dependent variable). However, if the file contains experimental data from a time-of-flight chopper spectrometer intended for use in a full data analysis, considerably more information will be needed by the data analysis program (e.g., source-to-chopper, chopper-to-sample, and sample-to-detector distances; angles for the detectors; chopper parameters including type of chopper and all variables relevant to its resolution and transmission; units for the data arrays; etc.).

    The working group felt that representative groups for each type of instrument should be established to identify the variables which are essential for analysis of the corresponding data. The representatives from the various laboratories who were present as part of this working group will serve as the initial points of contact. They will work to identify the various types of instruments in use at their institutions, and also to identify a local representative for each type of instrument. These local instrument representatives from the various laboratories would then be the natural group to identify the essential variables for that instrument type.
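
The third and fourth levels can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the dictionary entries, the purpose names, and the required-variable lists are illustrative assumptions, not part of any draft standard, and a plain Python dictionary stands in for an HDF file. The sketch only shows the idea that a reader can check a file against an agreed vocabulary and against the list of variables required for a given use.

```python
# Hypothetical master dictionary: each agreed name has exactly one definition.
MASTER_DICTIONARY = {
    "sample_temperature": "temperature of the sample, in kelvin",
    "ambient_temperature": "temperature around the instrument components, in kelvin",
    "x": "independent variable of the data set",
    "y": "dependent variable of the data set",
    "owner": "name and address of the file owner",
    "title": "identification of the data",
    "sample_to_detector_distance": "flight path from sample to detector, in metres",
}

# Hypothetical lists of required variables for two purposes (level 4).
REQUIRED = {
    "xy_plot": {"owner", "title", "x", "y"},
    "tof_analysis": {"owner", "title", "x", "y", "sample_temperature",
                     "sample_to_detector_distance"},
}


def unknown_names(record):
    """Return the names in the record that the master dictionary does not define."""
    return sorted(name for name in record if name not in MASTER_DICTIONARY)


def missing_for(record, purpose):
    """Return the variables required for a purpose that the record lacks."""
    return sorted(REQUIRED[purpose] - set(record))


record = {
    "owner": "A. User, Some Laboratory",
    "title": "example scan",
    "x": [1.0, 2.0, 3.0],
    "y": [10.0, 12.0, 9.0],
    "temp": 4.2,  # ambiguous local name, not in the dictionary
}

print(unknown_names(record))                # names a reader would not recognize
print(missing_for(record, "xy_plot"))       # check against the simple-plot list
print(missing_for(record, "tof_analysis"))  # check against the fuller analysis list
```

Splitting "temperature" into two distinct entries in the dictionary mirrors the ambiguity described under level 3: a program asking for the sample temperature cannot silently receive the ambient one.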


Development Process

The working group proposed to produce a draft document covering the first three levels of the standard which would then be submitted to the scattering community for discussion and refinement. At the same time, the various representative groups for the different types of instruments would proceed with the development of the lists of variables "required" for analysis of data from their respective instrument types.

The final adoption of a standard would follow extensive consultation with the international neutron and X-ray communities to ensure its wide acceptance and eventual use.

Once the standard is developed and accepted, it would be up to representatives at each facility to write the routines which convert from their existing data file structures to the new standard file structure. Examples would also be provided showing how to adapt existing data analysis software to utilize the new standard files.
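
As a rough illustration of such a conversion routine, the Python sketch below reads a hypothetical facility-local text format (a "# key = value" header followed by two columns of numbers) and repackages it as a nested, self-describing structure in which each array carries its own units. The local format, the field names, and the layout of the output are all assumptions made for illustration. A real routine would write an HDF file through the HDF library rather than build Python dictionaries.

```python
def convert_local_to_standard(text):
    """Convert a hypothetical local text file into a self-describing record."""
    header = {}
    xs, ys = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "# key = value".
            key, _, value = line[1:].partition("=")
            header[key.strip()] = value.strip()
        else:
            # Data lines hold two whitespace-separated numbers.
            x, y = line.split()
            xs.append(float(x))
            ys.append(float(y))
    # Each data item is stored together with the attributes that describe it.
    return {
        "title": header.get("title", ""),
        "scan_1": {
            "x": {"data": xs, "units": header.get("x_units", "unknown")},
            "y": {"data": ys, "units": header.get("y_units", "unknown")},
        },
    }


LOCAL_FILE = """\
# title = example scan
# x_units = degrees
# y_units = counts
1.0 10
2.0 12
3.0 9
"""

standard = convert_local_to_standard(LOCAL_FILE)
print(standard["scan_1"]["x"]["units"], standard["scan_1"]["y"]["data"])
```

The nesting under "scan_1" is meant to suggest how a hierarchical format could hold several scans in one file. The name itself is arbitrary.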


How would such a standard work?

Although there would be value in conforming to the standards prescribed in each of these four areas, maximum benefit would be derived by complete adherence to the whole standard.

Individual facilities would probably continue to use their own local formats for storing and archiving data. These data files could then be converted to the new standard format as required (e.g., for a user to take the data away to be analyzed elsewhere, or for a local person to convert the data into a format which can be read by a "universal" analysis or visualization package). This approach allows adoption of the standard with only minimal disruption at the local level.

No one would be forced to use the new standard. The people maintaining specific software packages could decide whether to interface their specific packages to the new standard. However, it is anticipated that the benefits to be gained will induce many people to convert their analysis, visualization, etc., codes to this standard relatively quickly.


Advantages

Implementation of such a standard would make it possible to use your favorite program in a transparent manner to analyze or visualize data collected at different facilities, without having to worry about the idiosyncrasies of the local data file formats.

Only a single conversion routine would be needed at each instrument, rather than requiring a multitude of such routines.

It would facilitate the use of the same software packages at multiple sites, thus minimizing the amount of effort which must be put into developing new data handling, analysis, and visualization software.
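
The saving in conversion routines can be made concrete with a small count. Assume, purely for illustration, N local file formats and M analysis or visualization programs. Without a common format, each program needs a reader for each local format (N × M routines). With a common format, each facility writes one converter and each program one reader (N + M routines). This counting model and the figures used below are assumptions chosen as an example, not data from the workshop.

```python
def routines_without_standard(n_formats, n_programs):
    """Every program needs its own reader for every local format."""
    return n_formats * n_programs


def routines_with_standard(n_formats, n_programs):
    """One converter per local format plus one standard reader per program."""
    return n_formats + n_programs


# Arbitrary example: 6 local formats and 10 programs.
print(routines_without_standard(6, 10), routines_with_standard(6, 10))
```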


Summary

There was a consensus at the workshop that it was important to develop a standard format for data exchange. It is particularly timely to do so now, as it will be possible to build upon the considerable efforts which have already gone into the development of such standards for the European neutron scattering community and the US synchrotron community. By acting now, it should be possible to merge all these standardization efforts to produce a standard which could be used by all the international scattering communities, both neutron and X-ray. This decision to develop such a standard represents an important step in cooperation among the US neutron scattering laboratories, the international neutron scattering community, and the X-ray scattering community. It is hoped that collaborative efforts such as this will eventually lead to significant savings in the effort required for software development within all of these communities.

Considerable progress was made during the course of the workshop in defining the framework for the standard and in defining the process by which the details of the standard will be developed. It is intended that documents describing the proposed standards in preliminary form will be available for scrutiny by the various affected communities early in 1995. At that point it may be appropriate to convene another workshop to discuss any necessary modifications or refinements to these standards.


Supporting Documents

  1. Mark Koennecke, ISIS. (Hired with European Community funds to develop a European Standard.) "Proposal for a European Neutron Scattering Data Exchange Format", version 4.2, June 17, 1994.
  2. Jon Tischler, ORNL and UNI-CAT. (Chairman of the data standards committee for the APS CATs.) "Draft of Proposed Data Standards for the APS", version 6, August 19, 1994.
  3. Przemek Klosowski, NIST. "HADES -- HDF-based Data Exchange Standard, A Proposal for the Neutron Scattering Community", revision 1.6, October 5, 1994. (Note: this also includes code to convert between the NIST and HADES data file formats.)


Comments to: Ray Osborn <ROsborn@anl.gov>
Revised: Saturday, November 9, 1996

Copyright © 1994, Kent Crawford, Argonne National Laboratory. All rights reserved.