Data Formats Working Group


Mailing List Archive

Timeline

  • 2007-12-31 agree on v1.0 format
  • 2008-01-01 start implementing v1 at facilities
  • 2008-06 representative sampling of data available for inter-facility comparison
  • 2008-10 presentation of results at NOBUGS2008 meeting (date TBA)

Considerations

  • a key point of what we discussed at NIST:

namely, that our goal is to agree on a format that, whilst using as much XML best practice as is reasonable, leaves the file instantly human-readable, editable in the simplest of editors, and importable by simple text-import filters in programs that don't recognise XML.

  • document what we decide
    • 1DWG will take care of documenting the format it defines.
      • define the format with an XML Schema (for absolute validation of any proposed XML file against the standard)
      • instructions on how to use that schema
      • XSL style sheets to present the XML contents in various forms (these also serve as examples)
      • a couple of example files
      • possibly also some explanatory text
    • move some of this discussion to
      • discussion page
      • other wiki pages
      • /dev/null after its usefulness has been exhausted
  • coordinate with other communities
  • should we consider a file naming convention?
  • should we consider a SAS scan naming convention?
    • sequential run number from facility
    • convention set by the detector software provider
  • XML representation of the I vs. Q data
    • tabular format
    • vector format
  • general XML coding style
    • readability by humans
      • with lots of computer skills
      • with rudimentary computer skills
    • readability by computers
    • availability of style sheets
  • scalability of XML format to 2D data?
  • What is required?
  • What is optional?
  • Reuse the same tags in similar contexts
    • e.g., X,Y pairs, whether for detector position, beam center, or sample position
      inconsistent:
        <beam_size axis="x" units="mm">12.00</beam_size>
        <beam_size axis="y" units="mm">12.00</beam_size>
        <x0 units="mm">322.64</x0>
        <y0 units="mm">327.68</y0>
        <pixel_x units="mm">5.00</pixel_x>
        <pixel_y units="mm">5.00</pixel_y>
      consistent:
        <beam_size axis="x" units="mm">12.00</beam_size>
        <beam_size axis="y" units="mm">12.00</beam_size>
        <beam_center axis="x" units="mm">322.64</beam_center>
        <beam_center axis="y" units="mm">327.68</beam_center>
        <pixel_size axis="x" units="mm">5.00</pixel_size>
        <pixel_size axis="y" units="mm">5.00</pixel_size>
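A minimal sketch of why the consistent naming pays off: because every X,Y pair uses the same tag with an axis attribute, one generic reader handles beam size, beam center and pixel size alike. The <detector> wrapper element and the helper name read_xy are hypothetical, chosen only for illustration; the sketch uses nothing beyond Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical <detector> wrapper around the "consistent" tags listed above.
XML = """<detector>
  <beam_size axis="x" units="mm">12.00</beam_size>
  <beam_size axis="y" units="mm">12.00</beam_size>
  <beam_center axis="x" units="mm">322.64</beam_center>
  <beam_center axis="y" units="mm">327.68</beam_center>
  <pixel_size axis="x" units="mm">5.00</pixel_size>
  <pixel_size axis="y" units="mm">5.00</pixel_size>
</detector>"""


def read_xy(parent, tag):
    """Return ((x, y), units) for any child tag using the axis="x"/"y" convention."""
    values = {}
    units = set()
    for elem in parent.findall(tag):
        values[elem.get("axis")] = float(elem.text)
        units.add(elem.get("units"))
    # Require exactly one x and one y value, both in the same units.
    if set(values) != {"x", "y"} or len(units) != 1:
        raise ValueError(f"<{tag}> needs one x and one y value in the same units")
    return (values["x"], values["y"]), units.pop()


root = ET.fromstring(XML)
for tag in ("beam_size", "beam_center", "pixel_size"):
    print(tag, read_xy(root, tag))
```

With the inconsistent naming (x0/y0, pixel_x/pixel_y), each quantity would instead need its own special-case lookup code.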

Points for Discussion

  • Do we want to advocate/recommend particular names for particular tags, e.g., SASdata, SASsample, Idata, etc.?
    • which ones?
  • provide for (optional) inclusion of sample prep details
  • provide for (optional) inclusion of other (non-SAS) data in the XML
  • Need to allow for more than a single SAS data set in one .xml file
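One possible way to hold several data sets in one file is simply to repeat a container element. The sketch below borrows the candidate names SASdata and Idata from the list above; the <SASroot> wrapper, the name attribute and the <Q>/<I> children (one tabular row per Idata) are hypothetical, and the numbers are made-up placeholders, not real measurements.

```python
import xml.etree.ElementTree as ET

# Hypothetical file holding two SAS data sets, each in tabular (one row per point) form.
XML = """<SASroot>
  <SASdata name="sample">
    <Idata><Q unit="1/A">0.010</Q><I unit="1/cm">125.3</I></Idata>
    <Idata><Q unit="1/A">0.020</Q><I unit="1/cm">64.8</I></Idata>
  </SASdata>
  <SASdata name="background">
    <Idata><Q unit="1/A">0.010</Q><I unit="1/cm">1.2</I></Idata>
    <Idata><Q unit="1/A">0.020</Q><I unit="1/cm">1.1</I></Idata>
  </SASdata>
</SASroot>"""


def read_all_datasets(xml_text):
    """Return {data set name: [(Q, I), ...]} for every SASdata block in the file."""
    root = ET.fromstring(xml_text)
    datasets = {}
    for sasdata in root.findall("SASdata"):
        rows = [
            (float(row.findtext("Q")), float(row.findtext("I")))
            for row in sasdata.findall("Idata")
        ]
        datasets[sasdata.get("name")] = rows
    return datasets


for name, rows in read_all_datasets(XML).items():
    print(name, rows)
```

A reader that only understands a single data set could still cope with such a file by taking the first SASdata block and ignoring the rest.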

Other Points

  • It's not clear how to specify that multiple runs were reduced together
(AJJ) Assuming those runs were first stored as XML, referencing the individual files would give back all of that information (à la Ghosh's suggestion). At NIST we combine absolute I vs Q files to produce a single absolute I vs Q file, so this approach is reasonable here. What about elsewhere?
  • How does one include the instrument information for the many runs that were used to make up the composite file?
  • If we include reduction information, then everything needs to be in there, e.g., the run numbers for the can, the standard, the uniform field, etc.
  • Information on the averaging: is it radial, sector, rectangular, etc.?
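As a sketch of what "everything needs to be in there" might look like, the helper below records every input run by its role, plus the averaging type, in one element. The element names (<reduction>, <run>, <averaging>), the role labels and the function name reduction_record are all hypothetical, and the run numbers in the example call are invented placeholders.

```python
import xml.etree.ElementTree as ET


def reduction_record(runs, averaging):
    """Build a hypothetical <reduction> element listing every input run by role.

    runs: mapping of role (e.g. "sample", "can", "standard", "uniform_field")
          to a list of run identifiers.
    averaging: averaging type, e.g. "radial", "sector" or "rectangular".
    """
    reduction = ET.Element("reduction")
    for role, run_ids in runs.items():
        for run_id in run_ids:
            run = ET.SubElement(reduction, "run", role=role)
            run.text = str(run_id)
    ET.SubElement(reduction, "averaging", type=averaging)
    return reduction


# Placeholder run numbers, for illustration only.
record = reduction_record(
    {"sample": [10231, 10232], "can": [10225], "standard": [10190], "uniform_field": [10188]},
    averaging="radial",
)
print(ET.tostring(record, encoding="unicode"))
```

Listing runs by role keeps the record flat and text-editable; following AJJ's note, each run entry could equally point to that run's own XML file instead of carrying a bare number.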

Members

  • Andrew Jackson (NIST)
  • Pete Jemian (APS)
  • Steve King (ISIS)
  • Ken Littrell (ORNL)
  • Andy Nelson (ANSTO)
  • Ron Ghosh (ILL)
  • Jan Ilavsky (APS)

News/Status