The GETADE 'Modern' Oceanographic Format

Introduction

At a meeting held in Copenhagen in May 1993, some members of the IODE Group of Experts on Technical Aspects of Data Exchange (TADE) met to discuss a modernization of GF3, the IODE standard format for international data exchange. The result was the definition of a newer format that builds on the strong points of GF3 but removes its media dependence. This format is called the GETADE format and is described here.

Format Structure

The File/Series Header Definition and Data Cycle Definition Records of GF3 were generalized by removing all mandatory and fixed-format portions of the records. As a result, the two records reduce to a single Definition Record, which specifies the format of the records that follow. The File/Series Header and Data Cycle Records of GF3 were treated in the same way: all fixed-format portions were removed, so these two GF3 record types also look the same and contain essentially only data. The first two bytes of each record contain an identifier indicating the record type, that is, which definition records describe its contents. None of the other GF3 record types were used.

The resulting formatting scheme is quite simple yet flexible enough to accommodate any data structure. Records may differ in length, although Definition records are expected to be typically 80 characters or less. Data records may be any length up to the limits imposed by hardware or software.

The formatting scheme has the following characteristics.
  1. The format builds on the strongest features of GF3. It uses the coding scheme and structuring of information in definition records.
  2. The format scheme permits more freedom in structuring data than does GF3. This has been done by removing some of the more rigid rules of GF3. In so doing, the formatting scheme has become simpler.
  3. Metadata may be informally recorded with the data in the same fashion as in GF3. Alternatively, if desirable, a more formal structure can be specified for the metadata which makes it more readily accessible for machine processing.
  4. By using the parameter codes developed in GF3, the formatting scheme can include a wide variety of data. A review of these code tables is underway to expand the range of variables even further.
  5. By using GF3 parameter codes, the formatting scheme encourages the standardization of naming conventions and the use of SI units. This encourages compatibility of data sets for international exchange.
  6. The formatting scheme has been structured to reflect the very common tabular nature of data sets. The same structure appears in widely available commercial spreadsheet software.
  7. The hierarchical nature of oceanographic data collections has been preserved by using the same features as GF3.
  8. The data and information are encoded using ASCII characters. At present the need for rules to permit data and information exchange in binary has not been addressed. Differing hardware encodings of binary data introduce complications.

Formatting Rules

  1. There are two record types. The first is a Definition Record and the second is a Data Record. If desired, a single title line may be added at the start of a file to document the source of the file.
  2. All data and information are encoded in ASCII.
  3. Each variable to be reported in a data record must be described in a corresponding definition record. All definition records appear at the start of the data file. Each carries a record type designation that indicates in which record type, and where in that record, the variable may be found.
  4. Variables are presented in the data record in the same order as they appear in corresponding Definition records. The formatting of the data in the record is given by the characteristics of each variable.
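
As a minimal sketch of rules 1 and 3, the Python fragment below separates a file's lines into the optional title line, the definition records and the data records. It relies on the definition record type being 00, as given in the Definition Record layout. The function name `split_records` and the `has_title` flag are illustrative assumptions, not part of the format; the sample lines are taken from the example file in this document.

```python
def split_records(lines, has_title=False):
    """Split GETADE-style lines into (title, definitions, data).

    Assumption: the caller knows whether a title line is present,
    because the format gives the title line no record type indicator.
    """
    body = list(lines)
    title = None
    if has_title and body:
        title = body.pop(0).rstrip("\n")

    definitions, data = [], []
    for line in body:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines between blocks
        # Definition records carry "00" in their first two bytes.
        if line[:2] == "00":
            definitions.append(line)
        else:
            data.append(line)
    return title, definitions, data


sample_lines = [
    "A SAMPLE FILE STRUCTURE FOR WOCE WHP .SUM AND .SEA FILES.",
    "0001CNSH2XXN A4                        NODC country and ship code",
    "0002TEXT7XXN A80                       Plain language comments",
    "",
    "02   CNSH2XXN IS THE STANDARD NODC COUNTRY AND SHIP CODES COMBINED.",
    "0106MT11/5  S1 119 2 ROS 19900130 2353 BO -61  -3600  -66  -4030",
]

title, definitions, data = split_records(sample_lines, has_title=True)
```

Here the two lines beginning with 00 end up in `definitions`, and the 02 and 01 records end up in `data`.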

Definition Records

There is no maximum number of definition records. However, at most approximately 100 different record types are permitted, a limit that follows from allowing any printable ASCII character in the definition record identifier.

The first definition record referenced is the record type that indicates the cycling of information. For example, with oceanographic station data, the first definition record should describe the record that holds information about the station. Each occurrence of this record type then marks a new station. Any number of tables of information could be presented about a station.

An attribute in the definition record indicates which delimiter, if any, is placed in an output record. Using a delimiter allows the information to be readily incorporated into spreadsheet software. Software that reads a data record automatically must take the presence of these delimiter characters into account. Although the format description permits different delimiters in different record types, this should not be done: a variety of delimiters in a single data set complicates the reading software and defeats the purpose of using a delimiter, which is to let a spreadsheet program read the data records readily.

The definition record makes use of GF3 parameter codes. As part of this scheme it is permissible for users to generate their own codes. Where possible this should be avoided, since special handling of the data is then required to determine what the new code describes. Information about new codes or current code tables is available from RNODC-Formats, located at ICES.

Values reported using the format can be rescaled by multiplicative and additive factors. Recorded values are converted to observed values as follows:

    observed value = recorded value * multiplier factor + additive factor

Definition records repeat as required to specify the contents of the data records that follow. All parameters for a record should be recorded one after another, in the order in which they appear.

The Definition Record has the following contents.

    Byte number  Width  Description
    1            2      Record type. For each Definition record, these bytes
                        should be set to 00.
    3            2      Definition record identifier. The first definition
                        record is the primary one and indicates the basic unit
                        of the data. These may be defined using any printable
                        ASCII character.
    5            8      The GF3 code of the parameter.
    14           1      The type of variable. This may be 'I' for integer, 'F'
                        for real numbers or 'A' for character variables.
    15           4      The number of bytes for the parameter value. Note that
                        numeric values should be left justified while character
                        values should be right justified.
    19           3      The dummy value to be used if values are missing. These
                        are the same as used by GF3. If the field is numeric,
                        the use of 9s is recommended. If the field is character,
                        any repeated character (such as NNNN in an A4 field)
                        would do, but by default blanks are used.
    22           8      The factor by which the stored data are to be
                        multiplied.
    30           8      The factor which should be added to the stored value of
                        the data.
    38           2      A field to specify the ASCII code of the character used
                        as a delimiter. If the value is a printable ASCII
                        character (between 30 and 127) this is the character
                        used in the data. If it is outside of the range, no
                        delimiter is used.
    40           40     The plain language descriptor of the parameter. This is
                        used simply to describe the parameter code.
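
The byte layout above can be read with fixed-column slicing. The following Python sketch is one possible reading, not a reference implementation. It assumes that byte 13 is unused, that the delimiter field holds a decimal ASCII code, and that blank multiplier and additive fields mean 1.0 and 0.0. The names `Definition` and `parse_definition` are hypothetical, and the example record is constructed here to follow the table exactly rather than copied from a real file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Definition:
    identifier: str           # bytes 3-4
    gf3_code: str             # bytes 5-12
    var_type: str             # byte 14: 'I', 'F' or 'A'
    width: int                # bytes 15-18
    dummy: str                # bytes 19-21
    multiplier: float         # bytes 22-29
    additive: float           # bytes 30-37
    delimiter: Optional[str]  # bytes 38-39, decoded to a character
    descriptor: str           # bytes 40-79


def _number(text, default):
    """Parse a numeric field; a blank field yields the assumed default."""
    text = text.strip()
    return float(text) if text else default


def parse_definition(record):
    record = record.ljust(79)  # pad short records so slices are safe
    if record[0:2] != "00":
        raise ValueError("not a definition record")
    code_text = record[37:39].strip()
    code = int(code_text) if code_text.isdigit() else None
    # Range taken from the table text (30 to 127); outside it, no delimiter.
    in_range = code is not None and 30 <= code <= 127
    return Definition(
        identifier=record[2:4],
        gf3_code=record[4:12],
        var_type=record[13:14],
        width=int(record[14:18]),
        dummy=record[18:21].strip(),
        multiplier=_number(record[21:29], 1.0),
        additive=_number(record[29:37], 0.0),
        delimiter=chr(code) if in_range else None,
        descriptor=record[39:79].strip(),
    )


# Constructed example: each piece is placed at the byte given in the table.
example = (
    "00" + "03" + "TEMP7STD" + " " + "I"
    + "8".ljust(4) + "999"
    + "0.0001".ljust(8) + "0.0".ljust(8)
    + "44"  # decimal ASCII code for a comma
    + "Temperatures (deg C) from CTD"
)
definition = parse_definition(example)
```

Zero-based Python slices are one less than the 1-based byte numbers in the table, so byte 5 with width 8 becomes `record[4:12]`.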

Data Records

Data records always have the record type indicator (as defined in the corresponding definition record) as the first 2 bytes of a record. They may be of any length desired. The length is determined by the data to be encoded, their inherent structures and limitations of hardware or media. The contents of each data record are completely determined by the corresponding definition records.

While in principle unnecessary, it is often helpful to ensure a blank character between values. This not only improves readability but also eases the programming task of reading the data in certain computer languages.

In some circumstances it may be desirable for a sort key to be present in each record. The user can achieve this by defining the sort key as a separate variable positioned as the first variable in each data record. Alternatively, the variables on which a sort should take place may be placed first in each data record. The user has complete freedom in defining a sort key as needed.
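
To illustrate how a data record follows from its definitions, the Python sketch below walks a record field by field, skips an optional delimiter, and applies the rescaling rule (observed value = recorded value * multiplier factor + additive factor). It is a sketch under stated assumptions: values start immediately after the 2-byte record type indicator, a delimiter (if any) follows each value, and the dummy value is given as the full string that appears in the data. The `Field` type, the function name and the sample record are all made up for illustration.

```python
from typing import NamedTuple, Optional


class Field(NamedTuple):
    code: str                    # GF3 parameter code
    var_type: str                # 'I', 'F' or 'A'
    width: int                   # number of bytes for the value
    multiplier: float = 1.0
    additive: float = 0.0
    dummy: Optional[str] = None  # assumed: dummy as it appears in the data


def decode_data_record(record, fields, delimiter=None):
    """Return {code: value}, with numeric values rescaled to observed values."""
    pos = 2  # skip the record type indicator in bytes 1-2
    values = {}
    for field in fields:
        raw = record[pos:pos + field.width]
        pos += field.width
        # Skip one delimiter character if it is present after the value.
        if delimiter and record[pos:pos + 1] == delimiter:
            pos += 1
        text = raw.strip()
        if field.var_type == "A":
            values[field.code] = text
        elif not text or text == field.dummy:
            values[field.code] = None  # missing value
        else:
            recorded = int(text) if field.var_type == "I" else float(text)
            values[field.code] = recorded * field.multiplier + field.additive
    return values


# Hypothetical field list and record, laid out to match each other.
fields = [
    Field("BONO2XXN", "A", 3),
    Field("PRES7PRD", "I", 4, 0.1, 0.0, "9999"),
    Field("DOXY6XXD", "I", 6, 0.1, 0.0, "999999"),
]
record = "03" + "  3" + "," + "21  " + "," + "999999"
values = decode_data_record(record, fields, delimiter=",")
```

In this made-up record the pressure is stored as 21 and rescaled by 0.1, while the oxygen field holds its dummy value and is reported as missing.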

Example

The following example illustrates how the WOCE WHP Data Summary Report and the Water Sample Report could be encoded using this format. A sample of the file is shown.

A SAMPLE FILE STRUCTURE FOR WOCE WHP .SUM AND .SEA FILES.

    0        1         2         3         4         5         6         7         8
    12345678901234567890123456789012345678901234567890123456789012345678901234567890
    0001CNSH2XXN A4                        NODC country and ship code
    0001IDEN4CRN A5                        WOCE cruise number
    0001IDEN4SEN A3                        WOCE section identifier
    0001IDEN4STN A4                        WHP station number
    0001CAST5NON A2                        WOCE cast number
    0001CAST5WON A4                        WOCE cast type code
    0001YEAR7ZTN I5 94                     Year as 4 digits of the station
    0001DATE7ZTN I4 93                     Month and day as MMDD of the station
    0001HHMM7ZTN I5 94                     Hour and minute as HHMM of the station
    0001WOTM2XXN A3                        WOCE time event code
    0001LATD7XXN I4 93   1.0     0.0       Latitude (deg) +/- 90 +ve north
    0001LATM6XXN I7 96   0.01    0.0       Latitude (dec min) +/- 90 +ve north
    0001LOND7XXN I5 94   1.0     0.0       Longitude (deg) +VE east
    0001LONM6XXN I7 96   0.01    0.0       Longitude (dec min) +VE east
    0001MMFX5WON A4                        WOCE Navigation system code
    0001DTOB2XXD I5-94   1.0     0.0       Uncorrected depth (m) to bottom
    0001HTSF7XXD I5-94   1.0     0.0       Height above bottom (m)
    0001MTWH2XXN I5-94   1.0     0.0       Meter wheel reading (m)
    0001PRES5MXD I5-94   1.0     0.0       Maximum pressure (dbs)
    0001NOBO2XXn I3-92   1.0     0.0       Number of bottles
    0001PASA2XXn A20                       WOCE codes for parameters sampled
    0002TEXT7XXN A80                       Plain language comments
    0003SANO2XXN A2                        WOCE sample number
    0003BONO2XXN A3                        WOCE bottle number
    0003PRES7PRD I4-93   0.1     0.0       Raw CTD pressure (dbs) on bottle closure
    0003PRES5CAD I6-95   0.1     0.0       Calibrated CTD pressure (dbs)
    0003TEMP7STD I8-97   0.0001  0.0       Temperatures (deg C) from CTD
    0003PSAL7prD I8-97   0.0001  0.0       Salinities (PSU) from CTD
    0003SGTH2XXD I8-97   0.0001  0.0       Potential temperature (deg C)
    0003PSAL7BSD I8-97   0.0001  0.0       Salinity (PSU) from water sample
    0003DOXY6XXD I6-95   0.1     0.0       Dissolved oxygen (umol/kg)
    0003SLCA6XXD I7-96   0.01    0.0       Silicate (umol/kg)
    0003NTRA6XXD I6-95   0.01    0.0       Nitrate (umol/kg)
    0003NTRI6XXD I5-94   0.01    0.0       Nitrite (umol/kg)
    0003PHOS6XXD I5-94   0.01    0.0       Phosphate (umol/kg)
    0003CF112XXD I6-94   0.001   0.0       CFC-11 (pmol/kg)
    0003CF122XXD I6-94   0.001   0.0       CFC-12 (pmol/kg)
    0003FFFF6XXN I11  1  1.0     0.0       WOCE quality flags
    0003text7xxn A10                       WOCE comments

    02   THE WOCE PROGRAM USES A NUMBER OF CODES AND METHODS WHICH ARE PRESENTLY 
    02   UNDEFINED WITHIN THE GF3 CODING TABLES. THESE ARE DEFINED HERE FOR ALL
    02   PARAMETER CODES WHICH DO NOT USE A VALUE OF 6 OR 7 FOR THE GF3 "KEY FOR USER
    02   DEFINED OPTIONS". IN MANY CASES THE USER SHOULD CONSULT CODE TABLES
    02   PUBLISHED BY WOCE.
    02   CNSH2XXN IS THE STANDARD NODC COUNTRY AND SHIP CODES COMBINED.
    02   IDEN4CRN IS THE WOCE cruise number.
    02   IDEN4SEN IS THE WOCE SECTION IDENTIFIER
    02   IDEN4STN IS THE WHP station number.
    02   CAST5NON IS THE WOCE cast number.
    02   CAST5WON IS THE WOCE cast type code.
    02   WOTM2XXN IS THE WOCE time event code.
    02   MMFX5WON IS THE WOCE Navigation system code.
    02   DTOB2XXD IS THE Uncorrected depth (m) to bottom
    02   MTWH2XXN Is the Meter wheel reading (m) of wire out
    02   PRES5MXD IS THE Maximum pressure (dbs)
    02   NOBO2XXN IS THE Number of bottles AT THE STATION
    02   PASA2XXN ARE THE WOCE codes for parameters sampled
    02   SANO2XXN IS THE WOCE sample number
    02   BONO2XXN IS THE WOCE bottle number
    02   PRES5CAD IS THE Calibrated CTD pressure (dbs)
    02   SGTH2XXD IS THE Potential temperature (deg C)
    02   CF112XXD IS THE VALUE OF CFC-11 (pmol/kg)
    02   CF122XXD IS THE VALUE OF CFC-12 (pmol/kg)
    02   FFFF6XXN IS THE 10 DIGIT WOCE quality flags


    0106MT11/5  S1 119 2 ROS 19900130 2353 BO -61  -3600  -66  -4030  CN 3995   12 4001 4095 24 1-8 AWI CTD 1
    03 3  3  21   114  282414  338555  282387  338659  2148  181   059  001  009  1894 0910 2222222222
    0106MT11/5 S1 119 2 ROS 19900131 0150 EN -61 -3605 -66 -4035 DR 3970
    03 2 2 62 521 274732 336992 274611 336991 2100 222 009 001 012 1803 0912 2222222222
    0106MT11/5 S1 119 3 LVS 19900201 0244 MR -61 -3620 -66 -4050 CN 3760 175 3750 3805 7 1-6,9-14
    03 1 1 113 1039 231871 347148 231658 347652 1869 353 258 012 038 1720 0862 2222222266
    0106MT11/5 S1 120 1 ROS 19900201 1255 BE -58 -1955 -68 -1525 DR 3850
    03 7 10 21 106 282374 338864 282349 338774 2118 171 049 001 010 -999 -999 2222222299
    0106MT11/5 S1 120 1 ROS 19900201 1437 BO -58 -2010 -68 -1530 GPS 3855 15 3901 3920 24 1-8 AWI CTD 1
    03 7 10 21 106 282374 338864 282349 338774 2118 171 049 001 010 -999 -999 2222222299
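
In the sample data, each 01 record opens a new unit and the 03 records that follow belong to it. This matches the rule that the primary record type indicates the cycling of information. A reader could group records along those lines as in the Python sketch below. The function name `group_by_primary` and the shortened records are hypothetical; how records that precede the first primary record (such as the 02 comments) should be treated is an assumption made here, not something the format states.

```python
def group_by_primary(data_records, primary_id):
    """Group data records so each group starts at a primary record.

    Records seen before the first primary record are kept together
    in a leading group of their own (an assumption of this sketch).
    """
    groups = []
    for record in data_records:
        # Start a new group at every primary record, or at the very start.
        if record[:2] == primary_id or not groups:
            groups.append([])
        groups[-1].append(record)
    return groups


# Shortened, made-up records that only mimic the 02 / 01 / 03 pattern.
records = [
    "02   comment line",
    "01 station 119 cast 2",
    "03 bottle 3",
    "01 station 119 cast 3",
    "03 bottle 1",
    "03 bottle 2",
]
groups = group_by_primary(records, primary_id="01")
```

With these records the result has three groups: the leading comment, then one group per 01 record together with the 03 records that follow it.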