GETADE Formatting Guidelines

Introduction

These guidelines were devised by the IOC's Group of Experts on the Technical Aspects of Data Exchange (GETADE).

This document contains guidelines for formatting certain essential information describing oceanographic data on computer media. The guidelines may be used when exchanging data with individuals or agencies, and they are also useful in suggesting archiving practices. They do not discuss the structure of a data file as it resides in a user's or a data centre's facility. Rather, the intent is to recommend practices that ensure there are no misunderstandings and that the data will be usable by the receiver. Many other parameters and items of information that are not mentioned here are extremely important and should be included in any data exchange. The originator is encouraged to supply as much information as is available so that the data set is as fully described as possible.

Some general comments will help in interpreting the guidelines.

Oceanographic data can be of many different types, ranging over a variety of disciplines. Each observation must have its location in space and time recorded. Depending on the instrumentation used and the type of data collected, some of the space or time information is often recorded once for a series of observations. The other space or time information, the independent variable, is associated directly with each observation. For each sample, it is vital that all of the independent variables be directly associated with it. For example, for temperature profiles sampled using an XBT, the independent variable of depth is often the only one paired with the temperature observations. The other space-time information, the location of the profile and the date and time, is recorded only once in a "station header". For other observations, such as those collected using porpoising instruments, the station header may record the starting location of the tow, but each observation may also have a latitude, longitude, date, time and depth associated with it so that the exact location of every observation is recorded. Another example is sediment core data, where the header may contain the location, date and time at which the core was collected. If individual core samples are subsequently taken for analysis, the position of the sample in the core, the length of the sample, and the date on which the sample was taken must be recorded with each. Given this diversity of data types and collection processes, the only general advice that can be given is that all of the necessary space-time information must be supplied so that every observation can be located to the limit of the capabilities of the observation programme.

Further information concerning these guidelines and formatting practices is available from the RNODC Formats (ICES).

Formatting Advice

Country/Platform

The country of registration of the platform (e.g. ship) and the identity of the platform. The country of registration is not necessarily the same as the country of the organisation or project responsible for collecting the data. It is advisable to use the IOC country and platform code tables to indicate the country and platform (these are available from the RNODC Formats). It is also helpful to supply the ITU call sign of the ship so that any data sent in real-time may be easily identified. Where it is not possible to use a code table, supply the name of the platform used in the data collection in plain text.

Cruise Identifier

The originator's cruise identifier. This may be recorded as part of the data or in a Cruise Summary Report. The cruise identifier is a convenient and useful way to group data collections. Many data centres use this as part of the identification and archiving process.

Data

Data should be reported in standard SI oceanographic units. If they are not, the units used should be clearly described in the file header. Data may be reported as real numbers with or without the decimal point, or as integers with the number of decimal places indicated. Scientific notation may be used, but if so, great care must be taken to ensure that there is no ambiguity in interpretation. Reliable precision should be preserved in the data set.

Data may be in the same file as header information or may be stored in separate files. In this latter case, a clear and unique link must be provided between the header and data values (see the comments under Station Number).

Some commercially available computer applications use a comma (ASCII 44) as a delimiter for separating fields. Some countries use commas to indicate the position of the decimal. If data are exchanged in comma delimited files, care must be taken to ensure there is no confusion with the use of commas as decimal points. Generally, the use of a period to indicate the decimal is preferred since there can be no confusion.
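The ambiguity described above can be illustrated with a short Python sketch. The sample values, the semicolon separator and the helper name parse_decimal are assumptions made for illustration only and are not part of the guidelines.

```python
import csv
import io


def parse_decimal(text: str) -> float:
    """Convert a numeric field to float, accepting a comma or a period as the decimal mark."""
    return float(text.strip().replace(",", "."))


# Hypothetical record: temperature 12,5 and salinity 34,75 written with decimal commas.
# Splitting on commas yields four fields instead of two, so the record is ambiguous.
ambiguous_fields = "12,5,34,75".split(",")

# The same values with a semicolon as field separator can be read unambiguously.
sample = "12,5;34,75\n13,0;34,80\n"
rows = [
    [parse_decimal(field) for field in record]
    for record in csv.reader(io.StringIO(sample), delimiter=";")
]
```

This simple conversion assumes that the comma is never used as a thousands separator. Writing the values with a period as the decimal mark avoids the need for any such conversion.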

Date

This conveys information about the year, month, and day on which the data were collected. A great many variations are possible and are in use. Generally, numeric schemes are preferable to avoid confusion caused by language differences. A date in the form YYYYMMDD is preferred, where YYYY is the four-digit year, MM the two-digit month and DD the two-digit day. Since oceanographic data collections extend back more than 100 years, it is extremely important to use a four-digit year. If a format other than that described above is used, it should be fully described in the file header.

Another common practice is to use either day number from the start of the year, or Julian day. Julian day is not the same as day number since the former is the number of days measured from noon, 1 January, 4713 BC. A clear description must be given if either of these forms is used.

Time

Time should be reported as hours, minutes and seconds, or decimal equivalents, in UTC (GMT). If another time zone is used, it must be indicated. Always use a 24 hour clock with hour values between 0 and 23. Whenever possible, it is advisable to convert local times to UTC.
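A conversion from local time to UTC might look like the following Python sketch. It assumes that the offset of the local time zone from UTC is known to the originator. The function names and the sample time are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc(local: datetime, utc_offset_hours: float) -> datetime:
    """Attach a fixed UTC offset to a naive local time and convert it to UTC."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=zone).astimezone(timezone.utc)


def decimal_hours(t: datetime) -> float:
    """Express the time of day as decimal hours on a 24 hour clock."""
    return t.hour + t.minute / 60 + t.second / 3600


# Hypothetical observation taken at 22:30 local time, four hours behind UTC.
utc_time = local_to_utc(datetime(1993, 2, 15, 22, 30, 0), -4)
```

Note that the conversion can change the date as well as the time: in this example the UTC date is one day later than the local date.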

Instrumentation

Information about the instrument used to make the observation should be included with the data. If this is not possible, some other means of identifying the method by which the observation was made should be included.

Missing Values

The use of missing value indicators is strongly recommended. There are many ways to indicate a value is missing but care must be taken to ensure that they cannot be confused with valid data. Indicators should be clearly described in the file header.
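As a minimal Python sketch, a missing value indicator can be checked against the valid range of the variable and decoded on reading. The indicator -99.0 and the temperature range used here are assumptions for illustration, not recommended values.

```python
from typing import Optional

# Hypothetical indicator; whatever is chosen should be described in the file header.
MISSING = -99.0


def indicator_is_safe(missing: float, valid_min: float, valid_max: float) -> bool:
    """Return True if the indicator lies outside the range of valid data."""
    return not (valid_min <= missing <= valid_max)


def decode(text: str, missing: float = MISSING) -> Optional[float]:
    """Convert a field to a float, or to None if it holds the missing value indicator."""
    value = float(text)
    return None if value == missing else value
```

For sea temperatures assumed to lie between -2.5 and 40.0 degrees, an indicator of -99.0 cannot be mistaken for data, whereas an indicator of 0.0 could be.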

Operator

The name of the operator (or Chief Scientist) is useful in providing a contact for questions about the data collection. It is helpful to include the full name and address of the operator with the data collection. This information may normally be supplied as plain text.

Position

An indication of the precision of the position and/or the equipment used to establish it (e.g. GPS, DGPS, etc.) should be given in the file header. Latitude and longitude coordinates must be specified. Geographical descriptions such as place names and bearings from locations should not be used by themselves (although they may also be included). There are a number of ways of reporting latitude and longitude, and care must be taken to indicate the precision of the position when known.

No matter how latitude and longitude are specified, it must be clear where the origin is and how position is measured with respect to the origin. It is common to refer to latitudes north of the equator as positive and those south as negative. It is increasingly common to refer to longitudes with Greenwich as the origin and longitudes east as positive. Some groups report longitudes from 0 to 360 degrees while others use +/- 180 degrees measured from Greenwich. It is advisable to use N, S, E, W designators since there can be no doubt once the origin is specified. Do not use equivalent designators from other languages as this can be confusing.
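The relationship between these conventions can be sketched in Python. The sketch assumes decimal degrees, Greenwich as the origin of longitude, and north and east as positive. The function names are hypothetical.

```python
def to_signed(value: float, hemisphere: str) -> float:
    """Convert an unsigned coordinate with an N, S, E or W designator to signed degrees."""
    designator = hemisphere.strip().upper()
    if designator not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown designator: {hemisphere!r}")
    limit = 90.0 if designator in ("N", "S") else 180.0
    if not 0.0 <= value <= limit:
        raise ValueError(f"coordinate out of range: {value}")
    return -value if designator in ("S", "W") else value


def wrap_longitude(longitude_east: float) -> float:
    """Convert an east-positive longitude in 0..360 degrees to the -180..180 convention."""
    # A longitude of exactly 180 degrees is returned as -180.
    return (longitude_east + 180.0) % 360.0 - 180.0
```

A longitude reported as 270 degrees in the 0 to 360 convention is therefore the same position as 90 W, or -90 in the signed convention.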

Data Quality

Originators should give some indication of their own evaluation of the data quality. Originators should indicate whether calibration has been applied to the data.

Originators may use quality flags to indicate the degree of confidence in the associated variables. If quality flags are used then the quality checking procedures should be documented in detail. If these procedures are not documented then the flags are likely to be of no use to secondary users.

There are many schemes used to flag data quality. All are acceptable provided the interpretation of the flag is documented. One scheme employed for data reported in real-time is illustrated here. It uses a one character field with the following interpretation:

0 = data are not checked
1 = data are checked and appear correct
2 = data are checked and appear inconsistent but correct
3 = data are checked and appear doubtful
4 = data are checked and appear to be wrong
5 = data are checked and the value has been altered

If data values are altered as a result of quality control practices, it is strongly advised that the original value be preserved with the data.
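One possible way to carry the flag and the original value alongside each observation is sketched below in Python. The flag meanings are those of the scheme illustrated above; the Observation structure and the alter function are hypothetical and are not part of any exchange format.

```python
from dataclasses import dataclass
from typing import Optional

# One-character quality flags from the real-time scheme listed above.
QUALITY_FLAGS = {
    0: "data are not checked",
    1: "data are checked and appear correct",
    2: "data are checked and appear inconsistent but correct",
    3: "data are checked and appear doubtful",
    4: "data are checked and appear to be wrong",
    5: "data are checked and the value has been altered",
}


@dataclass
class Observation:
    value: float
    flag: int = 0
    original_value: Optional[float] = None  # set only when the value has been altered


def alter(obs: Observation, new_value: float) -> Observation:
    """Return an altered copy with flag 5, keeping the first reported value."""
    original = obs.original_value if obs.original_value is not None else obs.value
    return Observation(value=new_value, flag=5, original_value=original)


# Hypothetical doubtful salinity corrected during quality control.
doubtful = Observation(value=35.9, flag=3)
corrected = alter(doubtful, 34.9)
```

Keeping the original value in a separate field means that a secondary user can always see what was reported before quality control was applied.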

Record terminator/Field separator

All data should normally be exchanged as ASCII files. Each record can be terminated by an ASCII linefeed (ASCII 10) or by a carriage return/linefeed pair (ASCII 13, 10).

Field separators may be any printable ASCII character, including the space (ASCII 32). The comma (ASCII 44) is a common field separator used by spreadsheet programs. If a space is used as a field separator, ensure that every field is present, either as a legitimate value or as a missing value indicator, because many programming languages cannot easily distinguish values when they are separated by spaces.
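The following Python sketch reads space separated records terminated by either form of line ending and shows why an empty field is a problem. The sample values, the indicator -99.9 and the function name parse_records are assumptions for illustration.

```python
from typing import List, Optional


def parse_records(text: str, missing: str = "-99.9") -> List[List[Optional[float]]]:
    """Split text into records and space separated fields, decoding missing values."""
    records = []
    # splitlines() accepts linefeed and carriage return/linefeed terminators alike.
    for line in text.splitlines():
        if not line.strip():
            continue
        records.append([None if field == missing else float(field) for field in line.split()])
    return records


# Hypothetical depth, temperature and salinity records with CR/LF terminators.
# The missing temperature in the second record is written as an indicator.
good = parse_records("10.0 12.5 34.75\r\n20.0 -99.9 34.80\r\n")

# If the missing temperature is left blank instead, the record collapses to two
# fields and the salinity would be read as the temperature.
collapsed = "20.0      34.80".split()
```

With the indicator present, every record has the same number of fields and each value stays in its own column.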

Sounding

The inclusion of the sounding is a useful check of the position of a station. Since the sounding is often stored in the header record, a comparison of this value with the deepest observation in a data cycle can sometimes be helpful to ensure that the data cycles are properly linked to the headers.

Soundings should always be given in metres and as a positive number. In addition some information should be given concerning whether or not the sounding has been corrected and in what way.
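The comparison of the sounding with the deepest observation can be sketched in Python as below. The function name and the optional tolerance are hypothetical; depths and sounding are assumed to be in metres and positive downwards.

```python
from typing import Sequence


def sounding_consistent(sounding_m: float, depths_m: Sequence[float],
                        tolerance_m: float = 0.0) -> bool:
    """Return True if no observation lies deeper than the sounding plus a tolerance."""
    if sounding_m <= 0:
        raise ValueError("sounding must be a positive number of metres")
    return max(depths_m) <= sounding_m + tolerance_m
```

A result of False does not show which value is wrong. It only suggests that the header and the data cycle may have been mismatched, or that the position or sounding should be examined.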

Station Number

The station number is often assigned incrementally for consecutive stations sampled by the originator. Changes in either space or time should cause the station number to be incremented. As well, stations sampled at ocean weather stations, for example, may be labelled specially.

This is an important parameter to include. Very often it is used by a data centre in combination with the cruise number as a unique identifier of a station. If data values are split between files, the unique combination of cruise number and station number is very important in allowing data collections to be properly recombined. Use of unique station identifiers is strongly advised.
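When headers and data values are held in separate files, the recombination might be sketched in Python as follows. The cruise identifiers, field layout and function name are invented for illustration and do not represent any particular exchange format.

```python
from typing import Dict, List, Tuple

StationKey = Tuple[str, int]  # (cruise identifier, station number)


def recombine(headers: Dict[StationKey, dict],
              data_rows: List[Tuple[str, int, float, float]]) -> Dict[StationKey, dict]:
    """Attach each data row to its header using the cruise and station number."""
    stations = {key: {"header": header, "observations": []}
                for key, header in headers.items()}
    for cruise, station, depth, temperature in data_rows:
        key = (cruise, station)
        if key not in stations:
            raise KeyError(f"no header for cruise {cruise}, station {station}")
        stations[key]["observations"].append((depth, temperature))
    return stations


# Hypothetical header file contents and data file contents.
headers = {
    ("CR93001", 1): {"date": "19930215", "latitude": 44.5, "longitude": -63.25},
    ("CR93001", 2): {"date": "19930216", "latitude": 44.75, "longitude": -63.0},
}
data_rows = [
    ("CR93001", 1, 0.0, 2.1),
    ("CR93001", 1, 10.0, 2.0),
    ("CR93001", 2, 0.0, 2.4),
]
stations = recombine(headers, data_rows)
```

Because the key combines the cruise identifier with the station number, station numbers may restart at 1 on each cruise without causing data to be attached to the wrong header.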