Tips for DIF Writing

1. Which fields should be completed?

All applicable fields for a given data set should be completed. Some information may not be available or cannot be recovered, and some fields do not necessarily apply to every data set. When the information cannot be obtained or would not make sense, the field should be omitted.

Some fields are required, meaning that the fields must be present in the DIF in order for the DIF to load into DIF databases. These fields are Entry_ID, Entry_Title, Parameters, Data_Center and Summary.

Other fields are deemed critical, meaning these fields are crucially important for data set selection (i.e., searching), user understanding of the data, or data access. Fields critical for searching are Parameters, Temporal_Coverage, Spatial_Coverage and Location. If a user searches by one of these criteria and the DIF does not contain the information, the DIF will not be found in the search. Fields critical for user understanding of the data are Entry_Title, Summary and Data_Resolution. Without these fields, users cannot know whether a data set is appropriate for their needs. Lastly, the Data_Center field is critical for users to be able to access the data once they decide the data are appropriate for their needs.
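
The required and critical field lists above lend themselves to a simple completeness check. The following is a minimal sketch in Python, assuming a DIF has already been parsed into a dictionary keyed by field name. The check_dif function, the dictionary representation and the example values are illustrative assumptions, not part of any DIF tool.

```python
# Field names are taken from the lists above; representing a DIF as a
# dict is an assumption made for illustration only.
REQUIRED = ("Entry_ID", "Entry_Title", "Parameters", "Data_Center", "Summary")
CRITICAL = {
    "searching": ("Parameters", "Temporal_Coverage", "Spatial_Coverage", "Location"),
    "understanding": ("Entry_Title", "Summary", "Data_Resolution"),
    "access": ("Data_Center",),
}


def check_dif(dif):
    """Return (missing_required, missing_critical) for a DIF given as a dict.

    A field counts as missing when it is absent or its value is empty.
    """
    def is_missing(field):
        return not dif.get(field)

    missing_required = [f for f in REQUIRED if is_missing(f)]
    missing_critical = {
        purpose: [f for f in fields if is_missing(f)]
        for purpose, fields in CRITICAL.items()
    }
    # Keep only the purposes that actually have gaps.
    missing_critical = {p: fs for p, fs in missing_critical.items() if fs}
    return missing_required, missing_critical


# Hypothetical entry: all required fields present, some critical ones absent.
example = {
    "Entry_ID": "EXAMPLE_ID",
    "Entry_Title": "Example Aerosol Data Set",
    "Parameters": ["Atmosphere>Aerosols>Dust/Ash"],
    "Data_Center": "EXAMPLE_CENTER",
    "Summary": "Short description of the data set.",
}
missing_required, missing_critical = check_dif(example)
print(missing_required)
print(missing_critical)
```

In this sketch the example entry would load (no required field is missing), but it would not be found by temporal, spatial or location searches, and it lacks Data_Resolution.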

2. How should the data be aggregated?

A key question when writing a DIF is: What should be described in a single entry? In some cases, a group of data, commonly known as a data set, should be described by a single DIF. In other cases, related data may be grouped together even though they may be identifiable in smaller subsets, as in a catalog or inventory. Both situations indicate the DIF describes an aggregation of data. Your determination of the appropriate group of data to describe should be the primary guideline.

Suggestions:

Data set characteristics that might suggest separate directory entries include:
- a unique sensor/platform/project combination
- a unique parameter, parameter combination, or set of independent variables contained in the data set
- distinct spatial or temporal coverages for a given group of data
- a unique processing level.
Data set characteristics that do not indicate a need for separate entries include:
- identical data sets available on different media
- data held at multiple locations, except when significant differences exist between the data archived in different places (e.g., different processing algorithms)
- data used for the interpretation or organization of a data set (e.g., map overlays, indices to data), which would not be listed separately but which would be mentioned in the summary.

Some rules of thumb:

- If the DIF entries each contain virtually identical information in most fields, the DIFs should probably be aggregated by writing a single entry (or a small handful of entries). In this way, the user conducting a search won't be overwhelmed by too many essentially identical entries.
- If the data are held in one place (at a single data center, on a single online server, etc.), a single DIF entry may be written to ensure access to those data.
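
The first rule of thumb can be made concrete with a rough comparison of two draft entries. The sketch below assumes DIFs are represented as Python dictionaries and uses an arbitrary threshold of 0.8 to stand in for "virtually identical information in most fields". The field_overlap and should_consider_aggregating helpers, the threshold and the example values are all hypothetical, and the final decision remains the author's judgment.

```python
def field_overlap(dif_a, dif_b):
    """Fraction of fields (present in either entry) whose values are identical."""
    fields = set(dif_a) | set(dif_b)
    if not fields:
        return 0.0
    same = sum(
        1 for f in fields
        if f in dif_a and f in dif_b and dif_a[f] == dif_b[f]
    )
    return same / len(fields)


def should_consider_aggregating(dif_a, dif_b, threshold=0.8):
    """True when the two entries agree on at least `threshold` of their fields.

    The threshold is an arbitrary illustration, not a documented rule.
    """
    return field_overlap(dif_a, dif_b) >= threshold


# Two hypothetical drafts that differ only in their titles.
draft_a = {
    "Entry_Title": "Example Data, 1990",
    "Parameters": "Atmosphere>Aerosols",
    "Data_Center": "EXAMPLE_CENTER",
    "Summary": "Same descriptive text.",
    "Project": "EXAMPLE_PROJECT",
}
draft_b = dict(draft_a, Entry_Title="Example Data, 1991")

print(field_overlap(draft_a, draft_b))
print(should_consider_aggregating(draft_a, draft_b))
```

Here four of the five fields match, so the two drafts are flagged as candidates for a single aggregated entry.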

3. What constitutes a good title?

A good title is important because the user, when receiving the results of a metadata search, is often presented with a long list of similar titles. Titles should convey the content of the data set(s) and should be descriptive enough that the user, at search time, can distinguish the content from other similar data sets. Generally, a good title may include parameters measured, geographic location, source name, sensor name, project, temporal coverage in years, or data center name.

4. Why do some fields require valid keywords?

Valid keywords (or valids) are important in allowing precise searching of metadata records. The use of valid keywords ensures that DIFs are keyed in a consistent manner, so users searching for entries can retrieve all entries related to a particular criterion. In theory, a search on the term "aerosols" will return all entries about aerosol data. It is therefore very important for DIF authors to key DIFs with all appropriate valids in all applicable fields. DIF authors must be thoroughly familiar with the lists of valid keywords and understand the variety of words in the lists.

Furthermore, valid keywords are especially important to international users who may be unfamiliar with some English scientific terms. A browsable list of keywords will prompt the non-English speaker to select the most appropriate words when conducting a search.

The following fields require the use of valid keywords to aid users in searching for information: Parameters, Sensor Name, Source Name, Chronostratigraphic Unit, Location, Project, and Data Center Name. In addition, the following fields require the use of valid keywords to ensure proper maintenance and presentation of information: Personnel Role, URL Content Type (in Related URL), and IDN Node.

Where there is a hierarchy of valids, as with Parameters, DIF authors should key DIFs to the most specific level possible. For example, it is far better to key the DIF with Atmosphere>Aerosols>Dust/Ash when applicable rather than Atmosphere>Aerosols. If users conduct searches with Dust/Ash, they will miss the DIF keyed with only Atmosphere>Aerosols.
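
The effect of keying depth on search results can be illustrated with a small sketch. It assumes, purely for illustration, that a search term matches a DIF when the term equals one of the levels of a keyword path the DIF is keyed with. The actual directory search may behave differently, and the levels and matches helpers are hypothetical.

```python
def levels(keyword_path):
    """Split a hierarchical keyword such as 'Atmosphere>Aerosols>Dust/Ash'."""
    return [level.strip() for level in keyword_path.split(">")]


def matches(search_term, dif_keywords):
    """True if the search term equals any level of any keyword path in the DIF.

    This matching rule is an assumption made for illustration only.
    """
    term = search_term.strip().lower()
    return any(
        term == level.lower()
        for path in dif_keywords
        for level in levels(path)
    )


specific = ["Atmosphere>Aerosols>Dust/Ash"]  # keyed to the most specific level
general = ["Atmosphere>Aerosols"]            # keyed one level too high

print(matches("Dust/Ash", specific))  # True
print(matches("Dust/Ash", general))   # False: the DIF is missed
print(matches("Aerosols", specific))  # True: broader searches still find it
```

Under this assumed rule, the DIF keyed to the most specific level is found by both the narrow and the broad search, while the DIF keyed only with Atmosphere>Aerosols is missed by the narrow one.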

5. What constitutes a good summary?

A good summary is descriptive. It describes the data set so that users are able to decide whether the data set is appropriate for their needs. For a list of possible data set characteristics to include in the summary, see the Summary Checklist in the Summary section of this document.
