Tips for DIF Writing
1. Which fields should be completed?
All applicable fields for a given data set should be completed. Some
information may not be available or cannot be recovered, and some fields
do not necessarily apply to every data set. When the information cannot
be obtained or would not make sense, the field should be omitted.
Some fields are required, meaning that the fields must be present
in the DIF in order for the DIF to load into DIF databases. These fields
are Entry_ID, Entry_Title, Parameters, Data_Center and Summary.
Other fields are deemed critical, meaning these fields are
crucially important for data set selection (i.e., searching), user
understanding of the data, or data access. Fields critical for searching
are Parameters, Temporal_Coverage, Spatial_Coverage and Location. If a
user conducts a search by these criteria, and the DIF does not contain the
information, the DIF will not be found in the search. Fields critical for
user understanding of the data are Entry_Title, Summary and
Data_Resolution. Without these fields, users cannot know if a data set is
appropriate for their needs. Lastly, the Data_Center field is critical
for users to be able to access the data once they decide the data are
appropriate for their needs.
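The required and critical fields listed above can be checked mechanically
before a DIF is submitted. The following is a minimal sketch, not part of
any DIF tool: it assumes the DIF has already been read into a Python dict
keyed by field name, and the function name check_dif is hypothetical. Only
the field names come from this section.

```python
# Field names are taken from the text above. The dict-based DIF
# representation and the function name are illustrative assumptions.
REQUIRED = ("Entry_ID", "Entry_Title", "Parameters", "Data_Center", "Summary")
CRITICAL = {
    "searching": ("Parameters", "Temporal_Coverage", "Spatial_Coverage", "Location"),
    "understanding": ("Entry_Title", "Summary", "Data_Resolution"),
    "access": ("Data_Center",),
}


def check_dif(dif):
    """Return (missing_required, missing_critical) for a DIF held as a dict."""

    def absent(name):
        # A field counts as missing if it is not present or is empty.
        value = dif.get(name)
        return value is None or value == "" or value == []

    missing_required = [f for f in REQUIRED if absent(f)]
    missing_critical = {}
    for purpose, fields in CRITICAL.items():
        missing = [f for f in fields if absent(f)]
        if missing:  # keep only purposes with at least one missing field
            missing_critical[purpose] = missing
    return missing_required, missing_critical


# Placeholder values, for illustration only.
example = {
    "Entry_ID": "EXAMPLE_ID",
    "Entry_Title": "Example title",
    "Parameters": ["Atmosphere>Aerosols>Dust/Ash"],
    "Summary": "Example summary.",
}
missing_required, missing_critical = check_dif(example)
print(missing_required)  # ['Data_Center']
print(missing_critical)
```

A non-empty first list means the DIF would not load. The second result
only flags fields worth a second look, since a critical field may be
omitted when it truly does not apply to the data set.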
2. How should the data be aggregated?
A key question when writing a DIF is: What should be described in a single
entry? In some cases, a group of data, commonly known as a data set,
should be described by a single DIF. In other cases, related data may be
grouped together even though they may be identifiable in smaller subsets,
as in a catalog or inventory. In both situations the DIF describes
an aggregation of data. Your own judgment of the appropriate
group of data to describe should be the primary guideline.
Suggestions:
- Data set characteristics that might suggest separate directory entries
  include:
  - a unique sensor/platform/project combination
  - a unique parameter, parameter combination, or set of independent
    variables contained in the data set
  - distinct spatial or temporal coverages for a given group of data
  - a unique processing level.
- Data set characteristics that do not indicate a need for separate entries
  include:
  - identical data sets available on different media
  - data held at multiple locations, except when significant differences
    exist between the data archived in different places (e.g., different
    processing algorithms)
  - data used for the interpretation or organization of a data set
    (e.g., map overlays, indices to data), which would not be
    listed separately but which would be mentioned in the summary.
Some rules of thumb:
- If the DIF entries each contain virtually identical information in
most fields, the DIFs should probably be aggregated by writing a single
entry (or a small handful of entries). In this way, the user, when
conducting a search, won't be overwhelmed by too many essentially
identical entries.
- If the data are held at one location (a single data center, a single
online server, etc.), a single DIF entry may be enough to ensure
access to those data.
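The first rule of thumb can be made concrete by measuring how many fields
two draft DIFs share. The sketch below is only an aid to judgment: the
dict representation, the function name shared_field_fraction, and the
choice to ignore Entry_ID are all assumptions, and this guide sets no
numeric threshold for "virtually identical".

```python
def shared_field_fraction(dif_a, dif_b, ignore=("Entry_ID",)):
    """Fraction of fields (outside `ignore`) whose values are identical."""
    fields = (set(dif_a) | set(dif_b)) - set(ignore)
    if not fields:
        return 0.0
    same = sum(1 for f in fields if dif_a.get(f) == dif_b.get(f))
    return same / len(fields)


# Two hypothetical drafts that differ only in their titles.
dif_a = {
    "Entry_ID": "EXAMPLE_A",
    "Entry_Title": "Example data, volume 1",
    "Parameters": ["Atmosphere>Aerosols"],
    "Data_Center": "Example Center",
    "Summary": "Example summary.",
    "Location": "Example location",
}
dif_b = dict(dif_a, Entry_ID="EXAMPLE_B", Entry_Title="Example data, volume 2")

print(shared_field_fraction(dif_a, dif_b))  # 0.8
```

A high fraction suggests that the drafts are candidates for a single
aggregated entry. The characteristics listed under Suggestions still
decide the matter.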
3. What constitutes a good title?
A good title is important because the user, when receiving the results
of a metadata search, is often presented with a long list of similar
titles. Titles should convey the content of the data set(s) and should be
descriptive enough so the user, at search time, can distinguish the
content from other similar data sets. Generally, a good title may include
parameters measured, geographic location, source name, sensor name,
project, temporal coverage in years, or data center name.
4. Why do some fields require valid keywords?
Valid keywords (or valids) are important in allowing precise searching of
metadata records. The use of valid keywords ensures that DIFs are keyed
in a consistent manner. Thus, users searching for entries will be able
to retrieve all entries related to a particular criterion. In theory, a
search on the term "aerosols" will return all entries about aerosol data.
Therefore, it is very important for DIF authors to key DIFs with all
appropriate valids in all applicable fields. DIF authors must be
extremely familiar with the lists of valid keywords and understand the
variety of words in the lists.
Furthermore, valid keywords are especially important to international
users who may be unfamiliar with some English scientific terms. A
browsable list of keywords will prompt the non-English speaker to select
the most appropriate words when conducting a search.
The following fields require the use of valid keywords to aid users in
searching for information: Parameters, Sensor Name, Source Name,
Chronostratigraphic Unit, Location, Project, and Data Center Name. In
addition, the following fields require the use of valid keywords to ensure
proper maintenance and presentation of information: Personnel Role, URL
Content Type (in Related URL), and IDN Node.
Where there is a hierarchy of valids, as with Parameters, DIF authors
should key DIFs to the most specific level possible. For example, it is
far better to key the DIF with Atmosphere>Aerosols>Dust/Ash when
applicable rather than Atmosphere>Aerosols. If users conduct searches
with Dust/Ash, they will miss the DIF keyed with only Atmosphere>Aerosols.
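The effect of keying to the most specific level can be illustrated with a
small sketch. It assumes that a search term matches a DIF when the term
equals one level of a keyed Parameters path; this is a simplification for
illustration, not a description of how any directory search is actually
implemented, and the function name matches is hypothetical.

```python
def matches(search_term, keyed_parameters):
    """True if the term equals any level of any keyed hierarchy path."""
    term = search_term.strip().lower()
    for path in keyed_parameters:
        levels = [level.strip().lower() for level in path.split(">")]
        if term in levels:
            return True
    return False


specific = ["Atmosphere>Aerosols>Dust/Ash"]  # keyed to the most specific level
general = ["Atmosphere>Aerosols"]            # keyed one level too high

print(matches("Dust/Ash", specific))  # True
print(matches("Dust/Ash", general))   # False: the DIF is missed
print(matches("Aerosols", specific))  # True: broader searches still find it
```

Under this assumption, the specifically keyed DIF is found by both the
broad and the narrow search, while the generally keyed DIF is found only
by the broad one.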
5. What constitutes a good summary?
A good summary is descriptive. It describes the data set so that users
are able to decide whether the data set is appropriate for their needs.
For a list of possible data set characteristics to include in the summary,
see the Summary Checklist in the Summary section of this document.