Monterey Bay Aquarium Research Institute
Shore Side Data System
Development: Ontologies and SSDS

Introduction

The Shore Side Data System provides a highly flexible and dynamic system for managing data and metadata. Much like on-line stores, though, if you provide the wrong information with your data set, or use the wrong name when searching for something, the system will not work very well.

This vocabulary problem is being addressed on many fronts. Two key developments that will be critical to data systems such as SSDS are ontologies and the semantic web (loosely, embodying computer-usable knowledge in on-line information). In the short term, ontologies are the more critical of the two for a data management system such as SSDS, and we discuss them here.

What's An Ontology?

Extensive writing about ontologies can be found on-line. Crudely put, an ontology is a set of related definitions or specifications. In the computational realm, these definitions and relationships tend to bear the name "ontology" when they can be used computationally, for example as part of a formal logic. A more formal definition for an ontology is "a specification of a conceptualization" (Tom Gruber). Deborah L. McGuinness wrote in Ontologies Come of Age (The Semantic Web: Why, What, and How, MIT Press, 2001):
    "People (and computational agents) typically have some notion or conceptualization of the meaning of terms. Software programs sometimes provide a specification of the inputs and outputs of a program, which could be used as a specification of the program. Similarly ontologies can be used to provide a concrete specification of term names and term meanings."

As outlined in the same article, ontologies can have different levels of detail, from simple hierarchies of terms, to glossaries or dictionaries and simple thesauri, all the way to complex logical notations conveying many types of relationships between terms. The more sophisticated the ontology, the more functions and complexity can be managed automatically.
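The two simplest levels described above can be sketched in a few lines of Python. This is only an illustration: the term names and the `BROADER`, `PREFERRED`, and `is_a` identifiers are hypothetical and are not taken from SSDS or from any published vocabulary.

```python
# Hypothetical terms, for illustration only.
# Level 1: a simple hierarchy (term -> its broader parent term).
BROADER = {
    "sea_surface_temperature": "sea_water_temperature",
    "sea_water_temperature": "temperature",
    "air_temperature": "temperature",
}

# Level 2: a simple thesaurus (alternate label -> preferred term).
PREFERRED = {
    "sst": "sea_surface_temperature",
    "water_temp": "sea_water_temperature",
}


def is_a(term, ancestor):
    """Return True if `term` equals `ancestor` or sits below it in the hierarchy."""
    term = PREFERRED.get(term, term)  # resolve an alternate label first
    while term is not None:
        if term == ancestor:
            return True
        term = BROADER.get(term)  # climb one level; None at the top
    return False
```

With even this much structure, a search for "temperature" could match a data set labeled "sst", which a flat list of names could not do. Richer relationships (units, measurement methods, logical constraints) would need a real ontology language rather than dictionaries.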

These levels of sophistication can be recognized in practical data management situations, such as the Standard Naming Exercise recently undertaken by MBARI and Naval Postgraduate School data managers. The goal of the exercise, simply stated, was to come up with standard variable names for data from the Adaptive Oceanographic Sampling Network and similar oceanographic observing systems. But it turns out that the "standard names" are used for many purposes, from simple search terms to the specification of terms with known meanings and relationships to other terms (e.g., units of measurement). To achieve more sophisticated data management goals, more advanced ontologies must be defined and used.

SSDS Ontology Needs and Status

The Shore Side Data System has a fairly sophisticated schema, or organization, for its metadata. The existing schema (which is not by any means done!) names most of the key metadata variables which can describe the data. However, in most cases the SSDS schema does not provide any guidance on the values to use when a user enters those variables. Standardizing those entries will be critical to making the system work correctly for all the potential users.

An example will make this problem clear. (For more detailed discussion, the reader is encouraged to visit a web site documenting some of our metadata standardization efforts.) If the computer "knows" what units a measurement is taken in -- say, that wind speed was measured in knots -- the computer can be programmed to convert between those units and other units (e.g., meters/second) when requested by a user. This allows the user to present data on comparable scales, for example, even if the instrument manufacturers used different scales.

But someone must enter the metadata describing the units for wind speed, and if they enter "kts" for knots, and all the computer knows is "knots" -- or worse, they enter "k" and the computer thinks that means "degrees kelvin" -- then the user can't get their data scaled to the desired units.
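The wind speed example can be sketched as follows. This is a minimal illustration under stated assumptions, not the SSDS implementation: the `SPEED_UNITS` table, its accepted spellings, and the function names are all hypothetical. The only outside fact used is that one knot is 1852 meters per hour.

```python
# Hypothetical controlled vocabulary for speed units. Each canonical unit
# lists the spellings it accepts and its conversion factor to meters/second.
SPEED_UNITS = {
    "knot": {
        "aliases": {"knot", "knots", "kt", "kts"},
        "to_m_per_s": 1852.0 / 3600.0,
    },
    "meter_per_second": {
        "aliases": {"m/s", "meters/second", "m s-1"},
        "to_m_per_s": 1.0,
    },
}


def canonical_unit(label):
    """Return the canonical unit name for a user-entered label, or raise."""
    key = label.strip().lower()
    for name, entry in SPEED_UNITS.items():
        if key in entry["aliases"]:
            return name
    # An unknown label is rejected rather than guessed at.
    raise ValueError(f"unrecognized speed unit: {label!r}")


def convert_speed(value, from_label, to_label):
    """Convert a speed between any two units the vocabulary recognizes."""
    src = SPEED_UNITS[canonical_unit(from_label)]
    dst = SPEED_UNITS[canonical_unit(to_label)]
    return value * src["to_m_per_s"] / dst["to_m_per_s"]
```

Here `convert_speed(10, "kts", "m/s")` works because "kts" is a listed spelling, while an entry of "k" raises an error instead of being silently read as some other unit. The conversion is trivial; the hard part is agreeing on the table.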

Similar problems occur for any data that you might want to process automatically. Variable name is the most common of these, since it is needed for searching, and for understanding the actual meaning of the data. (But see the Standard Naming Exercise web site for even more subtle issues of naming!)

So the specification of the metadata by the user should follow rules, not just about the format for specifying the metadata, but about the terms we use. SSDS has the following ontological needs:

  • existence of well-defined ontologies for SSDS metadata of interest
    • marine data terminology
    • units
    • standard variables
    • sensors, instruments, and platforms (the generic names, not the individual names like "M1")
  • ability to incorporate those ontologies into SSDS capabilities
    • metadata schema
    • database concepts
    • metadata entry tools (critical for user acceptance of system)
    • metadata display algorithms (e.g., for humans, or for entry into FGDC-standard systems)
    • search capabilities (searching SSDS and other data)
    • data presentation software (e.g., for units conversion)

Some of these applications require sophisticated application of advanced semantic web technology, but many simply require pointing an application to a defined ontology (assuming one exists).
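As one sketch of the simpler case, a metadata entry tool could check each entered value against a defined vocabulary and suggest close matches when the value is not recognized. The field names, vocabulary contents, and `check_record` function below are hypothetical; only `difflib.get_close_matches` is a real Python standard library call.

```python
import difflib

# Hypothetical controlled vocabularies, keyed by metadata field name.
# In practice these would be loaded from a shared, published source.
VOCABULARIES = {
    "units": ["knots", "meters/second", "kelvin", "degrees_celsius"],
    "platform_type": ["mooring", "auv", "ship", "glider"],
}


def check_record(record):
    """Return {field: [suggested terms]} for each value outside its vocabulary."""
    problems = {}
    for field, value in record.items():
        allowed = VOCABULARIES.get(field)
        if allowed is None or value in allowed:
            continue  # field is unconstrained, or the value is already valid
        # Offer up to three similar vocabulary terms (possibly none).
        problems[field] = difflib.get_close_matches(value, allowed, n=3)
    return problems
```

A record that uses only vocabulary terms returns an empty dictionary. A record with a near miss, such as "knot" for the units field, returns that field with suggested corrections, which the entry tool could show to the user before the record is saved.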

SSDS Plans

The typical oceanographic data system developer starting to face this problem has no obvious reference for implementing metadata-oriented systems or for choosing an appropriate level of sophistication. This is due partly to the complexity of the subject matter and partly to the recent growth of the technology. Of particular note is the lack of mature ontologies geared toward widespread, interoperable computational use.

For SSDS, we hope to take advantage of a mixture of local work (the Standard Naming Exercise) and ongoing national and international efforts to develop and use ontological materials. The referenced sites below capture our knowledge to date of such efforts.

To address the above concerns, MBARI, CeNCOOS, and SURA collaborated with many other institutions to submit a one-year proposal to the National Science Foundation to accelerate marine metadata understanding. The resulting Marine Metadata Interoperability project seeks to establish a community resource that will reference all of the initiatives relating to marine metadata and emphasize interoperability solutions. This project is expected to create an international collaboration to advance the development of marine metadata ontologies, tools, and practices. See the Reference Pages section for more detailed information on this project.

Reference Pages

The following pages at the MMI site have many references to ontological and other metadata information.

  • Vocabularies and Ontologies
    A fairly long list of standards, conventions, and ontologies relevant to variable naming.
  • Tools
    Metadata tools, XML editors, ontology development tools, ontology representation languages, ontology libraries, etc.
  • Content standards and conventions
    A fairly long list of standards describing what should go into metadata -- that is, what fields need to be included.
  • Transport Protocols
    A number of transport protocols used to find and move metadata around the internet.
  • Guidelines and Cookbooks
    Guidelines to help people figure out how to do metadata right.
  • Examples
    Examples, including best practices, MMI-sponsored work, and so on.

The MMI site also includes general information of interest to the data management community, such as proposal opportunities, conferences and workshop schedules, and email lists on the subject.

Last updated: Feb. 06, 2009