ROLE OF ONTOLOGIES IN CREATING HYDROLOGIC METADATA
Luis Bermudez1, Michael Piasecki2
ABSTRACT
Recent developments and discussions on a nationwide scale increasingly stress the need for semantic
interoperability among communities, a need that arises from the lack of domain-specific descriptions of the data
being processed. This shortcoming stems largely from the fact that each community typically
focuses only on its own needs, with little or no attention paid to making its community-specific
data descriptions part of a much bigger data realm. We state that part of the problem arises because
the formalizations of the available metadata schemas are general, difficult to implement and
hard to extend. In this paper we attempt to show that through the use of the Web Ontology
Language (OWL) and the creation of domain-specific ontologies some of these shortcomings can be
overcome. This is demonstrated with two examples of knowledge inference that reason from a
domain ontology. In the first, we created an ontology for the US Geological Survey Hydrologic
Units, where logical inference is used to extract a list of desired watershed names from which a user can
select to fill a metadata element. In the second, we show the possibilities of extending a
metadata element and restricting it to accept values from another distributed resource, such as the
Global Change Master Directory (GCMD). We conclude that knowledge representation systems like
OWL provide a more flexible platform and the possibility of reusing distributed resources, simplifying the
process of creating metadata schemas for the hydrologic community.
1
INTRODUCTION
Metadata is needed to facilitate sharing of data among communities, minimizing duplication,
reducing costs and facilitating efficient analysis and decision making (Commission on Geosciences
Environment and Resource CGER 1995). To create metadata, a user or a system needs a set of
metadata elements that guides the creation process. These elements are typically arranged in a
metadata model, catalog, or schema and are summarized in a specification document, or standard,
that is published by the creating entity, such as the International Organization for Standardization (ISO 19115:2003
Geographic Metadata Standard), the Federal Geographic Data Committee (FGDC), or a specific
community that creates its own standard, such as the Ecological Metadata Language (EML).
1 Graduate Student, Drexel University, Department of Civil Architectural & Environmental
Engineering, 3141 Chestnut Street, Philadelphia, PA 19104, USA, Phone: (215) 895-1391; FAX:
+(215) 895-1363, E-mail: leb27@drexel.edu
2 Professor, Drexel University, Department of Civil Architectural & Environmental Engineering,
3141 Chestnut Street, Philadelphia, PA 19104, USA, Phone: (215) 895-1391; FAX: +(215) 895-1363, E-mail: m29@drexel.edu
For hydrology, standard descriptions for gage stations, watersheds, well pumping
observations and other hydrologic data are not explicitly available. However, by reusing, restricting
or creating new elements from existing metadata models it should be possible to create a hydrologic
metadata model that fits the needs of the hydrologic community. Hydrology-related metadata
models can be derived from conceptual models that are described using the Unified Modeling
Language (UML) or Extensible Markup Language (XML) schemas. For example, ISO’s
Geographic Metadata (ISO 2003) is published as diagrams using UML, while EML is specified
using an XML schema.
Current formalizations of metadata models are difficult to use because they are very general
and complex (Elmargarmid and Pu 1990; Stocks and Quinn 2002; Helly, Koppers et al. 2003),
which translates into a lack of support by commercial software packages. For example, to date
there is no software that allows a metadata model to be extended, nor a web-based application for
creating instances of ISO metadata.
Ideally, domain experts should be able to edit a metadata model and extend it to fit their own
needs by reusing Web resources, for example by creating a new concept or property, or by restricting a
property to certain values or cardinalities using a resource available on the Web. We explore a
novel way to create metadata conceptual models and extend them, reusing distributed resources on
the WWW through the use of the Web Ontology Language (OWL).
We found that OWL has richer expressive capabilities than object models in UML and XML
schemas for creating conceptual models. This is because OWL is a language that supports description
logics while the other two do not. Using OWL it is possible to declare that relations are inverse,
transitive, symmetric or functional. For example, stating that a watershed has only one
possible outlet location is not possible in UML or XML schemas. UML and XML schemas can
only declare that a watershed has a property outlet with a cardinality of one. If a watershed is
described in two different XML documents or UML instances, and the outlet location differs in each
document, each instance will still pass XML schema or UML validation. In contrast, in OWL we could
declare that outlet is a functional property, and the instances will not pass a semantic validation test.
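The contrast can be illustrated with a short Python sketch. It is not an OWL reasoner, only a minimal illustration of the check a functional property makes possible. The triples, the names hasOutlet, OutletA and OutletB, and the function functional_violations are all hypothetical. The sketch also assumes that the two outlet locations are known to be distinct individuals; without that assumption an OWL reasoner would instead conclude that both names denote the same location.

```python
from collections import defaultdict

# Statements gathered from two separate documents, as (subject, property, value).
# All names are hypothetical and serve only as illustration.
doc_a = [("Schuylkill", "hasOutlet", "OutletA")]
doc_b = [("Schuylkill", "hasOutlet", "OutletB")]

# Properties declared functional: at most one value per subject.
FUNCTIONAL_PROPERTIES = {"hasOutlet"}


def functional_violations(triples, functional=FUNCTIONAL_PROPERTIES):
    """Return {(subject, property): values} wherever a functional property
    carries more than one value (values assumed to be distinct individuals)."""
    seen = defaultdict(set)
    for subject, prop, value in triples:
        if prop in functional:
            seen[(subject, prop)].add(value)
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


# Each document is acceptable on its own ...
print(functional_violations(doc_a))  # {}
# ... and the conflict only shows once the statements are merged.
print(functional_violations(doc_a + doc_b))
```

Checking each document separately corresponds to per-document schema validation, while checking the merged statements corresponds to the semantic validation described above.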
In OWL it is possible to easily extend conceptual models distributed on the Web, reusing
previously created resources, due to the capability of the Resource Description Framework (RDF) to
link concepts across the WWW. UML does not accept this, because it breaks the principle of
modularization (Baclawski 2002), and XML schemas are not well suited for this purpose (Hunter
and Lagoze 2001; Gil and Ratnakar 2002; Hendler 2002). This is the reason why other related XML
technologies, like XPath and XLink, have appeared to deal with this issue. Also, representing
restrictions in OWL is much more flexible because it allows a multiplicity of restrictions on properties
in a way that does not affect the membership of objects in a class (Baclawski 2002).
We present a definition of ontologies and conceptual models and then show with two
examples the role of OWL as a conceptual schema that allows the creation of more flexible metadata
models for hydrologic communities.
2
ONTOLOGIES
In computer science an ontology is an explicit and formal specification of mental abstractions,
which conforms to a community agreement about a domain and is designed for a specific purpose
(Gruber 1993). It differs from the term Ontology (first letter in upper case) used in philosophy
to describe the existing things in the world (Fonseca 2001). Different abstractions, specifications
and agreements exist among communities, so different domain ontologies could exist, while only a
single Ontology is possible.
An ontology provides the structure of the controlled vocabulary, similar to a dictionary or a
thesaurus. The vocabulary agreed to by a community is the expression of concepts (mental
abstractions) of their domain. Since a concept could be expressed in different ways and differ in
meaning from one person to another, the controlled vocabulary helps to resolve semantic
incompatibilities (Bishr 1998; Harvey, Kuhn et al. 1999; Sheth 1999; Hadzilakos, Halaris et al.
2000). For example, when conceptualizing the observation of the water level in a river, the US
Geological Survey (USGS) refers to it as stage while the National Oceanic and Atmospheric
Administration (NOAA) refers to it as gage height. Conversely, one term can carry several meanings:
stage could be a hydrologic parameter but also a place for performing arts.
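As a minimal sketch of how a controlled vocabulary resolves such a mismatch, the Python fragment below maps the two agency-specific terms to one shared concept. The concept identifier WaterLevel, the table TERM_TO_CONCEPT and the function same_concept are hypothetical names chosen for illustration.

```python
# Hypothetical controlled vocabulary: each (source, term) pair maps to one
# shared concept identifier. "WaterLevel" is an assumed name.
TERM_TO_CONCEPT = {
    ("USGS", "stage"): "WaterLevel",
    ("NOAA", "gage height"): "WaterLevel",
}


def same_concept(source_a, term_a, source_b, term_b):
    """True when both terms are known and resolve to the same concept."""
    concept_a = TERM_TO_CONCEPT.get((source_a, term_a.lower()))
    concept_b = TERM_TO_CONCEPT.get((source_b, term_b.lower()))
    return concept_a is not None and concept_a == concept_b


# The USGS and NOAA terms denote the same concept.
print(same_concept("USGS", "stage", "NOAA", "gage height"))
# A "stage" from an unrelated source is not in the vocabulary, so no match.
print(same_concept("USGS", "stage", "Theater", "stage"))
```

Keying the lookup on the source as well as the term is what keeps the hydrologic stage apart from the theatrical one.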
A formal specification of a vocabulary could be given in different ways, such as a plain list of
words, a dictionary, a taxonomy, an Entity-Relationship (ER) diagram, an object model in a Unified
Modeling Language (UML) diagram, an XML schema and possibly many others. What makes a
controlled vocabulary an ontology is that its concepts are defined explicitly by
creating classes or entities. A class or entity is created using a mental abstraction, which could be a
classification, an aggregation or a generalization (Batini, Ceri et al. 1992). For example, a list of
terms such as USA, Germany, and Colombia does not represent any explicit conceptual relation until
an explicit class Country is abstracted to classify them. In addition to this requirement, an ontology
needs to conform to strict hierarchical subclass relationships between the classes (McGuinness
2003). Also, classes have properties and relations among them, as shown in Figure 1.
In the small ontology example presented, the classes BodyOfWater, River and Lake are shown
explicitly as boxes with the name of the class in bold in the first row. Properties are presented in the
second and third rows. The property connectsTo applies to all the classes that inherit from
BodyOfWater, while length and area apply only to the local classes River and Lake respectively.
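The same small ontology can be written down as data. The Python sketch below is one possible encoding, not the notation used in the figure: the tables SUBCLASS_OF and PROPERTY_DOMAIN and the helper functions are hypothetical, while the class and property names are those of the example.

```python
# Subclass relations and property domains taken from the example in the text.
SUBCLASS_OF = {"River": "BodyOfWater", "Lake": "BodyOfWater"}
PROPERTY_DOMAIN = {"connectsTo": "BodyOfWater", "length": "River", "area": "Lake"}


def ancestors(cls):
    """Yield cls itself and every superclass above it."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def properties_of(cls):
    """Properties declared on cls or inherited from its superclasses."""
    lineage = set(ancestors(cls))
    return sorted(p for p, domain in PROPERTY_DOMAIN.items() if domain in lineage)


print(properties_of("River"))        # ['connectsTo', 'length']
print(properties_of("Lake"))         # ['area', 'connectsTo']
print(properties_of("BodyOfWater"))  # ['connectsTo']
```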
Figure 1 is one of the many possible representations of an ontology. A given domain ontology
should be understandable to members of a community and members of other communities, by
describing it in a formal manner. A formal way to express ontologies is the Web Ontology Language
(OWL).
Figure 1. Small ontology example
Ontologies provide the mechanism to create the necessary classes and properties in a similar
way to object models. Ontologies in OWL support logical statements like inverse, transitive,
symmetric and functional relations, which allow richer semantic declarations for creating controlled
vocabularies that could be used in metadata schemas.
3
CONCEPTUAL MODELS
The development of a metadata model is similar to the development of an information system. It
starts with the specification of requirements, which answers the questions of why the metadata is going to be
created and how it is going to be used. The requirements are presented as a list of possible elements.
Some elements could then be put together under entities and related to each other to facilitate the
understanding of the model. The rearrangement of elements, which involves creating classes for entities, presenting
elements as properties of classes and relating the classes to each other, yields what is called a conceptual
model.
A concept is a mental abstraction of a real world object. Concepts are related to each other via
statements like isA, isPartOf or isMemberOf (Batini, Ceri et al. 1992) and contain some
characteristics called properties. A set of statements is a conceptual model that helps domain experts
to express formally a system or domain.
Conceptual models are formalized in diagrams, such as Entity-Relationship (ER) diagrams and
Unified Modeling Language (UML) diagrams, and in ontology languages like OWL. UML is the current standard of the
Object Management Group (OMG) for creating models, and is also used by ISO and the OpenGIS
Consortium to share their conceptual models. OWL is a W3C recommendation for specifying
ontologies, and while it shares many similarities with UML it has some advantages over it.
Differences and similarities between UML and the DAML+OIL and RDF models are discussed
by Baclawski (2002). Since OWL is very similar to DAML+OIL, most of the analysis done by
Baclawski also applies to OWL. Both OWL and UML allow explicit declarations of classes and
properties, generalization relations, datatypes, restrictions on properties, and containers for
classes. However, properties in UML and OWL are very different. In UML, properties are binary
relations, whilst in OWL properties could have complex domains and ranges and could be restricted
multiple times in different classes. This allows flexible extensions in OWL that are not possible in UML.
4
OWL: RICH SEMANTIC DECLARATIONS
Description logics (DL) allow declaring logical statements that UML is not able to express, such as
inverse, transitive, symmetric and functional relationships. Description logics are used to build
intelligent applications that allow a system to reason, and make deductions based on explicit
representation of knowledge. For the creation of metadata we use these OWL-DL capabilities to
identify resources that can be used to create dynamic user interfaces for creation of metadata
instances. Also, these resources are used to validate the semantics of metadata instances.
Metadata models declare elements and the domain values of these elements. We create a
statement to assert that an element could have a set of finite values. This statement is an assertion
that makes use of the inverse, transitive or symmetric logical expressions.
Suppose that the element MD_Identifier of the ISO 19115 standard is defined by a hydrologic
community to permit only names of watersheds located in a particular region. Using UML, it would
be necessary to create the exact list of the watershed codes or names in an enumeration or codelist.
In contrast, using OWL it is only necessary to declare a statement that refers to an already existing
hydrologic units ontology. A knowledge system will then be able to infer the values and use them
as required.
Figure 2 shows an ontology for the hydrologic unit system used by the US Geological Survey.
The hydrologic unit system is a hierarchical classification of nested watersheds, from larger to smaller,
within a certain region. A system that handles knowledge inference can use this
ontology to obtain values, either to populate a predefined list in an input form or to validate the data
semantically. A statement could be something like:
“the property MD_Identifier of EX_Geographic_Description is restricted to allow all
cataloging units that are part of the Subregion Delaware”.
In this particular case, the cataloging units Schuylkill and Lehigh will appear on the list, while
hydrologic units that are part of the Potomac will not be included. Schuylkill will appear because
Schuylkill is part of the Lower Delaware, and Lower Delaware is declared to be part of Delaware.
Since Is Part Of is declared to be a transitive property, the system will infer that Schuylkill is also
part of Delaware.
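The inference over the transitive property can be sketched in Python as follows. This is a simplified stand-in for a description logic reasoner and assumes that every unit has a single parent. Only the chain Schuylkill, Lower Delaware, Delaware comes from the text; the parent assigned to Lehigh and the Potomac unit Monocacy are assumptions made for illustration, as are the names IS_PART_OF, is_part_of and allowed_values.

```python
# Direct isPartOf assertions. Only Schuylkill -> Lower Delaware -> Delaware
# is taken from the text; the remaining entries are illustrative assumptions.
IS_PART_OF = {
    "Schuylkill": "Lower Delaware",
    "Lower Delaware": "Delaware",
    "Lehigh": "Upper Delaware",    # assumed parent unit
    "Upper Delaware": "Delaware",  # assumed
    "Monocacy": "Potomac",         # assumed cataloging unit in the Potomac
}
CATALOGING_UNITS = {"Schuylkill", "Lehigh", "Monocacy"}


def is_part_of(unit, region):
    """Follow isPartOf links upward; because the property is transitive,
    every ancestor of a unit counts, not only its direct parent."""
    seen = set()
    parent = IS_PART_OF.get(unit)
    while parent is not None and parent not in seen:
        if parent == region:
            return True
        seen.add(parent)
        parent = IS_PART_OF.get(parent)
    return False


def allowed_values(region):
    """Cataloging units that are, directly or transitively, part of region."""
    return sorted(u for u in CATALOGING_UNITS if is_part_of(u, region))


print(allowed_values("Delaware"))  # ['Lehigh', 'Schuylkill']
```

The returned list could be used to populate a selection list in an input form, or to check that a value supplied for MD_Identifier is permitted.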
It should be pointed out that other concepts that could be used in populating metadata should
also be declared in an ontology. These include geographic locations (e.g. names of stations),
instruments (e.g. in-situ devices, remote sensors) and properties of observed phenomena (e.g. stage,
precipitation intensity).
Figure 2. USGS Hydrologic Units Ontology
5
FLEXIBLE EXTENSIONS OF METADATA PROPERTIES WITH OWL
Representing restrictions in OWL is much more flexible than in UML because it allows a
multiplicity of restrictions on properties in a way that does not affect the membership of objects in a
class (Baclawski 2002). This is done indirectly, by stating that the class that restricts the
property is a subclass of a class called restriction. Although indirect, this OWL feature allowed us to
restrict metadata elements successfully.
In OWL, restrictions can be applied to properties by declaring a different cardinality or a different
range. Suppose that the element iso:keyword should be restricted by a hydrologic community to
have all values related to surface water from a web catalog of scientific keywords. For example,
such a catalog could be the Global Change Master Directory (GCMD), which should also be
expressed as an ontology. Figure 3 shows a class named MD_Keywords_EXT, which is a subclass of
iso:MD_Keywords. It extends the property iso:keyword, but it also restricts it to allow
allValuesFrom gcmd:Surface_Water.
It is important to note that the logical reading of this statement is “all individuals that have
values for the property iso:keyword of type gcmd:Surface_Water are of type MD_Keywords_EXT”. A
system to collect hydrologic metadata for the above example could make sure that all the individuals
meet the requirement that the element iso:keyword be either gcmd:Discharge or
gcmd:Stage_Height or any other value of type gcmd:Surface_Water.
Figure 3. Extension of iso:keywords
6
SUMMARY
This paper outlines the role that ontologies can play in the creation of interoperable metadata sets
for a specific realm, in this case the hydrologic community. We stated that knowledge
representation systems, like OWL, permit a larger degree of flexibility for creating metadata
models than conventional conceptual models expressed in UML or XML schemas. Creating a
logical statement was sufficient to restrict metadata elements to a finite set of values from a controlled
vocabulary. This could be achieved due to the richer semantic declarations possible in ontologies
and the knowledge inference capabilities of description logics. Also, we showed that in OWL
models it is possible to apply, indirectly, restrictions on properties so that they conform to the
specific needs of hydrologic communities.
For ontology-driven information systems to work, a consensus among communities is also
needed. OWL is a promising new technology that is opening up far-reaching possibilities for the
Semantic Web (Berners-Lee, Hendler et al. 2001), which carries the promise of much
improved human-machine interactions. Scientific communities like the hydrologic community are
poised to take advantage of this technology to solve semantic problems and share metadata models
on a broader scope, reaching across communities.
ACKNOWLEDGEMENTS
This work has been supported by the National Ocean Partnership Program, NOPP, through NASA
grant number NAG13-0040, and by the National Science Foundation, NSF, Geoscience program.
We would also like to acknowledge the valuable contributions by the members of smileConsult
GmbH in Hannover, Germany, as well as the many discussions we had with individuals from groups
like the OpenGIS Consortium, JENA, RDF-logic, and Stanford University (makers of PROTÉGÉ).
7
REFERENCES
Baclawski, K., M. Kokar, P. Kogut, L. Hart, J. Smith, J. Letkowski and P. Emery (2002).
"Extending the Unified Modeling Language for ontology development." Software System
Model 1: 1-15.
Batini, C., S. Ceri and S. B. Navathe (1992). Conceptual Database Design. Redwood City,
California, The Benjamin/Cummings publishing Company, Inc.
Berners-Lee, T., J. Hendler and O. Lassila (2001). "The Semantic Web." Scientific American
184(5): 34-43.
Bishr, Y. (1998). "Overcoming the semantic and other barriers to GIS interoperability." Geographic
Information Science 12(4): 299-314.
Commission on Geosciences Environment and Resource CGER (1995). A Data Foundation for The
National Spatial Data Infrastructure. Washington, D.C., National Academy Press.
Elmargarmid, A. and C. Pu (1990). "Guest Editors' Introduction to the Special Issue on
Heterogeneous Databases." ACM Computing Surveys 22: 175-178.
Fonseca, F. T. (2001). Ontology-Driven Geographic Information Systems.
Spatial Information Science and Engineering. Maine, The University of Maine.
GCMD. Global Change Master Directory. Retrieved January 06, 2004, from
http://gcmd.gsfc.nasa.gov/Resources/valids/gcmd_parameters.html.
Gil, Y. and V. Ratnakar (2002). TRELLIS: An Interactive Tool for Capturing Information Analysis
and Decision Making. A. Gómez-Pérez and V. Richard Benjamins (eds.). Knowledge
Engineering and Knowledge Management. Ontologies and the Semantic Web: 13th
International Conference, EKAW 2002, Siguenza, Spain, Springer-Verlag Heidelberg. pp.
37-42.
Gruber, T. (1993). "A Translation Approach to Portable Ontology Specification." Knowledge
Acquisition 5(2): 199-220.
Hadzilakos, T., G. Halaris, M. Kavouras, M. Kokla, G. Panopoulos, I. Paraschakis, T. Sellis, L.
Tsoulos and M. Zervakis (2000). "Interoperability and Definition of a National Standard for
Geospatial Data: The Case of the Hellenic Cadastre." International Journal of Applied
Earth Observations and Geoinformation 2(2): 120-128.
Harvey, F., W. Kuhn, H. Pundt and Y. Bishr (1999). "Semantic interoperability: A central issue for
sharing geographic information." The Annals of Regional Science 33(2): 213-232.
Helly, J., A. A. P. Koppers and H. Staudigel (2003). "Scalable models of data sharing in Earth
sciences." Geochem. Geophys. Geosyst 4(1), 1010, doi:10.1029/2002GC000318.
Hendler, J. (2002). "XML and the Semantic Web." XML Journal October.
Hunter, J. and C. Lagoze (2001). Combining RDF and XML Schemas to Enhance Interoperability
Between Metadata Application Profiles. The Tenth International World Wide Web
Conference, Hong Kong, ACM Press, May 1-5 2001. pp. 457-466.
ISO (2003). "Geographic Information - Metadata."
McGuinness, D. L. (2003). Ontologies Come of Age. In D. Fensel, J. Hendler, H. Lieberman and W.
Wahlster (eds.) Spinning the Semantic Web. London, England, The MIT Press.
Sheth, A. P. (1999). Changing focus on interoperability in information systems: from system,
syntax, structures to semantics. In M. F. Goodchild, M. J. Egenhofer, R. Fegeas and C.
Cottman (eds.) Interoperating geographic information systems. Boston, Kluwer
Academic Publishers: 5-29.
Stocks, K. and J. Quinn (2002). Data technologies: Geospatial data integration. W. Michener and P.
Tooby (eds.). Scalable Information Networks for the Environment (SINE). Report of an NSF-sponsored workshop, San Diego Supercomputer Center, Oct. 29-31 2001. pp. 23-29.