| GEO BON
Group on Earth Observations
Biodiversity Observation Network
(GEO BON)
Principles of the GEO BON Information
Architecture
Version 1.0 – 14 June 2010
Authors
Éamonn Ó Tuama, GBIF, Denmark (co-lead)
Hannu Saarenmaa, University of Helsinki, Finland (co-lead)
Stefano Nativi, IMAA-CNR, Italy
Mark Schildhauer, NCEAS, USA
Nicolas Bertrand, NERC, UK
Edward van den Berghe, IOBIS, USA
Lori Scott, NatureServe, USA
Meredith Lane, NBII, USA
Gladys Cotter, NBII, USA
Dora Canhos, CRIA, Brazil
Roman Khalikov, ZIN-RAS, Russia
Document Version History
Version | Date | Modified by | Comment
V1.0 | 2010-06-14 | |
Table of Contents
Principles of the GEO BON Information Architecture
1. Introduction
2. Review of the concepts of GEO BON
3. Main approach of the information architecture
4. Data types and data content
5. Networks and their information resources
5.1. Existing global networks
5.2. National and regional networks
6. Discovery services and registries
6.1. GBIF UDDI, GBRDS, and Metadata Catalogue
6.2. International Long Term Ecological Research (ILTER) Network
6.3. Knowledge Network for Biocomplexity (KNB)
6.4. NASA GCMD
6.5. NBII
7. Interoperability and information management services
8. Ontologies, thesauri, dictionaries, semantic mediation
9. Organism names and habitat classifications
10. Workflow of services and integration of applications
Climate change & biodiversity applications: A GEOSS Architecture Implementation Pilot
11. Portals, search engines, querying and harvesting
12. Open access issues
13. Activities to implement GEO BON
14. References
Annex 1: Data requirements template distributed to thematic work groups
Annex 2: Acronyms
1. Introduction
While preparing the detailed implementation plan (GEO BON 2010) for the Group on Earth
Observations Biodiversity Observation Network (GEO BON) between December 2009 and March 2010, it
became obvious that the biodiversity networks that will make up GEO BON hold such a
multitude of data and employ such a wide variety of access mechanisms that the plan could not
adequately cover all the features that affect data integration and interoperability.
Of necessity, the plan had to focus on activities and deliverables. Yet there is a need to
gather in one place and briefly document the "diversity of biodiversity networks" and their chief
characteristics. This is the purpose of this Companion Document to the implementation plan.
When implementation of GEO BON actually begins, more detailed surveys and design documents
will be produced.
Working Group 8 (WG8) of the GEO BON is concerned with data integration and interoperability.
It is a significant challenge to coordinate, standardize, and manage in situ data that are collected
by disparate institutions and individuals for differing purposes. As envisaged in the Concept
Document (GEO BON 2008), GEO BON, building on existing networks and initiatives, should
develop an implementation plan for an informatics network in support of the efficient and
effective collection, management, sharing, and analysis of data on the status and trends of the
world’s biodiversity, covering variation in composition, structure and function at ecosystem,
species and genetic levels and spanning terrestrial, freshwater, coastal, and open ocean marine
domains. It is probably safe to say that all this will be impossible to achieve in the short term.
However, various research groups are already carrying out relevant activities. GEO BON can
leverage these by connecting the groups and supporting their work so that integrated and novel products
can be produced more efficiently.
WG8 thus has a mandate that is somewhat different from that of the other GEO BON working groups. It
does not aim directly at producing particular biodiversity products. Instead, it will focus on building
permanent structures and linkages that will support producing those and similar products more
efficiently. Close interaction with the other GEO BON groups is thus necessary in order to
understand existing infrastructures and ascertain requirements. At the Asilomar meeting on 22-25
February 2010, preliminary interviews were held with representatives of each of the thematic
working groups. Interview notes are available (http://imsgbif.gbif.org/CMS/DMS_.php?ID=1056).
The template for a more detailed survey of requirements for the thematic areas is presented in
Annex 1.
The GEO BON Concept Document covers data integration and interoperability only in very
general terms. Alone, it is not a sufficient basis for the aspects of the implementation plan that
WG8 is concerned with. More extensive guidance can be found in the documentation from the
GEOSS Architecture and Data Committee (ADC) and in existing interoperability pilot projects
that have been prototyping the GEOSS information system (Figure 1). WG8 has therefore taken
as its goal to introduce these concepts into the design of GEO BON.
Figure 1 – Conceptual operational view diagram of the GEOSS Common Infrastructure (GCI) and
its relationship to observations and end-users in the nine Societal Benefit Areas (SBAs).
2. Review of the concepts of GEO BON
The presentation by Scholes provides a general introduction to GEO BON (Scholes 2009).
According to the GEO BON Concept Document, the network should make use of existing
resources, including data, data systems and catalogues; it should be comprehensive, dealing with
all aspects of biodiversity on a global scale; it should provide a framework that is scientifically
robust, that enables priority setting and gap analyses, and that facilitates modelling of biodiversity
change in a changing environment. The GEO BON system will largely be built from contributing
systems that have their primary responsibility at regional, national or sub-national scales. GEO
BON will add value by connecting such networks together. At the global level, GEO BON will
build on the experience of GBIF, ILTER, IODE, and others, but fill gaps in data and extend the
coverage to other types of data such as genetic and ecosystem levels. Data sources will encompass
field observations (including those by volunteer networks of citizen observers), specimen and
image collections, and remote sensing imagery. Work will be needed to harmonise observation
standards, to promote use of multidisciplinary interoperability standards, and to define and update
interoperability arrangements, applying the System of Systems approach promoted and
implemented by GEOSS (Nativi, 2010).
GEO BON will help to promote data publication principles in support of full and open availability
of data and information, recognizing relevant international instruments and national policies and
legislation.
One more aspect that is of interest to WG8 is the end-to-end concept of GEO BON. The network
is built to deliver major products based on integrated data and information. The system will, for
instance, enable quantifying and mapping the drivers of biodiversity change, including threats;
recording the impacts of biodiversity change with a focus on vital ecosystem functions and
resulting services; and reporting the current state and changes in biodiversity over time. GEO
BON will enable integrated assessment across scales: from extensive surveys (e.g., remote
sensing of land cover, productivity) to intensive site-based observations (e.g., in situ long term
ecosystem research). Products across all these areas are being identified by various GEO BON
working groups. It will be necessary for WG8 to review them, and adjust its plan accordingly so
that production of these deliverables can be supported by the information infrastructure.
3. Main approach of the information architecture
In keeping with the GEOSS conceptual approach, the informatics infrastructure for GEO BON
should be based on a decentralised and distributed architecture. Service Oriented Architecture
(SOA) is the leading approach to building such networks. It has been formalised through
definitions by several organisations and networks, for instance, in the LIFEWATCH Reference
Model (Hernandez-Ernst 2009), and has been followed by GBIF from its inception (cf. Saarenmaa
2005). Under an SOA approach, inventory and discovery via a system of metadata
catalogues and registries become a core component of the GEO BON network and provide the
foundation for integration with other community clearinghouse systems. The design facilitates
development of complex systems implementing interoperability at the enterprise level: services
establish a high level of abstraction, encapsulating both application and process logic.
For interoperability within GEOSS, the GEO BON infrastructure must implement the SOA
international standards and Earth system science multidisciplinary best practices, e.g. the GEOSS
Standards and Interoperability Forum (SIF) interoperability arrangements. In fact, Earth system
information is usually encoded using one or more common (generally agreed upon)
representations (models) such as the ISO/TC 211/OGC Reference Model and the Orchestra
Framework (Percivall, 2010), while SOA standards are built using a combination of industry
specifications.
However, the SOA pattern presumes that any service producer and consumer share both a
distributed computing protocol and a semantic domain comprising a data and metadata
model. In heterogeneous and complex systems (like a system of systems), this is generally not the
case. Thus, the introduction of broker components that implement mediation services has proven
to be a good way to achieve interoperability for a number of functions, including discovery
services. This type of solution was successfully demonstrated in the context of the
GEOSS IP3 (Interoperability Process Pilot Project) (Khalsa 2009) and AIP-2 (Architecture
Implementation Pilot, phase 2) pilots for Climate Change and Biodiversity (GEOSS AIPa;
GEOSS AIPb; IPCC 2007).
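The broker idea can be illustrated with a minimal sketch. It is not the design used in the GEOSS pilots; the catalogue functions, field names and the Record model below are all hypothetical stand-ins. Each adapter translates one common keyword query into a catalogue's own convention and maps the results back into a shared result model, so the consumer never has to know the individual protocols.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """Common result model exposed by the broker (hypothetical)."""
    title: str
    source: str


# Two stand-in catalogues with different query conventions and result shapes.
def search_catalogue_a(keyword: str) -> List[dict]:
    data = [{"dc_title": "Bird occurrences"}, {"dc_title": "Land cover grid"}]
    return [d for d in data if keyword.lower() in d["dc_title"].lower()]


def search_catalogue_b(params: dict) -> List[dict]:
    data = [{"name": "Bird census transects"}, {"name": "Soil samples"}]
    return [d for d in data if params["q"].lower() in d["name"].lower()]


# Adapters: translate the common query, then normalise the results.
def adapter_a(keyword: str) -> List[Record]:
    return [Record(r["dc_title"], "catalogue-a") for r in search_catalogue_a(keyword)]


def adapter_b(keyword: str) -> List[Record]:
    return [Record(r["name"], "catalogue-b") for r in search_catalogue_b({"q": keyword})]


class DiscoveryBroker:
    """Fans one query out to every registered adapter and merges the results."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Callable[[str], List[Record]]] = {}

    def register(self, name: str, adapter: Callable[[str], List[Record]]) -> None:
        self._adapters[name] = adapter

    def search(self, keyword: str) -> List[Record]:
        results: List[Record] = []
        for adapter in self._adapters.values():
            results.extend(adapter(keyword))
        return results


broker = DiscoveryBroker()
broker.register("a", adapter_a)
broker.register("b", adapter_b)
for hit in broker.search("bird"):
    print(hit.source, hit.title)
```

Adding a further catalogue in this sketch only requires writing and registering one more adapter; neither the consumer nor the existing catalogues change.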
GEO BON will need to contribute to the GEOSS Common Infrastructure (GCI). The GCI consists
of a web-based portal, a clearinghouse component for searching data, information and services,
and registries containing information about GEOSS components, standards, best practices, and
requirements (Figure 2).
Figure 2. Interactions of GEOSS Registries, Portal and Clearinghouse [1]
Various community portals and clearinghouse components may contribute to the GCI by
implementing the necessary international standards.
[1] Extracted from: GEOSS Core Architecture Implementation Report (http://portal.opengeospatial.org/files/?artifact_id=24315)
The primary resources ‘outside’ of the common infrastructure are the web sites, services, data, and
portals operated by GEO Members and Participating Organizations (Figure 2).
GEO BON will contribute to the GCI by registering these resources and implementing
interoperability solutions mediated via the GCI Web Portal, Clearinghouse Catalogue (using a
Distributed Community Catalogue) and registries. Whether GEO BON will need a portal or
portals (i.e. Community Portal) needs to be investigated, as well as how such components would
relate to existing ones (e.g., the GEO Portal and GBIF Data Portal). GEO BON will build on, not
duplicate, existing systems.
Each candidate building block of such an infrastructure is discussed below.
4. Data types and data content
GEO BON will need to support a rich observation information model such as that described in
the OGC Observations and Measurements specification [2], and cover such data types as species
occurrences based on points (e.g., point locations), lines (e.g., transects), and polygons (e.g., range
distributions); imagery/gridded data (remotely sensed images, coverages); and population and time
series data (e.g., density, abundance, age stratification, trends).
Strategies for traversing across data from the different levels of organisation of biodiversity (genes
-> species -> ecosystem) are of interest to GEO BON. Spatial and taxonomic referencing are the two
main ways of linking across levels. To enable integration, a key concern for GEO BON will
be to ensure that genetic sequence data are documented with a georeference and with the
environmental parameters of the extraction environment, using the appropriate standards.
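A minimal sketch of such linking is given here, assuming that both sequence records and occurrence records carry a taxon name and a WGS84 georeference. The record layout (id, taxon, lat, lon) and the 10 km threshold are illustrative assumptions, not part of any GEO BON specification. Records are linked when their taxon names match and their locations lie within the chosen distance.

```python
import math
from typing import Dict, List, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def link_records(
    sequences: List[Dict], occurrences: List[Dict], max_km: float = 10.0
) -> List[Tuple[str, str]]:
    """Pair sequence and occurrence records that share a taxon and lie close together."""
    links = []
    for seq in sequences:
        for occ in occurrences:
            same_taxon = seq["taxon"].lower() == occ["taxon"].lower()
            if same_taxon and haversine_km(seq["lat"], seq["lon"], occ["lat"], occ["lon"]) <= max_km:
                links.append((seq["id"], occ["id"]))
    return links


sequences = [{"id": "seq-1", "taxon": "Parus major", "lat": 60.17, "lon": 24.94}]
occurrences = [
    {"id": "occ-1", "taxon": "parus major", "lat": 60.18, "lon": 24.95},  # same taxon, nearby
    {"id": "occ-2", "taxon": "Parus major", "lat": 48.85, "lon": 2.35},   # same taxon, far away
    {"id": "occ-3", "taxon": "Pica pica", "lat": 60.17, "lon": 24.94},    # other taxon, same place
]
print(link_records(sequences, occurrences))
```

Without a georeference on the sequence record, only the taxonomic half of the test could be applied, which is why the documentation requirement above matters for integration. Exact string matching of names is a simplification; real names would first need resolving against a taxonomic reference.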
Because of the diversity of data types across levels of organisation, GEO BON will need to adopt
a broker model architecture (see section 3) in which data interoperability is achieved through
mediation (many data content standards in use; interoperability achieved through mapping of
concepts at consumer end) rather than harmonisation (data providers agree up-front on a common
data exchange schema). This is in the scope of interoperability arrangements which characterize a
system of systems approach.
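The difference between mediation and harmonisation can be sketched as follows. Under mediation, the providers keep their own schemas and the consumer holds one concept mapping per schema. In this sketch the target names are Darwin Core terms, while the provider schema names and their field names are invented for illustration.

```python
from typing import Dict

# Consumer-side concept mappings, one per provider schema.
# Keys are provider field names (hypothetical); values are Darwin Core terms.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "museum_db": {
        "taxon": "scientificName",
        "lat": "decimalLatitude",
        "lon": "decimalLongitude",
        "collected": "eventDate",
    },
    "field_survey": {
        "species_name": "scientificName",
        "y": "decimalLatitude",
        "x": "decimalLongitude",
        "obs_date": "eventDate",
    },
}


def mediate(record: dict, schema: str) -> dict:
    """Translate a provider record into the consumer's common concepts."""
    mapping = MAPPINGS[schema]
    # Fields without a mapping are dropped; mapped fields are renamed.
    return {target: record[src] for src, target in mapping.items() if src in record}


museum = {"taxon": "Parus major", "lat": 60.2, "lon": 24.9, "collected": "2009-05-01"}
survey = {"species_name": "Parus major", "y": 60.3, "x": 25.0, "obs_date": "2009-06-12", "observer": "A"}
print(mediate(museum, "museum_db"))
print(mediate(survey, "field_survey"))
```

Under harmonisation, by contrast, both providers would already publish the common field names and no mapping table would be needed at the consumer end.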
It is likely that data providers will be required to serve more complex, integrated data. For
instance, the early focus of GBIF providers was on museum collection specimens and field
observations treated as “primary data” (i.e., what was observed, where and when) (GBIF 2009).
However, many of these data were actually part of much richer datasets whose remaining content
was not of interest to GBIF at the time and was left out. Conversely, data providers to the LTER network,
which primarily deals with rich ecological datasets, also hold primary occurrence data that are of
interest to GBIF. However, LTER providers do not have interfaces for serving these data in a way
that GBIF can use. If these networks agreed on common interfaces and a supporting
framework for each other's data types, much more data would be made available.
[2] http://www.opengeospatial.org/standards/om
5. Networks and their information resources
5.1. Existing global networks
GEO BON builds on existing networks such as those summarised in the tables below. Not all
existing networks are based on a similar architecture. Some operate centralised databases while
others follow the SOA model. Some kind of open access principle is common, but standards-based
discovery (via metadata catalogues) and access mechanisms have not been implemented
widely.
In Table 1 we compare major examples of the existing global networks that carry biodiversity
data. For each network, a concise description is provided that includes what aspects of
biodiversity are addressed by the network (e.g., genetic, species, ecosystem, terrestrial, freshwater,
marine), the data types managed, and the standards used.
Table 1. Characteristics of some examples of major global networks that make biodiversity observation
data and information available.
Name | Eco-system Coverage | Taxonomic or Topical Coverage | Data or Information Types Covered | Data and Metadata Standards | Architecture | Access and Protocols
BOLD (Barcode of Life Database; project of Consortium for the Barcode of Life - CBOL) | Any | All organisms | DNA barcode and specimen records | Darwin Core; GCMD DIF | | Website-based search; REST services
DataONE (Data Observation Network for Earth) | Any | Ecology, earth science | Data, metadata, workflow | FGDC, EML, Dublin Core, Darwin Core, NETCDF, GCMD | Distributed | Various
Name
Eco-system
Coverage
Taxonomic or
Topical
Coverage
Data or
Information
Types
Covered
Discover Life
Any
All organisms
Species data
and
information
EOL
Any
All organisms
Species data,
information,
images
Any
Geospatial data Maps, satellite
imagery,
spatial datasets
(Encyclopedia of Life)
FAO GeoNetwork
FishBase
GBIF
(Global Climate Observing
System)
TDWG
Aggregator
DiGIR,
TAPIR,
web
services
Centralized
Portal/
Search
GeoNetwork
Catalogue
Fish data and
information
Any
All organisms
Organism
occurrence,
Names data,
provider and
dataset
metadata
TDWG, EML
SOA:
portal,
registry,
providers
Open
access:
DiGIR,
TAPIR,
web
services
Any
Any
Datasets,
documents,
tools
FGDC
Registry,
providers
Portal/
Search
Any
All invasive
species
Organism
occurrence
data, species
information
TDWG, EML
GBIF IPT
with GISIN
extension
Open
access:
TAPIR,
web
services
(Global Invasive Species
Information Network)
GCOS
Access and
Protocols
Fish
(Global Change Master
Directory)
GISIN
Architecture
Marine,
aquatic
(Global Biodiversity
Information Facility)
GCMD
Data and
Metadata
Standards
Any
physical,
atmospheric,
chemical and
oceanic,
biological
terrestrial,
properties of hydrologic, and
climate system
cryospheric
components
Name
GOOS
Eco-system
Coverage
Taxonomic or
Topical
Coverage
Data or
Information
Types
Covered
Data and
Metadata
Standards
Architecture
Marine
Physical
oceanographic
data
Temperature,
salinity, etc.
ESRI Shapefile;
ESRI
GeoDatabase
Distributed
Terrestrial
Biodiversity,
climate,
coastal, Forest
and Land
Cover, Glacier,
Hydrology,
Land,
Permafrost,
Water
Biological,
Physical
Terrestrial
All
Ecological
EML
Registry,
providers
Any
CITES species
Species
information;
range maps
EML; no
standard data
model
Internal
database
Website-based
search
Any
All
Ecological
EML
Distributed
metadata
catalogue
(Metacat)
Metacat
protocol
Marine
All organisms
Organism
occurrence data
TDWG, ISO
19115/19139
(Global Ocean Observing
System)
GTOS
(Global Terrestrial Observing
System)
ILTER Network
Access and
Protocols
GOSIC
Portal
(Global
Observing
Systems
Information
Center)
(International Long-term
Ecological Research Network
of Networks)
IUCN Red List
(International Union for the
Conservation of Nature /
CITES)
KNB
(Knowledge Network for
Biocomplexity)
OBIS
Aggregator Portal/ web
services
(Ocean Biogeographic
Information System; project of
Census of Marine Life CoML)
Name | Eco-system Coverage | Taxonomic or Topical Coverage | Data or Information Types Covered | Data and Metadata Standards | Architecture | Access and Protocols
speciesLink | focus on Brazil | all organisms | Organism occurrence data, integrated with species information | DarwinCore, DiGIR, Tapir | distributed | Open access: DiGIR, TAPIR, web services
UNEP-WCMC (United Nations Environment Programme World Conservation Monitoring Centre) | Any | Habitats (e.g. reefs, mangroves) and species of conservation and protection concern | Interactive maps (protected areas), organism and habitat-type atlases | OGC | Internal databases; links to external tools | Portal/search of internal databases
5.2. National and regional networks
Because there is a multitude of national and regional networks, listing them all is simply not
feasible. Table 2 lists examples of some of the most advanced or largest networks.
These networks can be incorporated in GEO BON if they adhere to the data exchange standards
and protocols that will be adopted by GEO BON, e.g., by developing specific interoperability
arrangements where required. Some of these networks are quite advanced in their development.
Mapping all of their specialized functions to the global level may therefore not be possible,
although adherence to, or adoption of, common standards for data and metadata would allow
access to the data they share. Harmonisation or mediation, and ensuring compatibility of
approaches will be key to the successful development of GEO BON across regions.
In Europe, for example, LIFEWATCH has adopted an SOA model based on Orchestra and
therefore fits closely with the GEOSS and GEO BON information architectures. Provided
LIFEWATCH becomes a legal entity funded by EU member states, GEO BON will have a clear
mechanism for mobilising key biodiversity data across Europe and across all the GEO BON
themes.
Table 2. Characteristics of some examples of major national or regional networks that make biodiversity
observation data and information available.
Name
AKN
Data or
Information
Types
Data and
Metadata
Standards
Architecture
Birds
Occurrence
observations
TDWG
Aggregator
Australia
All organisms,
molecular to
ecological
Specimen,
observation,
ecological,
sequence
TDWG,
LSID
Sweden
All organisms
Occurrence
observations
TDWG
ASEAN Member
States
Protected areas,
wetlands; all
organisms
Maps (protected
areas, wetlands);
species
information
Catalogue of
Life (names)
Africa
Terrestrial
ecology and
biodiversity
Weather, remote
sensing,
geospatial, soil,
vegetation, species
richness, animal
diversity,
socioeconomic
Not stated
Europe
Terrestrial,
aquatic; all
organisms
Species
information
Centralized
Europe
Terrestrial
Species indicators,
vegetation and
habitat quality
measures
Not available
online
Geographic
Coverage
Topical
Coverage
Western
Hemisphere
Access and
Protocols
Web
services
(Avian Knowledge
Network)
ALA
(Atlas of Living
Australia)
Artdatabanken /
Artportalen
Distributed TAPIR, web
and federated
services
Centralized
TAPIR, web
services
("art" = species in
Swedish)
ASEAN BISS
(ASEAN Biodiversity
Information Sharing
Service)
Biota-Africa
(Biodiversity Monitoring
Transect Analysis in
Africa)
DAISIE
(Delivering Alien
Invasive Species
Inventories for Europe)
EBONE
(European Biodiversity
Observation Network)
Centralized
Name
GAP
Data or
Information
Types
Data and
Metadata
Standards
Architecture
Terrestrial
Land cover
datasets
FGDC
Centralized
Portal/
search;
download
USA
Geospatial
Maps, satellite
imagery, spatial
datasets
FGDC
Western
Hemisphere
Invasive
species,
pollinators,
ecosystems,
protected areas
Specimen data,
species
information,
metadata,
publications
TDWG
Distributed
Portal/
Search, web
services
Costa Rica
Species,
ecosystems,
conservation
Specimen data,
species
information,
ecosystems
TDWG
Centralized
Portal/
Search
USA northeast
Invasive plant
species
Specimen data,
maps
Europe
Any
Any
Western
Hemisphere
Species,
ecosystems,
terrestrial,
aquatic
Species and
ecosystem
information,
distribution and
status, population
viability,
invasiveness
impact ranks,
climate change
vulnerability
Geographic
Coverage
Topical
Coverage
USA
(Gap Analysis Program)
GOS
(Geospatial One Stop)
IABIN
(Inter-American
Biodiversity
Information Network)
INBio
(Instituto Nacional de la
Biodiversidad)
IPANE
Access and
Protocols
(Invasive Plant Atlas of
New England)
LIFEWATCH
NatureServe
NBII
(National Biological
Information
Infrastructure)
USA (to global)
Terrestrial,
Landcover,
aquatic, marine;
occurrence
species data and observation, and
information
ecological data and
information;
documents
Not online
yet
FGDC
Biological
Profile
Federated
(aggregated
and
synthesized
by
NatureServe
from its
members)
Website
search, web
services,
download
FGDC,
TDWG
Distributed
Portal/
Search, web
services
Name | Geographic Coverage | Topical Coverage | Data or Information Types | Data and Metadata Standards | Architecture | Access and Protocols
NBN (National Biodiversity Network) | UK | Terrestrial | Occurrence observation, habitat, geospatial | TDWG | Aggregator | Portal/Search, web services
PBIF (Pacific Biodiversity Information Forum) | Oceania | Marine, terrestrial | Specimen, occurrence observation; protected areas | TDWG | Distributed (links) | Portal/Search
REMIB (World Network for Biodiversity) | Mexico | Any | Specimen; maps | TDWG, Z39.50 | Centralized | Portal/Search
SIBIS (South African National Biodiversity Institute’s Integrated Biodiversity Information System) | South Africa | Threatened and other species; area checklists | Species data and information; maps | | | Portal/Search of internal databases
SCAR-MarBIN (Scientific Committee on Antarctic Research Marine Biodiversity Information Network) | Antarctica | Biology, Ecology, Geography | Species surveys, research results; metadata; documents | | | Portal/Search, web services
SERVIR (Regional Visualization and Monitoring System) | Meso-America, East Africa | Earth observations (satellite); forecast models | Visualizations, analytical tools, images | | | Portal/Links
SpeciesLink | Brazil | all organisms | Organism occurrence data, integrated with species information | DarwinCore, DiGIR, Tapir | distributed | Open access: DiGIR, TAPIR, web services
ZooInt (Zoological Integrated Retrieval System) | Russia | Animals | Specimen data; classifications | | Centralized | Website / internal databases
6. Discovery services and registries
As GEO BON will be based on an SOA featuring loosely coupled components that can be joined in
arbitrary ways, discovery of available resources via a system of registries and community
metadata catalogues is an essential requirement for the GEO BON network and provides the
foundation for integration with other clearinghouse systems. The GEO ADC has designed the
GEOSS Core Architecture as a system for exchange and dissemination of observations and guided
implementation of its initial operating capability: the GEOSS Common Infrastructure (GCI).
Consisting of a GEO Web Portal, a Clearinghouse, and Registry components (Figure 2), the GCI
provides a process to register, discover and use services accessible using the Interoperability
Arrangements recognized by GEOSS (see the SIF activity).
GEO BON, as part of the wider GEOSS, can exploit the functionality provided by the GCI. It is
expected that all components, services, standards and special interoperability arrangements
contributing to GEO BON will be discoverable through the GEOSS Clearinghouse and should
therefore be entered in the appropriate registry (Components Registry, Services Registry,
Standards and Special Arrangement Registry - http://geossregistries.info/). The GEOSS portal
provides web-based entry forms for populating the registries. One of the main tasks for GEO
BON is thus to identify the main components contributing to the network, list the services that
they provide and the standards or special interoperability arrangements used by those services.
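The inventory task described above (components, the services they provide, and the standards those services use) can be pictured as a small data structure. This is only an illustrative sketch: the class names are assumptions and do not reflect the actual schema of the GEOSS registries, and the two sample entries are simplified from Table 1.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Service:
    name: str
    standards: List[str]  # standards or special arrangements the service uses


@dataclass
class Component:
    name: str
    services: List[Service] = field(default_factory=list)


def standards_in_use(components: List[Component]) -> Set[str]:
    """Collect every standard referenced by any service of any component."""
    return {std for comp in components for svc in comp.services for std in svc.standards}


# Illustrative entries only, simplified from Table 1.
inventory = [
    Component("GBIF", [Service("occurrence data access", ["DiGIR", "TAPIR"])]),
    Component("KNB", [Service("metadata catalogue", ["EML"])]),
]
print(sorted(standards_in_use(inventory)))
```

A listing of this kind is what would then be entered, item by item, into the Components, Services, and Standards and Special Arrangements registries.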
Metadata catalogues and registries are two particular types of service fundamental for resource
discovery and will normally be maintained by individual communities of practice (e.g. GEOSS
Distributed Community Catalogues).
As with metadata catalogues, communities of practice may also maintain their own specialist
registries. It is not yet clear how much integration is envisaged between such community
registries and the GEOSS Registries, but any GEOSS-equivalent components, services and
standards should certainly be registered. The next sections list some of the main metadata
catalogues and registries that are expected to play a role in GEO BON. It is envisaged that these
catalogues would connect to the GEOSS Clearinghouse by implementing the required interface
based on the OGC CSW (Catalog Service for the Web) specification.
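As a rough illustration of what a clearinghouse client sends to such an interface, the sketch below builds a CSW GetRecords request in key-value (HTTP GET) form. It assumes CSW version 2.0.2 and a CQL text constraint, neither of which is stated in this document, and the endpoint URL is a placeholder. No request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_getrecords_url(endpoint: str, keyword: str, max_records: int = 10) -> str:
    """Build a CSW GetRecords URL (KVP encoding; version 2.0.2 is assumed)."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "brief",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Free-text match on any queryable field; the keyword is not escaped here.
        "constraint": f"AnyText LIKE '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint; a real catalogue would publish its own CSW address.
url = build_getrecords_url("https://catalogue.example.org/csw", "biodiversity")
print(url)
print(parse_qs(urlparse(url).query)["constraint"])
```

A catalogue that answers this request with records in a schema the clearinghouse understands can be harvested or searched without any catalogue-specific client code.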
6.1. GBIF UDDI, GBRDS, and Metadata Catalogue
GBIF maintains a UDDI Registry at http://registry.gbif.net/uddi/web for participating nodes on its
network to advertise their data and services. Their web service access points are registered, thereby
enabling GBIF to harvest data into a dynamic, regularly refreshed cache which is fronted by a
web portal providing unified search and retrieval across the whole network (http://data.gbif.org).
At the time of writing, over 300 data providers and some 10,000 [3] resources have been included in
the GBIF registry. The GBIF Global Biodiversity Resources Discovery System (GBRDS), now at the
alpha version (http://gbrds.gbif.org/), will significantly enhance the GBIF UDDI service by
creating a single annotated index of publishers, institutions, networks, collections, datasets,
schemas and services. The GBIF metadata infrastructure, planned for 2010, will feature a
centralised, indexed cache of harvested metadata documents derived, through reciprocal sharing
agreements, from both GBIF Participants’ metadata catalogues and other participating networks’
catalogues. It will support multiple metadata models natively, including Ecological Metadata
Language (EML), ISO 19115/19139, Natural Collections Descriptions (NCD), and FGDC
Biological Profile.
6.2. International Long Term Ecological Research (ILTER) Network
ILTER is developing a distributed system of metadata catalogues and has recommended a
metadata profile for the network based on EML. Some of the ILTER member nodes already
contribute metadata through the KNB, e.g., Taiwan Ecological Research Network (TERN) and
Japanese Long Term Ecological Research Network (JaLTER).
6.3. Knowledge Network for Biocomplexity (KNB)
The Knowledge Network for Biocomplexity (KNB) (http://knb.ecoinformatics.org/) has created a
set of open source software tools for use by the ecological community including Metacat, a
metadata database, and EML. About 20 other groups/networks in several countries and regions of
the world use KNB software and their data holdings are accessible via the KNB catalogue, and
through individual portals.
6.4. NASA GCMD
NASA's Global Change Master Directory (GCMD) (http://gcmd.nasa.gov/) is a web based
catalogue that enables users to locate and obtain access to more than 30,000 descriptions of Earth
science datasets and services covering all aspects of Earth and environmental sciences. The
GCMD goals are closely aligned with those of the International Directory Network (IDN) of the
Committee on Earth Observation Satellites (CEOS) which works to foster the exchange of
information among international agencies. The metadata standard used is Directory Interchange
Format (DIF) although other standards are supported through cross mappings, e.g., ISO 19115,
North American Profile ISO 19115, FGDC, ESRI profile of FGDC. By providing subset views of
the full metadata catalogue, the GCMD enables participating organisations to maintain and
document their data in one place without having to create their own online directory while at the
same time contributing to the GCMD general search pages for scientists in other disciplines to
access and use. Over 100 portals are in use.
[3] http://www.gbif.org/participation/data-publishers/who-is-publishing/
6.5. NBII
The main metadata profile in use in the NBII is the NBII Biological Profile of the FGDC Content
Standards for Digital Geospatial Metadata, but EML is also accepted. Participating nodes make
their metadata records available through a weekly harvesting process. NBII also participates in
two other FGDC clearinghouse initiatives, the National Spatial Data Infrastructure (NSDI) Clearinghouse and
the Geospatial One-Stop (GOS), sharing those biological databases that are geospatially
referenced.
The NBII Clearinghouse uses Mercury (http://mercury.ornl.gov/), a web-based system for search
and retrieval of data, developed by the Oak Ridge National Laboratory (ORNL). Mercury harvests
metadata and data from distributed participating servers, builds a centralized index and provides
search interfaces to allow users to perform simple, fielded, spatial and temporal searches across
these metadata sources. Several metadata standards are supported, including FGDC, Dublin Core,
EML and ISO 19115. Mercury is based on a service-oriented architecture (SOA) and supports various
services such as Thesaurus Service, Gazetteer Web Service, UDDI Directory Services, RSS, GeoRSS
and OpenSearch.
7. Interoperability and information management services
Developing interoperability arrangements for sharing varied and complex data types is a
demanding process. Not all can be implemented in the short term and therefore some priorities
will need to be agreed on. What these priorities are will depend on the kind of early products that
GEO BON is aiming for. We identify two categories of data for GEO BON which ideally will be
combined: i) data required for the delivery of the specific products identified by the thematic
working groups, and ii) data for supporting the vision of GEOSS as an informatics network, i.e.,
integration of existing biodiversity observation networks for long-term benefits.
Several standards are available to aid interoperability arrangements in GEO BON. These are listed
in Table 3 and include standards for metadata, data exchange and transfer protocols.
Table 3. Standards for metadata, data exchange and transfer protocols.
Name
Brief Description
ABCD
“ABCD Schema is a common data specification for biological collection units, including living and
preserved specimens, along with field observations that did not produce voucher specimens. It is
intended to support the exchange and integration of detailed primary collection and observation data.”
http://www.tdwg.org/activities/abcd
BioCASe
The BioCASe protocol is based on DiGIR but adapted for use with ABCD encoded data. Its main user
is the Biological Collection Access Service for Europe (BioCASE - http://search.biocase.org/).
http://www.biocase.org/products/protocols
CSDGM
The Content Standard for Digital Geospatial Metadata (CSDGM), (FGDC-STD-001-1998), the US
Federal Metadata standard, "provides a common set of terminology and definitions for the
documentation of digital geospatial data." http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/base-metadata/index_html
CSDGM - Biological Data Profile
Biological Data Profile of the Content Standard for Digital Geospatial Metadata "broadens the
application of the CSDGM so that it is more easily applied to data that are not explicitly geographic
(laboratory results, field notes, specimen collections, research reports) but can be associated with a
geographic location. The profile changes the conditionality and domains of CSDGM elements, requires
the use of a specified taxonomical vocabulary, and adds elements."
http://www.fgdc.gov/metadata/geospatial-metadata-standards
CSDGM - Profile for Shoreline Data
Metadata Profile of CSDGM for Shoreline Data “addresses variability in the definition and mapping of
shorelines by providing a standardized set of terms and data elements required to support metadata for
shoreline and coastal data sets. The profile also includes a glossary and bibliography.”
http://www.fgdc.gov/metadata/geospatial-metadata-standards
Darwin Core
The Darwin Core is a set of standards including a glossary of terms intended to facilitate the sharing of
information about biological diversity. It is based primarily on taxa, their occurrence in nature as
documented by observations, specimens, and samples, and related information.
http://rs.tdwg.org/dwc/index.htm
DiGIR
Distributed Generic Information Retrieval is a protocol that provides unified access to distributed
databases allowing clients to retrieve information from distributed servers. It uses HTTP as transport
mechanism with messages and data encoded in XML (Darwin Core). http://digir.sourceforge.net/
Dublin Core
"The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of
networked resources. The Dublin Core standard includes two levels: Simple and Qualified. Simple
Dublin Core comprises fifteen elements; Qualified Dublin Core includes three additional elements
(Audience, Provenance and RightsHolder), as well as a group of element refinements (also called
qualifiers) that refine the semantics of the elements in ways that may be useful in resource discovery."
(http://dublincore.org/documents/usageguide/). http://dublincore.org/documents/dcmi-terms/
EML
"Ecological Metadata Language (EML) is a metadata specification developed by the ecology discipline
and for the ecology discipline. It is based on prior work done by the Ecological Society of America and
associated efforts... EML is implemented as a series of XML document types that can be used in a
modular and extensible manner to document ecological data. Each EML module is designed to
describe one logical part of the total metadata that should be included with any ecological dataset."
http://knb.ecoinformatics.org/software/eml/
MIENS
MIENS – Minimum Information about an Environmental Sequence, an extension to the minimum
information about a genome/meta-genome sequence (MIGS/MIMS) specification of the Genomics
Standard Consortium is a proposal for documenting the environmental parameters in the extraction
environment associated with a sequence. http://gensc.org/gc_wiki/index.php/MIGS/MIMS/MIENS
Natural Collections Descriptions
Natural Collections Descriptions (NCD) is a standard for facilitating the exchange of information on all
kinds of collections of natural history material including specimens, original artwork, photographs,
archives, published material. http://www.tdwg.org/activities/ncd/
OAI-PMH
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides a “low-barrier”
mechanism for interoperability across distributed metadata repositories. Data providers expose
metadata and service providers, in turn, consume the metadata through a client application known as a
harvester that issues OAI-PMH service requests over HTTP. http://www.openarchives.org/pmh/
OGC CSW
The Open Geospatial Consortium Catalogue Services for the Web (CSW) specification defines "the
interfaces, bindings, and a framework for defining application profiles required to publish and access
digital catalogues of metadata for geospatial data, services, and related resource information". Note
that, from an interoperability perspective, this is not a single standard as it encompasses several
non-interoperable search technologies. http://www.opengeospatial.org/standards/cat
OGC WCS
The Open Geospatial Consortium Web Coverage Service (WCS) “supports electronic retrieval of
geospatial data as "coverages" – that is, digital geospatial information representing space-varying
phenomena.” http://www.opengeospatial.org/standards/wcs
OGC WMS
“The OpenGIS® Web Map Service (WMS) Implementation Specification provides three operations
(GetCapabilities, GetMap, and GetFeatureInfo) in support of the creation and display of registered and
superimposed map-like views of information that come simultaneously from multiple remote and
heterogeneous sources.” http://www.opengeospatial.org/standards/wms
OGC WFS
“The OpenGIS® Web Feature Service (WFS) Implementation Specification allows a client to retrieve
and update geospatial data encoded in Geography Markup Language (GML) from multiple Web
Feature Services. The specification defines interfaces for data access and manipulation operations on
geographic features, using HTTP as the distributed computing platform.”
http://www.opengeospatial.org/standards/wfs
ISO 19115
“ISO 19115:2003 defines the schema required for describing geographic information and services. It
provides information about the identification, the extent, the quality, the spatial and temporal schema,
spatial reference, and distribution of digital geographic data.”
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=26020
Countries
that are members of ISO are required to provide metadata in a profile of ISO 19115. The INSPIRE
initiative in the European Union is recommending use of ISO 19115, and a North American Profile
(NAP) for the USA and Canada is under development. The new ANZLIC metadata standard used in
Australia and New Zealand complies with ISO 19115.
ISO 19139
“ISO/TS 19139:2007 defines Geographic MetaData XML (GMD) encoding, an XML Schema
implementation derived from ISO 19115.”
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=32557
TAPIR
Designed as a generic tool that can be applied to domains other than biodiversity and natural science
collections data, the TDWG Access Protocol for Information Retrieval (TAPIR) is a specification for
accessing structured data on distributed databases using HTTP for transport and XML for encoding
messages and data. It combines and extends the features of DiGIR and BioCASe protocols.
http://www.tdwg.org/activities/tapir/
Taxon Concept Schema
“The Taxon Concept Schema (TCS) provides a standard for taxon names and taxon concepts in the
exchange and integration of biodiversity and natural history data.” http://www.tdwg.org/activities/tnc/
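Several of the protocols listed in Table 3 are, at the transport level, plain HTTP requests with key-value parameters. As a minimal sketch, the following Python code builds an OAI-PMH ListRecords request and a WMS 1.1.1 GetMap request. The endpoint URLs, the layer name and the bounding box are hypothetical placeholders; only the parameter names are taken from the respective specifications.

```python
from urllib.parse import urlencode

# Hypothetical endpoints, used only for illustration.
OAI_ENDPOINT = "http://example.org/oai"
WMS_ENDPOINT = "http://example.org/wms"


def oai_list_records_url(endpoint, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    return endpoint + "?" + urlencode(params)


def wms_get_map_url(endpoint, layer, bbox, width=512, height=256):
    """Build a WMS 1.1.1 GetMap request URL for a single layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)


harvest_url = oai_list_records_url(OAI_ENDPOINT, from_date="2010-01-01")
map_url = wms_get_map_url(WMS_ENDPOINT, "species_richness", (-180, -90, 180, 90))
```

A harvester would issue the first request repeatedly to collect metadata records, while a portal would embed the second as an image source; both can be tested against any conforming server by replacing the placeholder endpoints.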
8. Ontologies, thesauri, dictionaries, semantic mediation
GEO BON is a network of networks. Despite all the efforts towards standardization of data
and protocols described in the previous section, we must accept the fact that these networks use,
and will continue to use, differently defined data and terminology. Therefore, additional layers of
semantics that allow integration of data need to be adopted by GEO BON. This challenge is also well
known in large corporations and governments where parallel information systems may exist.
The term "Semantic Enterprise Architecture" has been coined to describe this dimension
(http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/). It seems clear
to us that such an architecture must also be designed for GEO BON. Fortunately, technology is
rapidly making this feasible.
The growth of the Internet, along with formal standards for the exchange of metadata through the
Web, has created stunning new opportunities for enhancing collaborative research in biodiversity
through the development of consistent ways of expressing the observations and measurements that
constitute the basic data informing scientific researchers’ models and analyses. Ontologies,
thesauri, and dictionaries are no longer simply lists of terms (words, or concepts) to be read by
humans, to help them understand the meaning and relationships among the values contained in
some dataset. These modern variations of “controlled vocabularies” are now constructed in
standardized syntaxes that allow for computers to rapidly exchange, search, manipulate, and even
“reason” with these constructs, thereby providing some major conveniences to researchers in
terms of powerful capabilities for operating on data. While HTML opened the doors to appealing
exchange of graphical and natural language information via the Web, the new standards of
OWL/RDF (ontologies) and SKOS (thesauri) will enable a “Semantic Web” where information is
not just “rendered” as in a normal Web page, but rather can be conditionally purposed or
transformed due to the additional rich content that is associated with it via its semantic links to
concepts expressed in W3C-sanctioned formats for ontologies and thesauri (Berners-Lee 2001).
Scientists will certainly agree that any understanding of natural phenomena depends upon our
having some shared “model” of the various concepts essential to our disciplines. Our education
and research experience naturally inculcate an ability to converse using specialized terms, with
reasonable assurance that we have some common understanding of what we mean (semantically)
when we refer to “species”, and how these are affected by “climate” or “habitat”, etc. Specialized
scientific terms such as these, while varying perhaps in their details and subtleties in interpretation
by discipline or even individual preference, nevertheless have great utility in leading to efficient
communication. However, while there is undoubtedly heuristic value for GEO BON in compiling
a glossary or dictionary of scientific terms using some standard Web exchange syntax, our focus
here will be on the application of these approaches in the service of data integration and
interoperability.
Before demonstrating why we believe that ontologies in particular, but also thesauri and even
simple controlled vocabularies or glossaries, will prove invaluable in facilitating data
interoperability, it is useful to recall how data interoperability challenges will arise in the context
of GEO BON.
GEO BON activities collectively investigate aspects of biodiversity that encompass a number of
different themes, e.g., ranging from impacts on ecosystems services in the ocean, to understanding
patterns in the genetic variability of microbes in soil, while also spanning broad spatial and
temporal scales of focus, from plot level to regional and global levels. This heterogeneity in
research interests inevitably entails heterogeneity in the data resources which inform the
subsequent scientific analyses and models.
It is beneficial for the distinct GEO BON working groups to try to standardize their protocols and
datasets to the greatest extent possible through discussion and agreement, as many of them are
doing, since this will enhance the value of the data to a broader range of researchers by providing
a common understanding of the contents of the data, and the data’s appropriateness for
investigating various research questions. Moreover, this standardization will enable the
construction of generalized database schema that can serve large quantities of data in a consistent
and scalable way. But there will undoubtedly still remain a large number of variations in the way
data are collected due to specific research or logistical motivations, and this will lead to
significant challenges in attaining optimal interoperability of the data for integrated analysis.
Furthermore, when GEO BON groups (and others) attempt to pass beyond the boundaries of the
data collected using their own internally well-conceived design, as inevitably happens when
studying the interactions of biodiversity phenomena within a complex, dynamic biotic and abiotic
environment, then data models conceived externally by other researchers or projects are
essentially black boxes, where the specific details of the observations and measurements collected
by “other researchers” - that is, critical metadata - are often lacking or presented using
inconsistent formats. The quest for additional data to inform any given analysis typically leads to
an arduous, inefficient, and error-prone process of trying to find the data, interpret whether it is
appropriate for one’s need, and then integrate it with one’s existing data to accomplish a richer
analysis.
The goal of Working Group 8 is to address the above issues by providing generalized solutions
that can cut across working group activities in ways that simplify data interoperability with a
minimum of effort. Efforts to develop a unified approach to ontology construction and
deployment should provide immediate advantages with regard to data interoperability and
integration within GEO BON. While the technical details of constructing proper ontologies are
beyond the scope of this paper, it is relatively easy to understand how ontologies and thesauri will
provide benefit, and also to understand how they differ in capability from each other, and even
more so, from simple controlled vocabularies such as glossaries or dictionaries.
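The difference in capability can be illustrated with a small Python sketch. A controlled vocabulary can only say whether a term is permitted; a thesaurus (as expressed, for example, in SKOS) adds synonym and broader/narrower relations, which allow a search to be expanded automatically. The terms and relations below are invented and deliberately simplified; a full ontology would add formal class definitions and machine reasoning, which are beyond this sketch.

```python
# A simple controlled vocabulary: just a list of permitted terms.
controlled_vocabulary = {"forest", "wetland", "bog", "fen"}

# A thesaurus adds relations between terms (hypothetical, simplified).
narrower = {"wetland": {"bog", "fen"}}   # broader term -> narrower terms
synonyms = {"woodland": "forest"}        # non-preferred label -> preferred label


def expand_query(term):
    """Return the preferred term plus all narrower terms, or an empty set."""
    preferred = synonyms.get(term, term)
    if preferred not in controlled_vocabulary:
        return set()
    result = {preferred}
    stack = [preferred]
    while stack:
        for child in narrower.get(stack.pop(), ()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


wetland_terms = expand_query("wetland")
woodland_terms = expand_query("woodland")
unknown_terms = expand_query("desert")
```

With the vocabulary alone, a search for "wetland" would miss datasets labelled "bog" or "fen"; with the thesaurus relations, they are found without the user knowing the narrower terms.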
In the simplest case, it would be useful if all data resources of interest to GEO BON researchers had
some minimal information describing the resource. In earlier sections, we have described several
well-established frameworks for capturing this information, e.g. in the case of specimen records,
the Darwin Core standard, or for ecological data, the more generic Ecological Metadata Language
(EML). In both cases (and there are many others, e.g., the FGDC’s CSDGM for geospatial
metadata, or the Dublin Core metadata element set) the metadata fields specify what type of
information is desired: for example, information about “spatial location in geo-coordinates”, or
“data set creator”, or more specifically, the content of some column of data in a spreadsheet, e.g.,
“attribute label”, “attribute definition”, and “attribute units”. In this example, the attribute label
might be “len”, as it might actually be listed in the header line of a spreadsheet or as a column
name in a SQL table creation statement, while the attribute definition might be
“standard length of fish”, and the attribute units might be “centimetres”. Note that often what the
scientist or database developer has captured is simply the attribute label, which tends to
be abbreviated, sometimes cryptic, and rarely interpretable without conferring
with the creator of that data structure.
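The attribute example can be made concrete with a short Python sketch. The three fields mirror the “attribute label”, “attribute definition” and “attribute units” named above; the class and the second column (“wt”) are hypothetical and do not reproduce the actual EML schema.

```python
from dataclasses import dataclass


@dataclass
class AttributeMetadata:
    label: str        # column header as it appears in the data file
    definition: str   # human-readable meaning of the column
    units: str        # measurement units of the values


# Hypothetical metadata for a small fish survey spreadsheet.
attributes = {
    "len": AttributeMetadata("len", "standard length of fish", "centimetres"),
    "wt": AttributeMetadata("wt", "wet weight of fish", "grams"),
}


def describe(header):
    """Return readable descriptions for column labels, flagging undocumented ones."""
    lines = []
    for label in header:
        meta = attributes.get(label)
        if meta is None:
            lines.append(f"{label}: (no metadata recorded)")
        else:
            lines.append(f"{label}: {meta.definition} [{meta.units}]")
    return lines


report = describe(["len", "wt", "x1"])
```

The undocumented column “x1” shows the situation described in the text: without a definition and units, a label alone cannot be interpreted by anyone other than its creator.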
Highly standardized data, such as gene sequences, or biological specimen records, can derive
great utility from agreeing upon a common metadata standard, and even a common standard for
storing the data themselves. This is the case for biological specimen records, where a standard
like Darwin Core is already providing major utility to researchers wanting to confederate records
from highly distributed and internally heterogeneous botanical and zoological collections. But
beyond this “core”, the various associated contextual variables relating to the
surrounding habitat’s physical or ecological structure, such as co-occurring species, micro-climate
characteristics, local hydrological features, etc., quickly complicate matters.
9. Organism names and habitat classifications
There are numerous taxonomic names databases and several notable initiatives to organize the
naming systems in use by custodians world-wide. Obviously, they will be accessed by the various
networks that make up GEO BON. Their services need to be registered as part of the GCI for
seamless access by GEO BON applications.
The Integrated Taxonomic Information System (ITIS) is a partnership of federal agencies and
other organizations from the United States, Canada, and Mexico, with data stewards and experts
from around the world. The ITIS database is an automated reference of scientific and common
names of biota of interest to North America. ITIS data are available through web services,
described at http://www.itis.gov/ws_description.html. The KNB also provides APIs to search the
ITIS database of taxonomic nomenclature, see http://knb.ecoinformatics.org/software/.
Species 2000 (http://www.sp2000.org/) is an autonomous federation of taxonomic database
custodians, involving taxonomists throughout the world. It provides access through web services,
web portal download and CD-ROM to names of over 60% of the world's species. ITIS and Species
2000 work together to create the Catalogue of Life (http://www.catalogueoflife.org), the goal of
which is a comprehensive catalogue of all species on earth.
The World Register of Marine Species (WoRMS) combines several species lists of marine groups;
see http://www.marinespecies.org/.
The Global Names Index (GNI) is the first component of a semantic environment for biology
called the Global Names Architecture (GNA). GNI itself is a fairly simple list of names, with
reference to who holds the names, and links back to the sources of the names. See
http://www.globalnames.org/data_sources for a list of participating scientific names repositories.
GNI has been developed by GBIF and the Encyclopedia of Life (EOL). GBIF is currently
extending the basic scope of the GNI to create a dynamic index of taxonomic catalogues and
annotated species checklists. It serves as a global name service broker capable of serving multiple
taxonomic resources through a single and consistent access point (see http://names.gbif.org/).
For habitat classifications, there are too many systems in the world to cover here. The U.S.
Geological Survey (USGS) is leading the effort within the ecosystems societal benefit area of
GEOSS to classify and map global ecosystems in a standardized, robust, and practical manner at
scales appropriate for on-the-ground management. The global ecosystems mapping task is
creating a globally agreed, robust, and viable classification scheme for terrestrial, freshwater, and
marine ecosystems and initiating a mapping approach to spatially delineate the classified
ecosystems. See http://rmgsc.cr.usgs.gov/ecosystems/method.shtml for details on the conceptual
approach and mapping methodology.
Observations of biodiversity in the field are made of biological concepts, not names. Hence the
ability to uniquely identify the concept is paramount. Scientific names, unfortunately, are not
alone sufficient for this purpose, but need to be accompanied by additional information about the
sense in which the name or classification has been used. Synonym lists that map historical names to
current concepts are also needed. Such semantic information can best be attached to a globally
unique identifier, such as an LSID or URI, that is shareable between the various networks.
Taxonomic services are increasingly offering such identifiers, and GEO BON needs to promote
their use. Data integration, in particular, can benefit when datasets can automatically be united
using shared identifiers.
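As a minimal sketch of such automatic uniting, the following Python code joins occurrence records to a second dataset through a shared taxon identifier rather than through the recorded name. The identifiers, site codes and trait values are invented for illustration; the two names for the first taxon are a genuine synonym pair (the blue tit, formerly placed in Parus).

```python
# Hypothetical identifiers and records, for illustration only.
occurrences = [
    {"taxon_id": "urn:example:taxon:1", "name_as_recorded": "Parus caeruleus", "site": "A"},
    {"taxon_id": "urn:example:taxon:1", "name_as_recorded": "Cyanistes caeruleus", "site": "B"},
    {"taxon_id": "urn:example:taxon:2", "name_as_recorded": "Parus major", "site": "A"},
]

# A second dataset keyed by the same identifiers (invented values).
traits = {
    "urn:example:taxon:1": {"body_mass_g": 11},
    "urn:example:taxon:2": {"body_mass_g": 18},
}


def join_on_identifier(records, lookup, key="taxon_id"):
    """Attach the lookup entry for each record's identifier, if one exists."""
    merged = []
    for rec in records:
        extra = lookup.get(rec[key], {})
        merged.append({**rec, **extra})
    return merged


joined = join_on_identifier(occurrences, traits)

# Group sites by concept: both synonyms fall under the same identifier.
sites_by_taxon = {}
for rec in joined:
    sites_by_taxon.setdefault(rec["taxon_id"], set()).add(rec["site"])
```

A join on the name string would have treated the first two records as different taxa; the shared identifier unites them without any manual synonym resolution.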
10. Workflow of services and integration of applications
Between the acquisition of data and its integration into a useful product, there is usually a long chain
of transformations and analytical steps. WG8 will consider the needs of other working groups in
order to find out what kinds of services (including workflows, etc.) are required to build the
intermediate aggregated data that might be needed to support them, and will identify suitable
networks, such as GBIF, offering primary occurrence data which, depending on fitness for use, can be
transformed into secondary, derived aggregated products. This is closely related to plans for
modelling by WG7. It might be possible to design a chain of various services, offered by different
providers, which could be used to produce GEO BON deliverables.
In keeping with GEO IP3 and AIP-2 experiments (see Nativi 2009), a “system of systems” needs
to manage and serve more than measurements and data: it must support modelling resources,
allowing ad hoc, on demand service chaining. Three architecture patterns were recognized for this
service chaining: a) User defined (transparent) chaining: the human user manages the workflow.
b) Workflow-managed (translucent) chaining: the human user invokes a workflow management
service that controls the chain (the user is aware of the individual services). c) Aggregate service
(opaque): the user invokes a service that carries out the chain, with the user having no awareness
of the individual services (see Figures 3, 4 and 5).
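The three patterns can be contrasted in a short Python sketch. The two services below are hypothetical local stand-ins for remote services, and the function names are not taken from any GEO or OGC specification; the point is only who sees and controls the chain.

```python
# Two hypothetical services; in practice these would be remote calls.
def fetch_occurrences(species):
    return [{"species": species, "lat": 60.2, "lon": 24.9}]


def grid_counts(records):
    """Count records per one-degree grid cell."""
    counts = {}
    for rec in records:
        cell = (int(rec["lat"]), int(rec["lon"]))
        counts[cell] = counts.get(cell, 0) + 1
    return counts


# a) Transparent: the user calls each service and passes results along.
def transparent(species):
    records = fetch_occurrences(species)
    return grid_counts(records)


# b) Translucent: the user hands a visible list of services to a
#    workflow manager, which controls the execution.
def run_workflow(steps, initial_input):
    result = initial_input
    for step in steps:
        result = step(result)
    return result


# c) Opaque: one aggregate service hides the chain entirely.
def occurrence_density_service(species):
    return run_workflow([fetch_occurrences, grid_counts], species)


a = transparent("Parus major")
b = run_workflow([fetch_occurrences, grid_counts], "Parus major")
c = occurrence_density_service("Parus major")
```

All three calls return the same result; they differ only in how much of the chain the user can inspect and rearrange.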
Figure 3. Service Chain: transparent architecture pattern.
Figure 4. Service Chain: translucent architecture pattern.
Figure 5. Service Chain: opaque architecture pattern.
These patterns differ primarily in the visibility of the services to the user, the chaining flexibility,
and the user control of chaining. Presently, most environmental service chains realise the
opaque and translucent patterns. This is not only for technological reasons (e.g. the lack of
effective and simple chain definition protocols and tools). In fact, human users who construct a
new chain or invoke an existing chain of services should determine the semantic validity of the
results of a service chain. The present technologies do not address whether the results of a chain
are semantically valid. Indeed, the advanced semantic services (ontologies, thesauri, dictionaries,
semantic mediation), discussed in section 8, may help to address this challenge
by providing the required knowledge.
Moreover, according to ISO 19119, important factors to be considered for the semantic evaluation
of a chain result are uncertainty and error propagation. This information must be included in the
data models (specific metadata section) and must be considered by every service node which
participates in a chain of services.
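One way to picture this requirement is to let every service node accept and return a value together with its standard uncertainty. The Python sketch below is an illustration under stated assumptions, not a method prescribed by ISO 19119: each node is a simple linear scaling, errors are assumed independent, and they are therefore combined in quadrature. The node and class names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurement:
    value: float
    uncertainty: float  # standard uncertainty, same units as value


def scale_node(factor, own_uncertainty=0.0):
    """Service node that multiplies by a constant and may add its own error."""
    def node(m):
        propagated = abs(factor) * m.uncertainty
        # Assumption: incoming and node errors are independent.
        combined = math.hypot(propagated, own_uncertainty)
        return Measurement(m.value * factor, combined)
    return node


def run_chain(nodes, m):
    """Pass a measurement through every node, carrying uncertainty along."""
    for node in nodes:
        m = node(m)
    return m


# Hypothetical chain: a unit conversion, then a model step with its own error.
chain = [scale_node(2.0), scale_node(0.5, own_uncertainty=4.0)]
result = run_chain(chain, Measurement(10.0, 3.0))
```

Because the uncertainty travels with the data, the final consumer can judge the result without knowing which nodes took part in the chain.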
GEO BON must be able to manage all three architecture patterns to enable discovery of
biodiversity, ecological, and environmental service nodes on the Web and on-demand adaptive
chaining of those nodes. These advanced services will take full advantage of international open
standards, in particular those developed within the Open Geospatial Consortium (OGC), ISO
TC211 and GEOSS contexts. Uncertainty and error propagation information should be supported
by the architecture patterns. The new GEOSS AIP phases (e.g. the AIP-3 use scenarios on
biodiversity and climate change area) and the GEO “Model Web” task will provide valuable
experiences on these topics.
GEO BON will also need to consider various toolboxes/scientific workflow systems such as those
offered by Kepler (https://kepler-project.org/), BioMoby (http://www.biomoby.org/) and Taverna
(http://www.taverna.org.uk/) amongst others.
Climate change & biodiversity applications: A GEOSS Architecture Implementation Pilot
A typical biodiversity application scenario requires modelling the impact of climate change on
species distribution (see Santana 2008, Nativi 2007, 2009). To achieve this, heterogeneous data
resources (e.g. biodiversity, ecological, climatological and environmental resources) and
processing services (e.g. implementing Ecological Niche Modelling (ENM) algorithms) are
required to interoperate. An interoperability framework implements the required functionalities,
e.g., modelling error propagation, uncertainty estimation, heterogeneous data sources mediation,
service chaining, etc. In the framework of GEOSS AIP (phase 2), a general service chaining
model for Climate Change and Biodiversity applications was designed and tested (see Figure 6).
The results were successful; the new AIP climate change and biodiversity scenarios (phase 3) are
based on this model.
Figure 6. Service chaining framework for Climate Change and Biodiversity applications.
Figure 6 depicts the logical components chained by the service interoperability framework to
build Climate Change & Biodiversity applications. Their functions are as follows:
• Biodiversity Data Provider: a component which is able to provide biodiversity data.
• Climatological Data Provider: a component which is able to provide climatological data.
• Catalogue: performs queries on the available biodiversity and climatological datasets. It
allows filtering of datasets based on spatial and temporal metadata, data provider,
keywords and so on. It may implement distribution and mediation functionalities through
the same service interfaces.
• Model Provider: runs ENM on the selected biodiversity and climatological datasets.
• Use Scenario Controller: enables the running of a workflow implementing the business
process of the typical biodiversity scenario described above. Generally, it is controlled by
the user through the GUI (e.g. a Web client).
• Graphical User Interface (GUI): The component for user interaction.
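The interplay of these components can be sketched in Python as follows. This is a simplified illustration, not the AIP-2 implementation: all class, function and dataset names are hypothetical, the Catalogue filters only on dataset kind, bounding box and keyword (temporal filtering is omitted), and the Model Provider merely reports which inputs it would combine instead of running an ENM algorithm.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    kind: str            # "biodiversity" or "climatological"
    bbox: tuple          # (min_lon, min_lat, max_lon, max_lat)
    keywords: frozenset


class Catalogue:
    def __init__(self, datasets):
        self.datasets = list(datasets)

    def query(self, kind, bbox, keyword):
        """Return datasets of one kind that overlap bbox and carry the keyword."""
        def overlaps(a, b):
            return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
        return [d for d in self.datasets
                if d.kind == kind and keyword in d.keywords and overlaps(d.bbox, bbox)]


def model_provider(bio, clim):
    """Stand-in for an ENM run: reports which inputs were combined."""
    return {"inputs": sorted(d.name for d in bio + clim)}


def use_scenario_controller(catalogue, bbox, species_kw, climate_kw):
    """Run the workflow: discover both kinds of data, then invoke the model."""
    bio = catalogue.query("biodiversity", bbox, species_kw)
    clim = catalogue.query("climatological", bbox, climate_kw)
    return model_provider(bio, clim)


catalogue = Catalogue([
    Dataset("occurrences_eu", "biodiversity", (-10, 35, 30, 70), frozenset({"butterflies"})),
    Dataset("occurrences_sa", "biodiversity", (-80, -55, -35, 12), frozenset({"butterflies"})),
    Dataset("temperature_global", "climatological", (-180, -90, 180, 90), frozenset({"temperature"})),
])
result = use_scenario_controller(catalogue, (0, 40, 20, 60), "butterflies", "temperature")
```

A GUI would collect the bounding box and keywords from the user and pass them to the controller; the data providers are represented here only by the datasets registered in the catalogue.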
11. Portals, search engines, querying and harvesting
The GEO Portal will provide a web-based interface for searching and accessing the data,
information, imagery, services and applications available throughout GEOSS, including a user
interface to databases, services and other portals. An assessment period was launched in 2008 to
evaluate various prototypes for the GEO Portal. The prototypes have been mainly user interfaces
to locate the services catalogued on the GEO Clearinghouse. No attempt to actually pull together
information from the sources and present or analyse it has yet been made.
The search facilities of LTER/ILTER, NBII, KNB, and NASA GCMD work similarly. Datasets
can be located based on metadata keywords. Thousands of datasets are available for download,
but as they have thousands of different data models, data have not been integrated in any of these
networks. Only metadata have been integrated.
The GBIF Data Portal works differently as it has a unified information model onto which data
from various resources are mapped. It harvests and integrates selected information from a growing
network, currently about 300 data providers with 10,000 datasets. However, it has been estimated
that this is still only about 10-20% of existing primary biodiversity data. In particular, networks in
non-GBIF countries such as Brazil, China and Russia have to be accessed through their national
interfaces. On the other hand, the GBIF Data Portal does not yet provide a metadata catalogue for
searching across datasets (but see section 6).
Various services providing aggregated information based on these integrated data systems are
emerging, e.g., integration of GBIF occurrence data with IUCN / UNEP-WCMC Protected
Areas (http://www.wdpa.org/) and the Global Register of Migratory Species
(http://groms.gbif.org/); predictive maps of distribution (LifeMapper,
http://specify5.specifysoftware.org/Informatics/informaticslifemapper.html; AquaMaps,
http://www.aquamaps.org); predicting distribution of crop wild relatives
(http://www2.gbif.org/PosterCCC31low-res.pdf).
How can GEO BON build on these experiences? There are several possibilities:
• Include metadata capabilities in GBIF data providers that are compatible with those of
other networks. Then make GBIF data available for searching in these networks.
• Make selected LTER/ILTER data resources also GBIF data providers. Those resources
that have occurrence data would need to be mapped to the GBIF data model (Darwin
Core).
• Broker agreements that the GBIF Data Portal also becomes a GEO BON Data Portal by
harvesting data also from non-GBIF GEO member countries. These additional data can
then be shown optionally.
• After the above exercise that will increase content, perform gap analyses and show in what
geographic areas and organism groups analyses of various kinds (trends etc.) can be made.
• Promote registration of value-adding services such as LifeMapper in GEO Clearinghouse.
Promote establishing more such services.
• Build chains of these value-adding services using workflow tools for the purpose of
producing the outputs wanted by the other working groups. This will mean that these
services become permanently available and can be reused to produce similar or related
outputs later with less effort. Make these value-chains available on GEO Portal and
biodiversity portals.
• Ensure registration of all biodiversity community endorsed standards and, in particular,
those of TDWG, in the GEO Portal standards registry, so that they are widely promoted
and available.
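The mapping of occurrence data to Darwin Core mentioned in the second item can be sketched in Python. The Darwin Core term names used (scientificName, decimalLatitude, decimalLongitude, eventDate, basisOfRecord, occurrenceID) are real terms of the standard; the local column names, the dataset identifier and the sample row are hypothetical.

```python
# Hypothetical column names of a local monitoring dataset -> Darwin Core terms.
FIELD_MAP = {
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "date": "eventDate",
}


def to_darwin_core(row, dataset_id, row_number):
    """Map one local row to a flat Darwin Core occurrence record."""
    record = {dwc: row[local] for local, dwc in FIELD_MAP.items() if local in row}
    record["basisOfRecord"] = "HumanObservation"
    # Assumption: dataset id plus row number is unique enough for this sketch.
    record["occurrenceID"] = f"{dataset_id}:{row_number}"
    return record


local_row = {"species": "Parus major", "lat": 60.2, "lon": 24.9,
             "date": "2009-05-17", "plot": "P3"}
dwc_record = to_darwin_core(local_row, "example-lter-site", 1)
```

Columns without a mapping, such as the plot code here, are simply dropped in this sketch; a real mapping would need to decide where such site-specific context is kept.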
The above considerations are probably still too limited. GEO BON needs to go well beyond the
present GBIF data model to support monitoring data and a richer observational model with
geographic features such as polylines, polygons and associated attributes.
At a minimum GEO BON will need a place to gather the products and their underlying data from
all WGs. In order to demonstrate GEO BON utility, it would be useful to show the underlying
chain (workflow) from data management through analysis to report. Using the main GEO Portal
for this would be the preferred alternative. Specific scenarios of use for each WG will help to
scope this. The GEO BON portal would not be a toolbox for analysis, at least in the beginning,
but discovery of datasets could possibly be supported.
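The chain from data management through analysis to report can be pictured as a sequence of steps whose names are recorded alongside the result. The Python sketch below is only an illustration: the step functions, the record layout and `run_chain` are hypothetical, and a real deployment would invoke registered remote services through a workflow tool rather than local functions.

```python
from collections import Counter


def clean(records):
    """Data management step: drop records without an organism group."""
    return [r for r in records if r.get("group")]


def count_by_group(records):
    """Analysis step: number of records per organism group."""
    return Counter(r["group"] for r in records)


def report(counts):
    """Report step: one line per group, sorted by group name."""
    return [f"{group}: {n} record(s)" for group, n in sorted(counts.items())]


def run_chain(data, steps):
    """Apply the steps in order and keep their names as simple provenance."""
    provenance = []
    for step in steps:
        data = step(data)
        provenance.append(step.__name__)
    return data, provenance


if __name__ == "__main__":
    # Invented input records, for illustration only.
    records = [
        {"group": "birds"},
        {"group": "plants"},
        {"group": "birds"},
        {"group": None},
    ]
    lines, provenance = run_chain(records, [clean, count_by_group, report])
    print("\n".join(lines))
    print("steps:", " -> ".join(provenance))
```

Recording the step names with the output is what allows a published product to be traced back to its underlying data, and the same chain can be rerun later on updated input.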
12. Open access issues
The GEOSS 10-Year Implementation Plan explicitly acknowledges the importance of data sharing
in achieving the GEOSS vision and anticipated societal benefits. GEO membership requires
agreeing that 1) There will be full and open exchange of data, metadata, and products shared
within GEOSS, recognizing relevant international instruments and national policies and
legislation, 2) All shared data, metadata, and products will be made available with minimum time
delay and at minimum cost, 3) All shared data, metadata, and products being free of charge or
available at no more than cost of reproduction will be encouraged for research and education.
In 2006, GEO established Task DA-06-01, “Furthering the Practical Application of the Agreed
GEOSS Data Sharing Principles.” A white paper has been written by a team commissioned by
CODATA to review the current practices and issues in data sharing, and alternatives for
implementation have been laid out.
The GEO BON Concept Document states that governments, organizations and institutions that
sign up to GEO BON will need to acknowledge and promote the principles of open access to
scientific and monitoring data, fair use of data for educational and research purposes, and the
development of international Intellectual Property Rights (IPR) laws that protect the investments
of private industry but that are not so restrictive that societal benefit from scientific research on
biodiversity is stifled.
All the major networks that are candidates to form the GEO BON already adhere to similar
principles as those mentioned above. Many of them belong to the Conservation Commons
(www.conservationcommons.net/), a community of practice promoting open access.
However, there is evidence that lack of scientific credit for data sharing activities is still
hampering actual implementation of the open access principles. If GEO BON can focus on
the development of workflows and value chains, it will probably have an opportunity
to bring data providers and data users closer together, and in this way help data providers
to get the recognition they need. Formal citation of datasets could provide a mechanism to
quantify the use of a dataset. Several working groups are looking into potential mechanisms (e.g.
SCOR/IODE working group on Data Publishing,
http://www.iode.org/index.php?option=com_content&task=view&id=110&Itemid=129). Data papers such as those accepted by
the Ecological Society of America might assist in making data more rapidly available by
providing a formal “publication” mechanism for data. The GenBank model, where sequence data
are made available at the time of publication, could be employed for other biodiversity data
publication.
Another mechanism to quantify the use of publicly available data is to keep detailed statistics on
how often data from specific datasets are downloaded from portals redistributing those data.
These statistics can be used by the original data providers in their reports to funding agencies, to
demonstrate the relevance of the work done and data collected.
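As an illustration of such statistics, the following Python sketch aggregates a download log per dataset. The log layout (dataset identifier, user identifier, number of records) and the function name `summarise_downloads` are assumptions made for this example and do not describe the log format of any existing portal.

```python
from collections import Counter, defaultdict


def summarise_downloads(log):
    """Aggregate a download log per dataset.

    `log` is an iterable of (dataset_id, user_id, records_downloaded)
    tuples; this layout is an assumption, not a real portal format.
    """
    events = Counter()          # number of download events per dataset
    records = Counter()         # total records downloaded per dataset
    users = defaultdict(set)    # distinct users per dataset
    for dataset_id, user_id, n_records in log:
        events[dataset_id] += 1
        records[dataset_id] += n_records
        users[dataset_id].add(user_id)
    return {
        dataset_id: {
            "downloads": events[dataset_id],
            "records": records[dataset_id],
            "distinct_users": len(users[dataset_id]),
        }
        for dataset_id in events
    }


if __name__ == "__main__":
    # Invented log entries, for illustration only.
    log = [
        ("dataset-a", "user-1", 500),
        ("dataset-a", "user-2", 120),
        ("dataset-b", "user-1", 40),
        ("dataset-a", "user-1", 500),
    ]
    for dataset_id, stats in sorted(summarise_downloads(log).items()):
        print(dataset_id, stats)
```

Counting distinct users as well as download events helps to distinguish broad reuse from repeated downloads by a single user.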
In an ideal world, all data would be publicly available. In particular, if data are used to underpin
management decisions, it is important that all concerned parties have access to the data on which
the decisions were based. It is clear, however, that strong adherence to openness of data might
restrict the volume and timeliness of available data. Will GEO BON accept any data, or only data
with fully open access? Departing from the simple principle of full openness would make the system very
complex, and for this reason many data integrators (including GBIF, OBIS and CRIA) only accept
data ‘without strings attached’. The decision on whether or not to allow restricted data into the
system goes beyond the mandate of WG8, but has clear implications for the development of the
data and information infrastructure.
13. Activities to implement GEO BON
The next steps required to start the actual implementation of GEO BON are described in the
detailed implementation plan (GEO BON 2010). The main areas for action centre around:
establishment of a working group and coordinating unit; review of existing provider networks and
establishment of partnerships; review of the data processing needs of the thematic working
groups; design of the information architecture of GEO BON; building the components such as
portal, registry, ontologies; registration of data and services; provision of a helpdesk, and outreach
and capacity building.
14. References
[Berners-Lee 2001] Berners-Lee, T., Hendler, J. & Lassila, O. 2001. The semantic web.
Scientific American, May 2001.
[GBIF 2009] GBIF 2009. Global Strategy and Action Plan for the Digitisation of Natural History
Collections, 5 p. http://www2.gbif.org/GSAP_NHC.pdf
[GEO BON 2008] The GEO Biodiversity Observation Network Concept Document.
http://earthobservations.org/documents/geo_v/20_GEO%20BON%20Concept%20Document.pdf
[GEO BON 2010] Group on Earth Observations Biodiversity Observation Network (GEO BON)
Detailed Implementation Plan, Version 1.0 – 22 May 2010
http://earthobservations.org/documents/cop/bi_geobon/geobon_detailed_imp_plan.pdf
[GEOSS AIP2a] GEOSS AIP-2 Climate Change and Biodiversity WG, Arctic Food Chain, Use
Scenario - Engineering Report.
[GEOSS AIP2b] GEOSS AIP-2 Climate Change and Biodiversity WG, The Impact of Climate
Change on Pikas Regional Distribution, Use Scenario - Engineering Report.
[Hernandez-Ernst 2009] Hernandez-Ernst, V. et al. 2009. LifeWatch Reference Model, version
0.4, 230 p. Fraunhofer IAIS, Cardiff University.
http://www.lifewatch.eu/index.php?option=com_content&view=article&id=69&Itemid=18
[IPCC 2007] IPCC, 2007: Summary for Policymakers. In: Climate Change 2007: The Physical
Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the
Intergovernmental Panel on Climate Change [Solomon, S., D. Qin, M. Manning, Z. Chen,
M. Marquis, K.B. Averyt, M.Tignor and H.L. Miller (eds.)]. Cambridge University Press,
Cambridge, United Kingdom and New York, NY, USA.
[Khalsa 2009] Khalsa, S.J., Nativi, S., Geller, G., The GEOSS Interoperability Process Pilot
Project (IP3), IEEE Transactions on Geoscience and Remote Sensing special issue on data
archiving and distribution, vol. 47, num. 1. January 2009, pp. 80-91.
[Nativi 2007] Nativi, S., Mazzetti, P., Saarenmaa, H., Kerr, J., Kharouba, H., Ó Tuama, É. &
Singh Khalsa, S. J. 2007. Predicting the Impact of Climate Change on Biodiversity – A
GEOSS Scenario, The Full Picture, pp. 262-264; edited by the Group on Earth Observations
(GEO) Secretariat, Tudor Rose, Leicester, UK.
[Nativi 2009] Nativi, S., Mazzetti, P., Saarenmaa, H., Kerr, J., Ó Tuama, É., “Biodiversity and
Climate Change Use Scenarios framework for the GEOSS Interoperability Pilot Process”,
Ecological Informatics, Vol. 4, Issue 1, January 2009, pp. 23-33.
[Nativi, 2010] Nativi, S., 2010, “The implementation of international geospatial standards for Earth
and space sciences”, Int. Journal of Digital Earth, Vol. 3, Supplement 1, 2010, pp. 2-13.
[Percivall 2010] Percivall, G., 2010, “The application of open standards to enhance the
interoperability of geoscience information”, Int. Journal of Digital Earth, Vol. 3,
Supplement 1, 2010, pp. 14-30.
[Santana 2008] Santana, F.S., Siqueira, M.F., Saraiva, A.M., Correa, P.L.P. 2008. A reference
business process for ecological niche modelling. Ecological Informatics 3(1): 75-86.
[Saarenmaa 2005] Saarenmaa, H. 2005. Sharing and accessing biodiversity data globally through
GBIF. 25th Annual ESRI International User Conference, San Diego, 25-29 July 2005.
ArcUser Online January-March 2006. Environmental Systems Research Institute,
Redlands, California. http://www.esri.com/news/arcuser/0206/biodiversity1of2.html
[Scholes 2009] Scholes, R.J. 2009. GEO BON - Group on Earth Observations Biodiversity
Observing Network. Presentation 17 slides. IGOS-GEO Symposium, Washington DC, 19
November 2009.
http://www.earthobservations.org/documents/cop/bi_geobon/200911_geobon.ppt
Annex 1: Data requirements template distributed to thematic work groups
Guidelines for thematic work groups from Work Group 8
To aid WG 8 in understanding data interoperability issues of the various thematic groups, we have
set out some guidelines that may help you in organizing your data requirements.
You are requested to:
• Add your responses after each of the numbered points below and use the appended table to
list resources where appropriate. Return to WG 8 (eotuama@gbif.org) by March 7th so
that we can collate responses for the WG 8 section of the implementation plan.
• Appoint a technical liaison to WG 8.
Information required:
1. What are the main existing data resources that are being used to support your WG theme?
Please give us names of institutions, projects, people, specific information resources
(URL), and types of data included (some major resources might be very focused, e.g.
biodiversity occurrence records, while others might be much broader, e.g. monitoring data
with lots of environmental context along with taxon info). The appended table can be used
to list these resources.
2. Are there any major interoperability issues among these that you are aware of? Spatial or
temporal resolution? Lack of critical variables? Difficulty in obtaining data records due to
clumsy Web interfaces? Access restrictions?
3. What other resources are you aware of that are available that would assist your WG
theme? E.g., are there lots of spreadsheets spread among researchers in your community
that contain invaluable information?
4. What new types of information should be gathered to advance your WG theme? (Identify
data gaps.)
5. What are the main data and metadata formats or standards used by researchers in your
WG? Please provide formal names for these if known. Is there adequate metadata or other
descriptive information to assist researchers in using these resources?
6. What computational approaches are typical for researchers in your WG: community
models or individual models? Are people used to using community resources such as
well-established data frameworks, or is the work very fragmented, with individual analyses
done on data held within individual labs or on people's PCs?
7. List the kinds of output products that would be generated by your network.
Data required by your network; also list data types that might not yet be available

| # | Data Type | Standard used (for data or metadata) | Organisation | Web Portal | Service Access protocol | Synergies (name other WGs that would also use these data) |
|---|---|---|---|---|---|---|
| 1 | Species occurrence | Darwin Core | OBIS | www.iobis.org | DiGIR | |
| 2 | Species monitoring | EML | ILTER | www..... | Metacat protocol | |
| 3 | Species occurrence | Darwin Core | GBIF | http://data.gbif.org | Several REST web services | |
| 4 | | | | | | |
| 5 | | | | | | |
| 6 | | | | | | |
| 7 | | | | | | |

(Rows 1 to 3 are examples, highlighted in yellow in the original template.)
Annex 2: Acronyms
ABCD: Access to Biological Collection Data
ADC: Architecture & Data Committee
AIP: Architecture Implementation Pilot
API: Application Programming Interface
BOLD: Barcode of Life Database
CEOS: Committee on Earth Observation Satellites
CoML: Census of Marine Life
CRIA: Centro de Referência em Informação Ambiental
CSDGM: Content Standard for Digital Geospatial Metadata
DIF: Data Interchange Format
DataOne: Data Observation Network for Earth
DiGIR: Distributed Generic Information Retrieval
EBONE: European Biodiversity Observation Network
EML: Ecological Metadata Language
ENM: Ecological Niche Modelling
EOL: Encyclopedia of Life
ESRI: Environmental Systems Research Institute
EU: European Union
FAO: Food and Agriculture Organisation
FGDC: Federal Geographic Data Committee
GAP: Gap Analysis Program
GBIF: Global Biodiversity Information Facility
GBRDS: Global Biodiversity Resources Discovery System
GCI: GEOSS Common Infrastructure
GCOS: Global Climate Observing System
GEOSS: Global Earth Observation System of Systems
GISIN: Global Invasive Species Information Network
GNA: Global Names Architecture
GNI: Global Names Index
GOOS: Global Ocean Observing System
GOS: Geospatial One Stop
GOSIC: Global Observing Systems Information Center
GTOS: Global Terrestrial Observing System
GUI: Graphical User Interface
HTML: HyperText Markup Language
IABIN: Inter-American Biodiversity Information Network
IDN: International Directory Network
ILTER: International Long Term Ecological Research
INBio: Instituto Nacional de la Biodiversidad
IODE: International Oceanographic Data and Information Exchange
IP3: Interoperability Process Pilot Project
IPANE: Invasive Plant Atlas of New England
IPT: Integrated Publishing Toolkit
ISO: International Organization for Standardization
ITIS: Integrated Taxonomic Information System
IUCN: International Union for Conservation of Nature
JaLTER: Japanese Long Term Ecological Research Network
KNB: Knowledge Network for Biocomplexity
LSID: Life Science Identifier
LTER: Long Term Ecological Research
MIENS: Minimum Information about an Environmental Sequence
NASA GCMD: NASA Global Change Master Directory
NBII: National Biological Information Infrastructure
NBN: National Biodiversity Network
NCD: Natural Collections Descriptions
NetCDF: Network Common Data Form
OAI-PMH: Open Archives Initiative – Protocol for Metadata Harvesting
OBIS: Ocean Biogeographic Information System
OGC: Open Geospatial Consortium
OGC CSW: Open Geospatial Consortium Catalog Services for the Web
OGC WCS: Open Geospatial Consortium Web Coverage Service
OGC WFS: Open Geospatial Consortium Web Feature Service
OGC WMS: Open Geospatial Consortium Web Map Service
ORNL: Oak Ridge National Laboratory
OWL: Web Ontology Language
PBIF: Pacific Biodiversity Information Forum
RDF: Resource Description Framework
REMIB: Red Mundial de Información sobre Biodiversidad (World Network for Biodiversity)
RSS: Really Simple Syndication
SBA: Societal Benefit Area
SCAR-MarBIN: Scientific Committee on Antarctic Research - Marine Biodiversity Information Network
SIBIS: South African National Biodiversity Institute’s Integrated Biodiversity Information System
SIF: Standards & Interoperability Forum
SKOS: Simple Knowledge Organisation System
SOA: Service Oriented Architecture
SQL: Structured Query Language
TAPIR: TDWG Access Protocol for Information Retrieval
TDWG: Taxonomic Databases Working Group
TERN: Taiwan Ecological Research Network
UDDI: Universal Description Discovery and Integration
UNEP: United Nations Environment Programme
UNEP-WCMC: United Nations Environment Programme World Conservation Monitoring Centre
URI: Uniform Resource Identifier
WoRMS: World Register of Marine Species
ZooInt: Zoological Integrated System