The document discusses technologies and infrastructure for publishing biodiversity data from environmental impact assessments (EIA). It covers the types and formats of EIA biodiversity data, tools for data capture and digitisation, platforms for data discovery and publishing, data quality and fitness-for-use, and data hosting centers that facilitate long-term archiving and publishing of EIA biodiversity data.
EIA Biodiversity Data Mobilisation
1. GLOBAL BIODIVERSITY INFORMATION FACILITY WWW.GBIF.ORG Publishing EIA Biodiversity Data: Technology and Infrastructure Vishwas Chavan, Nick King and Francois Rogers Global Biodiversity Information Facility [email_address] Scoping Workshop on Developing an EIA Biodiversity Data Publishing Framework in South Africa 2-4 March 2010, Cape Town, South Africa
2. Contents EIA Biodiversity Data: Types and formats Data Capture & Digitisation tools Data Discovery Data Publishing Data Quality & fitness-for-use Data Hosting Centers Community Building Platforms
3. What are the challenges? More data types Richer user interface Better management Richer content Better synchronisation Improved discovery
5. Indices Nomenclators Namebanks Biology Conservation Ecology Distribution Phylogenies ... Geolocation Country Collector Date … Voucher specimen Blood sample DNA Barcode Image Audio Video ... BHL Plazi.org ... EIA Biodiversity data are very diverse Evidence Metadata Taxon names Taxon concepts Observation Literature Species banks
7. Data Capture and Digitisation Tools Florin Pandora Taxis Cassia FieldNote Mandala ATTA BirdRecorder
8. uBio Tools Name recognition tool (FindIT) Author abbreviation resolver Checking classification (TSN name mapper) Deconstruct scientific name (ParseIT) Find scientific name (CrawlIT) etc… http://www.ubio.org
9. GBIF Templates Capture data in DwC compatible format Occurrence Data Template Names Data Template Facilitate authoring ’resource metadata’ Occurrence template Documentation for occurrence template
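As a minimal sketch of what "DwC compatible" capture can look like, the Python below writes occurrence records to CSV text using Darwin Core term names as column headers. The column subset, the `to_dwc_csv` helper and the sample record are assumptions made for illustration; they are not the actual layout of the GBIF Occurrence Data Template.

```python
import csv
import io

# Darwin Core term names used as column headers.
# This subset is an illustrative assumption, not the GBIF template layout.
DWC_COLUMNS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "countryCode", "recordedBy",
]


def to_dwc_csv(records):
    """Serialise a list of dict records to CSV text with Darwin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Terms missing from a record become empty cells; extra keys are dropped.
        writer.writerow({col: record.get(col, "") for col in DWC_COLUMNS})
    return buffer.getvalue()


# Hypothetical record, invented for illustration only.
sample = [{
    "occurrenceID": "eia-demo-0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Panthera leo",
    "eventDate": "2010-03-02",
    "decimalLatitude": -33.92,
    "decimalLongitude": 18.42,
    "countryCode": "ZA",
    "recordedBy": "Field team A",
}]

csv_text = to_dwc_csv(sample)
print(csv_text)
```

Using the standard term names as headers means the same file can later be described to a harvesting tool without renaming columns.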
10. GBIF Informatics Architecture Improved access to Names, Metadata and Primary Biodiversity Data Distributed GBIF informatics architecture Faster and easier publishing of data
11. DATA DISCOVERY GBRDS REGISTRY METADATA CATALOGUE GBRDS: Global Biodiversity Resources Discovery System
13. GBRDS, a Discovery System Consumers Data Publishers Searching Retrieving Discovering Discovery System Registering Service Publishers Others…
14. That links to resources… Who? Institutions, Collections… What? Data, Services, GUID/LSID… Where? Location, Access points… When? Temporal Scope… How? Formats, protocols, qualities. A distributed service which resolves to information resources…
15. Global Biodiversity Resources Discovery System Institutions/Collections LSIDs/DOI/GUIDs Standards Protocols Resources Services/Applications etc…
16. Global Biodiversity Resources Discovery System Institutions/Collections LSIDs/DOI/GUIDs Standards Protocols Resources Services/Applications etc… GBRDS Registry Release: April 2010
18. Two perspectives on metadata Data Producer Perspective: Document data with minimum effort; Assess the value of the data for others; Bridge the gap between data owners and users; Educate users about the characteristics of the data. User Perspective: Discover if data exists; Identify source, provenance; Make judgement about data quality and usability before getting it; Minimise costs involved in the search, retrieval, integration and use of the data. Craglia: http://www.ec-gis.org/Workshops/6ec-gis/papers/craglia-metadata.doc
19. Two levels of metadata Discovery Metadata: Discover if a resource exists; get information on ownership, location, and how to get further information. Full Metadata: Provides a full description of the resource, including data quality, data lineage, and full access and exploitation.
20. Metadata Standards Natural Collections Descriptions (NCD) Ecological Metadata Language (EML) ISO 19115/19139 FGDC Biological Data Profile Dublin Core MRTG Multimedia Metadata Schema IPT 1.1 Metadata Profile
22. Key Components: the IPT IPT The Integrated Publishing Toolkit is a state-of-the-art tool to simplify the mobilisation of biodiversity information resources such as Names, Metadata and primary biodiversity data Data Publisher Registration (GBRDS) + Publishing of Names, Metadata, Primary biodiversity data etc…
23. Simple process! The Integrated Publishing Toolkit (IPT) is designed to simplify the mapping, indexing and harvesting of Names, Metadata and Primary Biodiversity Data!
24. GBIF Integrated Publishing Toolkit (IPT) Open source Java web application Bypasses limitations of traditional wrapper tools in publishing large amounts of data by publishing whole datasets in DwC-Archive dumps (especially useful for small data publishers or those with little or no internet access) Has a richer environment than current wrapper tools, providing some data cleaning, visualisation capabilities, and the ability to publish dataset metadata Documentation and download http://code.google.com/p/gbif-providertoolkit/ Demo site http://ipt.gbif.org
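To make the idea of a DwC-Archive dump concrete, here is a rough Python sketch that packages a data file and a `meta.xml` descriptor into a single zip, the general shape described in the Darwin Core text guidelines. It does not show how the IPT itself builds archives; the helper names (`build_meta_xml`, `build_archive`), the file name `occurrence.csv`, the three-term column list and the sample row are all assumptions for illustration.

```python
import io
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(data_file, terms):
    """Return a meta.xml descriptor mapping CSV column positions to DwC terms."""
    field_lines = "\n".join(
        f'    <field index="{i}" term="{DWC_TERMS}{term}"/>'
        for i, term in enumerate(terms)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"\n'
        f'        ignoreHeaderLines="1" rowType="{DWC_TERMS}Occurrence">\n'
        f'    <files><location>{data_file}</location></files>\n'
        '    <id index="0"/>\n'
        f'{field_lines}\n'
        '  </core>\n'
        '</archive>\n'
    )


def build_archive(csv_text, terms, data_file="occurrence.csv"):
    """Zip the data file and its descriptor in memory and return the bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(data_file, csv_text)
        archive.writestr("meta.xml", build_meta_xml(data_file, terms))
    return buffer.getvalue()


# Hypothetical columns and data row, invented for illustration only.
TERMS = ["occurrenceID", "scientificName", "eventDate"]
csv_text = (
    "occurrenceID,scientificName,eventDate\n"
    "eia-demo-0001,Panthera leo,2010-03-02\n"
)

archive_bytes = build_archive(csv_text, TERMS)
print(len(archive_bytes), "bytes")
```

Because the whole dataset travels as one file, a publisher with intermittent connectivity can prepare the archive offline and upload it once.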
25. IPT Publishes Through… More to come… * Darwin Core (Text-Archive) based on standard submitted to TDWG for review, Feb 2009
26. IPT Demo Screencast of IPT demo GBIF Help Desk (helpdesk@gbif.org) IPT 1.1 Release: April 2010
28. Scope of the Global Names Architecture Referencing names in Checklists to a common Nomenclatural Index
29. Checklist Bank – A Name Services brokerage Global broker of taxonomic data Index of Taxonomic Catalogues and Annotated Checklists Extends the GBIF network to support publishing Species-level data
30. Publishing Checklists to GBIF Using Integrated Publishing Toolkit Via pre-composed Spreadsheet templates Exporting according to DwC Archive format and registering a local data file (self-serve) GBIF desktop publishing tool Other taxonomic editors (EDIT/ITIS) that support DwC Archive format
31. Desktop Annotated Checklist Builder Create, manage, publish Synonymised checklists Vernacular Names Distribution data Bibliography Type/Specimen data Mac OS/ Windows Publishes “GBIF-ready” format DwC Archive – simple, extensible Text-based format Q3 2010
32. Controlled Vocabularies Server ISO: Countries ISO: Language DwC: Basis of Record DwC: Nomenclatural Status DwC: Sex (Gender) DwC: Taxonomic Status IUCN: Threat Status … vocabularies.gbif.org Vocabularies publishing platform – Internationalise all GBIF vocabularies
33. Controlled Vocabularies Server Create, manage, publish Extensions to Darwin Core Extend Occurrence Data Extend Species Data vocabularies.gbif.org Tie to vocabularies that are also drafted and published to this system. Then translate to your native language.
35. Fitness-for-use Primary biodiversity data can be used for multiple purposes by various user communities worldwide. Assessing and enhancing fitness-for-use of data is therefore critical for the scientific and social relevance of biodiversity science. Fitness-for-use varies from one use case to another. Data quality assessment and quality control are important components of a ‘fitness-for-use’ regime.
36. Loss of Data Quality At the time of collection During digitisation During documentation During storage and archiving During analysis and manipulation During dissemination and presentation Through the use to which they are put
37. Issues influencing data quality Accuracy and precision Completeness Currency and Timeliness Update frequency Consistency Flexibility Transparency Performance measures and targets Data cleaning Outliers setting targets for improvement Truth in labelling Error and bias Uncertainty Auditability Edit Controls Minimise duplication and reworking of data Maintenance of original (or verbatim) data Categorisation can lead to loss of data and quality Documentation Feedback Education and Training Accountability
39. Data Cleaning: definition & framework A process used to determine inaccurate, incomplete, or unreasonable data and then improve the quality through correction of detected errors and omissions. General framework for data cleaning: Define and determine error types; Search and identify error instances; Correct the errors; Document error instances and error types; and Modify data entry procedures to reduce future errors
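The first, second and fourth steps of this framework (define error types, find error instances, document them) can be sketched in a few lines of Python. The error-type names, the `check_record` and `find_errors` helpers and the sample records are hypothetical and chosen for illustration. Correcting the errors and revising data entry procedures are left to a human curator.

```python
from datetime import date


def check_record(record):
    """Return the list of error types detected in one occurrence record."""
    errors = []
    # Error type 1: incomplete data (no taxon name).
    if not str(record.get("scientificName", "")).strip():
        errors.append("missing_scientific_name")
    # Error type 2: unreasonable data (coordinates outside valid ranges).
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        errors.append("missing_or_non_numeric_coordinates")
    else:
        if not -90 <= lat <= 90:
            errors.append("latitude_out_of_range")
        if not -180 <= lon <= 180:
            errors.append("longitude_out_of_range")
    # Error type 3: inaccurate format (date is not ISO YYYY-MM-DD).
    try:
        date.fromisoformat(str(record.get("eventDate", "")))
    except ValueError:
        errors.append("invalid_event_date")
    return errors


def find_errors(records):
    """Document error instances as {record index: [error types]}."""
    log = {}
    for index, record in enumerate(records):
        errors = check_record(record)
        if errors:
            log[index] = errors
    return log


# Hypothetical records, invented for illustration only.
records = [
    {"scientificName": "Panthera leo", "decimalLatitude": -33.92,
     "decimalLongitude": 18.42, "eventDate": "2010-03-02"},
    {"scientificName": "Panthera leo", "decimalLatitude": 133.92,
     "decimalLongitude": 18.42, "eventDate": "2010-03-02"},
    {"scientificName": "", "decimalLatitude": -33.92,
     "decimalLongitude": 18.42, "eventDate": "02/03/2010"},
]

error_log = find_errors(records)
print(error_log)
```

Keeping the log separate from the data leaves the original (verbatim) values untouched, so corrections can be reviewed before they are applied.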
40. Tools and Best Practices http://mapstedi.colorado.edu/ http://manisnet.org/GeorefGuide.html
45. Data Hosting Centers Caters to data publishers without skills & resources Facilitate long term archival and publishing GBIF Plans Criteria for establishing DHC Criteria for endorsement of DHC Tools and Best Practices for DHC
-Nick mentioned the key challenges in his presentation entitled “Why the need for a global infrastructure to discover, share, publish and use biodiversity data”