Presentation of  GBIF and sharing of biodiversity data with Web Services December  13, 2005 USDA, Beltsville Dag Terje Filip Endresen – The Nordic Gene Bank, IPGRI
TOPICS: Biodiversity data; Standards; Data exchange; Web Services, technology; Workflows
Biodiversity collections data Preserved reference collections, such as those in museums and herbaria. Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain cultures and tissue collections. Data collections, from surveys of objects in the field, such as observations. These collections have most of their attributes in common, although the terminology used to describe them may differ substantially. [http://www.bgbm.org/TDWG/CODATA/ABCD-Evolution.htm]
TDWG - Taxonomic Databases Working Group TDWG Mission: To provide an international forum for biological data projects. To develop and promote the use of standards. To facilitate data exchange. The TDWG web site is hosted by The Natural History Museum in London, UK. [http://www.tdwg.org/]
Biodiversity informatics standards
MCPD - Multi-Crop Passport Descriptors The MCPD list was developed jointly by IPGRI and FAO as an international standard for germplasm passport data exchange. The MCPD is designed to be compatible with the IPGRI crop specific descriptor lists and the FAO World Information and Early Warning System (WIEWS). The MCPD was first released in 1997. [http://www.ipgri.cgiar.org/publications/pdf/124.pdf] The MCPD descriptor list is compatible with ABCD. MCPD was in fact developed with some input from TDWG (on plant uses categories, version 1998).
IPGRI Crop Specific Descriptors The IPGRI crop descriptors (as well as those of other networks) expand the MCPD List to meet their specific needs. As long as these additions allow for an easy conversion to the format proposed in the multi-crop passport descriptors, basic passport data can be exchanged worldwide in a consistent manner. The International Union for the Protection of New Varieties of Plants (UPOV) maintains crop descriptors for the protection of intellectual property rights (since 1961). The COMECON descriptor lists came even earlier, and were the result of cooperation among the Eastern European genebanks in PGR documentation (1949–1991).
Taxonomic Databases Working Group Standards development and maintenance Darwin Core 2 - "Element definitions designed to support the sharing and integration of primary biodiversity data". [http://darwincore.calacademy.org/] Access to Biological Collection Data (ABCD) 2.0 - "An evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data)" [http://www.bgbm.org/TDWG/CODATA/Schema/] Structure of Descriptive Data (SDD) 1.0 - Compare SDD with PGR evaluation and characterization data. [http://wiki.cs.umb.edu/twiki/bin/view/SDD/CurrentSchemaVersion]
Darwin Core 2 (DwC2) The Darwin Core 2 is a simple set of data element definitions designed to support the sharing and integration of primary biodiversity data. The Darwin Core is intended to be simple; simplicity reduces the barriers for data providers. The Darwin Core is not a sufficient model or data structure for managing primary data, such as a collection database. Darwin Core can be compared to the MCPD of the PGR community as a minimum common descriptor list. [http://darwincore.calacademy.org]
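The comparison between MCPD and Darwin Core as minimum common descriptor lists can be illustrated with a small sketch. The descriptor names on both sides come from the two standards, but the pairing shown here is only an assumption made for illustration, and the sample record values are made up. Note also that a real conversion would have to translate values as well as names (for example, MCPD country codes into country names).

```python
# Hypothetical field pairing for illustration only; it is not an official
# MCPD-to-Darwin Core mapping.
MCPD_TO_DWC = {
    "INSTCODE": "InstitutionCode",
    "ACCENUMB": "CatalogNumber",
    "GENUS": "Genus",
    "SPECIES": "Species",
    "ORIGCTY": "Country",
    "COLLSITE": "Locality",
}


def mcpd_to_dwc(record):
    """Return a new dict with MCPD keys renamed to Darwin Core style names.

    Descriptors without an entry in MCPD_TO_DWC are dropped, which mirrors
    the idea of exchanging only a minimum common set of descriptors.
    """
    return {MCPD_TO_DWC[key]: value
            for key, value in record.items()
            if key in MCPD_TO_DWC}


if __name__ == "__main__":
    # Made-up sample accession record.
    accession = {
        "INSTCODE": "XYZ001",
        "ACCENUMB": "ACC-1",
        "GENUS": "Hordeum",
        "SPECIES": "vulgare",
        "STORAGE": "13",  # no counterpart in this sketch, so it is dropped
    }
    print(mcpd_to_dwc(accession))
```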
ABCD - Access to Biological Collection Data ABCD is a common data specification for data on biological specimens and observations (including plant genetic resources seed banks). The design goal is to be both comprehensive and general (ABCD 2 has about 1200 elements). Development of the ABCD started after the 2000 meeting of the TDWG. ABCD was developed with support from TDWG/CODATA, ENHSIN, BioCASE, and GBIF. GBIF accepted the ABCD schema in 2002. The MCPD descriptor list is now completely mapped to and compatible with ABCD. [http://www.bgbm.org/TDWG/CODATA/Schema/]
PGR sub-unit of ABCD PGR
Bioinformatics concepts and Ontology Ontologies are specifications of the concepts in a given field and the relationships among those concepts. Extensible Markup Language / Resource Description Framework (XML/RDF) is one way to describe the elements.
Biodiversity informatics  data exchange tools
DiGIR - Distributed Generic Information Retrieval Distributed - a protocol for retrieving structured data from multiple, heterogeneous databases across the Internet. Generic - a protocol independent of the data retrieved and of the software to retrieve it. The DiGIR protocol uses the Darwin Core as its data definition. [http://digir.net] [https://sourceforge.net/projects/digir] Major contributors to DiGIR are University of Kansas Natural History Museum, the MaNIS project (University of California, Berkeley) and GBIF.
BioCASE - Biological Collection Access for Europe BioCASE establishes web-based unified access to biological collections in Europe while leaving control of the information with the collection holders. ABCD is the main data definition used by BioCASE. The PyWrapper protocol is designed to handle any schema and connect to any SQL capable database. BioCASE provides full access to its registry for GBIF. Being a BioCASE provider thus means being a GBIF provider. [http://www.biocase.org/] BioCASE development is coordinated by the Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM).
Protocol integration - TAPIR There is a need to integrate the current protocols in use by different biodiversity informatics community networks. During the TDWG meeting in Christchurch, NZ in October 2004, the unified protocol under development was presented and named TAPIR, the TDWG Access Protocol for Information Retrieval. It was agreed to start testing the protocol by rewriting the data provider software of the existing BioCASE and DiGIR implementations. The TAPIR protocol will be supported by the next generation of DiGIR and BioCASE. [http://ww3.bgbm.org/tapir]
BioMOBY BioMOBY is an international research project on methodologies for biological data representation, distribution, and discovery. MOBY-S is a web service based interoperability solution. S-MOBY is a Semantic Web-based interoperability solution. [http://www.biomoby.org/]
Web service technology
Simplicity and global standards Important factors behind the success of the web are simplicity and ubiquity. A service provider with a web site can reach the global community. Three simple methods (GET, POST, and PUT) and a simple markup language. Web services are about expanding the Web as a platform not only for information but also for services.
Web Service definition – W3C A Web service is a  software system  identified by a URI, whose public interfaces and bindings are defined and described using XML. Its definition can be  discovered  by other software systems.  These systems may then  interact  with the Web service in a manner prescribed by its definition, using XML based messages conveyed by Internet protocols. W3C,  Web Services Glossary [http://www.w3.org/TR/ws-gloss]
Some web service keywords: Application-to-application; Platform independent; Programming language independent; Object model independent
Some Web Service standards XML: All exchanged data is formatted with XML tags. The message is transmitted through a transport protocol such as SOAP or RPC. Data can be transported between applications using common protocols such as HTTP, FTP or SMTP. WSDL: The public interface to the web service is described by Web Services Description Language (WSDL). This is an XML-based service description on how to communicate with the web service. UDDI: The web service information is published using this protocol. It enables applications to look up web services information in order to determine whether to use them. [http://en.wikipedia.org/wiki/Web_services]
Example of a service call All exchanged data is formatted with XML tags.
Example of a service response
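The response shown on the original slide is not preserved in this transcript. As a rough illustration of what handling an XML response could look like, the following sketch parses a made-up response document with Python's standard library. The element names `response` and `record`, and the sample values, are assumptions and do not come from any specific provider.

```python
import xml.etree.ElementTree as ET

# Made-up response document; real providers define their own structure.
SAMPLE_RESPONSE = """<response>
  <record>
    <InstitutionCode>XYZ001</InstitutionCode>
    <ScientificName>Hordeum vulgare</ScientificName>
  </record>
  <record>
    <InstitutionCode>XYZ001</InstitutionCode>
    <ScientificName>Avena sativa</ScientificName>
  </record>
</response>"""


def parse_records(xml_text):
    """Turn each <record> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in record}
            for record in root.iter("record")]


if __name__ == "__main__":
    for row in parse_records(SAMPLE_RESPONSE):
        print(row)
```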
Message transport protocols The message (XML) is transmitted through a service transport protocol such as SOAP or RPC, and wrapped in a common internet transport protocol like HTTP, FTP, SMTP ... for transport through the internet.
Regular SOAP message The SOAP envelope consists of a header and a body. Information intended for the recipient is written in the body, such as Remote Procedure Call information, XML messages, or error messages. The header contains additional information on the SOAP message, such as digital signature information, transaction information, and routing information.
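The envelope structure described above can be sketched with Python's standard library. The namespace is the SOAP 1.1 envelope namespace. The `Routing`, `GetAccession` and `AccessionNumber` elements are hypothetical names chosen only to show where header and body content would go.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace


def build_envelope(header_children, body_children):
    """Build an Envelope element holding a Header and a Body, in that order."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    header.extend(header_children)  # e.g. signature or routing information
    body.extend(body_children)      # the message meant for the recipient
    return envelope


if __name__ == "__main__":
    routing = ET.Element("Routing")       # hypothetical header entry
    routing.text = "portal.example.org"
    request = ET.Element("GetAccession")  # hypothetical operation name
    ET.SubElement(request, "AccessionNumber").text = "ACC-1"
    envelope = build_envelope([routing], [request])
    print(ET.tostring(envelope, encoding="unicode"))
```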
Communication protocol  Although SOAP does not depend on the underlying communication protocol, HTTP is usually used. Because of this, it is possible to communicate with Web services protected by firewalls.
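Because the SOAP message travels as the payload of an ordinary HTTP request, it can be prepared with a general-purpose HTTP library. The sketch below only constructs the request object and does not send it. The endpoint URL and the operation name are placeholders, and the `SOAPAction` header follows the SOAP 1.1 HTTP binding.

```python
import urllib.request


def make_soap_request(url, envelope_xml, soap_action=""):
    """Prepare (but do not send) an HTTP POST carrying a SOAP 1.1 message."""
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{soap_action}"',
    }
    return urllib.request.Request(
        url,
        data=envelope_xml.encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and payload, for illustration only.
    request = make_soap_request(
        "http://example.org/soap",
        "<Envelope><Body><GetAccession/></Body></Envelope>",
    )
    print(request.get_method(), request.full_url)
    # Sending would be: urllib.request.urlopen(request)
```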
Data warehouse model (Slide by Samy Gaiji, IPGRI)
Decentralized model (Slide by Samy Gaiji, IPGRI)
Network data flow The Data Provider is the web service package (wrapper) installed at the data source. The Data Portal is a gateway to data published from the data provider nodes. [Diagram labels: DB, Working Database, Online Database, Provider wrapper software, Provider, Portal, User]
Combination of services Web services can be combined to create new services. [Diagram labels: Seed bank Accession Inventory; Weather Info Service; GIS Species Occurrences Service] New service to plan collecting missions for under-collected species, timed for a period of good weather.
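A minimal sketch of such a combined service follows. It assumes the three underlying services have already been queried and their results are available as plain dictionaries. The data shapes, the function name and the `min_accessions` threshold are all hypothetical.

```python
def plan_collecting_missions(accession_counts, occurrences,
                             good_weather_months, min_accessions=5):
    """Suggest (species, site, month) tuples for under-collected species.

    accession_counts:    species -> number of accessions in the seed bank
    occurrences:         species -> list of sites where it has been observed
    good_weather_months: site -> list of months with good weather
    """
    plans = []
    for species, sites in occurrences.items():
        if accession_counts.get(species, 0) >= min_accessions:
            continue  # already well represented in the seed bank
        for site in sites:
            months = good_weather_months.get(site, [])
            if months:
                plans.append((species, site, months[0]))
    return plans


if __name__ == "__main__":
    # Made-up inputs standing in for the three service responses.
    counts = {"Hordeum vulgare": 120, "Avena strigosa": 2}
    seen_at = {"Hordeum vulgare": ["site A"],
               "Avena strigosa": ["site A", "site B"]}
    weather = {"site A": ["July", "August"], "site B": []}
    print(plan_collecting_missions(counts, seen_at, weather))
```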
Biodiversity informatics workflow tools
Workbench Bioinformatics analyses often involve combining the use of databases and analysis programs which are linked in a specific order to form a workflow process. Flow of data from one analytical step to another can be captured in a formal workflow language.
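The idea of a workflow as an ordered chain of steps, where the output of one step is the input of the next, can be sketched in a few lines. This is only an illustration of the concept and not a representation of any particular workflow language or tool. The step functions are made up.

```python
from functools import reduce


def run_workflow(steps, data):
    """Apply each step in order, feeding every result into the next step."""
    return reduce(lambda value, step: step(value), steps, data)


def drop_empty(names):
    """Remove blank entries from a list of names."""
    return [name for name in names if name.strip()]


def normalise(names):
    """Trim whitespace and capitalise the first letter of each name."""
    return [name.strip().capitalize() for name in names]


def unique_sorted(names):
    """Return the distinct names in alphabetical order."""
    return sorted(set(names))


if __name__ == "__main__":
    raw = [" hordeum vulgare", "", "Avena sativa ", "hordeum vulgare"]
    print(run_workflow([drop_empty, normalise, unique_sorted], raw))
```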
Taverna workflow The Taverna Workbench allows users to construct complex analysis workflows from components located on both remote and local machines, run these workflows on their own data and visualize the results. BioMOBY objects can be connected in a workflow. [http://taverna.sourceforge.net/]
Science Environment for Ecological Knowledge The Science Environment for Ecological Knowledge (SEEK) is a system designed to facilitate not only data acquisition and archiving, but also integrating, transforming, analyzing, and synthesizing ecological and biodiversity data. [http://seek.ecoinformatics.org/] [http://kepler-project.org/]
Kepler workflow example - GARP
Thank you for listening!

GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2005)

Editor's Notes

  1. * Spiders-web, photographer: Ian-Britton [http://www.freefoto.com/preview.jsp?id=01-17-1&k=Spiders+Web]
  2. Photo: IRRI genebank. Los Banos, Philippines [http://www.cgiar.org/images/irrigenebank1.jpg], VIR seed collection. St. Petersburg. Photographer Eva Thörn (NGB Picture Archive, image 001319).
  3. * IRRI genebank. Los Banos, Philippines [ http://www.cgiar.org/images/irrigenebank2.jpg] * Text formulation source [http://www.bgbm.org/TDWG/CODATA/ABCD-Evolution.htm] wording above is modified.
  4. * Multi-crop Passport Descriptors (MCPD) [http://www.ipgri.cgiar.org/publications/pdf/124.pdf] FAO (Food and Agricultural Organization of the United Nations) - IPGRI (International Plant Genetic Resources Institute). This is a revised version (December 2001) of the 1997 MCPD List. * FAO World Information and Early Warning System (WIEWS) [http://apps3.fao.org/wiews/] * 19 Plant Uses Categories based on categories developed for the Working Group on Taxonomic Databases (TDWG) (Cook, Frances E.M., 1995. Economic Botany: Data Collection Standard. Royal Botanic Gardens Kew). [http://www.ecpgr.cgiar.org/epgris/Training/MCPD-1998.doc] * The mapping of MCPD to ABCD was started in 2004 by Helmut Knüpffer and Walter Berendsohn, and finalized by Javier de la Torre and Dag Terje Filip Endresen in 2005. [http://ww3.bgbm.org/MCDPH]
  5. * IPGRI Descriptors lists [http://www.ipgri.cgiar.org/system/page.asp?frame=programmes/inibap/home.htm] (119 descriptor lists, 2005) * MCPD [http://www.ipgri.cgiar.org/publications/pdf/333.pdf] * UPOV - International Union for the Protection of New Varieties of Plants (UPOV) [ http://www.upov.int/] * UPOV - The International Union for the Protection of New Varieties of Plants or UPOV (French: Union internationale pour la protection des obtentions végétales) is an intergovernmental organization with headquarters in Geneva, Switzerland. [http://en.wikipedia.org/wiki/UPOV] * COMECON - The Council for Mutual Economic Assistance (COMECON / Comecon / CMEA / CEMA), 1949 – 1991, was an economic organisation of communist states and a kind of Eastern European equivalent to the European Economic Community. The military counterpart to the Comecon was the Warsaw Pact. [http://en.wikipedia.org/wiki/Comecon]
  6. * Illustration: Corn earworm pupae that will be used to produce control parasites for release in the field. Photo by Scott Bauer. [http://www.ars.usda.gov/is/graphics/photos/k5554-2.htm] * UBIF is an attempt to define a common foundation for several TDWG/GBIF standards like SDD (see SDD WIKI), ABCD (see ABCD content schema homepage) or TaxonConceptNames (see Taxonomic Concept Transfer Schema WIKI). * Unified Biosciences Information Framework (UBIF) XML schema for data exchange and integration across knowledge domains. The schema has been designed for biological data, but is applicable to other knowledge areas as well. It is based on work of the TDWG SDD and ABCD subgroups and is currently jointly authored by the SDD, ABCD, TaxonName subgroups and by GBIF (Global Biodiversity Information Facility). The framework may be used without changes for new schemata, no registration is necessary. * Complex Types are part of the UBIF infrastructure (TDWG common complex type for several schemas, ABCD, SDD, TCS, Linnean Core, etc.)
  7. * The mapping of MCPD to ABCD was started in 2004 by Helmut Knüpffer and Walter Berendsohn, and finalized by Javier de la Torre and Dag Terje Filip Endresen in 2005. [ http://ww3.bgbm.org/MCDPH]
  8. * IPGRI Descriptors lists [http://www.ipgri.cgiar.org/system/page.asp?frame=programmes/inibap/home.htm] (119 descriptor lists, 2005) * MCPD [http://www.ipgri.cgiar.org/publications/pdf/333.pdf] * UPOV - International Union for the Protection of New Varieties of Plants (UPOV) [ http://www.upov.int/] * UPOV - The International Union for the Protection of New Varieties of Plants or UPOV (French: Union internationale pour la protection des obtentions végétales) is an intergovernmental organization with headquarters in Geneva, Switzerland. [http://en.wikipedia.org/wiki/UPOV] * COMECON - The Council for Mutual Economic Assistance (COMECON / Comecon / CMEA / CEMA), 1949 – 1991, was an economic organisation of communist states and a kind of Eastern European equivalent to the European Economic Community. The military counterpart to the Comecon was the Warsaw Pact. [http://en.wikipedia.org/wiki/Comecon]
  9. Quality counts: Chemist Gary List checks soybeans. Photo by Keith Weller. [ http://www.ars.usda.gov/is/graphics/photos/k5256-2.htm]
  10. Photo: PICT0173.jpg Sub-section from Whale Safari to Kaikoura New Zealand. Photo Dag Terje Filip Endresen [http://r142b.ngb.se/ngb/2004-10-New-Zealand-Australia/index.php?offset=79&size=medium&stp=1]
  11. The text formulation above is edited from various sources found through Google searches.
  12. * W3C Web Services Glossary. W3C Working Group Note 11 February 2004 [http://www.w3.org/TR/ws-gloss/] * Spiders-web, photographer: Ian-Britton [http://www.freefoto.com/preview.jsp?id=01-17-1&k=Spiders+Web] * Copyright statement “feel free to use any of the images on the site if you are a private individual and your use is not commercial” [http://www.freefoto.com/browse.jsp?id=99-5-0]
  13. Perhaps this slide is too much ... ? * WS-Security : The Web Services Security protocol has been accepted as an OASIS standard. The standard allows authentication of actors and confidentiality of the messages sent. (taken out to simplify the slide...)
  14. Slide by Samy Gaiji, from the presentation "Information Networking - Challenges for the Plant Genetic Resources Communities", 2004.
  15. Slide by Samy Gaiji, from the presentation "Information Networking - Challenges for the Plant Genetic Resources Communities", 2004.
  16. Some examples to use...? * http://www.xmethods.com * http://www.xmethods.com/ve2/ViewListing.po;jsessionid=tYR5gLa3iESq9FYCW1m1IqHo(QHyMHiRM)?key=uuid:57D835E5-B4A5-4C4A-38E8-37E964100CF8 * http://services.bio.ifi.lmu.de:1046/prothesaurus/