Cover slide
Documentation of Genetic Resources, Global Information Systems
CAC Training Course, January 29, 2009, NordGen, Alnarp
Dag Terje Filip Endresen, Nordic Genetic Resource Center / Bioversity International


TOPICS Documentation of genetic resources: information systems, data standards, data exchange, distributed data network


Global PGR Information Systems


Germplasm catalogues
The three large germplasm catalogues are indexed by the GBIF data portal:
EURISCO is the data catalogue of the European genebanks (more than 1 000 000 accessions).
SINGER is the portal to the international CGIAR collections (more than 650 000 accessions).
USDA-GRIN is the portal to the USDA ARS National Germplasm Repositories of the USA (more than 400 000 accessions).


ECPGR (AEGIS, ECCDB, EURISCO) A European Genebank Integrated System (AEGIS): sharing of responsibilities (Most Appropriate Accession; commonly agreed quality standards for ex situ conservation). Conservation of the genetically unique and important accessions for Europe, making them available for breeding and research. Four model crops: Allium, Avena, Brassica and Prunus species. Membership in AEGIS is open to all European countries (ECPGR). EURISCO and the Central Crop Databases play a key role in the information management.


ECPGR Central Crop Databases


EURISCO [ http://eurisco.ecpgr.org/ ] EURISCO is the data catalogue of the European genebanks (more than 1 000 000 accessions from 35 European countries). EURISCO holds accession level data on 1 300 genera and 8 500 species. EURISCO was released in September 2003 as a result of the EU funded EPGRIS project. EURISCO is hosted by Bioversity International on behalf of the ECPGR.


EURISCO (new layout)


Data flow from genebanks to EURISCO and ECCDBs


EPGRIS3 [ http://www.epgris3.eu/ ] EPGRIS3 is a voluntary, self-funded follow-up to the EU funded EPGRIS project. EPGRIS3 aims to improve the data exchange of European genebank datasets and to further develop the IT infrastructure on genetic resources in Europe.


EPGRIS3 Wiki Environment An EPGRIS3 wiki environment is hosted by NordGen. Please register and contribute to the discussions. [ http://wwwdev.ngb.se/epgris3/ ] Please contact one of the EPGRIS3 contact persons if you want to contribute to the EPGRIS3 project. [ http://www.epgris3.eu/EPGRIS3contacts.htm ]


SINGER [ http://singer.grinfo.net/ ] The System-wide Information Network for Genetic Resources (SINGER). More than 650 000 accessions from the 12 international CGIAR organizations. SINGER is hosted by Bioversity International on behalf of the CGIAR.


CGIAR [ http://www.cgiar.org/ ]
AVRDC - The World Vegetable Center
Bioversity - Bioversity International
CIAT - Centro Internacional de Agricultura Tropical
CIMMYT - Centro Internacional de Mejoramiento de Maiz y Trigo
CIP - Centro Internacional de la Papa
ICARDA - International Center for Agricultural Research in the Dry Areas
ICRAF - The World Agroforestry Centre
ICRISAT - International Crops Research Institute for the Semi-Arid Tropics
IITA - International Institute of Tropical Agriculture
ILRI - International Livestock Research Institute
IRRI - International Rice Research Institute
WARDA - The Africa Rice Center


GCP [ http://www.generationcp.org/ ] GCP: Generation Challenge Programme. The GCP Mission: To use advanced genomics science and plant genetic diversity to overcome complex agricultural bottlenecks that condemn millions of the world’s neediest people to a future of poverty and hunger. The GCP Vision: A future where plant breeders have the tools to breed crops in marginal environments with greater efficiency and accuracy for the benefit of the resource-poor farmers and their families.


NordGen [ http://www.nordgen.org/ ] The Nordic Genetic Resource Center (NordGen) was established in January 2008. NordGen replaces the former Nordic Gene Bank (NGB), established in 1979. NordGen is the joint regional genetic resource center for all five Nordic countries: Denmark, Finland, Iceland, Norway and Sweden. NordGen reports to the Nordic Council of Ministers [ http://www.norden.org ]. NordGen's mandate is the conservation and utilization of Nordic genetic resources.


Regional Programs on Genetic Resources
SEEDNet, the South East European Development Network on Plant Genetic Resources, was established in 2004. [ http://seednet.geminova.net/ ]
SADC, the Southern African Development Community program on genetic resources, was started in 1989. [ http://www.spgrc.org/ ]
USDA GRIN, the Germplasm Resources Information Network of the US. [ http://www.ars-grin.gov/ ]
… and more


GBIF Global Biodiversity Information Facility


GBIF Data Portal GBIF [ http://data.gbif.org/ ]


GBIF PGR Network 2 [ http://data.gbif.org/datasets/network/2 ]


GBIF NordGen [ http://data.gbif.org/ ]


GBIF SINGER [ http://data.gbif.org/ ]






FAO WIEWS [ http://apps3.fao.org/wiews/ ]


FAO WIEWS, GPA [ http://www.pgrfa.org/gpa/ ] Leipzig Declaration 1996, 150 countries [ http://www.globalplanofaction.org/ ]


Data Standards


Crop Descriptors The crop descriptor lists from Bioversity International provide global standards for characterization and evaluation data on crop genetic resources. The MCPD (Multi Crop Passport Descriptor List) provides a global standard for "passport data" across crops. The MCPD descriptor list is compatible with the TDWG standard ABCD 2.06.
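
To make the idea of passport data concrete, here is a minimal sketch of one accession described with a handful of MCPD descriptors. The descriptor names (INSTCODE, ACCENUMB, GENUS, SPECIES, ORIGCTY, SAMPSTAT) come from the MCPD list; the values, the choice of which fields to treat as required, and the `validate_passport` helper are assumptions made only for this illustration.

```python
# Illustrative MCPD-style passport record. Descriptor names follow the MCPD
# list; every value below is made up for the example.
record = {
    "INSTCODE": "XXX001",   # hypothetical institute code
    "ACCENUMB": "H451269",  # accession number
    "GENUS": "Avena",
    "SPECIES": "sativa",
    "ORIGCTY": "SWE",       # country of origin, three-letter code
    "SAMPSTAT": 300,        # biological status of the accession
}

# Top-level SAMPSTAT codes (the descriptor list also defines finer sub-codes).
SAMPSTAT_LABELS = {
    100: "Wild",
    200: "Weedy",
    300: "Traditional cultivar/landrace",
    400: "Breeding/research material",
    500: "Advanced/improved cultivar",
}

# Assumption of this sketch: institute code plus accession number are
# treated as the minimum needed to identify an accession.
REQUIRED = ("INSTCODE", "ACCENUMB")


def validate_passport(rec):
    """Return a list of problems found in a passport record (empty if none)."""
    problems = []
    for name in REQUIRED:
        if not rec.get(name):
            problems.append(f"missing {name}")
    origcty = rec.get("ORIGCTY")
    if origcty and not (len(origcty) == 3 and origcty.isalpha() and origcty.isupper()):
        problems.append("ORIGCTY should be a three-letter country code")
    return problems


print(validate_passport(record))
print(SAMPSTAT_LABELS.get(record["SAMPSTAT"], "code not listed in this sketch"))
```

Because every genebank uses the same descriptor names and code lists, records like this one can be compared and merged across institutes without per-source translation.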


Accession level, Data Standards
Multi Crop Passport (MCPD) [http://www.bioversityinternational.org/publications/pubfile.asp?id_pub=124]
Darwin Core (DwC v2) [http://wiki.tdwg.org/twiki/bin/view/DarwinCore/]
Access to Biological Collection Data (ABCD 2.06) [http://wiki.tdwg.org/twiki/bin/view/ABCD]
Generation Challenge Programme (GCP Passport v1.05) [http://gcpcr.grinfo.net/include/webservices/schema-documentation.php]


W3C :: RDF Resource Description Framework
Scenario: You have a dataset of genebank accessions with pointers to the source datasets of the holding genebanks. You produce phenotypic evaluation data on accessions in this dataset. You find evaluation data from other sources on some of the accessions in your dataset. Some of the evaluation data are produced in areas of different day length, rainfall, soils… Some of the accessions in your dataset originate from areas of higher population density; other accessions originate from more natural habitats. Unfortunately, most of the different sources of information are located on different web sites, and it is difficult to bring the information together. Like researchers in many other domains, you would need to gather heterogeneous data from multiple sources, combine it and analyse it. This is the challenge that faces the web as a whole and is being addressed by the Semantic Web project. RDF can help you relate information from different sources.
An RDF triple has the form subject-predicate-object, for example:
<rdf:Description rdf:about="http://www.example.org/index.html">
  <dc:creator>John Smith</dc:creator>
</rdf:Description>
[Figure: tag cloud mixing Semantic Web and genetic resources terms, e.g. rdf, owl, ontology, semantic web, knowledge representation, LSID, GUID, unitID, ABCD, SDD, INSTCODE, accession number, Full Scientific Name, landrace, traditional cultivar, 300, plant genetic resources, germplasm, agricultural traits, Aegilops]
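
The subject-predicate-object reading of the RDF/XML fragment on this slide can be made explicit with a few lines of Python. This is only a sketch using the standard library: it handles simple literal-valued properties inside `rdf:Description` and nothing else, and the enclosing `rdf:RDF` element with its namespace declarations is added here as an assumption so that the fragment parses. A real project would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's fragment, wrapped in rdf:RDF with namespace declarations
# (the wrapper is an addition made for this example).
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <dc:creator>John Smith</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Return (subject, predicate, object) tuples for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree stores tags as "{namespace}local"; join both parts
            # to obtain the predicate URI.
            namespace, local = prop.tag[1:].split("}")
            result.append((subject, namespace + local, (prop.text or "").strip()))
    return result


for s, p, o in triples(RDF_XML):
    print(s, p, o)
```

Once evaluation data, passport data and environmental data are all expressed as triples that share subject URIs, combining them amounts to collecting the triples about the same subject.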


W3C :: LSID Life Science IDentifiers
An LSID is a digital name tag. LSIDs are GUIDs (Globally Unique Identifiers). [http://lsid.sourceforge.net/]
Structure: urn:lsid:authority:namespace:object:revision
Example (fictitious): urn:lsid:eurisco.org:accession:H451269
The LSID concept introduces a straightforward approach to naming and identifying data resources stored in multiple, distributed data stores.
LSID defines a simple, common way to identify and access biologically significant data.
LSID provides a naming standard to support interoperability.
Developed by OMG-LSR and W3C, implemented by IBM.
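
The LSID structure given on this slide is easy to take apart in code. The sketch below splits an identifier into its parts and treats the revision as optional; the `LSID` tuple and the `parse_lsid` function are names invented for this example, and the parsing is deliberately simple rather than a full implementation of the LSID specification.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn:lsid' prefix is compared case-insensitively.
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# The fictitious example from the slide.
print(parse_lsid("urn:lsid:eurisco.org:accession:H451269"))
```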


Taxon References http://www.catalogueoflife.org http://www.itis.gov/


Biodiversity data exchange tools


Global Information Systems for Plant Genetic Resources (2009)


Test resource with client form: http://localhost/tapirlink/tapir_client.php The XML client form is a good illustration of exactly how the wrapper software works.
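
Besides the XML client form, a TAPIR provider can, as far as we understand the TAPIR specification, also be addressed with simple key-value parameters in the URL, where `op` names the operation (for example `ping` or `capabilities`). The sketch below only builds such a URL and does not contact any server. The access point path and the resource name `my_resource` are hypothetical and depend on the local TapirLink installation.

```python
from urllib.parse import urlencode

# Hypothetical access point of a local TapirLink resource.
ACCESS_POINT = "http://localhost/tapirlink/tapir.php/my_resource"


def tapir_kvp_url(access_point, operation, **params):
    """Build a key-value request URL for the given operation."""
    query = {"op": operation, **params}
    return f"{access_point}?{urlencode(query)}"


# A ping request is the simplest way to check that a provider answers.
print(tapir_kvp_url(ACCESS_POINT, "ping"))
```

Pasting the resulting URL into a browser should return an XML response from the wrapper if the resource is configured.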


Data Provider Software
PyWrapper v3, based on the BioCASE Python software. [ http://www.pywrapper.org/ ] [ http://www.biocase.org/ ]
DiGIR, Distributed Generic Information Retrieval. [ http://digir.net ]
TapirLink [ http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink ]
TapirDotNet [ http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirDotNET ]


Distributed BioCASE/PyWrapper network


Example of a service request All exchanged data is formatted with XML tags.


Example of a service response


Data portal and decentralized data networks  with web services


Data warehouse model


[Diagram: data flow between genebank information systems, with these labels: EURISCO (Europe); NordGen (Northern Europe); IPK Gatersleben (Germany); IHAR (Poland); WUR CGN (Netherlands); other European gene banks; SINGER (CGIAR); CGIAR International Future Harvest gene banks; USDA GRIN (USA); USDA ARS National Germplasm Repositories; GBIF (Global Biodiversity Information Facility); ALIS (Accession Level Information System); Web Services; MCPD; USER; Svalbard Global Seed Vault (Safe Backup)]


Germplasm data indexing tools We have recently built data indexing tools for access to gene bank datasets provided with BioCASE/PyWrapper. The plan is to use them to build a global Accession Level Information System (ALIS), in cooperation with GBIF, which indexes basic biodiversity data using a similar approach. [ http://chm.grinfo.net/ ]
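
The indexing idea can be sketched in a few lines: records harvested from several providers are merged into one central index keyed by institute code and accession number. This is not the actual ALIS or GBIF indexer; the function name, the provider names, the record values and the merge rule (earlier providers win, later ones only fill gaps) are all assumptions chosen for illustration.

```python
def build_index(providers):
    """Merge passport records from several providers into one index.

    providers maps a provider name to a list of passport dicts that each
    carry at least INSTCODE and ACCENUMB.
    """
    index = {}
    for provider, records in providers.items():
        for rec in records:
            key = (rec["INSTCODE"], rec["ACCENUMB"])
            entry = index.setdefault(key, {"record": {}, "sources": []})
            # Later providers only fill in fields that are still missing.
            for field, value in rec.items():
                entry["record"].setdefault(field, value)
            entry["sources"].append(provider)
    return index


# Hypothetical harvest results from two providers.
harvested = {
    "provider_a": [
        {"INSTCODE": "XXX001", "ACCENUMB": "H1", "GENUS": "Avena"},
    ],
    "provider_b": [
        {"INSTCODE": "XXX001", "ACCENUMB": "H1", "ORIGCTY": "SWE"},
        {"INSTCODE": "XXX002", "ACCENUMB": "B7", "GENUS": "Brassica"},
    ],
}

index = build_index(harvested)
for key, entry in index.items():
    print(key, entry["sources"], entry["record"])
```

Keeping the list of sources for each accession lets the portal point users back to the holding genebank, which remains the authority for the full record.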


[ http://wwwdev.ngb.se/portal/ ]


Crop Wild Relatives National datasets (ARM, LKA, BOL, MDG, UZB) are shared with the central CWR data index. The national datasets, as well as access to other international datasets (EURISCO, SINGER), are provided from the CWR data portal. [ http://www.cropwildrelatives.org ]


Taxonomy level metadata The Taxon and Country pages provide access to the relevant external datasets.


Country level metadata


Participating in global and national biodiversity projects and sharing your institute's datasets with them is important for your public and scientific visibility, for promoting the use (usefulness) of your data, and ultimately for the continued funding of your institutional activities.


Bioversity International [http://www.bioversityinternational.org]
GBIF, Global Biodiversity Information Facility [http://www.gbif.org]
BioCASE, The Biological Collection Access Service for Europe [http://www.biocase.org]
TDWG, Biodiversity Information Standards [http://www.tdwg.org]


Thank you for listening!

