The Germplasm Working Group

Dr. Vassilis Protonotarios
Agricultural Biotechnologist, PhD
Agro-Know Technologies, Greece
e-Conference on Germplasm Data Interoperability
Session 1: “The vision of Linked Germplasm Data”
Structure of the presentation
1. Background
– About the agINFRA project
– Issues related to data sharing

2. The Germplasm Working Group
– Objectives
– Wiki
– Link with RDA

3. The next steps
Background
The agINFRA project
• A project funded under the FP7 program of the EC
• Consortium with expertise on
– Technology / infrastructures
– Data / data management

Combined to facilitate agricultural data sharing
More info at:

www.aginfra.eu
The agINFRA project
• Aims to enhance the interoperability between
the agricultural data sources
– Data sharing by
• Metadata aggregation & linking data
• Design and deploy the linked ag-data framework

– Methodology for linking data
– Provide the infrastructure needed
• Both cloud- and grid-based services
• Tools, APIs etc.
agINFRA major data types
• Bibliographic
• Agri Statistics & Economics
• Raw data
• Profiles
• Educational
• Germplasm
• Soil data
• Other?
agINFRA major data sources
Data type: data provider(s)
• Bibliographic: FAO AGRIS; CASDD (CAAS)
• Educational: Organic.Edunet; Green Learning Network; LAFLOR
• Germplasm: Chinese Crop Germplasm Information System (CAAS); Italian National Germplasm Database (CRA)
• Soil data: Italian National Center for Soil Mapping
• Statistical: FAOSTAT; CountrySTAT
• Researchers’ profiles, organizations & events: AGRIVIVO
Focusing on germplasm
[Diagram: data flow between three tiers of germplasm databases]
• Aggregators: GENESYS, EURISCO, GBIF
• National databases: Italian, Chinese
• Local databases: Italian university, Italian research center, Chinese research center
The issue?
• Heterogeneity!
– Data types
– Data formats
– Data management workflows
– Standards used
– Metadata exposure options
– ….

• Lack of connectivity with other data sources
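
To make the heterogeneity point concrete, here is a minimal sketch of two providers describing the same kind of accession with different field names and formats. Everything in it is an assumption made for illustration: the provider names, the field names, the sample values and the shared target keys are hypothetical and are not taken from any agINFRA data source.

```python
# Hypothetical records: same kind of accession, different field names and formats.
record_a = {"AccNo": "IT-0001", "Genus": "Triticum", "Species": "aestivum", "Origin": "ITA"}
record_b = {"accession_id": "CN-0042", "taxon": "Oryza sativa", "country": "CHN"}

# Illustrative per-provider field mappings onto shared keys (assumed, not a standard).
FIELD_MAPS = {
    "provider_a": {
        "AccNo": "accession_number",
        "Genus": "genus",
        "Species": "species",
        "Origin": "country_of_origin",
    },
    "provider_b": {
        "accession_id": "accession_number",
        "country": "country_of_origin",
    },
}


def normalize(record, provider):
    """Rename a provider's fields to the shared keys defined in FIELD_MAPS."""
    mapping = FIELD_MAPS[provider]
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Provider B keeps genus and species in a single field, so split it.
    if provider == "provider_b" and "taxon" in record:
        genus, _, species = record["taxon"].partition(" ")
        out["genus"] = genus
        out["species"] = species
    return out


normalized = [normalize(record_a, "provider_a"), normalize(record_b, "provider_b")]
```

Even in this toy case, one provider needs only a renaming while the other needs a structural change (splitting one field into two), which is the kind of difference that makes connecting sources costly.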
The Germplasm Working Group
• Created in the context of the agINFRA project
• Initially included agINFRA stakeholders
– now expanded to host all stakeholders

• The group is NOT a group of experts on
germplasm data!
The scope of the Germplasm WG
• Aims to enable/enhance interoperability between
germplasm databases
– By developing the services for
• exchanging their data and
• delivering their data to other partners

• Focusing on three actions:
1. IDENTIFY
2. ORGANIZE
3. PROPOSE
Germplasm WG objectives
• IDENTIFY: collect all information related to germplasm data
– People/groups
– Namespaces (metadata, KOS)
– Standards
– Workflows
– Events
• ORGANIZE: engage all stakeholders & available resources, analyze existing standards, facilitate collaboration
• PROPOSE: linked data framework to connect data sources
– facilitate data sharing between germplasm data sources
Germplasm related information
• Metadata schemas
• Working groups in germplasm
• Events (for connecting stakeholders)
• KOS (ontologies, thesauri, vocabularies etc.)
• Data management workflows
• Data exposure capabilities
Proposed methodology
1. Analyze metadata schemas & KOSs used to
describe germplasm resources
2. Define attributes & vocabularies that can be
used to expose germplasm resources in linked
data format.
3. Provide a set of recommendations for the
exposure of germplasm resources as linked data
4. Embed the recommendations in the data
infrastructure of agINFRA
– to allow the exposure of germplasm resources as
LOD.
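
As a rough illustration of what exposing a germplasm resource as linked data could look like, the sketch below serializes one accession record as RDF in Turtle syntax. It is not the group's recommendation: the base URI, the `ex:` vocabulary and its terms (`ex:Accession`, `ex:genus`, `ex:species`) are hypothetical placeholders, and only `dcterms:identifier` comes from an existing vocabulary (Dublin Core Terms).

```python
from urllib.parse import quote

BASE = "http://example.org/accession/"   # hypothetical URI base for accessions
VOCAB = "http://example.org/germplasm#"  # hypothetical vocabulary namespace


def escape_literal(value):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def accession_to_turtle(acc):
    """Serialize one accession record (a dict) as a small Turtle document."""
    # Percent-encode the identifier so it is safe inside a URI.
    subject = f"<{BASE}{quote(acc['accession_number'], safe='')}>"
    lines = [
        f"@prefix ex: <{VOCAB}> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"{subject} a ex:Accession ;",
        f'    dcterms:identifier "{escape_literal(acc["accession_number"])}" ;',
        f'    ex:genus "{escape_literal(acc["genus"])}" ;',
        f'    ex:species "{escape_literal(acc["species"])}" .',
    ]
    return "\n".join(lines)


# Sample record with invented values.
sample = {"accession_number": "IT-0001", "genus": "Triticum", "species": "aestivum"}
turtle = accession_to_turtle(sample)
```

In practice, steps 1 and 2 of the methodology would decide which attributes and vocabularies replace the placeholder `ex:` terms used here.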
The Germplasm WG wiki
• Central point of reference
http://wiki.aginfra.eu/index.php/Germplasm_Working_Group

• Freely accessible (no login required)
Information available so far
• Vision
• Activities
• Outcomes
• Participants
• Next steps
• Useful resources
– Data sources
– Standards
– Services
– Stakeholders
• Events
The agINFRA Germplasm Working Group
Key outcomes of the group
• Dossier on Germplasm Information:
– Major programs
– Major information systems and services
– agINFRA germplasm data sources (CGRIS & CRA)
– Core standards for germplasm information
– Plant nomenclature, taxonomies and ontologies
– Plant genomic resources
– Related references and links

• Freely available from the Germplasm Group wiki
Existing participants
Our wish list (tentative list)

Reusing experiences from

…and working closely
with
Connection with RDA
• RDA: Research Data Alliance (https://rd-alliance.org)
• Aims to “accelerate and facilitate research
data sharing and exchange”
• Structure:
– Interest Groups: Cover wider topics
– Working Groups: Working on focused topics
Connection with RDA
• Representation of agINFRA Germplasm WG in
– 1st RDA Plenary Meeting (March 2013,
Gothenburg, Sweden)
– 2nd RDA Plenary Meeting (September 2013,
Washington D.C., USA)

• Suggestion for a Germplasm WG in RDA
Link between WG and RDA Groups
[Two-column slide comparing agINFRA WG and RDA IG/WG; the extracted text does not preserve which items belong to which column]
• Interactions with data providers
• Collection of large-scale data
• Collection of requirements
• Two (2) case studies
• Development of Best Practices
• Analysis of existing standards
• Collection of requirements
• Definition of data management workflows
• Interaction with other IGs/WGs (e.g. metadata, LD)
• Application in more cases
• Wider exposure of outcomes
• Development & adaptation of tools and services
• Development of Best Practices
• Development of Best Practices
The next steps
Towards the linking of
germplasm data sources
1. Definition and application of the linked data approach
for the agINFRA germplasm data sources
2. Recording and documentation of the process
3. Identification of issues
4. Suggestion for solutions to these issues
5. Fine-tuning of workflow
6. Development of Best Practices
…and more next steps
• Update the existing analysis with new data
• Collect new user requirements
• (re)define the mappings between metadata
schemas and KOSs
• Fine-tune the linked data approach
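
One way to picture a mapping between a metadata schema and a KOS is a crosswalk from the free-text values found in a field to concept identifiers. The sketch below is illustrative only: the concept URIs under example.org and the label variants are hypothetical, and a real mapping would point at the concepts of whichever KOS the group settles on.

```python
# Hypothetical crosswalk: local labels -> concept URIs in an assumed KOS.
CROSSWALK = {
    "wheat": "http://example.org/kos/crop/wheat",
    "bread wheat": "http://example.org/kos/crop/wheat",
    "rice": "http://example.org/kos/crop/rice",
}


def map_to_concept(label):
    """Return the concept URI for a local label, or None if it is unmapped."""
    # Normalize case and collapse runs of whitespace before the lookup.
    key = " ".join(label.strip().lower().split())
    return CROSSWALK.get(key)


def unmapped_labels(labels):
    """List labels that still need a mapping, in first-seen order, without repeats."""
    seen = set()
    missing = []
    for label in labels:
        if map_to_concept(label) is None and label not in seen:
            seen.add(label)
            missing.append(label)
    return missing
```

Running `unmapped_labels` over newly collected data gives a worklist of values to add to the crosswalk, which fits the idea of redefining the mappings as new data arrives.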

Contact me: vprot@agroknow.gr


Editor's Notes

  1. Heterogeneous data types and formats.
  2. OAI-PMH harvesting is not an option in the case of germplasm data.