
Toward a Flexible Metadata Pipeline for Fish Specimen Images

  • Conference paper
Metadata and Semantic Research (MTSR 2022)

Abstract

Flexible metadata pipelines are crucial for supporting the FAIR data principles. Despite this need, researchers seldom report how they identify metadata standards and protocols that support optimal flexibility. This paper reports on an initiative to develop a flexible metadata pipeline for a collection of over 300,000 digital fish specimen images harvested from multiple data repositories and fish collections. The images and their associated metadata are being used for AI-related scientific research involving automated species identification, segmentation, and trait extraction. The paper provides contextual background and then presents a four-phase approach: (1) Assessment of the Problem, (2) Investigation of Solutions, (3) Implementation, and (4) Refinement. The work is part of the NSF Harnessing the Data Revolution, Biology Guided Neural Networks (NSF/HDR-BGNN) project and the HDR Imageomics Institute. An RDF graph prototype pipeline is presented, followed by a discussion of research implications and a conclusion summarizing the results.

Supported by NSF HDR-OAC: Biology-guided Neural Networks for Discovering Phenotypic Traits: 1940233 and 1940322, NSF HDR-OAC: Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning: 2118240, and the Institute of Museum and Library Services (IMLS) RE-246450-OLS-20.
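
To make the idea of an RDF graph over harvested image metadata concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that each harvested record is a flat dictionary and that fields are mapped to Dublin Core and Darwin Core terms; the abstract does not confirm either assumption. The field names, the `record_to_triples` and `to_ntriples` helpers, and the example record are all hypothetical.

```python
# Illustrative sketch only: maps one harvested image record to RDF triples
# and serializes them as N-Triples using the standard library alone.
# The namespaces are real vocabularies; their use here is an assumption.
DCTERMS = "http://purl.org/dc/terms/"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical mapping from harvested field names to vocabulary terms.
FIELD_TO_PREDICATE = {
    "scientific_name": DWC + "scientificName",
    "catalog_number": DWC + "catalogNumber",
    "institution": DWC + "institutionCode",
    "source_repository": DCTERMS + "source",
    "license": DCTERMS + "license",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_triples(record: dict) -> list[tuple[str, str, str]]:
    """Turn one record into (subject, predicate, object) triples."""
    subject = f"<{record['image_uri']}>"
    triples = []
    for field, predicate in FIELD_TO_PREDICATE.items():
        value = record.get(field)
        if value:  # skip fields that a repository did not supply
            triples.append((subject, f"<{predicate}>", f'"{escape_literal(value)}"'))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples one per line in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical record; none of these values come from the paper.
record = {
    "image_uri": "https://example.org/images/fish-0001.jpg",
    "scientific_name": "Lepomis macrochirus",
    "catalog_number": "EX-12345",
    "institution": "EXAMPLE",
    "source_repository": "example-repository",
    "license": None,  # missing in the source, so no triple is emitted
}

triples = record_to_triples(record)
ntriples = to_ntriples(triples)
print(ntriples)
```

Keeping the field-to-term mapping in one dictionary is one way to keep such a pipeline flexible: supporting a new source repository or swapping a vocabulary term changes the mapping rather than the conversion code.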


Acknowledgments

We thank the Integrated Digitized Biocollections (iDigBio), Global Biodiversity Information Facility (GBIF), and MorphBank data repositories, and the curators of the fish collections in the Great Lakes Invasives Network (Field Museum of Natural History, Illinois Natural History Survey, J. F. Bell Museum of Natural History, Ohio State University Museum of Biological Diversity, University of Michigan Museum of Zoology, and University of Wisconsin-Madison Zoological Museum) for sharing images of their fish specimens with us. We also thank Anuj Karpatne and team at Virginia Tech University, who developed and trained the fish feature segmentation ANN component of the workflow; Joel Pepper, for the automated image quality feature extraction workflow; and Bahadir Altintas, for developing the automated landmark extraction workflow.

Author information

Corresponding author

Correspondence to Dom Jebbia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jebbia, D., Wang, X., Bakis, Y., Bart Jr., H.L., Greenberg, J. (2023). Toward a Flexible Metadata Pipeline for Fish Specimen Images. In: Garoufallou, E., Vlachidis, A. (eds) Metadata and Semantic Research. MTSR 2022. Communications in Computer and Information Science, vol 1789. Springer, Cham. https://doi.org/10.1007/978-3-031-39141-5_15

  • DOI: https://doi.org/10.1007/978-3-031-39141-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39140-8

  • Online ISBN: 978-3-031-39141-5

  • eBook Packages: Computer Science, Computer Science (R0)
