DOI: 10.1145/3066911.3066919
research-article
Open access

Evolution of anatomical concept usage over time: mining 200 years of biodiversity literature

Published: 19 May 2017

Abstract

The scientific literature contains a historical record of the changing ways in which we describe the world. Shifts in the understanding of scientific concepts are reflected in the introduction of new terms and in the changing usage and context of existing ones. We conducted an ontology-based temporal data mining analysis of biodiversity literature from the 1700s to the 2000s to quantitatively measure how the context of usage for vertebrate anatomical concepts has changed over time. The corpus was divided into nine non-overlapping time periods with comparable amounts of data. Context vectors of anatomical concepts were then compared to measure the magnitude of concept drift, both between adjacent time periods and cumulatively relative to the initial state. Surprisingly, we found that while anatomical concept drift between adjacent time periods was substantial (55% to 68%), it was of the same magnitude as cumulative concept drift across multiple time periods. Such a process, bounded by an overall mean drift, fits the expectations of a mean-reverting process.
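The two comparisons described in the abstract (adjacent periods versus cumulative drift from the initial state) can be illustrated with a small sketch. The abstract does not say which distance measure was applied to the context vectors, so the use of cosine distance here is an assumption, as are the dictionary layout, the function names, and the toy data.

```python
import math
from typing import Dict, List

# Hypothetical layout: co-occurring term -> weight for one concept in one period.
ContextVector = Dict[str, float]


def cosine_drift(a: ContextVector, b: ContextVector) -> float:
    """Return 1 - cosine similarity (assumed drift measure, not from the paper)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("context vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def adjacent_drift(periods: List[ContextVector]) -> List[float]:
    """Drift between each time period and the one that follows it."""
    return [cosine_drift(periods[i], periods[i + 1])
            for i in range(len(periods) - 1)]


def cumulative_drift(periods: List[ContextVector]) -> List[float]:
    """Drift of every later period relative to the initial period."""
    return [cosine_drift(periods[0], p) for p in periods[1:]]


# Toy context vectors for one concept over three periods (illustrative only).
femur_periods: List[ContextVector] = [
    {"bone": 3.0, "limb": 2.0},
    {"bone": 2.0, "limb": 1.0, "joint": 2.0},
    {"bone": 3.0, "limb": 2.0, "muscle": 1.0},
]

adjacent = adjacent_drift(femur_periods)
cumulative = cumulative_drift(femur_periods)
print("adjacent:", adjacent)
print("cumulative:", cumulative)
```

In a mean-reverting process of the kind the abstract describes, the cumulative values would stay in the same range as the adjacent ones rather than growing with each additional period.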

Published In

SBD '17: Proceedings of The International Workshop on Semantic Big Data
May 2017
57 pages
ISBN: 9781450349871
DOI: 10.1145/3066911
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. biodiversity literature
2. bioinformatics
3. concept drift
4. data mining
5. ontologies

Conference

SIGMOD/PODS'17

Acceptance Rates

SBD '17 Paper Acceptance Rate: 8 of 15 submissions, 53%
Overall Acceptance Rate: 30 of 54 submissions, 56%
