DOI: 10.1145/2756406.2756923
research-article

Demystifying the Semantics of Relevant Objects in Scholarly Collections: A Probabilistic Approach

Published: 21 June 2015

Abstract

Efforts to make highly specialized knowledge accessible through scientific digital libraries need to go beyond mere bibliographic metadata, since information search in this setting is mostly entity-centric. Previous work has recognized this trend and developed methods to identify and (to some degree even automatically) annotate several important types of entities: genes and proteins, chemical structures and molecules, and drug names, to name but a few. Moreover, such entities are often cross-referenced with entries in curated databases. However, several questions remain to be answered: Given a scientific discipline, what are the important entities? How can they be automatically identified? Are all of them really relevant, i.e., do all of them carry deeper semantics for assessing a publication? How can they be represented, described, and subsequently annotated? How can they be used for search tasks? In this work we focus on answering some of these questions. We claim that, to bring the use of scientific digital libraries to the next level, we must treat topic-specific entities as first-class citizens and deeply integrate their semantics into the search process. To support this claim, we propose a novel probabilistic approach that not only provides a solution to the integration problem but also demonstrates how to leverage the knowledge encoded in entities, and we provide insights into the use of our approach in different scenarios. Finally, we show how our results can benefit information providers.



      JCDL '15: Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries
      June 2015
      324 pages
      ISBN:9781450335942
      DOI:10.1145/2756406
      General Chairs: Paul Logasa Bogen, Suzie Allard, Holly Mercer, Micah Beck
      Program Chairs: Sally Jo Cunningham, Dion Goh, Geneva Henry
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. hidden knowledge
      2. probabilistic topic models
      3. scientific digital libraries
      4. semantics entities

      Conference

      JCDL '15: 15th ACM/IEEE-CS Joint Conference on Digital Libraries
      June 21-25, 2015
      Knoxville, Tennessee, USA

      Acceptance Rates

      JCDL '15 Paper Acceptance Rate 18 of 60 submissions, 30%;
      Overall Acceptance Rate 415 of 1,482 submissions, 28%

      Cited By

      • (2024) Automating Research Problem Framing and Exploration through Knowledge Extraction from Bibliometric Data. Bibliometrics - An Essential Methodological Tool for Research Projects. DOI: 10.5772/intechopen.1005575. Online publication date: 10-Jun-2024
      • (2018) Metadata Enrichment of Multi-disciplinary Digital Library: A Semantic-Based Approach. Digital Libraries for Open Knowledge, 32-43. DOI: 10.1007/978-3-030-00066-0_3. Online publication date: 5-Sep-2018
      • (2017) Utilizing dependency relationships between math expressions in math IR. Information Retrieval Journal, 20:2, 132-167. DOI: 10.1007/s10791-017-9296-8. Online publication date: 14-Mar-2017
      • (2017) Semantic Facettation in Pharmaceutical Collections Using Deep Learning for Active Substance Contextualization. Digital Libraries: Data, Information, and Knowledge for Digital Lives, 41-53. DOI: 10.1007/978-3-319-70232-2_4. Online publication date: 3-Nov-2017
