DOI: 10.1145/3626772.3661353
short-paper

ScienceDirect Topic Pages: A Knowledge Base of Scientific Concepts Across Various Science Domains

Published: 11 July 2024 Publication History

Abstract

From undergraduate students to renowned scholars, everyone occasionally encounters unfamiliar concepts within their field of interest, especially when reading scientific articles. ScienceDirect Topic Pages (TPs) are intended to facilitate learning and to give users a structured overview of sources for deepening their knowledge of such topics. Our free service covers a vast set of technical topics across 20 scientific domains. Designed to emulate the natural flow of learning, TPs are embedded within millions of articles, so users can click on an unfamiliar concept they come across whilst reading. This takes the user to a TP, which opens with a definition that gives a basic understanding of the concept. The TP then presents a collection of relevant snippets extracted from books and review articles published by ScienceDirect, for users interested in references and in more detailed explanations and applications of the concept. Finally, a set of related topics helps users extend their knowledge further. To build TPs, we use various information retrieval methods across our product. We retrieve the most relevant snippets for each topic using a semantic search model fine-tuned on our scientific database. We use Retrieval Augmented Generation to generate reliable definitions of the topics, sourced from ScienceDirect's content. To retrieve a list of related concepts for each topic, we use the co-occurrence statistics of concepts within books and articles.
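
The abstract does not say which co-occurrence statistic is used to select related concepts. As an illustration only, the sketch below ranks concepts by pointwise mutual information (PMI) over document-level co-occurrence. The choice of PMI, the function name `related_topics`, and the input format (one list of concept labels per book or article) are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def related_topics(docs, topic, top_k=3):
    """Rank concepts by PMI with `topic` (hypothetical helper, not the paper's method).

    docs  : list of lists, each holding the concept labels found in one document
    topic : the concept whose related concepts we want
    """
    n_docs = len(docs)
    concept_count = Counter()  # number of documents mentioning each concept
    pair_count = Counter()     # number of documents mentioning both concepts of a pair

    for concepts in docs:
        unique = set(concepts)  # count each concept at most once per document
        concept_count.update(unique)
        pair_count.update(frozenset(p) for p in combinations(sorted(unique), 2))

    scores = {}
    for pair, n_pair in pair_count.items():
        if topic not in pair:
            continue
        (other,) = pair - {topic}
        # PMI = log( P(topic, other) / (P(topic) * P(other)) )
        scores[other] = math.log(
            (n_pair * n_docs) / (concept_count[topic] * concept_count[other])
        )

    # Highest PMI first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Calling `related_topics(docs, "neuron")` on a small corpus returns `(concept, score)` pairs, with concepts that appear alongside "neuron" more often than chance ranked first. A production system would likely add a minimum-count threshold, since PMI tends to overrate rare concepts.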



      Published In

      SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2024
      3164 pages
      ISBN:9798400704314
      DOI:10.1145/3626772
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. knowledge acquisition information retrieval
      2. passage retrieval
      3. scientific document processing

      Qualifiers

      • Short-paper

      Conference

      SIGIR 2024

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%
