DOI: 10.1145/3360901.3364438
short-paper
Public Access

Searching for Evidence of Scientific News in Scholarly Big Data

Published: 23 September 2019 Publication History

Abstract

Public digital media often mix factual information with fake scientific news, which is typically difficult to pinpoint, especially for non-professionals. These scientific news articles create illusions and misconceptions and thus ultimately influence public opinion, with serious consequences at a broader social scale. Yet existing solutions that aim to verify the credibility of news articles automatically are still unsatisfactory. We propose to verify scientific news by retrieving and analyzing its most relevant source papers from an academic digital library (DL), e.g., arXiv. Instead of querying keywords or regular named entities extracted from news articles, we query domain knowledge entities (DKEs) extracted from the text. By querying each DKE, we retrieve a list of candidate scholarly papers. We then design a function to rank them and select the most relevant scholarly paper. Our experiments with various representations indicate that the term frequency-inverse document frequency (TF-IDF) representation with cosine similarity outperforms baseline models based on word embedding. This result demonstrates the efficacy of using DKEs to retrieve scientific papers that are relevant to a specific news article. It also indicates that word embedding may not be the best document representation for domain-specific document retrieval tasks. Our method is fully automated and can be effectively applied to facilitate the detection of fake and misinformed news across many scientific domains.
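
The retrieve-then-rank idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The DKEs are assumed to have been extracted already and are supplied by hand, the in-memory `library` dictionary is a hypothetical stand-in for a digital library such as arXiv, and the ranking is plain TF-IDF with cosine similarity rather than the ranking function designed in the paper. All function names and sample texts are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms that occur in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_candidates(dkes, library):
    """Hypothetical stand-in for querying a digital library once per DKE:
    a paper is a candidate if its text mentions at least one DKE."""
    return [
        paper_id
        for paper_id, text in library.items()
        if any(dke.lower() in text.lower() for dke in dkes)
    ]


def rank_candidates(news_text, dkes, library):
    """Rank candidate papers by TF-IDF cosine similarity to the news text."""
    candidate_ids = retrieve_candidates(dkes, library)
    if not candidate_ids:
        return []
    vectors = tfidf_vectors([news_text] + [library[pid] for pid in candidate_ids])
    news_vec = vectors[0]
    scored = [(pid, cosine(news_vec, vec)) for pid, vec in zip(candidate_ids, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data only; none of it comes from the paper.
library = {
    "paper-a": "We observe gravitational waves from a binary black hole merger with the LIGO detectors.",
    "paper-b": "A convolutional neural network for image classification of galaxies.",
    "paper-c": "Black hole accretion disks emit X-ray radiation.",
}
news = "Scientists detect gravitational waves from a black hole merger using LIGO."
dkes = ["gravitational waves", "black hole"]

ranked = rank_candidates(news, dkes, library)
for paper_id, score in ranked:
    print(f"{paper_id}: {score:.3f}")
```

In this toy setup, only papers that mention at least one DKE become candidates, and the candidate that shares the most vocabulary with the news text is ranked first. A real system would replace `retrieve_candidates` with calls to a digital library search interface.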

Cited By

  • (2022) Visual descriptor extraction from patent figure captions. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1-5. DOI: 10.1145/3529372.3533299. Online publication date: 20-Jun-2022
  • (2020) A Comparative Study of Sequence Tagging Methods for Domain Knowledge Entity Recognition in Biomedical Papers. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 397-400. DOI: 10.1145/3383583.3398602. Online publication date: 1-Aug-2020

        Published In

        K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture
        September 2019
        281 pages
        ISBN:9781450370080
        DOI:10.1145/3360901
        General Chairs: Mayank Kejriwal, Pedro Szekely
        Program Chair: Raphaël Troncy
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. domain knowledge entity
        2. embedding
        3. fake news
        4. web api

        Qualifiers

        • Short-paper

        Funding Sources

        • OLD DOMINION UNIVERSITY
        • DARPA

        Conference

        K-CAP '19: Knowledge Capture Conference
        November 19 - 21, 2019
        Marina Del Rey, CA, USA

        Acceptance Rates

        Overall Acceptance Rate 55 of 198 submissions, 28%

        Article Metrics

        • Downloads (last 12 months): 157
        • Downloads (last 6 weeks): 13
        Reflects downloads up to 25 Jan 2025
