short-paper

Searching for Evidence of Scientific News in Scholarly Big Data

Authors: Md Reshad Ul Hoque, Agnese Chiatti, Jian Wu

K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture

Pages 251 - 254

https://doi.org/10.1145/3360901.3364438

Published: 23 September 2019

Abstract

Public digital media often mix factual information with fake scientific news, which is typically difficult to pinpoint, especially for non-professionals. Such news articles create illusions and misconceptions and thus ultimately influence public opinion, with serious consequences at a broader social scale. Yet existing solutions for automatically verifying the credibility of news articles remain unsatisfactory. We propose to verify scientific news by retrieving and analyzing its most relevant source papers from an academic digital library (DL), e.g., arXiv. Instead of querying keywords or regular named entities extracted from news articles, we query domain knowledge entities (DKEs) extracted from the text. By querying each DKE, we retrieve a list of candidate scholarly papers. We then design a function to rank them and select the most relevant scholarly paper. After exploring various representations, we find experimentally that the term frequency-inverse document frequency (TF-IDF) representation with cosine similarity outperforms baseline models based on word embeddings. This result demonstrates the efficacy of using DKEs to retrieve scientific papers that are relevant to a specific news article. It also indicates that word embeddings may not be the best document representation for domain-specific document retrieval tasks. Our method is fully automated and can be applied to facilitate the detection of fake and misinformed news across many scientific domains.
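The ranking step named in the abstract (TF-IDF representation with cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidates`), the smoothed IDF formula, and the toy texts are all assumptions made for illustration, and the DKE extraction and digital library querying steps are not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Assumption: smoothed IDF, so terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: n * idf[t] for t, n in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(news_text, candidates):
    """Rank candidate papers (hypothetical id -> text) by similarity to the news text."""
    ids = list(candidates)
    docs = [tokenize(news_text)] + [tokenize(candidates[i]) for i in ids]
    vectors = tfidf_vectors(docs)
    news_vec, paper_vecs = vectors[0], vectors[1:]
    scored = [(cosine(news_vec, vec), pid) for pid, vec in zip(ids, paper_vecs)]
    return sorted(scored, reverse=True)


# Toy data invented for this example; not taken from the paper.
news = "Astronomers detect gravitational waves from a neutron star merger"
candidates = {
    "paper_a": "Observation of gravitational waves from a binary neutron star merger",
    "paper_b": "Deep learning for image classification with convolutional networks",
}
ranked = rank_candidates(news, candidates)
best_score, best_id = ranked[0]
```

Here `best_id` is the candidate whose text overlaps most with the news text under TF-IDF weighting. In a fuller pipeline, the candidate set would come from querying the digital library with each extracted DKE rather than from a hand-written dictionary.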

Cited By

Wei X, Wu J, Ajayi K, Oyen D, Aizawa A, Mandl T, Carevic Z, Hinze A, Mayr P, Schaer P (2022). Visual descriptor extraction from patent figure captions. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1-5. DOI: 10.1145/3529372.3533299. Online publication date: 20-Jun-2022.
https://dl.acm.org/doi/10.1145/3529372.3533299

Wu J, Ul Hoque M, Reiske G, Weigle M, Bradshaw B, Gaff H, Li J, Kwan C, Huang R, Wu D, Marchionini G, He D, Cunningham S, Hansen P (2020). A Comparative Study of Sequence Tagging Methods for Domain Knowledge Entity Recognition in Biomedical Papers. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 397-400. DOI: 10.1145/3383583.3398602. Online publication date: 1-Aug-2020.
https://dl.acm.org/doi/10.1145/3383583.3398602

Index Terms

Searching for Evidence of Scientific News in Scholarly Big Data
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
  2. World Wide Web

Published In

K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture

September 2019

281 pages

ISBN:9781450370080

DOI:10.1145/3360901

General Chairs: Mayank Kejriwal (University of Southern California Information Sciences Institute, USA), Pedro Szekely (University of Southern California Information Sciences Institute, USA)

Program Chair: Raphaël Troncy (EURECOM, France)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

OLD DOMINION UNIVERSITY
DARPA

Conference

K-CAP '19

Sponsor:

SIGAI

K-CAP '19: Knowledge Capture Conference

November 19 - 21, 2019

Marina Del Rey, CA, USA

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%
