DOI: 10.1145/3529372.3533285
short-paper

TinyGenius: intertwining natural language processing with microtask crowdsourcing for scholarly knowledge graph creation

Published: 20 June 2022

Abstract

As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be discovered and used more efficiently. Natural Language Processing (NLP) techniques can process scholarly articles autonomously and at scale, creating machine-readable representations of article content. However, autonomous NLP methods are far from accurate enough to create a high-quality knowledge graph, and quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology for validating NLP-extracted scholarly knowledge statements through crowdsourced microtasks. The scholarly context in which crowd workers operate poses multiple challenges. In particular, the explainability of the employed NLP methods is crucial because it provides the context crowd workers need to make their decisions. We employed TinyGenius to populate a paper-centric knowledge graph using five distinct NLP methods. The resulting knowledge graph serves as a digital library for scholarly articles.
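The abstract describes a loop in which NLP methods propose statements about a paper and crowd workers accept or reject each one in a microtask. The following Python code is a minimal sketch of that idea under stated assumptions. The `Statement` record, its `evidence` field (the text shown to explain an extraction), the minimum of three votes, and simple majority voting are all hypothetical choices made for illustration. The abstract does not specify how votes are aggregated, so this is not the method reported in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Statement:
    """Hypothetical NLP-extracted statement about a paper (subject, predicate, object)."""
    paper_id: str
    predicate: str
    value: str
    nlp_method: str   # which NLP method produced the statement
    evidence: str     # text snippet shown to workers to explain the extraction
    votes: List[bool] = field(default_factory=list)  # True = worker judged it correct


def record_vote(statement: Statement, is_correct: bool) -> None:
    """Store one crowd worker's judgement from a single microtask."""
    statement.votes.append(is_correct)


def validation_status(statement: Statement, min_votes: int = 3,
                      threshold: float = 0.5) -> Optional[bool]:
    """Return True/False once enough votes exist, otherwise None (still pending).

    Assumption: simple majority voting; a tie counts as rejection.
    """
    if len(statement.votes) < min_votes:
        return None
    share_correct = sum(statement.votes) / len(statement.votes)
    return share_correct > threshold


def to_triples(statements: List[Statement]) -> List[tuple]:
    """Keep only crowd-accepted statements, as (subject, predicate, object) triples."""
    return [
        (s.paper_id, s.predicate, s.value)
        for s in statements
        if validation_status(s) is True
    ]
```

For example, a statement that receives two "correct" votes and one "incorrect" vote is accepted and becomes a triple, while one with a single "correct" vote out of three is rejected. Keeping statements pending until a minimum number of votes arrives is one simple way to avoid letting a single worker decide.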


Cited By

  • (2024) Mining Semantic Relations in Data References to Understand the Roles of Research Data in Academic Literature. Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries, pp. 215-227. DOI: 10.1109/JCDL57899.2023.00039. Online publication date: 26-Jun-2024.
  • (2024) Creating and validating a scholarly knowledge graph using natural language processing and microtask crowdsourcing. International Journal on Digital Libraries 25:2, pp. 273-285. DOI: 10.1007/s00799-023-00360-7. Online publication date: 1-Jun-2024.


Published In

JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
June 2022
392 pages
ISBN: 9781450393454
DOI: 10.1145/3529372
  • General Chairs: Akiko Aizawa, Thomas Mandl, Zeljko Carevic
  • Program Chairs: Annika Hinze, Philipp Mayr, Philipp Schaer
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


In-Cooperation

  • IEEE Technical Committee on Digital Libraries (TC DL)

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. crowdsourcing microtasks
  2. intelligent user interfaces
  3. knowledge graph validation
  4. scholarly knowledge graphs

Qualifiers

  • Short-paper


Conference

JCDL '22

Acceptance Rates

JCDL '22 paper acceptance rate: 35 of 132 submissions, 27%
Overall acceptance rate: 415 of 1,482 submissions, 28%


Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 3
Reflects downloads up to 25 Jan 2025.
