short-paper

TinyGenius: intertwining natural language processing with microtask crowdsourcing for scholarly knowledge graph creation

Authors:

Markus Stocker,

Sören Auer

JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries

Article No.: 5, Pages 1 - 5

https://doi.org/10.1145/3529372.3533285

Published: 20 June 2022

Abstract

As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be discovered and used more efficiently. Natural Language Processing (NLP) techniques can autonomously process scholarly articles at scale and create machine-readable representations of the article content. However, autonomous NLP methods are far from accurate enough to create a high-quality knowledge graph, and quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology to validate NLP-extracted scholarly knowledge statements using crowdsourced microtasks. The scholarly context in which the crowd workers operate poses multiple challenges. In particular, the explainability of the employed NLP methods is crucial, because it provides the context that supports the crowd workers' decision process. We employed TinyGenius to populate a paper-centric knowledge graph using five distinct NLP methods. The resulting knowledge graph serves as a digital library for scholarly articles.

Cited By

Fan L, Lafia S, Wofford M, Thomer A, Yakel E, Hemphill L, Klein M, Ben-David A, Jäschke R, Kelly M (2024). Mining Semantic Relations in Data References to Understand the Roles of Research Data in Academic Literature. Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries, 215-227. DOI: 10.1109/JCDL57899.2023.00039. Online publication date: 26-Jun-2024.
https://dl.acm.org/doi/10.1109/JCDL57899.2023.00039

Oelen A, Stocker M, Auer S (2024). Creating and validating a scholarly knowledge graph using natural language processing and microtask crowdsourcing. International Journal on Digital Libraries 25:2, 273-285. DOI: 10.1007/s00799-023-00360-7. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1007/s00799-023-00360-7

Published In

JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries

June 2022

392 pages

ISBN:9781450393454

DOI:10.1145/3529372

General Chairs:
Akiko Aizawa, National Institute of Informatics, Japan
Thomas Mandl, University of Hildesheim, Germany
Zeljko Carevic, GESIS - Leibniz Institute for the Social Sciences, Germany

Program Chairs:
Annika Hinze, University of Waikato, New Zealand
Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Germany
Philipp Schaer, TH Köln (University of Applied Sciences), Germany

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

IEEE Technical Committee on Digital Libraries (TC DL)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

European Research Council

Conference

JCDL '22

JCDL '22: The ACM/IEEE Joint Conference on Digital Libraries in 2022

June 20 - 24, 2022

Cologne, Germany

Acceptance Rates

JCDL '22 Paper Acceptance Rate 35 of 132 submissions, 27%;

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 117
Downloads (Last 12 months): 25
Downloads (Last 6 weeks): 3

Reflects downloads up to 25 Jan 2025
