research-article

Open access

Crowdsourcing Scholarly Discourse Annotations

Authors:

Markus Stocker,

Sören Auer

IUI '21: Proceedings of the 26th International Conference on Intelligent User Interfaces

Pages 464 - 474

https://doi.org/10.1145/3397481.3450685

Published: 14 April 2021

Abstract

The number of scholarly publications grows steadily every year, making it harder to find, assess, and compare scholarly knowledge effectively. Scholarly knowledge graphs have the potential to address these challenges. However, creating such graphs remains a complex task. We propose a method to crowdsource structured scholarly knowledge from paper authors with a web-based user interface supported by artificial intelligence. The interface enables authors to select key sentences for annotation. It integrates multiple machine learning algorithms to assist authors during the annotation, including class recommendation and key sentence highlighting. We envision the interface being integrated into paper submission processes, for which we define three main task requirements: The task has to be . We evaluated the interface in a user study in which participants were asked to annotate one of their own articles. With the resulting data, we determined whether the participants were able to perform the task successfully. Furthermore, we evaluated the interface’s usability and the participants’ attitudes towards the interface with a survey. The results suggest that sentence annotation is a feasible task for researchers and that they do not object to annotating their articles during the submission process.

Published In

IUI '21: Proceedings of the 26th International Conference on Intelligent User Interfaces

April 2021

618 pages

ISBN:9781450380171

DOI:10.1145/3397481

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

European Research Council

Conference

IUI '21: 26th International Conference on Intelligent User Interfaces

April 14 - 17, 2021

College Station, TX, USA

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%

Bibliometrics & Citations

Article Metrics

Total Citations: 9
Total Downloads: 1,010
Downloads (Last 12 months): 228
Downloads (Last 6 weeks): 46

Reflects downloads up to 20 Jan 2025

Cited By

Oelen A, Auer S (2024). Leveraging Large Language Models for Realizing Truly Intelligent User Interfaces. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1-8. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613905.3650949
Wang H, Yu Z, Zhang Y, Wang Y, Yang F, Wang L, Liu J, Guo B (2024). hmOS: An Extensible Platform for Task-Oriented Human–Machine Computing. IEEE Transactions on Human-Machine Systems 54:5, 536-545. Online publication date: Oct-2024.
https://doi.org/10.1109/THMS.2024.3414432
Bless C, Baimuratov I, Karras O, Klein M, Ben-David A, Jäschke R, Kelly M (2024). SciKGTeX - A LaTeX Package to Semantically Annotate Contributions in Scientific Publications. Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries, 155-164. Online publication date: 26-Jun-2024.
https://dl.acm.org/doi/10.1109/JCDL57899.2023.00030
Jiomekong A, Tiwari S (2024). An approach based on open research knowledge graph for knowledge acquisition from scientific papers. The Electronic Library 42:3, 413-442. Online publication date: 5-Jun-2024.
https://doi.org/10.1108/EL-06-2023-0154
Zhang Y, Zhang C (2024). Extracting problem and method sentence from scientific papers: a context-enhanced transformer using formulaic expression desensitization. Scientometrics 129:6, 3433-3468. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1007/s11192-024-05048-6
Brack A, Entrup E, Stamatakis M, Buschermöhle P, Hoppe A, Ewerth R (2024). Sequential sentence classification in research papers using cross-domain multi-task learning. International Journal on Digital Libraries 25:2, 377-400. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1007/s00799-023-00392-z
Oelen A, Stocker M, Auer S, Aizawa A, Mandl T, Carevic Z, Hinze A, Mayr P, Schaer P (2022). TinyGenius. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1-5. Online publication date: 20-Jun-2022.
https://dl.acm.org/doi/10.1145/3529372.3533285
Brack A, Hoppe A, Buschermöhle P, Ewerth R, Aizawa A, Mandl T, Carevic Z, Hinze A, Mayr P, Schaer P (2022). Cross-domain multi-task learning for sequential sentence classification in research papers. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1-13. Online publication date: 20-Jun-2022.
https://dl.acm.org/doi/10.1145/3529372.3530922
Zhang Y, Wang Y, Zhang H, Zhu B, Chen S, Zhang D (2022). OneLabeler: A Flexible System for Building Data Labeling Tools. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-22. Online publication date: 29-Apr-2022.
https://dl.acm.org/doi/10.1145/3491102.3517612
