DOI: 10.1145/3397481.3450685
research-article
Open access

Crowdsourcing Scholarly Discourse Annotations

Published: 14 April 2021

Abstract

The number of scholarly publications grows steadily every year, and it is becoming harder to find, assess, and compare scholarly knowledge effectively. Scholarly knowledge graphs have the potential to address these challenges; however, creating such graphs remains a complex task. We propose a method to crowdsource structured scholarly knowledge from paper authors through a web-based user interface supported by artificial intelligence. The interface enables authors to select key sentences for annotation, and it integrates multiple machine learning algorithms that assist authors during annotation, including class recommendation and key sentence highlighting. We envision the interface being integrated into paper submission processes, for which we define three main task requirements: The task has to be … We evaluated the interface in a user study in which participants were assigned the task of annotating one of their own articles. With the resulting data, we determined whether participants were able to perform the task successfully. Furthermore, we used a survey to evaluate the interface's usability and the participants' attitudes towards the interface. The results suggest that sentence annotation is a feasible task for researchers and that they do not object to annotating their articles during the submission process.
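The abstract names two assistive features, class recommendation and key sentence highlighting, without detailing the models behind them. As an illustration only, the Python sketch below shows how a suggest-then-confirm annotation loop could be wired together. It assumes a hypothetical set of discourse classes (`Problem`, `Method`, `Result`) and replaces the machine learning components with simple cue-word overlap. All names here (`CLASS_CUES`, `recommend_class`, `highlight_key_sentences`, `annotate`) are illustrative, not the authors' implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical discourse classes and cue words (assumption): the abstract does
# not specify the class set or the machine learning models used by the interface.
CLASS_CUES = {
    "Problem": {"challenge", "problem", "difficult", "harder"},
    "Method": {"propose", "method", "approach", "interface"},
    "Result": {"results", "suggest", "show", "feasible"},
}


@dataclass
class Annotation:
    sentence: str
    label: str | None  # class suggested by the system, confirmed or changed by the author


def tokenize(text: str) -> set[str]:
    """Lower-case word tokens; punctuation and digits are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cue_score(sentence: str) -> int:
    """Largest cue-word overlap with any class (a stand-in for a model's confidence)."""
    tokens = tokenize(sentence)
    return max(len(tokens & cues) for cues in CLASS_CUES.values())


def recommend_class(sentence: str) -> str | None:
    """Suggest the class with the largest cue-word overlap, or None if nothing matches."""
    tokens = tokenize(sentence)
    scores = {label: len(tokens & cues) for label, cues in CLASS_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def highlight_key_sentences(sentences: list[str], k: int = 2) -> list[str]:
    """Keep the k highest-scoring sentences, returned in document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: cue_score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def annotate(sentences: list[str], overrides: dict[int, str] | None = None) -> list[Annotation]:
    """Pre-fill a label for each highlighted sentence.

    `overrides` maps a position in the highlighted list to the author's own label,
    so the author always has the final say.
    """
    overrides = overrides or {}
    key_sentences = highlight_key_sentences(sentences)
    return [
        Annotation(s, overrides.get(i, recommend_class(s)))
        for i, s in enumerate(key_sentences)
    ]


if __name__ == "__main__":
    abstract = [
        "The number of scholarly publications grows steadily every year.",
        "We propose a method to crowdsource structured scholarly knowledge from paper authors.",
        "The results suggest that sentence annotation is a feasible task for researchers.",
    ]
    for a in annotate(abstract):
        print(f"{a.label}: {a.sentence}")
```

In this design the system only pre-fills suggestions and the author confirms or overrides each one. Replacing `cue_score` and `recommend_class` with real models, for example a zero-shot text classifier, would leave that flow unchanged.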

Information

Published In

IUI '21: Proceedings of the 26th International Conference on Intelligent User Interfaces
April 2021
618 pages
ISBN: 9781450380171
DOI: 10.1145/3397481
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 April 2021

Author Tags

  1. Crowdsourcing Text Annotations
  2. Intelligent User Interface
  3. Knowledge Graph Construction
  4. Structured Scholarly Knowledge
  5. Web-based Annotation Interface

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IUI '21

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 228
  • Downloads (last 6 weeks): 46
Reflects downloads up to 20 Jan 2025.

Cited By

  • (2024) Leveraging Large Language Models for Realizing Truly Intelligent User Interfaces. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1–8. DOI: 10.1145/3613905.3650949. Online publication date: 11-May-2024.
  • (2024) hmOS: An Extensible Platform for Task-Oriented Human–Machine Computing. IEEE Transactions on Human-Machine Systems 54:5, 536–545. DOI: 10.1109/THMS.2024.3414432. Online publication date: Oct-2024.
  • (2024) SciKGTeX - A LaTeX Package to Semantically Annotate Contributions in Scientific Publications. Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries, 155–164. DOI: 10.1109/JCDL57899.2023.00030. Online publication date: 26-Jun-2024.
  • (2024) An approach based on open research knowledge graph for knowledge acquisition from scientific papers. The Electronic Library 42:3, 413–442. DOI: 10.1108/EL-06-2023-0154. Online publication date: 5-Jun-2024.
  • (2024) Extracting problem and method sentence from scientific papers: a context-enhanced transformer using formulaic expression desensitization. Scientometrics 129:6, 3433–3468. DOI: 10.1007/s11192-024-05048-6. Online publication date: 1-Jun-2024.
  • (2024) Sequential sentence classification in research papers using cross-domain multi-task learning. International Journal on Digital Libraries 25:2, 377–400. DOI: 10.1007/s00799-023-00392-z. Online publication date: 1-Jun-2024.
  • (2022) TinyGenius. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1–5. DOI: 10.1145/3529372.3533285. Online publication date: 20-Jun-2022.
  • (2022) Cross-domain multi-task learning for sequential sentence classification in research papers. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1–13. DOI: 10.1145/3529372.3530922. Online publication date: 20-Jun-2022.
  • (2022) OneLabeler: A Flexible System for Building Data Labeling Tools. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1–22. DOI: 10.1145/3491102.3517612. Online publication date: 29-Apr-2022.