
Analysing the requirements for an Open Research Knowledge Graph: use cases, quality requirements, and construction strategies

Authors:

Markus Stocker,

Ralph Ewerth

International Journal on Digital Libraries, Volume 23, Issue 1

Pages 33–55

https://doi.org/10.1007/s00799-021-00306-x

Published: 01 March 2022

Abstract

Current science communication has a number of drawbacks and bottlenecks that have recently been the subject of discussion. Among others, the rising number of published articles makes it nearly impossible to get a full overview of the state of the art in a given field, and reproducibility is hampered by fixed-length, document-based publications, which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of the current issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective and present a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting and reviewing daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.

References

[1]

Ammar, W., Groeneveld, D., Bhagavatula, C., Beltagy, I., Crawford, M., Downey, D., Dunkelberger, J., Elgohary, A., Feldman, S., Ha, V., Kinney, R., Kohlmeier, S., Lo, K., Murray, T., Ooi, H., Peters, M.E., Power, J., Skjonsberg, S., Wang, L.L., Wilhelm, C., Yuan, Z., van Zuylen, M., Etzioni, O.: Construction of the literature graph in semantic scholar. In: Bangalore, S., Chu-Carroll, J., Li, Y. (eds.) Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1–6, 2018, vol. 3 (Industry Papers), pp. 84–91. Association for Computational Linguistics (2018).

[2]

Aryani A and Wang J Research graph: Building a distributed graph of scholarly works using research data switchboard Open Repos. Conf. 2017

[3]

Auer S and Mann S Towards an open research knowledge graph Ser. Libr. 2019 76 1–4 35-41

[4]

Augenstein, I., Das, M., Riedel, S., Vikraman, L., McCallum, A.: Semeval 2017 task 10: Scienceie—xtracting keyphrases and relations from scientific publications. In: Bethard, S., Carpuat, M., Apidianaki, M., Mohammad, S.M., Cer, D.M., Jurgens, D. (eds.) Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017, Vancouver, Canada, 2017, pp. 546–555. Association for Computational Linguistics (2017).

[5]

Badie K, Asadi N, and Mahmoudi MT Zone identification based on features with high semantic richness and combining results of separate classifiers J. Inf. Telecommun. 2018 2 4 411-427

[6]

Balog K Entity-Oriented Search 2018 Berlin Springer

[7]

Bechhofer S, Buchan IE, Roure DD, Missier P, Ainsworth JD, Bhagat J, Couch PA, Cruickshank D, Delderfield M, Dunlop I, Gamble M, Michaelides DT, Owen S, Newman DR, Sufi S, and Goble CA Why linked data is not enough for scientists Future Gener. Comput. Syst. 2013 29 2 599-611

[8]

Beel J, Gipp B, Langer S, and Breitinger C Research-paper recommender systems: a literature survey Int. J. Digit. Libr. 2016 17 4 305-338

[9]

Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 2019, pp. 3613–3618. Association for Computational Linguistics (2019).

[10]

Bizer C Quality-Driven Information Filtering—In the Context of Web-Based Information Systems 2007 Saarbrücken VDM Verlag

[11]

Bodenreider O The unified medical language system (UMLS): integrating biomedical terminology Nucl. Acids Res. 2004 32 267-270

[12]

Bollacker, K.D., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Wang, J.T. (ed.) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, 2008, pp. 1247–1250. ACM (2008).

[13]

Booch G, Rumbaugh J, and Jacobson I Unified Modeling Language User Guide, The (2nd Edition) (Addison-Wesley Object Technology Series) 2005 Boston Addison-Wesley Professional

[14]

Bornmann L and Mutz R Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references J. Assoc. Inf. Sci. Technol. 2015 66 11 2215-2222

[15]

Brack, A., D’Souza, J., Hoppe, A., Auer, S., Ewerth, R.: Domain-independent extraction of scientific concepts from research articles. In: Jose, J.M., Yilmaz, E., Magalhães, J., Castells, P., Ferro, N., Silva, M.J., Martins, F. (eds.) Advances in Information Retrieval—42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, 2020, Proceedings, Part I, Lecture Notes in Computer Science, vol. 12035, pp. 251–266. Springer (2020).

[16]

Brack, A., Hoppe, A., Stocker, M., Auer, S., Ewerth, R.: Requirements analysis for an open research knowledge graph. In: Hall, M.M., Mercun, T., Risse, T., Duchateau, F. (eds.) Digital Libraries for Open Knowledge—24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, Lyon, France, 2020, Proceedings, Lecture Notes in Computer Science, vol. 12246, pp. 3–18. Springer (2020).

[17]

Brack, A., Müller, D.U., Hoppe, A., Ewerth, R.: Coreference resolution in research papers from multiple domains. In: Hiemstra, D., Moens, M., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) Advances in Information Retrieval—43rd European Conference on IR Research, ECIR 2021, Virtual Event, 2021, Proceedings, Part I, Lecture Notes in Computer Science, vol. 12656, pp. 79–97. Springer (2021).

[18]

Braun, R., Benedict, M., Wendler, H., Esswein, W.: Proposal for requirements driven design science research. In: Donnellan, B., Helfert, M., Kenneally, J., VanderMeer, D.E., Rothenberger, M.A., Winter, R. (eds.) New Horizons in Design Science: Broadening the Research Agenda—10th International Conference, DESRIST 2015, Dublin, Ireland, 2015, Proceedings, Lecture Notes in Computer Science, vol. 9073, pp. 135–151. Springer (2015).

[19]

Brodaric, B., Reitsma, F., Qiang, Y.: Skiing with DOLCE: toward an e-science knowledge infrastructure. In: Eschenbach, C., Grüninger, M. (eds.) Formal Ontology in Information Systems, Proceedings of the Fifth International Conference, FOIS 2008, Saarbrücken, Germany, 2008, Frontiers in Artificial Intelligence and Applications, vol. 183, pp. 208–219. IOS Press (2008).

[20]

Burton A, Aryani A, Koers H, Manghi P, Bruzzo SL, Stocker M, Diepenbroek M, Schindler U, and Fenner M The scholix framework for interoperability in data-literature information exchange D-Lib Mag. 2017 23 1/2 1-20

[21]

Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Jr., E.R.H., Mitchell, T.M.: Toward an architecture for never-ending language learning. In: Fox, M., Poole, D. (eds.) Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, 2010. AAAI Press (2010). http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1879

[22]

CB Insights: The data flywheel: how enlightened self-interest drives data network effects. https://www.cbinsights.com/research/team-blog/data-network-effects/ (2020)

[23]

Cohan, A., Ammar, W., van Zuylen, M., Cady, F.: Structural scaffolds for citation intent classification in scientific publications. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2019, vol. 1 (Long and Short Papers), pp. 3586–3596. Association for Computational Linguistics (2019).

[24]

Cohan, A., Beltagy, I., King, D., Dalvi, B., Weld, D.S.: Pretrained language models for sequential sentence classification. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 2019, pp. 3691–3697. Association for Computational Linguistics (2019).

[25]

Cohen KB, Lanfranchi A, Choi MJ, Baumgartner WA, Panteleyeva N, Verspoor K, Palmer M, and Hunter LE Coreference annotation and resolution in the Colorado richly annotated full text (CRAFT) corpus of biomedical journal articles BMC Bioinform. 2017 18 1 1-14

[26]

Consortium TGO and Consortium The gene ontology resource: 20 years and still going strong Nucl. Acids Res. 2019 47 D330-D338

[27]

Constantin A, Peroni S, Pettifer S, Shotton DM, and Vitali F The document components ontology (DoCo) Semant. Web 2016 7 2 167-181

[28]

Dayrell, C., Jr., A.C., Lima, G., Jr., D.M., Copestake, A.A., Feltrim, V.D., Tagnin, S.E.O., Aluísio, S.M.: Rhetorical move detection in english abstracts: multi-label sentence classifiers and their annotated corpora. In: Calzolari, N., Choukri, K., Declerck, T., Dogan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, 2012, pp. 1604–1609. European Language Resources Association (ELRA) (2012). http://www.lrec-conf.org/proceedings/lrec2012/summaries/734.html

[29]

Degbelo, A.: A snapshot of ontology evaluation criteria and strategies. In: Hoekstra, R., Faron-Zucker, C., Pellegrini, T., de Boer, V. (eds.) Proceedings of the 13th International Conference on Semantic Systems, SEMANTICS 2017, Amsterdam, The Netherlands, 2017, pp. 1–8. ACM (2017).

[30]

Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A, Alcántara R, Darsow M, Guedj M, and Ashburner M Chebi: a database and ontology for chemical entities of biological interest Nucl. Acids Res. 2008 36 344-350

[31]

Dernoncourt, F., Lee, J.Y.: 200k RCT: a dataset for sequential sentence classification in medical abstracts. In: Kondrak, G., Watanabe, T. (eds.) Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei, Taiwan, 2017, Volume 2: Short Papers, pp. 308–313. Asian Federation of Natural Language Processing (2017). https://www.aclweb.org/anthology/I17-2052/

[32]

Dessì, D., Osborne, F., Recupero, D.R., Buscaldi, D., Motta, E., Sack, H.: AI-KG: an automatically generated knowledge graph of artificial intelligence. In: Pan, J.Z., Tamma, V.A.M., d’Amato, C., Janowicz, K., Fu, B., Polleres, A., Seneviratne, O., Kagal, L. (eds.) The Semantic Web—ISWC 2020—19th International Semantic Web Conference, Athens, Greece, 2020, Proceedings, Part II, Lecture Notes in Computer Science, vol. 12507, pp. 127–143. Springer (2020).

[33]

Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2019, vol. 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics (2019).

[34]

Doerr, M., Kritsotaki, A., Rousakis, Y., Hiebel, G., Theodoridou, M.: Definition of the CRMsci: an extension of CIDOC-CRM to support scientific observation. Tech. rep., FORTH, Version 1.2.8. http://www.cidoc-crm.org/crmsci/ModelVersion/version-1.2.8 (2020)

[35]

Dogan RI, Leaman R, and Lu Z NCBI disease corpus: a resource for disease name recognition and concept normalization J. Biomed. Inform. 2014 47 1-10

[36]

Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: Macskassy, S.A., Perlich, C., Leskovec, J., Wang, W., Ghani, R. (eds.) The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA-2014, pp. 601–610. ACM (2014).

[37]

D’Souza, J., Hoppe, A., Brack, A., Jaradeh, M.Y., Auer, S., Ewerth, R.: The STEM-ECR dataset: grounding scientific entity references in STEM scholarly content to authoritative encyclopedic and lexicographic sources. In: Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, 2020, pp. 2192–2203. European Language Resources Association (2020). https://www.aclweb.org/anthology/2020.lrec-1.268/

[38]

Färber, M.: The microsoft academic knowledge graph: A linked data source with 8 billion triples of scholarly data. In: Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I.F., Hogan, A., Song, J., Lefrançois, M., Gandon, F. (eds.) The Semantic Web—ISWC 2019—18th International Semantic Web Conference, Auckland, New Zealand, 2019, Proceedings, Part II, Lecture Notes in Computer Science, vol. 11779, pp. 113–129. Springer (2019).

[39]

Färber M, Bartscherer F, Menne C, and Rettinger A Linked data quality of DBpedia, Freebase, Opencyc, Wikidata, and YAGO Semant. Web 2018 9 1 77-129

[40]

Fathalla, S., Vahdati, S., Auer, S., Lange, C.: Towards a knowledge graph representing research findings by semantifying survey articles. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L.S., Karydis, I. (eds.) Research and Advanced Technology for Digital Libraries—21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017, Thessaloniki, Greece, 2017, Proceedings, Lecture Notes in Computer Science, vol. 10450, pp. 315–327. Springer (2017).

[41]

Fellbaum C WordNet: An Electronic Lexical Database. Language, Speech, and Communication 1998 Cambridge MIT Press

[42]

Fink A Conducting Research Literature Reviews: From the Internet to Paper 2014 Thousand Oaks SAGE Publications

[43]

Fisas, B., Saggion, H., Ronzano, F.: On the discoursive structure of computer graphics research papers. In: Meyers, A., Rehbein, I., Zinsmeister, H. (eds.) Proceedings of The 9th Linguistic Annotation Workshop, LAW@NAACL-HLT 2015, 2015, Denver, Colorado, USA, pp. 42–51. The Association for Computer Linguistics (2015).

[44]

Friedrich, A., Adel, H., Tomazic, F., Hingerl, J., Benteau, R., Marusczyk, A., Lange, L.: The sofc-exp corpus and neural approaches to information extraction in the materials science domain. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 2020, pp. 1255–1268. Association for Computational Linguistics (2020).

[45]

Gábor, K., Buscaldi, D., Schumann, A., Qasemi Zadeh, B., Zargayouna, H., Charnois, T.: Semeval-2018 task 7: Semantic relation extraction and classification in scientific papers. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, New Orleans, Louisiana, USA, 2018, pp. 679–688. Association for Computational Linguistics (2018).

[46]

Galárraga, L., Razniewski, S., Amarilli, A., Suchanek, F.M.: Predicting completeness in knowledge bases. In: de Rijke, M., Shokouhi, M., Tomkins, A., Zhang, M. (eds.) Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, 2017, pp. 375–383. ACM (2017).

[47]

Galárraga, L.A., Teflioudi, C., Hose, K., Suchanek, F.M.: AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In: Schwabe, D., Almeida, V.A.F., Glaser, H., Baeza-Yates, R., Moon, S.B. (eds.) 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, 2013, pp. 413–422. International World Wide Web Conferences Steering Committee. ACM (2013).

[48]

Gonçalves S, Cortez P, and Moro S A deep learning classifier for sentence classification in biomedical and computer science abstracts Neural Comput. Appl. 2020 32 11 6793-6807

[49]

Groza, T., Handschuh, S., Möller, K., Decker, S.: SALT—semantically annotated latex for scientific publications. In: Franconi, E., Kifer, M., May, W. (eds.) The Semantic Web: Research and Applications, 4th European Semantic Web Conference, ESWC 2007, Innsbruck, Austria, 2007, Proceedings, Lecture Notes in Computer Science, vol. 4519, pp. 518–532. Springer (2007).

[50]

Hars A Structure of Scientific Knowledge 2003 Berlin Springer 83-185

[51]

Hevner AR, March ST, Park J, and Ram S Design science in information systems research MIS Q. 2004 28 1 75-105

[52]

Hoppe, A., Hagen, J., Holzmann, H., Kniesel, G., Ewerth, R.: An analytics tool for exploring scientific software and related publications. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J.C. (eds.) Digital Libraries for Open Knowledge, 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Porto, Portugal, 2018, Proceedings, Lecture Notes in Computer Science, vol. 11057, pp. 299–303. Springer (2018).

[53]

Horvath, I.: Comparison of three methodological approaches of design research. In: S.N. (ed.) Proceedings of the 16th International Conference on Engineering Design, ICED’07, pp. 1–11. Ecole Central Paris (2007). Null; Conference date: 28-08-2007 through 30-08-2007

[54]

Hou, Y., Jochim, C., Gleize, M., Bonin, F., Ganguly, D.: Identification of tasks, datasets, evaluation metrics, and numeric scores for scientific leaderboards construction. In: Korhonen, A., Traum, D.R., Màrquez, L. (eds.) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 2019, vol. 1: Long Papers, pp. 5203–5213. Association for Computational Linguistics (2019).

[55]

Jain, S., van Zuylen, M., Hajishirzi, H., Beltagy, I.: Scirex: A challenge dataset for document-level information extraction. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 2020, pp. 7506–7516. Association for Computational Linguistics (2020).

[56]

Jaradeh, M.Y., Oelen, A., Prinz, M., Stocker, M., Auer, S.: Open research knowledge graph: a system walkthrough. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds.) Digital Libraries for Open Knowledge—23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, 2019, Proceedings, Lecture Notes in Computer Science, vol. 11799, pp. 348–351. Springer (2019).

[57]

Jia, R., Wong, C., Poon, H.: Document-level n-ary relation extraction with multiscale representation learning. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2019, vol. 1 (Long and Short Papers), pp. 3693–3704. Association for Computational Linguistics (2019).

[58]

Kannan, A.V., Fradkin, D., Akrotirianakis, I., Kulahcioglu, T., Canedo, A., Roy, A., Yu, S., Malawade, A.V., Faruque, M.A.A.: Multimodal knowledge graph for deep learning papers and code. In: d’Aquin, M., Dietze, S., Hauff, C., Curry, E., Cudré-Mauroux, P. (eds.) CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, 2020, pp. 3417–3420. ACM (2020).

[59]

Kardas, M., Czapla, P., Stenetorp, P., Ruder, S., Riedel, S., Taylor, R., Stojnic, R.: Axcell: Automatic extraction of results from machine learning papers. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, 2020, pp. 8580–8594. Association for Computational Linguistics (2020).

[60]

Kim S, Martínez D, Cavedon L, and Yencken L Automatic classification of sentences to support evidence based medicine BMC Bioinform. 2011 12 2 S5

[61]

Kitchenham, B.A., Charters, S.: Guidelines for performing systematic literature reviews in software engineering. Tech. Rep. EBSE 2007-001, Keele University and Durham University Joint Report. https://www.elsevier.com/__data/promis_misc/525444systematicreviewsguide.pdf (2007)

[62]

Klampanos IA, Davvetas A, Koukourikos A, and Karkaletsis V ANNETT-O: an ontology for describing artificial neural network evaluation, topology and training Int. J. Metadata Semant. Ontol. 2019 13 3 179-190

[63]

Kolitsas, N., Ganea, O., Hofmann, T.: End-to-end neural entity linking. In: Korhonen, A., Titov, I. (eds.) Proceedings of the 22nd Conference on Computational Natural Language Learning, CoNLL 2018, Brussels, Belgium, 2018, pp. 519–529. Association for Computational Linguistics (2018).

[64]

Kringelum J, Kjærulff SK, Brunak S, Lund O, Oprea TI, and Taboureau O Chemprot-3.0: a global chemical biology diseases mapping Database J. Biol. Databases Curation 2016

[65]

Lange C Ontologies and languages for representing mathematical knowledge on the semantic web Semant. Web 2013 4 2 119-158

[66]

Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, van Kleef P, Auer S, and Bizer C Dbpedia—a large-scale, multilingual knowledge base extracted from Wikipedia Semant. Web 2015 6 2 167-195

[67]

Li, J., Sun, Y., Johnson, R.J., Sciaky, D., Wei, C., Leaman, R., Davis, A.P., Mattingly, C.J., Wiegers, T.C., Lu, Z.: Biocreative V CDR task corpus: a resource for chemical disease relation extraction. Database J. Biol. Databases Curation 2016, (2016).

[68]

Liakata M, Saha S, Dobnik S, Batchelor CR, and Rebholz-Schuhmann D Automatic recognition of conceptualization zones in scientific articles and two life science applications Bioinformatics 2012 28 7 991-1000

[69]

Liakata, M., Teufel, S., Siddharthan, A., Batchelor, C.R.: Corpora for the conceptualisation and zoning of scientific papers. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 2010, Valletta, Malta. European Language Resources Association (2010). http://www.lrec-conf.org/proceedings/lrec2010/summaries/644.html

[70]

Lo, K., Wang, L.L., Neumann, M., Kinney, R., Weld, D.S.: S2ORC: the semantic scholar open research corpus. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 2020, pp. 4969–4983. Association for Computational Linguistics (2020).

[71]

Luan, Y., He, L., Ostendorf, M., Hajishirzi, H.: Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018, pp. 3219–3232. Association for Computational Linguistics (2018).

[72]

Lubani M, Noah SAM, and Mahmud R Ontology population: approaches and design aspects J. Inf. Sci. 2019

[73]

Manghi P, Bardi A, Atzori C, Baglioni M, Manola N, Schirrwagen J, and Principe P The OpenAIRE research graph data model Zenodo 2019

[74]

Mesbah, S., Fragkeskos, K., Lofi, C., Bozzon, A., Houben, G.: Semantic annotation of data processing pipelines in scientific publications. In: Blomqvist, E., Maynard, D., Gangemi, A., Hoekstra, R., Hitzler, P., Hartig, O. (eds.) The Semantic Web—14th International Conference, ESWC 2017, Portorož, Slovenia, 2017, Proceedings, Part I, Lecture Notes in Computer Science, vol. 10249, pp. 321–336 (2017).

[75]

Nasar Z, Jaffry SW, and Malik MK Information extraction from scientific articles: a survey Scientometrics 2018 117 3 1931-1990

[76]

Nguyen, V.B., Svátek, V., Rabby, G., Corcho, Ó.: Ontologies supporting research-related information foraging using knowledge graphs: literature survey and holistic model mapping. In: Keet, C.M., Dumontier, M. (eds.) Knowledge Engineering and Knowledge Management—22nd International Conference, EKAW 2020, Bolzano, Italy, 2020, Proceedings, Lecture Notes in Computer Science, vol. 12387, pp. 88–103. Springer (2020).

[77]

Nickel M, Murphy K, Tresp V, and Gabrilovich E A review of relational machine learning for knowledge graphs Proc. IEEE 2016 104 1 11-33

[78]

Oelen, A., Jaradeh, M.Y., Stocker, M., Auer, S.: Generate FAIR literature surveys with scholarly knowledge graphs. In: Huang, R., Wu, D., Marchionini, G., He, D., Cunningham, S.J., Hansen, P. (eds.) JCDL ’20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, 2020, pp. 97–106. ACM (2020).

[79]

Okoli C A guide to conducting a standalone systematic literature review Commun. Assoc. Inf. Syst. 2015 37 43

[80]

Papers with code. https://paperswithcode.com/. Accessed 04 Oct 2021

[81]

Park, S., Caragea, C.: Scientific keyphrase identification and classification by pre-trained language models intermediate task transfer learning. In: Scott, D., Bel, N., Zong, C. (eds.) Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), 2020, pp. 5409–5419. International Committee on Computational Linguistics (2020).

[82]

Peng, Y., Yan, S., Lu, Z.: Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. In: Demner-Fushman, D., Cohen, K.B., Ananiadou, S., Tsujii, J. (eds.) Proceedings of the 18th BioNLP Workshop and Shared Task, BioNLP@ACL 2019, Florence, Italy, 2019, pp. 58–65. Association for Computational Linguistics (2019).

[83]

Peroni S and Shotton DM Fabio and cito: ontologies for describing bibliographic resources and citations J. Web Semant. 2012 17 33-43

[84]

Pertsas V and Constantopoulos P Scholarly ontology: modelling scholarly practices Int. J. Digit. Libr. 2017 18 3 173-190

[85]

Petasis, G., Karkaletsis, V., Paliouras, G., Krithara, A., Zavitsanos, E.: Ontology population and enrichment: state of the art. In: Paliouras, G., Spyropoulos, C.D., Tsatsaronis, G. (eds.) Knowledge-Driven Multimedia Information Extraction and Ontology Evolution—Bridging the Semantic Gap, Lecture Notes in Computer Science, vol. 6050, pp. 134–166. Springer (2011).

[86]

Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Larochelle, H.: Improving reproducibility in machine learning research (a report from the neurips 2019 reproducibility program). CoRR abs/2003.12206 (2020). arXiv:2003.12206

[87]

Pipino LL, Lee YW, and Wang RY Data quality assessment Commun. ACM 2002 45 4 211-218

[88]

Pujara, J., Singh, S.: Mining knowledge graphs from text. In: Chang, Y., Zhai, C., Liu, Y., Maarek, Y. (eds.) Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, 2018, pp. 789–790. ACM (2018).

[89]

Qasemi Zadeh, B., Handschuh, B.S.: The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics. In: Proceedings of the 4th International Workshop on Computational Terminology (Computerm), pp. 52–63. Association for Computational Linguistics and Dublin City University, Dublin, Ireland (2014). 10.3115/v1/W14-4807. https://www.aclweb.org/anthology/W14-4807

[90]

Qasemi Zadeh, B., Schumann, A.: The ACL RD-TEC 2.0: a language resource for evaluating term extraction and entity recognition methods. In: Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, 2016. European Language Resources Association (ELRA) (2016). http://www.lrec-conf.org/proceedings/lrec2016/summaries/681.html

[91]

Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: Squad: 100, 000+ questions for machine comprehension of text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, 2016, pp. 2383–2392. The Association for Computational Linguistics (2016).

[92]

Richardson S, Wilson M, Nishikawa J, and Hayward R The well-built clinical question: a key to evidence-based decisions ACP J. Club 1995 123 3 A12-13

[93]

Ruiz-Iniesta, A., Corcho, Ó.: A review of ontologies for describing scholarly and scientific documents. In: Castro, A.G., Lange, C., Lord, P.W., Stevens, R. (eds.) Proceedings of the 4th Workshop on Semantic Publishing Co-located with the 11th Extended Semantic Web Conference (ESWC 2014), Anissaras, Greece, 2014, CEUR Workshop Proceedings, vol. 1155. CEUR-WS.org (2014). http://ceur-ws.org/Vol-1155/paper-07.pdf

[94]

Safder I, Hassan S, Visvizi A, Noraset T, Nawaz R, and Tuarob S Deep learning-based extraction of algorithmic metadata in full-text scholarly documents Inf. Process. Manag. 2020 57 6 102269

[95]

Salatino AA, Thanapalasingam T, Mannocci A, Birukou A, Osborne F, and Motta E The computer science ontology: a comprehensive automatically-generated taxonomy of research areas Data Intell. 2020 2 3 379-416

[96]

Say, A., Fathalla, S., Vahdati, S., Lehmann, J., Auer, S.: Semantic representation of physics research data. In: Aveiro, D., Dietz, J.L.G., Filipe, J. (eds.) Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2020, vol. 2: KEOD, Budapest, Hungary, 2020, pp. 64–75. SCITEPRESS (2020).

[97]

Singh, M., Barua, B., Palod, P., Garg, M., Satapathy, S., Bushi, S., Ayush, K., Rohith, K.S., Gamidi, T., Goyal, P., Mukherjee, A.: OCR++: a robust framework for information extraction from scholarly articles. In: Calzolari, N., Matsumoto, Y., Prasad, R. (eds.) COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 2016, Osaka, Japan, pp. 3390–3400. ACL (2016). https://www.aclweb.org/anthology/C16-1320/

[98]

Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S, and Consortium TO The obo foundry: coordinated evolution of ontologies to support biomedical data integration Nat. Biotechnol. 2007 25 11 1251-1255

[99]

Soldatova LN and King RD An ontology of scientific experiments J. R. Soc. Interface 2006 3 11 795-803

[100]

Stead, C., Smith, S., Busch, P.A., Vatanasakdakul, S.: Emerald 110k: a multidisciplinary dataset for abstract sentence classification. In: Mistica, M., Piccardi, M., MacKinlay, A. (eds.) Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association, ALTA 2019, Sydney, Australia, 2019, pp. 120–125. Australasian Language Technology Association (2019). https://aclweb.org/anthology/papers/U/U19/U19-1016/

[101]

Stocker, M., Prinz, M., Rostami, F., Kempf, T.: Towards research infrastructures that curate scientific information: a use case in life sciences. In: Auer, S., Vidal, M. (eds.) Data Integration in the Life Sciences—13th International Conference, DILS 2018, Hannover, Germany, 2018, Proceedings, Lecture Notes in Computer Science, vol. 11371, pp. 61–74. Springer (2018).

[102]

Suchanek, F.M., Gross-Amblard, D., Abiteboul, S.: Watermarking for ontologies. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N.F., Blomqvist, E. (eds.) The Semantic Web—ISWC 2011—10th International Semantic Web Conference, Bonn, Germany, 2011, Proceedings, Part I, Lecture Notes in Computer Science, vol. 7031, pp. 697–713. Springer (2011).

[103]

Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: Williamson, C.L., Zurko, M.E., Patel-Schneider, P.F., Shenoy, P.J. (eds.) Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, 2007, pp. 697–706. ACM (2007).

[104]

Talburt, J.R.: 2—principles of information quality. In: Talburt, J.R. (ed.) Entity Resolution and Information Quality, pp. 39–62. Morgan Kaufmann, Boston (2011). http://www.sciencedirect.com/science/article/pii/B9780123819727000026

[105]

Teufel, S., Siddharthan, A., Batchelor, C.R.: Towards domain-independent argumentative zoning: Evidence from chemistry and computational linguistics. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Singapore, A Meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1493–1502. ACL (2009). https://www.aclweb.org/anthology/D09-1155/

[106]

Vahdati, S., Fathalla, S., Auer, S., Lange, C., Vidal, M.: Semantic representation of scientific publications. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds.) Digital Libraries for Open Knowledge—23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, 2019, Proceedings, Lecture Notes in Computer Science, vol. 11799, pp. 375–379. Springer (2019).

[107]

Vandenbussche, P., Atemezing, G., Poveda-Villalón, M., Vatant, B.: Linked open vocabularies (LOV): a gateway to reusable semantic vocabularies on the web. Semant. Web 8(3), 437–452 (2017).

[108]

Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014).

[109]

Waard, A., Tel, G.: The ABCDE format enabling semantic conference proceedings. In: Völkel, M., Schaffert, S. (eds.) SemWiki2006, First Workshop on Semantic Wikis—From Wiki to Semantics, Proceedings, Co-located with the ESWC2006, Budva, Montenegro, 2006, CEUR Workshop Proceedings, vol. 206. CEUR-WS.org (2006). http://ceur-ws.org/Vol-206/paper8.pdf

[110]

Wang, R.Y., Strong, D.M.: Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12(4), 5–33 (1996).

[111]

Weikum, G., Dong, L., Razniewski, S., Suchanek, F.M.: Machine knowledge: creation and curation of comprehensive knowledge bases. CoRR abs/2009.11564 (2020). arXiv:2009.11564

[112]

Xiong, C., Power, R., Callan, J.: Explicit semantic ranking for academic search via knowledge graph embedding. In: Barrett, R., Cummings, R., Agichtein, E., Gabrilovich, E. (eds.) Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 2017, pp. 1271–1279. ACM (2017).

[113]

Yaman, B., Pasin, M., Freudenberg, M.: Interlinking scigraph and dbpedia datasets using link discovery and named entity recognition techniques. In: Eskevich, M., de Melo, G., Fäth, C., McCrae, J.P., Buitelaar, P., Chiarcos, C., Klimek, B., Dojchinovski, M. (eds.) 2nd Conference on Language, Data and Knowledge, LDK 2019, Leipzig, Germany, OASICS, vol. 70, pp. 15:1–15:8. Schloss Dagstuhl–Leibniz–Zentrum für Informatik (2019).

[114]

Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: a survey. Semant. Web 7(1), 63–93 (2016).

[115]

Zhang, Y., Wang, M., Saberi, M., Chang, E.: From big scholarly data to solution-oriented knowledge repository. Front. Big Data 2, 38 (2019).

Cited By

Brack, A., Entrup, E., Stamatakis, M., Buschermöhle, P., Hoppe, A., Ewerth, R.: Sequential sentence classification in research papers using cross-domain multi-task learning. International Journal on Digital Libraries 25(2), 377–400 (2024). DOI 10.1007/s00799-023-00392-z. Online publication date: 1-Jun-2024
https://dl.acm.org/doi/10.1007/s00799-023-00392-z

Brack, A., Hoppe, A., Buschermöhle, P., Ewerth, R.: Cross-domain multi-task learning for sequential sentence classification in research papers. In: Aizawa, A., Mandl, T., Carevic, Z., Hinze, A., Mayr, P., Schaer, P. (eds.) Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pp. 1–13 (2022). DOI 10.1145/3529372.3530922. Online publication date: 20-Jun-2022
https://dl.acm.org/doi/10.1145/3529372.3530922

Dessí, D., Osborne, F., Reforgiato Recupero, D., Buscaldi, D., Motta, E.: CS-KG: A Large-Scale Knowledge Graph of Research Entities and Claims in Computer Science. In: The Semantic Web – ISWC 2022, pp. 678–696 (2022). DOI 10.1007/978-3-031-19433-7_39. Online publication date: 23-Oct-2022
https://dl.acm.org/doi/10.1007/978-3-031-19433-7_39

Index Terms

Analysing the requirements for an Open Research Knowledge Graph: use cases, quality requirements, and construction strategies
1. Information systems
2. Software and its engineering
  1. Software creation and management
    1. Designing software
      1. Requirements analysis

Index terms have been assigned to the content through auto-classification.

Published In

International Journal on Digital Libraries Volume 23, Issue 1

Mar 2022

105 pages

ISSN:1432-5012

EISSN:1432-1300

© The Author(s) 2021.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 March 2022

Accepted: 12 July 2021

Revision received: 08 July 2021

Received: 04 February 2021

Qualifiers

Research-article

Funding Sources

Technische Informationsbibliothek (TIB) – Leibniz-Informationszentrum Technik und Naturwissenschaften (1051)
