
Analysing the requirements for an Open Research Knowledge Graph: use cases, quality requirements, and construction strategies

Published: 01 March 2022

Abstract

Current science communication has a number of drawbacks and bottlenecks that have recently been the subject of discussion. Among others, the rising number of published articles makes it nearly impossible to get a full overview of the state of the art in a given field, and reproducibility is hampered by fixed-length, document-based publications that normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of these issues. The focus of these proposals, however, is usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective and present a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting and reviewing the core daily tasks of scientists, (b) establishing the requirements these tasks impose on a KG-based system, and (c) identifying overlaps and specificities among them, as well as their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.

References

[1]
Ammar, W., Groeneveld, D., Bhagavatula, C., Beltagy, I., Crawford, M., Downey, D., Dunkelberger, J., Elgohary, A., Feldman, S., Ha, V., Kinney, R., Kohlmeier, S., Lo, K., Murray, T., Ooi, H., Peters, M.E., Power, J., Skjonsberg, S., Wang, L.L., Wilhelm, C., Yuan, Z., van Zuylen, M., Etzioni, O.: Construction of the literature graph in semantic scholar. In: Bangalore, S., Chu-Carroll, J., Li, Y. (eds.) Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1–6, 2018, vol. 3 (Industry Papers), pp. 84–91. Association for Computational Linguistics (2018).
[2]
Aryani A and Wang J Research graph: Building a distributed graph of scholarly works using research data switchboard Open Repos. Conf. 2017
[3]
Auer S and Mann S Towards an open research knowledge graph Ser. Libr. 2019 76 1–4 35-41
[4]
Augenstein, I., Das, M., Riedel, S., Vikraman, L., McCallum, A.: Semeval 2017 task 10: Scienceie—xtracting keyphrases and relations from scientific publications. In: Bethard, S., Carpuat, M., Apidianaki, M., Mohammad, S.M., Cer, D.M., Jurgens, D. (eds.) Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017, Vancouver, Canada, 2017, pp. 546–555. Association for Computational Linguistics (2017).
[5]
Badie K, Asadi N, and Mahmoudi MT Zone identification based on features with high semantic richness and combining results of separate classifiers J. Inf. Telecommun. 2018 2 4 411-427
[6]
Balog K Entity-Oriented Search 2018 Berlin Springer
[7]
Bechhofer S, Buchan IE, Roure DD, Missier P, Ainsworth JD, Bhagat J, Couch PA, Cruickshank D, Delderfield M, Dunlop I, Gamble M, Michaelides DT, Owen S, Newman DR, Sufi S, and Goble CA Why linked data is not enough for scientists Future Gener. Comput. Syst. 2013 29 2 599-611
[8]
Beel J, Gipp B, Langer S, and Breitinger C Research-paper recommender systems: a literature survey Int. J. Digit. Libr. 2016 17 4 305-338
[9]
Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 2019, pp. 3613–3618. Association for Computational Linguistics (2019).
[10]
Bizer C Quality-Driven Information Filtering—In the Context of Web-Based Information Systems 2007 Saarbrücken VDM Verlag
[11]
Bodenreider O The unified medical language system (UMLS): integrating biomedical terminology Nucl. Acids Res. 2004 32 267-270
[12]
Bollacker, K.D., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Wang, J.T. (ed.) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, 2008, pp. 1247–1250. ACM (2008).
[13]
Booch G, Rumbaugh J, and Jacobson I Unified Modeling Language User Guide, The (2nd Edition) (Addison-Wesley Object Technology Series) 2005 Boston Addison-Wesley Professional
[14]
Bornmann L and Mutz R Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references J. Assoc. Inf. Sci. Technol. 2015 66 11 2215-2222
[15]
Brack, A., D’Souza, J., Hoppe, A., Auer, S., Ewerth, R.: Domain-independent extraction of scientific concepts from research articles. In: Jose, J.M., Yilmaz, E., Magalhães, J., Castells, P., Ferro, N., Silva, M.J., Martins, F. (eds.) Advances in Information Retrieval—42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, 2020, Proceedings, Part I, Lecture Notes in Computer Science, vol. 12035, pp. 251–266. Springer (2020).
[16]
Brack, A., Hoppe, A., Stocker, M., Auer, S., Ewerth, R.: Requirements analysis for an open research knowledge graph. In: Hall, M.M., Mercun, T., Risse, T., Duchateau, F. (eds.) Digital Libraries for Open Knowledge—24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, Lyon, France, 2020, Proceedings, Lecture Notes in Computer Science, vol. 12246, pp. 3–18. Springer (2020).
[17]
Brack, A., Müller, D.U., Hoppe, A., Ewerth, R.: Coreference resolution in research papers from multiple domains. In: Hiemstra, D., Moens, M., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) Advances in Information Retrieval—43rd European Conference on IR Research, ECIR 2021, Virtual Event, 2021, Proceedings, Part I, Lecture Notes in Computer Science, vol. 12656, pp. 79–97. Springer (2021).
[18]
Braun, R., Benedict, M., Wendler, H., Esswein, W.: Proposal for requirements driven design science research. In: Donnellan, B., Helfert, M., Kenneally, J., VanderMeer, D.E., Rothenberger, M.A., Winter, R. (eds.) New Horizons in Design Science: Broadening the Research Agenda—10th International Conference, DESRIST 2015, Dublin, Ireland, 2015, Proceedings, Lecture Notes in Computer Science, vol. 9073, pp. 135–151. Springer (2015).
[19]
Brodaric, B., Reitsma, F., Qiang, Y.: Skiing with DOLCE: toward an e-science knowledge infrastructure. In: Eschenbach, C., Grüninger, M. (eds.) Formal Ontology in Information Systems, Proceedings of the Fifth International Conference, FOIS 2008, Saarbrücken, Germany, 2008, Frontiers in Artificial Intelligence and Applications, vol. 183, pp. 208–219. IOS Press (2008).
[20]
Burton A, Aryani A, Koers H, Manghi P, Bruzzo SL, Stocker M, Diepenbroek M, Schindler U, and Fenner M The scholix framework for interoperability in data-literature information exchange D-Lib Mag. 2017 23 1/2 1-20
[21]
Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Jr., E.R.H., Mitchell, T.M.: Toward an architecture for never-ending language learning. In: Fox, M., Poole, D. (eds.) Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, 2010. AAAI Press (2010). http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1879
[22]
CB Insights: The data flywheel: how enlightened self-interest drives data network effects. https://www.cbinsights.com/research/team-blog/data-network-effects/ (2020)
[23]
Cohan, A., Ammar, W., van Zuylen, M., Cady, F.: Structural scaffolds for citation intent classification in scientific publications. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2019, vol. 1 (Long and Short Papers), pp. 3586–3596. Association for Computational Linguistics (2019).
[24]
Cohan, A., Beltagy, I., King, D., Dalvi, B., Weld, D.S.: Pretrained language models for sequential sentence classification. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 2019, pp. 3691–3697. Association for Computational Linguistics (2019).
[25]
Cohen KB, Lanfranchi A, Choi MJ, Baumgartner WA, Panteleyeva N, Verspoor K, Palmer M, and Hunter LE Coreference annotation and resolution in the Colorado richly annotated full text (CRAFT) corpus of biomedical journal articles BMC Bioinform. 2017 18 1 1-14
[26]
Consortium TGO and Consortium The gene ontology resource: 20 years and still going strong Nucl. Acids Res. 2019 47 D330-D338
[27]
Constantin A, Peroni S, Pettifer S, Shotton DM, and Vitali F The document components ontology (DoCo) Semant. Web 2016 7 2 167-181
[28]
Dayrell, C., Jr., A.C., Lima, G., Jr., D.M., Copestake, A.A., Feltrim, V.D., Tagnin, S.E.O., Aluísio, S.M.: Rhetorical move detection in english abstracts: multi-label sentence classifiers and their annotated corpora. In: Calzolari, N., Choukri, K., Declerck, T., Dogan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, 2012, pp. 1604–1609. European Language Resources Association (ELRA) (2012). http://www.lrec-conf.org/proceedings/lrec2012/summaries/734.html
[29]
Degbelo, A.: A snapshot of ontology evaluation criteria and strategies. In: Hoekstra, R., Faron-Zucker, C., Pellegrini, T., de Boer, V. (eds.) Proceedings of the 13th International Conference on Semantic Systems, SEMANTICS 2017, Amsterdam, The Netherlands, 2017, pp. 1–8. ACM (2017).
[30]
Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A, Alcántara R, Darsow M, Guedj M, and Ashburner M Chebi: a database and ontology for chemical entities of biological interest Nucl. Acids Res. 2008 36 344-350
[31]
Dernoncourt, F., Lee, J.Y.: 200k RCT: a dataset for sequential sentence classification in medical abstracts. In: Kondrak, G., Watanabe, T. (eds.) Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei, Taiwan, 2017, Volume 2: Short Papers, pp. 308–313. Asian Federation of Natural Language Processing (2017). https://www.aclweb.org/anthology/I17-2052/
[32]
Dessì, D., Osborne, F., Recupero, D.R., Buscaldi, D., Motta, E., Sack, H.: AI-KG: an automatically generated knowledge graph of artificial intelligence. In: Pan, J.Z., Tamma, V.A.M., d’Amato, C., Janowicz, K., Fu, B., Polleres, A., Seneviratne, O., Kagal, L. (eds.) The Semantic Web—ISWC 2020—19th International Semantic Web Conference, Athens, Greece, 2020, Proceedings, Part II, Lecture Notes in Computer Science, vol. 12507, pp. 127–143. Springer (2020).
[33]
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2019, vol. 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics (2019).
[34]
Doerr, M., Kritsotaki, A., Rousakis, Y., Hiebel, G., Theodoridou, M.: Definition of the CRMsci: an extension of CIDOC-CRM to support scientific observation. Tech. rep., FORTH, Version 1.2.8. http://www.cidoc-crm.org/crmsci/ModelVersion/version-1.2.8 (2020)
[35]
Dogan RI, Leaman R, and Lu Z NCBI disease corpus: a resource for disease name recognition and concept normalization J. Biomed. Inform. 2014 47 1-10
[36]
Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: Macskassy, S.A., Perlich, C., Leskovec, J., Wang, W., Ghani, R. (eds.) The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA-2014, pp. 601–610. ACM (2014).
[37]
D’Souza, J., Hoppe, A., Brack, A., Jaradeh, M.Y., Auer, S., Ewerth, R.: The STEM-ECR dataset: grounding scientific entity references in STEM scholarly content to authoritative encyclopedic and lexicographic sources. In: Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, 2020, pp. 2192–2203. European Language Resources Association (2020). https://www.aclweb.org/anthology/2020.lrec-1.268/
[38]
Färber, M.: The microsoft academic knowledge graph: A linked data source with 8 billion triples of scholarly data. In: Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I.F., Hogan, A., Song, J., Lefrançois, M., Gandon, F. (eds.) The Semantic Web—ISWC 2019—18th International Semantic Web Conference, Auckland, New Zealand, 2019, Proceedings, Part II, Lecture Notes in Computer Science, vol. 11779, pp. 113–129. Springer (2019).
[39]
Färber M, Bartscherer F, Menne C, and Rettinger A Linked data quality of DBpedia, Freebase, Opencyc, Wikidata, and YAGO Semant. Web 2018 9 1 77-129
[40]
Fathalla, S., Vahdati, S., Auer, S., Lange, C.: Towards a knowledge graph representing research findings by semantifying survey articles. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L.S., Karydis, I. (eds.) Research and Advanced Technology for Digital Libraries—21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017, Thessaloniki, Greece, 2017, Proceedings, Lecture Notes in Computer Science, vol. 10450, pp. 315–327. Springer (2017).
[41]
Fellbaum C WordNet: An Electronic Lexical Database. Language, Speech, and Communication 1998 Cambridge MIT Press
[42]
Fink A Conducting Research Literature Reviews: From the Internet to Paper 2014 Thousand Oaks SAGE Publications
[43]
Fisas, B., Saggion, H., Ronzano, F.: On the discoursive structure of computer graphics research papers. In: Meyers, A., Rehbein, I., Zinsmeister, H. (eds.) Proceedings of The 9th Linguistic Annotation Workshop, LAW@NAACL-HLT 2015, 2015, Denver, Colorado, USA, pp. 42–51. The Association for Computer Linguistics (2015).
[44]
Friedrich, A., Adel, H., Tomazic, F., Hingerl, J., Benteau, R., Marusczyk, A., Lange, L.: The sofc-exp corpus and neural approaches to information extraction in the materials science domain. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 2020, pp. 1255–1268. Association for Computational Linguistics (2020).
[45]
Gábor, K., Buscaldi, D., Schumann, A., Qasemi Zadeh, B., Zargayouna, H., Charnois, T.: Semeval-2018 task 7: Semantic relation extraction and classification in scientific papers. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, New Orleans, Louisiana, USA, 2018, pp. 679–688. Association for Computational Linguistics (2018).
[46]
Galárraga, L., Razniewski, S., Amarilli, A., Suchanek, F.M.: Predicting completeness in knowledge bases. In: de Rijke, M., Shokouhi, M., Tomkins, A., Zhang, M. (eds.) Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, 2017, pp. 375–383. ACM (2017).
[47]
Galárraga, L.A., Teflioudi, C., Hose, K., Suchanek, F.M.: AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In: Schwabe, D., Almeida, V.A.F., Glaser, H., Baeza-Yates, R., Moon, S.B. (eds.) 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, 2013, pp. 413–422. International World Wide Web Conferences Steering Committee. ACM (2013).
[48]
Gonçalves S, Cortez P, and Moro S A deep learning classifier for sentence classification in biomedical and computer science abstracts Neural Comput. Appl. 2020 32 11 6793-6807
[49]
Groza, T., Handschuh, S., Möller, K., Decker, S.: SALT—semantically annotated latex for scientific publications. In: Franconi, E., Kifer, M., May, W. (eds.) The Semantic Web: Research and Applications, 4th European Semantic Web Conference, ESWC 2007, Innsbruck, Austria, 2007, Proceedings, Lecture Notes in Computer Science, vol. 4519, pp. 518–532. Springer (2007).
[50]
Hars A Structure of Scientific Knowledge 2003 Berlin Springer 83-185
[51]
Hevner AR, March ST, Park J, and Ram S Design science in information systems research MIS Q. 2004 28 1 75-105
[52]
Hoppe, A., Hagen, J., Holzmann, H., Kniesel, G., Ewerth, R.: An analytics tool for exploring scientific software and related publications. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J.C. (eds.) Digital Libraries for Open Knowledge, 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Porto, Portugal, 2018, Proceedings, Lecture Notes in Computer Science, vol. 11057, pp. 299–303. Springer (2018).
[53]
Horvath, I.: Comparison of three methodological approaches of design research. In: S.N. (ed.) Proceedings of the 16th International Conference on Engineering Design, ICED’07, pp. 1–11. Ecole Central Paris (2007). Null; Conference date: 28-08-2007 through 30-08-2007
[54]
Hou, Y., Jochim, C., Gleize, M., Bonin, F., Ganguly, D.: Identification of tasks, datasets, evaluation metrics, and numeric scores for scientific leaderboards construction. In: Korhonen, A., Traum, D.R., Màrquez, L. (eds.) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 2019, vol. 1: Long Papers, pp. 5203–5213. Association for Computational Linguistics (2019).
[55]
Jain, S., van Zuylen, M., Hajishirzi, H., Beltagy, I.: Scirex: A challenge dataset for document-level information extraction. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 2020, pp. 7506–7516. Association for Computational Linguistics (2020).
[56]
Jaradeh, M.Y., Oelen, A., Prinz, M., Stocker, M., Auer, S.: Open research knowledge graph: a system walkthrough. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds.) Digital Libraries for Open Knowledge—23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, 2019, Proceedings, Lecture Notes in Computer Science, vol. 11799, pp. 348–351. Springer (2019).
[57]
Jia, R., Wong, C., Poon, H.: Document-level n-ary relation extraction with multiscale representation learning. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2019, vol. 1 (Long and Short Papers), pp. 3693–3704. Association for Computational Linguistics (2019).
[58]
Kannan, A.V., Fradkin, D., Akrotirianakis, I., Kulahcioglu, T., Canedo, A., Roy, A., Yu, S., Malawade, A.V., Faruque, M.A.A.: Multimodal knowledge graph for deep learning papers and code. In: d’Aquin, M., Dietze, S., Hauff, C., Curry, E., Cudré-Mauroux, P. (eds.) CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, 2020, pp. 3417–3420. ACM (2020).
[59]
Kardas, M., Czapla, P., Stenetorp, P., Ruder, S., Riedel, S., Taylor, R., Stojnic, R.: Axcell: Automatic extraction of results from machine learning papers. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, 2020, pp. 8580–8594. Association for Computational Linguistics (2020).
[60]
Kim S, Martínez D, Cavedon L, and Yencken L Automatic classification of sentences to support evidence based medicine BMC Bioinform. 2011 12 2 S5
[61]
Kitchenham, B.A., Charters, S.: Guidelines for performing systematic literature reviews in software engineering. Tech. Rep. EBSE 2007-001, Keele University and Durham University Joint Report. https://www.elsevier.com/__data/promis_misc/525444systematicreviewsguide.pdf (2007)
[62]
Klampanos IA, Davvetas A, Koukourikos A, and Karkaletsis V ANNETT-O: an ontology for describing artificial neural network evaluation, topology and training Int. J. Metadata Semant. Ontol. 2019 13 3 179-190
[63]
Kolitsas, N., Ganea, O., Hofmann, T.: End-to-end neural entity linking. In: Korhonen, A., Titov, I. (eds.) Proceedings of the 22nd Conference on Computational Natural Language Learning, CoNLL 2018, Brussels, Belgium, 2018, pp. 519–529. Association for Computational Linguistics (2018).
[64]
Kringelum J, Kjærulff SK, Brunak S, Lund O, Oprea TI, and Taboureau O Chemprot-3.0: a global chemical biology diseases mapping Database J. Biol. Databases Curation 2016
[65]
Lange C Ontologies and languages for representing mathematical knowledge on the semantic web Semant. Web 2013 4 2 119-158
[66]
Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, van Kleef P, Auer S, and Bizer C Dbpedia—a large-scale, multilingual knowledge base extracted from Wikipedia Semant. Web 2015 6 2 167-195
[67]
Li, J., Sun, Y., Johnson, R.J., Sciaky, D., Wei, C., Leaman, R., Davis, A.P., Mattingly, C.J., Wiegers, T.C., Lu, Z.: Biocreative V CDR task corpus: a resource for chemical disease relation extraction. Database J. Biol. Databases Curation 2016, (2016).
[68]
Liakata M, Saha S, Dobnik S, Batchelor CR, and Rebholz-Schuhmann D Automatic recognition of conceptualization zones in scientific articles and two life science applications Bioinformatics 2012 28 7 991-1000
[69]
Liakata, M., Teufel, S., Siddharthan, A., Batchelor, C.R.: Corpora for the conceptualisation and zoning of scientific papers. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 2010, Valletta, Malta. European Language Resources Association (2010). http://www.lrec-conf.org/proceedings/lrec2010/summaries/644.html
[70]
Lo, K., Wang, L.L., Neumann, M., Kinney, R., Weld, D.S.: S2ORC: the semantic scholar open research corpus. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 2020, pp. 4969–4983. Association for Computational Linguistics (2020).
[71]
Luan, Y., He, L., Ostendorf, M., Hajishirzi, H.: Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018, pp. 3219–3232. Association for Computational Linguistics (2018).
[72]
Lubani M, Noah SAM, and Mahmud R Ontology population: approaches and design aspects J. Inf. Sci. 2019
[73]
Manghi P, Bardi A, Atzori C, Baglioni M, Manola N, Schirrwagen J, and Principe P The OpenAIRE research graph data model Zenodo 2019
[74]
Mesbah, S., Fragkeskos, K., Lofi, C., Bozzon, A., Houben, G.: Semantic annotation of data processing pipelines in scientific publications. In: Blomqvist, E., Maynard, D., Gangemi, A., Hoekstra, R., Hitzler, P., Hartig, O. (eds.) The Semantic Web—14th International Conference, ESWC 2017, Portorož, Slovenia, 2017, Proceedings, Part I, Lecture Notes in Computer Science, vol. 10249, pp. 321–336 (2017).
[75]
Nasar Z, Jaffry SW, and Malik MK Information extraction from scientific articles: a survey Scientometrics 2018 117 3 1931-1990
[76]
Nguyen, V.B., Svátek, V., Rabby, G., Corcho, Ó.: Ontologies supporting research-related information foraging using knowledge graphs: literature survey and holistic model mapping. In: Keet, C.M., Dumontier, M. (eds.) Knowledge Engineering and Knowledge Management—22nd International Conference, EKAW 2020, Bolzano, Italy, 2020, Proceedings, Lecture Notes in Computer Science, vol. 12387, pp. 88–103. Springer (2020).
[77]
Nickel M, Murphy K, Tresp V, and Gabrilovich E A review of relational machine learning for knowledge graphs Proc. IEEE 2016 104 1 11-33
[78]
Oelen, A., Jaradeh, M.Y., Stocker, M., Auer, S.: Generate FAIR literature surveys with scholarly knowledge graphs. In: Huang, R., Wu, D., Marchionini, G., He, D., Cunningham, S.J., Hansen, P. (eds.) JCDL ’20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, 2020, pp. 97–106. ACM (2020).
[79]
Okoli C A guide to conducting a standalone systematic literature review Commun. Assoc. Inf. Syst. 2015 37 43
[80]
Papers with code. https://paperswithcode.com/. Accessed 04 Oct 2021
[81]
Park, S., Caragea, C.: Scientific keyphrase identification and classification by pre-trained language models intermediate task transfer learning. In: Scott, D., Bel, N., Zong, C. (eds.) Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), 2020, pp. 5409–5419. International Committee on Computational Linguistics (2020).
[82]
Peng, Y., Yan, S., Lu, Z.: Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. In: Demner-Fushman, D., Cohen, K.B., Ananiadou, S., Tsujii, J. (eds.) Proceedings of the 18th BioNLP Workshop and Shared Task, BioNLP@ACL 2019, Florence, Italy, 2019, pp. 58–65. Association for Computational Linguistics (2019).
[83]
Peroni S and Shotton DM Fabio and cito: ontologies for describing bibliographic resources and citations J. Web Semant. 2012 17 33-43
[84]
Pertsas V and Constantopoulos P Scholarly ontology: modelling scholarly practices Int. J. Digit. Libr. 2017 18 3 173-190
[85]
Petasis, G., Karkaletsis, V., Paliouras, G., Krithara, A., Zavitsanos, E.: Ontology population and enrichment: state of the art. In: Paliouras, G., Spyropoulos, C.D., Tsatsaronis, G. (eds.) Knowledge-Driven Multimedia Information Extraction and Ontology Evolution—Bridging the Semantic Gap, Lecture Notes in Computer Science, vol. 6050, pp. 134–166. Springer (2011).
[86]
Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Larochelle, H.: Improving reproducibility in machine learning research (a report from the neurips 2019 reproducibility program). CoRR abs/2003.12206 (2020). arXiv:2003.12206
[87]
Pipino LL, Lee YW, and Wang RY Data quality assessment Commun. ACM 2002 45 4 211-218
[88]
Pujara, J., Singh, S.: Mining knowledge graphs from text. In: Chang, Y., Zhai, C., Liu, Y., Maarek, Y. (eds.) Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, 2018, pp. 789–790. ACM (2018).
[89]
Qasemi Zadeh, B., Handschuh, B.S.: The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics. In: Proceedings of the 4th International Workshop on Computational Terminology (Computerm), pp. 52–63. Association for Computational Linguistics and Dublin City University, Dublin, Ireland (2014). 10.3115/v1/W14-4807. https://www.aclweb.org/anthology/W14-4807
[90]
Qasemi Zadeh, B., Schumann, A.: The ACL RD-TEC 2.0: a language resource for evaluating term extraction and entity recognition methods. In: Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, 2016. European Language Resources Association (ELRA) (2016). http://www.lrec-conf.org/proceedings/lrec2016/summaries/681.html
[91]
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: Squad: 100, 000+ questions for machine comprehension of text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, 2016, pp. 2383–2392. The Association for Computational Linguistics (2016).
[92]
Richardson S, Wilson M, Nishikawa J, and Hayward R The well-built clinical question: a key to evidence-based decisions ACP J. Club 1995 123 3 A12-13
[93]
Ruiz-Iniesta, A., Corcho, Ó.: A review of ontologies for describing scholarly and scientific documents. In: Castro, A.G., Lange, C., Lord, P.W., Stevens, R. (eds.) Proceedings of the 4th Workshop on Semantic Publishing Co-located with the 11th Extended Semantic Web Conference (ESWC 2014), Anissaras, Greece, 2014, CEUR Workshop Proceedings, vol. 1155. CEUR-WS.org (2014). http://ceur-ws.org/Vol-1155/paper-07.pdf
[94]
Safder I, Hassan S, Visvizi A, Noraset T, Nawaz R, and Tuarob S Deep learning-based extraction of algorithmic metadata in full-text scholarly documents Inf. Process. Manag. 2020 57 6 102269
[95]
Salatino AA, Thanapalasingam T, Mannocci A, Birukou A, Osborne F, and Motta E The computer science ontology: a comprehensive automatically-generated taxonomy of research areas Data Intell. 2020 2 3 379-416
[96]
Say, A., Fathalla, S., Vahdati, S., Lehmann, J., Auer, S.: Semantic representation of physics research data. In: Aveiro, D., Dietz, J.L.G., Filipe, J. (eds.) Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2020, vol. 2: KEOD, Budapest, Hungary, 2020, pp. 64–75. SCITEPRESS (2020).
[97]
Singh, M., Barua, B., Palod, P., Garg, M., Satapathy, S., Bushi, S., Ayush, K., Rohith, K.S., Gamidi, T., Goyal, P., Mukherjee, A.: OCR++: a robust framework for information extraction from scholarly articles. In: Calzolari, N., Matsumoto, Y., Prasad, R. (eds.) COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 2016, Osaka, Japan, pp. 3390–3400. ACL (2016). https://www.aclweb.org/anthology/C16-1320/
[98]
Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S, and Consortium TO The obo foundry: coordinated evolution of ontologies to support biomedical data integration Nat. Biotechnol. 2007 25 11 1251-1255
[99]
Soldatova LN and King RD An ontology of scientific experiments J. R. Soc. Interface 2006 3 11 795-803
[100]
Stead, C., Smith, S., Busch, P.A., Vatanasakdakul, S.: Emerald 110k: a multidisciplinary dataset for abstract sentence classification. In: Mistica, M., Piccardi, M., MacKinlay, A. (eds.) Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association, ALTA 2019, Sydney, Australia, 2019, pp. 120–125. Australasian Language Technology Association (2019). https://aclweb.org/anthology/papers/U/U19/U19-1016/
[101]
Stocker, M., Prinz, M., Rostami, F., Kempf, T.: Towards research infrastructures that curate scientific information: a use case in life sciences. In: Auer, S., Vidal, M. (eds.) Data Integration in the Life Sciences—13th International Conference, DILS 2018, Hannover, Germany, 2018, Proceedings, Lecture Notes in Computer Science, vol. 11371, pp. 61–74. Springer (2018).
[102]
Suchanek, F.M., Gross-Amblard, D., Abiteboul, S.: Watermarking for ontologies. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N.F., Blomqvist, E. (eds.) The Semantic Web—ISWC 2011—10th International Semantic Web Conference, Bonn, Germany, 2011, Proceedings, Part I, Lecture Notes in Computer Science, vol. 7031, pp. 697–713. Springer (2011).
[103]
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: Williamson, C.L., Zurko, M.E., Patel-Schneider, P.F., Shenoy, P.J. (eds.) Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, 2007, pp. 697–706. ACM (2007).
[104]
Talburt, J.R.: Chapter 2: Principles of information quality. In: Talburt, J.R. (ed.) Entity Resolution and Information Quality, pp. 39–62. Morgan Kaufmann, Boston (2011). http://www.sciencedirect.com/science/article/pii/B9780123819727000026
[105]
Teufel, S., Siddharthan, A., Batchelor, C.R.: Towards domain-independent argumentative zoning: Evidence from chemistry and computational linguistics. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Singapore, A Meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1493–1502. ACL (2009). https://www.aclweb.org/anthology/D09-1155/
[106]
Vahdati, S., Fathalla, S., Auer, S., Lange, C., Vidal, M.: Semantic representation of scientific publications. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds.) Digital Libraries for Open Knowledge—23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, 2019, Proceedings, Lecture Notes in Computer Science, vol. 11799, pp. 375–379. Springer (2019).
[107]
Vandenbussche, P., Atemezing, G., Poveda-Villalón, M., Vatant, B.: Linked open vocabularies (LOV): a gateway to reusable semantic vocabularies on the web. Semant. Web 8(3), 437–452 (2017)
[108]
Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)
[109]
de Waard, A., Tel, G.: The ABCDE format enabling semantic conference proceedings. In: Völkel, M., Schaffert, S. (eds.) SemWiki2006, First Workshop on Semantic Wikis – From Wiki to Semantics, Proceedings, Co-located with the ESWC2006, Budva, Montenegro, 2006, CEUR Workshop Proceedings, vol. 206. CEUR-WS.org (2006). http://ceur-ws.org/Vol-206/paper8.pdf
[110]
Wang, R.Y., Strong, D.M.: Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12(4), 5–33 (1996)
[111]
Weikum, G., Dong, L., Razniewski, S., Suchanek, F.M.: Machine knowledge: creation and curation of comprehensive knowledge bases. CoRR abs/2009.11564 (2020). arXiv:2009.11564
[112]
Xiong, C., Power, R., Callan, J.: Explicit semantic ranking for academic search via knowledge graph embedding. In: Barrett, R., Cummings, R., Agichtein, E., Gabrilovich, E. (eds.) Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 2017, pp. 1271–1279. ACM (2017).
[113]
Yaman, B., Pasin, M., Freudenberg, M.: Interlinking SciGraph and DBpedia datasets using link discovery and named entity recognition techniques. In: Eskevich, M., de Melo, G., Fäth, C., McCrae, J.P., Buitelaar, P., Chiarcos, C., Klimek, B., Dojchinovski, M. (eds.) 2nd Conference on Language, Data and Knowledge, LDK 2019, Leipzig, Germany, OASIcs, vol. 70, pp. 15:1–15:8. Schloss Dagstuhl–Leibniz–Zentrum für Informatik (2019).
[114]
Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: a survey. Semant. Web 7(1), 63–93 (2016)
[115]
Zhang, Y., Wang, M., Saberi, M., Chang, E.: From big scholarly data to solution-oriented knowledge repository. Front. Big Data 2, 38 (2019)

Cited By

  • (2024) Sequential sentence classification in research papers using cross-domain multi-task learning. International Journal on Digital Libraries 25(2), 377–400. DOI: 10.1007/s00799-023-00392-z. Online publication date: 1 Jun 2024.
  • (2022) Cross-domain multi-task learning for sequential sentence classification in research papers. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pp. 1–13. DOI: 10.1145/3529372.3530922. Online publication date: 20 Jun 2022.
  • (2022) CS-KG: A Large-Scale Knowledge Graph of Research Entities and Claims in Computer Science. The Semantic Web – ISWC 2022, pp. 678–696. DOI: 10.1007/978-3-031-19433-7_39. Online publication date: 23 Oct 2022.

      Published In

      International Journal on Digital Libraries, Volume 23, Issue 1
      Mar 2022
      105 pages
      ISSN: 1432-5012
      EISSN: 1432-1300

      Publisher

      Springer-Verlag

      Berlin, Heidelberg

      Publication History

      Published: 01 March 2022
      Accepted: 12 July 2021
      Revision received: 08 July 2021
      Received: 04 February 2021

      Author Tags

      1. Scholarly communication
      2. Research knowledge graph
      3. Design science research
      4. Requirements analysis

      Qualifiers

      • Research-article

      Funding Sources

      • Technische Informationsbibliothek (TIB) – Leibniz-Informationszentrum Technik und Naturwissenschaften (1051)

      Article Metrics

      • Downloads (last 12 months): 0
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 20 Jan 2025
