Requirements Analysis for an Open Research Knowledge Graph
Pages 3 - 18
Abstract
Current science communication has a number of drawbacks and bottlenecks that have been the subject of discussion lately. Among others, the rising number of published articles makes it nearly impossible to get a full overview of the state of the art in a given field, and reproducibility is hampered by fixed-length, document-based publications, which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of the current issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective by presenting a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting the daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, as well as their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.
Published In
Aug 2020
234 pages
ISBN:978-3-030-54955-8
DOI:10.1007/978-3-030-54956-5
© Springer Nature Switzerland AG 2020.
Publisher
Springer-Verlag
Berlin, Heidelberg
Publication History
Published: 25 August 2020
Qualifiers
- Article