
Towards a Knowledge Graph for Science

2018, Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics

Invited Article∗

Sören Auer
TIB Leibniz Information Centre for Science and Technology and L3S Research Centre at University of Hannover
Hannover, Germany
auer@tib.eu

Viktor Kovtun
L3S Research Centre, Leibniz University of Hannover
Hannover, Germany
kovtun@l3s.de

Manuel Prinz
TIB Leibniz Information Centre for Science and Technology
Hannover, Germany
manuel.prinz@tib.eu

Anna Kasprzik
TIB Leibniz Information Centre for Science and Technology
Hannover, Germany
anna.kasprzik@tib.eu

Markus Stocker
TIB Leibniz Information Centre for Science and Technology
Hannover, Germany
markus.stocker@tib.eu

ABSTRACT

The document-centric workflows in science have reached (or already exceeded) the limits of adequacy. This is emphasized by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. This presents an opportunity to rethink the dominant paradigm of document-centric scholarly information communication and transform it into knowledge-based information flows by representing and expressing information through semantically rich, interlinked knowledge graphs. At the core of knowledge-based information flows is the creation and evolution of information models that establish a common understanding of information communicated between stakeholders as well as the integration of these technologies into the infrastructure and processes of search and information exchange in the research library of the future. By integrating these models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This has the potential to revolutionize scientific work as information and research results can be seamlessly interlinked with each other and better matched to complex information needs. Furthermore, research results become directly comparable and easier to reuse.

As our main contribution, we propose the vision of a knowledge graph for science, present a possible infrastructure for such a knowledge graph as well as our early attempts towards an implementation of the infrastructure.

KEYWORDS

Knowledge Graph, Science and Technology, Research Infrastructure, Libraries, Information Science

∗ This invited article accompanies Sören Auer’s WIMS2018 keynote.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

WIMS ’18, June 25–27, 2018, Novi Sad, Serbia
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-5489-9/18/06...$15.00
https://doi.org/10.1145/3227609.3227689

ACM Reference Format:
Sören Auer, Viktor Kovtun, Manuel Prinz, Anna Kasprzik, and Markus Stocker. 2018. Towards a Knowledge Graph for Science: Invited Article. In WIMS ’18: 8th International Conference on Web Intelligence, Mining and Semantics, June 25–27, 2018, Novi Sad, Serbia. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3227609.3227689

1 INTRODUCTION

The communication of scholarly information is document-centric. Researchers produce essays and articles that are made available in online and offline publication media as text documents. The entire library, technology, service and research landscape is geared towards this fundamental approach. This approach may indeed be justified if questions can be answered by individual articles.
Increasingly often, however, answers span not just multiple articles but also multiple scientific domains. In these cases researchers are not adequately supported by the existing infrastructure. Currently, in the best case, researchers obtain large, unordered sets of more or less relevant documents or, more generally, digital objects. With the current developments in areas such as knowledge representation, semantic search, human-machine interaction, natural language processing, and artificial intelligence, it has become possible to completely rethink this dominant paradigm of document-centric information flows and transform scholarly communication into knowledge-based information flows by expressing and representing information as structured, interlinked and semantically rich knowledge graphs.

Ehrlinger and Wöß [10] have discussed the term “knowledge graph” and proposed a definition based on an analysis of current related research work. According to the authors, a “knowledge graph acquires and integrates information in an ontology and applies a reasoner to derive new knowledge”. The authors underline that “an ontology does not differ from a knowledge base”, meaning that, following their definition, a knowledge graph acquires and integrates information in a kind of knowledge base. Since it is not within the scope of this article, we make no further attempt to refine this definition or provide an alternative. However, we suggest a specialization to science by proposing that a knowledge graph for science acquires and integrates scientific information in a knowledge base, and may apply a reasoner or other computational methods to derive new information.

The problems of document-centric information flows are varied and fairly evident. Among others, the expansion of scientific literature¹ makes it increasingly difficult to keep an overview of the current state of research.
Furthermore, the creation, reading, and processing of scientific literature ties up cognitive capacity. The ambiguity, opacity, and redundancy of publications have also contributed to a lack of reproducibility of research, culminating in the reproducibility crisis [18]. A further problem is that, although characters, words, and sentences can be indexed and searched, the structure and semantics of text, illustrations, references, symbols, etc. are currently hardly accessible to computers.

As our main contribution, we propose the vision of a knowledge graph for science, present a possible infrastructure for such a knowledge graph (Section 2) as well as our early attempts towards an implementation of this infrastructure (Section 3). We highlight some avenues for future work (Section 4) and provide a brief review of related work (Section 5).

2 SCIENCE GRAPH INFRASTRUCTURE

The science graph is a knowledge graph for scholarly communication. It is the core of a socio-technical infrastructure that develops and maintains the graph and operates services. The science graph represents scientific information. It does not merely link (metadata about) people, documents, datasets, institutions, grants, etc. but rather represents research contributions semantically, i.e., explicitly and formally. While an ontology that conceptualizes research contributions is yet to be developed, a semantic description for a research contribution should, at the very least, relate the problem(s) addressed by the contribution with the methods used and results obtained. Problems, methods, and results are semantic resources themselves. For instance, a result such as the statement “There is a significant difference in the mean duration of a phenomenon X (e.g., particle formation in the atmosphere) between winter and summer seasons” is not merely a natural language sentence but an identified semantic resource in the knowledge graph.
This resource is furthermore linked to the research contribution (and thus the authors and affiliation) and the methods used to obtain the result.

The science graph is populated and curated by the infrastructure via four complementary sources. First, the infrastructure leverages existing metadata, data, taxonomies, ontologies, and information models. Second, it provides services that enable direct contributions from scientists who describe their research, supported by intelligent interfaces and automatically generated suggestions. Third, it implements automated methods for information extraction and linking. Fourth, it supports curation and quality assurance by domain experts, librarians and information scientists.

We argue that in order to succeed it is necessary to combine these different sources and curatorial methods. Automated procedures do not achieve the necessary coverage and accuracy while fully manual curation is too time-consuming. Moreover, librarians and information scientists lack the necessary domain expertise while scientists lack the necessary expertise in knowledge representation. By combining the four strategies we can bring their respective strengths to bear and compensate for relative weaknesses.

¹ National Science Foundation: Science and Engineering Publication Output Trends: https://www.nsf.gov/statistics/2018/nsf18300/nsf18300.pdf

The science graph infrastructure provides services for interlinking, integration, visualization, exploration, and search. It enables scientists to gain a much faster overview of new developments in a specific field and identify relevant research problems. It represents the evolution of the scientific discourse in the individual disciplines and enables scientists to make their work more accessible to colleagues as well as partners in industry, policy, and society at large.

We suggest that a minimally viable infrastructure must comprise the following technical components.
First, a data model for representing scholarly communication semantically. The data model can adopt RDF and Linked Data as a scaffold, but must add comprehensive provenance, evolution, and discourse information. Second, the infrastructure must include a scalable graph-storage backend to store information and expose a comprehensive API for interacting with the knowledge graph. Third, we require user interface widgets and components for collaborative authoring and curation of the graph and integration of these widgets into third-party services. Finally, the infrastructure must support semi-automated semantic integration, search, extraction, and recommendation services to support the curation of the knowledge graph.

3 INFRASTRUCTURE IMPLEMENTATION

At its core, the infrastructure consists of a scalable data management system with a flexible graph-based data model that can be accessed via lightweight APIs. To ensure maximum interoperability, it implements the long-established open standards RDF, RDF Schema, OWL, and Linked Data as well as W3C Data on the Web and the FAIR Data Principles. A central aspect is the preservation of provenance and evolution, so that changes can be tracked transparently at any time. The user interface supports flexible elements, which can be contributed by advanced users themselves to enable customized domain-specific interactions.

3.1 Ontology

As already stated by Ehrlinger and Wöß [10], ontologies are core elements of a knowledge graph insofar as all information that is acquired as input for the graph is integrated into a network of ontologies underlying the graph. In information science, the term “ontology” has many definitions; a majority of them build on the formulation originally proposed by Gruber [14], who defined an ontology as “an explicit specification of a conceptualization”. Thus, since ontologies are the basis of the conceptualization of scholarly communication, they are a core element of the science graph infrastructure.
Attempts to “conceptualize science” using methods from knowledge engineering are, predictably, not entirely novel, and there are already several existing suggestions for ontologies trying to cover the scientific research process.² However, fundamental questions such as “What is research?”, “What are the contents of scholarly communication?”, and “What are the relevant components of a research contribution?” are notoriously difficult to answer, and the answers remain hard to formalize. We have decided to start by focusing on “research contribution” as an abstract central concept of a possible top-level ontology, postponing considerations of feasibility and usefulness until we have gathered more experience with which to tackle those questions. We suggest that a research contribution communicates one or more results in an attempt to address one or more problems using a set of methods. This nucleus is certainly subject to review and extension.

² https://derivadow.com/2011/04/19/science-ontology-take-three/

[Figure 1: Architecture Diagram. The figure shows all layers and adapters. Possible adapters or storage solutions that are not currently available are displayed with dashed lines. Diagram labels: Application (REST API, SPARQL endpoint, GraphQL); Services; Domain (Domain Model, Authentication & Authorization); Persistence (Neo4j, Blazegraph, Amazon Neptune, ...).]

The next necessary step will be to define specific and adequate knowledge engineering workflows for the development of a core ontology (or, more extensively, a network of top-level and domain ontologies) that can be used as the basis of the science graph infrastructure in order to support the storing of information.

It is fairly obvious that any abstract concept and top-level ontology will need to be specialized and branched out for different fields of science.
What we call “problem” may be more commonly known as “hypothesis” in the natural sciences and “research question” in engineering. Furthermore, these specialized concepts may be conceptualized differently, i.e., feature different attributes and accordingly entail different conclusions concerning their subconcepts. It is even less obvious how concepts are established and determined. A top-down approach, whereby a small group of experts designs top-level and domain-specific ontologies as well as their alignment with existing ontologies, seems to be a daunting task with an uncertain outcome. A bottom-up approach, whereby concepts, relations, and conceptualizations are crowd-sourced and thus emerge from the submission of semi-structured data by researchers, may be an interesting way to derive an ontology but comes with its own challenges, for instance the acquisition of the resources needed for continuous curation and formalization of the submitted data. The design of a practical ontology engineering workflow that finds the right balance between those options and incorporates as many of their positive aspects as possible will need careful attention and should occupy a large portion of the next phase in the construction of the science knowledge graph.

3.2 Backend

The backend features a layered architecture consisting of three layers: application layer, domain layer, and persistence layer. Figure 1 provides an overview of the layers and components. Inspired by the Hexagonal Architecture [8], the application layer contains ports and adapters. These implement the interface to the outside world and contain the application logic needed so that clients can access the information contained in the knowledge graph. The domain layer contains the domain model from which the knowledge graph is built. It also contains the authentication and authorization component that allows us to store provenance information as part of the domain model.
As the lowest layer in the architecture, the persistence layer is responsible for storing all data. Since we are currently evaluating different storage options, the persistence layer includes abstractions that can be implemented differently for the respective storage solutions.

The backend is implemented around a data model that builds on RDF. Hence, triples consisting of a subject, a predicate and an object are the elementary entities. Triples also carry provenance information, such as the time of creation and the author. Resources are entities identified by an ID and carry a label for display purposes. Subjects and predicates are resources and are always referenced by their ID. Objects are either resources or (typed) literal values.

All data inserted into the knowledge graph will be made persistent via a layer that is agnostic of any specific storage technology. Because the data model is compatible with RDF, data can be translated to and from RDF, so an off-the-shelf triple store could serve as the storage technology. However, since we also want to make statements about triples, which is not well supported in RDF, we decided to use a labeled property graph (LPG) instead; accordingly, our current implementation uses Neo4j. Data modifications are preserved and can be queried. Currently, we only allow additions and deletions.

Data can be modified and queried via a REST API implemented as an adapter of the application layer. We adopt JSON as the serialization format. Other possible adapters include a SPARQL endpoint or a GraphQL interface. The knowledge graph can be queried openly and without registration. However, users are required to register in order to contribute data to the knowledge graph. Possible queries include the search for resources by ID or label, or the retrieval of lists of statements filtered by resource identifiers. The REST API is currently used by the frontend to power a user interface to the knowledge graph, as well as visualizations.
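The backend data model described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`Resource`, `Literal`, `Statement`, `InMemoryGraph`), the JSON field names, and the in-memory store standing in for a storage adapter are all assumptions made for illustration. The sketch only mirrors what the text states: resources have an ID and a label, statements carry provenance, only additions and deletions are allowed, and modifications are preserved.

```python
from __future__ import annotations

import itertools
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Union


@dataclass(frozen=True)
class Resource:
    """An entity identified by an ID, with a label for display purposes."""
    id: str
    label: str


@dataclass(frozen=True)
class Literal:
    """A typed literal value (the datatype names are an assumption)."""
    value: str
    datatype: str = "string"


@dataclass(frozen=True)
class Statement:
    """A subject-predicate-object triple that also carries provenance."""
    id: int
    subject: Resource
    predicate: Resource
    object: Union[Resource, Literal]
    created_by: str
    created_at: datetime


class InMemoryGraph:
    """Hypothetical stand-in for a storage adapter of the persistence layer."""

    def __init__(self) -> None:
        self._resources: dict[str, Resource] = {}
        self._statements: dict[int, Statement] = {}
        self._history: list[tuple[str, Statement]] = []  # ("add" | "delete", statement)
        self._ids = itertools.count(1)

    def add_resource(self, resource_id: str, label: str) -> Resource:
        resource = Resource(resource_id, label)
        self._resources[resource_id] = resource
        return resource

    def add_statement(self, subject, predicate, obj, author: str) -> Statement:
        stmt = Statement(next(self._ids), subject, predicate, obj,
                         author, datetime.now(timezone.utc))
        self._statements[stmt.id] = stmt
        self._history.append(("add", stmt))
        return stmt

    def delete_statement(self, statement_id: int) -> None:
        # The deleted statement stays in the history, so changes remain traceable.
        stmt = self._statements.pop(statement_id)
        self._history.append(("delete", stmt))

    def find_by_label(self, text: str) -> list[Resource]:
        needle = text.lower()
        return [r for r in self._resources.values() if needle in r.label.lower()]

    def statements_about(self, resource_id: str) -> list[Statement]:
        return [s for s in self._statements.values()
                if s.subject.id == resource_id
                or (isinstance(s.object, Resource) and s.object.id == resource_id)]

    def history(self) -> list[tuple[str, Statement]]:
        return list(self._history)


def statement_to_json(stmt: Statement) -> str:
    """Serialize a statement to JSON; the field names are illustrative only."""
    def encode(node):
        if isinstance(node, Resource):
            return {"type": "resource", "id": node.id, "label": node.label}
        return {"type": "literal", "value": node.value, "datatype": node.datatype}

    return json.dumps({
        "id": stmt.id,
        "subject": encode(stmt.subject),
        "predicate": encode(stmt.predicate),
        "object": encode(stmt.object),
        "created_by": stmt.created_by,
        "created_at": stmt.created_at.isoformat(),
    })
```

Keeping provenance on the statement itself, rather than in separate triples, corresponds to the motivation given above for preferring a property graph over a plain triple store.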
3.3 Frontend

The user interface provides access to the knowledge graph, specifically research contribution descriptions and the resources they are related to, currently in two primary forms: hierarchical and graphical. The main page of the user interface includes a search form to allow a search for resources by their labels. The resulting resources are then displayed in both graphical and hierarchical form.

At the top, the user interface visualizes the graphical form (Figure 2). It displays the found or selected resources as well as their relations to other resources. The graph visualized in this form can be navigated: nodes can be selected to display related information. The interface supports navigating from node to node while information about the currently selected node is automatically updated. While navigating, the displayed part of the graph is updated as well so that the user is presented with the nodes that have direct connections to the currently selected one, while other nodes disappear.

[Figure 2: Graphical representation of a research contribution as a knowledge graph. Labels in the figure: Knowledge Engineering, Design of ontologies, Gruber’s design of ontologies, Case studies, Design criteria for ontologies; problems, methods, results.]

In the hierarchical view, the information is displayed in the form of nested lists (Figure 3). Here, the found or selected resources are elements of the outer list while the inner list contains the related resources and literals. Hence, the links in the hierarchical view present an additional way to navigate information. Each link triggers the selection of the corresponding resource. The information about the selected resource is reflected both in the hierarchical and graph views.

The hierarchical view enables users to add new information to the knowledge graph. Submitting information should be straightforward in order to reduce the burden on researchers.
Currently, and subject to improvements, each top-level resource in the hierarchy displays a button that provides the functionality to add related resources (e.g., a relationship to another resource or literal). When adding a new resource, the user is asked to provide the literal or the title for the resource as well as the relation between them.

[Figure 3: Hierarchical representation of two research contributions as a knowledge graph]

4 FUTURE WORK

The work presented here delineates our initial steps towards a knowledge graph for science. By testing existing and developing new components, we have so far focused on some core technical aspects of the infrastructure. Naturally, there are a number of research problems and implementation issues as well as a range of socio-technical aspects that need to be addressed in order to realize the vision. Dimensions of open challenges are, among others:

• the low-threshold integration of researchers through methods of crowd-sourcing, human-machine interaction, and social networks;
• automated analysis, quality assessment, and completion of the knowledge graph as well as interlinking with external sources;
• support for representing fuzzy information, scientific discourse and the evolution of knowledge;
• development of new methods of exploration, retrieval, and visualization of knowledge graph information.

Several projects have demonstrated how to represent general encyclopedic and factual information in knowledge graphs (see Section 5). An open challenge is how to represent scholarly communication in specialized fields of science. Since precise conceptual structures emerge and evolve over time, the representation of discourse, opinion-forming, and evolution is of particular interest. A knowledge graph for science needs to accommodate fuzzy definitions, diverging opinions, and competing conceptualizations.
The integration of information from documents is critical and relies on natural language processing and information mining methods for text, images, and other media. The maturity of current methods is arguably insufficient to construct a rich knowledge graph from legacy documents in an automated manner. The science graph infrastructure thus relies on numerous complementary approaches to acquire information. Researchers are an important source but must be supported with automated suggestions and recommendations for populating the graph in order to reduce the manual effort.

Another open challenge is the question of how to organize collaboration and interaction among researchers, librarians, information scientists, and knowledge engineers. An integration of particular interest is the one between the science graph infrastructure and publishers. To address the issue of when best to capture researchers’ contributions to the knowledge graph, a collaboration with publishers could enable the acquisition of contributions at the time of article submission through the respective submission systems.

Existing scholarly communication incentive measures (e.g., citations, h-index, i10-index, impact factor) are document-centric and thus rather coarse-grained. Clearly, we need incentive models for contributions to the knowledge graph for science. We argue that graph-centric measures are an opportunity for a more accurate assessment of scholarly contributions.

5 RELATED WORK

Knowledge graphs such as DBpedia [2], YAGO [16] and Wikidata [25] as well as similar industrial initiatives by Google, Bing, IBM, BBC, or Thomson Reuters have demonstrated that representing encyclopedic and factual knowledge using RDF and Linked Data is feasible.
However, while there has been a vast amount of work related to representing and managing bibliographic metadata, relatively little work focuses on representing the information contained inside scientific publications semantically. The Semantic Publishing and Referencing (SPAR) Ontologies [21] focus primarily on metadata but also on document structure to some extent. There has been some work on enriching various document formats with semantic annotations. Examples include Dokie.li [6], RASH [22] or MicroPublications [7] for HTML and SALT [13] for LaTeX. We started representing key findings of survey articles focusing on semantically describing research problems, approaches, implementations and evaluations in [11] and integrating bibliographic information in a knowledge graph [23]. Other work focused on developing ontologies for representing scholarly knowledge in specific domains, for example mathematics [19], the RXNO ontology in chemistry or the OBO Foundry ontologies [24] in the life sciences. A knowledge graph for science must go beyond such efforts, by enabling the parallel and synchronized creation, curation and augmentation of both terminological/ontological as well as assertional and discourse knowledge. For representing provenance and discourse we can build on the PROV ontology [20] and Document Components Ontology [9]. While there has been work on argumentation and reasoning in AI (e.g. [3, 12]) and philosophy (often using specialized formalisms), more work needs to be done to represent argumentation, concept drift and scholarly knowledge evolution in knowledge graphs. The RDF data model and respective ontologies arguably appear adequate as a scaffold for representing scholarly knowledge. However, aspects such as provenance, evolution and discourse are more difficult to represent in pure RDF (see the ongoing discussion about reification). 
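Why provenance is awkward in a pure triple model can be illustrated with a minimal sketch, which is not part of the original article. It uses plain Python tuples rather than an RDF library; the `ex:` identifiers and the `ex:createdBy` property are hypothetical, while `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` are the standard RDF reification vocabulary. Naming a single statement so that an author can be attached to it takes four additional triples before the provenance triple itself.

```python
def reify(triple, statement_id):
    """Describe a triple using the standard RDF reification vocabulary."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]


def with_provenance(triple, statement_id, author):
    """Return the original triple, its reification, and one provenance triple."""
    # "ex:createdBy" is a hypothetical property used only for this illustration.
    return [triple] + reify(triple, statement_id) + [(statement_id, "ex:createdBy", author)]


triples = with_provenance(
    ("ex:Contribution1", "ex:addresses", "ex:Problem1"),
    statement_id="ex:stmt1",
    author="ex:alice",
)
```

In a property graph, by contrast, such an attribute can be attached directly to the edge that represents the statement.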
While there are by now relatively elegant solutions such as RDF singleton properties [26], which can be used for representing and exchanging semantic data, we need to investigate how graph data management techniques (e.g. using the Gremlin graph query algebra [17]) can be employed for storing and managing the extremely large amounts of interconnected scholarly communication data and metadata. Hence, we argue that a knowledge graph for science can build on, but must extend, the triple (or quad) data model in RDF.

The scholarly communication community has initiated numerous related projects. The Research Graph [1] is a prominent example of an effort that aims to link research objects, in particular publications, datasets, and researcher profiles. The Scholix project [5], driven by a corresponding Research Data Alliance working group and associated organizations, aims at standardizing the information about the links between scholarly literature and data exchanged among publishers, data repositories, and infrastructures such as DataCite, Crossref, and OpenAIRE. Other related projects include Research Objects [4], which proposes a machine-readable abstract structure that relates the products of a research investigation, including articles but also data and other research artefacts, as well as the RMap Project [15], which aims at preserving “the many-to-many complex relationships among scholarly publications and their underlying data.”

6 CONCLUSIONS

The transition from a purely document-centric to a more knowledge-based view on scholarly communication is in line with the current digital transformation of information flows in general and is thus inevitable. However, this also creates a need for the implementation of corresponding tools and workflows supporting the switch.
As of now, there are still very few of those tools, and their design and concrete features remain a challenge that is yet to be tackled, collaboratively and in a coordinated manner.

ACKNOWLEDGMENTS

The authors would like to thank the participants of a related workshop held at TIB on March 20, 2018 for their contributions to current developments on the Open Research Knowledge Graph, a project recently initiated and coordinated by TIB.

REFERENCES

[1] Amir Aryani and Jingbo Wang. 2017. Research Graph: Building a Distributed Graph of Scholarly Works using Research Data Switchboard. In Open Repositories CONFERENCE (2017-06-01). https://doi.org/10.4225/03/58c696655af8a
[2] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In The Semantic Web. 722–735. https://doi.org/10.1007/978-3-540-76298-0_52
[3] Pietro Baroni, Marco Romano, Francesca Toni, Marco Aurisicchio, and Giorgio Bertanza. 2015. Automatic evaluation of design alternatives with quantitative argumentation. Argument & Computation 6, 1 (2015), 24–49. https://doi.org/10.1080/19462166.2014.1001791
[4] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, and Carole Goble. 2013. Why linked data is not enough for scientists. Future Generation Computer Systems 29, 2 (2013), 599–611. https://doi.org/10.1016/j.future.2011.08.004 Special section: Recent advances in e-Science.
[5] Adrian Burton, Hylke Koers, Paolo Manghi, Markus Stocker, Martin Fenner, Amir Aryani, Sandro La Bruzzo, Michael Diepenbroek, and Uwe Schindler. 2017. The Scholix Framework for Interoperability in Data-Literature Information Exchange. D-Lib Magazine 23, 1/2 (2017). https://doi.org/10.1045/january2017-burton
[6] Sarven Capadisli, Amy Guy, Ruben Verborgh, Christoph Lange, Sören Auer, and Tim Berners-Lee. 2017. Decentralised Authoring, Annotations and Notifications for a Read-Write Web with dokieli. In International Conference on Web Engineering. 469–481. https://doi.org/10.1007/978-3-319-60131-1_33
[7] Tim Clark, Paolo N Ciccarese, and Carole A Goble. 2014. Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. Journal of Biomedical Semantics 5, 1 (2014). https://doi.org/10.1186/2041-1480-5-28
[8] Alistair Cockburn. 2018. Hexagonal architecture. http://alistair.cockburn.us/Hexagonal+architecture
[9] Alexandru Constantin, Silvio Peroni, Steve Pettifer, David Shotton, and Fabio Vitali. 2016. The document components ontology (DoCO). Semantic Web 7, 2 (2016), 167–181. https://doi.org/10.3233/sw-150177
[10] Lisa Ehrlinger and Wolfram Wöß. 2016. Towards a Definition of Knowledge Graphs. In Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS’16), Michael Martin, Martí Cuquet, and Erwin Folmer (Eds.), Vol. 1695. CEUR-WS, Leipzig, Germany. http://ceur-ws.org/Vol-1695/paper4.pdf
[11] Said Fathalla, Sahar Vahdati, Sören Auer, and Christoph Lange. 2017. Towards a Knowledge Graph Representing Research Findings by Semantifying Survey Articles. In Research and Advanced Technology for Digital Libraries. 315–327. https://doi.org/10.1007/978-3-319-67008-9_25
[12] Thomas F. Gordon and Nikos Karacapilidis. 1997. The Zeno argumentation framework. In Proceedings of the sixth international conference on Artificial intelligence and law - ICAIL ’97. ACM, 10–18. https://doi.org/10.1145/261618.261622
[13] Tudor Groza, Siegfried Handschuh, Knud Möller, and Stefan Decker. 2007. SALT - Semantically Annotated LaTeX for Scientific Publications. In Extended Semantic Web Conference. 518–32. https://doi.org/10.1007/978-3-540-72667-8_37
[14] Thomas R. Gruber. 1993. A translation approach to portable ontology specifications. Knowledge Acquisition 5, 2 (June 1993), 199–220. https://doi.org/10.1006/knac.1993.1008
[15] Karen L. Hanson, Tim DiLauro, and Mark Donoghue. 2015. The RMap Project: Capturing and Preserving Associations Amongst Multi-Part Distributed Publications. In Proceedings of the 15th ACM/IEEE-CE on Joint Conference on Digital Libraries - JCDL ’15. ACM, 281–282. https://doi.org/10.1145/2756406.2756952
[16] Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence 194 (2013), 28–61. https://doi.org/10.1016/j.artint.2012.06.001
[17] Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust Disambiguation of Named Entities in Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’11). Association for Computational Linguistics, Stroudsburg, PA, USA, 782–792. http://dl.acm.org/citation.cfm?id=2145432.2145521
[18] John P. A. Ioannidis. 2005. Why Most Published Research Findings Are False. PLOS Medicine 2, 8 (08 2005). https://doi.org/10.1371/journal.pmed.0020124
[19] Christoph Lange. 2013. Ontologies and languages for representing mathematical knowledge on the Semantic Web. Semantic Web 4, 2 (2013), 119–158. https://doi.org/10.3233/SW-2012-0059
[20] Timothy Lebo, Satya Sahoo, Deborah McGuinness, Khalid Belhajjame, James Cheney, David Corsar, Daniel Garijo, Stian Soiland-Reyes, Stephan Zednik, and Jun Zhao. 2013. PROV-O: The PROV Ontology. Recommendation. W3C.
[21] Silvio Peroni. 2014. The Semantic Publishing and Referencing Ontologies. In Semantic Web Technologies and Legal Scholarly Publishing. Law, Governance and Technology, Vol. 15. Springer, Cham, 121–193. https://doi.org/10.1007/978-3-319-04777-5_5
[22] Silvio Peroni, Francesco Osborne, Angelo Di Iorio, Andrea Giovanni Nuzzolese, Francesco Poggi, Fabio Vitali, and Enrico Motta. 2017. Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles. PeerJ Computer Science 3 (2017), e132. https://doi.org/10.7717/peerj-cs.132
[23] Afshin Sadeghi, Christoph Lange, Maria-Esther Vidal, and Sören Auer. 2017. Integration of Scholarly Communication Metadata Using Knowledge Graphs. In Research and Advanced Technology for Digital Libraries. 328–341. https://doi.org/10.1007/978-3-319-67008-9_26
[24] Barry Smith, Michael Ashburner, Cornelius Rosse, Jonathan Bard, William Bug, Werner Ceusters, Louis J Goldberg, Karen Eilbeck, Amelia Ireland, Christopher J Mungall, Neocles Leontis, Philippe Rocca-Serra, Alan Ruttenberg, Susanna-Assunta Sansone, Richard H Scheuermann, Nigam Shah, Patricia L Whetzel, and Suzanna Lewis. 2007. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology 25, 11 (2007), 1251–1255. https://doi.org/10.1038/nbt1346
[25] Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A Free Collaborative Knowledgebase. Commun. ACM 57, 10 (2014), 78–85. https://doi.org/10.1145/2629489
[26] Daya C. Wimalasuriya and Dejing Dou. 2010. Ontology-based information extraction: An introduction and a survey of current approaches. Journal of Information Science 36, 3 (2010), 306–323. https://doi.org/10.1177/0165551509360123