Abstract
The OWL Reasoner Evaluation (ORE) Competition is an annual competition (with an associated workshop) which pits OWL 2 compliant reasoners against each other on various standard reasoning tasks over naturally occurring problems. The 2015 competition was the third of its kind and had 14 reasoners competing in six tracks comprising three tasks (consistency, classification, and realisation) over two profiles (OWL 2 DL and EL). In this paper, we outline the design of the competition and present the infrastructure used for its execution: the corpora of ontologies, the competition framework, and the submitted systems. All resources are publicly available on the Web, allowing users to easily re-run the 2015 competition, or reuse any of the ORE infrastructure for reasoner experiments or ontology analysis.
1 Introduction
The Web Ontology Language (OWL) is in its second iteration (OWL 2) and has seen significant adoption, especially in Health Care and Life Sciences. OWL 2 DL can be seen as a variant of the description logic \(\mathcal {SROIQ}\), with the various other profiles being either subsets (e.g., OWL 2 EL) or extensions (e.g., OWL 2 Full). Description logics are generally designed to be computationally practical so that, even if they do not have tractable worst-case complexity for key services, they nevertheless admit implementations that seem to work well in practice [2]. Unlike in the early days of description logics or even of the direct precursors of OWL (DAML+OIL), the reasoner landscape for OWL is rich, diverse, and highly compliant with a common, detailed specification. Thus, we have a large number of high-performance, production-quality reasoners with similar core capabilities (with respect to language features and standard inference tasks).
Research on optimising OWL reasoning continues apace, though empirical work still lags both theoretical and engineering work in breadth, depth, and sophistication. There is, in general, a lack of shared understanding of test cases, test scenarios, infrastructure, and experiment design. A common strategy in research communities to help address these issues is to hold competitions, that is, experiments designed and hosted by third parties on an independent (often constrained, but sometimes expanded) infrastructure. Such competitions (in contrast to published benchmarks) typically do not directly provide strong empirical evidence about the competing tools. Instead, they serve two key functions: (1) they provide a clear, motivating event that helps drive tool development (e.g., for correctness or performance) and (2) components of the competition are useful for subsequent research. Finally, competitions can be great fun and help foster a strong community. They can be especially useful for newcomers by providing a simple way to gain some prima facie validation of their tools without the burden of designing and executing complex experiments themselves.
Toward these ends, we have been running a competition for OWL reasoners (with an associated workshop): The OWL Reasoner Evaluation (ORE) competition. ORE has been running, in substantially its current form, for three years, and this year it was held in conjunction with the 28th International Description Logic Workshop (DL 2015)Footnote 1 in June 2015. A report on the ORE 2015 competition results and analysis is under submission [15]. In this paper we focus on the elements of ORE 2015 that are reusable by the general public. To that end, we describe the design of the 2015 competition, which provides a reasonable default structure for reasoner comparison. We also describe the competition infrastructure: the corpora of ontologies, the competition framework, and the submitted systems. All resources are publicly available on the Web, allowing users to easily re-run the 2015 competition, or reuse any of the ORE infrastructure for reasoner experiments, benchmarks, debugging, or improvement, or for ontology analysis.
2 Competition Design
The ORE competition is inspired by and modeled on the CADE ATP System Competition (CASC) [16, 23] which has been running for 25 years and has been heavily influential in the automated theorem proving communityFootnote 2 (especially for first order logic). The key common elements between ORE and CASC are:
1. A number of distinct tracks/divisions/disciplines characterised by problem type (e.g., "effectively propositional" or "OWL 2 EL ontology").
2. The test problems are drawn from a large, neutral set of problems that is updated yearly (e.g., for CASC, the TPTP library [22]).
3. Reasoners compete (primarily) on the number of problems solved within a tight per-problem timeout.
The last point is worth some comment. Most evaluations of reasoner performance in the literature use some form of time (e.g., CPU or wall-clock time). This has several advantages, including capturing the primary quantity of interest for most users in most situations. However, it is vulnerable to a number of problems, especially as one starts testing on large numbers of diverse ontologies. For example, one reasoner might perform very well on a large number of small ontologies and comparatively poorly on a few larger ones, whereas another might do worse on all the small ones (due to start-up overhead) and much better on the larger ones. Yet their averages and even medians might be similar.
More critically, timeouts severely distort aggregate statistics about time. If we include timed-out runs in the statistics, they crop the measured times: given a timeout of two hours, we cannot distinguish a reasoner that would take two hours and one minute from one that would take days. Reasoner errors cause similar issues. If we drop those times, buggy reasoners seem to do better; even if we include them, less buggy reasoners that time out, or that simply take longer to finish correctly than the buggy reasoners take to hit an error, will be penalised. Measuring the number of problems solved does not fully ameliorate these problems and introduces some new ones, but it is more robust for simple comparisons, and so serves as a better default experiment design.
As description logics have a varied set of core inference services supported by essentially all reasoners, ORE also has track distinctions based on task (e.g., classification or realisation). ORE 2015 had both a live and an offline competition. The offline competition is executed with more relaxed time constraints against user-submitted ontologies, while the live competition is executed with a tight timeout against a corpus of ontologies we constructed.
3 Ontology Corpora
In the following sections we present the publicly available corpora of ontologies constructed for the live competition and the user-submitted ontologies. Ontology pre-processing was done using the OWL API (v3.5.1) [4].
3.1 Live Competition Corpus
The full live competition corpus contains 1,920 ontologies, sampled from three source corpora: a January 2015 snapshot of BioPortal [12] containing 330 biomedical ontologies, the Oxford Ontology LibraryFootnote 3 with 793 ontologies that were collected for the purpose of ontology-related tool evaluation, and MOWLCorp [6], a corpus based on a 2014 snapshot of a Web crawl containing around 21,000 unique ontologies. As a first step, the ontologies of all three source corpora were collected and serialised into OWL/XML, with their imports closure merged into a single ontology. The merging is, from a competition perspective, necessary to mitigate the bottleneck of loading potentially large imports repeatedly over the network, and because the hosts of frequently imported ontologies sometimes impose restrictions on the number of simultaneous accesses.Footnote 4 After the collection, the entire pool of ontologies was divided into three groups: (1) ontologies with fewer than 50 axioms, (2) OWL 2 DL ontologies, and (3) OWL 2 Full ontologies. The first group was removed from the pool.
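To illustrate the kind of pre-processing involved, the following is a minimal sketch, assuming OWL API 3.x class names, of merging an ontology's imports closure into a single, import-free ontology and serialising it into OWL/XML. It is not the actual ORE pre-processing code, only an illustration of the merging step described above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.HashSet;
import java.util.Set;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.OWLXMLOntologyFormat;
import org.semanticweb.owlapi.model.*;

public class ImportsMerger {

    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        // Load the root ontology; the OWL API resolves owl:imports transitively.
        OWLOntology root = man.loadOntologyFromOntologyDocument(new File(args[0]));

        // Collect the axioms of the whole imports closure into a single set.
        Set<OWLAxiom> axioms = new HashSet<OWLAxiom>();
        for (OWLOntology o : root.getImportsClosure()) {
            axioms.addAll(o.getAxioms());
        }

        // Create a fresh, import-free ontology containing the merged axioms
        // and serialise it into OWL/XML.
        OWLOntologyManager out = OWLManager.createOWLOntologyManager();
        OWLOntology merged = out.createOntology(axioms,
                IRI.create("urn:ore:merged:" + new File(args[0]).getName()));
        out.saveOntology(merged, new OWLXMLOntologyFormat(), new FileOutputStream(args[1]));
    }
}
```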
As reasoner developers could tune their reasoners towards the ontologies in the three publicly available source corpora, we included a number of approximations in our pool. The entire set of OWL 2 Full ontologies was approximated into OWL 2 DL, i.e., we used a (slightly modified) version of the OWL API profile checker to drop DL profile-violating axioms so that the remainder is in OWL 2 DL [8]. Because of some imperfections in this "DLification" process, it had to be performed twice. For example, in the first round, the DL profile checker may have noted a missing declaration and an illegal punning. Fixing this would result in dropping the axiom(s) causing the illegal punning as well as injecting the declaration, which could in turn introduce a new illegal punning.
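The following simplified sketch illustrates the idea behind the "DLification" step: it repeatedly asks the (unmodified) OWL API profile checker for OWL 2 DL violations and drops the offending axioms until nothing changes. The actual ORE process used a slightly modified checker that also repairs violations (e.g., by injecting missing declarations), which is why a second pass was needed; this sketch only drops axioms.

```java
import java.util.HashSet;
import java.util.Set;

import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.profiles.OWL2DLProfile;
import org.semanticweb.owlapi.profiles.OWLProfileReport;
import org.semanticweb.owlapi.profiles.OWLProfileViolation;

public class DLifier {

    /** Repeatedly drops axioms flagged by the OWL 2 DL profile checker
     *  until the ontology no longer changes. */
    public static void makeDL(OWLOntology ont) {
        OWL2DLProfile dl = new OWL2DLProfile();
        boolean changed = true;
        while (changed) {
            OWLProfileReport report = dl.checkOntology(ont);
            Set<OWLAxiom> toDrop = new HashSet<OWLAxiom>();
            for (OWLProfileViolation v : report.getViolations()) {
                if (v.getAxiom() != null) {
                    toDrop.add(v.getAxiom());
                }
            }
            int before = ont.getAxiomCount();
            ont.getOWLOntologyManager().removeAxioms(ont, toDrop);
            // Stop once dropping violating axioms no longer changes the ontology.
            changed = ont.getAxiomCount() < before;
        }
    }
}
```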
The OWL 2 DL group was then approximated into OWL 2 EL using the approximation method employed by TrOWL [17]. As some ontologies are included in more than one of the source corpora, we excluded at this point (as a last pre-processing step) all duplicatesFootnote 5 from the entire pool of ontologies, and removed ontologies whose TBoxes contain fewer than 50 axioms. This left us with the full competition dataset of 1,920 unique OWL 2 DL ontologies. The full competition corpus can be obtained from Zenodo [9].
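Duplicate detection (cf. Footnote 5) can be illustrated with a small sketch: two ontologies are treated as duplicates if their Functional Syntax serialisations coincide, which can be checked cheaply by comparing hashes of the serialised bytes. This is an illustrative approximation assuming OWL API 3.x class names, not the exact ORE implementation.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;

import org.semanticweb.owlapi.io.OWLFunctionalSyntaxOntologyFormat;
import org.semanticweb.owlapi.model.OWLOntology;

public class DuplicateKey {

    /** Fingerprint of an ontology: a hash of its Functional Syntax
     *  serialisation. Ontologies with equal fingerprints are treated as
     *  duplicates and all but one are removed from the pool. */
    public static String fingerprint(OWLOntology ont) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ont.getOWLOntologyManager().saveOntology(ont,
                new OWLFunctionalSyntaxOntologyFormat(), bytes);
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(bytes.toByteArray());
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```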
3.2 User Submitted Ontologies
We had four user submissions to ORE 2015, consisting of a total of seven ontologies. The user submissions underwent the same pre-processing procedures as the corpus (Sect. 3.1). This occasionally had large consequences for the ontologies, most importantly with respect to rules (which were stripped out) and any axioms beyond OWL 2 DL (for example, axioms redefining built-in vocabulary or violating the global constraints on role hierarchies; see [8]).
We redistribute all user-submitted ontologies for which we have permission to do so. Occasionally, a user-submitted ontology is proprietary and cannot be redistributed. On the one hand, we would prefer all ontologies to be fully shareable; on the other, we want the widest reach possible. Currently, very few "restricted" ontologies have been submitted, so the broader outreach seems worth it. We work with submitters to make those ontologies as accessible as possible. Some basic metrics for the ontologies can be found in Table 1. The ontology archive is published on Zenodo, and linked to from http://owl.cs.manchester.ac.uk/publications/supporting-material/ore-2015-report.
4 Competing Systems
Fourteen reasoners were submitted, 11 purporting to cover OWL 2 DL and 3 being OWL 2 EL-specific. The set of competing systems (as submitted) is available on the Web.Footnote 11 Table 2 briefly summarises the participating reasoning systems. More detailed information about each reasoner can be found onlineFootnote 12 as well as in our recently conducted OWL reasoner survey [7]. The version information reflects the state of each system as it was submitted to ORE.
5 Test Framework
The test framework used in ORE 2015 is a slightly modified version of the one used for ORE 2014, which is implemented in Java, open-sourced under the LGPL license, and versioned and distributed on GitHub.Footnote 13
The framework takes a "script wrapper" approach to executing reasoners, instead of, for example, requiring all reasoners to use (a specific version of) the OWL API. While this puts some extra burden on established reasoners with good OWL API bindings, it, combined with the requirement to handle only some OWL 2 standard syntax (with the easy to parse and serialise functional syntax [11] as a common choice), makes it very easy for new reasoners to participate even if they are written in languages that are hard to integrate with the JVM. There is a standard script for OWL API based reasoners, so it is fairly trivial to prepare an OWL API wrapped reasoner for the competition; a sketch of such a wrapper is given below. On the other hand, lowering this barrier has a downside: we do want to encourage reasoners to provide good OWL API support, since that makes them accessible to the plethora of tools which use the OWL API.
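The sketch below shows what a minimal OWL API based wrapper might look like. The command-line convention and output format shown are assumptions for illustration only (the actual interface is defined by the ORE starter scripts), and the OWL API's StructuralReasonerFactory stands in for the submitted reasoner's own OWLReasonerFactory.

```java
import java.io.File;
import java.io.PrintWriter;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.InferenceType;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;
import org.semanticweb.owlapi.reasoner.structural.StructuralReasonerFactory;

public class WrappedReasoner {

    // Hypothetical argument convention: <task> <ontology file> <output file>.
    public static void main(String[] args) throws Exception {
        String task = args[0];
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        OWLOntology ont = man.loadOntologyFromOntologyDocument(new File(args[1]));

        // Replace with the submitted reasoner's own OWLReasonerFactory;
        // the OWL API's structural reasoner is only a placeholder here.
        OWLReasonerFactory factory = new StructuralReasonerFactory();
        OWLReasoner reasoner = factory.createReasoner(ont);

        // Only the reasoning call itself is timed; parsing and loading are excluded.
        long start = System.currentTimeMillis();
        if ("classification".equals(task)) {
            reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY);
        } else if ("realisation".equals(task)) {
            reasoner.precomputeInferences(InferenceType.CLASS_ASSERTIONS);
        } else { // consistency
            reasoner.isConsistent();
        }
        long wallClockMs = System.currentTimeMillis() - start;

        // Illustrative output only; the real framework expects its own report format.
        PrintWriter out = new PrintWriter(new File(args[2]));
        out.println(task + " completed in " + wallClockMs + " ms");
        out.close();
    }
}
```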
Reasoners report times, results, and any errors through the invocation script. Times are measured as wall-clock time (CPU time is inappropriate because it would penalise parallel reasoners) and exclude "standard" parsing and loading of problems (i.e., loading without significant processing of the ontology). The framework enforces (configurable) timeouts for each reasoning problem. Results are validated by comparing them across competitors, with a majority-vote/random tie-breaking fallback strategy (sketched below).
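As a rough sketch of the validation strategy described above (not the framework's actual code), the expected answer for a problem can be computed as follows, where each reasoner's result is represented by a canonical string (e.g., a hash of its output):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ResultValidator {

    /** Determines the expected answer for one problem as the result returned
     *  by the majority of reasoners; exact ties are broken at random. */
    public static String majorityVote(Map<String, String> resultByReasoner, Random rnd) {
        // Count how many reasoners produced each (canonicalised) result.
        Map<String, Integer> votes = new HashMap<String, Integer>();
        for (String result : resultByReasoner.values()) {
            Integer n = votes.get(result);
            votes.put(result, n == null ? 1 : n + 1);
        }
        // Collect all results with the maximal vote count.
        int best = 0;
        for (int n : votes.values()) {
            best = Math.max(best, n);
        }
        List<String> winners = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() == best) {
                winners.add(e.getKey());
            }
        }
        // Random tie-break among the winners.
        return winners.get(rnd.nextInt(winners.size()));
    }
}
```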
The framework supports both serial and parallel execution of a competition. Parallel distributed mode is used for the live competition, but serial mode is sufficient for testing or offline experiments. The framework also logs sufficient information to allow "replaying" the competition, and includes scripts for a complete replay as well as for jumping directly to the final results.
5.1 Usage
The framework project can be cloned with git,Footnote 14 after which users can use the build-evaluator script to build the project with Maven.Footnote 15 Subsequently, the configuration work will primarily focus on the data folder (the paragraphs below discuss folders within data), and minor changes to the scripts in the scripts folder may be necessary to modify the amount of memory allocated to the JVM.
Global settings. The global settings of the framework are defined in a file in the configs folder. Possible settings include the timeout (in milliseconds) for the execution of each reasoner, the processing timeout (used to cap the reported processing time of the reasoners for the evaluation), the memory limit, output options, and execution options such as requiring that the competition only start if one client machine is available per reasoner.
Competition settings. The competition settings are defined in files in the competitions folder. The key competition settings are its name, output folder, list of participating reasoners, query folder, and execution and processing timeouts.
Reasoner settings. Each reasoner under test needs to be accompanied by two elements: a starter script that the framework executes to start the reasoner, and a configuration file that defines the reasoner name, response output folder, starter script location, accepted ontology format, supported OWL 2 profiles, and whether the reasoner supports datatypes and rules. Multiple versions of the same reasoner can be benchmarked, so long as their respective configuration files differ in name and input-output information.
Inputs. The inputs for evaluating a competition are: a corpus of ontologies (which should be placed in the ontologies folder), and a collection of reasoners configured as above (each of which should be a folder within the reasoners folder). Next, the framework needs the queries that are to be evaluated in the benchmark; these files specify the reasoning task, the ontology location, its profile(s), and whether it contains rules or datatypes. Query files are generated by the framework using the appropriate create-queries scripts for the task.
There is further documentation on the framework’s GitHub repository.
6 Conclusion
The ORE 2015 Reasoner Competition continues the success of its predecessors, and with it the general public can benefit from the resources presented here for their own experimentation. The ORE 2015 corpus, whether used with the ORE framework or in a custom test harness, is a significant and distinct corpus for reasoner experimentation. Developers can easily rerun this year's competition with new or updated reasoners to get a sense of their relative progress, and we believe that solving all the problems in that corpus under similar or somewhat relaxed time constraints is a reliable indicator of a very high-quality implementation.
Ideally, the ORE toolkit and corpora will serve as a nucleus for an infrastructure for common experimentation. To that end, every relevant resource (from corpora to test framework) has been published, and where appropriate versioned, on the Web. The test harness is well suited for black-box head-to-head comparisons, and we recommend that experimenters consider it before writing a home-grown one; this will improve the reliability of the harness as well as the reproducibility of experiments. Even where more elaborate internal measurements are required, the ORE harness can serve as the command-and-control mechanism. For example, separating actual calculus activity from other behaviour (e.g., parsing) requires a deep dive into reasoner internals; however, given a set of reasoners that could separate out those timings, it would be a simple extension to the harness to accommodate them.
Notes
- 1.
The websites for DL2015 and ORE2015 are archived at http://dl.kr.org/dl2015/ and https://www.w3.org/community/owled/ore-2015-workshop/ respectively.
- 2.
See the CASC website for details on past competitions: http://www.cs.miami.edu/~tptp/CASC/. Also of interest, though not directly inspirational for ORE, is the SAT competition http://www.satcompetition.org/.
- 3.
- 4.
Which may be exceeded considering that all reasoners in the competition run in parallel.
- 5.
Duplicates are those that are byte identical after being “DLified” and serialised into Functional Syntax.
- 6.
https://code.google.com/archive/p/dinto/. Submitted by María Herrero, Computer Science Department, Universidad Carlos III de Madrid, Leganés, Spain.
- 7.
https://github.com/obophenotype/cell-ontology. Submitted by Dr. David Osumi-Sutherland, GO Editorial Office, European Bioinformatics Institute, European Molecular Biology Laboratory, Cambridge, UK.
- 8.
- 9.
- 10.
- 11.
See links in supplementary materials website: http://owl.cs.manchester.ac.uk/publications/supporting-material/ore-2015-report.
- 12.
- 13.
- 14.
- 15.
References
Glimm, B., Horrocks, I., Motik, B., Stoilos, G., Wang, Z.: HermiT: an OWL 2 reasoner. J. Autom. Reasoning 53(3), 245–269 (2014)
Gonçalves, R.S., Matentzoglu, N., Parsia, B., Sattler, U.: The empirical robustness of description logic classification. In: Proceedings of ISWC (2013)
Haarslev, V., Hidde, K., Möller, R., Wessel, M.: The RacerPro knowledge representation and reasoning system. Semant. Web J. 3(3), 267–277 (2012)
Horridge, M., Bechhofer, S.: The OWL API: a java API for OWL ontologies. Semant. Web J. 2(1), 11–21 (2011)
Kazakov, Y., Krötzsch, M., Simancik, F.: The incredible ELK - From polynomial procedures to efficient reasoning with EL ontologies. J. Autom. Reasoning 53(1), 1–61 (2014)
Matentzoglu, N., Bail, S., Parsia, B.: A snapshot of the OWL web. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 331–346. Springer, Heidelberg (2013). doi:10.1007/978-3-642-41335-3_21
Matentzoglu, N., Leo, J., Hudhra, V., Sattler, U., Parsia, B.: A survey of current, stand-alone OWL reasoners. In: Proceedings of ORE (2015)
Matentzoglu, N., Parsia, B.: The OWL Full/DL gap in the field. In: Proceedings of OWLED (2014)
Matentzoglu, N., Parsia, B.: ORE 2015 reasoner competition dataset (2015). http://dx.doi.org/10.5281/zenodo.18578
Mendez, J.: jcel: A modular rule-based reasoner. In: Proceedings of ORE (2012)
Motik, B., Patel-Schneider, P.F., Parsia, B.: OWL 2 Web Ontology Language: Structural specification and functional-style syntax. In: W3C Recommendation (2009)
Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D.L., Storey, M.A., Chute, C.G., Musen, M.A.: BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 37, 170–173 (2009)
Osumi-Sutherland, D., Marygold, S.J., Millburn, G.H., McQuilton, P., Ponting, L., Stefancsik, R., Falls, K., Brown, N.H., Gkoutos, G.V.: The drosophila phenotype ontology. J. Biomed. Semant. 4(1), 1–10 (2013)
Palmisano, I.: JFact repository (2015). https://github.com/owlcs/jfact
Parsia, B., Matentzoglu, N., Gonçalves, R.S., Glimm, B., Steigmiller, A.: The OWL reasoner evaluation (ORE) 2015 competition report. J. Autom. Reasoning (2016, in submission)
Pelletier, F., Sutcliffe, G., Suttner, C.: The development of CASC. AIC 15(2–3), 79–90 (2002)
Ren, Y., Pan, J.Z., Zhao, Y.: Soundness preserving approximation for TBox reasoning. In: Proceedings of AAAI (2010)
Armas Romero, A., Cuenca Grau, B., Horrocks, I.: MORe: modular combination of OWL reasoners for ontology classification. In: Cudré-Mauroux, P., et al. (eds.) ISWC 2012. LNCS, vol. 7649, pp. 1–16. Springer, Heidelberg (2012). doi:10.1007/978-3-642-35176-1_1
Sertkaya, B.: The ELepHant reasoner system description. In: Proceedings of ORE (2013)
Sirin, E., Parsia, B., Cuenca Grau, B., Kalyanpur, A., Katz, Y.: Pellet: a practical OWL-DL reasoner. J. Web Semant. 5(2), 51–53 (2007)
Steigmiller, A., Liebig, T., Glimm, B.: Konclude: system description. J. Web Semant. 27, 78–85 (2014)
Sutcliffe, G.: The TPTP problem library and associated infrastructure: the FOF and CNF parts, v3.5.0. J. Autom. Reasoning 43(4), 337–362 (2009)
Sutcliffe, G., Suttner, C.: The state of CASC. AIC 19(1), 35–48 (2006)
Thomas, E., Pan, J.Z., Ren, Y.: TrOWL: tractable OWL 2 reasoning infrastructure. In: Aroyo, L., Antoniou, G., Hyvönen, E., Teije, A., Stuckenschmidt, H., Cabral, L., Tudorache, T. (eds.) ESWC 2010. LNCS, vol. 6089, pp. 431–435. Springer, Heidelberg (2010). doi:10.1007/978-3-642-13489-0_38
Tsarkov, D., Horrocks, I.: FaCT++ description logic reasoner: system description. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS (LNAI), vol. 4130, pp. 292–297. Springer, Heidelberg (2006). doi:10.1007/11814771_26
Tsarkov, D., Palmisano, I.: Chainsaw: a metareasoner for large ontologies. In: Proceedings of ORE (2012)
Zhou, Y., Nenov, Y., Grau, B.C., Horrocks, I.: Pay-as-you-go OWL query answering using a triple store. In: Proceedings of AAAI (2014)