1 Introduction

Declarative business process specification approaches rely on a normative description of process behaviour by means of inviolable rules. Those rules, named constraints, exert restrictions on the process behaviour. Declare [1], Dynamic Condition Response Graphs (DCR Graphs) [24], and the Case Management Model and Notation (CMMN) are examples of such declarative process specification languages. Declarative process discovery is the branch of the process mining discipline [2] aimed at extracting the constraints that specify a process through their verification over execution data, namely event logs. To date, research in the area has mainly focused on devising efficient algorithms for mining constraints verified in the event log [9, 11, 14, 32]. All of these approaches return constraints that are never (or rarely) violated in the event log, thus implicitly assuming that rarity corresponds to noise. Similarly, approaches have been proposed to verify the compliance and conformance of event logs with declarative process specifications [28,29,30, 39], indicating whether and to what extent constraint sets are satisfied by process executions.

However, it is often not sufficient to solely assess whether constraints are satisfied or not. Especially in cases with rare exceptions, knowing when and why certain constraints are violated is key. Those circumstances apply in many scenarios: To mention but a few, inspecting the root cause of enactment errors [15], deriving digital evidence of malicious behaviour [25], identifying the origin of drifts in the process [31], or identifying illicit inflows into digital currency services [26].

Against this background, we leverage semantic technologies for the verification and querying of event logs, in an approach that (i) checks whether they comply with given declarative process specifications and (ii) if not, reports on the cause of violations at a fine-grained level. We evaluate our technique by applying our prototype to real-world event logs, and empirically show the insights into detected violations that the detailed reports provide. The opportunities brought about by the use of semantic technologies go well beyond the promising results presented in this vision paper. We thus complement the discussion of our evaluation with considerations on potential improvements over the current achievements, especially with regard to further semantic analyses of the current results.

The remainder of the paper is structured as follows. Section 2 provides an overview of the literature on declarative process mining and semantic technologies. Section 3 describes our approach, i.e., its workflow and components, from the bare event log to its semantic representation and reasoning. Section 4 evaluates and discusses the feasibility of the approach against real-life logs. Section 5 concludes the paper and highlights future research directions.

2 Background and State of the Art

Our approach draws on two main strands of research: on the one hand, it is grounded in state-of-the-art process mining techniques for declarative specifications; on the other hand, it adopts validation concepts developed in the semantic web research community and applies them to fine-grained compliance checking.

Declarative Process Mining. A declarative process specification defines process behaviour through a set of temporal rules (constraints) that collectively determine the allowed and the forbidden traces. Specifically, constraints exert conditions that allow or forbid target tasks to be executed, depending on the occurrence of so-called activation tasks. A declarative specification allows for any process execution that does not violate those constraints. Each constraint is defined using a template that captures its semantics, parametric to the constrained tasks; the number of parameters denotes its cardinality. Declare [1] is a well-established declarative process specification language. Its semantics are expressed in Linear Temporal Logic on Finite Traces (LTL\(_{f}\)) [10]. It offers a repertoire of templates extending that of the seminal paper of Dwyer et al. [16]. Table 1 shows some of those. \(\textsc {Response}\), for example, is a binary template stating that the occurrence of the first task imposes the second one to occur eventually afterwards: \(\textsc {Response}(a, b)\) applies the template to tasks a and b, meaning that each time a occurs in a trace, b is expected to occur afterwards. \(\textsc {AlternateResponse}(a, b)\) is subsumed by \(\textsc {Response}(a, b)\), as it restricts the statement by adding that a cannot recur before b. \(\textsc {Precedence}(a, b)\) switches activation and target as well as the temporal dependency: b can occur only if a occurred before. \(\textsc {NotSuccession}(a, b)\) establishes that after a, b cannot occur any more. Finally, \(\textsc {Participation}(a)\) is a constraint stating that in every process execution, a must occur; it applies the \(\textsc {Participation}\) unary template.
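To make the semantics of these templates concrete, the following is a minimal, illustrative sketch (not part of Declare or of any mining tool) of how such constraints can be checked over a single trace, modelled simply as a sequence of task labels:

```python
# Illustrative Declare template checkers over a trace given as a list of
# task labels. Function names and signatures are our own, not a standard API.

def response(trace, a, b):
    """Response(a, b): every occurrence of a is eventually followed by b."""
    return all(b in trace[i + 1:] for i, t in enumerate(trace) if t == a)

def alternate_response(trace, a, b):
    """AlternateResponse(a, b): after each a, b occurs before any further a."""
    for i, t in enumerate(trace):
        if t == a:
            rest = trace[i + 1:]
            if b not in rest or (a in rest and rest.index(a) < rest.index(b)):
                return False
    return True

def precedence(trace, a, b):
    """Precedence(a, b): b may occur only if a occurred before it."""
    return all(a in trace[:i] for i, t in enumerate(trace) if t == b)

def participation(trace, a):
    """Participation(a): a occurs at least once in the trace."""
    return a in trace
```

Note that a constraint such as `response(trace, "a", "b")` is vacuously satisfied by a trace in which the activation a never occurs, which is precisely why the confidence and interest factor measures discussed below were introduced.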

The fundamental input for process mining is the event log, i.e., a collection of recorded process executions (traces) expressed as sequences of events. For the sake of simplicity, we assume events to relate to single tasks of a process. The eXtensible Event Stream (XES) format is an IEEE standard for the storage and exchange of event logs and event streams, based on XML. In declarative process mining, constraints are verified over event logs to measure their support (i.e., how often they are satisfied in the traces), and optionally their confidence (i.e., how often they are satisfied in traces in which their activation occurs) and interest factor (i.e., how often they are satisfied in traces in which both activation and target occur) [14, 32]. Table 1 reports traces that satisfy or violate the aforementioned constraints. A trace may satisfy one constraint (increasing its mining measures) while violating another (decreasing theirs). Moreover, a trace in which the activation of a constraint does not occur may satisfy the constraint without contributing to its confidence or interest factor. Support is thus an aggregate measure for compliance, as traces that do not comply with a constraint decrease its value.
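The trace-based measures above can be sketched as follows; this is a hedged illustration following the informal definitions in the text, as the exact formulas used by mining tools such as MINERful may differ:

```python
# Trace-based mining measures for a constraint over an event log.
# A log is a list of traces; `holds` decides satisfaction in one trace.

def support(log, holds):
    """Fraction of traces in which the constraint is satisfied."""
    return sum(holds(t) for t in log) / len(log)

def confidence(log, holds, activation):
    """Fraction of satisfying traces among those where the activation occurs."""
    acts = [t for t in log if activation in t]
    return sum(holds(t) for t in acts) / len(acts) if acts else 0.0

def interest_factor(log, holds, activation, target):
    """Fraction of satisfying traces among those containing both tasks."""
    both = [t for t in log if activation in t and target in t]
    return sum(holds(t) for t in both) / len(both) if both else 0.0
```

For example, with `holds` implementing Response(a, b), a trace without any occurrence of a raises the support (vacuous satisfaction) yet is excluded from the confidence and interest factor computations.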

Table 1. A collection of Declare constraints with their respective natural language explanation, and examples of accepted (\(\checkmark \)) or violating (\(\times \)) traces.

Semantic Web Technologies provide a comprehensive set of standards for the representation, decentralised linking, querying, processing of, and reasoning about semantically explicit information. We mainly base our approach on two elements of the semantic web technology stack: (i) the Resource Description Framework (RDF), which we use as a uniform model to express process information, declarative constraints, and validation reports, and (ii) the Shapes Constraint Language (SHACL), which provides a constraint specification and validation framework on top of RDF. In the following, we provide a brief overview of both standards.

RDF is a data representation model published by the World Wide Web Consortium (W3C) as a set of recommendations and working group notes. It provides a standard model for expressing information about resources, which in the context of the present paper represent traces, tasks, events, actors, etc. An RDF dataset consists of a set of statements about these resources, expressed in the form of triples \(\langle s,p,o \rangle \), where s is a subject, p is a predicate, and o is an object; s and o represent the two resources being related, whereas p represents the nature of their relationship. Formally, we define an RDF graph G as a set of RDF triples \((v_1, v_2, v_3) \in (U \cup B) \times U \times (U \cup B \cup L)\) where: U is an infinite set of constant values (called RDF Uniform Resource Identifier (URI) references); B is an infinite set of local identifiers without defined semantics (called blank nodes); and L is a set of values (called literals). For instance, a trace expressed in RDF may contain a statement describing the temporal relationship between two particular events. A key advantage of RDF as a data model is its extensible nature: additional statements about the same resources can be added at any time, drawing concepts and predicates from various additional, potentially domain-specific vocabularies.
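As a minimal illustration of the triple model, a trace fragment can be represented and queried with plain tuples; the URIs and predicate names below are made-up placeholders, not the vocabularies introduced later in this paper:

```python
# A tiny RDF-like graph as a set of (subject, predicate, object) tuples.
# All URIs are illustrative; "next" encodes the temporal order of two events.

EX = "http://example.org/log#"

triples = {
    (EX + "trace1", EX + "contains", EX + "event1"),
    (EX + "trace1", EX + "contains", EX + "event2"),
    (EX + "event1", EX + "name", "Admission IC"),
    (EX + "event2", EX + "name", "CRP"),
    (EX + "event1", EX + "next", EX + "event2"),
}

def objects(graph, s, p):
    """Query helper: all objects o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}
```

Extensibility is then literal: enriching the graph with domain-specific knowledge amounts to adding further tuples, without any schema migration.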

Furthermore, ontology specification languages such as the Web Ontology Language (OWL) can be used to more closely describe the semantic characteristics of terms. Although beyond the scope of the present paper, this extensible semantic web technology stack opens up opportunities for the use of reasoners in compliance checking and in the interpretation of violations (e.g., generalisations, root cause analyses, etc.) [4, 17]. Other benefits of RDF include interoperability, i.e., information expressed in RDF using shared vocabularies can be exchanged between applications without loss of meaning. Furthermore, it makes it possible to apply a wide range of general-purpose RDF parsing, mapping, transformation, and query processing tools.

Finally, once transformed into RDF, information (such as process information and compliance checking rules) can be easily published online, interlinked, and shared between applications and organisations, which is particularly interesting in the context of collaborative processes.

SHACL is a W3C Recommendation and language for validating RDF graphs against a set of conditions. These conditions are specified in the form of RDF shape graphs, which can be applied to validate data graphs. SHACL thereby provides a flexible declarative mechanism for defining arbitrary validity constraints on RDF graphs. Specifically, these constraints can be formulated using a range of constraint components defined in the standard, or as arbitrary constraints implemented in the SPARQL Protocol and RDF Query Language (SPARQL). Validation of a data graph against a shapes graph produces a validation report, expressed in RDF as the union of the results against all shapes in the shapes graph. Importantly, compliance checking in SHACL is performed in a granular fashion, by validating each individual so-called focus node against the set of constraints that apply to it. Compliance checks therefore produce detailed validation reports that not only conclude whether a data graph conforms globally, but also report on the specific violations encountered. In the context of process compliance checking, this provides a foundation not only to determine whether a process trace conforms, but also to report on the specific declarative rules that are violated.

Semantic Approaches to Compliance Checking. Several approaches have been developed to support organisations in their compliance efforts, each offering its own functional and operational capabilities [5, 22]. The underlying frameworks, such as the Process Compliance Language (PCL) [20], Declare [1], and BPMN-Q [3], offer different levels of expressiveness and, consequently, varying reasoning and modelling support for normative requirements [23]. Three main strategies for business process compliance checking have been proposed in the literature to date [22]: (i) runtime monitoring of process instances to detect or predict compliance violations (runtime compliance checking, online monitoring, compliance monitoring); (ii) design-time checks for compliant or non-compliant behaviour during the design of process models; (iii) auditing of log files in an offline manner, i.e., after the process instance execution has finished. Based on Declare, techniques such as the one described in [34] can discover declarative models, use them to check process compliance, and apply the model constraints in online monitoring. Presently, however, all reported results are at the granularity of a trace, without pointing at the cause of violations. Remarkably, Maggi et al. [33] apply the outcome of that technique to repair BPMN [15] process models according to the constraints with which an event log is not compliant.

Fig. 1. SHACLare components architecture

In our approach, we implement Declare constraints with semantic technologies. Related work on semantic approaches in process mining includes that of Thao Ly et al. [38], who present a vision towards life-time compliance. The authors motivate the creation of global repositories to store reusable semantic constraints (e.g., for medical treatment) and advocate the importance of intelligible feedback on constraint violations. This includes the reasons for the violations, to help the user devise strategies for conflict avoidance and compensation. Furthermore, the results of semantic process checks must be documented. In our paper, we address these requirements. To reason about the data surrounding a task, semantic annotations of tasks are introduced in [19, 21] as logical statements of the PCL language over process variables, to support business process design. Governatori et al. [18] show how semantic web technologies and languages like LegalRuleML (based on RuleML) can be applied to model and link legal rules from industry to support humans in analysing process tasks over legal texts. They apply semantic annotations in the context of a regulatory compliance system. To that end, they extend their compliance checker Regorous [21] with the semantics of LegalRuleML. An approach to include additional domain knowledge in the process perspective is introduced in [39]: the authors extract additional information from domain models and integrate this knowledge into the process compliance check. Pham et al. [36] introduce an ontology-based approach for business process compliance checking. They use a Business Process Ontology (BPO) and a Business Rules Ontology (BRO), including OWL axioms and SWRL rules, to model compliance constraints. They do not, however, systematically translate compliance rules as in our approach, and mention that performance is a key problem with standard reasoners.
Our focus differs from the aforementioned works in that our main aim is not to validate process models against rules, but to verify declarative specifications on recorded process executions.

A Folder-Path enabled extension of SPARQL called FPSPARQL for process discovery is proposed in [6]. Similarly, Leida et al. [27] focus on the advantages of process representation in RDF and show the potential of SPARQL queries for process discovery. Our approach also builds on semantic representations of process data but explores how compliance checking can be conducted with semantic technologies.

Table 2. Some Declare constraints discovered by MINERful [14] from the Sepsis log [35]

3 Approach

Our approach applies a set of semantic technologies, including SHACL, to event logs in order to check their compliance with Declare constraints. As depicted in Fig. 1, the SHACLare compliance checking workflow is composed as follows. ➀ We generate SHACL compliance constraints for a target process from a set of Declare constraints; to this end, we translate the Declare constraints through respective SHACL templates. ➁ Because SHACL constraints can only be validated against RDF graphs, we next transform the event log into RDF: this step takes an event log in XES format as input and produces a traces file in RDF. ➂ At this point, we have all the necessary inputs to perform the SHACL compliance check, which results in a detailed report on each constraint violation.

➀ From Declare to SHACL. In this first step, we generate SHACL compliance constraints for a target process. As input, we take a set of Declare constraints, which can be automatically discovered with tools such as MINERful [12]. Table 2 shows an excerpt of the declarative constraints retrieved by MINERful from the real-life Sepsis event log [35]. We then use the RDF Mapping Language (RML) to map each constraint to the respective Declare RDF data model elements (available at http://semantics.id/ns/declare).

Listing 1. An excerpt of the SHACL template representing \(\textsc {AlternateResponse}\) in RDF Turtle notation

To enable SHACL validation on Declare constraints, we developed a complete set of SHACL templates to represent the Declare ones. Specifically, we analysed the Declare constraint semantics and derived a set of equivalent SHACL constraint templates, similarly to what Schönig et al. [37] did with SQL. These SHACL templates are publicly available at http://semantics.id/resource/shaclare as documents written using the RDF Turtle notation. Listing 1 shows an excerpt of the SHACL template that represents \(\textsc {AlternateResponse}\) in RDF Turtle notation. Lines 1–4 list the namespaces used in the templates. In addition to standard vocabularies defined by the W3C, we introduce the following vocabularies: (i) a SHACLare vocabulary, providing a set of properties and parameters used in the template, which are specific to the SHACLare approach, (ii) a Declare vocabulary, defining the RDF classes and properties to describe the Declare language, and (iii) an XES vocabulary, providing the terms used to represent XES data in RDF. Line 5 contains a placeholder that will be replaced by the name of the particular Declare constraint instance at compliance checking runtime. Lines 6–7 signify that the constraint will be validated against instances of the class representing traces. Lines 8–9 specify that the SHACL constraint will evaluate \(\textsc {AlternateResponse}\), while line 12 provides a template for the constraint violation message. Lines 14–34 are the main part, representing the \(\textsc {AlternateResponse}\) Declare constraint template in its SHACL-SPARQL representation.
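Since the listing itself is not reproduced here, the following hypothetical fragment sketches what such a SHACL-SPARQL shape could look like. All namespaces, class and property names (e.g., `xes:Trace`, `xes:event`, `xes:name`, `xes:index`), and the query structure are assumptions for illustration, not the published template:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xes: <http://semantics.id/ns/xes#> .
@prefix ex:  <http://example.org/shapes#> .

# Hypothetical shape validating AlternateResponse("a", "b") on every trace.
# Prefix declarations for the embedded query (sh:prefixes) omitted for brevity.
ex:AlternateResponseShape
    a sh:NodeShape ;
    sh:targetClass xes:Trace ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:message "AlternateResponse violated: activation not followed by the target before its recurrence." ;
        sh:select """
            SELECT $this ?act
            WHERE {
              $this xes:event ?act .  ?act xes:name "a" ; xes:index ?i .
              FILTER NOT EXISTS {      # no valid alternating target after ?act
                $this xes:event ?tgt . ?tgt xes:name "b" ; xes:index ?j .
                FILTER (?j > ?i)
                FILTER NOT EXISTS {    # no recurrence of "a" in between
                  $this xes:event ?a2 . ?a2 xes:name "a" ; xes:index ?k .
                  FILTER (?k > ?i && ?k < ?j)
                }
              }
            }
        """ ;
    ] .
```

In a SHACL-SPARQL constraint, `$this` binds to the focus node (here, the trace under validation), and every row returned by the `SELECT` query yields one validation result, so the query is written to select the violating activations.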

Subsequently, we translate the Declare constraints into SHACL constraints by injecting the constraint parameters (activation and target) into the SHACL template. All generated SHACL constraints are stored in memory for later execution, and are also persisted as a SHACL shape graph in the RDF Turtle format for reference.

➁ From XES to RDF. The transformation takes as input an event log in XES format and translates it into RDF. To that end, we developed an XES RDF vocabulary (available at http://semantics.id/ns/xes), which provides the terms and concepts to represent XES data in RDF. We transform the basic XES objects into main classes: (i) one for the complete event log, (ii) one for individual traces, and (iii) one for single events. In addition to those classes, we define a set of data properties for the optional event attributes, such as the event name and timestamp. Object properties (relations) bind the elements together, e.g., relating an event to its trace, and a trace to the log that contains its event sequences.
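A hedged sketch of this transformation step follows; the structure mirrors the class and property layout described above, but the concrete property names are placeholders, not the actual XES RDF vocabulary terms:

```python
# Illustrative XES-to-RDF transformation: a minimal {trace_id: [task names]}
# structure is turned into (subject, predicate, object) triples.
# Property names (trace, event, name, index) are assumed, not the vocabulary.

XES = "http://semantics.id/ns/xes#"   # vocabulary namespace from the text
EX = "http://example.org/log#"        # illustrative resource namespace

def xes_to_triples(log_id, traces):
    triples = []
    for trace_id, events in traces.items():
        # Bind each trace to the log it belongs to.
        triples.append((EX + log_id, XES + "trace", EX + trace_id))
        for i, name in enumerate(events):
            ev = EX + f"{trace_id}_e{i}"
            triples += [
                (EX + trace_id, XES + "event", ev),  # bind event to trace
                (ev, XES + "name", name),            # event attribute
                (ev, XES + "index", i),              # encodes event ordering
            ]
    return triples
```

Encoding the position of each event explicitly (here as an index) is what later allows temporal Declare semantics to be expressed as ordinary graph queries.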

➂ SHACL Compliance Checking. Given the SHACL Declare constraints (➀) and the RDF event log (➁), we have all the required input to run the SHACL compliance checking component. All generated SHACL constraints are grouped according to their template (e.g., \(\textsc {AlternateResponse}\)) and subsequently checked one by one by a SHACL validation engine. The result of a SHACL validation is an RDF graph with one sh:ValidationReport instance. In case the sh:conforms attribute of a sh:ValidationReport is false, an sh:ValidationResult instance provides further details for every violation, including: (i) sh:focusNode, which points to the trace in which the violation occurred; (ii) sh:sourceShape, linking the SHACL constraint instance which triggered the result; (iii) sh:resultMessage, explaining the reason for the constraint violation with a natural-language sentence.
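For illustration, a validation result for a violated trace could look as follows; the `sh:` terms are standard SHACL vocabulary, whereas the focus node, shape name, and message values are made-up examples:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/log#> .

[] a sh:ValidationReport ;
   sh:conforms false ;
   sh:result [
       a sh:ValidationResult ;
       sh:focusNode     ex:trace42 ;                  # trace in which the violation occurred
       sh:sourceShape   ex:AlternateResponseShape ;   # constraint that triggered the result
       sh:resultMessage "AlternateResponse violated: CRP did not occur after Admission IC." ;
   ] .
```

Because the report is itself RDF, it can be loaded alongside the trace graph and queried jointly, which is exactly what the evaluation in the next section does.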

4 Evaluation

To demonstrate the efficiency and effectiveness of our compliance checking approach, we developed a proof-of-concept software tool, available for download at gitlab.isis.tuwien.ac.at/shaclare/shaclare. In this section, we report on its application to publicly available real-world event logs. In particular, we focus on the report produced for a specific event log to demonstrate the capability of SHACLare to provide detailed insights into compliance violations. The experiment environment, as well as the full set of input and output files, is available on the aforementioned SHACLare GitLab project page.

The Prototype. The prototype is implemented in Java, using a number of open-source libraries, including (i) Apache Jena, for RDF graph manipulation and processing, (ii) OpenXES, for reading XES data and acquiring it into the RDF graph, (iii) caRML, for acquiring the RDF representation of Declare constraints, and (iv) the TopBraid SHACL engine, for compliance checking. Following the SHACLare approach description in Sect. 3, the following functions are available: ➀ translation of Declare constraints to SHACL constraints, ➁ translation of XES event logs into their RDF graph representations, and ➂ compliance checking of event logs against the defined constraints.

Table 3. An excerpt of the SHACLare validation report

Insights into Violations. The steps are illustrated on the example of the Sepsis treatment process event log [35]. That event log reports the trajectories of patients showing symptoms of sepsis in a Dutch hospital, from their registration in the emergency room to their discharge. Because of the reported flexibility of the healthcare process [35] and its knowledge-intensive nature, it is a suitable case for declarative process specification [13].

After running our prototype with the mined Declare constraints and the Sepsis event log as inputs, we receive as output the compliance report, the traces in RDF, and the set of checked constraints in SHACL (cf. Fig. 1). Those files form the basis of our analysis. We thus import the report and trace triples into GraphDB to query and visualise the resulting data structures. An excerpt of the non-compliant validation results is shown in Table 3.

Next, we explore why particular constraints are violated. For instance, an \(\textsc {AlternateResponse}\) constraint dictates that after every admission to the Intensive Care unit, a C-Reactive Protein (CRP) test must be performed later on, and the patient should not be admitted again to the IC before the CRP test. As can be noticed in Table 2, the support of the constraint is high (0.974); therefore, we are interested in understanding what event(s) determined its violation. We thus extract the event signalled by the report and visualise it from the XES RDF representation. Figure 2(a) depicts a graph with the violating event and the connected events from its trace. We observe that the violation is due to the two events being swapped in the trace: the CRP test occurs before the IC admission. The right-hand side of the screenshot features a sample of metadata on the selected event (the violating activation, namely the IC admission). The tool allows us to scroll through all available properties and navigate through the trace, thus providing the user with a powerful means to inspect the violations in detail. Similarly, Fig. 2(b) shows that despite the high support of a \(\textsc {Precedence}\) constraint in the event log (0.996 as per Table 2), in one trace it was recorded that the patient was discharged and sent to the emergency room, although the preceding task the constraint requires never took place.
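Such explorations follow directly from the report being RDF; for example, a hypothetical SPARQL query joining the validation report with the traces (standard `sh:` terms; the output column names are our own) lists every violation alongside its explanation:

```sparql
PREFIX sh: <http://www.w3.org/ns/shacl#>

# List each violation with the affected trace, the violated shape,
# and the natural-language explanation from the report.
SELECT ?trace ?shape ?message
WHERE {
  ?report a sh:ValidationReport ;
          sh:result ?r .
  ?r sh:focusNode     ?trace ;
     sh:sourceShape   ?shape ;
     sh:resultMessage ?message .
}
ORDER BY ?trace
```

From each `?trace` binding, further navigation (to the violating events and their metadata) proceeds over the XES RDF representation of the log.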

Fig. 2. Graphs illustrating the violations of constraints in the traces.

The analysis can be further extended through the integration of additional context information on the traces, which may be specified in domain-specific vocabularies. Furthermore, the by-products and results of our approach can be utilised for further analyses via semantic technologies, providing a rich interpretation context for the qualitative analysis of constraint violations. An important aspect is that all the metadata can be used directly in the SHACL constraints or later on in a SPARQL query; the analysis is therefore readily extensible beyond the control-flow perspective. The investigation of those extensions charts the future plans for our research, outlined in the next section.

5 Conclusion and Future Work

In this vision paper, we proposed SHACLare, a novel approach for compliance checking that leverages semantic technologies. In particular, we provide a set of templates to translate Declare constraints into SHACL constraints, and subsequently validate them against a graph representation of XES process execution data. Thereby, we overcome a typical limitation of existing compliance checking approaches, which lack granular information on violations and their context. The compliance reports SHACLare provides contain links to the respective traces, as well as to the events triggering constraint violations. Owing to the semantically explicit representation, we can easily query and visualise connected events and inspect all metadata provided in the original log files. The preliminary results attained illustrate how our approach could be used and extended further for the auditability of event logs.

This new approach opens opportunities for future work in various directions. We aim at leveraging Linked Data frameworks so as to conduct compliance checking over multiple process data sources beyond single event logs, inspired by the seminal work of Calvanese et al. [8]. In that regard, the checking of richer constraints encompassing perspectives other than the usual control flow (i.e., data, time, resources, etc.) [7] drives our future research endeavours, aided by the capabilities readily made available to that extent by SHACL and SPARQL. This paper analysed the compliance checking setting, hence the analysis of constraint violations. If the goal is also to analyse the relevance of satisfaction, the verification of a formula is not sufficient and the vacuity problem must be taken into account [12]. We will investigate how our approach can be extended to tackle this problem. Online compliance checking is another topic of interest: it would be worth exploring how SHACL constraints can be checked in a streaming fashion, potentially based on stream reasoning engines. In addition, we aim to further (automatically) enrich the metadata extracted from the log data, using Natural Language Processing (NLP) and ontologies. From a formal analysis viewpoint, we will investigate the semantic equivalence of the SHACL constraints with respect to Declare \(\text {LTL}_{f}\) formulas, and more generally the expressive power of those languages. Finally, additional compliance checking tools could be developed to provide users with easy-to-use and feature-rich options to adopt this approach in their process management strategies.