

Arthur H.M. ter Hofstede, Massimo Mecella, Sebastian Sardina, Andrea Marrella (Eds.)

Knowledge-intensive Business Processes
1st International Workshop, KiBP 2012, Proceedings
June 15, 2012, Rome, Italy

Preface

Nowadays, Workflow Management Systems (WfMSs) and, more generally, Process Management Systems (PMSs) and Process-aware Information Systems (PAISs) are widely used to support many human organizational activities, ranging from well-understood, relatively stable and structured processes (supply chain management, postal delivery tracking, etc.) to processes that are more complicated, less structured and may exhibit a high degree of variation (healthcare, emergency management, etc.).

Every aspect of a business process involves a certain amount of knowledge, which may be complex depending on the domain of interest. The adequate representation of this knowledge is determined by the modeling language used. Some processes behave in a way that is well understood, predictable and repeatable: the tasks are clearly delineated and the control flow is straightforward. Recent discussions, however, illustrate the increasing demand for solutions for knowledge-intensive processes, where these characteristics are less applicable. The actors involved in the conduct of a knowledge-intensive process have to deal with a high degree of uncertainty. Tasks may be hard to perform and the order in which they need to be performed may be highly variable. Modeling knowledge-intensive processes can be complex, as it may be hard to capture at design time what knowledge is available at run time. In realistic environments, for example, actors lack important knowledge at execution time, or this knowledge can become obsolete as the process progresses. Even if each actor (at some point) has perfect knowledge of the world, it may not be certain of its beliefs at later points in time, since tasks by other actors may change the world without those changes being perceived.

Typically, a knowledge-intensive process cannot be adequately modeled by classical, state-of-the-art process/workflow modeling approaches. In some respects there is a lack of maturity when it comes to capturing the semantic aspects involved, both in terms of representing them and in terms of reasoning about them. The main focus of the 1st International Workshop on Knowledge-intensive Business Processes (KiBP 2012) was investigating how techniques from different fields, such as Artificial Intelligence (AI), Knowledge Representation (KR), Business Process Management (BPM), Service Oriented Computing (SOC), etc., can be combined with the aim of improving the modeling and enactment phases of a knowledge-intensive process.

KiBP 2012 was held as part of the program of the 2012 Knowledge Representation & Reasoning International Conference (KR 2012) in Rome, Italy, in June 2012. The workshop was hosted by the Dipartimento di Ingegneria Informatica, Automatica e Gestionale Antonio Ruberti of Sapienza Università di Roma, with financial support of the University, through grant 2010-C26A107CN9 TESTMED, and the EU Commission, through the projects FP7-258888 Greener Buildings and FP7-257899 Smart Vortex.

This volume contains the 5 papers accepted and presented at the workshop. Each paper was reviewed by three members of the internationally renowned Program Committee. In addition, a further paper was invited for inclusion in the workshop proceedings and for presentation at the workshop.
Two keynote talks completed the scientific program: one by Marlon Dumas (Institute of Computer Science, University of Tartu, Estonia) on "Integrated Data and Process Management: Finally?" and the other by Yves Lespérance (Department of Computer Science and Engineering, York University, Canada) on "A Logic-Based Approach to Business Process Customization". We would like to thank all the Program Committee members for their valuable work in selecting the papers, Andrea Marrella for his valuable work as publication and publicity chair of the workshop, and Carola Aiello and the consulting agency Consulta Umbria for the organization of this successful event.

June 15, 2012
Rome, Italy

Arthur H.M. ter Hofstede
Massimo Mecella
Sebastian Sardina

Organizing Committee

Program Chairs
Arthur H.M. ter Hofstede, Queensland University of Technology
Massimo Mecella, Sapienza - University of Rome
Sebastian Sardina, RMIT University

Proceedings Chair
Andrea Marrella, Sapienza - University of Rome

Program Committee
Marco Aiello, University of Groningen
Diego Calvanese, Free University of Bozen-Bolzano
Fabio Casati, University of Trento
Florian Daniel, University of Trento
Massimiliano De Leoni, Eindhoven University of Technology
Riccardo De Masellis, Sapienza - University of Rome
Claudio Di Ciccio, Sapienza - University of Rome
Christoph Dorn, University of California
Marlon Dumas, University of Tartu
Marie-Christine Fauvet, Joseph Fourier University of Grenoble
Paolo Felli, Sapienza - University of Rome
Hector Geffner, Pompeu Fabra University of Barcelona
Marcello La Rosa, Queensland University of Technology
Yves Lespérance, York University
Niels Lohmann, University of Rostock
Marco Montali, Free University of Bozen-Bolzano
Selmin Nurcan, Panthéon - Sorbonne University
Manfred Reichert, University of Ulm
António Rito Silva, Technical University of Lisbon
Alessandro Russo, Sapienza - University of Rome
Rainer Schmidt, University of Aalen
Pnina Soffer, University of Haifa
Roman Vaculín, IBM Research
Barbara Weber, University of Innsbruck
Mathias Weske, University of Potsdam
Petia Wohed, Stockholm University

Additional Reviewers
Sergey Smirnov, University of Potsdam

Table of Contents

Keynote Talks
Integrated Data and Process Management: Finally? (Marlon Dumas), p. 1
A Logic-Based Approach to Business Process Customization (Yves Lespérance), p. 5

Invited Paper
Automatic Detection of Business Process Interference (Nick van Beest, Eirini Kaldeli, Pavel Bulanov, Hans Wortmann and Alexander Lazovik), p. 6

Full Research Papers
Semantically-Governed Data-Aware Processes (Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Marco Montali and Ario Santoso), p. 21
Knowledge-intensive Processes: An Overview of Contemporary Approaches (Claudio Di Ciccio, Andrea Marrella and Alessandro Russo), p. 33
Business Processes Verification with Temporal Answer Set Programming (Laura Giordano, Alberto Martelli, Matteo Spiotta and Daniele Theseider Dupré), p. 48
A Knowledge-based Approach to the Configuration of Business Process Model Abstractions (Shamila Mafazi, Wolfgang Mayer, Georg Grossmann and Markus Stumptner), p. 60
Modular Representation of a Business Process Planner (Shahab Tasharrofi and Eugenia Ternovska), p. 75

Integrated Data and Process Management: Finally?
Marlon Dumas
University of Tartu, Estonia
marlon.dumas@ut.ee

Abstract. Contemporary information systems are generally built on the principle of segregation of data and processes. Data are modeled in terms of entities and relationships, while processes are modeled as chains of events and activities. This situation engenders an impedance mismatch between the process layer, the business logic layer and the data layer. We discuss some of the issues that this impedance mismatch raises and analyze how and to what extent these issues are addressed by emerging artifact-centric process management paradigms.

1 The Data Versus Process Divide

Data management and process management are both well-trodden fields – but each in its own way. Well-established data analysis and design methods allow data analysts to identify and capture domain entities and to refine these domain entities down to the level of database schemas in a seamless and largely standardized manner. Concomitantly, database systems and associated middleware enable the development of robust and scalable data-driven applications, while contemporary packaged enterprise systems support hundreds of business activities on top of shared databases. In a similar vein, well-documented and proven process analysis and design methods allow process analysts to identify and capture process models at different levels of abstraction, ranging from high-level process models suitable for qualitative analysis and organizational redesign down to the level of executable processes that can be deployed in Business Process Management Systems (BPMS).

But while data management and process management are each well supported by their own body of mature methods and tools, these methods and tools are at best loosely integrated. For example, when it comes to accessing data, BPMS typically rely on request-response interactions with database applications or packaged enterprise systems. Typically, data fetched from these systems are copied into the "working memory" of the BPMS. The data in this working memory are then used to evaluate business rules relevant to the execution of the process, and to orchestrate both manual and automated work. But the burden of synchronizing the working data maintained by the BPMS with the data maintained by the underlying systems is generally left with the developers.

More generally, the "data vs. process" divide leads to an impedance mismatch between the data layer, the business logic layer and the process layer, which in the long run hinders the coherence and maintainability of information systems. In particular, the data vs. process divide has the following effects:

– Process-related and function-related data redundancy. The BPMS maintains data about the state of the process, since these data are needed in order to enable the system to schedule tasks, react to events and evaluate predicates attached to decision points in the process. On the other hand, data entities manipulated by the process are stored in the database(s) underpinning the applications with which the BPMS interacts. Hence, the state of the entities is stored both by the BPMS and by the underlying applications. In other words, data are managed redundantly at the database layer and at the process layer, thereby adding development and maintenance complexity.
– Business rules fragmentation and redundancy. Some business rules are encoded at the level of the business process, others in the business logic layer (e.g.
using a business rules engine) and others in the database (in the form of triggers or integrity constraints). Worse, some rules are encoded at different levels depending on the type of rule and the data involved. This fragmentation and redundancy hampers maintainability and potentially leads to inconsistencies.

The effects of this mismatch are perhaps less apparent when a one-to-one mapping exists between the instances of a given process and the entities of a given entity type. This is the case, for example, in a typical invoice handling process, where one process instance (also called a "case") corresponds exactly to one invoice. In this context, the state of a process instance maps neatly to the state of an entity. Ergo, the data required by the process, for example when evaluating branching conditions, is restricted to the data contained in the associated entity (i.e. the invoice in this example) and possibly to the state of other entities within the logical horizon [5] of the said entity – e.g. the Purchase Order (PO) associated to the invoice. Accordingly, collecting the data required for evaluating the business rules of this process is relatively simple, while synchronizing the state of the process instance with the state of its associated entity (at the business logic and data layers) does not pose a major burden.

The impedance mismatch, however, becomes much more evident when this one-to-one correspondence between processes and entities does not hold. Consider for example a shipment process where a single shipment may contain products for multiple customers, ordered by means of multiple purchase orders (POs) and invoiced by means of multiple invoices – perhaps even multiple POs and multiple invoices per customer involved. Furthermore, consider the case where the products requested in a given PO are not necessarily all sent in a single shipment, but instead may be spread across multiple shipments. In this setting, the effects of a customer canceling a PO are not circumscribed to one single instance of the shipment process. Similarly, the effects of a delayed shipment are not restricted to a single PO. Consequently, business rules related, for example, to cancellation penalties, compensation for delayed deliveries or prioritization of shipments become considerably more difficult to capture, to maintain and to reason about, as exemplified in numerous case studies [1, 9, 8, 3]. Traditional process management approaches quickly hit their limit when dealing with such processes. The outcome of this limitation is that a significant chunk of the "process logic" has to be pushed down to the business logic layer (e.g. in the form of business rules) – which essentially voids the benefits of adopting a structured process management approach supported by a BPMS.

Service-oriented architectures (SOAs) facilitate the inter-connection of applications and application components. Their emergence has greatly facilitated the integration of data-driven and process-driven applications. SOAs have also enabled packaged enterprise software vendors to "open the box" by providing standardized programmatic access to the vast functionality of their systems. But per se, SOAs do not address the problem of data and process integration, since data-centric services and process-centric services are still developed separately using different methods.
A case in point is Thomas Erl's service-oriented design method [4], which advocates that process-centric services should be strictly layered on top of data-centric (a.k.a. entity-centric) services. Erl's approach consists of two distinct methods for designing process-centric services and entity-centric services. This same principle permeates many other service-oriented design methods [7]. Such approaches do not address the issues listed above. Instead, they merely reproduce the data versus process divide by segregating data-centric services and process-centric services.

2 The Artifact-Centric Process Management Paradigm

This talk discusses emerging approaches that aim at addressing the shortcomings of the traditional data versus process divide. In particular, the keynote discusses the emerging artifact-centric process management paradigm [1, 2] and how this paradigm, in conjunction with service-oriented architectures and associated platforms, enables higher levels of integration and higher responsiveness to process change.

Mainstream process modeling notations such as BPMN can be thought of as being activity-centric, in the sense that process models are structured in terms of flows of events and activities. Modularity is achieved by decomposing activities into subprocesses. Data manipulation is captured either by means of global variables defined within the scope of a process or subprocess, or by means of conceptually passive data objects that are created, read and/or updated by the events and activities in the process. In contrast, the database applications and/or enterprise systems on top of which these processes execute are usually structured in terms of objects that encapsulate data and/or behavior. This duality engenders the above-mentioned impedance mismatch between the process layer and the business logic and data layers.

In contrast, artifact-centric process modeling paradigms aim at conceptually integrating the process layer, the business logic and the data layer. Their key tenet is that business processes should be conceived in terms of collections of artifacts that encapsulate data and have an associated lifecycle. Transitions between the states in this lifecycle are triggered by events coming from human actors, modules of an enterprise system (possibly exposed as services) and possibly other artifacts, thus implying that artifacts are inter-linked. In this way, the state of the process and the state of the entities are naturally maintained "in sync", and business processes are conceived as networks of inter-connected artifacts that may be connected according to N-to-M relations, thus allowing one to seamlessly capture rules spanning across what would traditionally be perceived to be multiple process instances.

The talk also discusses ongoing efforts within the Artifact-Centric Service Interoperation (ACSI) project (http://www.acsi-project.eu/). This project aims at combining the artifact-centric process management paradigm with SOAs in order to achieve higher levels of abstraction during business process integration across organizational boundaries. The key principle of the ACSI project is that processes should be conceived as systems of artifacts that are bound to services. The binding between artifacts and services specifies where the data of the artifact should be pushed to or pulled from, and when.
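To ground the notion of an artifact that couples data with a lifecycle, the following is a minimal Python sketch; the OrderArtifact class, its states and its events are our illustrative assumptions, not the GSM meta-model or the ACSI platform's API.

    # Minimal sketch of an artifact that encapsulates data and a lifecycle.
    # Names (OrderArtifact, its states and events) are illustrative only;
    # they are not taken from the GSM meta-model or the ACSI platform.

    class OrderArtifact:
        # lifecycle: which events are allowed in which state, and where they lead
        TRANSITIONS = {
            ("created",  "approve"): "approved",
            ("approved", "ship"):    "shipped",
            ("approved", "cancel"):  "cancelled",
            ("shipped",  "invoice"): "invoiced",
        }

        def __init__(self, order_id, lines):
            self.order_id = order_id
            self.lines = lines        # data and state live in one place,
            self.state = "created"    # so process and data cannot drift apart

        def handle(self, event):
            nxt = self.TRANSITIONS.get((self.state, event))
            if nxt is None:
                raise ValueError(f"event '{event}' not allowed in state '{self.state}'")
            self.state = nxt

    # A rule spanning several artifacts (an N-to-M relation) can be evaluated
    # directly over the artifacts' states, without separate process variables:
    def shipment_may_depart(orders):
        return all(o.state == "approved" for o in orders)

Because the rule reads artifact states directly, there is no separate "working memory" to keep in sync, which is precisely the redundancy the talk argues against.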
In the ACSI approach, process developers do not reason in terms of tasks that are mapped to request-response interactions between a process and the underlying systems. Instead, they reason in terms of artifacts, their lifecycles, operations and associated data. Artifact lifecycles are captured based on a meta-model – namely Guard-Stage-Milestone (GSM) – that allows one to capture behavior, data querying and manipulation in a unified framework [6]. Upon this foundation, the ACSI project is building a proof-of-concept platform that supports the definition and execution of artifact-centric business processes. Challenges addressed by ACSI include the problem of reverse-engineering artifact systems from enterprise system logs – for the purpose of legacy systems migration – and the verification of artifact-centric processes, which by nature are infinite-state systems due to the tight integration of processes and data.

Acknowledgments. This paper is the result of collective discussions within the ACSI project team. Thanks especially to Rick Hull for numerous discussions on this topic. The ACSI project is funded by the European Commission's FP7 ICT Program.

References

1. Kamal Bhattacharya, Nathan S. Caswell, Santhosh Kumaran, Anil Nigam, and Frederick Y. Wu. Artifact-centered operational modeling: Lessons from customer engagements. IBM Systems Journal, 46(4):703–721, 2007.
2. David Cohn and Richard Hull. Business artifacts: A data-centric approach to modeling business operations and processes. IEEE Data Eng. Bull., 32(3):3–9, 2009.
3. Marlon Dumas. On the convergence of data and process engineering. In Proc. of the 15th International Conference on Advances in Databases and Information Systems (ADBIS), Vienna, Austria, pages 19–26. Springer, September 2011.
4. Thomas Erl. Service-Oriented Architecture (SOA): Concepts, Technology, and Design. Prentice Hall, 2005.
5. P. Feldman and D. Miller. Entity model clustering: Structuring a data model by abstraction. The Computer Journal, 29(4):348–360, 1986.
6. Richard Hull, Elio Damaggio, Riccardo De Masellis, Fabiana Fournier, Manmohan Gupta, Fenno Terry Heath, Stacy Hobson, Mark H. Linehan, Sridhar Maradugu, Anil Nigam, Piyawadee Noi Sukaviriya, and Roman Vaculín. Business artifacts with guard-stage-milestone lifecycles: managing artifact interactions with conditions and events. In Proceedings of the Fifth ACM International Conference on Distributed Event-Based Systems (DEBS), New York, NY, USA, pages 51–62. ACM, July 2011.
7. Thomas Kohlborn, Axel Korthaus, Taizan Chan, and Michael Rosemann. Identification and analysis of business and software services - a consolidated approach. IEEE Transactions on Services Computing, 2(1):50–64, 2009.
8. Vera Künzle and Manfred Reichert. PhilharmonicFlows: towards a framework for object-aware process management. Journal of Software Maintenance, 23(4):205–244, 2011.
9. Guy Redding, Marlon Dumas, Arthur H. M. ter Hofstede, and Adrian Iordachescu. A flexible, object-centric approach for business process modelling. Service Oriented Computing and Applications, 4(3):191–201, 2010.

A Logic-Based Approach to Business Process Customization

Yves Lespérance
Department of Computer Science and Engineering, York University, Toronto, Canada
lesperan@cse.yorku.ca

Abstract. In this invited lecture, I will present a logic-based approach to modeling and engineering processes that arose from work in AI. The approach is based on a logical framework for modeling dynamic domains called the Situation Calculus.
It also uses a language called ConGolog for specifying complex processes on top of the Situation Calculus. By using such a logical framework, we can provide clear formal characterizations of problems that arise in the area of business process design and management. Available automated reasoning techniques can also be used to analyze and synthesize processes. After introducing the framework, I will discuss how one can use it to model process customization, where one customizes a generic process to satisfy certain constraints required by a client. I will show how we can allow for uncontrollable actions by the process, and then define a notion of maximally permissive supervisor for such a process, i.e., a supervisor that constrains the process as little as possible, while ensuring that the desired constraints are satisfied. We have shown that such a maximally permissive supervisor always exists and is unique. Finally, I will briefly discuss how one can use the framework to model the problem of process orchestration, where one wants to orchestrate a set of available services to produce a desired process.

Automatic Detection of Business Process Interference

N.R.T.P. van Beest¹, E. Kaldeli², P. Bulanov², J.C. Wortmann¹, and A. Lazovik²

¹ Department of Business & ICT, Faculty of Economics and Business, University of Groningen, Nettelbosje 2, 9747 AE Groningen, The Netherlands
² Distributed Systems Group, Johann Bernoulli Institute, University of Groningen, Nijenborgh 9, 9747 AG, The Netherlands

Abstract. Today's organizations are characterized by long-running distributed business processes, which involve different stakeholders and share common resources. One of the main challenges posed in such a highly distributed setting comes from the interference between different processes that are running in parallel. During the execution of a business process, a data modification caused by some external process may lead to erroneous and undesirable business outcomes. In order to address this problem, we propose to annotate business processes with dependency scopes, which cover critical sections of the process. Erroneous execution can be prevented by executing intervention processes, which are triggered at runtime. However, for complex processes with a large number of activities and many interactions with the environment, the manual specification of the appropriate critical sections can be particularly time-consuming and error-prone. To overcome this limitation, we present an algorithm for automating the discovery of critical sections. The proposed approach is applied to a real case study of a BP from the Dutch e-Government.

1 Introduction

Modern private and public organizations are moving from traditional, proprietary and locally managed Business Process Management Systems (BPMS) to BPMS where more and more tasks are outsourced to third-party providers and resources are shared among different stakeholders. Often, this is realized by emergent paradigms such as Service Oriented Computing (SOC) and cloud computing. As a result, business processes (BPs) can no longer be considered in isolation, since data can be simultaneously accessed and modified by different external processes. Disregarding the interdependencies with external actors and other processes may lead to inconsistent situations, potentially resulting in undesirable business outcomes.
The situation where undesirable business outcomes are caused by data modifications of some other concurrently executing process is known as process interference [1, 2]. The problem of process interference is particularly relevant for knowledge-intensive BPs, where shared data are accessed and modified by many processes, involving a large number of stakeholders.

E-Government is a typical area characterized by multiple concurrently executing knowledge-intensive processes. These processes access and modify commonly shared resources such as citizen data, information reported by external contracted parties, etc. In such a context, a "think globally, act locally" approach has to be adopted: each BP instance has to take its own action, independently of other processes, based on how its knowledge about the world evolves during runtime, and how this knowledge affects the next tasks in its workflow. For example, important data used by subsequent tasks may become obsolete, and conditions on which the process relies may not hold anymore. Therefore, a BP has to be continuously informed about changes concerning that data, reason about them, and react accordingly, in order to be able to ensure its consistency with the new state of the world.

In the Netherlands, a first attempt has been made to provide a Software as a Service (SaaS) solution for the local e-Government (www.govunited.nl). One of the processes that is proposed as a candidate for this initiative concerns the process of the Dutch Law for Societal Support, known as the WMO law. This law is intended to offer support for people with a chronic disease or a disability, by providing facilities (usually by external parties) such as domestic care, transportation, a wheelchair or a home modification. Naturally, several different instances of the WMO process can be executed concurrently, together with other governmental processes, which may access and modify the same data. For example, during the execution of the WMO process, the citizen may move to a different address, the medical status of the citizen may alter, or the eligibility criteria may change because of some new directive. These changes may pass unnoticed by BPs which rely upon them, and consequently result in unexpected behavior and undesirable business outcomes. The consequences are often noticed only by end customers [3], through erroneous orders or invoices, customer requests that are never handled, etc.

Traditional verification techniques for workflow and data-flow (e.g. [4]) are not sufficient for ensuring the correctness of such BPs, as they assume a closed environment where no other process can use a service that affects the data used by that organization. In addition, most work about resolving process interference refers to failing processes or concerns design-time solutions [5, 6]. Consequently, neither of these solutions is suitable for a highly dynamic SaaS environment. In [2], a run-time mechanism is proposed, where vulnerable parts of the process are monitored in order to manage interferences by employing intervention processes. Dependency scopes (DS) are used to specify a critical section of the BP, whose correct execution relies on the accuracy of a volatile process variable, i.e. a variable that can be changed externally during the execution of the process.
If a volatile variable is modified by some exogenous factor during the execution of the activities in the respective DS, an intervention process (IP) is triggered, with the purpose of resolving the potential execution problems stemming from this change event. However, for complex processes with a large number of activities and many interactions with the environment, the task of manually annotating a BP with DSs becomes difficult, time-consuming, and prone to errors. Thus, critical parts of the BP whose correct execution depends on the validity of some volatile variable may be neglected.

In this paper, we extend the initial idea presented in [2], by systematizing the main methodology and providing an algorithm which automates the task of identifying the critical parts of a BP. To this end, we concretize the proposed approach by describing the semantic extensions to the BP modelling that allow the specification of DSs for resolving runtime process errors. Given a block-style BP specification and some basic information about the services it uses (i.e. the input-output parameters and internal state variables), we show how the parts of the process that are covered by DSs can be automatically inferred. This way, the task of the BP designer can be greatly facilitated.

The remainder of this paper is organized as follows. Section 2 describes a possible interference scenario on a real case study taken from the Dutch e-Government, which plays the role of our running example. In Section 3 the basic definitions required for the proposed approach are presented. The algorithm for the automatic identification of critical sections is described in Section 4. Section 5 provides an overview of related work, and the overall conclusions are drawn in Section 6.

2 A Process Interference Case Study

In order to illustrate the effects of process interference and the potential ways to overcome them, let us consider a real case study from the Dutch e-Government regarding the WMO law, as described in [2]. The BP under investigation (referred to as the WMO process) concerns the handling of requests from citizens at one of the 430 municipalities in the Netherlands. In this section, the WMO process is described as used by one of the municipalities. Furthermore, an example is provided, showing the required DSs along with the required IPs.

2.1 WMO Process Description

The WMO process (shown in Figure 1) starts with the submission of an application for a provision by a citizen. After receiving the application at the municipality office, a home visit is executed by an officer, in order to gather a detailed understanding of the situation. After the home visit, additional information on the citizen's health may still be required, which can be obtained via a medical advice provided by e.g. a general practitioner. Based on this information, a decision is made by the municipality to determine whether the citizen is eligible to receive the requested provision or not. In case of a negative decision, the citizen has the possibility for appeal. In case of a positive decision, the process continues and the requested provision will be provided. For domestic help, the citizen has the choice between "Personal Budget" and "Care in Kind". In case of a "Personal Budget", the citizen periodically receives a certain amount of money for the granted provision, and in case of "Care in Kind" suppliers who can take care of the provision are contacted.
For obtaining a wheelchair, first the detailed requirements are acquired before sending the order to the supplier. The home modification involves a tender procedure to select a supplier that provides the best offer. If the selected tender is approved by the municipality, the order is sent to the selected supplier. After delivery of the provision, an invoice is sent by the supplier to the municipality. Finally, the invoice is checked and paid.

[Fig. 1: The WMO process]

2.2 Interference Examples

The request for a wheelchair or a home modification may take up to 6 weeks until the delivery of the provision. These processes depend on the correctness of a number of process variables, like the address of the citizen and the content of the decision. However, these process variables may be changed by another process running in parallel, independently from the WMO process, and are, therefore, volatile. A change in either of these process variables (e.g. address) may have potentially negative consequences for the WMO process, due to its dependencies on those variables, and lead to erroneous outcomes. Such situations are typical examples of process interference.

For example, the requirements of a wheelchair may depend on certain characteristics of the citizen's home. Consequently, an address change after "Acquire requirements" might result in a wheelchair that does not fit the actual requirements. Similarly, if the citizen moves to a nursing home after "Check tender with decision", the home modification is not necessary anymore. However, the supplier is not notified of this address change, and the municipality is notified through a different process, which is external to the WMO process. As a result, unless some action is taken to cancel or update the order, the WMO process will proceed with the home modification.

In order to guard against changes to the volatile process variables, DSs can be defined, covering those activities for which such a change poses a potential risk of interference. In Figure 2, a part of the process is annotated with DSs (DS1 and DS3 guarding {Address, Medical Condition}, and DS2 guarding {WMO Eligibility Criteria}) using a Process Modeller tool developed for the graphical modeling of BPs. The tool provides a selection of standard control blocks like flow, switch etc., with the extra support of design tools for modeling DSs.

[Fig. 2: WMO dependency scopes]
For implementation details, see [7]. The activities in DS1 rely on the accuracy of the address. If the address changes, the DS should be triggered, and potentially some recovery activities need to be executed, depending on the state of the BP at that point. For example, if the address change is detected before the order for a wheelchair is sent to the supplier, it is sufficient to execute the IP as shown in Figure 3a. However, if the order is already sent to the supplier, some additional activities are required (Figure 3b). First of all, the current order should be put on hold. After acquiring the requirements again, it is evaluated whether there is a change. If not, the order can be resumed; otherwise the old order should be cancelled and a new order should be sent. The specification of IPs is outside the scope of this paper (for a detailed discussion about the specification of IPs see [2] and [7]).

[Fig. 3: WMO intervention examples]

3 Basic Definitions

In this section, we provide the basic definitions regarding the BP representation extended with the support of DSs. First, we define the Service Repository (SR), which is a registry that keeps semantic information about a set of services that are accessible to the client who is executing a specific BP. The SR plays the role of a pool of service descriptions and instances, which are used as the building elements of different process specifications. Service descriptions specify the basic functionalities provided by a service, i.e. the operations offered by the respective service type, and are represented in terms of simple semantics. Service instances refer to specific providers, which offer a service whose functionality conforms to some service description. The service descriptions can be extracted from standard semantic languages for representing Web Services, such as WSDL-S (www.w3.org/Submission/WSDL-S) and OWL-S (www.w3.org/Submission/OWL-S). The service descriptions capture the input-output behavior of the operations, i.e. the types of the input parameters and of the expected outputs, as well as some information about their internal variables (similar to Locals in OWL-S). No extra semantic information is required to automatically identify the critical sections of a BP.

Definition 1 (Service Repository (SR)). A Service Repository SR = (SD, SI) is a registry, which keeps a set of Service Descriptions SD, and a set of Service Instances SI. A Service Description sd ∈ SD is a tuple sd = (sdid, O, SV), where sdid is a unique identifier, O is a set of service operations, and SV is a list of variables, each ranging over a finite domain. These variables correspond to state variables internal to the service, whose value can be changed by the service operations. Each service operation o ∈ O is a tuple o = (id(o), in(o), out(o)) where:
– id(o) is the identifier of the operation
– in(o) is a list of variables that play the role of input parameters to o, ranging over finite domains
– out(o) is a list of variables that play the role of output parameters to o, ranging over finite domains

A Service Instance si ∈ SI is a tuple si = (iid(si), st(si)), where:
– st(si) is the unique identifier (service type) of the service description sd ∈ SD this instance complies with
– iid(si) is an instance identifier. For each pair of service instances si1, si2 ∈ SI that have the same service type st(si1) = st(si2), iid(si1) ≠ iid(si2).

The set of state variables involved in the SR may be used by different running process instances, and their value may be changed by any process that has access to the respective setting service operation.

In what follows, the working definition of a Business Process (BP) is provided. Although the WMO process (Figure 1) is represented in BPMN notation for readability, the core BP representation used in this paper is block-structured [8], and uses the basic constructs of BPEL, enriched with DSs. As such, the syntax of the BP is block-structured and unambiguously defined, so that the BP can be directly executed by an orchestrator [9], and automatically parsed to identify the parts of the BP that should be covered by a DS. The representation is ultimately a tree structure where a block can have other blocks as children, and for each block its parent can be obtained. All activities included in the BP are references to service instances that exist in the Service Repository.

Definition 2 (Business Process (BP)). Given a Service Repository SR = (SD, SI), a Business Process is a tuple BP = (PV, E), with E being a process element E = (ACT | SEQUENCE | FLOW | SWITCH | REPEAT | WHILE | DS), where:
– PV = PVi ∪ PVe is a set of variables ranging over finite domains.
  - PVi is a set of internal variables, which are declared at the BP level (BP-specific). A subset of PVi are passed as input parameters to the entire BP, in which case we write BP(pv1, ..., pvn), where pvi ∈ PVi and pvi can be initialized with specific values at execution time.
  - PVe is a set of external variables, which refer to state variables declared in the SR. An external variable v ∈ PVe is a reference sdid.iid.vid, where sdid is the identifier of a service description sd = (sdid, O, SV) ∈ SD, iid is the identifier of a service instance si = (iid, sdid) ∈ SI, and vid is the identifier of some state variable v ∈ SV.
– ACT is a process activity, which represents the invocation of a service operation. For instance, in BPEL it may correspond to an invoke, receive, reply, etc. Every ACT refers to an operation that exists in SI. It is a tuple act = (id(act), in(act), out(act)), where id(act) is a reference sdid.iid.oid, with sdid being an identifier of a service description sd = (sdid, O, SV) ∈ SD, iid the identifier of a service instance si = (iid, sdid) ∈ SI, and oid the identifier of some operation o ∈ O. The input and output parameters of act refer to the inputs and outputs of the respective oid, i.e. in(act) = in(oid) and out(act) = out(oid). The input (output) parameters of all activities in the BP form the sets IP (OP). Input variables can be assigned with constant values or other process variables: id(act)(ip1 := v1, ..., ipn := vn), where ipi ∈ in(act), vi ∈ (PV ∪ OP), or vi is a value compliant with ipi's domain. There are also two special types of activities: no-op, which represents an idle activity, and exit, whose execution causes the entire BP to halt.
– SEQUENCE refers to a totally ordered set of process elements, which are executed in sequence: SEQUENCE{e1 ... en}, where ei is a process element.
– FLOW represents a set of process elements, which are executed in parallel: FLOW{e1 ... en}, where ei is a process element.
– SWITCH is a set of tuples {(c1, e1), ..., (cn, en)}, where ei is a process element and ci is a logical condition C ::= var ◦ v, where var ∈ (PV ∪ OP), v is some constant belonging to var's domain, and ◦ is a relational operator (◦ ∈ {=, <, >, ≠, ≤, ≥}). All ci participating in a SWITCH refer to the same variable var and are mutually exclusive.
– REPEAT represents a loop structure, and is defined as a tuple (pe, c{pei}), where c is a logical condition as already defined, and pe, pei are process elements. c is evaluated just after the end of pe, and if it holds then pe is repeated, after the execution of the optional pei.
– DS is a dependency scope as defined in Definition 3.

3.1 Dependency Scopes

The DS is based on a guard-verify structure to deal with modification events due to factors exogenous to the BP, e.g. due to some other process execution which affects some data on which the BP relies. The critical part of the BP is included in the guard block, while the verify block specifies the types of events that require intervention. The mechanism of event recording and handling is out of the scope of this paper (for a system dealing with process-generated events see e.g. [10]). Whenever such an event occurs, the control flow is transferred to the verify block, and the respective goal is activated. Once the resulting IP finishes execution in the updated environment, the control flow of the BP continues from the point following the guard-verify structure, unless it is explicitly forced to terminate.

Definition 3 (Dependency Scope (DS)). Given a SR = (SD, SI) and a BP = (PVi ∪ PVe, E), a dependency scope is a tuple DS = ⟨guard(VV){CS}, verify({(ci, IPi | terminate(IPi))})⟩, where:
– guard(VV) indicates the set of volatile variables VV ⊂ PVe whose modification triggers the verification of the DS, and CS a process element of BP which is called the Critical Section. Whenever during the execution of CS a modification event regarding the value of a vv ∈ VV is received, the verify part of the DS is triggered, and BP's execution is interrupted.
– verify({(ci, IPi)}) comprises a set of tuples consisting of a logical condition ci and an intervention process IPi in compliance with Definition 2, to be pursued if ci holds. Providing a case condition is optional, with the default interpretation being ci = TRUE. IPi specifies a BP which ensures the satisfaction of the properties that reflect the state right after the final activity of CS. After the interruption of the BP, some IPi is executed, and then BP is resumed just after CS (and from any other parallel branches that were interrupted).
– terminate(IP) forces the rest of BP's execution to be aborted after completing IP's execution.
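As an illustration of Definitions 1-3, here is a minimal Python sketch of the block-structured BP tree enriched with dependency scopes; the class and field names are our own shorthand for the formal tuples above, not part of the paper's apparatus or of any tool.

    # Minimal sketch of the block-structured BP tree of Definition 2, enriched
    # with the dependency scopes of Definition 3. Class and field names are our
    # own shorthand for the formal tuples; they are not taken from any tool.
    from dataclasses import dataclass, field
    from typing import List, Tuple, Union

    @dataclass
    class Act:                        # invocation of a service operation
        op_id: str                    # reference sdid.iid.oid into the SR
        inputs: List[str] = field(default_factory=list)
        outputs: List[str] = field(default_factory=list)

    @dataclass
    class Sequence:
        children: List["Element"]

    @dataclass
    class Flow:
        children: List["Element"]

    @dataclass
    class Switch:                     # mutually exclusive (condition, branch) pairs
        cases: List[Tuple[str, "Element"]]

    @dataclass
    class DS:                         # guard(VV){CS} with its verify cases
        volatile_vars: List[str]      # VV, a subset of PVe
        critical_section: "Element"   # CS
        verify: List[Tuple[str, "Element"]]  # (condition, intervention process)

    Element = Union[Act, Sequence, Flow, Switch, DS]

The tree shape (every block knows its children, and a parent can be recovered by traversal) is what makes the automatic parsing of Section 4 straightforward.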
Following Definition 3, the DS specification representing DS1 of Figure 2 is given below, where IPa, IPb and IPc refer to the respective intervention processes, which take care of repairing the erroneous execution in each of the cases:

    <ds>
      <guard>
        <variables>
          <variable name="address" dataType="dt:address"/>
          <variable name="medCond" dataType="dt:medInfo"/>
        </variables>
        <criticalSection>
          <!-- Subprocess covered by DS1 as in Figure 2 -->
        </criticalSection>
      </guard>
      <verify>
        <case condition="address.county != 'Groningen'">
          <terminate><invoke name="IPa"/></terminate>
        </case>
        <case condition="address.county = 'Groningen' AND medCond != 'deceased'">
          <invoke name="IPb"/>
        </case>
        <case condition="medCond = 'deceased'">
          <terminate><invoke name="IPc"/></terminate>
        </case>
      </verify>
    </ds>

According to DS1, if a modification event regarding the address or the medical condition is received within the scope of the guarded subprocess, different IPs are executed, depending on the state of execution and the kind of modification that has occurred. For example, if the address change indicates that the citizen has moved to another municipality, then IPa includes canceling the order (either for a wheelchair or home modification) if one has already been issued, and sending a notification to the city hall. Similarly, IPb takes care of the situation where the customer has moved within the range of the municipality, and IPc of the case where his medical condition has changed to 'deceased'. In the following section we describe how the guard(VV){CS} part of a DS description can be derived automatically, by parsing the BP specification.
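Before turning to that algorithm, the following Python sketch shows one possible reading of the guard-verify semantics just described; run_ds and its arguments are our own illustrative names, not the interface of the orchestrator of [9].

    # Sketch of the guard-verify behavior of a DS (our reading of Definition 3,
    # not the orchestrator's actual interface). The CS is a list of step
    # functions; `events` is a set to which an external monitor adds the names
    # of modified volatile variables.

    def run_ds(cs_steps, volatile_vars, verify_cases, events):
        """cs_steps: callables forming the critical section;
        verify_cases: list of (condition_fn, ip_fn, terminate_flag)."""
        for step in cs_steps:
            step()
            if events & set(volatile_vars):          # modification event in scope
                for cond, ip, terminate in verify_cases:
                    if cond():                       # first matching verify case
                        ip()                         # run the intervention process
                        if terminate:
                            raise SystemExit("BP aborted by terminate(IP)")
                        break
                break                                # resume just after the CS
        # control continues from the point following the guard-verify structure

    # Hypothetical usage: guard a single send-order step on 'address'.
    events: set = set()
    run_ds(
        cs_steps=[lambda: print("send order to supplier")],
        volatile_vars=["address"],
        verify_cases=[(lambda: True, lambda: print("re-acquire requirements"), False)],
        events=events,
    )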
4 Automatic Identification of Critical Sections

The algorithm for the automated generation of the parts of a BP covered by a DS is presented as Algorithm 1 below. The algorithm guarantees that the computed CSs are elements of the BP in compliance with Definition 2. CSs cover all activities that are directly or indirectly dependent on the same set of volatile variables VV. That is, they either use a vv ∈ VV as input or use the output of another activity which is dependent on vv. These activities are referred to as Dependent Activities (DA). In order to ensure that important change events will not pass untreated, any part of the process on a potential execution path between two activities dependent on the same VV should also be covered by the respective CS. This is necessary to take care of any modification of vv that occurs during the execution of this intermediate part, since the modification may require the cancelation or repetition of some preceding part of the BP which relied on some vv ∈ VV (e.g. performing a new visit to the new house if the address has changed), and which is used by a succeeding element (e.g. to calculate the characteristics of the requested wheelchair). However, branches in switch or flow constructs that are not on a potential path between two activities dependent on some vv should not be included in the respective CS, in order to avoid unnecessary invocation of intervention processes.

[Fig. 4: CS creation examples]

In Figure 4, some examples of CSs are provided to illustrate the properties described above. The shaded activities are dependent on VV and should be covered by a CS. The CSs are indicated by a dashed line. In case (a), only the specific branches of the switch constructs that comprise dependent activities are included in the CS. In situation (b), however, the second switch has to be covered entirely by a CS, because the last activity is dependent on VV as well. Any modification event regarding a vv ∈ VV that occurs during the upper branch (which is not dependent on VV) still has to be dealt with, since the last activity may use a variable that is a result of some dependent activities before the switch, which produced this result based on the obsolete vv. In situation (c), both branches of the first switch contain activities that are not dependent on VV. However, as they are both on a path between activities that are dependent on VV, the entire switch is covered by a CS.

The main function of Algorithm 1 is extractScopes, which takes as input a BP specification in accordance with Definition 2 and the list of volatile variables VV. extractScopes returns a list of tuples ⟨VVi, CSi⟩, which correspond to the guard parts of all DSs in the BP. Given a BP = (PVi ∪ PVe, E), VV = PVe. That is, all state variables that are declared in the SR and used in the BP should be guarded, since their modification may be a source of erroneous results. The BP is treated as a tree (represented in XML), where the root is the outermost element in the specification, and the leaves are the activities.

The outermost loop in the function extractScopes iterates over the list of volatile variables VV. For each vv ∈ VV, critical sections are extracted separately. Identical CSs for different variables are merged into a united CS at the end by mergeScopes. The first step (line 4) is to find all activities and switch blocks that depend directly or indirectly on the volatile variable vv, by calling the function getDependentElems. First (line 18), all activities for which vv is assigned to some of their input parameters, directly or by transitivity, are added to the dependent elements DE. Then (line 24), DE is augmented by adding all switch blocks whose condition is either on vv, or on some variable produced by the already considered activities. All elements in DE are arranged in a breadth-first order as they appear in the BP.

The next step in extractScopes is to iterate through the list DE. In the inner loop, for each pair of elements ei, ej, it is checked whether their minimal common ancestor is of type sequence. If so, the function getTempCS is called, which returns a set of elements that are candidates for being CSs with respect to the variable vv and lie between ei and ej. Then, ej can be removed from DE, since subsequent inspections on it are redundant, as the appropriate CSs covering it have already been computed.

Function getTempCS(ei, ej, BP) first calls getPathBtw to compute the path between ei and ej (line 31), which comprises all elements that are part of the sequence between ei and ej, including the special markers StartBranchEl and EndBranchEl. These markers indicate the start (splits) and end points (joins) of branching elements. Consequently, a path is a list with members of type Item (line 44), where an item is either a process element or a BranchElMarker. Markers are added to the path only if they concern joins (splits) for which the respective split (join) is not encountered during the traversal of the BP from ei to ej. This way, the markers divide the path into the appropriate sequences of elements (lines 33 to 39), each of which is a candidate for being a CS.
Function getPathBtw uses the auxiliary function nextItems (not explained in the algorithm for space reasons), which returns a list consisting of the next element in the sequence path, and some possible EndBranchEl, if any are encountered before the next element is fetched. These are added to the path, and the process proceeds by fetching the next items (line 45), until the element in the sequence that contains ej is reached. In the latter case, pathInElem is called, which traverses the path within this last element until ej is reached. If the element containing ej is an activity or a sequence, this activity (ej) or the subsequence till ej (line 52) is returned, respectively. If the element is a switch or flow, then a StartBranchEl marker is added to the list of results, and the branch containing ej is inspected. pathInElem is called recursively on this branch, and all items in the path leading to ej are collected in pathj. Consequently, the computation of the entire path is completed and returned to getTempCS. The path is traversed (line 33) and divided into the appropriate CSs: currCS is constructed as a sequence of the elements in path, until a marker is met, at which point currCS is added to the list of candidate CSs.

Algorithm 1 Automatic computation of the set of pairs Guarded = {⟨VVi, CSi⟩}, consisting of volatile variables and the respective elements that constitute the Critical Sections

     1: function extractScopes(BP, VV): List[(List[V], E)]
     2:   for each vv ∈ VV do
     3:     guardList = ∅
     4:     DE = getDependentElems(vv, BP)
     5:     for each ei ∈ DE do
     6:       tmpCS = ∅
     7:       DE = DE.remove(ei)
     8:       for each ej ∈ DE do
     9:         if type(minCommonAncestor(ei, ej)) = sequence then
    10:           tmpCS = tmpCS ∪ getTempCS(ei, ej, BP)
    11:           DE = DE.remove(ej)
    12:       for tmpCSi ∈ tmpCS do
    13:         guardList.add(⟨{vv}, tmpCSi⟩)
    14:   mergeScopes(guardList)

    15: function getDependentElems(vv, BP): List[Element]
    16:   varList = {vv}
    17:   DE = ∅
    18:   for each ai ∈ BP.getActivities do
    19:     for each ipi := v ∈ ai.parseInputAssignments do
    20:       if v ∈ varList then
    21:         for each opi ∈ out(ai) do
    22:           varList.add(opi)
    23:         DE.add(ai); break
    24:   for each SWITCHi ∈ BP.getSWITCHelements do
    25:     ci = SWITCHi.getFirstCondition
    26:     if ci.getLeftVariable ∈ varList then
    27:       DE.add(SWITCHi)
    28:   return DE

    29: function getTempCS(ei, ej, BP): List[Elem]
    30:   tmpCSList = ∅
    31:   path = getPathBtw(ei, ej, BP)
    32:   currCS = ∅
    33:   for each item ∈ path do
    34:     match type(item)
    35:       case Element:
    36:         currCS.attachInSeq(item)
    37:       case BranchElMarker:
    38:         tmpCSList.add(currCS)
    39:         currCS = ∅
    40:   return tmpCSList

    41: function getPathBtw(ei, ej, BP): List[Item]
    42:   currElem = ei
    43:   while ¬currElem.contains(ej) do
    44:     path.append(currItems)
    45:     currItems = nextItems(currElem, ei, BP)
    46:     currElem = currItems.getElement
    47:     if currItems = ∅ then return ∅
    48:   path.append(pathInElem(currElem, ej, BP))
    49:   return path

    50: function pathInElem(el, endEl, BP): List[Item]
    51:   match type(el)
    52:     case activity:
    53:       return {el}
    54:     case sequence:
    55:       return el.subsequenceTill(endEl)
    56:     case SWITCH ∨ flow:
    57:       pathj = {StartBrEl}
    58:       branchj = el.getBranchWith(endEl)
    59:       return pathj.append(pathInElem(branchj, endEl, BP))
    60:   return ∅

Once the list of temporary CSs tmpCS regarding a volatile variable vv is computed as described above, extractScopes proceeds with constructing the respective guardList, consisting of tuples ⟨{vv}, tmpCSi⟩ (line 12).
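To illustrate the dependency propagation performed by getDependentElems (lines 15 to 28 of Algorithm 1), here is a small Python sketch; the Activity class and the WMO-like fragment are our simplified stand-ins, and the sketch ignores switch blocks and assumes activities arrive in breadth-first order, so a single pass suffices.

    # Our simplified rendering of getDependentElems (Algorithm 1, lines 15-28):
    # an activity is dependent if one of its inputs is the volatile variable or
    # an output of an already-dependent activity. Switch handling is omitted,
    # and Activity is a stand-in type, not the paper's implementation.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Activity:
        name: str
        inputs: List[str]
        outputs: List[str]

    def get_dependent_elems(vv: str, activities: List[Activity]) -> List[Activity]:
        var_list = {vv}          # variables "tainted" by vv, grown transitively
        dependent = []
        for act in activities:   # assumed breadth-first, as in the algorithm
            if any(ip in var_list for ip in act.inputs):
                var_list.update(act.outputs)   # outputs become tainted as well
                dependent.append(act)
        return dependent

    # Hypothetical WMO-like fragment: an address change taints the requirements,
    # and through them the order that is sent to the supplier.
    acts = [
        Activity("home_visit", ["address"], ["situation"]),
        Activity("acquire_requirements", ["situation"], ["requirements"]),
        Activity("send_order", ["requirements"], ["order"]),
    ]
    print([a.name for a in get_dependent_elems("address", acts)])
    # -> ['home_visit', 'acquire_requirements', 'send_order']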
After repeating the process described above for each vv ∈ VV, mergeScopes is called, in order to clean up the candidate CSs. The following steps are performed, in that order:
– If there are two tuples ⟨{v1}, CS1⟩ and ⟨{v2}, CS2⟩, where CS1 and CS2 are identical, then they are replaced by a single tuple ⟨{v1, v2}, CS1⟩.
– If there are two tuples ⟨{v1}, CS1⟩ and ⟨{v2}, CS2⟩, where v1 = v2 and CS1.descendantOf(CS2), then the former tuple is removed as redundant.
– If a list of tuples on the same volatile variable set ⟨VV, CS1⟩, ..., ⟨VV, CSn⟩ correspond to the branches of a switch, i.e. there is an eswitch = switch{(c1, CS1), ..., (cn, CSn)}, then these are replaced with a single CS, which covers the entire switch element. A similar process is performed for flow branches.
– If a list of tuples on the same volatile variable set ⟨VV, CS1⟩, ..., ⟨VV, CSn⟩ are interrelated through a sequence relation, i.e. there is a seq{CS1, ..., CSn}, then these are replaced with a single CS, which covers the entire sequence.

Algorithm 1 has been applied to the BP specification of the WMO process represented in Figure 1. The algorithm identified three volatile variables, and all five critical sections related to them. The total time for parsing the WMO process specification and computing all CSs is below 100 msec. The discovered CSs can then be projected on the Process Modeller, as presented in Figure 2.

5 Related Work

Process interference between concurrent BPs occurs frequently in organizations, and some solutions have been provided in the literature, e.g. [2, 5, 6]. Although the use of temporal logic for data-flow analysis in business processes can ensure soundness of both the control flow and the data flow [4], runtime disruptions due to external data changes are not accounted for. As a result, process interference cannot be prevented or resolved by such methods. However, most existing mechanisms to resolve process interference either provide a design-time solution, thus requiring that the designer anticipates all potential problems and ways to overcome them in advance, or are based on failing processes [5]. A more elaborate solution for process interference in Service-Oriented Computing is provided by [6], where in addition to failing processes, events like exceptional conditions or unavailable activities are covered. More specifically to cloud computing, an approach for handling faults due to failing processes or services is presented by [11]. In practice, however, process interference does not necessarily cause processes to fail. Often, processes may end up providing erroneous outcomes as a result of wrong data values, a problem that is acknowledged in [2]. In most cases, however, wrong data values are interpreted as a data integrity problem. Much work has been done with respect to ensuring data integrity in distributed and concurrent systems. Some techniques for checking the integrity of distributed and dynamic data stored on the cloud are discussed in [12, 13], while [14] focuses on run-time failures that affect cloud short-lived data. Although the interference problem is related to concurrent data usage, the cause of the problem is beyond data integrity issues. Therefore, we focus on problems that arise at the level of process execution due to the use of outdated data.
6 Concluding Remarks

One of the main challenges posed by the emerging distributed setting of modern BP Management Systems comes from the interference between different processes that access common resources. During the execution of a business process, a data modification caused by some external factor may lead to erroneous results, and should therefore be guarded against and dealt with. To address this issue, it is very important to correctly identify the sections of a business process whose correct execution depends on some volatile variable. These sections should be guarded, so that whenever a modification event is received during their execution, an appropriate intervention process is executed in order to restore the process to a consistent state. However, manually specifying these critical sections can become cumbersome and error-prone, especially for processes with a complex structure that use many shared resources. To facilitate this task, we have developed an algorithm which automatically computes the appropriate critical sections, given a BP specification and some semantics regarding the input-output and the internal state variables of the service operations used by the process. We have shown how this can be applied in a real case study taken from the Dutch e-government. The results can be presented in a process modelling tool in a graphical way, so as to assist the process designer in specifying the necessary dependency scopes to ensure the delivery of correct results by the process.

References
1. Xiao, Y., Urban, S.: Process dependencies and process interference rules for analyzing the impact of failure in a service composition environment. In: Business Inf. Systems. Volume 4439 of LNCS. (2007) 67–81
2. van Beest, N.R.T.P., Bulanov, P., Wortmann, J., Lazovik, A.: Resolving business process interference via dynamic reconfiguration. In: Proc. of 8th Int. Conf. on Service Oriented Computing (ICSOC). (2010) 47–60
3. van Beest, N.R.T.P., Szirbik, N.B., Wortmann, J.C.: Assessing the interference in concurrent business processes. In: Proc. of 12th Int. Conf. on Enterprise Information Systems (ICEIS). (2010) 261–270
4. Trčka, N., van der Aalst, W., Sidorova, N.: Data-flow anti-patterns: Discovering data-flow errors in workflows. In: Adv. Inf. Systems Eng. Volume 5565 of LNCS. (2009) 425–439
5. Xiao, Y., Urban, S.: Using data dependencies to support the recovery of concurrent processes in a service composition environment. In: Proc. of the 16th Int. Conf. on Cooperative Inf. Systems. (2008) 139–156
6. Urban, S., Gao, L., Shrestha, R., Courter, A.: The dynamics of process modeling: New directions for the use of events and rules in service-oriented computing. In: The Evolution of Conceptual Modeling. Volume 6520 of LNCS. (2011) 205–224
7. van Beest, N.R.T.P., Kaldeli, E., Bulanov, P., Wortmann, J., Lazovik, A.: Automated runtime repair of business processes. Technical Report 2012-12-2, University of Groningen (2012) www.cs.rug.nl/∼eirini/papers/tech 2012-12-2.pdf
8. Ouyang, C., Dumas, M., ter Hofstede, A., van der Aalst, W.: From BPMN process models to BPEL web services. In: Int. Conf. on Web Services. (2006) 285–292
9. Kopp, O., Martin, D., Wutke, D., Leymann, F.: On the choice between graph-based and block-structured business process modeling languages. In: Modellierung betrieblicher Informationssysteme (MobIS 2008). Volume 141 of Lecture Notes in Informatics (LNI). Gesellschaft für Informatik e.V. (GI) (2008) 59–72
10. Rozsnyai, S., Vecera, R., Schiefer, J., Schatten, A.: Event cloud - searching for correlated business events. In: 9th IEEE Int. Conf. on E-Commerce Technology / 4th IEEE Int. Conf. on Enterprise Computing, E-Commerce and E-Services. (2007)
11. Juhnke, E., Dornemann, T., Freisleben, B.: Fault-tolerant BPEL workflow execution via cloud-aware recovery policies. In: 35th EUROMICRO Conf. on Softw. Eng. and Adv. Applications (SEAA). (2009) 31–38
12. Sravan Kumar, R., Saxena, A.: Data integrity proofs in cloud storage. In: 3rd Int. Conf. on Communication Systems and Networks (COMSNETS). (2011) 1–4
13. Hao, Z., Zhong, S., Yu, N.: A privacy-preserving remote data integrity checking protocol with data dynamics and public verifiability. IEEE Trans. on Knowledge and Data Engineering 23(9) (2011) 1432–1437
14. Ko, S.Y., Hoque, I., Cho, B., Gupta, I.: Making cloud intermediate data fault-tolerant. In: 1st ACM Symposium on Cloud Computing. (2010) 181–192

Semantically-Governed Data-Aware Processes

Diego Calvanese1, Giuseppe De Giacomo2, Domenico Lembo2, Marco Montali1, and Ario Santoso1
1 Free University of Bozen-Bolzano, lastname@inf.unibz.it
2 Sapienza Università di Roma, lastname@dis.uniroma1.it

Abstract. In this paper we consider processes that run over data stored in a relational database. Our setting is that of ontology-based data access (OBDA), where the information in the database is conceptually represented as an ontology and is declaratively mapped to it through queries. We are interested in verifying temporal logic formulas on the evolution of the information at the conceptual level, taking into account the knowledge present in the ontology, which allows for deducing information that is only implicitly available. Specifically, we show how, building on first-order rewritability of queries over the system state that is typical of ontology languages for OBDA, we are able to reformulate the temporal properties into temporal properties expressed over the underlying database. This allows us to adopt notable decidability results on verification of evolving databases that have been established recently.

1 Introduction

Recent work in business processes, services and databases has highlighted the need to consider both data and processes simultaneously while designing the system. This holistic view of considering data and processes together has given rise to a line of research under the name of artifact-centric business processes [16, 14, 19, 1], which aims at avoiding the notorious discrepancy of traditional approaches, where these aspects are considered separately [7]. Recently, interesting decidability results for verification of temporal properties over such systems have been obtained in the context of so-called Data-centric Dynamic Systems (DCDSs) based on relational technology [12, 6, 4, 5]. In a DCDS, processes operate over the data of the system and evolve it by executing actions that may issue calls to external services. The data returned by such external services is injected into the system, effectively making it infinite-state. There has also been some work on a form of DCDS based on ontologies, where the data layer is represented in a rich ontology formalism, and actions perform a form of instance-level update of the ontology [3]. The use of an ontology allows for a high-level conceptual view of the data layer that is better suited for a business-level treatment of the manipulated information.
Here we introduce Semantically-Governed Data-Aware Processes (SGDAP), in which we merge these two approaches by enhancing a relational layer, constituted by a DCDS-based system, with an ontology that constitutes a semantic layer. The ontology captures the domain in which the SGDAP is executed, and allows for seeing the data and their manipulation at a conceptual level through an ontology-based data access (OBDA) system [8, 18]. Hence it provides us with a way of semantically governing the underlying DCDS. Specifically, an SGDAP is constituted by two main components: (i) an OBDA system [8], which includes (the intensional level of) an ontology, a relational database schema, and a mapping between the ontology and the database; (ii) a process component, which characterizes the evolution of the system in terms of a process specifying preconditions and effects of action execution over the relational layer. The ontology is represented through a Description Logic (DL) TBox [2], expressed in a lightweight ontology language of the DL-Lite family [10], a family of DLs specifically designed for efficiently accessing large amounts of data. The mapping is defined in terms of a set of assertions, each relating an arbitrary (SQL) query over the relational layer to a set of atoms whose predicates are the concepts and roles of the ontology, and whose arguments are terms built using specific function symbols applied to the answer variables of the SQL query. Such mappings specify how to populate the elements of the ontology from the data in the database, and function symbols are used to construct (abstract) objects (object terms) from the concrete values retrieved from the database.

When an SGDAP evolves, each snapshot of the system is characterized by a database instance at the relational layer, and by a corresponding virtual ABox, which together with the TBox provides a conceptual view of the relational instance at the semantic layer. When the system is progressed by the process component, we assume that the current instance can be queried at any time, and can be updated through action executions, possibly involving external service calls to get new values from the environment. Hence the process component relies on three main notions: actions, which are the atomic progression steps for the data layer; external services, which can be called during the execution of actions; and a process, which is essentially a non-deterministic program that uses actions as atomic instructions. During the execution, the snapshots of the relational layer can be virtually mapped as ABoxes in the semantic layer. This makes it possible to: (i) understand the evolution of the system at the conceptual level, and (ii) govern it at the semantic level, rejecting those actions that, executed at the relational layer, would lead to a new semantic snapshot that is inconsistent with the semantic layer's TBox.

In this work, we are interested in verifying dynamic properties specified in a variant of µ-calculus [15], one of the most powerful temporal logics, expressed over the semantic layer of an SGDAP. We consider properties expressed as µ-calculus formulae whose atoms are queries built over the semantic layer. By relying on techniques for query answering in DL-Lite OBDA systems, which exploit FOL rewritability of query answering and of ontology satisfiability, we reformulate the temporal properties expressed over the semantic layer into analogous properties over the relational layer.
Given that our systems are in general infinite-state, verification of temporal properties is undecidable. However, we show how we can adapt to our setting recent results on the decidability of verification of DCDSs based on suitable finite-state abstractions [5].

2 Preliminaries

In this section we introduce the description logic (DL) DL-LiteA,id and describe the ontology-based data access (OBDA) framework. DL-LiteA,id [11, 8] allows for specifying concepts, representing sets of objects, roles, representing binary relations between objects, and attributes, representing binary relations between objects and values. The syntax of concept, role and attribute expressions in DL-LiteA,id is as follows:

B → N | ∃R | δ(U)        R → P | P⁻

Here, N, P, and U respectively denote a concept name, a role name, and an attribute name, P⁻ denotes the inverse of a role, and B and R respectively denote basic concepts and basic roles. The concept ∃R, also called unqualified existential restriction, denotes the domain of a role R, i.e., the set of objects that R relates to some object. Similarly, the concept δ(U) denotes the domain of an attribute U, i.e., the set of objects that U relates to some value. Note that we consider here a simplified version of DL-LiteA,id where we distinguish between objects and values, but do not further deal with different datatypes; similarly, we consider only a simplified version of identification assertions. A DL-LiteA,id ontology is a pair (T, A), where T is a TBox, i.e., a finite set of TBox assertions, and A is an ABox, i.e., a finite set of ABox assertions. DL-LiteA,id TBox assertions have the following form:

B1 ⊑ B2                  R1 ⊑ R2      U1 ⊑ U2
B1 ⊑ ¬B2                 R1 ⊑ ¬R2     U1 ⊑ ¬U2
(id B Z1, . . . , Zn)    (funct R)    (funct U)

From left to right, assertions of the first row denote inclusions between basic concepts, basic roles, and attributes; assertions of the second row denote disjointness between basic concepts, basic roles, and attributes; assertions of the last row denote identification assertions (IdAs) and global functionality on roles and attributes. In an IdA, each Zi denotes either an attribute or a basic role. Intuitively, an IdA of the above form asserts that for any two different instances o, o′ of B, there is at least one Zi such that o and o′ differ in the set of their Zi-fillers, that is, the set of objects (if Zi is a role) or values (if Zi is an attribute) that are related to o by Zi. As usual, in DL-LiteA,id TBoxes we impose that roles and attributes occurring in functionality assertions or IdAs cannot be specialized (i.e., they cannot occur in the right-hand side of inclusions). DL-LiteA,id ABox assertions have the form N(t1), P(t1, t2), or U(t1, v1), where t1 and t2 denote individual objects and v1 denotes a value. The semantics of DL-LiteA,id is given in [11]. We only recall here that we interpret objects and values over distinct domains, and that for both we adopt the Unique Name Assumption, i.e., different constants denote different objects (or values). The notions of entailment, satisfaction, and model are as usual [11]. We also say that A is consistent wrt T if (T, A) is satisfiable, i.e., admits at least one model. Next we introduce queries. As usual (cf. OWL 2), answers to queries are formed by terms denoting individuals appearing in the ABox. The domain of an ABox A, denoted by ADOM(A), is the (finite) set of terms appearing in A.
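As a toy illustration of ABox consistency under the Unique Name Assumption, the following Python sketch checks for violations of disjointness and functionality assertions. It is a minimal sketch under simplifying assumptions: it ignores positive inclusions (a complete check would first take them into account, as the FOL rewriting of ontology satisfiability does), and all names are ours:

def is_consistent(disjoint, functional, concept_assertions, role_assertions):
    # Disjointness: no individual may belong to two disjoint concepts.
    members = {}
    for concept, individual in concept_assertions:
        members.setdefault(concept, set()).add(individual)
    for c1, c2 in disjoint:
        if members.get(c1, set()) & members.get(c2, set()):
            return False
    # Functionality: under UNA, a functional role or attribute relates
    # each subject to at most one filler.
    for r in functional:
        fillers = {}
        for name, subject, obj in role_assertions:
            if name == r:
                if subject in fillers and fillers[subject] != obj:
                    return False
                fillers[subject] = obj
    return True

# With (funct MNum), two distinct matriculation numbers for the same
# student make the ontology unsatisfiable:
print(is_consistent(disjoint=[], functional=["MNum"],
                    concept_assertions=[("Student", "s1")],
                    role_assertions=[("MNum", "s1", "123"), ("MNum", "s1", "456")]))
# -> False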
A union of conjunctive queries (UCQ) q over a TBox T is a FOL formula of the form ∃y1.conj1(x, y1) ∨ · · · ∨ ∃yn.conjn(x, yn), with free variables x and existentially quantified variables y1, . . . , yn. Each conji(x, yi) in q is a conjunction of atoms of the form N(z), P(z, z′), U(z, z′), where N, P and U respectively denote a concept, role and attribute name of T, and z, z′ are constants in a set C or variables in x or yi, for some i ∈ {1, . . . , n}. The (certain) answers to q over an ontology (T, A) is the set ans(q, T, A) of substitutions σ of the free variables of q with constants in ADOM(A) such that qσ evaluates to true in every model of (T, A). (As customary, we can view each substitution simply as a tuple of constants, assuming some ordering of the free variables of q.) If q has no free variables, then it is called boolean, and its certain answers are true or false. Computing ans(q, T, A) of a UCQ q over a DL-LiteA,id ontology (T, A) is in AC⁰ in the size of A [11]. This is actually a consequence of the fact that DL-LiteA,id enjoys the FOL rewritability property, which in our setting says that for every UCQ q, ans(q, T, A) can be computed by evaluating the UCQ REW(q, T) over A considered as a database. REW(q, T) is the so-called perfect reformulation of q w.r.t. T [11]. We also recall that, in DL-LiteA,id, ontology satisfiability is FOL rewritable. In other words, we can construct a boolean FOL query qunsat(T) that evaluates to true over an ABox A iff the ontology (T, A) is unsatisfiable.

In our framework, we consider an extension of UCQs, called ECQs, which are queries of the query language EQL-Lite(UCQ) [9]. Formally, an ECQ over a TBox T is a possibly open, domain-independent formula of the form:

Q → [q] | ¬Q | Q1 ∧ Q2 | ∃x.Q | x = y

where q is a UCQ over T and [q] denotes that q is evaluated under the (minimal) knowledge operator (cf. [9]). To compute the certain answers ANS(Q, T, A) to an ECQ Q over an ontology (T, A), we can compute the certain answers over (T, A) of each UCQ embedded in Q, and evaluate the first-order part of Q over the relations obtained as the certain answers of the embedded UCQs. Hence, also computing ANS(Q, T, A) of an ECQ Q over a DL-LiteA,id ontology (T, A) is in AC⁰ in the size of A [9].

Ontology-Based Data Access (OBDA). In an OBDA system, a relational database is connected to an ontology that represents the domain of interest by a mapping, which relates database values with values and (abstract) objects in the ontology (cf. [8]). In particular, we make use of a countably infinite set V of values and a set Λ of function symbols, each with an associated arity. We also define the set C of constants as the union of V and the set {f(d1, . . . , dn) | f ∈ Λ and d1, . . . , dn ∈ V} of object terms. Formally, an OBDA system is a structure O = ⟨R, T, M⟩, where: (i) R = {R1, . . . , Rn} is a database schema, constituted by a finite set of relation schemas; (ii) T is a DL-LiteA,id TBox; (iii) M is a set of mapping assertions, each of the form Φ(x) ❀ Ψ(y, t), where: (a) x is a non-empty set of variables, (b) y ⊆ x, (c) t is a set of object terms of the form f(z), with f ∈ Λ and z ⊆ x, (d) Φ(x) is an arbitrary SQL query over R, with x as output variables, and (e) Ψ(y, t) is a conjunctive query over T of arity n > 0 without non-distinguished variables, whose atoms are over the variables y and the object terms t.
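The following Python sketch illustrates how a single mapping assertion populates ontology predicates from database rows, with objectification via a function symbol. The SQL query Φ is stubbed as a row filter, the object term is rendered as a string, and all helper names are ours (the mapping mirrors m1 of the running example introduced next):

def apply_mapping(rows, sql_filter, make_atoms):
    # Evaluate the (stubbed) SQL query Φ and instantiate the ontology
    # atoms Ψ(y, t) for every answer tuple.
    atoms = set()
    for row in rows:
        if sql_filter(row):
            atoms |= make_atoms(row)
    return atoms

enrolled = [(123, "john", "doe", "Bachelor", None)]
abox = apply_mapping(
    enrolled,
    lambda r: r[3] == "Bachelor",                             # WHERE type = "Bachelor"
    lambda r: {("Bachelor", "stu1(%s,%s,%s)" % (r[1], r[2], r[3]))})
print(abox)   # {('Bachelor', 'stu1(john,doe,Bachelor)')}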
Example 1. As a running example, we consider a simple university information system that stores and manipulates data concerning students and their degree. In particular, we define an OBDA system O = ⟨R, T, M⟩ to capture the conceptual schema of such a domain, how data are concretely maintained in a relational database, and how the two information levels are linked through mappings. The conceptual schema is depicted in Figure 1, and formalized as the following DL-LiteA,id TBox T:

Bachelor ⊑ Student    Master ⊑ Student    Graduated ⊑ Student
δ(MNum) ⊑ Student    Student ⊑ δ(MNum)    (funct MNum)    (id Student MNum)

[Fig. 1. UML conceptual schema for our running example: classes Bachelor, Master and Graduated specialize class Student, which has an attribute mNum: String.]

The conceptual schema states that Bachelor and Master are subclasses of Student, that some Students could already be Graduated, and that MNum (representing the matriculation number) is an attribute relating individuals of type Student (the domain of the attribute) to corresponding codes (the range of the attribute). The conceptual schema also expresses that each Student has exactly one matriculation number, and we assume that matriculation numbers can be used to identify Students (i.e., each MNum is associated to at most one Student). Data related to students are maintained in a concrete underlying data source that obeys the database schema R, constituted by the following relation schemas: (i) ENROLLED(id, name, surname, type, endDate) stores information about students that are currently (endDate = NULL) or were enrolled in a bachelor (type = "Bachelor") or master (type = "Master") course. (ii) GRAD(id, mark, type) stores data of former students who have been graduated. (iii) TRANSF_M(name, surname) is a temporary relation used to maintain information about master students that have been recently transferred from another university and must still complete the enrollment process. The interconnection between the database schema R and the conceptual schema T is specified through the following set M of mappings:

m1: SELECT name, surname, type FROM ENROLLED WHERE type = "Bachelor" ❀ Bachelor(stu1(name, surname, type))
m2: SELECT name, surname, type FROM ENROLLED WHERE type = "Master" ❀ Master(stu1(name, surname, type))
m3: SELECT name, surname, type, id FROM ENROLLED ❀ MNum(stu1(name, surname, type), val(id))
m4: SELECT name, surname FROM TRANSF_M ❀ Master(stu1(name, surname, "Master"))
m5: SELECT e.name, e.surname, e.type FROM ENROLLED e, GRAD g WHERE e.id = g.id ❀ Graduated(stu1(name, surname, type))

Intuitively, m1 (resp. m2) maps every id in ENROLLED with type "Bachelor" ("Master") to a bachelor (master) student. Such a student is constructed by "objectifying" the name, surname and course type using the variable term stu1/3. In m3, the MNum attribute is instead created by directly using the value of id to fill in the target of the attribute. Notice the use of the val function symbol for mapping id to the range of MNum. Mapping m4 creates further master students starting from the temporary TRANSF_M table. Since such students are not explicitly associated with a course type, but it is intended that they are "Master", objectification is applied to the students' name and surname, adding "Master" as a constant in the variable term. Notice that, according to the TBox T, such students have a matriculation number, but its value is not known (and, in fact, no mapping exists to generate their MNum attribute).
Finally, m5 generates graduated students by selecting only those students in the ENROLLED table whose matriculation number is also contained in the GRAD table. ⊓⊔

Given a database instance D made up of values in V and conforming to schema R, and given a mapping M, the virtual ABox generated from D by a mapping assertion m = Φ(x) ❀ Ψ(y, t) in M is m(D) = ⋃_{v ∈ eval(Φ,D)} Ψ[x/v], where eval(Φ, D) denotes the evaluation of the SQL query Φ over D, and where we consider Ψ[x/v] to be a set of atoms (as opposed to a conjunction). Then, the ABox generated from D by the mapping M is M(D) = ⋃_{m ∈ M} m(D). Notice that ADOM(M(D)) ⊆ C. As for ABoxes, the active domain ADOM(D) of a database instance D is the set of values occurring in D. Notice that ADOM(D) ⊆ V. Given an OBDA system O = ⟨R, T, M⟩ and a database instance D for R, a model for O wrt D is a model of the ontology (T, M(D)). We say that O wrt D is satisfiable if it admits a model wrt D.

Example 2. Consider a database instance D = {ENROLLED(123, john, doe, Bachelor, NULL)}. The corresponding virtual ABox obtained from the application of the mapping M is M(D) = {Bachelor(stu1(john, doe, Bachelor)), MNum(stu1(john, doe, Bachelor), val(123))}. ⊓⊔

A UCQ q over an OBDA system O = ⟨R, T, M⟩ is simply a UCQ over T. To compute the certain answers of q over O wrt a database instance D for R, we follow a three-step approach: (i) q is rewritten to compile away T, obtaining qr = REW(q, T); (ii) the mapping M is used to unfold qr into a query over R, denoted by UNFOLD(qr, M), which turns out to be an SQL query [17]; (iii) such a query is executed over D, obtaining the certain answers. For an ECQ, we can proceed in a similar way, applying the rewriting and unfolding steps to the embedded UCQs. It follows that computing certain answers to UCQs/ECQs in an OBDA system is FOL rewritable. Applying the unfolding step to qunsat(T), we also obtain that satisfiability in O is FOL rewritable.
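Schematically, the rewrite-unfold-evaluate strategy and the FOL-rewritable satisfiability check amount to a simple composition, sketched below in Python. Here rew, unfold and sql_eval are placeholders for a PerfectRef-style rewriter, the mapping unfolding and an SQL engine, not actual implementations:

def certain_answers(q, tbox, mapping, db, rew, unfold, sql_eval):
    q_r = rew(q, tbox)              # (i)   compile the TBox away: q_r = REW(q, T)
    q_sql = unfold(q_r, mapping)    # (ii)  unfold over M into an SQL query over R
    return sql_eval(q_sql, db)      # (iii) evaluate over the database instance D

def satisfiable(tbox, mapping, db, q_unsat, unfold, sql_eval):
    # O is satisfiable wrt D iff the unfolded q_unsat(T) evaluates to false over D.
    return not sql_eval(unfold(q_unsat(tbox), mapping), db)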
3 Semantically-Governed Data-Aware Processes

A Semantically-Governed Data-Aware Process (SGDAP) S = ⟨O, P, D0⟩ is formed by an OBDA system O = ⟨R, T, M⟩, by a process component P, and by an initial database instance D0 that conforms to the relational schema R in O. Intuitively, the OBDA system keeps all the data of interest, while the process component modifies and evolves such data, starting from the initial database D0. The process component P constitutes the progression mechanism for the SGDAP. Formally, P = ⟨F, A, π⟩, where: (i) F is a finite set of functions representing calls to external services, which return values; (ii) A is a finite set of actions, whose execution progresses the data layer, and may involve external service calls; (iii) π is a finite set of condition-action rules that form the specification of the overall process, which tells at any moment which actions can be executed.

An action α ∈ A has the form α(p1, . . . , pn) : {e1, . . . , em}, where: (i) α(p1, . . . , pn) is the signature of the action, constituted by a name α and a sequence p1, . . . , pn of input parameters that need to be substituted with values for the execution of the action, and (ii) {e1, . . . , em} is a set of effect specifications, whose specified effects are assumed to take place simultaneously. Each ei has the form qi+ ∧ Qi− ❀ Ei, where: (a) qi+ ∧ Qi− is a query over R whose terms are variables, action parameters, and constants from ADOM(D0). The query qi+ is a UCQ, and the query Qi− is an arbitrary FO formula whose free variables are included in those of qi+. Intuitively, qi+ selects the tuples to instantiate the effect, and Qi− filters away some of them. (b) Ei is the effect, i.e., a set of facts for R, which includes as terms: terms in ADOM(D0), input parameters, free variables of qi+, and, in addition, Skolem terms formed by applying a function f ∈ F to one of the previous kinds of terms. Such Skolem terms involving functions represent external service calls and are interpreted so as to return a value chosen by an external user/environment when executing the action. The process π is a finite set of condition-action rules Q ↦ α, where α is an action in A and Q is a FO query over R whose free variables are exactly the parameters of α, and whose other terms can be quantified variables or values in ADOM(D0).

Example 3. Consider the OBDA system O defined in Example 1. We now define a process component P = ⟨F, A, π⟩ over the relational schema R of O, so as to obtain a full SGDAP. In particular, π is constituted by the following condition-action rules ("_" denotes existentially quantified variables that are not used elsewhere):
– ENROLLED(id, _, _, _, NULL) ↦ GRADUATE(id)
– TRANSF_M(name, surname) ↦ COMPL-ENR(name, surname)
The first rule extracts a matriculation number id of a currently enrolled student (endDate = NULL) from the ENROLLED relation and graduates the student, whereas the second rule selects a pair name, surname in TRANSF_M and uses them to complete the enrollment of that student. In order to be effectively executed, the involved actions rely on the following set F of service calls: (i) today() returns the current date; (ii) getMark(id, type) returns the final mark received by student id; (iii) getID(name, surname, type) returns the matriculation number for the name-surname pair of a student. The two actions GRADUATE and COMPL-ENR are then defined as follows:

GRADUATE(id) : {
  GRAD(id2, m, t) ❀ GRAD(id2, m, t),
  TRANSF_M(n, s) ❀ TRANSF_M(n, s),
  ENROLLED(id2, n, s, t, d) ∧ id2 ≠ id ❀ ENROLLED(id2, n, s, t, d),
  ENROLLED(id, n, s, t, NULL) ❀ ENROLLED(id, n, s, t, today()),
  ENROLLED(id, _, _, t, NULL) ❀ GRAD(id, getMark(id, t), t) };

COMPL-ENR(n, s) : {
  GRAD(id, m, t) ❀ GRAD(id, m, t),
  ENROLLED(id, n2, s2, t, d) ❀ ENROLLED(id, n2, s2, t, d),
  TRANSF_M(n2, s2) ∧ (n2 ≠ n ∨ s2 ≠ s) ❀ TRANSF_M(n2, s2),
  TRANSF_M(n, s) ❀ ENROLLED(getID(n, s, "Master"), n, s, "Master", NULL) }

Given a matriculation number id, action GRADUATE inserts a new tuple for id in GRAD, updating at the same time the enrollment's end date for id in ENROLLED to the current date, while keeping all other entries in TRANSF_M, GRAD and ENROLLED. Given a name and surname, action COMPL-ENR has the effect of moving the corresponding tuple in TRANSF_M to a new tuple in ENROLLED, for which the matriculation number is obtained by interacting with the getID service call; all other entries in TRANSF_M, GRAD and ENROLLED are preserved. ⊓⊔
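Anticipating the execution semantics formalized in the next section, the following Python sketch applies the effects of GRADUATE to a concrete database instance. All effects are evaluated against the old instance, the service calls today() and getMark are stubbed (the mark range is invented), and the encoding of relations as sets of tuples is ours:

import datetime
import random

def today():
    return str(datetime.date.today())    # stub for the today() service call

def get_mark(student_id, course_type):
    return random.randint(18, 30)        # stub for getMark; the range is invented

def graduate(db, sid):
    enrolled, grad = set(), set(db["GRAD"])        # GRAD tuples are preserved
    for (i, n, s, t, d) in db["ENROLLED"]:
        if i != sid:
            enrolled.add((i, n, s, t, d))          # keep the other enrollments
        elif d is None:
            enrolled.add((i, n, s, t, today()))    # close the enrollment
            grad.add((i, get_mark(i, t), t))       # record the graduation
    return {"ENROLLED": enrolled, "GRAD": grad,
            "TRANSF_M": set(db["TRANSF_M"])}       # TRANSF_M tuples are preserved

db0 = {"ENROLLED": {(123, "john", "doe", "Bachelor", None)},
       "GRAD": set(), "TRANSF_M": set()}
print(graduate(db0, 123))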
4 Semantics of SGDAP

This work focuses on the semantics of SGDAPs assuming that external services behave nondeterministically, i.e., two calls of a service with the same arguments may return different results during the same run. This captures both services that model a truly nondeterministic process (e.g., human operators), and services that model stateful servers. Let S = ⟨O, P, D0⟩ be an SGDAP, where O = ⟨R, T, M⟩ and P = ⟨F, A, π⟩. The semantics of S is defined in terms of a possibly infinite transition system (TS), which represents all possible computations that the process component can do over the data starting from D0.

We start by defining the semantics of action execution. Let α be an action in A of the form α(p) : {e1, . . . , en} with effects ei = qi+ ∧ Qi− ❀ Ei, and let σ be a substitution of the parameters p with values in V. The evaluation of the effects of α on a database instance D using a substitution σ is captured by the following function:

DO(D, α, σ) = ⋃_{qi+ ∧ Qi− ❀ Ei in α} ⋃_{θ ∈ ANS((qi+ ∧ Qi−)σ, D)} Ei σθ

which returns a database instance made up of values in V and Skolem terms representing service calls. We denote with CALLS(DO(D, α, σ)) such service calls, and with EVALS(D, α, σ) the set of substitutions that replace these service calls with values in V:

EVALS(D, α, σ) = {θ | θ : CALLS(DO(D, α, σ)) → V is a total function}.

We then say that the database instance D′ over V and conforming to R is produced from D by the application of action α using substitution σ if D′ = DO(D, α, σ)θ, where θ ∈ EVALS(D, α, σ).

Relational Layer Transition System (RTS). Let S = ⟨O, P, D0⟩ be an SGDAP with O = ⟨R, T, M⟩. The RTS ΥS^R of S is formally defined as ⟨R, Σ, s0, db, ⇒⟩, where Σ is a (possibly infinite) set of states, s0 is the initial state, db is a total function from states in Σ to database instances made up of values in V and conforming to R, and ⇒ ⊆ Σ × Σ is a transition relation. Σ, ⇒ and db are defined by simultaneous induction as the smallest sets such that s0 ∈ Σ, with db(s0) = D0, and satisfying the following property: given s ∈ Σ with db(s) = D, for each condition-action rule Q(p) ↦ α(p) ∈ π, for each substitution σ of p such that σ ∈ ANS(Q, D), consider every database instance D′ produced from D by the application of α using σ. Then: (i) if there exists s′ ∈ Σ such that db(s′) = D′, then s ⇒ s′; (ii) otherwise, if O is satisfiable wrt D′, then s′ ∈ Σ, s ⇒ s′ and db(s′) = D′, where s′ is a fresh state. We observe that the satisfiability check done in the last step of the RTS construction accounts for semantic governance.

Semantic Layer Transition System (STS). Given an SGDAP S with O = ⟨R, T, M⟩ and with RTS ΥS^R = ⟨R, Σ, s0, db, ⇒⟩, the STS ΥS^S of S is a "virtualization" of the RTS in the semantic layer. In particular, ΥS^S maintains the structure of ΥS^R unaltered, reflecting that the process component is executed over the relational layer, but it associates each state to a virtual ABox obtained from the application of the mapping M to the database instance associated by ΥS^R to the same state. Formally, ΥS^S = ⟨T, Σ, s0, abox, ⇒⟩, where abox is a total function from Σ to ABoxes made up of individual objects in C and conforming to T, such that for each s ∈ Σ with db(s) = D, abox(s) = M(D).
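A finite prefix of the RTS can be explored with a plain breadth-first search, as in the following Python sketch. It assumes that produce enumerates finitely many service-call evaluations per action (the formal model allows infinitely many, which is what makes the RTS infinite-state in general); the satisfiability test realizes the semantic-governance step, and all helper names are ours:

from collections import deque

def build_rts_prefix(d0, rules, produce, satisfiable, max_states=1000):
    # Database instances are dicts from relation names to sets of tuples;
    # freeze turns them into hashable state identifiers.
    freeze = lambda d: tuple(sorted((rel, tuple(sorted(ts, key=repr)))
                                    for rel, ts in d.items()))
    states, edges = {freeze(d0): d0}, set()
    queue = deque([d0])
    while queue and len(states) < max_states:
        d = queue.popleft()
        for guard, action in rules:
            for sigma in guard(d):                        # answers to the rule query
                for d_next in produce(d, action, sigma):  # one per service evaluation
                    if not satisfiable(d_next):
                        continue                          # governance: reject the move
                    key = freeze(d_next)
                    if key not in states:
                        states[key] = d_next
                        queue.append(d_next)
                    edges.add((freeze(d), key))
    return states, edges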
5 Dynamic Constraints Formalism

Let S = ⟨O, P, D0⟩ be an SGDAP, where O = ⟨R, T, M⟩ and P = ⟨F, A, π⟩. We are interested in the verification of conceptual temporal properties over S, i.e., properties that constrain the dynamics of S understood at the semantic layer. Technically, this means that properties are verified over the SGDAP's STS ΥS^S, combining temporal operators with queries posed over the ontologies obtained by combining the TBox T with the ABoxes associated to the states of ΥS^S. More specifically, we adopt ECQs [9] to query the ontologies of ΥS^S, and µ-calculus [15] to predicate over the dynamics of ΥS^S.

We use a variant of µ-calculus [15], one of the most powerful temporal logics, subsuming LTL, PSL, and CTL* [13], called µL^EQL_C, whose formulae have the form:

Φ ::= Q | Z | ¬Φ | Φ1 ∧ Φ2 | ∃x ∈ C0.Φ | ⟨−⟩Φ | µZ.Φ

where Q is an ECQ over T, C0 = ADOM(M(D0)) is the set of object terms appearing in the initial virtual ABox (obtained by applying the mapping M over the database instance D0), and Z is a predicate variable. As usual, syntactic monotonicity is enforced to ensure existence of unique fixpoints. Beside the usual FOL abbreviations, we also make use of the following ones: [−]Φ = ¬⟨−⟩(¬Φ) and νZ.Φ = ¬µZ.¬Φ[Z/¬Z]. The subscript C in µL^EQL_C stands for "closed", and attests that ECQs are closed queries. In fact, µL^EQL_C formulae only support the limited form of quantification ∃x ∈ C0.Φ, which is a convenient, compact notation for ⋁_{c ∈ ADOM(M(D0))} Φ[x/c]. We make this assumption for simplicity, but actually, with some care, our result can be extended to a more general form of quantification over time [5].

In order to define the semantics of µL^EQL_C we resort to transition systems. Let Υ = ⟨T, Σ, s0, abox, ⇒⟩ be an STS. Let V be a predicate and individual variable valuation on Υ, i.e., a mapping from the predicate variables Z to subsets of the states Σ, and from individual variables to constants in ADOM(M(D0)). Then, we assign meaning to µL^EQL_C formulas by associating to Υ and V an extension function (·)^Υ_V, which maps µL^EQL_C formulas to subsets of Σ. The extension function (·)^Υ_V is defined inductively as follows, where QV denotes Q with its free individual variables substituted according to V:

(Q)^Υ_V = {s ∈ Σ | ANS(QV, T, abox(s)) = true}
(Z)^Υ_V = V(Z) ⊆ Σ
(¬Φ)^Υ_V = Σ − (Φ)^Υ_V
(Φ1 ∧ Φ2)^Υ_V = (Φ1)^Υ_V ∩ (Φ2)^Υ_V
(∃x ∈ C0.Φ)^Υ_V = ⋃{(Φ)^Υ_{V[x/c]} | c ∈ ADOM(M(D0))}
(⟨−⟩Φ)^Υ_V = {s ∈ Σ | ∃s′. s ⇒ s′ and s′ ∈ (Φ)^Υ_V}
(µZ.Φ)^Υ_V = ⋂{E ⊆ Σ | (Φ)^Υ_{V[Z/E]} ⊆ E}

When Φ is a closed formula, (Φ)^Υ_V does not depend on V, and we denote it by (Φ)^Υ. We are interested in the model checking problem, i.e., verifying whether a µL^EQL_C closed formula Φ holds for the SGDAP S. This problem is defined as checking whether s0 ∈ (Φ)^{ΥS^S}, that is, whether Φ is true in the initial state s0 of ΥS^S. If this is the case, we write ΥS^S |= Φ.

Example 4. An example of a dynamic property in our running example is Φ = µZ.((∀s.[Student(s)] → [Graduated(s)]) ∨ [−]Z), which says that every evolution of the system leads to a state in which all students present in that state are graduated. ⊓⊔
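Over a finite transition system (such as the faithful finite-state abstraction discussed in the next section), the extension function can be computed directly, evaluating least fixpoints by Kleene iteration. The following Python sketch assumes formulas encoded as nested tuples and ECQ atoms pre-evaluated per state by the OBDA machinery; all encodings are ours:

def extension(phi, states, post, holds, val=None):
    # states: set of states; post(s): set of successors of s;
    # holds(atom, s): truth of a pre-evaluated ECQ atom in state s.
    val = val or {}
    op = phi[0]
    if op == "atom":
        return {s for s in states if holds(phi[1], s)}
    if op == "var":
        return val[phi[1]]
    if op == "not":
        return states - extension(phi[1], states, post, holds, val)
    if op == "and":
        return (extension(phi[1], states, post, holds, val)
                & extension(phi[2], states, post, holds, val))
    if op == "diamond":                                   # ⟨−⟩Φ
        target = extension(phi[1], states, post, holds, val)
        return {s for s in states if post(s) & target}
    if op == "mu":                                        # least fixpoint µZ.Φ
        z, body, approx = phi[1], phi[2], set()
        while True:
            nxt = extension(body, states, post, holds, {**val, z: approx})
            if nxt == approx:
                return approx
            approx = nxt
    raise ValueError("unknown operator: %s" % op)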
6 Verification of Dynamic Properties over SGDAPs

We now describe how µL^EQL_C properties can be effectively verified over SGDAPs. Let S = ⟨O, P, D0⟩ be an SGDAP, where O = ⟨R, T, M⟩ and P = ⟨F, A, π⟩. Let Φ be a µL^EQL_C dynamic property specified over T, and let ΥS^S and ΥS^R respectively be the STS and the RTS of S. The main issue to be tackled is that ΥS^S and ΥS^R are in general infinite-state, and their verification undecidable. In [5], some decidability boundaries for the verification of Data-Centric Dynamic Systems (DCDSs) have been extensively studied. DCDSs are tightly related to SGDAPs, with some key differences in the data component: (i) the process component is identical in the two frameworks; (ii) DCDSs are only equipped with a relational layer, i.e., no ontology nor mapping are specified; (iii) while SGDAPs define constraints over the data at the semantic layer, DCDSs are equipped with denial constraints posed directly over the database schema. Given a µL^EQL_C property Φ, we therefore attack the verification problem ΥS^S |= Φ in the following way: (1) We transform Φ into a corresponding µLC property Φ′, i.e., a µL property whose atoms are closed FO queries over R, thus reducing ΥS^S |= Φ to ΥS^R |= Φ′. (2) We show, again exploiting FOL rewritability in DL-LiteA, that the consistency check used to generate ΥS^R can be rewritten as denial constraints over R. This means that ΥS^R can be generated by a purely relational DCDS. (3) We argue that Φ′ belongs to the dynamic property language investigated in [5] for DCDSs under the nondeterministic semantics. (4) We can therefore reuse the decidability results of [5] to check whether ΥS^R |= Φ′ can be decided and, in the positive case, we apply the abstraction technique defined in [5] for reducing the verification problem to conventional finite-state model checking. Details are provided below; the idea of the approach is depicted in Figure 2.

[Fig. 2. Verification of dynamic µL^EQL_C properties over SGDAPs: the property Φ is checked against the semantic transition system, while Φ′ = UNFOLD(REW(Φ, T), M) is checked against the relational transition system and its finite-state abstraction.]

Property Transformation. In order to transform the property, we separate the treatment of the dynamic part and of the embedded ECQs. Since the dynamics of an SGDAP is completely determined at the relational layer, the dynamic part is maintained unaltered. ECQs are instead manipulated as defined in Section 2. In particular, the rewriting of Φ wrt the TBox T, denoted by Φr = REW(Φ, T), is done by replacing each embedded ECQ with its corresponding rewriting wrt T.

Example 5. Consider the µL^EQL_C property Φ described in Example 4, together with the TBox T introduced in Example 1. The rewriting of Φ wrt T produces Φr = REW(Φ, T), which is:

µZ.((∀s.[Student(s) ∨ Bachelor(s) ∨ Master(s) ∨ MNum(s, _)] → [Graduated(s)]) ∨ [−]Z)  ⊓⊔
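A toy version of the rewriting step, for atomic concept atoms only, closes an atom under the concept inclusions of the TBox, producing the union of atoms that REW would generate; actual perfect reformulation also handles roles, attributes and existentially quantified join variables. The encoding of δ(MNum) as the string "δ(MNum)" is our shorthand:

def rewrite_concept_atom(concept, inclusions):
    # inclusions is a list of pairs (sub, sup), one per assertion sub ⊑ sup.
    result, frontier = {concept}, [concept]
    while frontier:
        c = frontier.pop()
        for sub, sup in inclusions:
            if sup == c and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result      # the atom C(x) is replaced by the union of these atoms

tbox = [("Bachelor", "Student"), ("Master", "Student"),
        ("Graduated", "Student"), ("δ(MNum)", "Student")]
print(sorted(rewrite_concept_atom("Student", tbox)))
# -> ['Bachelor', 'Graduated', 'Master', 'Student', 'δ(MNum)']

Modulo the simplifications applied there, this is the disjunction shown in Example 5.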
Before unfolding the rewritten dynamic property Φr, we translate each subformula of the form ∃x ∈ C0.Ψ into the equivalent form ⋁_{c ∈ ADOM(M(D0))} Ψ[x/c]. This means that when such a form of quantification is used, the initial ABox must be materialized in order to compute the initial active domain of the semantic layer. We then extend the UNFOLD() function defined in Section 2 to unfold a µL^EQL_C dynamic property over the semantic layer into a corresponding property over the relational layer. As for the rewriting, the temporal structure is maintained unaltered, reflecting that the dynamics of SGDAPs is determined at the relational layer. For what concerns the ECQs embedded in the property, the interesting case to be discussed is the one of (existential) quantification:

UNFOLD(∃x.ϕ, M) = ∃x.UNFOLD(ϕ, M) ∨ ⋁_{(f/n) ∈ FS(M)} ∃x1, . . . , xn.UNFOLD(ϕ[x/f(x1, . . . , xn)], M)

where FS(M) is the set of function symbols contained in M. This unfolding reflects that quantification over individuals at the semantic layer must be properly rephrased as a corresponding quantification over those values in the relational layer that could lead to produce such individuals through the application of M. This is done by unfolding ∃x.ϕ into a disjunction of formulae, where: (i) the first formula corresponds to ∃x.ϕ itself, and is used to tackle the case in which x appears in the range of an attribute, which is in fact a value; (ii) each of the other formulae is obtained from ϕ by replacing x with one of the possible variable terms produced by M, and quantifying over the existence of values used to construct the corresponding object term.

Example 6. Let us consider the µL^EQL_C property Φr of Example 5, together with the mapping M defined in Example 1. We get that UNFOLD(Φr, M) corresponds to:

µZ.((∀x1, x2, x3. AUXm3(x1, x2, x3, _) → AUXm5(x1, x2, x3)) ∨ [−]Z)

where AUXm3(name, surname, type, id) and AUXm5(name, surname, type) represent the auxiliary view predicates of mapping assertions m3 and m5 respectively, whose defining queries are the SQL queries on the left-hand side of the mapping assertions themselves. When unfolding the UCQ Student(stu1(x1, x2, x3)) ∨ Bachelor(stu1(x1, x2, x3)) ∨ Master(stu1(x1, x2, x3)) ∨ MNum(stu1(x1, x2, x3), _), we notice that the involved mapping assertions are m1, m2, and m3. However, we only consider m3, because the query on its left-hand side contains the ones on the left-hand sides of m1 and m2.

Reduction to Data-Centric Dynamic Systems. The connection between SGDAPs and DCDSs is straightforward (see [5] for the definition of DCDS). Given an SGDAP S = ⟨O, P, D0⟩ with O = ⟨R, T, M⟩, we can construct a corresponding DCDS with nondeterministic services SREL = ⟨D, P⟩, where D = ⟨V, R, {qunsat(T) → false}, D0⟩. Thanks to this encoding, we obtain ΥS^R ≡ Υ^DCDS_SREL, where Υ^DCDS_SREL is the RTS constructed for the DCDS SREL following the definition in [5].

Verification. Leveraging the parallel between SGDAPs and DCDSs, verification of a µL^EQL_C property over an SGDAP can be reduced to the verification of a µLC property over the corresponding DCDS. In fact, µLC (µ-calculus over closed FOL queries) is contained in the fragments of FO µ-calculus studied for DCDSs in [5], namely µLA and µLP. Both µLA and µLP support FOL queries over the DCDS, allowing for controlled forms of FO quantification across states, and therefore they clearly support FO sentences. Let S = ⟨O, P, D0⟩ be an SGDAP with O = ⟨R, T, M⟩, STS ΥS^S and RTS ΥS^R = ⟨R, Σ, s0, db, ⇒⟩. We say that ΥS^R is state-bounded if there exists a bound b such that, for each s ∈ Σ, |ADOM(db(s))| < b. Let Φ be a µL^EQL_C property, and let Φ′ = UNFOLD(REW(Φ, T), M). Since (i) ΥS^S |= Φ can be reduced to ΥS^R |= Φ′, (ii) Φ′ belongs to µLC (which is contained in µLP), and (iii) ΥS^R can be generated by a DCDS with nondeterministic services, we can reuse the decidability results presented in [5]. In particular, we obtain that ΥS^S |= Φ is decidable if ΥS^R is state-bounded. Verification can in this case be reduced to conventional finite-state model checking.

Example 7. Consider the SGDAP S = ⟨O, P, D0⟩, where O is the OBDA system defined in Example 1 and P the process component defined in Example 3. It is easy to see that the resulting RTS ΥS^R is state-bounded. Intuitively, this follows from the fact that the actions of S either move tuples from the TRANSF_M table to the ENROLLED one, or copy tuples from the ENROLLED table to the GRAD one. Hence, the size of each database instance appearing in ΥS^R is at most twice the size of D0, and thus verification of µL^EQL_C properties over the STS ΥS^S is decidable. ⊓⊔

Acknowledgements.
This research has been partially supported by the ICT Collaborative Project ACSI (Artifact-Centric Service Interoperation), funded by the EU under FP7 ICT Call 5, 2009.1.2, grant agreement No. FP7-257593.

References
1. S. Abiteboul, P. Bourhis, A. Galland, and B. Marinoiu. The AXML artifact model. In Proc. of TIME 2009, pages 11–17, 2009.
2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003.
3. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, and R. De Masellis. Verification of conjunctive-query based semantic artifacts. In Proc. of DL 2011, volume 745 of CEUR, ceur-ws.org, 2011.
4. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, R. De Masellis, and P. Felli. Foundations of relational artifacts verification. In Proc. of BPM 2011, volume 6896 of LNCS, pages 379–395. Springer, 2011.
5. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, A. Deutsch, and M. Montali. Verification of relational data-centric dynamic systems with external services. CoRR Technical Report arXiv:1203.0024, arXiv.org e-Print archive, 2012. Available at http://arxiv.org/abs/1203.0024.
6. F. Belardinelli, A. Lomuscio, and F. Patrizi. Verification of deployed artifact systems via data abstraction. In Proc. of ICSOC 2011, 2011.
7. K. Bhattacharya, C. Gerede, R. Hull, R. Liu, and J. Su. Towards formal analysis of artifact-centric business process models. In Proc. of BPM 2007, volume 4714 of LNCS, pages 288–304. Springer, 2007.
8. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodríguez-Muro, and R. Rosati. Ontologies and databases: The DL-Lite approach. In S. Tessaris and E. Franconi, editors, Semantic Technologies for Information Systems – 5th Int. Reasoning Web Summer School (RW 2009), volume 5689 of LNCS, pages 255–356. Springer, 2009.
9. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. EQL-Lite: Effective first-order query processing in description logics. In Proc. of IJCAI 2007, 2007.
10. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007.
11. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Path-based identification constraints in description logics. In Proc. of KR 2008, pages 231–241, 2008.
12. P. Cangialosi, G. De Giacomo, R. De Masellis, and R. Rosati. Conjunctive artifact-centric services. In Proc. of ICSOC 2010, volume 6470 of LNCS, pages 318–333. Springer, 2010.
13. E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. The MIT Press, Cambridge, MA, USA, 1999.
14. D. Cohn and R. Hull. Business artifacts: A data-centric approach to modeling business operations and processes. IEEE Bull. on Data Engineering, 32(3):3–9, 2009.
15. E. A. Emerson. Automated temporal reasoning about reactive systems. In F. Moller and G. Birtwistle, editors, Logics for Concurrency: Structure versus Automata, volume 1043 of LNCS, pages 41–101. Springer, 1996.
16. A. Nigam and N. S. Caswell. Business artifacts: An approach to operational specification. IBM Systems Journal, 42(3):428–445, 2003.
17. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. on Data Semantics, X:133–173, 2008.
18. M. Rodríguez-Muro and D. Calvanese. Dependencies: Making ontology based data access work in practice. In Proc. of AMW 2011, volume 749 of CEUR, ceur-ws.org, 2011.
19. W. M. P. van der Aalst, P. Barthelmess, C. A. Ellis, and J. Wainer. Proclets: A framework for lightweight interacting workflow processes. Int. J. of Cooperative Information Systems, 10(4):443–481, 2001.

Knowledge-intensive Processes: An Overview of Contemporary Approaches⋆

Claudio Di Ciccio, Andrea Marrella, and Alessandro Russo
Sapienza Università di Roma, Rome, Italy
{cdc,marrella,arusso}@dis.uniroma1.it

Abstract. Engineering of knowledge-intensive processes is far from being mastered. Processes are defined as knowledge-intensive when people/agents carry them out with a fair degree of "uncertainty", where the uncertainty depends on different factors, such as the high number of tasks to be represented, their unpredictable nature, or their dependency on the scenario. In the worst case, there is no pre-defined view of the knowledge-intensive process, and tasks are mainly discovered as the process unfolds. In this work, starting from three different real scenarios, we present a critical comparative analysis of the existing approaches used for supporting knowledge-intensive processes, and we discuss some recent research techniques that may complement or extend the existing state of the art.

Keywords: Knowledge-intensive Processes, Process Management Systems, Health Care, Process Adaptation, Process Mining

⋆ This work has been partly supported by the SAPIENZA grant TESTMED and by the EU Commission through the project SmartVortex.

1 Introduction

Process management systems (PMSs) hold the promise of facilitating the everyday operation of many enterprises and work environments. However, PMSs remain especially useful in a limited range of applications where business processes can be described with relative ease. Current modeling techniques are used to codify processes that are completely predictable: all possible paths along the process are well understood, and the process participants never need to make a decision about what to do next, since the workflow is completely determined by their data entry or other attributes of the process. This kind of highly structured work includes mainly production and administrative processes. However, most business functions involve collaborative features and unstructured processes that do not have the same level of predictability as routine structured work [58]. In [29] processes have been classified on the basis of their "degree of structure". Traditional PMSs perform well with fully structured processes and controlled interactions between participants. A major assumption is that such processes, after having been modeled, can be repeatedly instantiated and executed in a predictable and controlled manner. However, even for structured processes, the combination and sequence of tasks may vary from instance to instance due to changes in the execution context, such as user preferences, or modifications in the environment, such as exceptions and changes in the business rules. In such cases (structured processes with ad hoc exceptions), processes should be adapted accordingly (e.g., by adding, removing or generating an alternative sequence of activities). In general, structured processes can be described by an explicit and accurate model. But in scenarios where processes are to a large extent unclear and/or unstructured, process modeling cannot be completed prior to execution (due to lack of domain knowledge a priori or to the complexity of task combinations).
Hence, the classical axiom "first model, then execute", valid for the enactment of structured processes, fails. As processes are executed and knowledge is acquired via experience, one needs to go back to the process definitions and correct them according to work practices. This is the case of unstructured processes with predefined fragments, where processes cannot be anticipated, and thus cannot be studied or modeled as a whole. Instead, what can be done is to identify and study a set of individual activities, and then try to understand the ways in which these activities can precede or follow each other. At the end of the classification lies the category of unstructured processes, where it is impossible to define a priori the exact steps to be taken in order to complete an assignment. Since there is no pre-defined view of the process, process steps are discovered as the process scenario unfolds, and might involve decisions based not on some "codified policy", but on the user expertise applied to the scenario at hand.

The class of knowledge-intensive processes is transversal with respect to the classification proposed in [29]. In the literature, different definitions have been proposed of what "knowledge-intensive" means for a business process. In [24] a process is defined as knowledge-intensive if its value can only be created through the fulfillment of the knowledge requirements of the process participants, while Davenport recognizes knowledge intensity by the diversity and uncertainty of process input and output [11]. In our view, a knowledge-intensive process is characterized by activities that cannot be planned easily, may change on the fly and are driven by the contextual scenario that the process is embedded in. The scenario dictates who should be involved and who is the right person to execute a particular step, and the set of users involved may not be formally defined and may be discovered as the process scenario unfolds. Collaborative interactions among the users are typically a major part of such processes, and new process steps might have to be defined at run time on the basis of contextual changes. Despite the popularity of commercial PMSs, there is still a lack of maturity in managing such processes, i.e., a lack of semantics associated with the models, or of an easy way to reason about those semantics.

In this paper, starting from three different real application scenarios, we present a critical and comparative analysis of the existing approaches used for supporting knowledge-intensive processes, and we discuss some recent research techniques which may complement or extend the existing state of the art. The rest of the paper is organized as follows. Section 2 discusses the role of knowledge-intensive processes in the health-care domain, mainly focusing on how different modeling approaches can contribute to the process representation and execution. Section 3 discusses the use of knowledge-intensive processes for supporting the work in highly dynamic scenarios, by focusing on the challenging aspect of process adaptation. Section 4 traces the evolution of process mining, from the beginnings up to the current open challenge of discovering flexible models for knowledge-intensive, partially structured processes, along with the graphical models proposed for presenting them to the user. Finally, Section 5 concludes the paper.
2 Modeling Approaches for Healthcare Processes

Healthcare is widely recognized as one of the most promising, yet challenging, domains for the adoption of process-oriented solutions able to support both organizational and clinical processes [10,31,46,30]. Organizational processes, which also include administrative tasks (patient admission/discharge, appointment scheduling, etc.), are typically structured, stable and repetitive, and represent the ideal setting for the application of traditional approaches for process automation and improvement. On the other hand, the knowledge-intensive nature and flexibility requirements of medical treatment processes [3,37] pose challenges that existing process management approaches are not able to adequately handle. Although BPM solutions can potentially support these processes, in practice their uptake in healthcare is limited, mainly due to a generally perceived lack of flexibility [30]. Clinical decision making is highly knowledge-driven, as it depends on medical knowledge and evidence, on case- and patient-specific data, and on clinicians' expertise and experience. Patient case management is mainly the result of knowledge work, where clinicians act in response to relevant events and changes in the clinical context on a per-case basis, according to so-called diagnostic-therapeutic cycles based on the interleaving between observation, reasoning and action [31]. Clinical practices cannot be captured by process models that require a complete specification of activities and their control/data flow, with the risk of constraining the clinicians and undermining the acceptance of the proposed tools. Despite these characteristics, in recent years the medical community has introduced Clinical Guidelines (CGs), in an attempt to improve care quality and reduce costs. CGs are "systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances" [21] and act as blueprints that guide the care delivery process and provide evidence-based recommendations. Consequently, many research groups have focused on computer-interpretable clinical guidelines (CIGs) and different languages have been proposed [49,42,61], which can be broadly classified as rule-based (e.g., Arden Syntax), logic-based (e.g., PROforma), network-based (e.g., EON) and workflow-based (e.g., Guide). Most of them follow a task-based paradigm where modeling primitives for representing actions, decisions and patient states are linked via scheduling and temporal constraints, often in a rigid flowchart-like structure, and many representation models are supported by systems that allow the definition and enactment of CGs [27]. This rapid evolution in medical informatics has occurred mainly independently of the advances in the BPM community. However, the recent shift in the BPM domain towards process flexibility, adaptation (see Section 3) and evolution [47,30] has led researchers to reconsider the link with CIGs and investigate the benefits coming from the application of process-oriented approaches in the healthcare domain [36]. On the one side, pattern-based analyses of CIG languages have shown that the expressiveness of these models, although specifically developed for the medical domain, is comparable with (or even lower than) the expressiveness of process modeling languages [39].
On the other side, emerging declarative constraint-based approaches [40,32] have been investigated as a possible solution to achieve a high degree of flexibility, taking advantage of loosely specified process models. In this direction, the combination of procedural and declarative models is under investigation, in order to support healthcare processes with different degrees of structuredness. After more than a decade of research activities, researchers and practitioners agree on three main points: (i) clinical procedures, based on semi-structured and unstructured decision making, cannot be completely specified in advance nor fully automated; (ii) deviations and variations during the care process (as well as uncertainty and changes in the clinical context) represent the rule rather than the exception; (iii) process- and activity-centric models cannot adequately represent and support clinical case management. One of the main limitations of existing approaches is that they often underestimate the knowledge and data dimension. As patient treatment is knowledge-driven, the focus should not be on automating the decision-making process, but rather on supporting the clinician during this process, according to a "system suggests, user controls" approach [62] that makes available the appropriate data and relevant knowledge when needed or required. Any system intended to support CGs should allow for representing and integrating, at a semantic level, evolving medical knowledge, patient-related data (including conditions, medical history, prescribed treatments and medications, etc.), and the existing (sometimes unpredictable) interactions between patient conditions, treatments and medications. This focus on data and knowledge is producing a shift from a process management approach to a more flexible case management approach, well understood by clinicians (although mostly in the form of paper-based processes) but only partially investigated in the BPM area [60]. Process support requires object-awareness in the form of a full integration of processes with patient data models consisting of object types and object relations [30,5]. Domain-relevant objects (such as medical orders, clinical and lab reports, etc.), their attributes and their possible states need to be explicitly represented, along with their inter-relations, so as to define a rich information model. This data model enables the identification and definition of the activities that rely on the object-related information and act on it, producing changes on attribute values, relations and object states. As a result, a tight integration between data objects and process activities can be achieved. As object-awareness requires a data-driven process modeling and execution approach, based on object behavior and object interactions, process/activity-centric methodologies are being replaced by data-centric models evolving over time [7]. In the context of a CG, the patient's clinical situation (referred to as patient state, scenario, or context [49]) is central and represents the shared knowledge that drives the decision making and evolves as a result of performed actions, made decisions and collected data. Conditions defined over the patient state, along with temporal constraints, are typically used as entry/exit points for a guideline [61] and as eligibility criteria for specific actions [49].
During collaboration-based patient management activities, clinicians have to react to internal (e.g., a change in the patient's state) and external (e.g., availability of lab test results) events, which can occur in any sequence. Moreover, it is often not possible to predetermine which activities have to be executed, and in which order, when an event occurs: according to the diagnostic-therapeutic cycles mentioned before, the clinician first assesses and evaluates the situation and then acts or plans the actions to be performed. This suggests an interleaving and overlapping of modeling and execution, where the process is "created at the time it is executed". Any modeling and execution approach supporting this view has to consider that the clinician should be guided by what can be done and not restricted by what has to be done [35]. Although the path to be followed can initially be unclear and is gradually determined by clinician decisions, the care process evolves through a series of intermediate goals or milestones to be achieved (e.g., bring a parameter back to a normal level) that can again be expressed as conditions or constraints over the patient state. Given the above scenario, a promising and emerging approach for modeling CGs and supporting their execution and management is the artifact-centric paradigm, which considers data and knowledge as an integral part of business processes [51]. It is based on the concept of business artifacts as an abstraction for business-relevant entities and data that evolve according to a lifecycle and drive the activities in a business setting. Activities are defined in the context of interrelated artifacts and become enabled as the result of triggering events (internal or external) constrained by conditions defined and evaluated over the artifacts. Events and conditions over artifacts can also be used to set specific goals and evaluate the progress towards their achievement. The scheduling of actions is thus event- and data-driven, rather than induced by direct control flow dependencies. Under this perspective, a clear correspondence emerges between artifact-centric concepts and clinical case management, especially when considering the Guard-Stage-Milestone (GSM) meta-model [51] as a representative example of the artifact-based paradigm. GSM builds on the concepts of information model and lifecycle model, where the latter includes milestones to be achieved, hierarchically organized stages as clusters of possible activities to be performed to achieve milestones, and guards, timed events and conditions that control the stages and determine the achievement of milestones. The patient and his/her state, a diagnostic test, or a treatment course can all be considered artifact types, represented by an information model that evolves according to a lifecycle and captures all relevant data and relations (e.g., as a relational model or domain ontology). CGs could then be seen as progressing through a set of stages, where each performed action, made decision or event occurrence is driven by the patient state (through the eligibility criteria mentioned before) and has an impact on it, as reflected in the underlying information model.
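The following sketch illustrates, under strong simplifications, the guard/stage/milestone evaluation cycle just described; the stage, guard and milestone used here are hypothetical, and the real GSM semantics (hierarchical stages, timed events, multiple milestones per stage) is considerably richer.

```python
# Toy GSM-style evaluation: events update the information model, which
# drives the opening of stages (guards) and their closing (milestones).
info = {"fever": True, "lab_result": None}    # artifact information model

stages = [{
    "name": "ManageFever",
    "guard": lambda m: m["fever"],                       # opens the stage
    "milestone": lambda m: m["lab_result"] is not None,  # closes it
    "open": False, "achieved": False,
}]

def on_event(update):
    """Incorporate an event, then re-evaluate guards and milestones."""
    info.update(update)
    for s in stages:
        if not s["open"] and not s["achieved"] and s["guard"](info):
            s["open"] = True                      # guard satisfied
        if s["open"] and s["milestone"](info):
            s["open"], s["achieved"] = False, True  # milestone achieved

on_event({"lab_result": "negative"})   # event- and data-driven progression
print(stages[0]["achieved"])           # True
```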
The data-driven nature of the model facilitates the integration between process control knowledge and the patient-related and medical knowledge; in addition, the distinction between data attributes and status attributes can directly support an integrated and explicit representation of both patient and execution states, which not all CIG models provide [61,49]. Although artifact-centric models can open the way for a new generation of flexible and adaptive case management systems in healthcare, further investigation is needed to understand the contribution that these models can bring in solving well-known problems for CIGs; among them: (i) how to reconcile the decision-action nature of CGs with a declarative modeling approach that can be used and understood by clinicians and is able to represent the evidence-based knowledge contained in the CGs; (ii) how to define an information model that is able to capture all clinically relevant data and takes into account existing standards, models, and ontologies used in Electronic Medical Records (EMRs) for patient and medical data; (iii) to what extent clinical events and medical knowledge can be represented and encoded by rules and conditions; (iv) how an artifact-centric model can address the problems of guideline acquisition, verification, testing, tracing and evolution, and how abstract models can be turned into, or customized as, executable models that take into account additional information, such as resource availability, roles and local services, in a collaborative multi-user environment.

3 Process Adaptation in Highly Dynamic Scenarios

A recent open research question in the BPM field concerns how to tackle scenarios that are highly dynamic and subject to a higher frequency of unexpected contingencies than classical ones, e.g., scenarios for emergency management. There, a PMS can be used to coordinate the activities of first responders on the field (e.g., reach a location, evacuate people from collapsed buildings, extinguish a fire, etc.). The use of processes for supporting work in highly dynamic contexts has become a reality, thanks also to the growing use of mobile devices in everyday life, which offer a simple way of picking up and executing tasks. These kinds of processes are also called dynamic processes. A dynamic process usually includes a wide range of knowledge-intensive tasks; as the process proceeds, the sequence of tasks depends heavily upon the specifics of the context (for example, which resources are available and what particular options exist at that time), and the way it unfolds is often unpredictable. This is due to the high number of tasks to be represented and to their unpredictable nature, or to the difficulty of modeling the whole knowledge of the domain of interest at design time. If we refer again to the classification shown in [29], dynamic processes fall between structured processes with ad-hoc exceptions and unstructured processes with predefined fragments. Research efforts in this field try to enhance the ability of dynamic processes and their support environments to modify their behavior in order to deal with contextual changes and exceptions that may occur in the operating environment during process enactment and execution. On the one hand, existing PMSs like YAWL [50] provide support for the handling of expected exceptions.
Process schemas are designed to cope with potential exceptions, i.e., for each kind of exception that is envisioned to occur, a specific contingency process (a.k.a. exception handler or compensation flow) is defined. On the other hand, adaptive PMSs like ADEPT2 [65] support the handling of unanticipated exceptions, by enabling different kinds of ad-hoc deviations from the pre-modeled process instance at run-time, according to the structural process change patterns defined in [64]. However, traditional approaches that try to anticipate how the work will happen by solving each problem at design time, as well as approaches that allow the process structure to be changed manually at run time, are often ineffective or not applicable in rapidly evolving contexts. The design-time specification of all possible compensation actions requires extensive manual effort from the process designer, who has to anticipate all potential problems and ways to overcome them in advance, in an attempt to deal with the unpredictable nature of this kind of processes. Moreover, the designer often lacks the knowledge needed to model all the possible contingencies, or this knowledge can become obsolete as process instances are executed and evolve, rendering the initial effort useless. In general, for a dynamic process there is no clear, anticipated correlation between a change in the context and the corresponding process changes, since the process may be different every time it runs and the recovery procedure strictly depends on the actual contextual information. For the same reason, it is also difficult to manually define an ad-hoc recovery procedure at run-time, as the correctness of the process execution is highly constrained by the values (or combinations of values) of contextual data. Dealing with dynamic processes requires that PMSs provide intelligent failure handling mechanisms that, starting from the original process model, are able to adapt process instances without explicitly defining at design time all the handlers/policies to recover from exceptions and without the intervention of domain experts. Recently, some techniques from the field of artificial intelligence (AI) have been applied to process management, with the purpose of improving the degree of automatic adaptation of dynamic processes. In [23], the authors present a concept for dynamic and automated workflow re-planning that allows recovering from task failures. To handle the situation of a partially executed workflow, a multi-step procedure is proposed that includes the termination of failed activities, the sound suspension of the workflow, the generation of a new complete process definition and the adequate resumption of the process. In [28], the authors take a much broader view of the problem of adaptive workflow systems, and show that there is a strong mapping between the requirements of such systems and the capabilities offered by AI techniques. In particular, the work describes how planning can be interleaved with process execution and plan refinement, and investigates plan patching and plan repair as means to enhance flexibility and responsiveness. A new life cycle for workflow management based on the continuous interplay between learning and planning is proposed in [20]. The approach is based on learning business activities as planning operators and feeding them to a planner that generates the process model.
The main result is that it is possible to produce fully accurate process models even though the activities (i.e., the operators) may not be accurately described. The approach presented in [45] highlights the improvements that a legacy workflow application can gain by incorporating planning techniques into its day-to-day operation. The use of contingency planning to deal with uncertainty (instead of replanning) increases system flexibility, but it suffers from a number of problems. Specifically, contingency planning is often highly time-consuming and does not guarantee a correct execution under all possible circumstances. Planning techniques are also used in [22] to define a self-healing approach for handling exceptions in service-based processes and repairing faulty activities with a model-based approach. During process execution, when an exception occurs, a new repair plan is generated by taking into account constraints posed by the process structure and by applying or deleting actions taken from a given generic repair plan, defined manually at design time. An interesting approach for dealing with exceptional changes has been proposed in [13,34]. There, the authors present SmartPM (Smart Process Management), a model and a proof-of-concept PMS featuring a set of techniques that provide support for the automatic adaptation of processes. In SmartPM, a process model is defined as a set of n task definitions, where each task t_i can be considered as a single step that consumes input data and produces output data. Data are represented through process variables whose definition depends strictly on the specific process domain of interest. The model allows logical constraints over process variables to be defined through a set F of predicates f_j. Such predicates can be used to constrain the task assignment (in terms of task preconditions), to assess the outcome of a task (in terms of task effects) and as guards in the expressions at decision points (e.g., for cycles or conditional statements). Choosing the predicates used to describe each activity falls within the general problem of knowledge representation. To this end, the environment, services and tasks are grounded in domain theories described in the Situation Calculus [48], which is specifically designed for representing dynamically changing worlds in which all changes are the result of task execution. Processes are represented as IndiGolog programs. IndiGolog [12] allows for the definition of programs with cycles, concurrency, conditional branching and interrupts, whose program steps are actions of some domain theory expressed in the Situation Calculus. The dynamic world of SmartPM is modeled as progressing through a series of situations, each of which is the result of the various tasks performed so far. Predicates may be thought of as "properties" of the world whose values may vary across situations. SmartPM provides mechanisms for adapting process schemas that require no pre-defined handlers. Specifically, adaptation in SmartPM can be seen as reducing the gap between the expected reality, the (idealized) model of reality that the PMS uses to reason, and the physical reality, the real world with the actual values of conditions and outcomes. The physical reality Φ_s reflects the concept of "now", i.e., what is happening in the real environment while the process is under execution.
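The following minimal sketch illustrates this idea of a gap between the two realities; the task, fluents and dictionary encoding are all hypothetical, and SmartPM itself works on Situation Calculus domain theories and IndiGolog programs rather than on this toy encoding. The role of task preconditions and effects used here is elaborated in the next paragraph.

```python
# Toy encoding of SmartPM-style realities (hypothetical names).
task = {
    "name": "evacuate_area",
    "pre": {"area_reachable": True},    # Pre_i of the task
    "eff": {"area_evacuated": True},    # Eff_i of the task
}

expected = {"area_reachable": True, "area_evacuated": False}   # Psi_s
physical = {"area_reachable": False, "area_evacuated": False}  # Phi_s (sensed)

def gap(expected, physical):
    """Fluents on which the two realities disagree."""
    return {f for f in expected if expected[f] != physical.get(f)}

if gap(expected, physical):
    # An execution monitor would now ask a planner for a recovery
    # process delta_h realigning the realities before delta_0 proceeds.
    print("adaptation needed on:", gap(expected, physical))
```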
In general, a task t_i can be performed in a given physical reality Φ_s only if that reality satisfies the preconditions Pre_i of the task. Moreover, each task also has a set of effects Eff_i that change the current physical reality Φ_s into a new physical reality Φ_{s+1}. At execution time, the process can easily be invalidated because of task failures or because the environment changes due to some external event. For this purpose, the concept of expected reality Ψ_s is introduced. A recovery procedure is needed when the two realities differ. An execution monitor is responsible for detecting whether the gap between the expected and physical realities is such that the original process δ_0 cannot progress its execution. In that case, the PMS has to find a recovery process δ_h that repairs δ_0 and removes the gap between the two kinds of reality. Currently, the adaptation algorithm deployed in SmartPM synthesizes a linear process δ_h (i.e., a process consisting of a sequence of tasks) and inserts it at a given point of the original process - specifically, the point of the process where the deviation was first noted. This means that the technique is able to automatically recover from exceptions without explicitly defining any recovery policy.

4 Mining

Process Mining [54], also referred to as Workflow Mining [53], is the set of techniques that allow process descriptions to be extracted from a set of recorded executions. Throughout this section, we will investigate the techniques adopted, along with the notations used to display the results, i.e., the mined processes. To date, ProM [55] is one of the most widely used plug-in based software environments for implementing workflow mining techniques. The idea of applying process mining in the context of workflow management systems was introduced in [1]. There, processes were modelled as directed graphs where vertices represented individual activities and edges stood for dependencies between them. At the same time, Cook and Wolf investigated similar issues in the context of software engineering processes. In [8] they described three methods for process discovery: (i) neural network-based, (ii) purely algorithmic, and (iii) adopting a Markovian approach. The authors considered the latter two the most promising; however, the results presented in [8] were limited to sequential behavior. Nowadays, mainstream process mining algorithms and management tools model processes with a graphical syntax derived from a subset of Petri Nets, i.e., Workflow Nets (WfN [53]), explicitly designed to represent the control-flow dimension of a workflow. See [41] for a history of Petri nets and an extensive bibliography. From [1] onwards, many techniques have been proposed to address specific issues: purely algorithmic (e.g., the α algorithm [59] and its evolution α++ [67]), heuristic (e.g., [66]) and genetic (e.g., [38]). Heuristic and genetic algorithms were introduced to cope with noise, which the purely algorithmic techniques were not able to manage. Whereas algorithmic approaches rely on footprints of traces (i.e., tables reporting whether events appeared before or after one another, where decidable) to determine the workflow net that could have generated them, heuristic approaches build a representation similar to causal nets, taking the frequencies of events and sequences into account when constructing the process model, in order to ignore infrequent paths.
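As an illustration of the footprint idea just mentioned, the following sketch derives the classical →, ←, || and # relations of the α algorithm from the direct successions observed in a toy log (the traces are invented for the example).

```python
# Minimal footprint computation in the spirit of the alpha algorithm.
from itertools import product

traces = [["a", "b", "c"], ["a", "c", "b"], ["a", "d"]]

direct = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
events = {e for t in traces for e in t}

def relation(x, y):
    """Classify the pair (x, y) from the direct-succession relation."""
    if (x, y) in direct and (y, x) not in direct: return "->"
    if (y, x) in direct and (x, y) not in direct: return "<-"
    if (x, y) in direct and (y, x) in direct:     return "||"
    return "#"

footprint = {(x, y): relation(x, y)
             for x, y in product(sorted(events), repeat=2)}
print(footprint[("a", "b")], footprint[("b", "c")])   # ->  ||
```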
Genetic process mining adopts an evolutionary approach to discovery and differs from the other two in that its computation evolves non-deterministically: the final output is the result of simulating a process of natural selection and evolutionary reproduction over the procedures used to determine the final outcome. A notable extension of this research was achieved by the two-step algorithm proposed in [52]. Unlike previous works, in which the proposed approaches perform a single process mining step, it split the computation into two phases: the first built a Transition System representing the process behavior and the causal dependencies among tasks; the second made use of the state-based "theory of regions" [9,15] to construct a Petri Net bisimilar to the Transition System. The first phase was made "tunable", so that it could adhere more strictly or more permissively to the behavior of the analyzed log traces, i.e., the expert could strike a balance between "overfitting" and "underfitting". Indeed, past execution traces are not the whole universe of possible runs: the extracted process model should, on the one hand, remain valid for future, unpredictable cases while, on the other hand, still allowing one to check whether those cases actually adhere to the common behavior. This issue is particularly relevant in the field of knowledge-intensive processes. To date, the majority of process mining research has coped with structured business processes. [26] discusses a particular class of knowledge-intensive processes, named "artful business processes"; they are typically carried out by people whose work is mental rather than physical (managers, professors, researchers, etc.), the so-called "knowledge workers" [63]. With their skills, experience and knowledge, they routinely perform difficult tasks which require complex, rapid decisions among multiple possible strategies, in order to fulfill specific goals. In contrast to business processes that are formal and standardized, informal processes are often not even written down, let alone defined formally, and can vary from person to person even when those involved are pursuing the same objective. Knowledge workers create informal processes "on the fly" to cope with many of the situations that arise in their daily work. While informal processes are frequently repeated, because they are not written down they are not exactly reproducible, even by their originators, nor can they be easily shared. [63] described the "ACTIVE" EU collaborative project, coordinated by British Telecom. The project addressed the need for greater knowledge worker productivity by providing more effective and efficient tools. Among its main objectives, it aimed at helping users to share and reuse informal processes, including by learning those processes from the user's behavior. Building on the work of [6] and [56], [19] investigated the challenge of mining these processes out of semi-structured texts, i.e., the email conversations exchanged among knowledge workers, through the interplay of text mining, object matching and process mining techniques. It provided an architectural overview of the application (named MailOfMine) able to fulfill this objective.
The need for flexibility in the definition of some types of process, such as artful business processes, leads to an alternative to the classical "imperative" approach: the "declarative" one. Rather than using a procedural language to express the allowed sequences of activities, it describes workflows through constraints: the idea is that any behavior is allowed, except what violates the constraints. [58] showed how the declarative approach can help in obtaining a fair trade-off between flexibility in managing collaborative processes and support in controlling and assisting the enactment of workflows. DecSerFlow [57] and ConDec [43], now under the name of Declare [44], define such constraints as formulations in Linear Temporal Logic. [33] outlines an algorithm for mining Declare processes, integrated into ProM (namely, the Declare Miner). The tool is based on the translation of Declare constraints into automata, and works in conjunction with the optimization techniques described in [68]. [4] describes the use of inductive logic programming techniques to mine models expressed as a SCIFF theory, which is then translated into the ConDec notation [43]. [2] differs from both [4] and [33] in that it does not directly verify the candidate constraints over the whole set of input traces. Instead, it prepares an ad-hoc knowledge base of its own, to which specific queries are then submitted; the model is determined on the basis of the results of such queries. MINERful, proposed in [18], also exploits this two-step technique, in order to improve the efficiency of the mining procedure. [17] proves the complexity of the algorithm to be polynomial w.r.t. the size of both the alphabet of constraints and the input traces. Unlike [33], [4] and [2], it is independent of the formalism adopted for representing constraints. Declare provides a graphical model for representing declarative processes, useful to depict the constraints that hold between activities as a graph where nodes are activities and arcs are constraints among them. [25] and [16] presented different approaches to graphical modelling. The former describes an event-based model, namely the DCR Graph, showing the current state of the workflow at run-time by listing the tasks that can (either optionally or mandatorily) or cannot be executed at the moment; a section describing the mapping of that notation to Büchi Automata is provided as well. The latter provides multiple graphical syntaxes, depicting the process from two viewpoints: (i) global, i.e., focused on the representation of constraints between tasks, represented all together in a single graph, and (ii) local, i.e., focused instead on the constraints directly related to one single activity at a time. The first is divided into a base and an extended version, depicting respectively fewer or more details about the nature of the constraints that hold in the process, following the so-called "map metaphor" [14]. The second is also twofold. The static view shows the constraints affecting an activity, which is placed at the origin of a Cartesian-like diagram, where implication and temporal succession are aligned on orthogonal axes; the tasks involved in constraints related to the activity under analysis are placed at different coordinates accordingly. In the dynamic view, the graph evolves as new tasks are executed. Starting from the initial task, each enacted task is chained to the previous one.
On the basis of the execution trace, the consequent next tasks are shown below the chain, in compliance with the constraints that hold at the moment.

5 Conclusions

In this work, we provided a critical and comparative analysis of the existing approaches for supporting knowledge-intensive processes, and we showed some recent research techniques that may complement or extend the existing state of the art to this end. In the healthcare domain, several challenges still need to be addressed and an interdisciplinary research effort is required. In this direction, the existing gap between the general evidence-based knowledge contained in CGs and the knowledge and information required to apply them to specific patients in local healthcare organizational contexts needs further investigation. Similarly, modeling approaches should allow all "knowledge layers" and their possible interactions to be captured, including the procedural knowledge contained in CGs, the declarative knowledge representing domain- or site-specific constraints and properties, and clinicians' basic medical knowledge. In highly dynamic environments, commercial PMSs are not able to deal with knowledge-intensive processes sufficiently well, due to the static and only implicitly defined meta-models of those systems. Basically, a dynamic process is largely dependent on the scenario at hand, and the result of process modeling is often a static plan of actions, which is difficult to adapt to changing procedures or to different business goals. In order to devise intelligent failure handling mechanisms for dynamic processes, there is a need for enriched workflow models, possibly with a declarative specification of process tasks, i.e., comprising the specification of input/output artefacts and of task preconditions and effects. In general, the use of AI techniques for adapting dynamic processes seems very promising. In the area of process mining, the declarative model proves to be very effective in allowing the flexibility required by knowledge-intensive processes, although this has to be verified with the people involved in those processes: e.g., the graphical notation proposed in [16] has to be implemented and its readability tested with real actors of those processes. A graphical notation representing the level of severity of a constraint in the process is still missing. In the area of declarative workflow mining, it might be useful to determine the tightness of the discovered constraints on the basis of the frequency with which a constraint did not hold in the past. Moreover, the impact of noise on such analyses could be studied.

References

1. Agrawal, R., Gunopulos, D., Leymann, F.: Mining process models from workflow logs. In: EDBT'98 (1998)
2. Alberti, M., Chesani, F., Gavanelli, M., Lamma, E., Mello, P., Torroni, P.: Verifiable agent interaction in abductive logic programming: The SCIFF framework. ACM Trans. Comput. Log. 9(4) (2008)
3. Ammon, D., Hoffmann, D., Jakob, T., Finkeissen, E., Detschew, V., Wetter, T.: Management of Knowledge-Intensive Healthcare Processes on the Example of General Medical Documentation. In: BPM Workshops (2008)
4. Chesani, F., Lamma, E., Mello, P., Montali, M., Riguzzi, F., Storari, S.: Exploiting inductive logic programming techniques for declarative process mining. T. Petri Nets and Other Models of Concurrency 2, 278–295 (2009)
5. Chiao, C.M., Künzle, V., Reichert, M.: Towards Object-aware Process Support in Healthcare Information Systems. In: eTELEMED 2012 (2012)
6. Cohen, W.W., Carvalho, V.R., Mitchell, T.M.: Learning to classify email into "speech acts". In: EMNLP. pp. 309–316. ACL (2004)
7. Combi, C., Gambini, M., Migliorini, S., Posenato, R.: Modelling temporal, data-centric medical processes. In: ACM SIGHIT IHI 2012 (2012)
8. Cook, J.E., Wolf, A.L.: Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7(3), 215–249 (1998)
9. Cortadella, J., Kishinevsky, M., Lavagno, L., Yakovlev, A.: Deriving petri nets from finite transition systems. IEEE Trans. on Computers 47(8), 859–882 (1998)
10. Dadam, P., Reichert, M., Kuhn, K.: Clinical Workflows - The Killer Application for Process-oriented Information Systems? In: BIS'00 (2000)
11. Davenport, T.H.: Improving knowledge work processes. In: Sloan Management Review, vol. 37 (1996)
12. De Giacomo, G., Lespérance, Y., Levesque, H., Sardina, S.: IndiGolog: A High-Level Programming Language for Embedded Reasoning Agents. In: Multi-Agent Prog.: Languages, Platforms and Applications (2009)
13. de Leoni, M., Marrella, A., Mecella, M., Sardina, S.: SmartPM – Featuring Automatic Adaptation to Unplanned Exceptions. Tech. rep., Sapienza Università di Roma (2011), http://ojs.uniroma1.it/index.php/DIS_TechnicalReports/article/view/9221/9141
14. de Leoni, M., van der Aalst, W.M.P., ter Hofstede, A.H.M.: Visual support for work assignment in process-aware information systems. In: BPM'08 (2008)
15. Desel, J., Reisig, W.: The synthesis problem of petri nets. Acta Informatica 33, 297–315 (1996)
16. Di Ciccio, C., Catarci, T., Mecella, M.: Representing and visualizing mined artful processes in MailOfMine. In: HCI-KDD (2011)
17. Di Ciccio, C., Mecella, M.: MINERful, a mining algorithm for declarative process constraints in MailOfMine. Tech. rep., Sapienza Università di Roma (2012), http://ojs.uniroma1.it/index.php/DIS_TechnicalReports/issue/view/416
18. Di Ciccio, C., Mecella, M.: Mining constraints for artful processes. In: BIS'12 (2012)
19. Di Ciccio, C., Mecella, M., Scannapieco, M., Zardetto, D., Catarci, T.: MailOfMine - analyzing mail messages for mining artful collaborative processes. In: SIMPDA'11 (2011)
20. Ferreira, H., Ferreira, D.: An Integrated Life Cycle for Workflow Management Based on Learning and Planning. Int. J. Coop. Inf. Syst. 15 (2006)
21. Field, M.J., Lohr, K.N.: Clinical Practice Guidelines: Directions for a New Program. Institute of Medicine, Washington, DC (1990)
22. Friedrich, G., Fugini, M., Mussi, E., Pernici, B., Tagni, G.: Exception Handling for Repair in Service-Based Processes. IEEE Trans. on Soft. Eng. 36 (2010)
23. Gajewski, M., Meyer, H., Momotko, M., Schuschel, H., Weske, M.: Dynamic Failure Recovery of Generated Workflows. In: DEXA'05 (2005)
24. Gronau, N., Weber, E.: Management of knowledge intensive business processes. In: BPM'04 (2004)
25. Hildebrandt, T.T., Mukkamala, R.R.: Declarative event-based workflow as distributed dynamic condition response graphs. In: PLACES'10 (2010)
26. Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Syst. J. 45(4), 663–682 (2006)
27. Isern, D., Moreno, A.: Computer-based execution of clinical guidelines: a review. Int. J. of Medical Informatics 77(12) (2008)
28. Jarvis, P., Moore, J., Stader, J., Macintosh, A., du Mont, A.C., Chung, P.: Exploiting AI Technologies to Realise Adaptive Workflow Systems. AAAI Workshop on Agent-Based Systems in the Business Context (1999)
29. Kemsley, S.: The Changing Nature of Work: From Structured to Unstructured, from Controlled to Social. In: BPM'11 (2011)
30. Lenz, R., Peleg, M., Reichert, M.: Healthcare Process Support: Achievements, Challenges, Current Research. IJKBO (2012)
31. Lenz, R., Reichert, M.: IT support for healthcare processes - Premises, challenges, perspectives. Data & Know. Eng. 61(1) (2007)
32. Lyng, K.M., Hildebrandt, T.T., Mukkamala, R.R.: From Paper Based Clinical Practice Guidelines to Declarative Workflow Management. In: BPM (2008)
33. Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declarative process models. In: CIDM. pp. 192–199. IEEE (2011)
34. Marrella, A., Mecella, M.: Continuous Planning for Solving Business Process Adaptivity. In: BPMDS'11 (2011)
35. de Man, H.: Case Management: A Review of Modeling Approaches. BPTrends, www.bptrends.com (2009)
36. Mans, R.S., van der Aalst, W.M.P., Russell, N.C., Bakker, P.J.M., Moleman, A.J.: Process-Aware Information System Development for the Healthcare Domain - Consistency, Reliability, and Effectiveness. In: BPM Workshops (2009)
37. Marjanovic, O.: Improving Knowledge-Intensive Health Care Processes beyond Efficiency. In: ICIS'11 (2011)
38. Medeiros, A.K., Weijters, A.J., Aalst, W.M.: Genetic process mining: an experimental evaluation. Data Min. Knowl. Discov. 14(2), 245–304 (2007)
39. Mulyar, N., van der Aalst, W.M., Peleg, M.: A Pattern-based Analysis of Clinical Computer-interpretable Guideline Modeling Languages. JAMIA 14(6) (2007)
40. Mulyar, N., Pesic, M., Van Der Aalst, W.M.P., Peleg, M.: Declarative and procedural approaches for modelling clinical guidelines: addressing flexibility issues. In: BPM'07 (2007)
41. Murata, T.: Petri nets: Properties, analysis and applications. Proceedings of the IEEE 77(4), 541–580 (1989)
42. Peleg, M., et al.: Comparing Computer-Interpretable Guideline Models: A Case-Study Approach. JAMIA 10(1) (2003)
43. Pesic, M., van der Aalst, W.M.P.: A declarative approach for flexible business processes management. In: BPM Workshops (2006)
44. Pesic, M., Schonenberg, H., van der Aalst, W.M.P.: Declare: Full support for loosely-structured processes. In: EDOC. pp. 287–300 (2007)
45. R-Moreno, M.D., Borrajo, D., Cesta, A., Oddi, A.: Integrating planning and scheduling in workflow domains. Expert Syst. with App. 33(2) (2007)
46. Reichert, M.: What BPM technology can do for healthcare process support. In: AIME'11 (2011)
47. Reichert, M., Rinderle-Ma, S., Dadam, P.: Flexibility in Process-Aware Information Systems. In: Trans. on Petri Nets and Other Models of Concurrency II (2009)
48. Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. The MIT Press (2001)
49. Sonnenberg, F.A., Hagerty, C.G.: Computer-Interpretable Clinical Practice Guidelines. Where are we and where are we going? Yearbook of Medical Inf. 45 (2006)
50. ter Hofstede, A., van der Aalst, W., Adams, M., Russell, N.: Modern Business Process Automation: YAWL and its Support Environment. Springer (2009)
51. Vaculin, R., Hull, R., Heath, T., Cochran, C., Nigam, A., Sukaviriya, P.: Declarative business artifact centric modeling of decision and knowledge intensive business processes. In: EDOC '11 (2011)
52. van der Aalst, W.M.P., Rubin, V., Verbeek, H., van Dongen, B., Kindler, E., Günther, C.: Process mining: a two-step approach to balance between underfitting and overfitting. Software and Systems Modeling 9, 87–111 (2010)
53. van der Aalst, W.M.P.: The application of petri nets to workflow management. Journal of Circuits, Systems, and Computers 8(1), 21–66 (1998)
54. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011)
55. van der Aalst, W.M.P., van Dongen, B.F., Günther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: ProM: The process mining toolkit. In: BPM'09 (Demos) (2009)
56. van der Aalst, W.M.P., Nikolov, A.: Mining e-mail messages: Uncovering interaction patterns and processes using e-mail logs. IJIIT 4(3), 27–45 (2008)
57. van der Aalst, W.M.P., Pesic, M.: DecSerFlow: Towards a truly declarative service flow language. In: WS-FM. LNCS, vol. 4184, pp. 1–23. Springer (2006)
58. van der Aalst, W.M.P., Pesic, M., Schonenberg, H.: Declarative workflows: Balancing between flexibility and support. Comp. Sc. - R&D 23(2), 99–113 (2009)
59. van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Trans. K. D. Eng. 16(9), 1128–1142 (2004)
60. van der Aalst, W.M.P., Weske, M.: Case handling: a new paradigm for business process support. Data & Know. Eng. 53(2) (2005)
61. Wang, D., Peleg, M., Tu, S., Boxwala, A., Greenes, R., Patel, V., Shortliffe, E.: Representation Primitives, Process Models and Patient Data in Computer-Interpretable Clinical Practice Guidelines: A Literature Review of Guideline Representation Models. Int. J. of Medical Informatics 68 (2002)
62. Wang, D., Peleg, M., Tu, S.W., Boxwala, A.A., Ogunyemi, O., Zeng, Q., Greenes, R.A., Patel, V.L., Shortliffe, E.H.: Design and implementation of the GLIF3 guideline execution engine. J. of Biomedical Informatics 37(5) (2004)
63. Warren, P., Kings, N., Thurlow, I., Davies, J., Buerger, T., Simperl, E., Ruiz, C., Gomez-Perez, J.M., Ermolayev, V., Ghani, R., Tilly, M., Bösser, T., Imtiaz, A.: Improving knowledge worker productivity - the Active integrated approach. BT Technology Journal 26(2), 165–176 (2009)
64. Weber, B., Reichert, M., Rinderle-Ma, S.: Change Patterns and Change Support Features - Enhancing Flexibility in Process-aware Information Systems. Data Knowl. Eng. 66 (2008)
65. Weber, B., Wild, W., Lauer, M., Reichert, M.: Improving Exception Handling by Discovering Change Dependencies in Adaptive Process Management Systems. In: BPI'06 (2006)
66. Weijters, A., van der Aalst, W.: Rediscovering workflow models from event-based data using little thumb. Integrated Computer-Aided Engineering 10, 2003 (2001)
67. Wen, L., van der Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Discov. 15(2), 145–180 (2007)
68. Westergaard, M.: Better algorithms for analyzing and enacting declarative workflow languages using LTL. In: BPM'11 (2011)

Business Processes Verification with Temporal Answer Set Programming ⋆

L. Giordano1, A. Martelli2, M. Spiotta1, and D. Theseider Dupré1
1 Dipartimento di Informatica, Università del Piemonte Orientale
2 Dipartimento di Informatica, Università di Torino
⋆ This work has been partially supported by Regione Piemonte, Project ICT4LAW.

Abstract. The paper provides a framework for the specification and verification of business processes, based on a temporal extension of answer set programming (ASP). The framework allows fluent annotations as well as data awareness to be captured in a uniform way.
It allows for a declarative specification of business processes but also for a direct encoding of processes specified in conventional workflow languages. Verification of temporal properties of a business process, including verification of compliance with business rules, can be performed by LTL bounded model checking techniques.

1 Introduction

The verification of business process compliance with business rules and regulations has gained a lot of interest in recent years, and it has led to the development of a process annotation approach [12, 18, 33, 23], where a business process is enriched with information relevant for compliance verification, capturing the semantics of atomic task execution through preconditions and effects. The treatment of data in business process verification, on the other hand, has attracted growing interest in the last decade, with the definition of artifact-centric and data-centric process models [27, 5, 9]. In this paper we combine the two perspectives and propose a framework for the specification and verification of business processes which allows both annotations and data properties to be modeled by specifying atomic tasks in a uniform way. The approach is well suited to a declarative specification of the business process, which has been advocated by many authors in the literature [32, 30, 25]. Following [7], the specification of annotations can be done in an action theory by defining the effects and preconditions of atomic tasks. The same approach allows data properties to be captured, by modelling data acquisition tasks as actions which nondeterministically assign values to variables (data objects) over given domains, under the restriction that domains are finite. The use of directional rules for modeling business rules, as well as for capturing the conditional structure of norms, is widespread in the literature [18]. In our approach, besides the specification of action preconditions and direct effects, causal rules in an action domain allow dependencies among fluents (propositions whose truth is affected by actions) and fluent changes to be captured, as well as dependencies between process data and fluents. Our claim is that both static and dynamic causal laws are useful for the specification of business process annotations, and their use allows unintended conclusions to be avoided. Observe that, once the data perspective is included, causal laws can involve both conditions on data and annotations. For instance, the rule age ≥ 18 ⇒ ofAge may establish a link between the business process, whose execution assigns values to the variable age, and the compliance rules dealing with persons "of age". The approach we propose is based on Answer Set Programming (ASP) [11] and, more precisely, on the temporal extension of ASP in [16], combining ASP with the temporal logic DLTL [22], an extension of LTL in which the temporal operators are enriched with program expressions. The action language in [16] allows general DLTL constraints to be included in action domains, which can be profitably used for the declarative specification of business processes advocated in the literature [32, 30, 25]. In addition, the proposed approach also allows for a direct encoding of processes specified in workflow languages, and it can be used in combination with state-of-the-art workflow management systems.
The paper considers several verification tasks, including the verification of business process compliance with business rules. Verification is performed through Bounded Model Checking [6] techniques and exploits the approach in [16] for DLTL bounded model checking in ASP, which extends the approach for Bounded LTL Model Checking with Stable Models in [21].

2 A Temporal Answer Set Programming language

In this section we recall the temporal ASP language introduced in [16]. The language is based on a temporal extension of Answer Set Programming (ASP) which combines ASP with the temporal logic DLTL [22], an extension of LTL in which temporal operators are enriched with program expressions. In particular, in DLTL the next state modality can be indexed by actions, and the until operator U^π can be indexed by a program π which, as in PDL, can be any regular expression built from atomic actions using sequence (;), nondeterministic choice (+) and finite iteration (*). Satisfiability and validity for DLTL are PSPACE-complete problems [22]. Let Σ = {a_1, ..., a_n} be a finite non-empty alphabet of actions. From the until operator, the derived modalities ⟨π⟩, [π], ○ (next), U, ✸ and ✷ can be defined as follows: ⟨π⟩α ≡ ⊤ U^π α; [π]α ≡ ¬⟨π⟩¬α; ○α ≡ ⋁_{a∈Σ} ⟨a⟩α; α U β ≡ α U^{Σ*} β; ✸α ≡ ⊤ U α; ✷α ≡ ¬✸¬α, where, in U^{Σ*}, Σ is taken to be a shorthand for the program a_1 + ... + a_n. Informally, a formula [π]α is true in a world w of a linear temporal model if α holds in all the worlds of the model which are reachable from w through any execution of the program π. A formula ⟨π⟩α is true in a world w of a linear temporal model if there exists a world of the model, reachable from w through an execution of the program π, in which α holds. A domain description D is a pair (Π, C), where Π is a set of laws describing the effects and executability preconditions of actions (as described below), and C is a set of temporal constraints, i.e., general DLTL formulas. Atomic propositions describing the state of the domain are called fluents. Actions may have direct effects, described by action laws, and indirect effects, described by causal laws capturing the causal dependencies among fluents. Let L be a first-order language which includes a finite number of constants and variables, but no function symbols. Let P be the set of predicate symbols, Var the set of variables and C the set of constant symbols. We call fluents atomic literals of the form p(t_1, ..., t_n), where, for each i, t_i ∈ Var ∪ C. A simple fluent literal l is an atomic literal p(t_1, ..., t_n) or its negation ¬p(t_1, ..., t_n). We denote by Lit_S the set of all simple fluent literals, and we assume that the fluent ⊥, representing inconsistency, is included in Lit_S. A temporal fluent literal has the form [a]l or ○l, where l ∈ Lit_S and a is an action name (an atomic proposition, possibly containing variables). Given a (simple or temporal) fluent literal l, not l represents the default negation of l. A (simple or temporal) fluent literal, possibly preceded by a default negation, will be called an extended fluent literal. The laws are formulated as rules of a temporally extended logic programming language and have the form

l_0 ← l_1, ..., l_m, not l_{m+1}, ..., not l_n    (1)

where the l_i's are simple or temporal fluent literals. As usual in ASP, rules with variables are a shorthand for the set of their ground instances; and we let Σ be the set of ground instances of atomic actions in the domain description.
In the following, we call a state a set of ground fluent literals. A state is said to be consistent if it is not the case that both f and ¬f belong to the state, or that ⊥ belongs to the state. The execution of an action in a state may change the values of fluents in the state through its direct and indirect effects, thus giving rise to a new state. We assume that a law of the form (1) can be applied in all states while, when prefixed with Init, it only applies to the initial state. Action laws, causal laws, precondition laws, persistency laws, initial state laws, etc., which are normally used in action theories, can all be defined as instances of (1). Action laws describe the effects of atomic tasks. The meaning of an action law [a]l_0 ← l_1, ..., l_m, not l_{m+1}, ..., not l_n (where l_0 ∈ Lit_S and l_1, ..., l_n are either simple fluent literals or temporal fluent literals of the form [a]l) is that executing action a in a state in which l_1, ..., l_m hold and l_{m+1}, ..., l_n do not hold causes the effect l_0 to hold in the state after the action. Precondition laws allow the specification of executability conditions for atomic tasks; they are a special case of action laws with ⊥ as effect, i.e., they have the form [a]⊥ ← l_1, ..., l_m, not l_{m+1}, ..., not l_n, meaning that a cannot be executed (has an inconsistent effect) in case l_1, ..., l_m hold and l_{m+1}, ..., l_n do not hold. Causal laws define causal dependencies among propositions, which are used to derive indirect effects of actions, called ramifications in the literature on reasoning about actions, where it is well known that causal dependencies among propositions are not suitably represented by material implication in classical logic. Static causal laws have the form l_0 ← l_1, ..., l_m, not l_{m+1}, ..., not l_n, where the l_i's are fluent literals. Their meaning is: if l_1, ..., l_m hold and l_{m+1}, ..., l_n do not hold in a state, then l_0 is caused to hold in that state. Dynamic causal laws have the form l_0 ← t_1, ..., t_m, not t_{m+1}, ..., not t_n, where l_0 is a fluent literal and the t_i's are either fluent literals or temporal fluent literals of the form ○l_i (meaning that the fluent literal l_i holds in the next state). Their meaning is: if t_1, ..., t_m hold and t_{m+1}, ..., t_n do not hold, then l_0 is caused to hold in the next state. In particular, in the premise, a combination of the form ¬f, ○f (or f, ○¬f) may be used to mean that fluent f becomes true (resp., false). The language also includes constraints of the form ⊥ ← l_1, ..., l_m, not l_{m+1}, ..., not l_n, where the l_i's are simple or temporal fluent literals. In this language, default negation in clause bodies allows for the specification of nondeterministic action laws, of the form [a](l_0 ∨ ... ∨ l_k) ← l_{k+1}, ..., l_m, not l_{m+1}, ..., not l_n, stating that the execution of action a in a state in which l_{k+1}, ..., l_m hold and l_{m+1}, ..., l_n do not hold nondeterministically makes one of l_0, ..., l_k true. In fact, [a](l_0 ∨ ... ∨ l_k) ← Body can be seen as a shorthand for the rules [a]l_i ← Body, not [a]l_0, ..., not [a]l_{i−1}, not [a]l_{i+1}, ..., not [a]l_k (for i = 0, ..., k). The laws above can be used to define persistency laws to deal with frame fluents, as well as to complete the initial state in all the possible ways compatible with the initial state specification.
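As a rough illustration of how action laws and static causal laws interact in a single transition, consider the following sketch; it deliberately ignores default negation, temporal literals and the full answer-set semantics of [16], and uses abstract names (a, l0, l1, l2) invented for the example.

```python
# Toy one-step transition: direct effects from action laws applied in the
# old state, then closure under static causal laws; untouched fluents
# persist (a crude stand-in for persistency laws).
action_laws = {"a": [(frozenset({"l1"}), "l0")]}   # encodes  [a] l0 <- l1
static_laws = [(frozenset({"l0"}), "l2")]          # encodes  l2 <- l0

def step(state, action):
    new = set(state)                               # frame fluents persist
    for body, head in action_laws.get(action, []):
        if body <= state:                          # body checked in old state
            new.add(head)                          # direct effect
    changed = True
    while changed:                                 # fixpoint of causal laws
        changed = False
        for body, head in static_laws:
            if body <= new and head not in new:
                new.add(head)                      # indirect effect
                changed = True
    return new

print(step({"l1"}, "a"))                           # {'l1', 'l0', 'l2'}
```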
The semantics of a domain description is defined by extending the notion of answer set [11] to temporal answer sets, so as to capture the linear structure of temporal models. We refer to [16] for details.

3 Declarative specification of business processes: merging annotations with data

A declarative specification of a business process can be given by exploiting the action theory above to define the effects of atomic tasks as well as their executability preconditions. This approach has been followed in different contexts, such as the declarative specification of web services in [26, 5] and the declarative specification of agent communication protocols in [35, 14]. We show that causal laws have a relevant role in the specification of background knowledge, which is common to both the business process and the business rules, and that the proposed approach allows for an easy integration of the data perspective. The declarative specification of business processes has been advocated by many authors [32, 30, 25], as opposed to the more rigid transition-based approach. A declarative specification of a process is, generally, more concise than a transition-based specification, as it abstracts away from rigid control-flow details and does not require the order among the actions in the process to be rigidly defined. The temporal ASP language in Section 2 is well suited for defining the immediate and indirect effects of atomic tasks and their preconditions. Consider, for instance, the business process of an investment firm in [7], where the firm offers financial instruments to an investor. The atomic task investor identification has as effect that the investor has been identified, while investor profiling has the nondeterministic effect that the investor is recognized as being either risk averse or risk seeking. This can be modeled by the action laws:

[investor_ident(I)]investor_identified(I)
[profiling(I)](risk_averse(I) ∨ risk_seeking(I)) ← investor_identified(I)

The first action law has an empty precondition. The fact that profiling can be executed only after the atomic task investor identification has been executed can be modeled by introducing the precondition law:

[profiling(I)]⊥ ← not investor_identified(I)

which, literally, states that executing action profiling in a state in which the investor I has not been identified gives an inconsistency. Observe that, in this language, an action is executable unless there is a precondition law for it whose antecedent is satisfied. Hence, once the investor has been identified, the action profiling(I) becomes executable. However, to guarantee that it will eventually be executed, we can add to C the DLTL constraint ✷[investor_ident(I)]✸⟨profiling(I)⟩⊤. To force the execution of profiling immediately after investor identification, instead, we could add the constraint ✷[investor_ident(I)]⟨profiling(I)⟩⊤. The presence of DLTL constraints in a domain specification allows for a simple way to constrain the activities in a business process. Observe that, as DLTL is an extension of LTL, it is possible to provide an encoding of all ConDec [28] constraints into our action language. The additional expressivity which comes from the presence of program expressions in DLTL allows for a very compact encoding of certain declarative properties of the domain dealing with finite iterations.
For instance, the property "action b must be executed immediately after any even occurrence of action a in a run" can be expressed by the temporal constraint ✷([(a; Σ*; a)*]⟨b⟩⊤), where Σ* represents any finite action sequence. In [7] it has been shown that program expressions can be used to model the control flow of a business process in a rigid way. However, the solution in [7] does not deal with non-structured workflows. As concerns the data perspective, an atomic task which acquires the value of a data variable (data object) x can be regarded as an action nondeterministically assigning to x one of the values in its domain. Consider, for instance, the atomic task verify_status, which verifies the status of a customer. Assume it has the effect of assigning a value (gold, silver or unknown) to a variable status. The task verify_status can be regarded as a nondeterministic action assigning one of the possible values to the variable status:

[verify_status](status(gold) ∨ status(silver) ∨ status(unknown))

In general, we model a data acquisition task as a nondeterministic action. As an example, let us consider an atomic task get_order, which acquires an order for a product P, and an atomic task select_shipper(P), which selects a shipper among the available shippers that are compatible with the choice of the product P. Let us introduce the notation 1{[a]R(X) | P(X)}1 (similar to the notations used in Clingo and in S-models) as a shorthand for the two laws:

[a]R(X) ← not [a]¬R(X) ∧ P(X)
[a]¬R(X) ← [a]R(Y) ∧ P(X) ∧ P(Y) ∧ X ≠ Y

meaning that after the execution of action a, R(X) holds for a unique value of X among those values satisfying P(X). Let available_product(P) and available_shipper(S) be the predicates defining the available products and shippers, and compatible(P, S) a predicate saying that product P and shipper S are compatible. We can represent the effect of action get_order by the law 1{[get_order]product(P) | available_product(P)}1 and the effect of action select_shipper(P) as 1{[select_shipper(P)]shipper(S) | available_shipper(S)}1. The requirement that P and S must be compatible can be enforced by introducing the constraint:

⊥ ← [select_shipper(P)]shipper(S) ∧ not compatible(P, S)

meaning that it is not the case that the selected shipper S and the product P to be shipped are incompatible. The above specification of the effects of the task select_shipper(P) has strong similarities with the specification of a post-condition for a service in [9]. Indeed, in [9], a post-condition of the form R(x) := ψ(x), associated with a service σ, requires that after the execution of σ the argument x of R is instantiated with a (unique) tuple u such that ψ(u) holds in the previous state (artifact instance). As a difference with [9], where ψ(x) is a first-order temporal formula, our temporal language does not allow for explicit quantification: all variables occurring in action and causal laws are intended to be universally quantified in front of the laws. Furthermore, in our approach we cannot deal with infinite domains. As usual in ASP, a finite groundization of the set of laws in the domain specification is required. Abstraction techniques such as those in [24] can be adopted to abstract infinite or large domains into a finite, small set of abstract values.
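To illustrate the effect of modeling data acquisition as nondeterministic assignment, the following sketch enumerates the execution branches induced by get_order and select_shipper(P) under the compatibility constraint; the products, shippers and compatibility relation are invented for the example.

```python
# Branching induced by nondeterministic data-acquisition tasks; a verifier
# would explore (or symbolically encode) all of these executions.
products = {"book", "fridge"}
shippers = {"bike_courier", "freight"}
compatible = {("book", "bike_courier"), ("book", "freight"),
              ("fridge", "freight")}

branches = [
    {"product": p, "shipper": s}
    for p in sorted(products)    # get_order picks exactly one product
    for s in sorted(shippers)    # select_shipper(P) picks exactly one shipper
    if (p, s) in compatible      # the constraint prunes incompatible pairs
]
print(branches)   # each remaining branch is a possible execution
```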
4 Specification of business rules: causality and commitments

The use of directional implications for modeling business rules, as well as for modeling the conditional structure of norms, is widely recognized in the literature [18]. In this section we claim that static and dynamic causal laws, proposed in the AI literature on reasoning about actions and change, are also appropriate for modeling business processes. Consider the domain in examples 2 and 3 in [33], with the rule stating that if an insurance claim is accepted by reviewer A and reviewer B, then it is accepted. Suppose this is represented as the material implication claimAccRevA ∧ claimAccRevB ⊃ claimAccepted, i.e., the clause ¬claimAccRevA ∨ ¬claimAccRevB ∨ claimAccepted. Suppose further, as in [33], that as a result of an action with direct effects, we accept models where such effects hold, that satisfy a background theory including the implication above, and that, according to the Possible Models Approach [34], differ minimally from the previous state. Consider a state where claimAccRevA already holds, and an action of acceptance for reviewer B occurs, with direct effect claimAccRevB. In order to satisfy the material implication, claimAccepted should become true, or claimAccRevA should become false, or both; minimal difference with the previous state only excludes this third alternative, while giving equal status to the first two. If the redundancy in the process means that the assessment of one reviewer has no influence on the other's, then only the first result, where claimAccepted becomes true, is intended. The (static) causal rule claimAccepted ← claimAccRevA, claimAccRevB yields exactly the first solution, given that its semantics imposes that in all states, if claimAccRevA ∧ claimAccRevB is true (and, in particular, if it just became true), then claimAccepted holds (and it becomes true as a side effect if the premise just became true). However, the above implication might not actually be intended, in case later steps in the process could make the claim not accepted. For example, the process model might specify that if the amount claimed is greater than a threshold, it should go through further approval by a supervisor (with possible effect ¬claimAccepted). Unlike [33], we consider the case where this does not mean that claimAccRevA ∧ claimAccRevB should become false, i.e., that at least one conjunct (or exactly one, for a minimal change) should become false. Rather, we suggest that here, after the reviewers' acceptance, claimAccepted actually stands for "accepted unless the decision is overridden". Dynamic causal laws are suitable to represent this; the side effect of acceptance by the single reviewers becomes:

claimAccepted ← claimAccRevA, ¬claimAccRevB, ○claimAccRevB
claimAccepted ← ¬claimAccRevA, ○claimAccRevA, claimAccRevB

where syntactic sugar can be introduced, as in [8], to succinctly state that the conjunction claimAccRevA ∧ claimAccRevB is initiated, i.e., it becomes true. Such rules correctly make claimAccepted true after reviewer acceptance but, if a further step has the effect ¬claimAccepted, they do not "fire", because claimAccRevA ∧ claimAccRevB is true but is not becoming true. Note the difference with the static causal rule, which would fire (because claimAccRevA ∧ claimAccRevB is true) and then contradict ¬claimAccepted.
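The difference between the two kinds of law can be illustrated operationally. In the sketch below (a simplification that represents states as plain sets of fluents, invented for the example), the dynamic law fires only on the transition where its premise becomes true, whereas the static law would fire in every state where the premise holds.

```python
# A run where claimAccRevA holds first and claimAccRevB becomes true later.
run = [set(), {"claimAccRevA"}, {"claimAccRevA", "claimAccRevB"}]
premise = {"claimAccRevA", "claimAccRevB"}

def static_fires(state):
    # a static law fires whenever its premise IS true in a state
    return premise <= state

def dynamic_fires(prev, nxt):
    # a dynamic law fires only when its premise BECOMES true
    return premise <= nxt and not premise <= prev

print([static_fires(s) for s in run])                      # [False, False, True]
print([dynamic_fires(p, n) for p, n in zip(run, run[1:])])  # [False, True]
# If a later step made claimAccepted false while both premises stayed
# true, the dynamic laws would not re-fire; the static law would.
```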
A particularly significant case of the pattern above, where a fluent becomes true as an indirect effect of some activity but may be canceled by further activities, is that of obligations, which arise naturally in compliance rules: several such rules are variants of "if B happens, then A shall happen", or "if B is (or becomes) true, then A shall become true". Compliance verification for such rules could be performed by verifying a straightforward representation of the rule as a temporal logic formula, e.g., in LTL, the formula ✷(B ⊃ ✸A). This, however, does not admit the possibility that a later activity cancels the obligation: e.g., if an order for goods is confirmed by the seller, the goods have to be shipped; but if the customer cancels the order, the obligation to ship the goods is canceled. An explicit representation of obligations is useful for this purpose. In this paper we limit our attention to one type of obligation in the classification in [19]: the case where a given condition should become true at least once after the obligation has been triggered; i.e., we consider achievement obligations in [19], and we only consider the case where the obligation should be fulfilled after it is triggered.

We then identify obligations with the notion of commitment from the social approach to agent communication [30, 20, 10]. A (base) commitment C(i, j, A) means that agent i is committed to agent j to bring about A, while conditional commitments of the form CC(i, j, B, A) mean that agent i is committed to agent j to bring about A if condition B is brought about [35, 14]. In this paper we do not consider agents explicitly, and we concentrate our attention on base commitments C(A), where A is a fluent; C(A) is also a fluent, which can be made true, due to an action law or a dynamic causal law, as a direct or indirect effect of an activity in the process (order confirmation, in the example). The commitment (to ship goods, in the example) can be made false by an action with effect ¬C(A) (the customer cancelling the order). Fulfilling the commitment (shipping goods) also makes the commitment false. Compliance verification, as we shall see in Section 6, then amounts to verifying that commitments, if introduced, are discharged, i.e., that they are either fulfilled or explicitly canceled. We refer to [7] for the treatment of defeasible business rules by means of default negation in ASP.

5 Translating business process workflows in ASP

The temporal action language introduced above provides a flexible and declarative specification language for business processes, and in [16] we have provided its translation to standard ASP. There are, however, cases where the business process is naturally modeled (or has already been modeled) in a workflow language such as YAWL [31]. In principle, such process models could be translated automatically to the temporal action language, but we have provided a direct translation to ASP for a subset of YAWL including AND- and XOR-splits and joins. The translation is based on an enabling semantics of arcs and tasks: an atomic task can be executed (i.e., the action can occur) when it is enabled. It is enabled when its only incoming arc is enabled, or it is an AND-join and all incoming arcs are enabled, or it is a XOR-join and one incoming arc is enabled.
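A hedged sketch of how these enabling conditions might look in ASP (the predicate names and the two-arc instance are ours, not the translation of [16]):

```
% task enabledness in state S; arc(A,T) says arc A enters task T
blocked(T,S) :- and_join(T), arc(A,T), not arc_enabled(A,S), state(S).
enabled(T,S) :- and_join(T), state(S), not blocked(T,S).
enabled(T,S) :- xor_join(T), arc(A,T), arc_enabled(A,S).
enabled(T,S) :- simple_task(T), arc(A,T), arc_enabled(A,S).

% a tiny instance: t3 is an AND-join with both incoming arcs enabled
state(0).
and_join(t3). arc(a1,t3). arc(a2,t3).
arc_enabled(a1,0). arc_enabled(a2,0).
```

The AND-join condition is encoded via its complement (blocked) to avoid universal quantification over the incoming arcs.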
The execution of a task enables the outgoing arcs and, in case it is a XOR-split, the execution of a subsequent activity based on the enabling of one such arc disables the other arcs.

6 Business process verification by bounded model checking

In [16] we have developed bounded model checking (BMC) techniques for the verification of DLTL constraints. In particular, the approach extends the one developed in [21] for bounded LTL model checking with Stable Models. The approach can be used for checking the satisfiability of temporal formulas. To prove the validity of a formula, its negation is checked for satisfiability. In case the formula is not valid, a counterexample is provided.

Several verification tasks can be addressed within the proposed approach. Compliance verification (described in some detail in [7]) amounts to checking that all the business rules are satisfied in all the executions of the process. We distinguish between business rules which can be encoded as a temporal formula and business rules whose modeling involves commitments. As an example of a rule which can be encoded as a temporal formula to be verified, consider, in the order-production-delivery process in [24], the rule "Premium customer status shall only be offered after a prior solvency check": it can be verified by checking the validity of the temporal formula

✷(solvency_check_done ∨ ¬⟨offer_premium_status⟩⊤)

i.e., by verifying that in all executions of the business process, if the action offer_premium_status is executable, then the fluent solvency_check_done holds. As an example of a rule modeled through causal laws whose effect is adding a commitment, consider the rule "if the investor signs an order, the firm is obliged to provide him with a copy of the contract". It can be encoded by the causal law:

C(sent_contract) ← order_signed

We require that all the commitments generated are eventually fulfilled, unless they are explicitly cancelled (e.g., in the example, cancelling the order also cancels the obligation to send the contract). Observe that canceling a commitment would not be possible if the commitment to α corresponded directly to the temporal formula ✸α. A commitment is also discharged when it is fulfilled, i.e., the following causal rule is added for all possible commitments:

¬C(α) ← C(α) ∧ α

Then the verification of rules involving commitments amounts to verifying the validity, for all possible commitments C(α), of the formula:

✷(C(α) → ✸¬C(α))

A verification task considered in [9] is that of verifying properties of a business process under the assumption that the process satisfies some given business rules. This verification task can also be addressed in our approach: the specification of the business rules is given by adding temporal constraints (and, possibly, causal laws) to the domain specification. The executions of the resulting domain specification are then verified against other temporal properties.

Satisfiability and validity of a DLTL formula over the business process executions are decidable problems. However, given that BMC is not complete in general, an alternative approach to BMC in ASP is proposed in [15] to address the problem of completeness, by exploiting the Büchi automaton construction while searching for a counterexample.
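The commitment lifecycle just described can be prototyped in time-stamped clingo; the scenario facts (confirm, ship, no cancellation) and all predicate names below are our own illustrative assumptions.

```
step(0..3).

% creation: confirming the order commits the seller to ship the goods
occurs(confirm_order,0).
holds(c(shipped),T+1) :- occurs(confirm_order,T), step(T), T < 3.

% direct effect of shipping
occurs(ship_goods,1).
holds(shipped,T+1) :- occurs(ship_goods,T), step(T), T < 3.

% discharge on fulfilment: ¬C(α) ← C(α) ∧ α
discharged(c(F),T) :- holds(c(F),T), holds(F,T).

% inertia for commitments, unless discharged or cancelled
holds(c(F),T+1) :- holds(c(F),T), not discharged(c(F),T),
                   not occurs(cancel_order,T), step(T), T < 3.

% bounded analogue of ✷(C(α) → ✸¬C(α)): a created commitment must
% not survive to the final step undischarged
:- holds(c(F),3), not discharged(c(F),3).
```

In this scenario the commitment c(shipped) is created at step 1 and discharged at step 2, so the final constraint is satisfied; dropping the ship_goods occurrence makes the program unsatisfiable, signalling a compliance violation.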
7 Conclusions and related work

The paper presents an approach to the verification of the compliance of business processes with norms. The approach is based on a temporal extension of ASP. The business process, its semantic annotation and the norms are encoded using temporal ASP rules as well as temporal constraints. Causal laws are used for modeling norms, and commitments are introduced for representing obligations. Compliance verification can be performed using the BMC technique developed in [16] for DLTL bounded model checking in ASP, which extends the approach for bounded LTL model checking with Stable Models in [21]. This paper enhances the approach to business process compliance verification in [7] by taking into consideration the data perspective and by providing a declarative specification of the business process, while in [7] the control flow of a structured business process is modeled in a rigid way by means of a program expression. Also, we have shown that a direct encoding of the process workflow in ASP can be given and exploited for process verification.

Several proposals in the literature introduce annotations on business processes for dealing with compliance verification [12, 18, 33]. In particular, [18] proposes a logical approach to business process compliance based on the idea of annotating the business process. Annotations and normative specifications are provided in the same logical language, namely the Formal Contract Language (FCL), which combines defeasible logic [3] and a deontic logic of violations [17]. Compliance is verified by traversing the graph describing the process and identifying the effects of tasks and the obligations triggered by task execution. Ad hoc algorithms for propagating obligations through the process graph are defined. The idea of describing the effects of atomic tasks on data through preconditions and effects is already present in [23], where effects and preconditions are sets of atomic formulas, and the background knowledge consists of a theory in clausal form; I-Propagation [33] is exploited for computing annotations. In our approach the domain theory contains directional causal rules rather than general clauses (which allows unintended conclusions to be avoided when reasoning about side effects), and domain annotations are combined with data properties in a uniform approach. In the related paper [33], several verification tasks are defined to verify that the business process control flow interacts correctly with the behaviour of the individual activities.

In [9] a service over an artifact schema is defined as a triple: a precondition, a post-condition and a set of static rules, which define changes on state relations and are formulas in a first-order temporal logic. State update rules S(x) ← φ⁺(x) and ¬S(x) ← φ⁻(x) are essentially specific kinds of causal laws whose antecedents φ⁺ and φ⁻ are evaluated in the artifact instance in which the service is executed and whose consequents are added to the resulting artifact instance. [9] identifies a class of guarded artifacts for which verification of properties in a (guarded) first-order extension of LTL is decidable. While our action language does not allow for explicit quantification, it allows for a flexible formulation of action effects and causal laws, which permits (as shown in Section 3) an encoding of post-conditions as in [9].

In [4] compliance checking for BPMN process models is based on the BPMN-Q visual language. Rules are given a declarative representation as BPMN-Q queries, which are translated into temporal formulas for verification.
In [25] the Abductive Logic Programming framework SCIFF [2] is exploited in the declarative specification of business processes as well as in the verification of their properties. In [1] expectations are used for modelling obligations and prohibitions, and norms are formalized by abductive integrity constraints. In [29] Concurrent Transaction Logic (CTR) is used to model and reason about general service choreographies. Service choreographies and contract requirements are represented in CTR. The paper addresses the problem of deciding if there is an execution of the service choreography that complies both with the service policies and with the client contract requirements. Temporal rule patterns for regulatory policies are introduced in [13], where regulatory requirements are formalized as sets of compliance rules in a real-time temporal object logic. The approach is used essentially for event monitoring.

References

1. M. Alberti, M. Gavanelli, E. Lamma, P. Mello, P. Torroni, and G. Sartor. Mapping of Deontic Operators to Abductive Expectations. In NORMAS, pages 126–136, 2005.
2. M. Alberti, F. Chesani, M. Gavanelli, E. Lamma, P. Mello, and P. Torroni. Verifiable agent interaction in abductive logic programming: the SCIFF framework. ACM Trans. Comput. Log., 9(4), 2008.
3. G. Antoniou, D. Billington, G. Governatori, and M. J. Maher. Representation results for defeasible logic. ACM Trans. on Computational Logic, 2:255–287, 2001.
4. A. Awad, G. Decker, and M. Weske. Efficient compliance checking using BPMN-Q and temporal logic. In BPM, LNCS 5240, pages 326–341. Springer, 2008.
5. K. Bhattacharya, C. Gerede, R. Hull, R. Liu, and J. Su. Towards formal analysis of artifact-centric business process models. In BPM, pages 288–304, 2007.
6. A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and Y. Zhu. Bounded model checking. Advances in Computers, 58:118–149, 2003.
7. D. D'Aprile, L. Giordano, V. Gliozzi, A. Martelli, G. L. Pozzato, and D. Theseider Dupré. Verifying business process compliance by reasoning about actions. In CLIMA XI, pages 99–116, 2010.
8. M. Denecker, D. Theseider Dupré, and K. Van Belleghem. An inductive definitions approach to ramifications. Electronic Transactions on Artificial Intelligence, 2:25–97, 1998.
9. A. Deutsch, R. Hull, F. Patrizi, and V. Vianu. Automatic verification of data-centric business processes. In ICDT, pages 252–267, 2009.
10. N. Fornara and M. Colombetti. Defining Interaction Protocols using a Commitment-based Agent Communication Language. In AAMAS 2003, pages 520–527.
11. M. Gelfond. Answer Sets. In Handbook of Knowledge Representation, chapter 7. Elsevier, 2007.
12. A. Ghose and G. Koliadis. Auditing business process compliance. In ICSOC, LNCS 4749, pages 169–180, 2007.
13. C. Giblin, S. Müller, and B. Pfitzmann. From Regulatory Policies to Event Monitoring Rules: Towards Model-Driven Compliance Automation. IBM Research Report, 2007.
14. L. Giordano, A. Martelli, and C. Schwind. Specifying and Verifying Interaction Protocols in a Temporal Action Logic. Journal of Applied Logic, 5:214–234, 2007.
15. L. Giordano, A. Martelli, and D. Theseider Dupré. Achieving completeness in bounded model checking of action theories in ASP. In Proc. KR 2012.
16. L. Giordano, A. Martelli, and D. Theseider Dupré. Reasoning about actions with temporal answer sets. Theory and Practice of Logic Programming, 2012.
17. G. Governatori and A. Rotolo. Logic of Violations: A Gentzen System for Reasoning with Contrary-To-Duty Obligations. Australasian Journal of Logic, 4:193–215, 2006.
18. G. Governatori and S. Sadiq. The journey to business process compliance. In Handbook of Research on BPM, pages 426–454. IGI Global, 2009.
19. G. Governatori. Law, logic and business processes. In Third International Workshop on Requirements Engineering and Law. IEEE, 2010.
20. F. Guerin and J. Pitt. Verification and Compliance Testing. In Communications in Multiagent Systems, LNAI 2650. Springer, 2003.
21. K. Heljanko and I. Niemelä. Bounded LTL model checking with stable models. Theory and Practice of Logic Programming, 3(4-5):519–550, 2003.
22. J. G. Henriksen and P. S. Thiagarajan. Dynamic Linear Time Temporal Logic. Annals of Pure and Applied Logic, 96(1-3):187–207, 1999.
23. J. Hoffmann, I. Weber, and G. Governatori. On compliance checking for clausal constraints in annotated process models. Information Systems Frontiers, 2009.
24. D. Knuplesch, L. T. Ly, S. Rinderle-Ma, H. Pfeifer, and P. Dadam. On enabling data-aware compliance checking of business process models. In Proc. ER 2010, 29th International Conference on Conceptual Modeling, pages 332–346, 2010.
25. M. Montali, P. Torroni, F. Chesani, P. Mello, M. Alberti, and E. Lamma. Abductive logic programming as an effective technology for the static verification of declarative business processes. Fundam. Inform., 102(3-4):325–361, 2010.
26. S. Narayanan and S. McIlraith. Simulation, verification and automated composition of web services. In Proc. 11th Int. World Wide Web Conference, WWW2002, pages 77–88, 2002.
27. A. Nigam and N. S. Caswell. Business artifacts: An approach to operational specification. IBM Systems Journal, 42(3):428–445, 2003.
28. M. Pesic and W. M. P. van der Aalst. A declarative approach for flexible business processes management. In Business Process Management Workshops, LNCS 4103, pages 169–180. Springer, 2006.
29. D. Roman and M. Kifer. Semantic web service choreography: Contracting and enactment. In International Semantic Web Conference, LNCS 5318, pages 550–566, 2008.
30. M. P. Singh. A social semantics for Agent Communication Languages. In Issues in Agent Communication, LNCS (LNAI) 1916, pages 31–45, 2000.
31. A. H. M. ter Hofstede, W. M. P. van der Aalst, M. Adams, and N. Russell. Modern Business Process Automation: YAWL and its Support Environment. 2010.
32. W. M. P. van der Aalst and M. Pesic. DecSerFlow: Towards a truly declarative service flow language. In The Role of Business Processes in Service Oriented Architectures, volume 06291 of Dagstuhl Seminar Proceedings, 2006.
33. I. Weber, J. Hoffmann, and J. Mendling. Beyond soundness: On the verification of semantic business process models. Distributed and Parallel Databases (DAPD), 2010.
34. M. Winslett. Reasoning about action using a possible models approach. In Proc. AAAI 88, 7th National Conference on Artificial Intelligence, pages 89–93, 1988.
35. P. Yolum and M. P. Singh. Flexible Protocol Specification and Execution: Applying Event Calculus Planning using Commitments. In AAMAS 2002, pages 527–534, 2002.

A Knowledge-based Approach to the Configuration of Business Process Model Abstractions

Shamila Mafazi¹, Wolfgang Mayer², Georg Grossmann², and Markus Stumptner²
University of South Australia, Adelaide, SA, 5095, Australia
¹ shamila.mafazi@mymail.unisa.edu.au
² {firstname.lastname}@unisa.edu.au
Abstract. Methods for abstraction have been proposed to ease comprehension, monitoring, and validation of large processes and their running instances. To date, abstraction mechanisms have focused predominantly on structural aggregation, projection, and ad-hoc transformations. We propose an approach for the configuration of process abstractions tailored to a specific abstraction goal expressed as constraints on the abstraction relation and process transformation operators. Our framework goes beyond simple structural aggregation and leverages domain-specific properties, taxonomies, meronymy, and flow criteria. In this paper we outline the constraint-based framework and its underlying inference procedure. We show that our approach can handle most of the common process analysis use cases.

Keywords: business process abstraction, business process management, process configuration

1 Introduction

Models of business processes and operational procedures are increasingly being used in modern organizations, and the size and complexity of processes and their models can often be large. Development processes in large technology-focused organizations can easily span more than one thousand process steps [10]. As a result, process models have become difficult to understand and manage, as they may not be specified in full in order to enable flexible executions. However, such flexibility comes at a price: it is no longer easily possible to reason about executions based on a single process model. Although learning methods have been developed to reconstruct process models from execution logs [5], the resulting processes are often very specific and can be difficult to comprehend in full. Therefore, methods for business process abstraction are desired that enable process analysts to tailor large models to their specific analysis task at hand.

Methods for abstraction have been proposed to ease comprehension, monitoring, and validation of large processes and their running instances. To date, abstraction mechanisms have focused predominantly on structural aggregation and projection. Collapsing "similar" entities in a process model into one abstract element and projecting away irrelevant entities are among the most common forms of simplification employed for abstraction. Similarity and relevancy of process entities are often defined ad hoc using process structure, clustering techniques, and user-specified selection criteria [4]. Clustering techniques, statistical methods, and ad-hoc criteria are commonly used to devise a concise summary representation that reflects certain aspects of the larger process. Although structural aggregation can lead to considerable simplification of large process models, the resulting model may not show all required elements, or may aggregate elements together that would be better kept separate. Moreover, these measures fail to take into consideration the purpose of the abstraction for the user.

We propose an approach to computing abstractions of business process models tailored to conducting selected common business process analysis tasks. We address this problem by imposing constraints on the abstraction relations that relate concrete and abstract process models, such that the abstract process model induced by the abstraction relation is guaranteed to include the information needed to assess selected properties of the process.
Rather than relying on a cumbersome explicit specification of relevant process elements, we combine a questionnaire-driven approach to eliciting constraints for common analysis tasks with the explicit specification of additional constraints a user may have. As a result, the significance and granularity of an abstract model can be explicitly controlled and adjusted to suit a given task. Furthermore, the granularity need not be uniform across the entire model; different abstraction operators can be applied to different regions of the process model. Although techniques for parameterizing the granularity of the resulting abstractions have been introduced in order to compensate for current techniques' inability to devise representations that are fit for the user's objective [8], to the best of our knowledge, no explicit means to control abstractions is available to non-experts in formal process analysis.

Our method can be seen as configuration of process models, where configuration applies to the abstraction operators used in devising the abstraction rather than the process model itself. In contrast to classic configuration, where one chooses between alternative instantiations of given variation points within a parametric process model, our approach takes a detailed process model without explicit variation points and derives simplified variations thereof. Hence, our configuration method controls the operators applied within the abstraction process rather than the underlying process model.

In this paper we make the following contributions:
– a knowledge-based framework for configuring purposeful abstractions;
– a framework for specifying constraints on the abstraction;
– a method to infer the process elements (nodes, data, labels) that need to be retained in a conforming abstraction;
– a method to compute abstractions conforming to the abstraction goal.

The subsequent sections are structured as follows. Our process model and abstraction framework are introduced in Section 2; our constraint-based abstraction framework and configuration mechanism are described in Sections 3 and 4, respectively. Abstraction operators are modeled in Section 5, and our method of synthesizing conforming abstractions is summarized in Section 6, followed by a discussion of related work in Section 7.

2 Process Model Abstractions

Different users of a process model are usually interested in observing the process model at different levels of detail. This requires the creation of different abstract process models from one model. However, not all abstract views of a process are equally desirable, as useful abstractions should be tailored to the user's needs. In this work, we pursue this aspect of process abstraction by constraining abstractions such that certain user-selected properties of the underlying concrete process are maintained in its abstract view. We adapt the process model of Smirnov et al. [15] for our purposes and furnish the model with explicit representations of data- and domain-specific properties attached to tasks:

Definition 1 (Process Model). A tuple (N, F, P, φ, DP) is a process model, where N is a finite set of nodes partitioned into tasks Nt and gateways Ng, F ⊆ N × N is the flow relation such that (N, F) is a connected graph, P is a finite set of properties of tasks, DP is a finite set of property values of tasks, and φ : N × P ↦ DP is a function that maps each property of a node to its value. For brevity, we write n.p for φ(n, p). Let M denote the set of all process models.
The set of properties P comprises common domain-specific properties, predicate valuations, and information derived from executions of process instances. Common properties include roles, resources, timing information, and used and modified data flow information. Domain-specific predicates are boolean properties expressing facts such as "is on a critical path". Information derived from executions indicates aggregate information, for example execution frequencies or the number of running instances of a task.

Given a concrete model m of a business process, an abstract view of m is a process model m̂ that retains "significant" entities of m and omits insignificant ones. In our framework, entities comprise the nodes, flows, and properties associated with nodes in a given model. We write Ωm to denote the set of entities in m, where Ωm ⊆ N ∪ F ∪ {n.p | n ∈ N, p ∈ P}. Which entities are considered significant is largely determined by the purpose of the abstract model and hence should be defined flexibly based on the goals of the analyst. We will therefore use an abstract predicate sign ⊆ Ωm ∪ Ωm̂ to capture the significant entities. Whereas insignificant entities can be either eliminated from the abstraction or absorbed into an abstract entity, the significant elements are to be retained. The correspondence between significant entities of m and their abstract counterpart in m̂ is given by an abstraction relation R ⊆ Ωm × Ωm̂.

Definition 2 (Process model abstraction). A business process model abstraction is a function α : M ↦ M that transforms a model m into a model m̂ = α(m) with correspondence relation Rα such that
– ∀ω̂ ∈ Ωm̂ : sign(ω̂) is true,
– ∀ω̂ ∈ Ωm̂ ∃ω ∈ Ωm : (ω, ω̂) ∈ Rα,
– ∀ω ∈ Ωm : sign(ω) → ∃ω̂ ∈ Ωm̂ : (ω, ω̂) ∈ Rα, and
– α preserves the local composition of m in m̂.

[Fig. 1. Example Process Model (bottom), Abstract Model (top), and Correspondence Relation. The original figure annotates each task with a role (Receptionist, Staff, Customer, Admin, Accountant) and a duration in minutes; dashed lines relate concrete tasks to abstract ones.]

The first three conditions ensure that all retained entities in the abstraction are significant, that they are justified by the existence of at least one entity in the concrete process, and that all significant concrete entities have a corresponding element in the abstract model. The fourth condition restricts correspondences to meaningful maps that preserve the local structural composition of m in m̂. We require that each concrete entity maps to at most one abstract counterpart, that each abstract property attaches to the abstraction of the concrete node to which the corresponding concrete property belongs, and that the abstract flow relation reflects the flow in the concrete process model:
– ∀ω ∈ Ωm ∀ω̂, ω̂′ ∈ Ωm̂ : (ω, ω̂) ∈ Rα ∧ (ω, ω̂′) ∈ Rα → ω̂ = ω̂′,
– ∀n.p ∈ Ωm ∀n̂.p̂ ∈ Ωm̂ : (n.p, n̂.p̂) ∈ Rα → (n, n̂) ∈ Rα,
– (m̂, n̂) ∈ F̂ → ∃m, n ∈ N : (m, n) ∈ F* ∧ (m, m̂) ∈ Rα ∧ (n, n̂) ∈ Rα.

Consider the example process models in Figure 1, where the model in the lower half depicts the concrete process and the upper half shows the abstract model. The correspondence relation for tasks is indicated by dashed lines; the correspondences for flows are left implicit. Assuming that all elements performed by role Receptionist in m are significant, the abstraction satisfies the conditions of Definition 2 as well as the three constraints stated above.
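To ground these definitions, here is a hedged clingo sketch of a fragment of the concrete model of Figure 1 together with the significance rule and the third condition of Definition 2; all predicate names, the hat/1 function symbol, and the task names are our own illustrative choices.

```
% fragment of the concrete model as facts
task(cancel_late).      prop(cancel_late, role, receptionist).
task(send_cancel_conf). prop(send_cancel_conf, role, receptionist).
task(cancel_invoice).   prop(cancel_invoice, role, admin).
flow(cancel_late, send_cancel_conf).
flow(send_cancel_conf, cancel_invoice).

% significance: everything performed by role Receptionist
sign(T) :- prop(T, role, receptionist).

% guess which tasks get an abstract counterpart (correspondence r/2)
{ r(T, hat(T)) } :- task(T).
retained(T) :- r(T, _).

% Definition 2, third condition: significant entities must be retained
:- sign(T), not retained(T).
```

Answer sets correspond to admissible retention choices; the remaining conditions of Definition 2 would constrain r/2 further.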
For illustration, assume that the tasks Cancel Late and Send Cancellation Confirmation each have a property Duration; then the constraints on Rα ensure that property Duration of the abstract task Cancel is an abstraction of only those concrete tasks' property.

3 Abstraction Specification

According to Smirnov et al. [15], business process abstraction consists of three aspects: the why, the when and the how aspect. The why aspect captures the reasons for building an abstraction of a process model (fragment), the when aspect describes the conditions under which an element of a process model needs to be abstracted, and the how aspect relates to the concrete transformation mechanism used to devise an abstraction. Whereas an extensive body of work covers the how aspect, comparatively little work is available that addresses the remaining aspects. Our work aims to address the why and when aspects.

We assume that a specification of the information, its granularity, and the predicates whose truth values shall be preserved by the abstraction can be elicited, represented formally, and exploited to guide a search procedure to infer suitable abstractions. Let Γ be such a specification, formulated over the entities in a given process model m. Specifically, we are interested in abstract models m̂ = α(m) satisfying Γ. By making the abstraction criterion explicit, the why aspect of process abstraction is captured, and it can be translated into conditions for when it is admissible to abstract different entities. We define the significance predicate such that the entities are preserved which are required to ensure that criterion Γ is fulfilled on the abstract model. Building on prevalent structural rewriting mechanisms, we provide generic operators on properties and their values in order to automatically eliminate or aggregate entities and furnish the abstract model with a suitable representation of aggregated information. The application of operators is restricted such that the resulting abstract model retains the significant entities and predicates.

An abstraction criterion may be composed of the following specification primitives:
– sign(ω) for ω ∈ Ωm;
– ω = ω′ for ω, ω′ ∈ Ωm ∪ Ωm̂;
– (n, n′) ∈ F* ∪ F̂* for n, n′ ∈ Nt ∪ N̂t or n, n′ ∈ Ng ∪ N̂g;
– n.p ⊕ c, where n, p, and c are a node, a property and a constant drawn from DP, respectively, and ⊕ is a relational operator (e.g., ≺, ⪯, =, ≠, ...);
– (ω, ω̂) ∈ Rα;
– negation, conjunction, disjunction, universal and existential quantification.

This language is expressive enough to capture many interesting properties, including domain-specific predicates and some aggregate instance information. The starred F* and F̂* denote the transitive closure of the flow relation. For example, one could be interested only in the expensive tasks in the process model in Figure 1, where the value of Fee exceeds some threshold $$:

Γ = ∀x (x.Fee ≥ $$ → ∃x̂ ((x, x̂) ∈ Rα ∧ x.p = x̂.p ∧ (x.p, x̂.p) ∈ Rα)) for p ∈ {Fee, Label}.

With this captured explicitly in Γ, a significance predicate and suitable aggregation operators can be found. The example formula implies that all "expensive" tasks will retain their precise labels and fee information, whereas all other tasks and properties can potentially be abstracted away (subject to maintaining the generic abstraction constraints and the well-formedness of the resulting abstract process). While this example may seem trivial, our approach generalizes to more involved situations.
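A hedged clingo rendering of the "expensive tasks" criterion; the threshold value and the fee facts are invented for illustration.

```
#const threshold = 50.
prop(cancel_invoice, fee, 80).  % invented fee data
prop(cancel_late, fee, 10).

% expensive tasks are significant, and so are their Fee and Label
sign(T)         :- prop(T, fee, F), F >= threshold.
sign_prop(T, P) :- sign(T), retain_prop(P).
retain_prop(fee). retain_prop(label).
```

Here sign_prop/2 plays the role of the significance predicate lifted to property entities n.p.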
For example, if execution times shall be retained but the labels of some tasks need not be, our approach allows us to absorb otherwise insignificant tasks into other tasks, but prevents us from eliminating a task entirely, which would result in its contribution to the execution time being lost. Similarly, the model abstractions that may be applied in devising an abstraction would be restricted to aggregating the property of a sequence of nodes using the sum function but not, for example, the max function. Furthermore, data flow in the model may impose restrictions on the significance of non-local process entities.

[Fig. 2. Role Hierarchy]
[Fig. 3. Task Meronymy: the composite task Cancel with parts Cancel Early, Cancel Invoice, Cancel Late, and Send Cancelation Confirmation]

To facilitate the abstraction of data properties and other non-structural aspects of the business process, we assume that the value domain Dp of each property (including the label of nodes) p ∈ P forms a (finite-height) (semi-)lattice with partial order ≺p, where x ≺p y denotes that x is more precise (or has more information) than y. We use ⊤p to denote the top element of Dp, which provides no information; in this case, the property can be omitted. For example, let us revisit the model in Figure 1. An example of the (semi-)lattice for the Role properties is shown in Figure 2. The lattice for roles indicates that roles Receptionist and Admin are specializations of role Staff and are therefore candidates for role abstraction. For example, one could be interested in distinguishing Customers from Staff, but not in the precise staff roles. This could be captured in Γ as a constraint on the Role property of nodes. As a result, any value r of property Role that satisfies r ≺ Staff would be abstracted to the value Staff. We impose one more constraint on Rα: any admissible Rα must satisfy that no information can be gained in the abstract model. That is, (ω, ω̂) ∈ Rα → ω ⪯ ω̂ must hold for all property entities ω, ω̂.

4 Abstraction Configuration

Although the method of constraining valid abstractions is powerful, direct exposure of the formal framework to business analysts is rarely feasible in practice. Therefore, we employ knowledge-based configuration mechanisms to elicit appropriate partial abstraction specifications. We use a variant of the questionnaire method of process configuration [6], which interacts with the user in terms of simple domain-specific questions in order to construct the formal domain representation from the user's answers. Different from previous work, our configuration model does not rely on established variation points within the process model, but rather aims to construct a formula that constrains the admissible abstraction relations and the operators that can be used to construct them. No explicit library of processes and variation points specific to the process under consideration is needed.

We envision our process abstraction configurator providing a wizard-like interaction where process analysts may select the information and predicates they wish to retain in the abstraction, and define domain-specific value lattices, aggregation operators and structural transformation operators. Underlying our configurator is a catalog of abstraction constraint templates, which can be selected and whose parameters can be instantiated by the user.

Definition 3 (Configuration Model).
A configuration model is a triple (C, O, G), where C is a catalog of abstraction aspects, O is a library of abstraction operators (defined in Section 5), and G is a finite set of boolean propositions.

The catalog contains configuration options and associated abstraction constraints, the library of abstraction operators defines the transformations that can potentially be applied to the process model, and the set of propositions allows one to restrict the set of applicable operators based on choices made for aspects in the catalog. We first describe the catalog and defer discussion of the operators until the next section.

Definition 4 (Abstraction Aspect Catalog). An abstraction aspect catalog is a set of templates (Q, X, C[X, G]) where Q is a configuration option, X is a set of parameter variables, and C[X, G] is a formula template, parametric in X, specifying the abstraction constraints associated with Q in terms of the process model, and the abstraction operator constraints in terms of assignments to G.

Each placeholder variable x ∈ X can be assigned a predicate or domain value from the process model (subject to resulting in a well-formed formula C[X, G]). The configuration criterion Γ is simply the conjunction of all constraints Ci[xi, Gi] of the selected Qi with bindings Xi = xi.

As an example, let the configuration option Q1 be 'Get a process view of all the interactions between two specific roles'. By selecting this configuration option, the parameter variables are set as X = {Role1, Role2}. The values for the roles are requested and assigned as Role1 = Admin and Role2 = Accountant. The configuration imposes constraints on the abstraction relation: a task n must be retained in the abstraction if its Role property valuation matches either Role1 or Role2, and there is a flow from n to another task n′ that has property Role set to the remaining given role. Formally, the abstraction criterion Γ can be expressed as

∀n1, n2 ∈ Nt : (n1, n2) ∈ F* ∧ ((n1.Role = Role1 ∧ n2.Role = Role2) ∨ (n1.Role = Role2 ∧ n2.Role = Role1)) → (n1, n1) ∈ Rα ∧ (n2, n2) ∈ Rα.

The catalog allows for convenient elicitation of the user's requirements based on common abstraction goal patterns. Table 1 shows how 11 of the 14 common use cases for process abstraction presented by Smirnov et al. [15] can be expressed in our framework. Most constraints restrict which tasks and properties may be abstracted, and whether insignificant tasks shall be eliminated or aggregated. In the first group of use cases (1–4), a process view respecting one or more properties of a task, such as resources and data objects, is required. For this purpose the properties of all tasks are compared with the user-specified property P. Tasks satisfying property P over property A are retained in the abstraction, whereas others are eliminated. In the second group, tracing a task (use case 11), the effect of a task in the process model needs to be assessed. For this purpose a process view containing the tasks which are reachable from the task of interest is produced. The constraint ensures that all tasks x′ reachable from a given task x are retained in the abstraction. For the instance-related use cases (5–7), we currently require a pre-processing stage, where the tasks in the process model are furnished with aggregate property information derived from the instances. For example, a property representing execution frequencies or cumulative case costs could be added.
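A hedged clingo rendering of this elicited constraint; the role constants, the two-task instance, and the prop/flow predicates are our own illustrative assumptions.

```
#const role1 = admin.
#const role2 = accountant.

task(t1; t2).
prop(t1, role, admin). prop(t2, role, accountant).
flow(t1, t2).

reach(X, Y) :- flow(X, Y).
reach(X, Z) :- flow(X, Y), reach(Y, Z).

% retain both endpoints of any flow between the two roles
sign(N1) :- reach(N1, N2), prop(N1, role, role1), prop(N2, role, role2).
sign(N2) :- reach(N1, N2), prop(N1, role, role1), prop(N2, role, role2).
% symmetric case (Role2 before Role1)
sign(N1) :- reach(N1, N2), prop(N1, role, role2), prop(N2, role, role1).
sign(N2) :- reach(N1, N2), prop(N1, role, role2), prop(N2, role, role1).
```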
For use case 9, adapt process model for an external partner, the tasks which need to be presented to the external partner are selected. The selected tasks are considered significant; hence they need to be retained, while the rest of the tasks are aggregated. The first constraint ensures that the selected tasks are retained in the abstraction, whereas the second constraint ensures that no insignificant tasks are eliminated from the model (although such tasks may be aggregated with other insignificant tasks). In use case 10, a process view respecting the data dependencies of the tasks is required. For this purpose those tasks which make use of the data objects of interest are considered significant and must be retained in the abstraction, while the rest of the tasks are considered insignificant and can be eliminated from the abstract model. For use case 13, a process view respecting one or more user-specified properties is required. Different from use cases 1–4, in this process view the insignificant tasks (tasks without the properties of interest) are aggregated and presented as a composite task in the process view. Hence the constraint prohibiting the elimination of insignificant tasks must be imposed in addition to the constraint capturing use cases 1–4.

Three use cases cannot directly be expressed in our framework. In use case 14, retrieve coarse-grained activities, a view over the coarse-grained tasks is required, but not a view over the process model. This requires inferring the coarse-grained activities, i.e., abstraction hierarchies and meronymy, from the detailed process model. In contrast, our approach relies on given abstraction hierarchies and meronymy to compute abstractions. In use case 12, the user needs to control the abstraction level gradually, while in our approach the process model is abstracted until all the user-specified criteria are met. Finally, use case 8 requires inferring possible executions of the process model given a specification of a case instance. Extensions to our framework would be required in order to infer transitions that are potentially enabled or blocked based on guard conditions and values in the given case instance.

5 Abstraction Operators

Once the abstraction constraints have been set, the concrete process model m can be transformed into a customized process view m̂. In our framework, this amounts to constructing an abstraction function α and its induced Rα such that all abstraction constraints are satisfied when applying α to m. We employ generic search techniques to compose α from individual model transformation operators selected from a library of abstraction operators.

Table 1. Representation of Use Cases in [15]

Preserving Relevant Tasks (Use cases 1–4)
  Q1: Retain a task if property [A] satisfies [P]
  C1[A, P] = ∀x ∈ Nt : [P](x.[A]) → (x, x) ∈ Rα
Tracing a Task (Use case 11)
  Q2: Retain a task if it is reachable from the node [x]
  C2[x] = ∀x′ ∈ N : (x, x′) ∈ F* → (x′, x′) ∈ Rα
Preserving Relevant Process Instances (Use cases 5–7)
  Q1 and Q2, based on the pre-processed model
Adapt Process Model for an External Partner (Use case 9)
  Q3: Retain selected tasks in set T: ∀x ∈ T : (x, x) ∈ Rα
  Q3′: Aggregate insignificant tasks: ∀x ∈ N : sign(x)
Trace Data Dependencies (Use case 10)
  Q4: Retain a task if it uses data property [P]
  C4[P] = ∀x ∈ Nt ∀p ∈ [P] : HasProperty(x, p) → (x, x) ∈ Rα ∧ (x.p, x.p) ∈ Rα
Get Process Quick View Respecting a Property (Use case 13)
  Q1 and Q3′
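As a hedged illustration of Table 1's reachability template C2, the following self-contained sketch retains everything reachable from a task of interest; the task of interest x0 and the flow fragment are invented.

```
#const x0 = cancel_late.
flow(cancel_late, send_cancel_conf).       % invented fragment
flow(send_cancel_conf, cancel_invoice).

reach(X, Y) :- flow(X, Y).
reach(X, Z) :- flow(X, Y), reach(Y, Z).

% C2[x0]: the task of interest and everything reachable from it
sign(x0).
sign(Y) :- reach(x0, Y).
```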
Abstraction operators are model transformations that rewrite the concrete model's entities into their abstract counterparts. Traditionally, work on business process abstraction focuses predominantly on structural transformations, where rules specify how fragments in a model shall be transformed into abstract (smaller) fragments in the abstract model. Our work extends this approach to data properties. Similar to the constraints on the abstraction relation, which limit the information retained in the abstraction, the selection of abstraction operators is subject to constraints imposed by the configuration model that ensure abstract data values are given meaningful values consistent with the purpose of the abstraction.

Definition 5 (Abstraction Operator). An abstraction operator is a tuple (R, S, V, W) where R, S are fragments of a process model ("patterns") with common variables V, and W is a boolean expression over the propositions G (in the configuration model) and V, governing the application of the operator.

If R matches with binding σ in a model m, and W is satisfiable, a model m′ = m[Rσ ↦ Sσ] is obtained by replacing the matched part Rσ in m with the replacement fragment Sσ. The substitute S may contain data transformation functions that compute the aggregate value for properties in the abstract model. Operators include sum, min, max, and avg for numeric properties, and least upper bound and greatest lower bound operators (if defined) on the properties' value lattices. Our library of abstraction operators currently comprises:
– projection operators that eliminate tasks/flows;
– entity abstraction rules that transform labels and properties of individual tasks; these operators abstract property values according to the corresponding lattices of domain values;
– structural rewrite rules that transform the process structure and re-arrange tasks and flows;
– aggregation rules that aggregate the values of properties of multiple tasks; separate rules exist for properties of different types, and different aggregation functions may need to be used for sequence, choice, parallel, and loop constructs.

For space reasons we cannot present the entire collection in detail. Figure 4 contains examples of property-related aggregation for properties of different types (numeric, set-valued, boolean). The bottom part shows the concrete fragments and the top part the abstract counterparts; X and Y represent variables to be matched, and a, b, c represent placeholders for numeric, set-valued, and boolean properties, respectively. Figure 4a shows two tasks in a block. To aggregate the numeric properties of the two tasks, operators such as Max, Min, Avg, or Sum can be employed. Selecting an operator is entirely case-based. For example, assume a user is interested in tasks with high hand-off times; in this case, the operator Max needs to be selected to assign the maximum hand-off time to the composite task XY. Likewise, for set-valued properties, an operator such as union, aggregate meronymy, or abstract label can be selected, based on the configuration option at hand. The operators for boolean task properties include Or, And, and Xor. As an example, assume a user is interested in observing the tasks on a critical path; the operator Or can be employed, indicating whether or not the composite task lies on a critical path. Figure 4b shows an abstraction operator for two tasks in a loop.
For the numeric properties of these tasks, depending on how many times the loop is executed, the result of the aggregation operator needs to be multiplied accordingly, or widened to infinity if an upper bound is not known. Figure 4c shows an abstraction operator for sequential tasks. In this case, numeric properties are typically aggregated, set-valued properties are merged, and boolean properties are either merged or combined using logic operators to infer the property value associated with the abstract task.

Table 2 gives a list of the operators currently defined in our library; the right-hand side of the table shows example formalizations of three operators in our framework. Our formalization relies on a set G of propositions defined in the configuration model that is used to govern the application of certain abstraction operators. The elements of this set are determined by the selected configuration options and the domain model, and consist of propositions of the form Enable(o, op, p), where o is the name of an abstraction operator, op is an aggregation operation, and p is a property. Together with a hierarchy of properties (with specialization ordering ⊑), the propositions are used to control which operators and aggregation operations can be applied to which properties. For example, abstraction operator SumNumPropSeq is only applicable if none of the configuration options prohibits its application. Whereas most operators are generic and can be applied to process models from any domain, domain-specific operators can be introduced to account for specific abstractions, such as the meronymy approach presented in [14].

[Fig. 4. Structural and Property Aggregation Operators]

Table 2. Abstraction Operators

Operators and types:
  Remove Task/Flow, Remove Property (Projection); Abstract Label, Abstract Property Value (Entity); Aggregate Sequence, Aggregate Concurrent, Aggregate Choice, Aggregate Loop, Aggregate Meronymy, Simplify Gateway, Shift Gateway (Structural); Aggregate Value (Seq), Aggregate Value (Concurrent), Aggregate Value (Choice), Aggregate Value (Loop) (Aggregation)

Example logic representations:
  RemoveTask(x): ∀x ∈ Nt : ¬sign(x) → ∄x̂ ∈ N̂t : (x, x̂) ∈ Rα
  AggregateTaskSeq(x, y): x, y ∈ Nt ∧ (x, y) ∈ F → ∃x̂y ∈ N̂t : (x, x̂y) ∈ Rα ∧ (y, x̂y) ∈ Rα
  SumNumPropSeq(x, y, p): x, y ∈ Nt ∧ (x, y) ∈ F ∧ (x, x̂y) ∈ Rα ∧ (y, x̂y) ∈ Rα ∧ p ⊑ Numeric ∧ ¬Enable(SumNumPropSeq, +, p) ∉ G → x̂y.p = x.p + y.p

6 Abstraction Computation

Conceptually, our abstraction method proceeds as follows. Starting with a given concrete process model m and configuration constraints Γ, we employ a search procedure to incrementally build an abstraction. An applicable abstraction operator r is selected and applied to m, yielding a transformed model m′. If structural aggregation was performed, additional rules to determine the property values of the new task(s) are applied. Concurrently, the abstraction function and its correspondence relation are extended to account for the effects of r. This process repeats until an abstraction satisfying all constraints in Γ has been created and no further rule applications are possible. As a result, we obtain an abstraction function that transforms the given model m into a maximally abstract process model reflecting the relevant tasks and properties. If the intermediate results are recorded, this yields a hierarchy of abstractions of varying granularity.
Although not all models in this hierarchy necessarily satisfy all abstraction constraints, navigating the abstraction hierarchy could be useful to "drill down" into specific areas if needed (comparable to the approach in [12]). Incremental specification and adjustment of abstraction constraints based on initial abstract views remains a direction for future research.

If multiple operators are applicable, this approach may result in multiple possible abstractions. To steer our algorithm towards desirable abstractions, we employ a simple optimization method that aims to minimize both constraint violations and model complexity. When selecting an abstraction operator, we choose the operator that minimizes the sum viol(Γ, α, m) + size(α(m)), where viol(Γ, α, m) denotes the number of constraints in Γ that are violated by the current abstraction α when applied to m, and size(α(m)) measures the number of elements (|N| + |F|) in the abstract model α(m). In addition, we maintain a worklist of the current best k abstraction functions. Currently, k is a user-set parameter.

For example, let us revisit the process in Figure 1. Assume that only tasks involving role "Receptionist" with Duration > 3 min are required to be shown in the abstraction. Based on the given abstraction constraints, the abstraction criterion Γ can be expressed as:

∀n ∈ Nt : n.Role = Receptionist ∧ n.Duration > 3 → (n, n) ∈ Rα

Considering the criteria, tasks Use, Cancel Early and Cancel Invoice are insignificant, as, for example, Use.Role ≠ Receptionist. Aggregating the two tasks Cancel Early and Cancel Invoice does not result in a significant task either. Hence, among others, the operator Remove Task can be applied to these tasks to eliminate them from the process model. Tasks Cancel Late and Send Cancellation Confirmation are also insignificant, but unlike Cancel Early and Cancel Invoice, aggregating these two tasks results in a significant task. Hence, the operator Abstract Property Value can be applied to their role properties to lift the property value to the abstract value Staff. Now, the operator Aggregate Meronymy can be applied (based on the meronymy in Figure 3), combining Cancel Late and Send Cancellation Confirmation into Cancel. The operator SumNumPropSeq is applied to the duration properties of the two tasks to add up these properties. Since the abstract task was formed by sequential composition, Aggregate Value (Seq) must be applied twice to infer the values for the properties Role and Duration of the abstract task. At this point, no operators are applicable that satisfy the abstraction constraints: further simplification of properties and removal of tasks or flows would either yield an ill-formed process model or violate an abstraction constraint.
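The viol + size objective can be prototyped as a toy clingo optimization; the instance below and the restriction to the Remove Task operator alone are our own simplifications of the search described above.

```
task(use; cancel_early; cancel_late; send_cancel_conf).
sign(cancel_late). sign(send_cancel_conf).   % from Γ

% each task is either kept or removed ("Remove Task")
{ keep(T) } :- task(T).

% a Γ-violation: a significant task that is not kept
violated(T) :- sign(T), not keep(T).

% minimise violations first (priority 2), then model size (priority 1)
#minimize{ 1@2,T : violated(T) }.
#minimize{ 1@1,T : keep(T) }.
```

clingo's optimal answer set keeps exactly the two significant tasks, mirroring the greedy operator choice described above.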
Various process visualization techniques rely on users selecting interesting tasks and eliminating the remaining tasks from the process model [2]. Pankratius et al. [11] proposed Petri Net based reduction patterns, including place and transition elim- 72 S. Mafazi et al. ination and place and transition join, for abstraction. Liu et al. [9] cluster tasks in a process model, preserving ordering constraints, roles, execution frequencies, and process view for external partners. Since their main abstraction operation is aggregation, the clusters are aggregated into composite nodes. In both of these approaches [11, 9], the authors address the how component of the business process abstraction. Since the papers ignore the execution semantic of the process model and treat only tasks, but not the reachability criterion, as the abstraction objects, the process views related to the process instances (use cases [5-7]) cannot be captured by their techniques. Additionally, compared to our approach, their approach is not user interactive. Cardoso et al.[3] proposed reduction rules to synthesize process views respecting ordering constraints and roles. The paper concentrates on the how component of the process abstraction while only non-functional property values have been considered. Furthermore, their reduction technique is pattern based. Once a region matches one of their predefined patterns, the region is aggregated into a composite node. Hence, it is not always possible to aggregate an insignificant task, as forming a region for the task that matches the patterns, can be impossible. Bobrik et al.[1] aggregate information derived from running instances into a summary process model, including completion status, data properties, and timing information. In this paper only the how component is discussed. Also the paper does not discuss the property aggregation operations for different types of properties. Polyvyanyy et al. [13] defined abstraction criteria based on slider approach which separate significant from insignificant tasks, which are subsequently aggregated based on structural process patterns. Although the abstraction criteria can be extended to cover more abstraction scenarios, they are limited to those properties which have a quantitative measurement such as cost and execution duration. Fahland et al.[5] proposed a simplification strategy for Petri nets that is based on unfolding and subsequent transformation and folding regardless of abstraction purposes. Overall most of the process model abstraction approaches focus on only the how component, reduce a process model based on predefined patterns, consider only a limited number of properties, and are not user interactive. In contrast, we take other process abstraction components into account, we do not restrict the preservation or aggregation of a task based on its region and the corresponding patterns, we provide an aggregation solution for properties with different types. Finally using a questionnaire, different needs of a user from abstracting a process model are taken into account. In process model configuration literature, La Rosa et al. [8] introduce a questionnaire approach for system configuration. The questionnaire elicits facts about the desired process variant. Facts are associated with actions that adapt a given generic reference process to suit the users requirements. Gottschalk et al. [7] summarizes similar approaches for EPCs and YAWL, where tasks in the process are either blocked or hidden. 
In contrast, our approach does not rely on a reference process with variation points. Instead, we constrain the resulting abstraction relation and employ search techniques to compute suitable abstractions for the tasks and data entities in the process model.

8 Conclusion

We presented a configuration method for generating tailored business process abstractions that satisfy user-selected abstraction criteria. Our method is based on imposing constraints on the abstraction relation, which is computed using a generic search procedure over a library of generic and domain-specific abstraction operators. Elicitation of the relevant abstraction constraints is simplified by a questionnaire-based approach that hides much of the formal underpinnings of our method. Our abstraction approach goes beyond simple structural transformation and also considers data properties and flow aspects within the process model. In this paper we focused on the conceptual elaboration of our method. Immediate future work will focus on empirical evaluation of the approach on large business processes, and on incorporating preference orderings into our search and operator selection algorithms. Other avenues for research are the incremental elicitation of abstraction constraints in the context of incremental process exploration, and the integration of process instance-based properties and further reachability-based criteria.

9 Acknowledgement

We would like to acknowledge that this research was supported by the Australian Research Council (ARC) under grant DP0988961.

References

1. Bobrik, R., Reichert, M., Bauer, T.: Parameterizable views for process visualization. Tech. rep., Centre for Telematics and Information Technology, University of Twente (2007)
2. Bobrik, R., Reichert, M., Bauer, T.: View-based process visualization. In: Proc. BPM. pp. 88–95. Springer (2007)
3. Cardoso, J., Sheth, A., Miller, J., Arnold, J., Kochut, K.: Quality of service for workflows and web service processes. Web Semantics: Science, Services and Agents on the World Wide Web 1(3), 281–308 (2004)
4. Ehrig, M., Koschmider, A., Oberweis, A.: Measuring similarity between semantic business process models. In: Proc. APCCM. pp. 71–80. Australian Computer Society (2007)
5. Fahland, D., van der Aalst, W.: Simplifying mined process models: An approach based on unfoldings. In: Rinderle-Ma, S., Toumani, F., Wolf, K. (eds.) Business Process Management, LNCS, vol. 6896, pp. 362–378. Springer (2011)
6. Gottschalk, F., La Rosa, M.: Process configuration in YAWL. In: ter Hofstede, A.H.M., van der Aalst, W.M.P., Adams, M., Russell, N. (eds.) Modern Business Process Automation, pp. 313–382. Springer (2010)
7. Gottschalk, F., Wagemakers, T., Jansen-Vullers, M., van der Aalst, W., La Rosa, M.: Configurable process models: Experiences from a municipality case study. In: Advanced Information Systems Engineering, LNCS, vol. 5565, pp. 486–500. Springer (2009)
8. La Rosa, M., van der Aalst, W., Dumas, M., ter Hofstede, A.: Questionnaire-based variability modeling for system configuration. Software and Systems Modeling 8, 251–274 (2009)
9. Liu, D.R., Shen, M.: Workflow modeling for virtual processes: an order-preserving process-view approach. Information Systems 28, 505–532 (2003)
10. Mayer, W., Killisperger, P., Stumptner, M., Grossmann, G.: A declarative framework for work process configuration. AI EDAM 25(2), 145–165 (2011)
11. Pankratius, V., Stucky, W.: A formal foundation for workflow composition, workflow view definition, and workflow normalization based on Petri nets. In: Proc. APCCM. pp. 79–88. Australian Computer Society (2005)
12. Polyvyanyy, A., Smirnov, S., Weske, M.: Process model abstraction: A slider approach. In: Proc. EDOC. pp. 325–331 (2008)
13. Polyvyanyy, A., Smirnov, S., Weske, M.: Business process model abstraction. In: Handbook on Business Process Management 1, pp. 149–166. Springer (2010)
14. Smirnov, S., Dijkman, R., Mendling, J., Weske, M.: Meronymy-based aggregation of activities in business process models. In: Parsons, J., Saeki, M., Shoval, P., Woo, C., Wand, Y. (eds.) Conceptual Modeling – ER 2010, LNCS, vol. 6412, pp. 1–14. Springer (2010)
15. Smirnov, S., Reijers, H., Weske, M., Nugteren, T.: Business process model abstraction: a definition, catalog, and survey. Distributed and Parallel Databases 30, 63–99 (2012)

Modular Representation of a Business Process Planner

Shahab Tasharrofi, Eugenia Ternovska
Simon Fraser University, Canada
{sta44,ter}@cs.sfu.ca

Abstract. A business process planner relies on external services for particular tasks. The tasks performed by each of the providers, as well as by the planner itself, are often NP-complete, e.g. the Traveling Salesman Problem. Therefore, finding a combined solution is a computationally (as well as conceptually) complex task. Such a central planner could be used in business process management by, e.g., logistics service providers, manufacturer supply chain management, and mid-size businesses relying on external web services and cloud computing. The main challenge is a high level of uncertainty, and the fact that each module can be described in a different language. The language is determined by its suitability for the task and the expertise of the local developers. To allow for multiple languages, we approach the problem of finding combined solutions model-theoretically. We describe a knowledge representation formalism for representing such systems and then demonstrate how to use it for representing a business process planner. We prove the correctness of our representation, describe general properties of modular systems, and present ideas for how to automate finding solutions.

1 Introduction

Formulating AI tasks as model finding has recently become very promising due to the overwhelming success of SAT (propositional satisfiability) solvers and related technology such as ASP (answer set programming) and SMT (satisfiability modulo theories). In our research direction we focus on a particular kind of model finding which we call model expansion. The task of model expansion underlies all search problems where, for an instance of a problem, which we represent as a logical structure, one needs to find a certificate (solution) satisfying a certain specification. For example, given a graph, we look for its 3-colouring in a classic NP search problem. Such search problems occur broadly in applications; they include planning, scheduling, problems in formal verification (where we are looking for a path to a bug), computational biology, and so on. In addition to being quite common, the task of model expansion is generally simpler (for the same logic) than satisfiability from the computational point of view.
Indeed, for a given logic L, we have, in terms of computational complexity,

MC(L) ≤ MX(L) ≤ Satisfiability(L),

where MC(L) stands for model checking (a structure for the entire vocabulary of the formula in logic L is given), MX(L) stands for model expansion (a structure interpreting a part of the vocabulary is given) and Satisfiability(L) stands for the satisfiability task (where we are looking for any structure satisfying the formula). A comparison of the complexity of the three tasks for several logics of practical interest is given in [15]. The next step is to extend the framework to a modular setting. In [21], we started to develop a model-theoretic framework to represent search problems which consist of several modules. In this paper, we develop our ideas further through an example of a Business Process Planner (BPP). This planner generalizes a wide range of practical problems. We envision such a planner used as part of a multi-tool process management system. The task solved by the BPP is extremely complex, and doing it manually requires significant resources. The technology is now ready to automate such computationally complex tasks, and our effort is geared towards making the technology available to less specialized users. In systems like our planner, a high level of uncertainty is present. In our framework, we can model the following types of uncertainty.

– Each agent can see only the inputs and the outputs of other modules, but not their internals. The modules are viewed as black boxes by the outside world. Modules communicate with each other through common vocabulary symbols.
– Modules can be represented using languages that are not known to other modules. Such languages can even be old and no longer supported, as is common for legacy systems.
– Each module (an agent) can have multiple models (i.e., structures satisfying an axiomatization), each representing a possible plan of an individual module. This is a feature that generates uncertainty in planning.

We view each module abstractly as a set of structures satisfying the axioms of the module. The main challenge is that each module can be represented in a different language, reflecting the local problem's specifics and local expertise. Thus, the only way to formalize such a system is model-theoretic. Our goal is not only to formalize, but to eventually develop a method for finding solutions to complex modular systems like the BPP. This is a computationally complex task. Our inspiration for finding solutions to such systems comes from "combined" solvers for computationally complex tasks such as Satisfiability Modulo Theories (SMT). There, two kinds of propagation work interactively: propositional satisfiability (SAT) and theory propagation. In the case of modular systems, each module will have a so-called oracle that is similar to the solvers/propagators used in SMT. If the logic language used by a module has a clear model-theoretic semantics, such an oracle (propagator) is easy to construct, but in the most extreme cases, derivations can even be performed by a human expert. At the level of solving, oracles would interact using a common internal solver language with a clear formal semantics. We believe that a formal model-theoretic approach is the right approach to developing a general algorithm for solving modular systems such as the BPP. This is another important motivation for developing a rigorous model-theoretic framework.
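To make the black-box view concrete, the following is a minimal sketch of modules-as-oracles driving a naive generate-and-test search. It is ours, not the paper's: the names solve and oracle, and the encoding of structures as Python dicts over vocabulary symbols, are assumptions made only for illustration. Each oracle answers membership queries about candidate structures without revealing its internal axiomatization.

from itertools import product

# A structure is encoded as a dict from vocabulary symbols to values.
# Each module is a black-box oracle: a predicate over complete structures.
def solve(instance, expansion_symbols, domain, oracles):
    """Enumerate the expansions of `instance` accepted by every oracle."""
    names = list(expansion_symbols)
    for values in product(domain, repeat=len(names)):
        candidate = {**instance, **dict(zip(names, values))}
        if all(oracle(candidate) for oracle in oracles):
            yield candidate

# Toy usage: two "modules" jointly constraining expansion symbols x and y.
m1 = lambda b: b["x"] != b["in"]   # output must differ from the input
m2 = lambda b: b["x"] == b["y"]    # the two outputs must agree
print(list(solve({"in": 0}, ["x", "y"], [0, 1], [m1, m2])))
# prints [{'in': 0, 'x': 1, 'y': 1}]

A practical solver would replace the brute-force enumeration with propagation, as in SMT, but the interface to each module would stay the same: membership (and possibly propagation) queries.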
In this paper, we demonstrate how to use the ideas of model expansion and modular systems together to naturally represent modular systems such as the BPP. We prove correctness of our formalization and explain how finding solutions to such systems can be automated.

2 Business Process Planner

A business process planner is an entity which plans an overall task by relying on external services for its particular sub-tasks. Often, in business, there are cases when one needs to buy services from other service providers. The planner combines services provided by different companies to minimize the cost of the enterprise. The customer needs to allocate the required services to different service providers and to ask them for their potential plans for their share. These plans will then be used to produce the final plan, which can be a computationally complex task. The tasks performed by each of the providers are often NP-complete, e.g. the Traveling Salesman Problem. Therefore, finding a combined solution is a computationally (as well as conceptually) complex task. Such a central planner could be used in business process management in many areas, such as:

– Logistics Service Provider: operates on the global scale and uses contracted carriers, local post, fleet management, driver dispatch, warehouse services, transportation management systems, e-business services, as well as local logistics service providers with their own sub-modules.
– Manufacturer Supply Chain Management: uses a supply chain planner relying on transportation, shipping services, various providers for inventory spaces, etc. It uses services of third party logistics (3PL) providers, which themselves depend on services provided by smaller local companies.
– Mid-size Businesses Relying on External Web Services and Cloud Computing: such businesses often use data analysis services, storage, spreadsheet software (office suites), etc. The new cloud-based software paradigm satisfies the same need in the domain of software systems.

Fig. 1. Business Process Planner (BPP).

Figure 1 shows a general representation of a business process planner with three providers. Each of the solid boxes in Figure 1 represents a business entity which, while interested in participating in the process, is not necessarily willing to share the information that has affected its decisions. Therefore, any approach to representing and solving such systems that assumes unlimited access to complete axiomatizations of these entities is impractical. The business process planner in Figure 1 takes a set S of services and a set R of restrictions (such as service dependencies or deadlines) and generates a plan P. Each Provider_i takes a subset of services S_i and their restrictions R_i. Provider_i generates a potential plan P_i for the subset S_i of services and returns it to the Planner. The Planner takes all these partial plans and, if not satisfied with them, reconsiders service allocations or providers. However, if satisfied, it outputs plan P by combining the partial plans P_i.

3 Background: Model Expansion Task

In [17], the authors formalize combinatorial search problems as the task of model expansion (MX), the logical task of expanding a given (mathematical) structure with new relations. Formally, the user axiomatizes their problem in some logic L.
This axiomatization relates an instance of the problem (a finite structure, i.e., a universe together with some relations and functions) and its solutions (certain expansions of that structure with new relations or functions). Logic L corresponds to a specification/modelling language. It could be an extension of first-order logic, or an ASP language, or a modelling language from the Constraint Programming (CP) community such as ESSENCE [12]. The MX task underlies many practical approaches to declarative problem solving. Recall that a vocabulary is a set of non-logical (predicate and function) symbols. An interpretation for a vocabulary is provided by a structure, which consists of a set, called the domain or universe and denoted by dom(.), together with a collection of relations and (total) functions over the universe. A structure can be viewed as an assignment to the elements of the vocabulary. An expansion of a structure A is a structure B with the same universe which has all the relations and functions of A, plus some additional relations or functions. The task of model expansion for an arbitrary logic L (abbreviated L-MX) is:

Model Expansion for logic L
Given: (1) An L-formula φ with vocabulary σ ∪ ε and (2) a structure A for σ.
Find: an expansion of A, to σ ∪ ε, that satisfies φ.

We call σ, the vocabulary of A, the instance vocabulary, and ε := vocab(φ) \ σ the expansion vocabulary. (By ":=" we mean "is by definition" or "denotes".)

Example 1. The following formula φ in the language of logic programming under answer set semantics constitutes an MX specification for Graph 3-colouring.

1{R(x), B(x), G(x)}1 ← V(x).
⊥ ← E(x, y), R(x), R(y).
⊥ ← E(x, y), G(x), G(y).
⊥ ← E(x, y), B(x), B(y).

An instance is a structure for vocabulary σ = {E}, i.e., a graph A = G = (V; E). The task is to find an interpretation for the symbols of the expansion vocabulary ε = {R, B, G} such that the expansion of A with these is a model of φ:

(V; E^A, R^B, B^B, G^B) |= φ,

where (V; E^A) is the instance structure A and the whole expanded structure is B. The interpretations of ε, for structures B that satisfy φ, are exactly the proper 3-colourings of G. Given a specification, we can talk about a set (class) of σ ∪ ε-structures which satisfy the specification. Alternatively, we can simply talk about a set (class) of σ ∪ ε-structures as an MX task, without mentioning a particular specification that the structures satisfy.

Example 2 (BPP as Model Expansion). In Figure 1, both the planner box and the provider boxes can be viewed as model expansion tasks. For example, the box labeled Provider_1 can be abstractly viewed as an MX task with instance vocabulary σ = {S_1, R_1} and expansion vocabulary ε = {P_1}. The task is: given some services S_1 and some restrictions R_1, find a plan P_1 to deliver the services in S_1 such that all restrictions in R_1 are satisfied. Moreover, in Figure 1, the bigger box with dashed borders can also be viewed as an MX task, with instance vocabulary σ′ = {S, R} and expansion vocabulary ε′ = {P}. This task is a compound MX task whose result depends on the internal work of all the providers and the planner.

4 Modular Systems

This section presents the main concepts of modular systems.

Definition 1 (Primitive Module). A primitive module M is a set (class) of σ_M ∪ ε_M-structures, where σ_M is the instance vocabulary and ε_M is the expansion vocabulary.

Each module can be axiomatized in a different logic. However, we can abstract away from the logics and study modular systems entirely model-theoretically.
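As an illustration of Definition 1, the following toy sketch (ours, not the paper's; the explicit enumeration over a fixed small universe is an assumption made for illustration only) materializes the 3-colouring module of Example 1 as a literal set of σ ∪ ε-structures, each interpreting the instance symbol E and the expansion symbols R, B, G.

from itertools import product

def three_colouring_module(vertices, edges):
    """All sigma-union-epsilon structures (E, R, B, G) over a fixed
    universe that satisfy the rules of Example 1."""
    module = []
    for colours in product("RBG", repeat=len(vertices)):
        colour = dict(zip(vertices, colours))
        # The constraint rules forbid monochromatic edges.
        if all(colour[u] != colour[v] for (u, v) in edges):
            module.append({
                "E": set(edges),
                "R": {v for v in vertices if colour[v] == "R"},
                "B": {v for v in vertices if colour[v] == "B"},
                "G": {v for v in vertices if colour[v] == "G"},
            })
    return module

# A triangle has exactly 3! = 6 proper 3-colourings.
print(len(three_colouring_module([1, 2, 3], [(1, 2), (2, 3), (1, 3)])))

Solving the MX task for a given graph then amounts to selecting from this set the structures whose E-part equals the instance, which matches the view of a module as nothing more than a set of structures.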
A modular system is formally described as a set of primitive modules (individual sets of structures) combined using the following operations:

1. Projection π_τ(M), to restrict a module's vocabulary,
2. Composition M1 ⊲ M2, to connect the outputs of M1 to M2,
3. Intersection M1 ∩ M2,
4. Union M1 ∪ M2,
5. Feedback M[R = S], which connects output S of M to its input R.

Formal definitions of these operations were introduced in [21] and are given below. The initial development of our algebraic approach was inspired by [14]. In contrast to that work, our contribution was to use a model-theoretic setting, simplify the framework, and add a loop operator which increases the expressive power significantly, by one level in the polynomial-time hierarchy. Here, we only consider modular systems that do not use the union operator.

Operations for Combining Modules

Definition 2 (Composable, Independent [14]). Modules M1 and M2 are composable if ε_M1 ∩ ε_M2 = ∅ (no output interference). Module M1 is independent from M2 if σ_M1 ∩ ε_M2 = ∅ (no cyclic module dependencies).

Definition 3 (Modular Systems). Modular systems are built inductively from primitive modules using the projection, composition, union and feedback operators:

Base Case: A primitive module is a modular system.
Projection: For a modular system M and τ ⊆ σ_M ∪ ε_M, the modular system π_τ(M) is defined such that (a) σ_{π_τ(M)} = σ_M ∩ τ, (b) ε_{π_τ(M)} = ε_M ∩ τ, and (c) B ∈ π_τ(M) iff there is a structure B′ ∈ M with B′|τ = B.
Composition: For composable modular systems M and M′ (no output interference) with M independent from M′ (no cyclic module dependencies), M ⊲ M′ is a modular system such that (a) σ_{M ⊲ M′} = σ_M ∪ (σ_{M′} \ ε_M), (b) ε_{M ⊲ M′} = ε_M ∪ ε_{M′}, and (c) B ∈ (M ⊲ M′) iff B|vocab(M) ∈ M and B|vocab(M′) ∈ M′.
Union: For modular systems M1 and M2 with σ_M1 ∩ σ_M2 = σ_M1 ∩ ε_M2 = ε_M1 ∩ σ_M2 = ∅, the expression M1 ∪ M2 defines a modular system such that (a) σ_{M1 ∪ M2} = σ_M1 ∪ σ_M2, (b) ε_{M1 ∪ M2} = ε_M1 ∪ ε_M2, and (c) B ∈ (M1 ∪ M2) iff B|vocab(M1) ∈ M1 or B|vocab(M2) ∈ M2.
Feedback: For a modular system M, and R ∈ σ_M and S ∈ ε_M two symbols of similar type (i.e., either both function symbols or both predicate symbols) and of the same arity, the expression M[R = S] is a modular system such that (a) σ_{M[R=S]} = σ_M \ {R}, (b) ε_{M[R=S]} = ε_M ∪ {R}, and (c) B ∈ M[R = S] iff B ∈ M and R^B = S^B.

Further operators for combining modules can be defined as combinations of the basic operators above. For instance, [14] introduced M1 ◮ M2 (composition with projection) as π_{σ_M1 ∪ ε_M2}(M1 ⊲ M2). Also, M1 ∩ M2 is defined to be equivalent to M1 ⊲ M2 (or M2 ⊲ M1) when σ_M1 ∩ ε_M2 = σ_M2 ∩ ε_M1 = ε_M1 ∩ ε_M2 = ∅.

Definition 4 (Models/Solutions of Modular Systems). For a modular system M, a (σ_M ∪ ε_M)-structure B is a model of M if B ∈ M. Since each modular system is a set of structures, we call the structures in a modular system the models of that system.
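The following is a minimal executable sketch of the operations in Definition 3 for finite modules. It is our own encoding, made for illustration: modules are lists of structures, structures are dicts, and the function names project, compose and feedback are assumptions, not notation from [21].

def project(module, tau):
    """pi_tau(M): restrict every structure in M to the symbols in tau."""
    out = []
    for b in module:
        r = {s: v for s, v in b.items() if s in tau}
        if r not in out:
            out.append(r)
    return out

def compose(m1, m2):
    """M1 |> M2: join pairs of structures that agree on shared symbols."""
    return [{**b1, **b2}
            for b1 in m1 for b2 in m2
            if all(b1[s] == b2[s] for s in set(b1) & set(b2))]

def feedback(module, r, s):
    """M[R = S]: keep the structures in which R and S coincide."""
    return [b for b in module if b[r] == b[s]]

Real modules are not enumerable like this; they are black boxes queried through oracles. Still, the toy version makes the algebra concrete on small inputs, including the construction used in the next example.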
Example 3 (Stable Model Semantics). Let P be a normal logic program. We know that S is a stable model of P iff S = Dcl(P^S), where P^S is the reduct of P under the set S of atoms (a positive program) and Dcl computes the deductive closure of a positive program, i.e., the smallest set of atoms satisfying it. Now, let M1(S, P, Q) be the module that, given a set of atoms S and an ASP program P, computes the reduct Q of P under S. Also, let M2(Q, S′) be a module that, given a positive logic program Q, returns the smallest set of atoms S′ satisfying Q. Now define M as follows: M := π_{P,S}((M1 ⊲ M2)[S = S′]). Then, M represents a module which takes a ground ASP program P and returns all and only its stable models. Figure 2 shows the corresponding diagram of M.

Fig. 2. Modular Representation of an ASP Solver.

On a model-theoretic level, this module represents all possible ASP programs and all their solutions, where programs are encoded by structures. While such a module is certainly possible, a more practical use would be one where a module corresponds to a particular ASP program, such as the one for graph 3-colouring in Example 1. Nevertheless, Example 3 is useful because it represents a well-known construction and illustrates several concepts associated with modular systems.

Example 4 (BPP as a Modular System). Figure 1 can be viewed as a modular representation of the business process planner. There, each primitive module is represented by a box with solid borders, and our module of interest is the compound module shown by the box with dashed borders. This module is specified by the following formula:

BPP := π_{S,R,P}(Planner ⊲ ((Provider_1 ∩ Provider_2 ∩ Provider_3)[P′_1 = P_1][P′_2 = P_2][P′_3 = P_3])).   (1)

As in Figure 1, the only vocabulary symbols which are important outside the big box with dashed borders are S, R and P. There are also three feedbacks: from P_1 to P′_1, from P_2 to P′_2, and from P_3 to P′_3.

5 Details of the Business Process Planner

In this section we give a detailed description of one of the many kinds of business process planners, namely a logistics service provider operating on the global scale which hires local carriers and warehouses. So, in Figure 1, Planner refers to the global entity and Provider refers to the local entities. The logistics provider needs a plan to execute the services so that all restrictions are met. Some sample restrictions are: (1) latest delivery time (e.g., Halloween masks should be in stores before Halloween), (2) type of carrying vehicles (perishable products need refrigerator trucks), and (3) level of care needed (glass-works should be carried carefully). We say that a plan P is good for a set of services S and restrictions R, written Good(P, S, R), if P does all services in S and satisfies all restrictions in R. For simplicity, here, we only consider time restrictions, i.e., the value of t(i) is the (latest) delivery time for item i. There are also functions s(.) and d(.) to indicate the source and the destination of an item. For an item i, a plan is a sequence of cities ⟨c_0, . . . , c_n⟩ along with its pickup times pt(i, j) and arrival times at(i, j). So, we have that:

∀i ∈ Items (P(i) = ⟨c_0, . . . , c_n⟩ ⊃ c_0 = s(i) ∧ c_n = d(i)),
∀i ∈ Items (P(i) = ⟨c_0, . . . , c_n⟩ ⊃ at(i, n) ≤ t(i)),
∀i ∈ Items (P(i) = ⟨c_0, . . . , c_n⟩ ⊃ ∀j ∈ [1, n] (connected(c_{j−1}, c_j))),
∀i ∈ Items (P(i) = ⟨c_0, . . . , c_n⟩ ⊃ ∀j ∈ [0, n] (pt(i, j) ≥ at(i, j))),
∀i ∈ Items (P(i) = ⟨c_0, . . . , c_n⟩ ⊃ ∀j ∈ [1, n] (at(i, j) = pt(i, j − 1) + time(c_{j−1}, c_j))).

Intuitively, these axioms tell us that the plan for each item should: (1) start at the source and end at the destination, (2) arrive at the destination no later than its latest delivery time, (3) pass through cities which are connected to each other, (4) respect time constraints, i.e., items are picked up at a city only after they have arrived at that city, and (5) respect the travel times between cities.
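As a sanity check of these axioms, the following sketch verifies conditions (1)-(5) directly. The encoding is hypothetical and ours alone: a plan maps each item to a route (list of cities) together with per-stop pickup and arrival time lists.

def is_good_plan(plan, items, s, d, t, connected, travel):
    """Check axioms (1)-(5) above for every item.
    plan[i] = (route, pt, at); route is [c_0, ..., c_n]."""
    for i in items:
        route, pt, at = plan[i]
        n = len(route) - 1
        if route[0] != s[i] or route[n] != d[i]:   # (1) endpoints
            return False
        if at[n] > t[i]:                           # (2) deadline
            return False
        for j in range(1, n + 1):
            if (route[j - 1], route[j]) not in connected:   # (3) connectivity
                return False
            if at[j] != pt[j - 1] + travel[(route[j - 1], route[j])]:  # (5) travel times
                return False
        if any(pt[j] < at[j] for j in range(n + 1)):        # (4) pickup after arrival
            return False
    return True

For instance, a single item routed ⟨A, B⟩ over a one-hour leg, with arrival times [0, 1], pickup times [0, 1] and deadline 2, passes all five checks.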
Certainly, a good plan needs to satisfy all these conditions, but, of course, this does not give us a full axiomatization of the problem. Here, we do not even intend to give one, because we believe that the above is enough to give the reader a good idea of what such full axiomatizations look like. (We slightly abuse logical notation here to keep the axiomatization simple. For example, we use the notation P(i) = ⟨c_0, . . . , c_n⟩ to denote that item i takes a path starting at city c_0, then going to city c_1, and so on, until it gets to city c_n. In practice, such a specification can be realized using two expansion functions, len(.) (the length of the path of an item) and loc(., .) (its location). As an example, this is how the first axiom above is rewritten in terms of len and loc: ∀i ∈ Items (loc(i, 0) = s(i) ∧ loc(i, len(i)) = d(i)).) Given the definition of a good plan, one can define the intended solutions of a business process planner as below:

Definition 5 (Intended Solutions). Let BPP be a business process planner with access to n providers. Structure B is an intended solution of BPP if:
1. P^B is good for S^B and R^B, i.e., B |= Good(P, S, R),
2. All atomic actions A of P^B (here, moving items between different cities) are doable by one of the n providers.

So, by Definition 5, if some set of services cannot be executed under some restrictions, there should not exist any solution for the whole modular system which interprets S by those services and R by those restrictions. Now, to ensure that the intended solutions of the modular system in Figure 1 coincide with the models of this modular system under our modular semantics, we use the declarative representations below for the modules:

Module "Planner" is the set of structures over vocabulary σ = {R, S, P_1, . . . , P_n} and ε = {P, S_1, . . . , S_n, R_1, . . . , R_n} which satisfies:

Good(P, S, R) ⇔ ⋀_{i ∈ {1,...,n}} Good(P_i, S_i, R_i),   (2)
P is a join of the sub-plans P_i (for i ∈ {1, . . . , n}).   (3)

This module is easily specifiable in extended FO. Module "Provider_i" is the set of structures over vocabulary σ = {R_i, S_i} and ε = {P_i} which satisfy Good(P_i, S_i, R_i). Each such module Provider_i can be specified using mixed integer linear programming. Also, in practice, many such modules are realized using special-purpose programs (so, no standard language). Our framework enables us to deal with such programs in a unified way.

Proposition 1 (Correctness). Structure B is in the modular system BPP := π_{S,R,P}(Planner ⊲ ((Provider_1 ∩ · · · ∩ Provider_n)[P′_1 = P_1] · · · [P′_n = P_n])) (where Planner and the Provider_i are defined as above) iff B is an intended solution of BPP (according to Definition 5).

Proof. (1) Take B which satisfies all modules. Each P_i^B has to be good for S_i^B and R_i^B. Therefore, P^B is good for S^B and R^B. Thus, B is an intended solution of BPP. (2) Conversely, take an intended solution B. Then P^B is good for S^B and R^B. So, let B′ be an expansion of B such that P_i^{B′} is the part of P^B which is executed by the i-th provider. Also, S_i^{B′} is the set of services that P_i^{B′} executes, and R_i^{B′} is the set of restrictions satisfied by P_i^{B′}; e.g., the latest delivery time of item a is the delivery time of a according to P_i^{B′}. Now, each P_i^{B′} is good for S_i^{B′} and R_i^{B′}. So, B ∈ BPP.
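Putting the pieces together, here is a minimal sketch of the Planner's control loop from Figure 1. It is ours, not the paper's algorithm: the round-robin allocation, the fixed retry budget, and the dictionary join are all assumptions made for illustration.

def run_planner(services, restrictions, providers, attempts=3):
    """Allocate (S_i, R_i) to providers, collect partial plans P_i,
    and join them into P; re-allocate and retry on failure."""
    for shift in range(attempts):
        # Hypothetical allocation policy: rotate services over providers.
        allocation = [[] for _ in providers]
        for k, srv in enumerate(services):
            allocation[(k + shift) % len(providers)].append(srv)
        partial = [prov(alloc, restrictions)            # P_i, or None
                   for prov, alloc in zip(providers, allocation)]
        if all(p is not None for p in partial):         # Planner is satisfied
            return {k: v for p in partial for k, v in p.items()}  # join the P_i
    return None  # no combined plan within the attempt budget

# Toy provider: can plan any single service, refuses larger bundles.
tiny = lambda srvs, r: {x: ("done", r) for x in srvs} if len(srvs) <= 1 else None
print(run_planner(["ship", "store"], "by-friday", [tiny, tiny]))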
6 The Bigger Picture

Complexity of the modular framework. In this subsection, we summarize one of our important results about the modular framework from [21]. In order to do so, we first have to introduce the concepts of totality, determinacy, monotonicity, anti-monotonicity, etc. For lack of space, we do this through examples; the exact definitions can be found in [21].

Example 5 (Reachability). Consider the following model expansion task with σ = {S, E, B} and ε = {R}:

R(v) ← S(v).
R(v) ← R(u), E(u, v), not B(u).   (4)

where S represents a set of source vertices of a graph, E represents the edges of the graph, B represents a set of blocked vertices of the graph, and R represents the set of vertices which can be reached from a source vertex without passing through any blocked vertices.

Throughout this section, let M_R denote a primitive module which represents the MX task of Example 5. Obviously, σ_{M_R} = {S, E, B} and ε_{M_R} = {R}. Then, we have:

Totality: Module M_R is {S, E, B}-{R}-total because for every interpretation of S, E and B, there is an interpretation for R which is a stable model of program (4).
Determinacy: Module M_R is {S, E, B}-{R}-deterministic because for every interpretation of S, E and B, there is at most one interpretation for R which satisfies (4).
Monotonicity: Module M_R is {E}-{S, B}-{R}-monotone because if we fix the interpretations of the symbols S and B and increase the set of edges E, then the interpretation of R (the reachable vertices) increases.
Anti-monotonicity: Module M_R is {B}-{S, E}-{R}-anti-monotone because if we fix the interpretations of S and E and increase the set of blocked vertices B, then the set R of reachable vertices decreases.
Polytime Checkability/Solvability: Module M_R is both polytime checkable (because one can check in polynomial time whether a structure B belongs to M_R) and polytime solvable (because, given interpretations of S, E and B, one can compute the only valid interpretation for R in polynomial time). However, the module M_C which corresponds to graph 3-colouring (Example 1) is polytime checkable but not polytime solvable (unless P = NP).

Now, we are ready to restate our main theorem from [21]. We should, however, point out one difference to readers who are not accustomed to the logical approach to complexity: in theoretical computer science, a problem is a subset of {0, 1}*; in descriptive complexity, the equivalent definition of a problem as a set of structures is adopted. The following theorem gives a capturing result for the complexity class NP:

Theorem 1 (Capturing NP over Finite Structures). Let K be a problem over the class of finite structures closed under isomorphism. Then, the following are equivalent:
1. K is in NP,
2. K is the set of models of a modular system where all primitive modules M are σ_M-ε_M-deterministic, σ_M-total, σ_M-vocab(K)-ε_M-anti-monotone, and polytime solvable,
3. K is the set of models of a modular system with polytime checkable primitive modules.

Note that Theorem 1 shows that when basic modules are restricted to polytime checkable modules, the modular system's expressive power is limited to NP. Without this restriction, the modular framework can represent Turing-complete problems. As an example, one can encode Turing machines as finite structures and have modules that accept a finite structure iff it corresponds to a halting Turing machine. Theorem 1 also shows that the feedback operator causes a jump in expressive power from P to NP (or, more generally, from Δ^P_k to Σ^P_{k+1}).
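To make the polytime solvability of M_R concrete, here is a small sketch (ours; the function name and the set-based encoding are assumptions) that computes the unique interpretation of R as the least fixpoint of program (4):

def reachable(sources, edges, blocked):
    """Least fixpoint of: R(v) <- S(v).  R(v) <- R(u), E(u,v), not B(u)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    R = set(sources)
    frontier = [u for u in R if u not in blocked]  # only non-blocked vertices propagate
    while frontier:
        u = frontier.pop()
        for v in succ.get(u, []):
            if v not in R:
                R.add(v)
                if v not in blocked:
                    frontier.append(v)
    return R

E = [(1, 2), (2, 3), (3, 4)]
print(reachable({1}, E, set()))  # {1, 2, 3, 4}
print(reachable({1}, E, {2}))    # {1, 2}: vertex 2 is reached but does not propagate

The second call also illustrates anti-monotonicity in B: enlarging the blocked set can only shrink R.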
Example 6 (Stable Model Semantics). In Example 3, note first that the primitive module M1 is {S}-total, {S}-{P}-{Q}-anti-monotone, and also polytime solvable. Second, module M2 is {Q}-total, {Q}-{}-{S′}-monotone and, again, polytime solvable. However, the module M := π_{P,S}((M1 ⊲ M2)[S = S′]) is neither total, nor monotone, nor anti-monotone. Moreover, M represents the NP-complete problem of finding a stable model of a normal logic program. This shows how, in the modular framework, one can describe a complex modular system in terms of very simple primitive modules.

Solving modular systems. We would like to find a method for solving complex tasks such as the application in this paper, without being limited to the particular structure of Figure 1, and without committing to a particular language. The language is determined by its suitability for the task and the expertise of the local developers. For example, the planner module is more easily specified as a SAT (propositional satisfiability) problem, while some provider modules are most easily specified using MILP (mixed integer linear programming) or global constraints in CP (constraint programming). A module performing scheduling with exceptions is more easily specified with ASP (answer set programming). In our research, we focus on the central aspect of this challenging task, namely on solving the underlying computationally complex task, for arbitrary modular systems and arbitrary languages suitable for specifying combinatorially hard search/optimization problems. Our approach is model-theoretic: we aim at finding structures satisfying the multi-language constraints of the modular system, where the system is viewed as a function of the individual modules. Our main goal is to develop and implement an algorithm that takes a modular system as its input and generates its solutions. Such a prototype system should treat each primitive module as a black box (i.e., it should not assume access to a complete axiomatization of the module). Not assuming complete knowledge is essential in solving problems like business process planning. We take our inspiration from how "combined" solvers are constructed in the general field of declarative problem solving. The field consists of many areas, such as MILP, CP, ASP and SAT, and each of these areas has many solvers, including powerful "combined" solvers such as SMT and ASP-CP solvers. Several methods are used in the different communities, e.g., the cutting plane techniques of ILP and the formal interaction between SAT and theory solvers in SMT. We made the fundamental observation [22] that, while different on the surface, these techniques are similar when looked at model-theoretically. We proposed that those general principles can be used to develop a new method of solving modular systems such as the one in the example above.

7 Related Work

In [21], we continued the line of research initiated in [14]. We introduced MX-based modular systems and extended the previous work in several ways, such as adding the feedback (loop) operator, thus drastically increasing the expressive power. The current paper shows one of the important real-world applications of systems with loops. In our modelling of the business process planner, we use the language independence of modular systems in an essential way. This is an essential property because, in practice, providers use domain-specific software which may not belong to a well-studied logic.
This property separates the modular framework of [21] from many other languages which support modularity, such as modular logic programs [7, 18, 13], and from frameworks with multiple languages [19, 10]. An early work on adding modularity to logic programs is [7]. There, the authors derive a semantics for modular logic programs by viewing a logic program as a generalized quantifier. This work is continued in [18], which introduces modular equivalence for normal logic programs under the stable model semantics. That work, in turn, is extended to define modularity for disjunctive programs in [13]. The last two papers focus on introducing modular programming in logic programs and on dealing with the difficulties that arise there. Applications such as business process planning need an abstract notion of a module, independent from the languages used. Our MX-based modular framework is well-suited for this purpose. That cannot be said about many other approaches to adding modularity to ASP languages and FO(ID) (such as those described in [2, 1, 6]), because they address different goals. Modular programming enables ASP languages to be extended by constraints or other external relations. This view is explored in [8, 9, 20, 3, 16]. While this view is advantageous in its own right, we needed an approach that is completely model-theoretic. Also, some practical modelling languages incorporate other modelling languages. For example, X-ASP [19] and ASP-PROLOG [10] extend Prolog with ASP. Also, ESRA [11], ESSENCE [12] and Zinc [5] are CP languages extended with features from other languages. Such practical modelling languages are further proof that combining different languages is extremely important for practitioners. We take this view to its extreme by looking at modules as nothing but sets of structures and, thus, having no dependency on the language they are described in. The existing practical languages, with their support for specific languages, could not have been applied to our task. Yet another approach to modularity is multi-context systems. In [4], the authors introduced non-monotonic bridge rules for contextual reasoning and originated an interesting and active line of research, followed by many others, on solving or explaining inconsistencies in non-monotonic multi-context systems. However, we believe that our application cannot be naturally described as a multi-context system, because it is impractical to define the concepts of a logic, a knowledge base and an acceptability relation (concepts that are essential in defining multi-context systems) for a domain-specific application which might not use any known logical fragment.

8 Conclusion and Future Work

In this paper, we introduced an important range of real-world applications, i.e., business process planning. We discussed several examples of where this general scheme is used. Then we represented this problem as a model expansion task in the modular setting introduced in [21]. We gave a detailed description of the modules involved in describing business process planning in the modular framework and proved the correctness of our representation. Our main challenge is to devise an appropriate mathematical abstraction of "combined" solving. Remaining particular tasks include:

Algorithm Design and Implementation: We will design and implement an algorithm that, given a modular system, computes the models of that modular system iteratively, and then extracts the solutions.
Reduction in Search Space: We will improve our algorithm by using the approximation methods proposed in [21]. These methods correspond to least fixpoint and well-founded model computations (but in the modular setting). We will extend our algorithm so that it prunes the search space by propagating information from the approximation process to the solver.

References

1. M. Balduccini. Modules and signature declarations for A-Prolog: Progress report. In Workshop on Software Engineering for Answer Set Programming (SEA 2007), pages 41–55, 2007.
2. Chitta Baral, Juraj Dzifcak, and Hiro Takahashi. Macros, macro calls and use of ensembles in modular answer set programming. In Sandro Etalle and Miroslaw Truszczynski, editors, Logic Programming, volume 4079 of Lecture Notes in Computer Science, pages 376–390. Springer, 2006.
3. S. Baselice, P. Bonatti, and M. Gelfond. Towards an integration of answer set and constraint solving. In Maurizio Gabbrielli and Gopal Gupta, editors, Logic Programming, volume 3668 of Lecture Notes in Computer Science, pages 52–66. Springer, 2005.
4. Gerhard Brewka and Thomas Eiter. Equilibria in heterogeneous nonmonotonic multi-context systems. In Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 1, pages 385–390. AAAI Press, 2007.
5. Maria de la Banda, Kim Marriott, Reza Rafeh, and Mark Wallace. The modelling language Zinc. In Frédéric Benhamou, editor, Principles and Practice of Constraint Programming - CP 2006, volume 4204 of Lecture Notes in Computer Science, pages 700–705. Springer, 2006.
6. M. Denecker and E. Ternovska. A logic of non-monotone inductive definitions. Transactions on Computational Logic, 9(2):1–51, 2008.
7. Thomas Eiter, Georg Gottlob, and Helmut Veith. Modular logic programming and generalized quantifiers. In Jürgen Dix, Ulrich Furbach, and Anil Nerode, editors, Logic Programming and Nonmonotonic Reasoning, volume 1265 of Lecture Notes in Computer Science, pages 289–308. Springer, 1997.
8. Thomas Eiter, Giovambattista Ianni, Roman Schindlauer, and Hans Tompits. A uniform integration of higher-order reasoning and external evaluations in answer-set programming. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 90–96. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
9. Islam Elkabani, Enrico Pontelli, and Tran Son. Smodels-A: a system for computing answer sets of logic programs with aggregates. In Chitta Baral, Gianluigi Greco, Nicola Leone, and Giorgio Terracina, editors, Logic Programming and Nonmonotonic Reasoning, volume 3662 of Lecture Notes in Computer Science, pages 427–431. Springer, 2005.
10. O. Elkhatib, E. Pontelli, and T.C. Son. ASP-PROLOG: A system for reasoning about answer set programs in Prolog. In Proc. of Practical Aspects of Declarative Languages, 6th International Symposium (PADL 2004), volume 3057, pages 148–162, Dallas, TX, USA, 2004.
11. Pierre Flener, Justin Pearson, and Magnus Ågren. Introducing ESRA, a relational language for modelling combinatorial problems. In Maurice Bruynooghe, editor, Logic Based Program Synthesis and Transformation, volume 3018 of Lecture Notes in Computer Science, pages 214–232. Springer, 2004.
12. Alan M. Frisch, Warwick Harvey, Chris Jefferson, Bernadette Martínez-Hernández, and Ian Miguel.
Essence: A constraint language for specifying combinatorial problems. Constraints, 13:268–306, September 2008.
13. Tomi Janhunen, Emilia Oikarinen, Hans Tompits, and Stefan Woltran. Modularity aspects of disjunctive stable models. Journal of Artificial Intelligence Research, 35:813–857, August 2009.
14. Matti Järvisalo, Emilia Oikarinen, Tomi Janhunen, and Ilkka Niemelä. A module-based framework for multi-language constraint modeling. In Esra Erdem, Fangzhen Lin, and Torsten Schaub, editors, Logic Programming and Nonmonotonic Reasoning, volume 5753 of Lecture Notes in Computer Science, pages 155–168. Springer, 2009.
15. Antonina Kolokolova, Yongmei Liu, David Mitchell, and Eugenia Ternovska. On the complexity of model expansion. In Proceedings of the 17th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR'10), pages 447–458. Springer, 2010.
16. Veena Mellarkod, Michael Gelfond, and Yuanlin Zhang. Integrating answer set programming and constraint logic programming. Annals of Mathematics and Artificial Intelligence, 53:251–287, 2008.
17. David G. Mitchell and Eugenia Ternovska. A framework for representing and solving NP search problems. In Proceedings of the 20th National Conference on Artificial Intelligence - Volume 1, pages 430–435. AAAI Press, 2005.
18. Emilia Oikarinen and Tomi Janhunen. Modular equivalence for normal logic programs. In Proceedings of the 17th European Conference on Artificial Intelligence (ECAI 2006), Riva del Garda, Italy, pages 412–416. IOS Press, 2006.
19. T. Swift and D. S. Warren. The XSB System, 2009.
20. L. Tari, C. Baral, and S. Anwar. A language for modular answer set programming: Application to ACC tournament scheduling. In Proc. of Answer Set Programming: Advances in Theory and Implementation, CEUR-WS, pages 277–292, 2005.
21. S. Tasharrofi and E. Ternovska. A semantic account for modularity in multi-language modelling of search problems. In FroCoS 2011.
22. S. Tasharrofi, X. Wu, and E. Ternovska. Solving modular model expansion tasks. In WLP/INAP 2011.

Author Index

B
Bulanov, Pavel 6
C
Calvanese, Diego 21
D
De Giacomo, Giuseppe 21
Di Ciccio, Claudio 33
Dumas, Marlon 1
G
Giordano, Laura 48
Grossmann, Georg 60
K
Kaldeli, Eirini 6
L
Lazovik, Alexander 6
Lembo, Domenico 21
Lespérance, Yves 5
M
Mafazi, Shamila 60
Marrella, Andrea 33
Martelli, Alberto 48
Mayer, Wolfgang 60
Montali, Marco 21
R
Russo, Alessandro 33
S
Santoso, Ario 21
Spiotta, Matteo 48
Stumptner, Markus 60
T
Tasharrofi, Shahab 75
Ternovska, Eugenia 75
Theseider Dupré, Daniele 48
V
van Beest, Nick 6
W
Wortmann, Hans 6