-
Efficient Online Computation of Business Process State From Trace Prefixes via N-Gram Indexing
Authors:
David Chapela-Campa,
Marlon Dumas
Abstract:
This paper addresses the following problem: Given a process model and an event log containing trace prefixes of ongoing cases of a process, map each case to its corresponding state (i.e., marking) in the model. This state computation operation is a building block of other process mining operations, such as log animation and short-term simulation. An approach to this state computation problem is to perform a token-based replay of each trace prefix against the model. However, when a trace prefix does not strictly follow the behavior of the process model, token replay may produce a state that is not reachable from the initial state of the process. An alternative approach is to first compute an alignment between the trace prefix of each ongoing case and the model, and then replay the aligned trace prefix. However, (prefix-)alignment is computationally expensive. This paper proposes a method that, given a trace prefix of an ongoing case, computes its state in constant time using an index that represents states as n-grams. An empirical evaluation shows that the proposed approach has an accuracy comparable to that of the prefix-alignment approach, while achieving a throughput of hundreds of thousands of traces per second.
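As a rough illustration of the idea (not the authors' implementation), an n-gram index can be pictured as a dictionary from the last n activity labels of a trace prefix to a marking. The toy index below is hand-written and hypothetical; in practice it would be precomputed from the process model. The lookup tries the longest suffix first and backs off to shorter ones. This back-off strategy is an assumption of this sketch. For a fixed maximum n, the work per prefix does not grow with the prefix length.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

Marking = FrozenSet[str]  # set of marked places (a 1-safe net is assumed)

# Hypothetical toy index: last n activity labels of a prefix -> marking.
INITIAL_MARKING: Marking = frozenset({"p_start"})
NGRAM_INDEX: Dict[Tuple[str, ...], Marking] = {
    ("A",): frozenset({"p1"}),
    ("A", "B"): frozenset({"p2", "p3"}),
    ("C",): frozenset({"p3", "p4"}),
    ("B", "C"): frozenset({"p3", "p4"}),
    ("C", "D"): frozenset({"p_end"}),
}
MAX_N = 2


def compute_state(prefix: Sequence[str],
                  index: Dict[Tuple[str, ...], Marking] = NGRAM_INDEX,
                  max_n: int = MAX_N) -> Optional[Marking]:
    """Look up the marking of a trace prefix using only its last max_n events."""
    if not prefix:
        return INITIAL_MARKING
    # Longest suffix first, then back off to shorter n-grams.
    for n in range(min(max_n, len(prefix)), 0, -1):
        key = tuple(prefix[len(prefix) - n:])
        if key in index:
            return index[key]
    return None  # no indexed n-gram matches this prefix
```

For example, `compute_state(["X", "A", "B", "C"])` only inspects the 2-gram `("B", "C")`, so an unexpected earlier event such as `"X"` does not affect the result.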
Submitted 9 September, 2024;
originally announced September 2024.
-
Discovery and Simulation of Data-Aware Business Processes
Authors:
Orlenys López-Pintado,
Serhii Murashko,
Marlon Dumas
Abstract:
Simulation is a common approach to predict the effect of business process changes on quantitative performance. The starting point of Business Process Simulation (BPS) is a process model enriched with simulation parameters. To cope with the typically large parameter spaces of BPS models, several methods have been proposed to automatically discover BPS models from event logs. Virtually all these approaches neglect the data perspective of business processes. Yet, the data attributes manipulated by a business process often determine which activities are performed, how many times, and when. This paper addresses this gap by introducing a data-aware BPS modeling approach and a method to discover data-aware BPS models from event logs. The BPS modeling approach supports three types of data attributes (global, case-level, and event-level) as well as deterministic and stochastic attribute update rules and data-aware branching conditions. An empirical evaluation shows that the proposed method accurately discovers the type of each data attribute and its associated update rules, and that the resulting BPS models more closely replicate the process execution control flow relative to data-unaware BPS models.
Submitted 24 August, 2024;
originally announced August 2024.
-
Detecting $K_{2,3}$ as an induced minor
Authors:
Clément Dallard,
Maël Dumas,
Claire Hilaire,
Martin Milanič,
Anthony Perez,
Nicolas Trotignon
Abstract:
We consider a natural generalization of chordal graphs, in which every minimal separator induces a subgraph with independence number at most $2$. Such graphs can be equivalently defined as graphs that do not contain the complete bipartite graph $K_{2,3}$ as an induced minor, that is, graphs from which $K_{2,3}$ cannot be obtained by a sequence of edge contractions and vertex deletions.
We develop a polynomial-time algorithm for recognizing these graphs. Our algorithm relies on a characterization of $K_{2,3}$-induced minor-free graphs in terms of excluding particular induced subgraphs, called Truemper configurations.
Submitted 13 February, 2024;
originally announced February 2024.
-
Enhancing the Accuracy of Predictors of Activity Sequences of Business Processes
Authors:
Muhammad Awais Ali,
Marlon Dumas,
Fredrik Milani
Abstract:
Predictive process monitoring is an evolving research field that studies how to train and use predictive models for operational decision-making. One of the problems studied in this field is that of predicting the sequence of upcoming activities in a case up to its completion, a.k.a. the case suffix. The prediction of case suffixes provides input to estimate short-term workloads and execution times under different resource schedules. Existing methods to address this problem often generate suffixes wherein some activities are repeated many times, whereas this pattern is not observed in the data. Closer examination shows that this shortcoming stems from the approach used to sample the successive activity instances to generate a case suffix. Accordingly, the paper introduces a sampling approach aimed at reducing repetitions of activities in the predicted case suffixes. The approach, namely Daemon action, strikes a balance between exploration and exploitation when generating the successive activity instances. We enhance a deep learning approach for case suffix predictions using this sampling approach, and experimentally show that the enhanced approach outperforms the unenhanced ones with respect to control-flow accuracy measures.
Submitted 9 December, 2023;
originally announced December 2023.
-
Prescriptive Process Monitoring Under Resource Constraints: A Reinforcement Learning Approach
Authors:
Mahmoud Shoush,
Marlon Dumas
Abstract:
Prescriptive process monitoring methods seek to optimize the performance of business processes by triggering interventions at runtime, thereby increasing the probability of positive case outcomes. These interventions are triggered according to an intervention policy. Reinforcement learning has been put forward as an approach to learning intervention policies through trial and error. Existing approaches in this space assume that the number of resources available to perform interventions in a process is unlimited, an unrealistic assumption in practice. This paper argues that, in the presence of resource constraints, a key dilemma in the field of prescriptive process monitoring is to trigger interventions based not only on predictions of their necessity, timeliness, or effect, but also on the uncertainty of these predictions and the level of resource utilization. Indeed, committing scarce resources to an intervention when the necessity or effects of this intervention are highly uncertain may intuitively lead to suboptimal intervention effects. Accordingly, the paper proposes a reinforcement learning approach for prescriptive process monitoring that leverages conformal prediction techniques to consider the uncertainty of the predictions upon which an intervention decision is based. An evaluation using real-life datasets demonstrates that explicitly modeling uncertainty using conformal predictions helps reinforcement learning agents converge towards policies with higher net intervention gain.
Submitted 20 January, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
An improved kernelization algorithm for Trivially Perfect Editing
Authors:
Maël Dumas,
Anthony Perez
Abstract:
In the Trivially Perfect Editing problem, one is given an undirected graph $G = (V,E)$ and an integer $k$, and seeks to add or delete at most $k$ edges in $G$ to obtain a trivially perfect graph. In a recent work, Dumas, Perez and Todinca [Algorithmica 2023] proved that this problem admits a kernel with $O(k^3)$ vertices. This result heavily relies on the fact that the size of trivially perfect modules can be bounded by $O(k^2)$, as shown by Drange and Pilipczuk [Algorithmica 2018]. To obtain their cubic vertex-kernel, Dumas, Perez and Todinca [Algorithmica 2023] then showed that a more intricate structure, the so-called comb, can be reduced to $O(k^2)$ vertices. In this work, we show that the bound can be improved to $O(k)$ for both aforementioned structures and thus obtain a kernel with $O(k^2)$ vertices. Our approach relies on the straightforward yet powerful observation that any large enough structure contains unaffected vertices whose neighborhood remains unchanged by an editing of size $k$, implying strong structural properties.
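To make the target class concrete: a graph is trivially perfect if and only if it contains no induced $P_4$ and no induced $C_4$. The brute-force checker below uses that characterization. It is an $O(n^4)$ sketch for verifying small editing solutions, not part of the kernelization.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected simple graph


def is_trivially_perfect(g: Graph) -> bool:
    """True iff g has no induced P4 or C4 (brute force over 4-vertex subsets)."""
    for quad in combinations(g, 4):
        inside = set(quad)
        # Degrees within the subgraph induced by the four chosen vertices.
        degs = sorted(len(g[v] & inside) for v in quad)
        # On 4 vertices, P4 is the only graph with degree sequence 1,1,2,2
        # and C4 is the only one with degree sequence 2,2,2,2.
        if degs == [1, 1, 2, 2] or degs == [2, 2, 2, 2]:
            return False
    return True
```

With $k = 1$, adding the edge {0, 2} to the path 0-1-2-3 yields a triangle with a pendant vertex, which the checker accepts.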
Submitted 26 October, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Can I Trust My Simulation Model? Measuring the Quality of Business Process Simulation Models
Authors:
David Chapela-Campa,
Ismail Benchekroun,
Opher Baron,
Marlon Dumas,
Dmitry Krass,
Arik Senderovich
Abstract:
Business Process Simulation (BPS) is an approach to analyze the performance of business processes under different scenarios. For example, BPS allows us to estimate what would be the cycle time of a process if one or more resources became unavailable. The starting point of BPS is a process model annotated with simulation parameters (a BPS model). BPS models may be manually designed, based on information collected from stakeholders and empirical observations, or automatically discovered from execution data. Regardless of its origin, a key question when using a BPS model is how to assess its quality. In this paper, we propose a collection of measures to evaluate the quality of a BPS model w.r.t. its ability to replicate the observed behavior of the process. We advocate an approach whereby different measures tackle different process perspectives. We evaluate the ability of the proposed measures to discern the impact of modifications to a BPS model, and their ability to uncover the relative strengths and weaknesses of two approaches for automated discovery of BPS models. The evaluation shows that the measures not only capture how close a BPS model is to the observed behavior, but they also help us to identify sources of discrepancies.
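One simple way to compare a single perspective is the one-dimensional Wasserstein (earth mover's) distance. For example, it can be applied to the cycle times in the observed log and in a log produced by the BPS model. This is an illustrative choice, not necessarily one of the measures proposed in the paper. The sketch assumes two samples of equal size, in which case the distance reduces to the mean absolute difference between the sorted values.

```python
from typing import Sequence


def wasserstein_1d_equal(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """1-D earth mover's distance between two equally sized empirical samples."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("samples must be non-empty and of equal size")
    # With equal sizes, the optimal transport plan pairs the sorted values.
    pairs = zip(sorted(observed), sorted(simulated))
    return sum(abs(a - b) for a, b in pairs) / len(observed)
```

A value of 0 means the two distributions coincide. For samples of different sizes, `scipy.stats.wasserstein_distance` computes the general case.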
Submitted 30 March, 2023;
originally announced March 2023.
-
Learning When to Treat Business Processes: Prescriptive Process Monitoring with Causal Inference and Reinforcement Learning
Authors:
Zahra Dasht Bozorgi,
Marlon Dumas,
Marcello La Rosa,
Artem Polyvyanyy,
Mahmoud Shoush,
Irene Teinemaa
Abstract:
Increasing the success rate of a process, i.e., the percentage of cases that end in a positive outcome, is a recurrent process improvement goal. At runtime, there are often certain actions (a.k.a. treatments) that workers may execute to lift the probability that a case ends in a positive outcome. For example, in a loan origination process, a possible treatment is to issue multiple loan offers to increase the probability that the customer takes a loan. Each treatment has a cost. Thus, when defining policies for prescribing treatments to cases, managers need to consider the net gain of the treatments. Also, the effect of a treatment varies over time: treating a case earlier may be more effective than treating it later. This paper presents a prescriptive monitoring method that automates this decision-making task. The method combines causal inference and reinforcement learning to learn treatment policies that maximize the net gain. The method leverages a conformal prediction technique to speed up the convergence of the reinforcement learning mechanism by separating cases that are likely to end up in a positive or negative outcome from uncertain cases. An evaluation on two real-life datasets shows that the proposed method outperforms a state-of-the-art baseline.
Submitted 6 March, 2023;
originally announced March 2023.
-
Intervening With Confidence: Conformal Prescriptive Monitoring of Business Processes
Authors:
Mahmoud Shoush,
Marlon Dumas
Abstract:
Prescriptive process monitoring methods seek to improve the performance of a process by selectively triggering interventions at runtime (e.g., offering a discount to a customer) to increase the probability of a desired case outcome (e.g., a customer making a purchase). The backbone of a prescriptive process monitoring method is an intervention policy, which determines for which cases and when an intervention should be executed. Existing methods in this field rely on predictive models to define intervention policies; specifically, they consider policies that trigger an intervention when the estimated probability of a negative outcome exceeds a threshold. However, the probabilities computed by a predictive model may come with a high level of uncertainty (low confidence), leading to unnecessary interventions and, thus, wasted effort. This waste is particularly problematic when the resources available to execute interventions are limited. To tackle this shortcoming, this paper proposes an approach to extend existing prescriptive process monitoring methods with so-called conformal predictions, i.e., predictions with confidence guarantees. An empirical evaluation using real-life public datasets shows that conformal predictions enhance the net gain of prescriptive process monitoring methods under limited resources.
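The following is a minimal split-conformal sketch. It assumes a binary outcome classifier that already outputs, for each case, the probability of a negative outcome. The nonconformity score of a calibration case is 1 − p(true label). A label enters the prediction set when its score does not exceed the conformal quantile. The intervention rule is an illustrative assumption, not the paper's exact policy: it intervenes only when the set is exactly {negative}.

```python
import math
from typing import Sequence, Set


def conformal_threshold(cal_probs_true: Sequence[float], alpha: float) -> float:
    """Conformal quantile of the scores 1 - p(true label) on a calibration set."""
    scores = sorted(1.0 - p for p in cal_probs_true)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points: every label is kept
    return scores[k - 1]


def prediction_set(p_negative: float, qhat: float) -> Set[str]:
    probs = {"negative": p_negative, "positive": 1.0 - p_negative}
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def should_intervene(p_negative: float, qhat: float) -> bool:
    # Hypothetical rule: act only when the model is confidently negative.
    return prediction_set(p_negative, qhat) == {"negative"}
```

Uncertain cases yield an empty or two-label set, so they receive no intervention. This spares scarce resources for cases the model is confident about.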
Submitted 7 December, 2022;
originally announced December 2022.
-
Why am I Waiting? Data-Driven Analysis of Waiting Times in Business Processes
Authors:
Katsiaryna Lashkevich,
Fredrik Milani,
David Chapela-Campa,
Ihar Suvorau,
Marlon Dumas
Abstract:
Waiting times in a business process often arise when a case transitions from one activity to another. Accordingly, analyzing the causes of waiting times of activity transitions can help analysts to identify opportunities for reducing the cycle time of a process. This paper proposes a process mining approach to decompose the waiting time observed in each activity transition in a process into multiple direct causes and to analyze the impact of each identified cause on the cycle time efficiency of the process. An empirical evaluation shows that the proposed approach is able to discover different direct causes of waiting times. The applicability of the proposed approach is demonstrated on a real-life process.
Submitted 2 December, 2022;
originally announced December 2022.
-
Row Conditional-TGAN for generating synthetic relational databases
Authors:
Mohamed Gueye,
Yazid Attabi,
Maxime Dumas
Abstract:
Besides reproducing tabular data properties of standalone tables, synthetic relational databases also require modeling the relationships between related tables. In this paper, we propose the Row Conditional-Tabular Generative Adversarial Network (RC-TGAN), a novel generative adversarial network (GAN) model that extends the tabular GAN to support modeling and synthesizing relational databases. The RC-TGAN models relationship information between tables by incorporating conditional data of parent rows into the design of the child table's GAN. We further extend the RC-TGAN to model the influence that grandparent table rows may have on their grandchild rows, in order to prevent the loss of this connection when the rows of the parent table fail to transfer this relationship information. The experimental results, using eight real relational databases, show significant improvements in the quality of the synthesized relational databases when compared to the benchmark system, demonstrating the effectiveness of the RC-TGAN in preserving relationships between tables of the original database.
Submitted 14 November, 2022;
originally announced November 2022.
-
Repairing Activity Start Times to Improve Business Process Simulation
Authors:
David Chapela-Campa,
Marlon Dumas
Abstract:
Business Process Simulation (BPS) is a common technique to estimate the impact of business process changes, e.g., what would be the cycle time of a process if the number of traces increases? The starting point of BPS is a business process model annotated with simulation parameters (a BPS model). Several studies have proposed methods to automatically discover BPS models from event logs (extracted from enterprise information systems) via process mining techniques. These approaches model the processing time of each activity based on the start and end timestamps recorded in the event log. In practice, however, the recorded start times often do not precisely reflect the actual start of the activities. For example, a resource may start working on an activity, but its start time is not recorded until she/he interacts with the system. If not corrected, these situations induce waiting times in which the resource is considered to be free, while she/he is actually working. To address this limitation, this article proposes a technique to identify, for each activity instance, the portion of the preceding waiting time during which the resource is actually working on it, and to repair its start time so that it reflects the actual processing time. The idea of the proposed technique is that, as far as simulation is concerned, an activity instance may start once it is enabled and the corresponding resource is available. Accordingly, for each activity instance, the proposed technique estimates the activity enablement time and the resource availability time based on the information available in the event log, and repairs the start time to include the non-recorded processing time. An empirical evaluation involving eight real-life event logs shows that the proposed approach leads to BPS models that closely reflect the temporal dynamics of the process.
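The repair rule can be sketched as follows. The sketch assumes that the enablement time of the activity instance and the time its resource became available have already been estimated from the log. The function name and inputs are hypothetical, and the paper's estimation of these two timestamps is not shown. The repaired start is the later of the two timestamps, but never later than the recorded start.

```python
from datetime import datetime


def repair_start(recorded_start: datetime,
                 enabled_at: datetime,
                 resource_available_at: datetime) -> datetime:
    """Move a recorded start time back to the earliest moment the work could begin."""
    # For simulation purposes, an instance can start once it is enabled
    # and its resource is available.
    earliest_possible = max(enabled_at, resource_available_at)
    # Only move start times earlier; never push a recorded start later.
    return min(recorded_start, earliest_possible)
```

Suppose an instance is enabled at 9:00, its resource becomes free at 9:10, and the start is recorded at 9:30. The sketch moves the start to 9:10. The 20 minutes in between count as processing time rather than waiting time.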
Submitted 24 August, 2022;
originally announced August 2022.
-
Business Process Simulation with Differentiated Resources: Does it Make a Difference?
Authors:
Orlenys Lopez-Pintado,
Marlon Dumas
Abstract:
Business process simulation is a versatile technique to predict the impact of one or more changes on the performance of a process. Mainstream approaches in this space suffer from various limitations, some stemming from the fact that they treat resources as undifferentiated entities grouped into resource pools. These approaches assume that all resources in a pool have the same performance and share the same availability calendars. Previous studies have acknowledged these assumptions, without quantifying their impact on simulation model accuracy. This paper addresses this gap in the context of simulation models automatically discovered from event logs. The paper proposes a simulation approach and a method for discovering simulation models, wherein each resource is treated as an individual entity, with its own performance and availability calendar. An evaluation shows that simulation models with differentiated resources more closely replicate the distributions of cycle times and the work rhythm in a process than models with undifferentiated resources.
Submitted 16 August, 2022;
originally announced August 2022.
-
On graphs coverable by k shortest paths
Authors:
Maël Dumas,
Florent Foucaud,
Anthony Perez,
Ioan Todinca
Abstract:
We show that if the edges or vertices of an undirected graph $G$ can be covered by $k$ shortest paths, then the pathwidth of $G$ is upper-bounded by a single-exponential function of $k$. As a corollary, we prove that the problem Isometric Path Cover with Terminals (which, given a graph $G$ and a set of $k$ pairs of vertices called terminals, asks whether $G$ can be covered by $k$ shortest paths, each joining a pair of terminals) is FPT with respect to the number of terminals. The same holds for the similar problem Strong Geodetic Set with Terminals (which, given a graph $G$ and a set of $k$ terminals, asks whether there exist $\binom{k}{2}$ shortest paths covering $G$, each joining a distinct pair of terminals). Moreover, this implies that the related problems Isometric Path Cover and Strong Geodetic Set (defined similarly but where the set of terminals is not part of the input) are in XP with respect to parameter $k$.
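The sketch below makes the covering notion concrete. It checks whether a given family of non-empty paths in an unweighted graph consists of shortest paths and covers every vertex. Distances come from breadth-first search. The sketch only verifies a candidate solution and does not implement the paper's FPT or XP algorithms.

```python
from collections import deque
from typing import Dict, List, Sequence, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def bfs_dist(g: Graph, src: int) -> Dict[int, int]:
    """Unweighted distances from src to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_shortest_path(g: Graph, path: Sequence[int]) -> bool:
    # Consecutive vertices must be adjacent, and the number of edges must
    # equal the distance between the endpoints.
    if any(b not in g[a] for a, b in zip(path, path[1:])):
        return False
    return bfs_dist(g, path[0]).get(path[-1]) == len(path) - 1


def covered_by_shortest_paths(g: Graph, paths: List[Sequence[int]]) -> bool:
    covered = {v for p in paths for v in p}
    return covered == set(g) and all(is_shortest_path(g, p) for p in paths)
```

For instance, the 4-cycle 0-1-2-3-0 is vertex-covered by the two shortest paths `[0, 1, 2]` and `[2, 3, 0]`. In contrast, `[0, 1, 2, 3]` is rejected because vertices 0 and 3 are adjacent.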
Submitted 14 April, 2023; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Enhancing Business Process Simulation Models with Extraneous Activity Delays
Authors:
David Chapela-Campa,
Marlon Dumas
Abstract:
Business Process Simulation (BPS) is a common approach to estimate the impact of changes to a business process on its performance measures. For example, it allows us to estimate what would be the cycle time of a process if we automated one of its activities, or if some resources become unavailable. The starting point of BPS is a business process model annotated with simulation parameters (a BPS model). In traditional approaches, BPS models are manually designed by modeling specialists. This approach is time-consuming and error-prone. To address this shortcoming, several studies have proposed methods to automatically discover BPS models from event logs via process mining techniques. However, current techniques in this space discover BPS models that only capture waiting times caused by resource contention or resource unavailability. Oftentimes, a considerable portion of the waiting time in a business process corresponds to extraneous delays, e.g., a resource waits for the customer to return a phone call. This article proposes a method that discovers extraneous delays from event logs of business process executions. The proposed approach computes, for each pair of causally consecutive activity instances in the event log, the time when the target activity instance should theoretically have started, given the availability of the relevant resource. Based on the difference between the theoretical and the actual start times, the approach estimates the distribution of extraneous delays, and it enhances the BPS model with timer events to capture these delays. An empirical evaluation involving synthetic and real-life logs shows that the approach produces BPS models that better reflect the temporal dynamics of the process, relative to BPS models that do not capture extraneous delays.
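The delay-estimation step can be sketched as the gap between the actual start of each activity instance and its theoretical start. Here the theoretical start is taken as the later of the enablement time and the resource availability time. The record fields, the use of plain numbers for timestamps, and the decision to keep only positive gaps are assumptions of this sketch. The subsequent steps are not shown: fitting a distribution to the delays and adding timer events to the BPS model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActivityInstance:  # hypothetical record extracted from an event log
    activity: str
    enabled_at: float              # e.g., hours since the start of the log
    resource_available_at: float
    actual_start: float


def extraneous_delays(instances: List[ActivityInstance]) -> List[float]:
    """Positive gaps between actual and theoretical start times."""
    delays = []
    for inst in instances:
        theoretical_start = max(inst.enabled_at, inst.resource_available_at)
        gap = inst.actual_start - theoretical_start
        if gap > 0:  # zero gap: the waiting time is explained by the resource
            delays.append(gap)
    return delays
```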
Submitted 2 February, 2024; v1 submitted 28 June, 2022;
originally announced June 2022.
-
Libra: High-Utility Anonymization of Event Logs for Process Mining via Subsampling
Authors:
Gamal Elkoumy,
Marlon Dumas
Abstract:
Process mining techniques enable analysts to identify and assess process improvement opportunities based on event logs. A common roadblock to process mining is that event logs may contain private information that cannot be used for analysis without consent. An approach to overcome this roadblock is to anonymize the event log so that no individual represented in the original log can be singled out based on the anonymized one. Differential privacy is an anonymization approach that provides this guarantee. A differentially private event log anonymization technique seeks to produce an anonymized log that is as similar as possible to the original one (high utility) while providing a required privacy guarantee. Existing event log anonymization techniques operate by injecting noise into the traces in the log (e.g., duplicating, perturbing, or filtering out some traces). Recent work on differential privacy has shown that a better privacy-utility tradeoff can be achieved by applying subsampling prior to noise injection. In other words, subsampling amplifies privacy. This paper proposes an event log anonymization approach called Libra that exploits this observation. Libra extracts multiple samples of traces from a log, independently injects noise, retains statistically relevant traces from each sample, and composes the samples to produce a differentially private log. An empirical evaluation shows that the proposed approach leads to a considerably higher utility for equivalent privacy guarantees relative to existing baselines.
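The statement "subsampling amplifies privacy" refers to results such as the following standard bound. Suppose a mechanism is ε-differentially private and each record is included in the sample independently with probability q (Poisson subsampling). Then running the mechanism on the sample is ε′-differentially private, with ε′ = ln(1 + q(e^ε − 1)), under the add/remove-one-record notion of neighboring datasets. The sketch computes this bound. It does not reproduce Libra's own privacy accounting across multiple samples.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Privacy bound after Poisson subsampling with inclusion probability q."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    # log1p/expm1 keep the computation accurate for small q and epsilon.
    return math.log1p(q * math.expm1(epsilon))
```

With ε = 1 and a 10% sampling rate, the bound is roughly 0.16. With q = 1 (no subsampling), the bound equals ε.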
Submitted 27 June, 2022;
originally announced June 2022.
-
When to intervene? Prescriptive Process Monitoring Under Uncertainty and Resource Constraints
Authors:
Mahmoud Shoush,
Marlon Dumas
Abstract:
Prescriptive process monitoring approaches leverage historical data to prescribe runtime interventions that will likely prevent negative case outcomes or improve a process's performance. A centerpiece of a prescriptive process monitoring method is its intervention policy: a decision function determining if and when to trigger an intervention on an ongoing case. Previous proposals in this field rely on intervention policies that consider only the current state of a given case. These approaches do not consider the tradeoff between triggering an intervention in the current state, given the level of uncertainty of the underlying predictive models, versus delaying the intervention to a later state. Moreover, they assume that a resource is always available to perform an intervention (infinite capacity). This paper addresses these gaps by introducing a prescriptive process monitoring method that filters and ranks ongoing cases based on prediction scores, prediction uncertainty, and causal effect of the intervention, and triggers interventions to maximize a gain function, considering the available resources. The proposal is evaluated using a real-life event log. The results show that the proposed method outperforms existing baselines regarding total gain.
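The following is a hypothetical sketch of a filter-rank-allocate step in the spirit of the method. Cases with a low prediction score or high prediction uncertainty are filtered out. The remaining cases are ranked by an expected net gain derived from the estimated causal effect. Interventions are triggered for the top-ranked cases, up to the number of free resources. The field names, thresholds, and gain formula are illustrative assumptions, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OngoingCase:  # hypothetical per-case estimates
    case_id: str
    p_negative: float     # predicted probability of a negative outcome
    uncertainty: float    # e.g., width of a prediction interval
    causal_effect: float  # estimated increase in P(positive) if treated


def select_cases(cases: List[OngoingCase], free_resources: int,
                 min_score: float, max_uncertainty: float,
                 benefit: float, cost: float) -> List[str]:
    def gain(c: OngoingCase) -> float:
        # Hypothetical expected net gain of intervening on case c.
        return c.causal_effect * benefit - cost

    candidates = [c for c in cases
                  if c.p_negative >= min_score
                  and c.uncertainty <= max_uncertainty
                  and gain(c) > 0]
    candidates.sort(key=gain, reverse=True)
    return [c.case_id for c in candidates[:free_resources]]
```

Limiting the selection to `free_resources` captures the finite capacity that, according to the abstract, earlier approaches ignore.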
Submitted 15 June, 2022;
originally announced June 2022.
-
Polynomial kernels for edge modification problems towards block and strictly chordal graphs
Authors:
Maël Dumas,
Anthony Perez,
Mathis Rocton,
Ioan Todinca
Abstract:
We consider edge modification problems towards block and strictly chordal graphs, where one is given an undirected graph $G = (V,E)$ and an integer $k \in \mathbb{N}$ and seeks to edit (add or delete) at most $k$ edges from $G$ to obtain a block graph or a strictly chordal graph. The completion and deletion variants of these problems are defined similarly by only allowing edge additions for the former and only edge deletions for the latter. Block graphs are a well-studied class of graphs and admit several characterizations, e.g., they are diamond-free chordal graphs. Strictly chordal graphs, also referred to as block duplicate graphs, are a natural generalization of block graphs where one can add true twins of cut-vertices. Strictly chordal graphs are exactly the dart-free and gem-free chordal graphs. We prove NP-completeness for most variants of these problems and provide $O(k^2)$ vertex-kernels for Block Graph Edition and Block Graph Deletion, $O(k^3)$ vertex-kernels for Strictly Chordal Completion and Strictly Chordal Deletion, and an $O(k^4)$ vertex-kernel for Strictly Chordal Edition.
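To make the target class concrete, the sketch below tests whether a graph is a block graph using the characterization quoted above (diamond-free chordal graphs): chordality is checked by repeatedly deleting a simplicial vertex, and an induced diamond is any four vertices spanning exactly five edges. It is a brute-force recognizer built on that characterization, not the kernelization algorithm from the paper.

```python
from itertools import combinations


def build_adj(n, edges):
    """Adjacency sets for an undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def is_simplicial(adj, v, alive):
    # v is simplicial if its remaining neighbours are pairwise adjacent.
    nbrs = adj[v] & alive
    return all(w in adj[u] for u, w in combinations(nbrs, 2))


def is_chordal(adj):
    # Chordal iff simplicial vertices can be deleted until none remain.
    alive = set(adj)
    while alive:
        v = next((v for v in alive if is_simplicial(adj, v, alive)), None)
        if v is None:
            return False
        alive.discard(v)
    return True


def has_induced_diamond(adj):
    # Four vertices spanning exactly 5 of 6 possible edges induce a diamond (K4 minus an edge).
    return any(
        sum(1 for u, w in combinations(quad, 2) if w in adj[u]) == 5
        for quad in combinations(adj, 4)
    )


def is_block_graph(adj):
    return is_chordal(adj) and not has_induced_diamond(adj)
```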
Submitted 1 February, 2024; v1 submitted 31 January, 2022;
originally announced January 2022.
-
AI-Augmented Business Process Management Systems: A Research Manifesto
Authors:
Marlon Dumas,
Fabiana Fournier,
Lior Limonad,
Andrea Marrella,
Marco Montali,
Jana-Rebecca Rehse,
Rafael Accorsi,
Diego Calvanese,
Giuseppe De Giacomo,
Dirk Fahland,
Avigdor Gal,
Marcello La Rosa,
Hagen Völzer,
Ingo Weber
Abstract:
AI-Augmented Business Process Management Systems (ABPMSs) are an emerging class of process-aware information systems, empowered by trustworthy AI technology. An ABPMS enhances the execution of business processes with the aim of making these processes more adaptable, proactive, explainable, and context-sensitive. This manifesto presents a vision for ABPMSs and discusses research challenges that need to be surmounted to realize this vision. To this end, we define the concept of ABPMS, we outline the lifecycle of processes within an ABPMS, we discuss core characteristics of an ABPMS, and we derive a set of challenges to realize systems with these characteristics.
Submitted 4 November, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Differentially Private Release of Event Logs for Process Mining
Authors:
Gamal Elkoumy,
Alisa Pankova,
Marlon Dumas
Abstract:
The applicability of process mining techniques hinges on the availability of event logs capturing the execution of a business process. In some use cases, particularly those involving customer-facing processes, these event logs may contain private information. Data protection regulations restrict the use of such event logs for analysis purposes. One way of circumventing these restrictions is to anonymize the event log to the extent that no individual can be singled out using the anonymized log. This article addresses the problem of anonymizing an event log in order to guarantee that, upon release of the anonymized log, the probability that an attacker may single out any individual represented in the original log does not increase by more than a threshold. The article proposes a differentially private release mechanism, which samples the cases in the log and adds noise to the timestamps to the extent required to achieve the above privacy guarantee. The article reports on an empirical comparison of the proposed approach against the state-of-the-art approaches using 14 real-life event logs in terms of data utility loss and computational efficiency.
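The timestamp-perturbation idea can be illustrated with the textbook Laplace mechanism. The sketch below adds Laplace noise with scale `sensitivity / epsilon` to the inter-event durations of a trace and clamps the results at zero. The case-sampling step is omitted, and the sensitivity value is an assumption the caller must justify; this is not the calibration derived in the article.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def perturb_durations(durations, epsilon, sensitivity, rng=None):
    """Add Laplace noise to each inter-event duration (e.g. in hours)."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # Clamping at zero is post-processing, so it does not weaken the noise's guarantee.
    return [max(0.0, d + laplace_noise(scale, rng)) for d in durations]
```

For example, `perturb_durations([2.0, 0.5, 24.0], epsilon=1.0, sensitivity=1.0)` returns three non-negative noisy durations; a smaller `epsilon` yields a larger noise scale and hence more distortion.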
Submitted 15 December, 2022; v1 submitted 9 January, 2022;
originally announced January 2022.
-
Efficient Checking of Temporal Compliance Rules Over Business Process Event Logs
Authors:
Adriano Augusto,
Ahmed Awad,
Marlon Dumas
Abstract:
Verifying temporal compliance rules, such as a rule stating that an inquiry must be answered within a time limit, is a recurrent operation in the realm of business process compliance. In this setting, a typical use case is one where a manager seeks to retrieve all cases where a temporal rule is violated, given an event log recording the execution of a process over a time period. Existing approaches for checking temporal rules require a full scan of the log. Such approaches are unsuitable for interactive use when the log is large and the set of compliance rules is evolving. This paper proposes an approach to evaluate temporal compliance rules in sublinear time by pre-computing a data structure that summarizes the temporal relations between activities in a log. The approach caters for a wide range of temporal compliance patterns and supports incremental updates. Our evaluation on twenty real-life logs shows that our data structure allows for real-time checking of a large set of compliance rules.
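One way to picture such a precomputed structure is an index from each ordered activity pair (A, B) to the sorted time gaps between the first A of a case and the first B that follows it. A response-time rule ("B must follow A within `limit`") then becomes a binary search plus a slice of the offending cases, instead of a full log scan. The layout below is a hypothetical simplification: it only covers cases in which B does follow A, and it does not support incremental updates, unlike the approach in the paper.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(log):
    """log: case_id -> list of (activity, timestamp), sorted by timestamp."""
    pairs = defaultdict(list)
    for case_id, events in log.items():
        first_seen = {}
        for i, (act, ts) in enumerate(events):
            first_seen.setdefault(act, (i, ts))
        # Gap from the first A to the first occurrence of each activity after it.
        for a, (i, ts_a) in first_seen.items():
            followed = set()
            for act, ts in events[i + 1:]:
                if act not in followed:
                    followed.add(act)
                    pairs[(a, act)].append((ts - ts_a, case_id))
    index = {}
    for key, entries in pairs.items():
        entries.sort()
        index[key] = ([g for g, _ in entries], [c for _, c in entries])
    return index


def violations(index, a, b, limit):
    """Cases where the first B after the first A comes more than `limit` later."""
    if (a, b) not in index:
        return []
    gaps, cases = index[(a, b)]
    return cases[bisect_right(gaps, limit):]
```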
Submitted 9 December, 2021; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Prescriptive Process Monitoring: Quo Vadis?
Authors:
Kateryna Kubrak,
Fredrik Milani,
Alexander Nolte,
Marlon Dumas
Abstract:
Prescriptive process monitoring methods seek to optimize a business process by recommending interventions at runtime to prevent negative outcomes or poorly performing cases. In recent years, various prescriptive process monitoring methods have been proposed. This paper studies existing methods in this field via a Systematic Literature Review (SLR). In order to structure the field, the paper proposes a framework for characterizing prescriptive process monitoring methods according to their performance objective, performance metrics, intervention types, modeling techniques, data inputs, and intervention policies. The SLR provides insights into challenges and areas for future research that could enhance the usefulness and applicability of prescriptive process monitoring methods. The paper highlights the need to validate existing and new methods in real-world settings, to extend the types of interventions beyond those related to the temporal and cost perspectives, and to design policies that take into account causality and second-order effects.
Submitted 3 December, 2021;
originally announced December 2021.
-
Prescriptive Process Monitoring Under Resource Constraints: A Causal Inference Approach
Authors:
Mahmoud Shoush,
Marlon Dumas
Abstract:
Prescriptive process monitoring is a family of techniques to optimize the performance of a business process by triggering interventions at runtime. Existing prescriptive process monitoring techniques assume that the number of interventions that may be triggered is unbounded. In practice, though, specific interventions consume resources with finite capacity. For example, in a loan origination process, an intervention may consist of preparing an alternative loan offer to increase the applicant's chances of taking a loan. This intervention requires a certain amount of time from a credit officer, and thus, it is not possible to trigger this intervention in all cases. This paper proposes a prescriptive process monitoring technique that triggers interventions to optimize a cost function under fixed resource constraints. The proposed technique relies on predictive modeling to identify cases that are likely to lead to a negative outcome, in combination with causal inference to estimate the effect of an intervention on the outcome of the case. These outputs are then used to allocate resources to interventions to maximize a cost function. A preliminary empirical evaluation suggests that the proposed approach produces a higher net gain than a purely predictive (non-causal) baseline.
Submitted 11 October, 2021; v1 submitted 7 September, 2021;
originally announced September 2021.
-
Discovering executable routine specifications from user interaction logs
Authors:
Volodymyr Leno,
Adriano Augusto,
Marlon Dumas,
Marcello La Rosa,
Fabrizio Maria Maggi,
Artem Polyvyanyy
Abstract:
Robotic Process Automation (RPA) is a technology to automate routine work such as copying data across applications or filling in document templates using data from multiple applications. RPA tools allow organizations to automate a wide range of routines. However, identifying and scoping routines that can be automated using RPA tools is time-consuming. Manual identification of candidate routines via interviews, walk-throughs, or job shadowing allows analysts to identify the most visible routines, but these methods are not suitable when it comes to identifying the long tail of routines in an organization. This article proposes an approach to discover automatable routines from logs of user interactions with IT systems and to synthesize executable specifications for such routines. The approach starts by discovering frequent routines at a control-flow level (candidate routines). It then determines which of these candidate routines are automatable and synthesizes an executable specification for each such routine. Finally, it identifies semantically equivalent routines so as to produce a set of non-redundant automatable routines. The article reports on an evaluation of the approach using a combination of synthetic and real-life logs. The evaluation results show that the approach can discover automatable routines that are known to be present in a UI log, and that it identifies automatable routines that users recognize as such in real-life logs.
Submitted 25 June, 2021;
originally announced June 2021.
-
A cubic vertex-kernel for Trivially Perfect Editing
Authors:
Maël Dumas,
Anthony Perez,
Ioan Todinca
Abstract:
We consider the Trivially Perfect Editing problem, where one is given an undirected graph $G = (V,E)$ and a parameter $k \in \mathbb{N}$ and seeks to edit (add or delete) at most $k$ edges from $G$ to obtain a trivially perfect graph. The related Trivially Perfect Completion and Trivially Perfect Deletion problems are obtained by only allowing edge additions or edge deletions, respectively. Trivially perfect graphs are both chordal and cographs, and have applications related to the tree-depth width parameter and to social network analysis. All variants of the problem are known to be NP-complete and to admit so-called polynomial kernels. More precisely, the existence of an $O(k^3)$ vertex-kernel for Trivially Perfect Completion was announced by Guo (ISAAC 2007) but without a stand-alone proof. More recently, Drange and Pilipczuk (Algorithmica 2018) provided $O(k^7)$ vertex-kernels for these problems and left open the existence of cubic vertex-kernels. In this work, we answer this question positively for all three variants of the problem.
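For a concrete view of the target class: trivially perfect graphs are exactly the graphs with no induced path on four vertices ($P_4$) and no induced cycle on four vertices ($C_4$). The brute-force recognizer below checks every set of four vertices using degree sequences within that set: on four vertices, degrees [1, 1, 2, 2] can only be an induced $P_4$ and [2, 2, 2, 2] only an induced $C_4$. It runs in $O(n^4)$ time and is a membership test only, not the kernel from the paper.

```python
from itertools import combinations


def build_adj(n, edges):
    """Adjacency sets for an undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def is_trivially_perfect(adj):
    for quad in combinations(adj, 4):
        inside = set(quad)
        degrees = sorted(len(adj[v] & inside) for v in quad)
        # [1,1,2,2] on four vertices is an induced P4; [2,2,2,2] is an induced C4.
        if degrees in ([1, 1, 2, 2], [2, 2, 2, 2]):
            return False
    return True
```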
Submitted 18 May, 2021;
originally announced May 2021.
-
Prescriptive Process Monitoring for Cost-Aware Cycle Time Reduction
Authors:
Zahra Dasht Bozorgi,
Irene Teinemaa,
Marlon Dumas,
Marcello La Rosa,
Artem Polyvyanyy
Abstract:
Reducing cycle time is a recurrent concern in the field of business process management. Depending on the process, various interventions may be triggered to reduce the cycle time of a case, for example, using a faster shipping service in an order-to-delivery process or giving a phone call to a customer to obtain missing information rather than waiting passively. Each of these interventions comes with a cost. This paper tackles the problem of determining if and when to trigger a time-reducing intervention in a way that maximizes the total net gain. The paper proposes a prescriptive process monitoring method that uses orthogonal random forest models to estimate the causal effect of triggering a time-reducing intervention for each ongoing case of a process. Based on this causal effect estimate, the method triggers interventions according to a user-defined policy. The method is evaluated on two real-life logs.
Submitted 14 September, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
Automated Discovery of Process Models with True Concurrency and Inclusive Choices
Authors:
Adriano Augusto,
Marlon Dumas,
Marcello La Rosa
Abstract:
Enterprise information systems allow companies to maintain detailed records of their business process executions. These records can be extracted in the form of event logs, which capture the execution of activities across multiple instances of a business process. Event logs may be used to analyze business processes at a fine level of detail using process mining techniques. Among other things, process mining techniques allow us to discover a process model from an event log -- an operation known as automated process discovery. Despite a rich body of research in the field, existing automated process discovery techniques do not fully capture the concurrency inherent in a business process. Specifically, the bulk of these techniques treat two activities A and B as concurrent if sometimes A completes before B and other times B completes before A. Typically though, activities in a business process are executed in a true concurrency setting, meaning that two or more activity executions overlap temporally. This paper addresses this gap by presenting a refined version of an automated process discovery technique, namely Split Miner, that discovers true concurrency relations from event logs containing start and end timestamps for each activity. The proposed technique is also able to differentiate between exclusive and inclusive choices. We evaluate the proposed technique relative to existing baselines using 11 real-life logs drawn from different industries.
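The notion of true concurrency used here can be illustrated directly: two activity instances execute concurrently when their (start, end) intervals overlap. The sketch below counts, for each pair of distinct activities, in how many cases they overlap, and reports a pair as concurrent when that holds in at least a fraction `min_ratio` of the cases containing both. The threshold and the log layout are assumptions; this is not the relation-discovery procedure of Split Miner.

```python
from collections import Counter
from itertools import combinations


def concurrent_pairs(log, min_ratio=0.5):
    """log: case_id -> list of (activity, start, end) activity instances."""
    together = Counter()  # cases in which both activities occur
    overlaps = Counter()  # cases in which some instances of both overlap in time
    for instances in log.values():
        seen, overlapping = set(), set()
        for (a, s1, e1), (b, s2, e2) in combinations(instances, 2):
            if a == b:
                continue
            pair = frozenset((a, b))
            seen.add(pair)
            if s1 < e2 and s2 < e1:  # intervals overlap (touching does not count)
                overlapping.add(pair)
        together.update(seen)
        overlaps.update(overlapping)
    return {p for p in together if overlaps[p] / together[p] >= min_ratio}
```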
Submitted 12 May, 2021;
originally announced May 2021.
-
Learning Accurate Business Process Simulation Models from Event Logs via Automated Process Discovery and Deep Learning
Authors:
Manuel Camargo,
Marlon Dumas,
Oscar González-Rojas
Abstract:
Business process simulation is a well-known approach to estimate the impact of changes to a process with respect to time and cost measures -- a practice known as what-if process analysis. The usefulness of such estimations hinges on the accuracy of the underlying simulation model. Data-Driven Simulation (DDS) methods leverage process mining techniques to learn process simulation models from event logs. Empirical studies have shown that, while DDS models adequately capture the observed sequences of activities and their frequencies, they fail to accurately capture the temporal dynamics of real-life processes. In contrast, generative Deep Learning (DL) models are better able to capture such temporal dynamics. The drawback of DL models is that users cannot alter them for what-if analysis due to their black-box nature. This paper presents a hybrid approach to learn process simulation models from event logs wherein a (stochastic) process model is extracted via DDS techniques, and then combined with a DL model to generate timestamped event sequences. An experimental evaluation shows that the resulting hybrid simulation models match the temporal accuracy of pure DL models, while partially retaining the what-if analysis capability of DDS approaches.
Submitted 19 March, 2022; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Mine Me but Don't Single Me Out: Differentially Private Event Logs for Process Mining
Authors:
Gamal Elkoumy,
Alisa Pankova,
Marlon Dumas
Abstract:
The applicability of process mining techniques hinges on the availability of event logs capturing the execution of a business process. In some use cases, particularly those involving customer-facing processes, these event logs may contain private information. Data protection regulations restrict the use of such event logs for analysis purposes. One way of circumventing these restrictions is to anonymize the event log to the extent that no individual can be singled out using the anonymized log. This paper addresses the problem of anonymizing an event log in order to guarantee that, upon disclosure of the anonymized log, the probability that an attacker may single out any individual represented in the original log does not increase by more than a threshold. The paper proposes a differentially private disclosure mechanism, which oversamples the cases in the log and adds noise to the timestamps to the extent required to achieve the above privacy guarantee. The paper reports on an empirical evaluation of the proposed approach using 14 real-life event logs in terms of data utility loss and computational efficiency.
Submitted 30 August, 2021; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Privacy-Preserving Directly-Follows Graphs: Balancing Risk and Utility in Process Mining
Authors:
Gamal Elkoumy,
Alisa Pankova,
Marlon Dumas
Abstract:
Process mining techniques enable organizations to analyze business process execution traces in order to identify opportunities for improving their operational performance. Oftentimes, such execution traces contain private information. For example, the execution traces of a healthcare process are likely to be privacy-sensitive. In such cases, organizations need to deploy Privacy-Enhancing Technologies (PETs) to strike a balance between the benefits they get from analyzing these data and the requirements imposed on them by privacy regulations, particularly that of minimizing re-identification risks when data are disclosed to a process analyst. Among many available PETs, differential privacy stands out for its ability to prevent predicate singling out attacks and its composable privacy guarantees. A drawback of differential privacy is the lack of interpretability of the main privacy parameter it relies upon, namely epsilon. This raises the recurrent question: how much epsilon is enough? This article proposes a method to determine the epsilon value to be used when disclosing the output of a process mining technique in terms of two business-relevant metrics, namely absolute percentage error metrics capturing the loss of accuracy (a.k.a. utility loss) resulting from adding noise to the disclosed data, and guessing advantage, which captures the increase in the probability that an adversary may guess information about an individual as a result of a disclosure. The article specifically studies the problem of protecting the disclosure of the so-called Directly-Follows Graph (DFG), which is a process mining artifact produced by most process mining tools. The article reports on an empirical evaluation of the utility-risk trade-offs that the proposed approach achieves on a collection of 13 real-life event logs.
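A Directly-Follows Graph counts how often activity b immediately follows activity a across traces. The sketch below builds a DFG, perturbs its edge counts with Laplace noise of scale `sensitivity / epsilon`, and measures the resulting utility loss as a mean absolute percentage error, so that the error can be compared across epsilon values. The sensitivity is an assumed input (a single trace can contribute to many edges), and noising only the observed edges is a simplification; neither the guessing-advantage metric nor the article's calibration method is reproduced here.

```python
import random
from collections import Counter


def dfg(traces):
    """Count directly-follows pairs (a, b) over all traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_dfg(counts, epsilon, sensitivity, rng):
    """Release each edge count with Laplace noise, clamped at zero."""
    scale = sensitivity / epsilon
    return {edge: max(0.0, c + laplace_noise(scale, rng)) for edge, c in counts.items()}


def mape(true_counts, released):
    """Mean absolute percentage error over the edges of the true DFG."""
    errors = [abs(released[e] - c) / c for e, c in true_counts.items()]
    return 100 * sum(errors) / len(errors)
```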
Submitted 3 December, 2020; v1 submitted 2 December, 2020;
originally announced December 2020.
-
Discovering Generative Models from Event Logs: Data-driven Simulation vs Deep Learning
Authors:
Manuel Camargo,
Marlon Dumas,
Oscar Gonzalez-Rojas
Abstract:
A generative model is a statistical model that is able to generate new data instances from previously observed ones. In the context of business processes, a generative model creates new execution traces from a set of historical traces, also known as an event log. Two families of generative process simulation models have been developed in previous work: data-driven simulation models and deep learning models. Until now, these two approaches have evolved independently and their relative performance has not been studied. This paper fills this gap by empirically comparing a data-driven simulation technique with multiple deep learning techniques that construct models capable of generating execution traces with timestamped events. The study sheds light on the relative strengths of both approaches and raises the prospect of developing hybrid approaches that combine these strengths.
Submitted 8 September, 2020;
originally announced September 2020.
-
Process Mining Meets Causal Machine Learning: Discovering Causal Rules from Event Logs
Authors:
Zahra Dasht Bozorgi,
Irene Teinemaa,
Marlon Dumas,
Marcello La Rosa,
Artem Polyvyanyy
Abstract:
This paper proposes an approach to analyze an event log of a business process in order to generate case-level recommendations of treatments that maximize the probability of a given outcome. Users classify the attributes in the event log into controllable and non-controllable, where the former correspond to attributes that can be altered during an execution of the process (the possible treatments). We use an action rule mining technique to identify treatments that co-occur with the outcome under some conditions. Since action rules are generated based on correlation rather than causation, we then use a causal machine learning technique, specifically uplift trees, to discover subgroups of cases for which a treatment has a high causal effect on the outcome after adjusting for confounding variables. We test the relevance of this approach using an event log of a loan application process and compare our findings with recommendations manually produced by process mining experts.
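Uplift, the quantity that uplift trees estimate per subgroup, is the difference in outcome rate between treated and untreated cases. The sketch below computes it naively for each value of a single attribute. Unlike the causal machine learning technique used in the paper, it does not adjust for confounding variables and does not learn the subgroups, so it only illustrates the quantity being estimated. The field names (`treated`, `outcome`) are hypothetical.

```python
from collections import defaultdict


def uplift_by_group(cases, attribute):
    """Naive uplift per attribute value: P(outcome | treated) - P(outcome | not treated)."""
    # value -> {treated?: [positive outcomes, total cases]}
    stats = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for case in cases:
        cell = stats[case[attribute]][bool(case["treated"])]
        cell[0] += int(bool(case["outcome"]))
        cell[1] += 1
    uplift = {}
    for value, groups in stats.items():
        (pos_t, n_t), (pos_c, n_c) = groups[True], groups[False]
        if n_t and n_c:  # undefined unless both treated and untreated cases exist
            uplift[value] = pos_t / n_t - pos_c / n_c
    return uplift
```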
Submitted 3 September, 2020;
originally announced September 2020.
-
Identifying candidate routines for Robotic Process Automation from unsegmented UI logs
Authors:
V. Leno,
A. Augusto,
M. Dumas,
M. La Rosa,
F. Maggi,
A. Polyvyanyy
Abstract:
Robotic Process Automation (RPA) is a technology to develop software bots that automate repetitive sequences of interactions between users and software applications (a.k.a. routines). To take full advantage of this technology, organizations need to identify and to scope their routines. This is a challenging endeavor in large organizations, as routines are usually not concentrated in a handful of processes, but rather scattered across the process landscape. Accordingly, the identification of routines from User Interaction (UI) logs has received significant attention. Existing approaches to this problem assume that the UI log is segmented, meaning that it consists of traces of a task that is presupposed to contain one or more routines. However, a UI log usually takes the form of a single unsegmented sequence of events. This paper presents an approach to discover candidate routines from unsegmented UI logs in the presence of noise, i.e. events within or between routine instances that do not belong to any routine. The approach is implemented as an open-source tool and evaluated using synthetic and real-life UI logs.
Submitted 26 August, 2020; v1 submitted 13 August, 2020;
originally announced August 2020.
-
Detecting sudden and gradual drifts in business processes from execution traces
Authors:
Abderrahmane Maaradji,
Marlon Dumas,
Marcello La Rosa,
Alireza Ostovar
Abstract:
Business processes are prone to unexpected changes, as process workers may suddenly or gradually start executing a process differently in order to adjust to changes in workload, season, or other external factors. Early detection of business process changes enables managers to identify and act upon changes that may otherwise affect process performance. Business process drift detection refers to a family of methods to detect changes in a business process by analyzing event logs extracted from the systems that support the execution of the process. Existing methods for business process drift detection are based on an explorative analysis of a potentially large feature space, and in some cases they require users to manually identify specific features that characterize the drift. Depending on the explored feature space, these methods miss various types of changes. Moreover, each of them is designed to detect either sudden or gradual drifts, but not both. This paper proposes an automated and statistically grounded method for detecting sudden and gradual business process drifts under a unified framework. An empirical evaluation shows that the method detects typical change patterns with significantly higher accuracy and lower detection delay than existing methods, while accurately distinguishing between sudden and gradual drifts.
Submitted 7 May, 2020;
originally announced May 2020.
-
Discovering Business Process Simulation Models in the Presence of Multitasking
Authors:
Bedilia Estrada-Torres,
Manuel Camargo,
Marlon Dumas,
Maksym Yerokhin
Abstract:
Business process simulation is a versatile technique for analyzing business processes from a quantitative perspective. A well-known limitation of process simulation is that the accuracy of the simulation results is limited by the faithfulness of the process model and simulation parameters given as input to the simulator. To tackle this limitation, several authors have proposed to discover simulation models from process execution logs so that the resulting simulation models more closely match reality. Existing techniques in this field assume that each resource in the process performs one task at a time. In reality, however, resources may engage in multitasking behavior. Traditional simulation approaches do not handle multitasking. Instead, they rely on a resource allocation approach wherein a task instance is only assigned to a resource when the resource is free. This inability to handle multitasking leads to an overestimation of execution times. This paper proposes an approach to discover multitasking in business process execution logs and to generate a simulation model that takes into account the discovered multitasking behavior. The key idea is to adjust the processing times of tasks in such a way that executing the multitasked tasks sequentially with the adjusted times is equivalent to executing them concurrently with the original processing times. The proposed approach is evaluated using a real-life dataset and synthetic datasets with different levels of multitasking. The results show that, in the presence of multitasking, the approach improves the accuracy of simulation models discovered from execution logs.
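The key idea above can be made concrete for a single resource: split the timeline at every start and end time, and within each elementary interval share its length equally among the tasks active in it. The adjusted times then add up to the resource's total busy time, so executing the tasks one after another with the adjusted times takes as long as the original overlapping execution. The equal-sharing rule is an assumption made for illustration and may differ from the paper's exact adjustment.

```python
def adjusted_durations(tasks):
    """tasks: task_id -> (start, end) for the task instances of ONE resource."""
    points = sorted({t for start, end in tasks.values() for t in (start, end)})
    adjusted = {task_id: 0.0 for task_id in tasks}
    # Walk the elementary intervals between consecutive boundary points.
    for lo, hi in zip(points, points[1:]):
        active = [tid for tid, (s, e) in tasks.items() if s <= lo and hi <= e]
        for tid in active:
            # Each multitasked instance is charged an equal share of the interval.
            adjusted[tid] += (hi - lo) / len(active)
    return adjusted
```

For two tasks on (0, 10) and (5, 15), each gets 5 + 2.5 = 7.5 time units, and the total of 15 equals the 15 units during which the resource was busy.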
Submitted 20 April, 2020;
originally announced April 2020.
-
Automated Discovery of Data Transformations for Robotic Process Automation
Authors:
Volodymyr Leno,
Marlon Dumas,
Marcello La Rosa,
Fabrizio Maria Maggi,
Artem Polyvyanyy
Abstract:
Robotic Process Automation (RPA) is a technology for automating repetitive routines consisting of sequences of user interactions with one or more applications. In order to fully exploit the opportunities opened by RPA, companies need to discover which specific routines may be automated, and how. In this setting, this paper addresses the problem of analyzing User Interaction (UI) logs in order to discover routines where a user transfers data from one spreadsheet or (Web) form to another. The paper maps this problem to that of discovering data transformations by example, a problem for which several techniques are available. The paper shows that a naive application of a state-of-the-art technique for data transformation discovery is computationally inefficient. Accordingly, the paper proposes two optimizations that take advantage of the information in the UI log and the fact that data transfers across applications typically involve copying alphabetic and numeric tokens separately. The proposed approach and its optimizations are evaluated using UI logs that replicate a real-life repetitive data transfer routine.
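The observation that data transfers typically copy alphabetic and numeric tokens separately can be illustrated by splitting values into such tokens and tracing where each target token came from. This is only a sketch of the tokenization idea under stated assumptions (ASCII letters and digits, exact token copies); it is not the paper's transformation-discovery technique, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Maximal runs of ASCII letters or of digits; everything else is a separator.
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")


def split_tokens(value: str) -> List[Tuple[str, str]]:
    """Split a cell value into ("alpha" | "num", text) tokens."""
    return [("alpha" if m.group().isalpha() else "num", m.group())
            for m in TOKEN.finditer(value)]


def token_provenance(source: str, target: str) -> List[Optional[int]]:
    """For each target token, the index of the first source token with the same
    kind and text, or None if it was not copied verbatim (hypothetical helper)."""
    src = split_tokens(source)
    return [next((i for i, tok in enumerate(src) if tok == t), None)
            for t in split_tokens(target)]


print(split_tokens("Order #A12-2019"))
print(token_provenance("Smith, John 1985-03-07", "John Smith 07/03/1985"))
```

Restricting matches to tokens of the same kind shrinks the search space a transformation learner has to explore, which is the intuition behind the optimization described above.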
Submitted 3 January, 2020;
originally announced January 2020.
-
Secure Multi-Party Computation for Inter-Organizational Process Mining
Authors:
Gamal Elkoumy,
Stephan A. Fahrenkrog-Petersen,
Marlon Dumas,
Peeter Laud,
Alisa Pankova,
Matthias Weidlich
Abstract:
Process mining is a family of techniques for analysing business processes based on event logs extracted from information systems. Mainstream process mining tools are designed for intra-organizational settings, insofar as they assume that an event log is available for processing as a whole. The use of such tools for inter-organizational process analysis is hampered by the fact that such processes involve independent parties who are unwilling to, or sometimes legally prevented from, sharing detailed event logs with each other. In this setting, this paper proposes an approach for constructing and querying a common type of artifact used for process mining, namely the frequency and time-annotated Directly-Follows Graph (DFG), over multiple event logs belonging to different parties, in such a way that the parties do not share the event logs with each other. The proposal leverages an existing platform for secure multi-party computation, namely Sharemind. Since a direct implementation of DFG construction in Sharemind suffers from scalability issues, the paper proposes to rely on vectorization of event logs and to employ a divide-and-conquer scheme for parallel processing of sub-logs. The paper reports on an experimental evaluation that tests the scalability of the approach on real-life logs.
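For reference, the artifact being computed, a frequency and time-annotated Directly-Follows Graph, can be built in plaintext as sketched below. This shows only what is computed, assuming events are (case_id, activity, timestamp) triples and that the time annotation is the mean time between directly-following events. The secure multi-party computation in Sharemind, the vectorization and the divide-and-conquer scheme described in the abstract are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, float]  # (case_id, activity, timestamp), assumed format


def build_dfg(events: Iterable[Event]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Map each directly-follows pair (a, b) to (frequency, mean time from a to b)."""
    by_case: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    total_time: Dict[Tuple[str, str], float] = defaultdict(float)
    for trace in by_case.values():
        trace.sort()  # order each case's events by timestamp
        for (t1, a), (t2, b) in zip(trace, trace[1:]):
            counts[(a, b)] += 1
            total_time[(a, b)] += t2 - t1
    return {edge: (n, total_time[edge] / n) for edge, n in counts.items()}


log = [("c1", "A", 0), ("c1", "B", 2), ("c1", "C", 5),
       ("c2", "A", 1), ("c2", "B", 5)]
print(build_dfg(log))
```

In the inter-organizational setting, each party would hold only part of such a log, which is why the computation has to be carried out without pooling the events.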
Submitted 13 April, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
Business Process Variant Analysis: Survey and Classification
Authors:
Farbod Taymouri,
Marcello La Rosa,
Marlon Dumas,
Fabrizio Maria Maggi
Abstract:
Process variant analysis aims at identifying and addressing the differences existing in a set of process executions enacted by the same process model. A process model can be executed differently in different situations for various reasons, e.g., the process could run in different locations or seasons, which gives rise to different behaviors. Having intuitions about the discrepancies in process behaviors, though challenging, is beneficial for managers and process analysts since they can improve their process models efficiently, e.g., via interactive learning or adapting mechanisms. Several methods have been proposed to tackle the problem of uncovering discrepancies in process executions. However, because of the interdisciplinary nature of the challenge, the methods and sorts of analysis in the literature are very heterogeneous. This article not only presents a systematic literature review and taxonomy of methods for variant analysis of business processes but also provides a methodology including the required steps to apply this type of analysis for the identification of variants in business process executions.
Submitted 22 December, 2019; v1 submitted 18 November, 2019;
originally announced November 2019.
-
Scalable Alignment of Process Models and Event Logs: An Approach Based on Automata and S-Components
Authors:
Daniel Reißner,
Abel Armas-Cervantes,
Raffaele Conforti,
Marlon Dumas,
Dirk Fahland,
Marcello La Rosa
Abstract:
Given a model of the expected behavior of a business process and an event log recording its observed behavior, the problem of business process conformance checking is that of identifying and describing the differences between the model and the log. A desirable feature of a conformance checking technique is to identify a minimal yet complete set of differences. Existing conformance checking techniques that fulfil this property exhibit limited scalability when confronted with large and complex models and logs. This paper presents two complementary techniques to address these shortcomings. The first technique transforms the model and log into two automata. These automata are compared using an error-correcting synchronized product, computed via an A* search that guarantees the resulting automaton captures all differences with a minimal amount of error corrections. The synchronized product is used to extract minimal-length alignments between each trace of the log and the closest corresponding trace of the model. A limitation of the first technique is that as the level of concurrency in the model increases, the size of the automaton of the model grows exponentially, thus hampering scalability. To address this limitation, the paper proposes a second technique wherein the process model is first decomposed into a set of automata, known as S-components, such that the product of these automata is equal to the automaton of the whole process model. An error-correcting product is computed for each S-component separately and the resulting automata are recomposed into a single product automaton capturing all differences without minimality guarantees. An empirical evaluation shows that the proposed techniques outperform state-of-the-art baselines in terms of computational efficiency. Moreover, the decomposition-based technique is optimal for the vast majority of datasets and quasi-optimal for the remaining ones.
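The error-correcting synchronized product can be illustrated, in much simplified form, by a shortest-path search over pairs (position in the trace, state of the model automaton). Synchronous moves cost 0; log-only and model-only moves (the error corrections) cost 1. This sketch aligns a single trace against a deterministic automaton and uses Dijkstra's algorithm, i.e. A* with a zero heuristic. The paper's technique instead compares the automaton of the whole log with that of the model, uses its own heuristic, and adds the S-component decomposition, none of which is shown. The dictionary encoding of the automaton is an assumption.

```python
import heapq
from typing import Dict, FrozenSet, List, Tuple

# Assumed encoding: dfa[state][label] = next_state
DFA = Dict[str, Dict[str, str]]
Move = Tuple[str, str]  # ("sync" | "log" | "model", label)


def align(trace: List[str], dfa: DFA, initial: str,
          finals: FrozenSet[str]) -> Tuple[int, List[Move]]:
    """Return a minimal-cost alignment of trace against the automaton."""
    counter = 0  # tie-breaker so the heap never compares move lists
    heap = [(0, counter, (0, initial), [])]
    seen = set()
    while heap:
        cost, _, (i, q), moves = heapq.heappop(heap)
        if (i, q) in seen:
            continue
        seen.add((i, q))
        if i == len(trace) and q in finals:
            return cost, moves
        successors = []
        for label, nxt in dfa.get(q, {}).items():
            if i < len(trace) and trace[i] == label:
                successors.append((0, (i + 1, nxt), ("sync", label)))
            successors.append((1, (i, nxt), ("model", label)))  # skip in log
        if i < len(trace):
            successors.append((1, (i + 1, q), ("log", trace[i])))  # skip in model
        for step, node, move in successors:
            if node not in seen:
                counter += 1
                heapq.heappush(heap, (cost + step, counter, node, moves + [move]))
    raise ValueError("no final state is reachable")


# Model accepts exactly A, B, C; the observed trace skips B.
model = {"s0": {"A": "s1"}, "s1": {"B": "s2"}, "s2": {"C": "s3"}}
print(align(["A", "C"], model, "s0", frozenset({"s3"})))
```

The exponential blow-up mentioned above shows up here as the size of the model automaton: with concurrency, the number of states `q` grows exponentially, and so does the search space of pairs `(i, q)`.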
Submitted 4 March, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
The Scalability, Efficiency and Complexity of Universities and Colleges: A New Lens for Assessing the Higher Educational System
Authors:
Ryan C. Taylor,
Xiaofan Liang,
Manfred D. Laubichler,
Geoffrey B. West,
Christopher P. Kempes,
Marion Dumas
Abstract:
The growing need for affordable and accessible higher education is a major global challenge for the 21st century. Consequently, there is a need to develop a deeper understanding of the functionality and taxonomy of universities and colleges and, in particular, how their various characteristics change with size. Scaling has been a powerful tool for revealing systematic regularities in systems across a range of topics from physics and biology to cities, and for understanding the underlying principles of their organization and growth. Here, we apply this framework to institutions of higher learning in the United States and show that, like organisms, ecosystems and cities, they scale in a surprisingly systematic fashion following simple power law behavior. We analyze the entire spectrum encompassing 5,802 institutions ranging from large research universities to small professional schools, organized in seven commonly used sectors, which reveal distinct regimes of institutional scaling behavior. Metrics include variation in expenditures, revenues, graduation rates and estimated economic added value, expressed as functions of total enrollment, our fundamental measure of size. Our results quantify how each regime of institution leverages specific economies of scale to address distinct priorities. Taken together, the scaling of features within a sector and shifts in scaling across sectors imply that there are generic mechanisms and constraints shared by all sectors which lead to tradeoffs between their different societal functions and roles. We particularly highlight the strong complementarity between public and private research universities, and community and state colleges, four sectors that display superlinear returns to scale.
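A power-law relation Y = a·N^b between a metric Y and size N (here, total enrollment) becomes linear after taking logarithms, log Y = log a + b·log N, and an exponent b > 1 indicates superlinear returns to scale. The abstract does not state how the exponents were estimated. The sketch below uses ordinary least squares on log-transformed data, which is one common choice and should be read as an assumption.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Fit values ~ a * sizes**b by least squares on log-log data; return (a, b)."""
    xs = [math.log(x) for x in sizes]
    ys = [math.log(y) for y in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # scaling exponent (slope in log-log space)
    a = math.exp(mean_y - b * mean_x)  # prefactor (exp of the intercept)
    return a, b


# Synthetic, noise-free data with exponent 1.2 (superlinear), for illustration only.
enrollment = [100, 1000, 10000]
metric = [2 * n ** 1.2 for n in enrollment]
print(fit_power_law(enrollment, metric))
```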
Submitted 11 October, 2019;
originally announced October 2019.
-
Automated Discovery of Business Process Simulation Models from Event Logs
Authors:
Manuel Camargo,
Marlon Dumas,
Oscar González-Rojas
Abstract:
Business process simulation is a versatile technique to estimate the performance of a process under multiple scenarios. This, in turn, allows analysts to compare alternative options to improve a business process. A common roadblock for business process simulation is that constructing accurate simulation models is cumbersome and error-prone. Modern information systems store detailed execution logs of the business processes they support. Previous work has shown that these logs can be used to discover simulation models. However, existing methods for log-based discovery of simulation models do not seek to optimize the accuracy of the resulting models. Instead, they leave it to the user to manually tune the simulation model to achieve the desired level of accuracy. This article presents an accuracy-optimized method to discover business process simulation models from execution logs. The method decomposes the problem into a series of steps with associated configuration parameters. A hyper-parameter optimization method is used to search through the space of possible configurations so as to maximize the similarity between the behavior of the simulation model and the behavior observed in the log. The method has been implemented as a tool and evaluated using logs from different domains.
Submitted 27 February, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Interpreted Execution of Business Process Models on Blockchain
Authors:
Orlenys López-Pintado,
Marlon Dumas,
Luciano García-Bañuelos,
Ingo Weber
Abstract:
Blockchain technology provides a tamper-proof mechanism to execute inter-organizational business processes involving mutually untrusted parties. Existing approaches to blockchain-based process execution are based on code generation. In these approaches, a process model is compiled into one or more smart contracts, which are then deployed on a blockchain platform. Given the immutability of the deployed smart contracts, these compiled approaches ensure that all process instances conform to the process model. However, this advantage comes at the price of inflexibility. Any changes to the process model require the redeployment of the smart contracts (a costly operation). In addition, changes cannot be applied to running process instances. To address this lack of flexibility, this paper presents an interpreter of BPMN process models based on dynamic data structures. The proposed interpreter is embedded in a business process execution system with a modular multi-layered architecture, supporting the creation, execution, monitoring and dynamic update of process instances. For efficiency purposes, the interpreter relies on compact bitmap-based encodings of process models. An experimental evaluation shows that the proposed interpreted approach achieves comparable or lower costs relative to existing compiled approaches.
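A compact bitmap-based encoding of a process model can be sketched as follows, assuming a 1-safe Petri net where each place is one bit of an integer and each transition is a pair of bitmasks (input places, output places). Checking whether a transition is enabled and firing it then reduce to a few bitwise operations. The exact encoding used by the interpreter is not given in the abstract, so the names and layout below are hypothetical.

```python
from typing import Tuple

# Hypothetical encoding: a transition is (pre_mask, post_mask) over place bits.
Transition = Tuple[int, int]


def enabled(marking: int, t: Transition) -> bool:
    """A transition is enabled when all of its input places hold a token."""
    pre, _ = t
    return (marking & pre) == pre


def fire(marking: int, t: Transition) -> int:
    """Consume tokens from the input places and produce them in the output places."""
    pre, post = t
    if not enabled(marking, t):
        raise ValueError("transition is not enabled")
    return (marking & ~pre) | post


# Places: bit 0 = start, bits 1-2 = parallel branches, bit 3 = end.
AND_SPLIT: Transition = (0b0001, 0b0110)
AND_JOIN: Transition = (0b0110, 0b1000)

m1 = fire(0b0001, AND_SPLIT)
m2 = fire(m1, AND_JOIN)
print(bin(m1), bin(m2))
```

Keeping the marking in a single machine word is what makes such an encoding attractive on a blockchain, where storage and computation are billed per operation.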
Submitted 4 June, 2019;
originally announced June 2019.
-
Fire Now, Fire Later: Alarm-Based Systems for Prescriptive Process Monitoring
Authors:
Stephan A. Fahrenkrog-Petersen,
Niek Tax,
Irene Teinemaa,
Marlon Dumas,
Massimiliano de Leoni,
Fabrizio Maria Maggi,
Matthias Weidlich
Abstract:
Predictive process monitoring is a family of techniques to analyze events produced during the execution of a business process in order to predict the future state or the final outcome of running process instances. Existing techniques in this field are able to predict, at each step of a process instance, the likelihood that it will lead to an undesired outcome. These techniques, however, focus on generating predictions and do not prescribe when and how process workers should intervene to decrease the cost of undesired outcomes. This paper proposes a framework for prescriptive process monitoring, which extends predictive monitoring with the ability to generate alarms that trigger interventions to prevent an undesired outcome or mitigate its effect. The framework incorporates a parameterized cost model to assess the cost-benefit trade-off of generating alarms. We show how to optimize the generation of alarms given an event log of past process executions and a set of cost model parameters. The proposed approaches are empirically evaluated using a range of real-life event logs. The experimental results show that the net cost of undesired outcomes can be minimized by changing the threshold for generating alarms, as the process instance progresses. Moreover, introducing delays for triggering alarms, instead of triggering them as soon as the probability of an undesired outcome exceeds a threshold, leads to lower net costs.
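The idea of optimizing alarm generation against a cost model can be sketched as choosing, on historical cases, the probability threshold that minimizes the average cost per case. This sketch uses a deliberately simplified cost model: a fixed alarm cost `c_alarm`, a cost `c_out` for an undesired outcome, and a mitigation effectiveness `eff` that does not depend on when the alarm fires. It also uses a single global threshold. The paper's framework is richer (for example, thresholds that change as the case progresses, compensation costs, and delayed alarms), and all names here are hypothetical.

```python
from typing import List, Sequence, Tuple

# (predicted probability of an undesired outcome after each prefix, ended undesired?)
Case = Tuple[Sequence[float], bool]


def case_cost(probs: Sequence[float], undesired: bool, threshold: float,
              c_out: float, c_alarm: float, eff: float) -> float:
    """Simplified cost: an alarm costs c_alarm and removes a fraction eff of c_out."""
    if any(p >= threshold for p in probs):
        return c_alarm + (c_out * (1 - eff) if undesired else 0.0)
    return c_out if undesired else 0.0


def best_threshold(cases: List[Case], candidates: Sequence[float],
                   c_out: float, c_alarm: float, eff: float) -> float:
    """Pick the candidate threshold with the lowest average cost on past cases."""
    def avg_cost(th: float) -> float:
        return sum(case_cost(p, u, th, c_out, c_alarm, eff) for p, u in cases) / len(cases)
    return min(candidates, key=avg_cost)


history = [([0.2, 0.9], True), ([0.1, 0.6], False), ([0.3, 0.4], False)]
# float("inf") stands for "never raise an alarm".
print(best_threshold(history, [0.5, 0.7, float("inf")], c_out=10, c_alarm=1, eff=0.8))
```

With these illustrative numbers, 0.5 raises a useless alarm on the second case, never alarming pays the full cost of the first case, and 0.7 gives the lowest average cost.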
Submitted 14 October, 2020; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Business Process Privacy Analysis in Pleak
Authors:
Aivo Toots,
Reedik Tuuling,
Maksym Yerokhin,
Marlon Dumas,
Luciano García-Bañuelos,
Peeter Laud,
Raimundas Matulevičius,
Alisa Pankova,
Martin Pettai,
Pille Pullonen,
Jake Tom
Abstract:
Pleak is a tool to capture and analyze privacy-enhanced business process models to characterize and quantify to what extent the outputs of a process leak information about its inputs. Pleak incorporates an extensible set of analysis plugins, which enable users to inspect potential leakages at multiple levels of detail.
Submitted 13 February, 2019;
originally announced February 2019.
-
Dynamic Role Binding in Blockchain-Based Collaborative Business Processes
Authors:
Orlenys López-Pintado,
Marlon Dumas,
Luciano García-Bañuelos,
Ingo Weber
Abstract:
Blockchain technology enables the execution of collaborative business processes involving mutually untrusted parties. Existing platforms allow such processes to be modeled using high-level notations and compiled into smart contracts that can be deployed on blockchain platforms. However, these platforms brush aside the question of who is allowed to execute which tasks in the process, either by deferring the question altogether or by adopting a static approach where all actors are bound to roles upon process instantiation. Yet, a key advantage of blockchains is their ability to support dynamic sets of actors. This paper presents a model for dynamic binding of actors to roles in collaborative processes and an associated binding policy specification language. The proposed language is endowed with a Petri net semantics, thus enabling policy consistency verification. The paper also outlines an approach to compile policy specifications into smart contracts for enforcement. An experimental evaluation shows that the cost of policy enforcement increases linearly with the number of roles and constraints.
Submitted 7 December, 2018;
originally announced December 2018.
-
CATERPILLAR: A Business Process Execution Engine on the Ethereum Blockchain
Authors:
Orlenys López-Pintado,
Luciano García-Bañuelos,
Marlon Dumas,
Ingo Weber,
Alex Ponomarev
Abstract:
Blockchain platforms, such as Ethereum, allow a set of actors to maintain a ledger of transactions without relying on a central authority and to deploy scripts, called smart contracts, that are executed whenever certain transactions occur. These features can be used as basic building blocks for executing collaborative business processes between mutually untrusting parties. However, implementing business processes using the low-level primitives provided by blockchain platforms is cumbersome and error-prone. In contrast, established business process management systems, such as those based on the standard Business Process Model and Notation (BPMN), provide convenient abstractions for rapid development of process-oriented applications. This article demonstrates how to combine the advantages of a business process management system with those of a blockchain platform. The article introduces a blockchain-based BPMN execution engine, namely Caterpillar. Like any BPMN execution engine, Caterpillar supports the creation of instances of a process model and allows users to monitor the state of process instances and to execute tasks thereof. The specificity of Caterpillar is that the state of each process instance is maintained on the (Ethereum) blockchain and the workflow routing is performed by smart contracts generated by a BPMN-to-Solidity compiler. The Caterpillar compiler supports a large array of BPMN constructs, including subprocesses, multi-instance activities and event handlers. The paper describes the architecture of Caterpillar, and the interfaces it provides to support the monitoring of process instances, the allocation and execution of work items, and the execution of service tasks.
Submitted 22 April, 2019; v1 submitted 10 July, 2018;
originally announced August 2018.
-
Semantic DMN: Formalizing and Reasoning About Decisions in the Presence of Background Knowledge
Authors:
Diego Calvanese,
Marlon Dumas,
Fabrizio Maria Maggi,
Marco Montali
Abstract:
The Decision Model and Notation (DMN) is a recent OMG standard for the elicitation and representation of decision models, and for managing their interconnection with business processes. DMN builds on the notion of decision tables, and their combination into more complex decision requirements graphs (DRGs), which bridge between business process models and decision logic models. DRGs may rely on additional, external business knowledge models, whose functioning is not part of the standard. In this work, we consider one of the most important types of business knowledge, namely background knowledge that conceptually accounts for the structural aspects of the domain of interest, and propose decision knowledge bases (DKBs), which semantically combine DRGs modeled in DMN, and domain knowledge captured by means of first-order logic with datatypes. We provide a logic-based semantics for such an integration, and formalize different DMN reasoning tasks for DKBs. We then consider background knowledge formulated as a description logic ontology with datatypes, and show how the main verification tasks for DMN in this enriched setting can be formalized as standard DL reasoning services, and actually carried out in ExpTime. We discuss the effectiveness of our framework on a case study in maritime security.
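To make the notion of a DMN decision table concrete, the sketch below evaluates a single-output table under the Unique hit policy of the DMN standard: at most one rule may match a given input, and no match yields no result. Representing rule conditions as Python predicates is an assumption for illustration. The vessel-inspection table is hypothetical and is not the paper's maritime security case study. The paper's contribution, reasoning over such tables together with first-order or description logic background knowledge, is not shown.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A rule maps input names to predicates and carries one output value.
Rule = Tuple[Dict[str, Callable[[Any], bool]], str]


def evaluate_unique(rules: List[Rule], inputs: Dict[str, Any]) -> Optional[str]:
    """Evaluate a decision table under the Unique hit policy."""
    matches = [output for conditions, output in rules
               if all(test(inputs[name]) for name, test in conditions.items())]
    if len(matches) > 1:
        raise ValueError(f"Unique hit policy violated: {len(matches)} rules match")
    return matches[0] if matches else None


# Hypothetical table: inspection level from vessel length (m) and cargo type.
INSPECTION_RULES: List[Rule] = [
    ({"length": lambda v: v < 100}, "light"),
    ({"length": lambda v: v >= 100, "cargo": lambda c: c == "hazardous"}, "full"),
    ({"length": lambda v: v >= 100, "cargo": lambda c: c != "hazardous"}, "standard"),
]

print(evaluate_unique(INSPECTION_RULES, {"length": 150, "cargo": "hazardous"}))
```

Detecting overlapping rules (two rules that can match the same input, as in a Unique table that raises above) is one of the verification tasks that the abstract proposes to carry out formally, rather than input by input as in this sketch.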
Submitted 14 September, 2018; v1 submitted 30 July, 2018;
originally announced July 2018.
-
Survey and cross-benchmark comparison of remaining time prediction methods in business process monitoring
Authors:
Ilya Verenich,
Marlon Dumas,
Marcello La Rosa,
Fabrizio Maggi,
Irene Teinemaa
Abstract:
Predictive business process monitoring methods exploit historical process execution logs to generate predictions about running instances (called cases) of a business process, such as the prediction of the outcome, next activity or remaining cycle time of a given process case. These insights could be used to support operational managers in taking remedial actions as business processes unfold, e.g. shifting resources from one case onto another to ensure the latter is completed on time. A number of methods to tackle the remaining cycle time prediction problem have been proposed in the literature. However, due to differences in their experimental setup, choice of datasets, evaluation measures and baselines, the relative merits of each method remain unclear. This article presents a systematic literature review and taxonomy of methods for remaining time prediction in the context of business processes, as well as a cross-benchmark comparison of 16 such methods based on 16 real-life datasets originating from different industry domains.
Submitted 10 May, 2018; v1 submitted 8 May, 2018;
originally announced May 2018.
-
Discovering Process Maps from Event Streams
Authors:
Volodymyr Leno,
Abel Armas-Cervantes,
Marlon Dumas,
Marcello La Rosa,
Fabrizio M. Maggi
Abstract:
Automated process discovery is a class of process mining methods that allow analysts to extract business process models from event logs. Traditional process discovery methods extract process models from a snapshot of an event log stored in its entirety. In some scenarios, however, events keep coming with a high arrival rate to the extent that it is impractical to store the entire event log and to continuously re-discover a process model from scratch. Such scenarios require online process discovery approaches. Given an event stream produced by the execution of a business process, the goal of an online process discovery method is to maintain a continuously updated model of the process with a bounded amount of memory while at the same time achieving similar accuracy as offline methods. However, existing online discovery approaches require relatively large amounts of memory to achieve levels of accuracy comparable to that of offline methods. Therefore, this paper proposes an approach that addresses this limitation by mapping the problem of online process discovery to that of cache memory management, and applying well-known cache replacement policies to the problem of online process discovery. The approach has been implemented in .NET, experimentally integrated with the Minit process mining tool and comparatively evaluated against an existing baseline using real-life datasets.
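Mapping online discovery to cache management can be sketched as keeping directly-follows counts under a fixed memory budget and evicting entries with a cache replacement policy when the budget is exceeded. The sketch below uses least-recently-used (LRU) eviction, one well-known policy, for both the per-case "last activity" table and the edge counts. The class name, the two capacity parameters and the choice of LRU are assumptions for illustration. The approach described above was implemented in .NET and integrated with Minit, and this is not that implementation.

```python
from collections import OrderedDict
from typing import Tuple


class BoundedDFG:
    """Hypothetical bounded-memory directly-follows graph with LRU eviction."""

    def __init__(self, max_cases: int, max_edges: int) -> None:
        self.max_cases = max_cases
        self.max_edges = max_edges
        # Insertion order doubles as recency order: oldest entries come first.
        self.last_activity: "OrderedDict[str, str]" = OrderedDict()
        self.edges: "OrderedDict[Tuple[str, str], int]" = OrderedDict()

    def observe(self, case_id: str, activity: str) -> None:
        """Process one event (case_id, activity) from the stream."""
        prev = self.last_activity.pop(case_id, None)
        self.last_activity[case_id] = activity  # re-insert as most recent
        if len(self.last_activity) > self.max_cases:
            self.last_activity.popitem(last=False)  # evict least recently seen case
        if prev is not None:
            edge = (prev, activity)
            self.edges[edge] = self.edges.pop(edge, 0) + 1  # move edge to the end
            if len(self.edges) > self.max_edges:
                self.edges.popitem(last=False)  # evict least recently updated edge


dfg = BoundedDFG(max_cases=2, max_edges=2)
for case, act in [("c1", "A"), ("c1", "B"), ("c2", "A"), ("c2", "B"),
                  ("c1", "C"), ("c3", "A"), ("c2", "C"), ("c3", "D")]:
    dfg.observe(case, act)
print(dict(dfg.edges))
```

Evicting case c2 before its event C arrives loses the edge (B, C) for that case. This is the accuracy-versus-memory trade-off that the choice of replacement policy aims to manage.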
Submitted 8 April, 2018;
originally announced April 2018.
-
Alarm-Based Prescriptive Process Monitoring
Authors:
Irene Teinemaa,
Niek Tax,
Massimiliano de Leoni,
Marlon Dumas,
Fabrizio Maria Maggi
Abstract:
Predictive process monitoring is concerned with the analysis of events produced during the execution of a process in order to predict the future state of ongoing cases thereof. Existing techniques in this field are able to predict, at each step of a case, the likelihood that the case will end up in an undesired outcome. These techniques, however, do not take into account what process workers may do with the generated predictions in order to decrease the likelihood of undesired outcomes. This paper proposes a framework for prescriptive process monitoring, which extends predictive process monitoring approaches with the concepts of alarms, interventions, compensations, and mitigation effects. The framework incorporates a parameterized cost model to assess the cost-benefit tradeoffs of applying prescriptive process monitoring in a given setting. The paper also outlines an approach to optimize the generation of alarms given a dataset and a set of cost model parameters. The proposed approach is empirically evaluated using a range of real-life event logs.
Submitted 19 June, 2018; v1 submitted 23 March, 2018;
originally announced March 2018.