
1 Introduction

One of the critical determinants of process performance is the effectiveness (or even optimality) of resource allocation decisions (i.e., decisions on which resources to allocate to each process task). Process execution histories annotated with quality of service (or performance or outcome) measures, together with the process context, can be a rich source of knowledge about the best resource allocation decisions. The optimality of a resource allocation decision is determined not by the process instance alone, but also by the context in which the instance is executed. This effect is even more pronounced when the resources in question are human resources: human workers with the same organizational role and capabilities can behave quite differently depending on their operational context. In this work, we propose an approach to supporting resource allocation decisions by extracting information about the process context and process performance from past process executions.

The notion of process context plays a key role in this account. We define the process context to be that body of exogenous knowledge potentially relevant to the execution of the process that is available at the start of the execution of the process, and that is not impacted/modified via the execution of the process. (In general, exogenous knowledge impacting the execution of a process can be dynamic, changing during the execution of the process; restricting our focus to the knowledge that holds at the start of the execution of a process is a simplifying assumption.) The process context can impact resource allocation decisions in a variety of ways. Consider a document printing process that takes as input a document and goes through a series of steps resulting in the document being printed. During office hours, the process might allocate a high-throughput (and high carbon-footprint) printer to the print task, but allocate a slower (but lower carbon-footprint) printer outside of office hours. The differential resourcing of the print task is driven by the context (specifically, the time of day), which does not form part of the process data (generated or consumed by the process) but is exogenous. The context can also contain important information about resource characteristics (which, again, do not form part of the process data). For example, when handling an insurance claim from a high-priority customer, we might allocate an experienced employee as a resource (the experience or other attributes of employees do not form part of the process data, since they are neither generated, impacted nor consumed by the process, but they have a bearing on the execution of the process).
We note that processes can always be re-designed to incorporate context attributes as process inputs, but such an approach is not particularly useful given the complexity of the process designs that would result (consider, for example, the complexity of a process design that incorporates XOR branches for each distinct resourcing modality for a task).

Our proposed approach involves the use of two data mining techniques: (1) decision tree learning and (2) the k-Nearest Neighbor (k-NN) algorithm. With the former, we take a process context and a history of past process instances (each instance consisting of the set of tasks executed, the relevant process data and a set of outcomes or performance indicators) and compute a decision tree that enables us to predict the performance of a (potentially partially executed) process instance. The decision tree thus obtained can also be used to extract rules correlating contextual knowledge with process data when the intent is to guarantee a certain set of outcomes (in other words, a certain performance profile). Given that resource characteristics typically form part of the process context, these rules can be valuable in determining the attributes of the resources necessary for achieving the desired performance. With the k-NN approach, we use k-NN regression to determine, from the nearest neighbors of a process instance, those values of the process context attributes (particularly those that characterize resources) that would likely lead to the desired outcomes. We present an evaluation of our approach using both a real-world dataset and a synthetic dataset.

The approach that we propose is of considerable practical value. In many practical process resourcing settings, allocation decisions are conventionally taken by a project or team lead based on human judgment, experience and her implicit understanding of the context. Consequently, resource allocation is subjective and depends on the experience of that individual. Automated, data-driven support can therefore make a substantial difference in these settings.

The paper is organized as follows. Section 2 presents related work, while Sect. 3 provides a simple running example. Section 4 presents a discussion of the general setting. Our proposed approach is outlined in Sect. 5, and a detailed empirical evaluation is presented in Sect. 6, followed by conclusions and future work in Sect. 7.

2 Related Work

Context Modeling in Business Processes: Modeling of context in business processes has been proposed by Saidani et al. [17], who define context as “... any information reflecting changing circumstances during the execution of a BP can be considered as contextual information”. They introduce a taxonomy of contextual information for business processes consisting of four categories: context related to (i) location, (ii) time, (iii) resources and (iv) organization. In more recent work [18], a meta-model for context has been defined. The meta-model comprises context entities and context attributes, and context entities are connected to each other via context relationships. We leverage this meta-model in our work, using context entities such as process and resource together with their related contextual attributes. The contextual attributes of each process instance are used in conjunction with process outcomes to extract resource allocation rules. Ghattas et al. [4, 5] use process context and process outcomes from execution histories to discover decisions taken in the past. In their work, the authors model the process context and outcomes. The definition of context is based on a Generic Process Model defined by the authors, in which external events that are outside the control of process execution are referred to as context (something we leverage in our conception of process context). Process instances containing contextual information and process outcomes are clustered, and a decision tree is built to discover decisions taken in the past. The approach has been validated on a clinical process [3]. Our work is similar in that it uses past execution histories and context information and links them with process outcomes. However, we focus in greater detail on specific contextual characteristics of resources, which have an impact on both process outcomes and allocation decisions.
Other authors have defined process context as the “Minimum set of variables containing all relevant information that impact the design and execution of a business process.” [16] (our conception of process context is somewhat more specific than this definition). A large body of additional work exists on context-aware approaches to information systems, as well as on formal approaches to the modeling of context, but space constraints preclude a detailed discussion of these.

Resource Allocation Recommendation: Early work by Kumar et al. [12] proposed the dynamic allocation of tasks to resources, considering factors such as suitability, availability, urgency and conformance. Their more recent work [11] highlighted the role of cooperation among the team members involved in the process and developed an allocation algorithm to maximize team cooperation. The authors highlighted the need to examine the impact of cooperation on throughput and other process outcomes. In this work, we consider multiple such resource-specific contextual characteristics, in addition to process outcomes, to discover resource allocation rules. Work experience is an important contextual characteristic of a (human) resource that influences the allocation of tasks. Sonja et al. [10] define various measures of experience. The authors further describe an experience breeding model [9] for maintaining the experience levels of resources. Detailed modeling of experience enables better evaluation of resource allocation decisions. In our work, we use experience as one of the contextual characteristics of a resource. Nakatumba et al. [13] analyzed the influence of workload on service times. The authors use event logs to extract service times and the workload of a resource, and build a regression model with workload as the single predictor of service time. While this model is useful for comparing specific resources and their efficiencies, it is limited because several resource factors influence service times. In this work, we use predictors that are guided by the context model. Pika et al. [14] defined resource behavior indicators and provide a framework for extracting these indicators from event logs and highlighting how they change over time. Huang et al. [7] present resource behavior measures for competence, preference, availability and collaboration.
Enriching the resource model to include additional resource characteristics has also been described in [22]. In our work, we use resource behavior indicators as resource context and discover the influence of these behaviors on process outcomes. Recommending the next action to take, based on a user’s current execution history and specific goal, has been described in [19]. That approach evaluates the history of past executions to mine recommendations; it focuses on the control flow and does not consider context. Predicting process outcomes in terms of task duration has been evaluated in [20]; however, the resources involved and the context of the executing process have not been taken into account. In recent work [1], the authors present a general framework to derive and correlate process characteristics. The framework does not consider the contextual characteristics of processes and resources, or their influence on the process characteristics.

3 Example

For the purpose of illustration, we describe an example process that is used throughout the paper. Consider a process for handling vehicle repair and maintenance in a garage. Figure 1 illustrates the business process. The process starts with a ‘Receptionist’ receiving the vehicle (task ‘Receive Vehicle’). A ‘Supervisor’ inspects the vehicle (task ‘Inspect Vehicle’). After inspection, a decision is made to either repair the vehicle or send it for regular maintenance. The vehicle is either repaired (task ‘Repair Vehicle’) or goes through maintenance (task ‘Vehicle Maintenance’). These tasks are performed by a mechanic. The supervisor finally checks the vehicle (task ‘Check Quality’). The process ends with the receptionist handing over the vehicle to the owner (task ‘Hand over Vehicle’).

In this process, certain attributes are defined as part of the process design: the process type, indicating whether a vehicle requires repair or maintenance; the car models that the process supports; and the organization model of the process, with resources, their roles and capabilities. Other aspects depend on the environment or situation during process execution: the problem list associated with each car; the utilization of mechanics at a specific point in time (the number of tasks a mechanic is currently working on when multi-tasking, or the number of tasks waiting in the mechanic's queue); the preference of a mechanic for working on a given car model; and the collaboration of a mechanic with a co-worker. These aspects do impact process execution but are not modeled as part of the design; they become contextual characteristics of a process instance or resource. For instance, if a particular vehicle problem has been handled in the past by only one mechanic, then it is preferable to allocate the task to that mechanic. Contextual characteristics of the resource and the process instance are considered during task allocation and form part of the experience gained by the person allocating tasks.

Process outcomes are another important aspect; they are defined in advance and need to be assessed during execution. In this process, there is a goal for the completion time of the process: the repair of a car should take no more than 3 days, while car maintenance should take no more than 5 h. A process instance may succeed or fail in meeting the goal. Our approach uses the process outcome, the process instance attributes and the contextual characteristics of the process instance and of the resources involved to discern allocation rules.

Fig. 1. Process model for vehicle maintenance and repair

4 The General Setting

In this section, we explain the notion of process context and describe the key data items that are used by the data mining machinery outlined in the following section. We define the process context to consist of exogenously determined knowledge potentially relevant to the execution of the process that is available at the start of the execution of the process, and that is not impacted/modified via the execution of the process. The intent is to capture the knowledge/data that does not fall under the ambit of the traditional notion of process data (or process attributes) but can be an important determinant of the performance of a process instance. It is critical that only exogenously determined data (i.e., determined not by the process but by the “rest of the world”) constitute the process context. In contrast, process attributes (or process data) include endogenously determined elements (i.e., attributes whose values are determined via the execution of the process) as well as data provided as input to the process. In general, the process context can be dynamic, i.e., exogenously determined knowledge relevant to the process might change while the process executes. For the purposes of this paper, we make the simplifying assumption that only the context that holds at the start of the execution of the process is of interest. A particularly interesting type of contextual knowledge is knowledge about resources (resource characteristics are typically not determined, impacted, or provided as input by the execution of a process, and thus correctly belong to the process context). Thus, the experience of a vehicle repairer (i.e., a mechanic) is part of the process context in the example in the previous section. Contextual knowledge unrelated to resources can also be of interest.
For instance, a history of executions of an insurance claim handling process might suggest that instances tend to perform poorly (in terms of completion time, cost or number of problem escalations) during periods of financial market volatility. Financial market volatility might thus be an important contextual attribute that determines the performance of the claim handling process. The context can be of two types: (i) generic and relevant to all processes and (ii) domain specific [17]. Some of the generic contextual characteristics defined in [17] are reusable across processes, while domain-specific contextual characteristics need to be identified by domain experts.

We assume that the process context is modeled by a set of attribute-value pairs C. Other approaches to modeling the context are possible, such as truth-functional assertions in an appropriate logic; our approach is quite general, and the overall framework remains valid even if we adopt alternative representation schemes for the context. Our knowledge about the resources available to a process is also part of the contextual knowledge that can be brought to bear (resource attributes are typically not part of process data, and hence satisfy our definition of contextual knowledge). We sometimes find it convenient to denote knowledge about resources as \(C_r \subseteq C\) and those parts of contextual knowledge that do not pertain to resources as \(C_p\), where \(C_p = C - C_r\). We use a set of attribute-value pairs X to denote process data in the usual sense, i.e., data provided as input to a process, data modified or impacted by a process and data generated as output by a process. We note that the signature of X (i.e., the schema for process data) is associated with a process design, while an actual set of attribute-value pairs is associated with a process instance. We use A to denote the set of all activities that form part of a process design. Finally, we are interested in the (non-functional) outcomes (or performance) of a process (we aim to predict these for a process instance, and to provision processes to achieve desired outcomes). We use a set of non-functional attribute (or QoS factor)-value pairs O to denote the outcome of a given process instance. The signature of O is associated with a process design, and represents the set of non-functional attributes that can be used to assess the performance of an instance of that design.

Our approach relies on being able to mine an execution history represented by a set of process instances and their associated process contexts. On occasion, we also leverage a record of a partially executed process instance to determine the best resource to allocate to a process task (based on knowledge mined from the execution history).

Definition 1

(Process Instance). A process instance is a tuple

\(PI = \langle v_x,v_a, C,v_o \rangle \), where:

  • \(v_x = \{v_x^1, \dots, v_x^{i}\} \subseteq X\) is a set of attribute-value pairs representing the available process data for that instance.

  • \(v_o = \{v_o^1, \dots, v_o^{j}\} \subseteq O\) is a set of \(\langle \)non-functional-attribute, value\(\rangle \) pairs, i.e., outcomes.

  • \(v_a = \{a_i \mid a_i \in A \wedge f_{executed}(a_i) = true\}\) is the set of activities that were executed in that process instance, where \(f_{executed}(a_i) = true\) denotes that activity \(a_i\) was executed in the process.

  • C is the set of attribute-value pairs of the process context.
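As an illustration only, Definition 1 and the split of C into \(C_r\) and \(C_p = C - C_r\) can be mirrored by a small Python data structure. This is a minimal sketch under our own assumptions: the class and method names, the attribute Utilization.RepairV, all attribute values, and the choice to pass the resource-related context keys explicitly are hypothetical and not part of the approach as described.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Set


@dataclass
class ProcessInstance:
    """PI = <v_x, v_a, C, v_o> as in Definition 1 (illustrative sketch)."""
    v_x: Dict[str, Any]   # process data: attribute -> value
    v_a: FrozenSet[str]   # activities executed in this instance
    C: Dict[str, Any]     # process context: attribute -> value
    v_o: Dict[str, Any]   # outcomes: non-functional attribute -> value

    def resource_context(self, resource_keys: Set[str]) -> Dict[str, Any]:
        """C_r: context attributes describing resources (or resource-task pairs)."""
        return {k: v for k, v in self.C.items() if k in resource_keys}

    def non_resource_context(self, resource_keys: Set[str]) -> Dict[str, Any]:
        """C_p = C - C_r: the remaining context attributes."""
        return {k: v for k, v in self.C.items() if k not in resource_keys}


# Hypothetical instance of the vehicle repair process (all values invented).
pi = ProcessInstance(
    v_x={"isRepair": True, "vehicleModel": "XY", "customerType": "premium"},
    v_a=frozenset({"Receive Vehicle", "Inspect Vehicle", "Repair Vehicle",
                   "Check Quality", "Hand over Vehicle"}),
    C={"TimeOfDay": "morning", "Experience.RepairV": "HIGH", "Utilization.RepairV": 1},
    v_o={"completionTime": 16.5, "metServiceLevel": True},
)
# Hypothetical choice of which context keys belong to C_r.
RESOURCE_KEYS = {"Experience.RepairV", "Utilization.RepairV"}
print(pi.resource_context(RESOURCE_KEYS))      # C_r
print(pi.non_resource_context(RESOURCE_KEYS))  # C_p
```

Keeping resource attributes keyed by resource-task pair (for example Experience.RepairV) rather than by a person's identity matches the representation the approach uses for the decision tree and k-NN steps.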

5 Proposed Approach

Our intent is to provide data-enabled decision support for allocating resources to process tasks. We achieve this in two ways: (1) by applying decision tree learning and (2) by deploying the k-nearest neighbor algorithm.

Decision Tree Learning: The key problem we solve is as follows. Given:

  • An execution history of process instances and their associated process contexts as defined above and

  • A description of the process context as defined above,

Compute:

  • A decision tree which enables us to predict the performance of a (potentially partially executed) process instance.

Given the decision tree that is mined, we are able to answer the following questions:

  • Given a specification of context and process data, we are able to predict the performance of the process. We can make predictions about performance even in the presence of partial specifications of context and process data.

  • We can extract rules from the decision tree that identify what states of the context and what process data are likely to lead to a given process outcome. These rules provide important guidance in process provisioning decisions.

In both the above modes of leveraging the learnt decision tree, we rely on the important observation that the context often contains detailed knowledge about the resources that might potentially be used in a process instance. We also represent knowledge about resource-task pairs in the context, rather than about resources in isolation (e.g., a specific person or a specific machine). For example, the experience of the resource used for the repair task in Fig. 3 is represented by the context attribute Experience.RepairV; this tells us what the experience of the repairer was, independent of the identity of the specific individual.

We first cluster process instances using the process attributes (\(v_x\)) and process activities (\(v_a\), indicating the path of the process). Clustering groups the process instances so that instances in the same cluster have similar process attributes and execution paths. We use the two-step clustering method [8], as it can handle both categorical and numerical data and identifies the optimal number of clusters from the data. However, any Gaussian mixture model based clustering with a suitable distance metric to identify (dis)similar process instances can be used [2]. The intent behind clustering is to mine decision trees only from clusters of similar process instances, and not from across the board. The next step is to generate a decision tree model using the outcome(s) (\(v_o\)) as the target variable(s) and the context attributes (C) as predictor/independent variables. Our approach is best illustrated by the example in Fig. 3. The root node of the decision tree in Fig. 3 is the process outcome. At each branch, a branching criterion is used to determine which predictor variable is best suited to split the process instances. At the first branch, customerType is used to split the process instances: 35 % of the process instances have the value \(customerType=premium\), and the remaining 65 % have \(customerType=normal\) (the branches below this node are not detailed due to lack of space). The next split of the tree, for premium customers, is based on the experience of the repairer. The percentage of process instances having a specific value of the process outcome is available at each node. The next predictor used for splitting the tree is the preference of the repairer, and the final split is based on the utilization of the repairer. Given the attributes of a resource-task pair, the tree helps predict the process outcome. In Fig. 3, if \(customerType=premium\) and the experience of the repairer is low \((Experience.RepairV=LOW)\), then the probability of meeting the service level is very low: \(0.19 \times 0.02 = 0.0038\), i.e., about 0.4 %.
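The rule-extraction step described above can be sketched as a walk over every root-to-leaf path that collects the split conditions along the way. This is a minimal sketch assuming a tree stored as nested Python dictionaries; apart from the 35 %/65 % split on customerType and the 0.19 and 0.02 figures quoted above (which we read, as an assumption, as a branch fraction and a leaf probability), every number is a hypothetical placeholder rather than a value taken from Fig. 3, and the preference and utilization splits are omitted.

```python
# Each internal node splits on one attribute; each branch stores
# (fraction of the parent's instances following it, child node).
# Leaves store the fraction of their instances that met the service level.
TREE = {
    "split": "customerType",
    "branches": {
        "premium": (0.35, {
            "split": "Experience.RepairV",
            "branches": {
                "LOW": (0.19, {"leaf": 0.02}),
                "HIGH": (0.81, {"leaf": 0.70}),   # placeholder value
            },
        }),
        "normal": (0.65, {"leaf": 0.90}),          # placeholder value
    },
}


def extract_rules(node, conditions=(), coverage=1.0):
    """Yield (conditions, coverage, p_met) for every root-to-leaf path.

    coverage is the product of branch fractions along the path, i.e. the
    share of all instances that satisfy the rule's conditions.
    """
    if "leaf" in node:
        yield list(conditions), coverage, node["leaf"]
        return
    attr = node["split"]
    for value, (fraction, child) in node["branches"].items():
        yield from extract_rules(child, conditions + ((attr, value),), coverage * fraction)


for conds, cov, p_met in extract_rules(TREE):
    lhs = " AND ".join(f"{a} = {v}" for a, v in conds)
    print(f"IF {lhs} THEN P(metServiceLevel = true) = {p_met:.2f} (covers {cov:.1%} of instances)")
```

Each yielded path is one IF-THEN rule; a provisioning decision would keep the rules whose leaf probability of the desired outcome is high and read off the resource-task attributes in their conditions.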

k-Nearest Neighbor (k-NN): This approach is one of the options available when the intent is to provide decision support for allocating resources to process tasks in partially executed process instances. We provide as input the process data, the sequence of tasks executed thus far in the process instance and the desired outcomes (assignments of values to non-functional attributes). The k-nearest neighbor algorithm [6] identifies past process instances that are similar to the current instance. We use k-NN regression to identify the contextual conditions (specifically, those parts of the context that represent knowledge about resource-task pairs) that would lead to the desired outcome/performance of the instance. k-NN regression thus provides the attributes of the resource to be deployed for a given task. In the setting of the example in Fig. 3, k-NN regression might indicate, based on the neighbors most similar in terms of process data and the partial sequence of tasks executed, that using a repairer with high experience is most likely to lead to a good outcome (in this case, the service level being met). k-NN regression relies on averaging the attribute values of the nearest neighbors; for discrete-valued attributes, we use majority voting among the nearest neighbors.
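The k-NN step can be sketched in standard-library Python. This is a minimal illustration under stated assumptions: each past instance is a pair of a feature dictionary (process data plus outcome values; executed-task indicators could be added in the same way) and a context dictionary; numeric features contribute squared differences while a categorical mismatch contributes 1, which is one common way of applying a Euclidean-style distance to mixed data (how categorical attributes were encoded is our assumption); numeric features are not rescaled; and all attribute names and data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def _is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def distance(a, b):
    """Euclidean distance; a categorical mismatch contributes 1, a match 0."""
    total = 0.0
    for key in a:
        x, y = a[key], b[key]
        total += (x - y) ** 2 if _is_number(x) else float(x != y)
    return math.sqrt(total)


def suggest_context(query, history, k=5):
    """Aggregate the context of the k past instances nearest to `query`:
    mean for numeric attributes (k-NN regression), majority vote otherwise."""
    nearest = sorted(history, key=lambda h: distance(query, h[0]))[:k]
    contexts = [ctx for _, ctx in nearest]
    suggestion = {}
    for attr in contexts[0]:
        values = [c[attr] for c in contexts]
        if all(_is_number(v) for v in values):
            suggestion[attr] = mean(values)
        else:
            suggestion[attr] = Counter(values).most_common(1)[0][0]
    return suggestion


# Hypothetical history: (features incl. achieved completionTime in hours, resource context).
history = [
    ({"isRepair": False, "customerType": "premium", "vehicleModel": "XY", "completionTime": 4.0},
     {"Experience": "HIGH", "Utilization": 1}),
    ({"isRepair": False, "customerType": "premium", "vehicleModel": "XY", "completionTime": 4.5},
     {"Experience": "HIGH", "Utilization": 2}),
    ({"isRepair": False, "customerType": "premium", "vehicleModel": "AB", "completionTime": 3.5},
     {"Experience": "LOW", "Utilization": 0}),
    ({"isRepair": True, "customerType": "normal", "vehicleModel": "XY", "completionTime": 20.0},
     {"Experience": "LOW", "Utilization": 3}),
]
# Current (partially executed) instance plus the desired outcome.
query = {"isRepair": False, "customerType": "premium", "vehicleModel": "XY", "completionTime": 4.0}
suggestion = suggest_context(query, history, k=3)  # k = 3 only because the history is tiny
print(suggestion)
```

Including the desired outcome among the features is what steers the neighbors toward instances that achieved that outcome; in practice, numeric features would usually be normalized so that no single attribute dominates the distance.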

Figure 2 illustrates our approach to provide data-enabled decision support.

Fig. 2. Approach for context-aware analysis of resource allocations

6 Evaluation

This section presents two evaluations: the first uses synthetic execution logs and the second uses a subset of a real-world event log. The evaluation on synthetic data aims to verify that the approach can discover context-dependent task allocation rules. The real-world data is used to assess whether context can be extracted from event logs and used to gain insights.

6.1 Evaluation Using Simulated Process Instances

The synthetic data is created by simulating process instances of the car repair and maintenance process described in Sect. 3. The context comprises the process context \(C_p\) and the resource context \(C_r\).

Attributes of \(C_p=\{problemType,TimeOfDay,caseHandling\}\)

problemType is the problem reported by the customer that needs to be addressed. caseHandling is a domain-specific context attribute that is set to true if the supervisor who inspected the car is the same as the supervisor performing the quality check.

Attributes of \(C_r= \{Experience,Preference,Collaboration,Utilization\}\)

The context of a resource includes availability, proximity, competency, experience, collaboration sensitivity, age, gender and so on [17]. Some of these resource contextual characteristics describe the behavior of the resource, such as utilization, preference and collaboration, and have been identified and measured in previous work [7, 14].

The schema for process data is given by

\(X=\{isRepair,vehicleModel,customerType\}\)

isRepair indicates whether the request is for repair, vehicleModel is the model of the vehicle to be repaired or maintained, and customerType indicates whether the customer is a premium or a normal customer.

\(O=\{completionTime,metServiceLevel\}\)

completionTime is the time taken for the process to complete. metServiceLevel indicates whether the service level defined for the customer type was met. In the example scenario, for a premium customer, metServiceLevel is true if \(completionTime \le 18\,h\) for repair and \(\le 2\,h\) for maintenance. For a normal customer, metServiceLevel is true if \(completionTime \le 24\,h\) for repair and \(\le 3\,h\) for maintenance.
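The service-level definition above amounts to looking up a threshold by customer type and request type. A minimal sketch, assuming completionTime is measured in hours; the table and function names are hypothetical.

```python
# Maximum completion time in hours, keyed by (customerType, isRepair),
# as stated in the service levels above.
THRESHOLDS_H = {
    ("premium", True): 18.0,
    ("premium", False): 2.0,
    ("normal", True): 24.0,
    ("normal", False): 3.0,
}


def met_service_level(customer_type: str, is_repair: bool, completion_time_h: float) -> bool:
    """True if the instance finished within the threshold for its customer and request type."""
    return completion_time_h <= THRESHOLDS_H[(customer_type, is_repair)]


# Maintenance taking 2.5 h misses the premium level but meets the normal one.
print(met_service_level("premium", False, 2.5))
print(met_service_level("normal", False, 2.5))
```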

Process Instances Generated for the Model. A simulation model is used to generate process instances based on the process context model. Gaussian distribution functions are used to generate values for the process context, the resource context and the process attributes. The completion time is generated by considering the context and attributes of the process, with additional randomness added to imitate real-world settings. In total, 10,000 process instances are simulated. The process instance data is used as follows:

Table 1. Importance of Predictor with metServiceLevel as the target
Fig. 3. Decision tree depicting one path from the root node to leaf nodes for metServiceLevel prediction

Decision Tree Learning: This step starts with clustering the process instances based on process attributes. Here, the process instances are clustered on the process attribute isRepair, which indicates whether a vehicle is in for repair. A decision tree is built with metServiceLevel as the target variable and the context as predictors. The Chi-square Automatic Interaction Detection (CHAID) algorithm is used to construct the decision tree [15]. Table 1 shows the predictor importance. The most important predictor is customerType, because the service level is stringent for a premium customer and relaxed for a normal customer. Resource context variables, such as the experience and the preference of the resource performing the repair task, also help predict the process outcome. The decision tree (Fig. 3) predicts the outcome with 95.3 % accuracy. Task allocation rules for a premium customer can be derived from the tree. One such rule for task allocation would be:

then \((metServiceLevel=true)\)

The variable Experience.RepairV denotes the experience of the resource performing the Repair Vehicle activity. The rule indicates that, for the repair task of a premium customer, a resource with high experience, higher preference and a utilization of \(\le 1\) should be chosen for a higher probability of a successful outcome.
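CHAID grows the tree by choosing, at each node, the predictor whose association with the target is most significant under a chi-square test. The sketch below shows only that selection step, in simplified form: it assumes categorical predictors, computes Pearson's chi-square statistic and its p-value (via the closed-form chi-square tail for integer degrees of freedom), and omits CHAID's merging of predictor categories and its Bonferroni adjustment. The data are hypothetical, and this is not the implementation used in the evaluation.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus the finite series sum_r x^(r - 1/2) / (2r - 1)!!.
    series, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(math.sqrt(x / 2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * series


def chi2_statistic(pairs):
    """Pearson chi-square statistic and degrees of freedom for (predictor, target) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(p for p, _ in pairs)
    cols = Counter(t for _, t in pairs)
    stat = sum((joint[(r, c)] - nr * nc / n) ** 2 / (nr * nc / n)
               for r, nr in rows.items() for c, nc in cols.items())
    return stat, (len(rows) - 1) * (len(cols) - 1)


def best_split(instances, predictors, target):
    """Return (predictor, p-value) for the predictor most significantly associated with the target."""
    best = None
    for attr in predictors:
        stat, df = chi2_statistic([(inst[attr], inst[target]) for inst in instances])
        p = chi2_sf(stat, df) if df > 0 else 1.0
        if best is None or p < best[1]:
            best = (attr, p)
    return best


# Hypothetical instances: customerType separates the outcome, TimeOfDay barely does.
data = (
    [{"customerType": "premium", "TimeOfDay": "am", "metServiceLevel": False}] * 6
    + [{"customerType": "premium", "TimeOfDay": "pm", "metServiceLevel": False}] * 4
    + [{"customerType": "normal", "TimeOfDay": "am", "metServiceLevel": True}] * 5
    + [{"customerType": "normal", "TimeOfDay": "pm", "metServiceLevel": True}] * 5
)
print(best_split(data, ["TimeOfDay", "customerType"], "metServiceLevel"))
```

Comparing p-values rather than raw statistics matters because predictors with more categories produce contingency tables with more degrees of freedom.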

K-Nearest Neighbor: Another useful scenario is supporting task allocation decisions during process execution. In this scenario, a process may have been partially executed (or may be in its initial state). The newly executing process instance and its target outcome are given as input. In the example, we provide \(\{isRepair=false,customerType=premium,vehicleModel=XY\}\) and a completionTime of 4 h as input. The input values are matched against past process instances. The K-Nearest Neighbor (K-NN) algorithm is used to find the process instances that are closest to the currently executing instance. Distance functions exist that handle both continuous and categorical data; for our evaluation, we use the Euclidean distance measure. Statistical packages such as SPSS [8] provide an estimate of K; for our experiment, K is set to 5.

The context values of the nearest neighbors (averages for continuous values and majority votes for discrete values) are used as input to find matching resources. Table 2 shows the key context attributes required for the maintenance of a vehicle for a premium customer (process attribute) with a target completionTime of 4 h (outcome). The matching resource is identified by selecting resources with the same experience, utilization and preference.

Table 2. Nearest neighbors and resource recommendations for a premium customer with 4 h completionTime

A similar K-NN model, built for a process requiring maintenance of a vehicle for a normal customer with a target completionTime of 6 h, indicates that a resource with lower experience and higher preference is capable of meeting the outcome. Hence, the resource context required for a given process outcome varies with the process attribute values.
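The final matching step (selecting resources whose context agrees with the values aggregated from the nearest neighbors) might look as follows. This is a minimal sketch assuming a hypothetical pool of resources described by the same context attributes; since averaged numeric values such as utilization are rarely integers, it accepts numeric values within a tolerance `tol`, which is our assumption rather than part of the described approach.

```python
def _matches(want, have, tol):
    """Numeric attributes match within tol; everything else must be equal."""
    if isinstance(want, (int, float)) and not isinstance(want, bool):
        return isinstance(have, (int, float)) and abs(have - want) <= tol
    return have == want


def matching_resources(required, resources, tol=0.5):
    """IDs of resources whose context matches every required attribute value."""
    return [rid for rid, ctx in resources.items()
            if all(_matches(want, ctx.get(attr), tol) for attr, want in required.items())]


# Hypothetical aggregated neighbor context and resource pool.
required = {"Experience": "HIGH", "Preference": "HIGH", "Utilization": 1.2}
resources = {
    "m1": {"Experience": "HIGH", "Preference": "HIGH", "Utilization": 1},
    "m2": {"Experience": "HIGH", "Preference": "LOW", "Utilization": 1},
    "m3": {"Experience": "HIGH", "Preference": "HIGH", "Utilization": 3},
}
print(matching_resources(required, resources))
```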

Table 3. Importance of Predictor with metServiceLevel as the target

6.2 Evaluation Using Real-World Event Log

The approach was also evaluated on a real-world event log. To this end, we used the logs from the BPI Challenge 2013 [21]. The data set comprises logs from an IT incident management system. An incident is created when there is an issue in an IT application. Each incident has an associated impact and relates to a product of the enterprise. A resource is allocated the task of resolving the issue.

Lack of information about the domain limits our ability to model the process attributes or the context of the process. Hence, the process context model is limited to generic attributes such as the TimeOfDay of the incident. The process attributes are the impact of the incident and the product associated with it. The resource context is derived from the event logs: we use resource behavior measures computed from event logs, based on the work presented in [7, 14]. The evaluation is done on the subset of instances in which a single resource resolves an incident. Incidents involving multiple resources do not make clear how much time each resource spent on the incident and are therefore not used.

The process outcome is based on the completion time: a process instance with a completion time of less than 1 day is labeled as having met the service level (metServiceLevel). A decision tree is built using the process attributes and context. The model predicts the process outcome with 88.2 % accuracy. Table 3 presents the predictor importance for the process outcome. The utilization of the resource has the greatest impact on the outcome, followed by the preference of the resource. The impact of the incident, a process attribute, also influences the outcome. Experience of the resource has lower importance in the model; however, this model used only two experience levels (level1, level2), based on the organization or team to which the resource belonged.

Several other factors that could influence the outcome were not used in the evaluation on the real-world event log; including them would require access to additional information available in the PAIS. Nevertheless, the current results indicate that process context has an impact on the process outcome.
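The preprocessing described for the BPI Challenge 2013 log (keeping only incidents resolved by a single resource, and labeling an instance as meeting the service level when it completes in less than one day) can be sketched as below. The sketch assumes events are available as hypothetical (case ID, resource, timestamp) tuples and takes the completion time to be the span between a case's first and last events; neither the tuple layout nor this definition of completion time is taken from the original study, and the sample events are invented.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def label_incidents(events, max_duration=timedelta(days=1)):
    """Keep single-resource cases and label metServiceLevel = completion time < max_duration."""
    by_case = defaultdict(list)
    for case_id, resource, ts in events:
        by_case[case_id].append((resource, ts))
    labels = {}
    for case_id, evs in by_case.items():
        if len({r for r, _ in evs}) != 1:
            continue  # several resources: time per resource is unclear, so the case is excluded
        stamps = [ts for _, ts in evs]
        duration = max(stamps) - min(stamps)
        labels[case_id] = {"completionTime": duration,
                           "metServiceLevel": duration < max_duration}
    return labels


# Hypothetical events: (case ID, resource, timestamp).
events = [
    ("inc-1", "r1", datetime(2013, 1, 7, 9, 0)),
    ("inc-1", "r1", datetime(2013, 1, 7, 17, 30)),
    ("inc-2", "r2", datetime(2013, 1, 7, 10, 0)),
    ("inc-2", "r2", datetime(2013, 1, 9, 10, 0)),
    ("inc-3", "r1", datetime(2013, 1, 8, 8, 0)),
    ("inc-3", "r3", datetime(2013, 1, 8, 12, 0)),
]
labels = label_incidents(events)
print(labels)
```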

7 Conclusion and Future Work

This paper shows how a history of past process instances and their associated contexts can be mined to provide guidance in resource allocation decisions for a currently executing process instance. Past research has analyzed resource behavior or context, but in isolation; the work presented here uses them in conjunction with the process context and outcomes. This work further uses real-world event logs to derive resource context and discover the influence of the context on process outcomes. The effectiveness of such approaches is limited by the insufficient information captured in real-world process systems. As an extension of this work, we intend to define a taxonomy for process execution logs that would support the derivation of rich context information for a process. This would further enable us to evaluate the approach on real-world data and to better understand its applicability to a wide variety of business processes.