Abstract
Event logs have become a valuable information source for business process management, e.g., when analysts discover process models to inspect process behavior and infer actionable insights. To this end, analysts configure discovery pipelines in which logs are filtered, enriched, and abstracted, and from which process models are derived. While pipeline operations are necessary to manage log imperfections and complexity, they may also influence the discovered process model and its properties. Not considering this possibility can ultimately harm downstream decision making. We therefore propose a framework for assessing the consistency of model properties with respect to the pipeline operations and their parameters and, if inconsistencies are present, for revealing which parameters contribute to them. Following recent literature on software engineering for machine learning, we refer to this as debugging. Evaluating our framework in a real-world analysis scenario based on complex event logs and third-party pipeline configurations yields strong evidence that it is a valuable addition to the process mining toolbox.
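The idea of checking whether a model property stays consistent across pipeline parameters can be illustrated with a small, self-contained sketch. It is not the framework proposed in the paper: the toy log, the two pipeline parameters (`min_variant_count`, `drop_repeats`), the chosen property (number of directly-follows edges), and the simple main-effect variance share are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

# Hypothetical toy event log: each trace is a tuple of activity labels.
LOG = [("a", "b", "c")] * 6 + [("a", "c", "b")] * 3 + [("a", "b", "b", "c")]


def pipeline(log, min_variant_count, drop_repeats):
    """Filter rare variants, optionally collapse immediate repetitions,
    then derive a directly-follows graph and return its edge count."""
    counts = Counter(log)
    kept = [t for t in log if counts[t] >= min_variant_count]
    if drop_repeats:
        kept = [
            tuple(a for i, a in enumerate(t) if i == 0 or a != t[i - 1])
            for t in kept
        ]
    edges = {(x, y) for t in kept for x, y in zip(t, t[1:])}
    return len(edges)


def first_order_share(results, names, name):
    """Share of the property's variance explained by one parameter alone,
    computed as the variance of its conditional means on a full grid."""
    total = pvariance(list(results.values()))
    if total == 0:
        return 0.0  # property is constant, nothing to attribute
    idx = names.index(name)
    groups = {}
    for config, value in results.items():
        groups.setdefault(config[idx], []).append(value)
    return pvariance([mean(v) for v in groups.values()]) / total


# Assumed parameter space: every combination is evaluated once.
NAMES = ["min_variant_count", "drop_repeats"]
GRID = {"min_variant_count": [1, 2, 4], "drop_repeats": [False, True]}

results = {
    config: pipeline(LOG, *config)
    for config in product(*(GRID[n] for n in NAMES))
}

# The property is consistent if it does not change across configurations.
consistent = len(set(results.values())) == 1
shares = {n: first_order_share(results, NAMES, n) for n in NAMES}

if __name__ == "__main__":
    print("consistent:", consistent)
    for name, share in shares.items():
        print(f"{name}: {share:.3f}")
```

In this toy setting the edge count differs between configurations, and most of that variation is attributable to the variant filter rather than to the repetition handling. A realistic analysis would use an actual discovery algorithm, a sampled rather than exhaustive parameter space, and a more robust sensitivity estimator.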
Notes
- 1. https://dplyr.tidyverse.org, accessed 2021-05-12.
- 2. https://www.bupar.net, accessed 2021-05-12.
- 3. https://pandas.pydata.org, accessed 2021-05-12.
- 4. https://pm4py.fit.fraunhofer.de, accessed 2021-05-12.
- 5. http://www.promtools.org/, accessed 2021-05-12.
- 6. https://www.win.tue.nl/bpi/doku.php?id=2015:challenge, accessed 2021-05-12.
- 7. https://data.4tu.nl/articles/dataset/Sepsis_Cases_-_Event_Log/12707639, accessed 2021-03-12.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Klinkmüller, C., Seeliger, A., Müller, R., Pufahl, L., Weber, I. (2021). A Method for Debugging Process Discovery Pipelines to Analyze the Consistency of Model Properties. In: Polyvyanyy, A., Wynn, M.T., Van Looy, A., Reichert, M. (eds) Business Process Management. BPM 2021. Lecture Notes in Computer Science, vol 12875. Springer, Cham. https://doi.org/10.1007/978-3-030-85469-0_7
DOI: https://doi.org/10.1007/978-3-030-85469-0_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85468-3
Online ISBN: 978-3-030-85469-0
eBook Packages: Computer Science, Computer Science (R0)