Enhancing Event Log Quality: Detecting and Quantifying Timestamp Imperfections

  • Conference paper
Business Process Management (BPM 2020)

Abstract

Timestamp information recorded in event logs plays a crucial role in uncovering meaningful insights into business process performance and behaviour via Process Mining techniques. Inaccurate or incomplete timestamps may cause activities in a business process to be ordered incorrectly, leading to unrepresentative process models and incorrect process performance analysis results. Thus, the quality of timestamps in an event log should be evaluated thoroughly before the log is used as input for any Process Mining activity. To the best of our knowledge, research on the (automated) quality assessment of event logs remains scarce. Our work presents an automated approach for detecting and quantifying timestamp-related issues (timestamp imperfections) in an event log. We define 15 metrics related to timestamp quality across two axes: four levels of abstraction (event, activity, trace, log) and four quality dimensions (accuracy, completeness, consistency, uniqueness). We adopted the design science research paradigm and drew from knowledge related to data quality as well as event log quality. The approach has been implemented as a prototype within the open-source Process Mining framework ProM and evaluated on three real-life event logs with the involvement of experts from practice. This approach paves the way for a systematic and interactive repair of timestamp imperfections during the data pre-processing phase of Process Mining projects.
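
To make the idea of quantifying timestamp imperfections concrete, the following minimal Python sketch computes two illustrative scores on a toy event log. It is not one of the paper's 15 metrics, whose definitions are not given in the abstract. The log layout, the function names `completeness` and `uniqueness`, and the choice of scores in the range 0 to 1 are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Optional

# Hypothetical toy event: (case id, activity, timestamp or None if missing).
Event = tuple[str, str, Optional[datetime]]


def completeness(events: list[Event]) -> float:
    """Share of events that carry a timestamp (1.0 means none are missing)."""
    if not events:
        return 1.0
    present = sum(1 for _, _, ts in events if ts is not None)
    return present / len(events)


def uniqueness(events: list[Event]) -> float:
    """Share of timestamped events whose (case, timestamp) pair occurs once.

    Events of the same case that share a timestamp cannot be ordered by
    time alone, so a lower score hints at possible ordering problems.
    """
    stamped = [(case, ts) for case, _, ts in events if ts is not None]
    if not stamped:
        return 1.0
    counts = Counter(stamped)
    unique = sum(1 for key in stamped if counts[key] == 1)
    return unique / len(stamped)


# Illustrative data only: one missing timestamp, one duplicated timestamp.
log: list[Event] = [
    ("c1", "register", datetime(2020, 1, 1, 9, 0)),
    ("c1", "check", datetime(2020, 1, 1, 9, 0)),
    ("c1", "decide", None),
    ("c2", "register", datetime(2020, 1, 2, 8, 30)),
]

print(f"completeness = {completeness(log):.2f}")
print(f"uniqueness   = {uniqueness(log):.2f}")
```

On this toy log, three of four events carry a timestamp, and only one of the three timestamped events is unique within its case. Scores of this kind could be aggregated per event, activity, trace, or log, but how the paper does so is not stated here.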

Acknowledgements

We would like to thank Queensland’s Motor Accident Insurance Commission and the Queensland University of Technology for allowing us access to their datasets.

Author information

Corresponding author

Correspondence to Dominik Andreas Fischer.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Fischer, D.A., Goel, K., Andrews, R., van Dun, C.G.J., Wynn, M.T., Röglinger, M. (2020). Enhancing Event Log Quality: Detecting and Quantifying Timestamp Imperfections. In: Fahland, D., Ghidini, C., Becker, J., Dumas, M. (eds) Business Process Management. BPM 2020. Lecture Notes in Computer Science, vol 12168. Springer, Cham. https://doi.org/10.1007/978-3-030-58666-9_18

  • DOI: https://doi.org/10.1007/978-3-030-58666-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58665-2

  • Online ISBN: 978-3-030-58666-9

  • eBook Packages: Computer Science, Computer Science (R0)
