research-article

Free access

Just Accepted

ProcessGAN: Generating Privacy-Preserving Time-Aware Process Data with Conditional Generative Adversarial Nets

Authors:

Keyi Li,

Sen Yang,

Travis M. Sullivan,

Randall S. Burd,

Ivan MarsicAuthors Info & Claims

ACM Transactions on Knowledge Discovery from Data

Accepted on 26 July 2024

https://doi.org/10.1145/3687464

Online AM: 28 August 2024 Publication History

PDF eReader

Abstract

Process data constructed from event logs provides valuable insights into procedural dynamics over time. The confidential information in process data, together with the data's intricate nature, makes the datasets not sharable and challenging to collect. Consequently, research is limited using process data and analytics in the process mining domain. In this study, we introduced a synthetic process data generation task to address the limitation of sharable process data. We introduced a generative adversarial network, called ProcessGAN, to generate process data with activity sequences and corresponding timestamps. ProcessGAN consists of a transformer-based network as the generator, and a time-aware self-attention network as the discriminator. It can generate privacy-preserving process data from random noise. ProcessGAN considers the duration of the process and time intervals between activities to generate realistic activity sequences with timestamps. We evaluated ProcessGAN on five real-world datasets, two that are public and three collected in medical domains that are private. To evaluate the synthetic data, in addition to statistical metrics, we trained a supervised model to score the synthetic processes. We also used process mining to discover workflows for synthetic medical processes and had domain experts evaluate the clinical applicability of the synthetic workflows. ProcessGAN outperformed the existing generative models in generating complex processes with valid parallel pathways. The synthetic process data generated by ProcessGAN better represented the long-range dependencies between activities, a feature relevant to complicated medical and other processes. The timestamps generated by the ProcessGAN model showed similar distributions with the authentic timestamps. In addition, we trained a transformer-based network to generate synthetic contexts (e.g., patient demographics) that were associated with the synthetic processes. The synthetic contexts generated by our model outperformed the baseline models, with the distributions similar to the authentic contexts. We conclude that ProcessGAN can generate sharable synthetic process data indistinguishable from authentic data. Our source code is available in https://github.com/raaachli/ProcessGAN.

References

[1]

Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. Advances in neural information processing systems 28 (2015).

Abstract

References

Index Terms

Recommendations

Privacy-preserving process mining: A microaggregation-based approach

Privacy-Preserving Synthetic Data Generation for Recommendation Systems

Differential Privacy Generative Adversarial Networks in Recommendation Systems

Comments

Information

Published In

Publisher

Publication History

Check for updates

Author Tags

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

View options

PDF

eReader

Get Access

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations