DOI: 10.1145/3448016.3457239

HedgeCut: Maintaining Randomised Trees for Low-Latency Machine Unlearning

Published: 18 June 2021
    Abstract

    Software systems that learn from user data with machine learning (ML) have become ubiquitous in recent years. Recent legislation such as the "General Data Protection Regulation" (GDPR) requires organisations that process personal data to delete user data upon request (enacting the "right to be forgotten"). This regulation not only requires the deletion of user data from databases, but also applies to ML models that have been trained on the stored data. We therefore argue that ML applications should offer users the ability to unlearn their data from trained models in a timely manner. We explore how fast this unlearning can be done under the constraints imposed by real-world deployments, and introduce the problem of low-latency machine unlearning: maintaining a deployed ML model in place under the removal of a small fraction of its training samples, without retraining.
    We propose HedgeCut, a classification model based on an ensemble of randomised decision trees, which is designed to answer unlearning requests with low latency. We detail how to implement HedgeCut efficiently with vectorised operators for decision tree learning. In an experimental evaluation on five privacy-sensitive datasets, we find that HedgeCut can unlearn training samples with a latency of around 100 microseconds and answer up to 36,000 prediction requests per second, while providing a training time and predictive accuracy similar to widely used implementations of tree-based ML models such as Random Forests.
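
    To make the abstract's central idea concrete: once a decision tree's structure is fixed, each training sample influences only the statistics of the nodes on its single root-to-leaf path, so forgetting that sample amounts to O(depth) count updates instead of a retrain. Below is a minimal, self-contained Python sketch of this property for a toy extremely randomised tree. The class Node and its unlearn method are illustrative names rather than the paper's implementation, and the sketch deliberately ignores the hard case where a removal would change which split should have been chosen; handling that case efficiently is what HedgeCut itself is about. An ensemble model would hold many such trees, combine their predictions by voting, and forward every unlearning request to all trees.

        import random
        from collections import Counter

        class Node:
            """Toy extremely randomised decision tree node (illustrative only).

            Every node keeps per-class counts of the training samples that
            reached it, so removing one sample only requires decrementing
            counts along its root-to-leaf path."""

            def __init__(self, samples, labels, depth=0, max_depth=4):
                self.counts = Counter(labels)
                self.split = self.left = self.right = None
                if depth < max_depth and len(set(labels)) > 1:
                    # 'extremely randomised': pick attribute and threshold at random
                    attr = random.randrange(len(samples[0]))
                    thr = random.choice([s[attr] for s in samples])
                    goes_left = [s[attr] <= thr for s in samples]
                    if any(goes_left) and not all(goes_left):
                        self.split = (attr, thr)
                        self.left = Node([s for s, g in zip(samples, goes_left) if g],
                                         [c for c, g in zip(labels, goes_left) if g],
                                         depth + 1, max_depth)
                        self.right = Node([s for s, g in zip(samples, goes_left) if not g],
                                          [c for c, g in zip(labels, goes_left) if not g],
                                          depth + 1, max_depth)

            def predict(self, sample):
                if self.split is None:
                    return self.counts.most_common(1)[0][0]
                attr, thr = self.split
                return (self.left if sample[attr] <= thr else self.right).predict(sample)

            def unlearn(self, sample, label):
                """Forget one training sample in place: O(depth) count updates,
                no retraining and no access to the remaining training data."""
                self.counts[label] -= 1
                if self.split is not None:
                    attr, thr = self.split
                    (self.left if sample[attr] <= thr else self.right).unlearn(sample, label)

        X = [[0.1, 1.2], [0.9, 0.3], [0.4, 0.8], [0.7, 0.2]]
        y = ["a", "b", "a", "b"]
        tree = Node(X, y)
        tree.unlearn(X[0], y[0])  # the first sample no longer affects any node statistics
        print(tree.predict([0.5, 0.5]))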
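
    The abstract also credits vectorised operators for HedgeCut's competitive training time. The sketch below shows, assuming NumPy, the general flavour of such an operator: scoring a whole batch of candidate split thresholds for one feature under the Gini criterion with dense array arithmetic, instead of a Python-level loop over thresholds and samples. The function gini_scores and its signature are hypothetical and not taken from the paper.

        import numpy as np

        def gini_scores(feature, labels, thresholds):
            """Weighted Gini impurity of the split 'feature <= t' for every
            candidate threshold t at once (vectorised over thresholds).

            feature: (n,) floats, labels: (n,) ints in {0..k-1},
            thresholds: (t,) floats. Returns a (t,) array of scores."""
            n = feature.shape[0]
            k = int(labels.max()) + 1
            one_hot = np.eye(k)[labels]                                      # (n, k) label indicators
            left = (feature[None, :] <= thresholds[:, None]).astype(float)  # (t, n) routing matrix
            left_counts = left @ one_hot                                    # (t, k) class counts per side
            right_counts = one_hot.sum(axis=0)[None, :] - left_counts
            n_left = left_counts.sum(axis=1)
            n_right = n - n_left

            def gini(counts, size):
                p = counts / np.maximum(size, 1e-12)[:, None]  # avoid 0/0 on empty sides
                return 1.0 - (p ** 2).sum(axis=1)

            return (n_left * gini(left_counts, n_left)
                    + n_right * gini(right_counts, n_right)) / n

        # Extremely-randomised-style usage: draw a few random thresholds, keep the best.
        rng = np.random.default_rng(0)
        x = rng.random(1000)
        y = (x > 0.6).astype(int)
        candidates = rng.choice(x, size=8, replace=False)
        best = candidates[np.argmin(gini_scores(x, y, candidates))]

    Materialising the (t, n) comparison matrix turns the per-threshold scan into a single matrix product over one-hot labels, the same batching idea that motivates vectorised operators in analytical query engines.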

    Supplementary Material

    MP4 File (3448016.3457239.mp4)





      Information

      Published In

      SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
      June 2021
      2969 pages
      ISBN:9781450383431
      DOI:10.1145/3448016
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 June 2021


      Author Tags

      1. decision trees
      2. machine unlearning
      3. serving systems

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '21

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%


      Cited By

      • (2024) The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format. Proceedings of the ACM on Management of Data 2(1), 1-31. https://doi.org/10.1145/3639307
      • (2024) Machine Unlearning: Solutions and Challenges. IEEE Transactions on Emerging Topics in Computational Intelligence 8(3), 2150-2168. https://doi.org/10.1109/TETCI.2024.3379240
      • (2024) An overview of machine unlearning. High-Confidence Computing, 100254. https://doi.org/10.1016/j.hcc.2024.100254
      • (2024) Mitigate noisy data for smart IoT via GAN based machine unlearning. Science China Information Sciences 67(3). https://doi.org/10.1007/s11432-022-3671-9
      • (2023) Certified minimax unlearning with generalization rates and deletion capacity. Proceedings of the 37th International Conference on Neural Information Processing Systems, 62821-62852. https://doi.org/10.5555/3666122.3668866
      • (2023) Equitable Data Valuation Meets the Right to Be Forgotten in Model Markets. Proceedings of the VLDB Endowment 16(11), 3349-3362. https://doi.org/10.14778/3611479.3611531
      • (2023) Machine Unlearning: A Survey. ACM Computing Surveys 56(1), 1-36. https://doi.org/10.1145/3603620
      • (2023) DeltaBoost: Gradient Boosting Decision Trees with Efficient Machine Unlearning. Proceedings of the ACM on Management of Data 1(2), 1-26. https://doi.org/10.1145/3589313
      • (2023) RUE: Realising Unlearning from the Perspective of Economics. 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 1165-1172. https://doi.org/10.1109/TrustCom60117.2023.00159
      • (2023) QoSEraser: A Data Erasable Framework for Web Service QoS Prediction. 2023 IEEE International Conference on Software Services Engineering (SSE), 89-97. https://doi.org/10.1109/SSE60056.2023.00022
