Obfuscating the Dataset: Impacts and Applications

Published: 30 September 2023

Abstract

Obfuscating a dataset by adding random noise to protect the privacy of sensitive samples in the training data is crucial for preventing leakage to untrusted parties when dataset sharing is essential. We conduct comprehensive experiments to investigate how dataset obfuscation affects the resultant model weights in terms of model accuracy, ℓ2-distance-based model distance, and level of data privacy, and we discuss potential applications with the proposed Privacy, Utility, and Distinguishability (PUD) triangle diagram, which visualizes requirement preferences. Our experiments use the popular MNIST and CIFAR-10 datasets under both independent and identically distributed (IID) and non-IID settings. Significant results include a tradeoff between model accuracy and privacy level, and a tradeoff between model difference and privacy level. These results indicate broad application prospects for outsourcing training and for guarding against attacks in federated learning, both of which have become increasingly attractive in many areas, particularly learning in edge computing.
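The core mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all names (`obfuscate`, `sigma`) are assumptions, and the noise model (zero-mean i.i.d. Gaussian on pixel values) is one plausible instantiation of "adding random noise," with `sigma` playing the role of the privacy knob.

```python
# Minimal sketch of noise-based dataset obfuscation: perturb each
# training sample with zero-mean Gaussian noise whose standard
# deviation `sigma` controls the privacy level. Illustrative only;
# names and noise model are assumptions, not taken from the paper.
import numpy as np

def obfuscate(images: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Add i.i.d. Gaussian noise to pixel values, clipped back to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep a valid pixel range

# Toy demonstration on random "images": larger sigma means larger
# l2 distortion of the dataset (lower utility, higher privacy),
# echoing the accuracy/privacy tradeoff reported in the experiments.
x = np.random.default_rng(1).random((8, 28, 28))
for sigma in (0.05, 0.2, 0.5):
    dist = np.linalg.norm(obfuscate(x, sigma) - x)
    print(f"sigma={sigma}: l2 distortion = {dist:.2f}")
```

In this toy setup the distortion grows monotonically with `sigma`, which is the same axis the PUD triangle trades off against model accuracy and distinguishability.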


Cited By

  • (2024) Data and Model Poisoning Backdoor Attacks on Wireless Federated Learning, and the Defense Mechanisms: A Comprehensive Survey. IEEE Communications Surveys & Tutorials 26, 3, 1861–1897. DOI:10.1109/COMST.2024.3361451. Online publication date: Nov-2025.
  • (2024) Over-the-air federated learning: Status quo, open challenges, and future directions. Fundamental Research. DOI:10.1016/j.fmre.2024.01.011. Online publication date: Feb-2024.

Published In

ACM Transactions on Intelligent Systems and Technology  Volume 14, Issue 5
October 2023
472 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3615589
  • Editor:
  • Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2023
Online AM: 23 May 2023
Accepted: 12 May 2023
Revised: 16 March 2023
Received: 18 September 2022
Published in TIST Volume 14, Issue 5

Author Tags

  1. Data obfuscation
  2. privacy
  3. data leakage
  4. machine learning
  5. federated learning
  6. edge computing

Qualifiers

  • Note

