Obfuscating the Dataset: Impacts and Applications

Published: 30 September 2023

Abstract

Obfuscating a dataset by adding random noise to protect the privacy of sensitive samples in the training data is crucial for preventing leakage to untrusted parties when dataset sharing is essential. We conduct comprehensive experiments to investigate how dataset obfuscation affects the resultant model weights in terms of model accuracy, ℓ2-distance-based model distance, and level of data privacy, and we discuss potential applications with the proposed Privacy, Utility, and Distinguishability (PUD) triangle diagram, which visualizes requirement preferences. Our experiments use the popular MNIST and CIFAR-10 datasets under both independent and identically distributed (IID) and non-IID settings. Significant results include a tradeoff between model accuracy and privacy level, and a tradeoff between model difference and privacy level. These results indicate broad application prospects for outsourcing training and for guarding against attacks in federated learning, both of which have become increasingly attractive in many areas, particularly learning in edge computing.
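The core mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all names (`obfuscate`, `sigma`) are assumptions, and the noise model (zero-mean i.i.d. Gaussian on pixel values) is one plausible instantiation of "adding random noise," with `sigma` playing the role of the privacy knob.

```python
# Minimal sketch of noise-based dataset obfuscation: perturb each
# training sample with zero-mean Gaussian noise whose standard
# deviation `sigma` controls the privacy level. Illustrative only;
# names and noise model are assumptions, not taken from the paper.
import numpy as np

def obfuscate(images: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Add i.i.d. Gaussian noise to pixel values, clipped back to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep a valid pixel range

# Toy demonstration on random "images": larger sigma means larger
# l2 distortion of the dataset (lower utility, higher privacy),
# echoing the accuracy/privacy tradeoff reported in the experiments.
x = np.random.default_rng(1).random((8, 28, 28))
for sigma in (0.05, 0.2, 0.5):
    dist = np.linalg.norm(obfuscate(x, sigma) - x)
    print(f"sigma={sigma}: l2 distortion = {dist:.2f}")
```

In this toy setup the distortion grows monotonically with `sigma`, which is the same axis the PUD triangle trades off against model accuracy and distinguishability.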


Cited By

  • (2024) Data and Model Poisoning Backdoor Attacks on Wireless Federated Learning, and the Defense Mechanisms: A Comprehensive Survey. IEEE Communications Surveys & Tutorials 26, 3, 1861–1897. DOI:10.1109/COMST.2024.3361451. Online publication date: Nov-2025.
  • (2024) Over-the-air federated learning: Status quo, open challenges, and future directions. Fundamental Research. DOI:10.1016/j.fmre.2024.01.011. Online publication date: Feb-2024.

Published In

ACM Transactions on Intelligent Systems and Technology  Volume 14, Issue 5
October 2023
472 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3615589
  • Editor:
  • Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2023
Online AM: 23 May 2023
Accepted: 12 May 2023
Revised: 16 March 2023
Received: 18 September 2022
Published in TIST Volume 14, Issue 5

Author Tags

  1. Data obfuscation
  2. privacy
  3. data leakage
  4. machine learning
  5. federated learning
  6. edge computing

Qualifiers

  • Note

