DOI: 10.1145/3548606.3559355

Research Article

SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders

Published: 07 November 2022

Abstract

Self-supervised learning is an emerging machine learning (ML) paradigm. Whereas supervised learning leverages high-quality labeled datasets, self-supervised learning relies on unlabeled data to pre-train powerful encoders, which can then serve as feature extractors for various downstream tasks. The large amounts of data and computation consumed during pre-training make the encoders themselves valuable intellectual property of the model owner. Recent research has shown that an ML model's copyright is threatened by model stealing attacks, which train a surrogate model to mimic the behavior of a given model. We empirically show that pre-trained encoders are highly vulnerable to model stealing attacks. However, most current copyright protection algorithms, such as watermarking, concentrate on classifiers, and the intrinsic challenges of protecting pre-trained encoders remain largely unstudied. We fill this gap by proposing SSLGuard, the first watermarking scheme for pre-trained encoders. Given a clean pre-trained encoder, SSLGuard injects a watermark into it and outputs a watermarked version. A shadow training technique is also applied to preserve the watermark under potential model stealing attacks. Our extensive evaluation shows that SSLGuard is effective in watermark injection and verification, and that it is robust against model stealing and other watermark removal attacks such as input noising, output perturbing, overwriting, model pruning, and fine-tuning.
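To make the threat and defense described above concrete, the following is a minimal, hypothetical sketch (not the authors' code): part (1) shows a model stealing attack that queries a victim encoder as a black box and distills its embeddings into a surrogate; part (2) shows a schematic watermark verification step in which a private decoder maps the embeddings of a secret verification set onto a secret key vector. All names (victim_encoder, surrogate, decoder, secret_key) and the cosine-similarity threshold are illustrative assumptions, not SSLGuard's actual components or hyperparameters.

```python
# Hedged sketch, not the paper's implementation: illustrates encoder stealing
# and watermark verification at a schematic level with hypothetical modules.
import torch
import torch.nn.functional as F

# --- 1. Model stealing against a pre-trained encoder -----------------------
# The attacker queries the victim encoder as a black box and trains a
# surrogate to reproduce its embeddings (a simple distillation objective).
def steal_encoder(victim_encoder, surrogate, loader, epochs=1, lr=1e-3):
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    victim_encoder.eval()
    for _ in range(epochs):
        for x, _ in loader:                      # labels are never used
            with torch.no_grad():
                target = victim_encoder(x)       # black-box embedding query
            pred = surrogate(x)
            loss = 1 - F.cosine_similarity(pred, target, dim=-1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate

# --- 2. Watermark verification (decoder + secret key, schematically) -------
# A suspect encoder is flagged as watermarked if a private decoder maps the
# embeddings of a secret verification set close to a secret key vector.
def verify_watermark(suspect_encoder, decoder, verification_loader,
                     secret_key, threshold=0.5):
    sims = []
    suspect_encoder.eval()
    with torch.no_grad():
        for x in verification_loader:
            decoded = decoder(suspect_encoder(x))
            sims.append(F.cosine_similarity(decoded, secret_key, dim=-1))
    return torch.cat(sims).mean().item() > threshold
```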

Supplementary Material

MP4 File (CCS22-fp0146.mp4)
Presentation video for the paper "SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders"




      Published In

      CCS '22: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security
      November 2022
      3598 pages
      ISBN:9781450394505
      DOI:10.1145/3548606
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 07 November 2022

      Permissions

      Request permissions for this article.


      Author Tags

      1. dnns watermark
      2. model stealing attacks
      3. self-supervised learning

      Qualifiers

      • Research-article


      Conference

      CCS '22

      Acceptance Rates

      Overall Acceptance Rate 1,261 of 6,999 submissions, 18%



Cited By
      • (2024) Protecting object detection models from model extraction attack via feature space coverage. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. DOI: 10.24963/ijcai.2024/48, pages 431-439. Online publication date: 3-Aug-2024.
      • (2024) Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging. Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis. DOI: 10.1145/3689217.3690614, pages 69-76. Online publication date: 19-Nov-2024.
      • (2024) A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability. Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security. DOI: 10.1145/3658644.3690202, pages 1241-1255. Online publication date: 2-Dec-2024.
      • (2024) Towards regulatory generative AI in ophthalmology healthcare: a security and privacy perspective. British Journal of Ophthalmology. DOI: 10.1136/bjo-2024-325167. Online publication date: 4-Jun-2024.
      • (2024) MEA-Defender: A Robust Watermark against Model Extraction Attack. 2024 IEEE Symposium on Security and Privacy (SP). DOI: 10.1109/SP54263.2024.00099, pages 2515-2533. Online publication date: 19-May-2024.
      • (2024) Test-Time Poisoning Attacks Against Test-Time Adaptation Models. 2024 IEEE Symposium on Security and Privacy (SP). DOI: 10.1109/SP54263.2024.00072, pages 1306-1324. Online publication date: 19-May-2024.
      • (2024) Towards accountable and privacy-preserving blockchain-based access control for data sharing. Journal of Information Security and Applications, vol. 85. DOI: 10.1016/j.jisa.2024.103866. Online publication date: 1-Sep-2024.
      • (2024) When deep learning meets watermarking. Computer Standards & Interfaces, vol. 89. DOI: 10.1016/j.csi.2023.103830. Online publication date: 25-Jun-2024.
      • (2024) PtbStolen: Pre-trained Encoder Stealing Through Perturbed Samples. Emerging Information Security and Applications. DOI: 10.1007/978-981-99-9614-8_1, pages 1-19. Online publication date: 4-Jan-2024.
      • (2023) AWEncoder: Adversarial Watermarking Pre-Trained Encoders in Contrastive Learning. Applied Sciences, 13(6):3531. DOI: 10.3390/app13063531. Online publication date: 9-Mar-2023.
