DOI: 10.1145/3433210.3453079
research-article
Open access

REFIT: A Unified Watermark Removal Framework For Deep Learning Systems With Limited Data

Published: 04 June 2021

Abstract

Training deep neural networks from scratch can be computationally expensive and requires large amounts of training data. Recent work has explored different watermarking techniques to protect pre-trained deep neural networks from copyright infringement. However, these techniques can be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on knowledge of the watermarks and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario in which the adversary has limited training data, a setting that has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising model functionality under this weak threat model, we propose two techniques that are incorporated into our fine-tuning framework: (1) an adaptation of the elastic weight consolidation (EWC) algorithm, originally proposed to mitigate the catastrophic forgetting phenomenon; and (2) unlabeled data augmentation (AU), which leverages auxiliary unlabeled data from other sources. Our extensive evaluation shows the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled samples used for AU need not be drawn from the same distribution as the benign data used for model evaluation. These results demonstrate that fine-tuning-based watermark removal attacks can pose real threats to the copyright of pre-trained models, and thus highlight the importance of further investigating the watermarking problem and designing watermark embedding schemes that are more robust against such attacks.
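
To make the two ingredients of the framework concrete, the following is a minimal PyTorch sketch of EWC-regularized fine-tuning combined with pseudo-labeling of auxiliary unlabeled data (the AU idea). It is not the authors' released code; the diagonal Fisher estimate, the `pseudo_label` helper, and hyperparameters such as `lambda_ewc` are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of EWC-regularized fine-tuning for watermark removal (illustrative only).
import torch
import torch.nn.functional as F


def estimate_fisher(model, loader, device="cpu"):
    """Rough diagonal Fisher estimate: average squared gradients of the
    log-likelihood over the small fine-tuning set (an assumed approximation)."""
    model.eval()
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        F.nll_loss(F.log_softmax(model(x), dim=1), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(loader), 1) for n, f in fisher.items()}


@torch.no_grad()
def pseudo_label(model, unlabeled_x, device="cpu"):
    """AU-style augmentation: label auxiliary unlabeled samples with the
    pre-trained model's own predictions before mixing them into fine-tuning."""
    model.eval()
    return model(unlabeled_x.to(device)).argmax(dim=1)


def finetune_with_ewc(model, loader, fisher, lambda_ewc=100.0,
                      epochs=5, lr=1e-3, device="cpu"):
    """Fine-tune on limited (labeled + pseudo-labeled) data while penalizing
    drift on parameters the Fisher marks as important, aiming to overwrite the
    watermark behavior without destroying benign accuracy."""
    anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            task_loss = F.cross_entropy(model(x), y)
            ewc_penalty = sum((fisher[n] * (p - anchor[n]) ** 2).sum()
                              for n, p in model.named_parameters() if n in fisher)
            (task_loss + lambda_ewc * ewc_penalty).backward()
            opt.step()
    return model
```

In practice one would concatenate the small labeled fine-tuning set with the pseudo-labeled auxiliary set into `loader`, compute `fisher` on the available in-distribution data, and sweep `lambda_ewc` and the learning rate to trade off watermark removal against benign accuracy; these choices are part of the sketch's assumptions, not prescriptions from the paper.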

Supplementary Material

MP4 File (ASIA-CCS21-fp179.mp4)
Video - REFIT: A Unified Watermark Removal Framework For Deep Learning Systems With Limited Data




        Published In

        ASIA CCS '21: Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security
        May 2021
        975 pages
        ISBN: 9781450382878
        DOI: 10.1145/3433210
        • General Chairs: Jiannong Cao, Man Ho Au
        • Program Chairs: Zhiqiang Lin, Moti Yung
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. fine-tuning
        2. neural networks
        3. watermark removal

        Qualifiers

        • Research-article

        Funding Sources

        • DARPA D3M
        • National Science Foundation

        Conference

        ASIA CCS '21

        Acceptance Rates

        Overall Acceptance Rate 418 of 2,322 submissions, 18%

        Article Metrics

        • Downloads (Last 12 months): 574
        • Downloads (Last 6 weeks): 101
        Reflects downloads up to 10 Nov 2024

        Cited By

        • (2024) Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks. IEEE Transactions on Neural Networks and Learning Systems 35(10): 13082-13100, Oct 2024. DOI: 10.1109/TNNLS.2023.3270135
        • (2024) RemovalNet: DNN Fingerprint Removal Attacks. IEEE Transactions on Dependable and Secure Computing 21(4): 2645-2658, Jul 2024. DOI: 10.1109/TDSC.2023.3315064
        • (2024) A Self-Supervised CNN for Image Watermark Removal. IEEE Transactions on Circuits and Systems for Video Technology 34(8): 7566-7576, Aug 2024. DOI: 10.1109/TCSVT.2024.3375831
        • (2024) Perceptive Self-Supervised Learning Network for Noisy Image Watermark Removal. IEEE Transactions on Circuits and Systems for Video Technology 34(8): 7069-7079, Aug 2024. DOI: 10.1109/TCSVT.2024.3349678
        • (2024) An Explainable Intellectual Property Protection Method for Deep Neural Networks Based on Intrinsic Features. IEEE Transactions on Artificial Intelligence 5(9): 4649-4659, Sep 2024. DOI: 10.1109/TAI.2024.3388389
        • (2024) Data-Free Watermark for Deep Neural Networks by Truncated Adversarial Distillation. ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 4480-4484, Apr 2024. DOI: 10.1109/ICASSP48485.2024.10446261
        • (2024) Backdoor Attacks to Deep Neural Networks: A Survey of the Literature, Challenges, and Future Research Directions. IEEE Access 12: 29004-29023, 2024. DOI: 10.1109/ACCESS.2024.3355816
        • (2024) When deep learning meets watermarking. Computer Standards & Interfaces 89(C), Jun 2024. DOI: 10.1016/j.csi.2023.103830
        • (2024) Deep neural networks watermark via universal deep hiding and metric learning. Neural Computing and Applications 36(13): 7421-7438, Feb 2024. DOI: 10.1007/s00521-024-09469-5
        • (2023) Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks. Proceedings of the 31st ACM International Conference on Multimedia: 8463-8474, Oct 2023. DOI: 10.1145/3581783.3612331
