DOI: 10.1145/3474369.3486863
Research article · Open access

SEAT: Similarity Encoder by Adversarial Training for Detecting Model Extraction Attack Queries

Published: 15 November 2021

Abstract

Given black-box access to a model's prediction API, model extraction attacks can steal the functionality of models deployed in the cloud. In this paper, we introduce the SEAT detector, which detects black-box model extraction attacks so that the defender can terminate malicious accounts. SEAT uses a similarity encoder trained by adversarial training to identify accounts whose queries indicate a model extraction attack in progress and cancels these accounts. We evaluate our defense against existing model extraction attacks and against new adaptive attacks introduced in this paper. Our results show that, even against adaptive attackers, SEAT increases the cost of model extraction attacks by a factor of 3.8 to 16.
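To make the mechanism described above concrete, the sketch below illustrates, in PyTorch-style Python, one way such a defense could be wired up. It is a minimal illustration under stated assumptions, not the paper's exact algorithm: the hypothetical SimilarQueryDetector flags an account once too many of its queries land close to its earlier queries in the encoder's embedding space (the intuition being that extraction attacks issue many near-duplicate queries), and adversarial_positive hints at the role of adversarial training, namely producing perturbed copies of an image that the encoder should be trained to keep close to the original. The encoder, thresholds, and function names are illustrative assumptions.

# Minimal sketch (illustrative, not the paper's exact algorithm).
import torch
import torch.nn.functional as F


class SimilarQueryDetector:
    """Flags an account once too many of its queries fall within distance
    `delta` of its earlier queries in the encoder's embedding space.
    `delta` and `max_hits` are illustrative placeholders."""

    def __init__(self, encoder, delta=0.2, max_hits=50):
        self.encoder = encoder.eval()
        self.delta = delta          # illustrative similarity threshold
        self.max_hits = max_hits    # illustrative cancellation threshold
        self.history = {}           # account id -> list of stored embeddings
        self.hits = {}              # account id -> number of similar pairs seen

    @torch.no_grad()
    def observe(self, account_id, queries):
        """Record a batch of queries; return True if the account should be cancelled."""
        emb = F.normalize(self.encoder(queries), dim=1)
        stored = self.history.setdefault(account_id, [])
        self.hits.setdefault(account_id, 0)
        for e in emb:
            if stored:
                dists = torch.cdist(e.unsqueeze(0), torch.stack(stored)).squeeze(0)
                self.hits[account_id] += int((dists < self.delta).sum())
            stored.append(e)
        return self.hits[account_id] > self.max_hits


def adversarial_positive(encoder, x, eps=8 / 255, alpha=2 / 255, steps=5):
    """Illustrative PGD-style search for a perturbed copy of x that the encoder
    pushes far away; training the encoder so that such copies stay close to x
    (while unrelated images stay far apart) is the role adversarial training
    plays for a similarity encoder."""
    anchor = F.normalize(encoder(x), dim=1).detach()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        emb = F.normalize(encoder(x_adv), dim=1)
        loss = (1 - (emb * anchor).sum(dim=1)).mean()  # mean cosine distance to anchor
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv.detach()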



Index Terms

  1. SEAT: Similarity Encoder by Adversarial Training for Detecting Model Extraction Attack Queries

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      AISec '21: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
      November 2021
      210 pages
      ISBN:9781450386579
      DOI:10.1145/3474369
      This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. adversarial machine learning
      2. black-box attacks
      3. intellectual property
4. MLaaS
      5. model extraction


Conference

CCS '21
Overall Acceptance Rate: 94 of 231 submissions, 41%



