DOI: 10.1145/3474369.3486863
Research article · Open access

SEAT: Similarity Encoder by Adversarial Training for Detecting Model Extraction Attack Queries

Published: 15 November 2021

Abstract

Given black-box access to a model's prediction API, model extraction attacks can steal the functionality of models deployed in the cloud. In this paper, we introduce the SEAT detector, which detects black-box model extraction attacks so that the defender can terminate malicious accounts. SEAT uses a similarity encoder trained by adversarial training to identify accounts whose queries indicate a model extraction attack in progress and cancels these accounts. We evaluate our defense against existing model extraction attacks and against new adaptive attacks introduced in this paper. Our results show that, even against adaptive attackers, SEAT increases the cost of model extraction attacks by a factor of 3.8 to 16.
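To make the mechanism described above concrete, the sketch below illustrates, in PyTorch-style Python, one way such a defense could be wired up. It is a minimal illustration under stated assumptions, not the paper's exact algorithm: the hypothetical SimilarQueryDetector flags an account once too many of its queries land close to its earlier queries in the encoder's embedding space (the intuition being that extraction attacks issue many near-duplicate queries), and adversarial_positive hints at the role of adversarial training, namely producing perturbed copies of an image that the encoder should be trained to keep close to the original. The encoder, thresholds, and function names are illustrative assumptions.

# Minimal sketch (illustrative, not the paper's exact algorithm).
import torch
import torch.nn.functional as F


class SimilarQueryDetector:
    """Flags an account once too many of its queries fall within distance
    `delta` of its earlier queries in the encoder's embedding space.
    `delta` and `max_hits` are illustrative placeholders."""

    def __init__(self, encoder, delta=0.2, max_hits=50):
        self.encoder = encoder.eval()
        self.delta = delta          # illustrative similarity threshold
        self.max_hits = max_hits    # illustrative cancellation threshold
        self.history = {}           # account id -> list of stored embeddings
        self.hits = {}              # account id -> number of similar pairs seen

    @torch.no_grad()
    def observe(self, account_id, queries):
        """Record a batch of queries; return True if the account should be cancelled."""
        emb = F.normalize(self.encoder(queries), dim=1)
        stored = self.history.setdefault(account_id, [])
        self.hits.setdefault(account_id, 0)
        for e in emb:
            if stored:
                dists = torch.cdist(e.unsqueeze(0), torch.stack(stored)).squeeze(0)
                self.hits[account_id] += int((dists < self.delta).sum())
            stored.append(e)
        return self.hits[account_id] > self.max_hits


def adversarial_positive(encoder, x, eps=8 / 255, alpha=2 / 255, steps=5):
    """Illustrative PGD-style search for a perturbed copy of x that the encoder
    pushes far away; training the encoder so that such copies stay close to x
    (while unrelated images stay far apart) is the role adversarial training
    plays for a similarity encoder."""
    anchor = F.normalize(encoder(x), dim=1).detach()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        emb = F.normalize(encoder(x_adv), dim=1)
        loss = (1 - (emb * anchor).sum(dim=1)).mean()  # mean cosine distance to anchor
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv.detach()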



Index Terms

  1. SEAT: Similarity Encoder by Adversarial Training for Detecting Model Extraction Attack Queries

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      AISec '21: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
      November 2021
      210 pages
      ISBN:9781450386579
      DOI:10.1145/3474369
      This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. adversarial machine learning
      2. black-box attacks
      3. intellectual property
4. MLaaS
      5. model extraction


Conference

CCS '21
Overall Acceptance Rate: 94 of 231 submissions, 41%



