
Black-box Adversarial Attacks on Video Recognition Models

Published: 15 October 2019

Abstract

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples: inputs modified with small, carefully crafted perturbations that can easily fool a DNN into misclassifying at test time. Thus far, adversarial research has focused mainly on image models, under either a white-box setting, where an adversary has full access to model parameters, or a black-box setting, where an adversary can only query the target model for probabilities or labels. While several white-box attacks have been proposed for video models, black-box video attacks remain unexplored. To close this gap, we propose the first black-box video attack framework, called V-BAD. V-BAD utilizes tentative perturbations transferred from image models and partition-based rectifications estimated by natural evolution strategies (NES) to obtain good adversarial gradient estimates with fewer queries to the target model. V-BAD is equivalent to estimating the projection of the adversarial gradient onto a selected subspace. Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks that fool two state-of-the-art deep video recognition models. For targeted attacks, it achieves a success rate above 93% using an average of only 3.4×10^4 to 8.4×10^4 queries, a similar number of queries to state-of-the-art black-box image attacks, even though videos often have two orders of magnitude higher dimensionality than static images. We believe V-BAD is a promising new tool to evaluate and improve the robustness of video recognition models against black-box adversarial attacks.
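The query-efficiency idea described above, estimating only the projection of the adversarial gradient onto a subspace spanned by tentative, image-model-derived perturbation directions and rectifying it partition by partition with NES, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: query_probs, the partition directions, and all hyperparameters are hypothetical stand-ins for the black-box query interface and the transferred tentative perturbations described in the paper.

import numpy as np

# Illustrative sketch (not the authors' code): NES-based estimation of the
# adversarial gradient's projection onto a subspace spanned by tentative,
# image-model-derived perturbation directions, one per partition.
# `query_probs` is a hypothetical black-box interface returning class
# probabilities for a video; all hyperparameters are placeholders.

def nes_subspace_gradient(query_probs, video, target_class, directions,
                          sigma=1e-3, num_samples=24, rng=None):
    """Estimate d log P(target | video) projected onto span(directions)
    with antithetic NES sampling; each sample costs two model queries."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(directions)
    coeffs = np.zeros(k)
    for _ in range(num_samples // 2):
        u = rng.standard_normal(k)                        # subspace coefficients
        delta = sum(c * d for c, d in zip(u, directions))
        p_plus = query_probs(video + sigma * delta)[target_class]
        p_minus = query_probs(video - sigma * delta)[target_class]
        coeffs += (np.log(p_plus + 1e-12) - np.log(p_minus + 1e-12)) * u
    coeffs /= num_samples * sigma
    # Map the rectified coefficients back to input space.
    return sum(c * d for c, d in zip(coeffs, directions))

def targeted_attack_step(query_probs, clean, adv, target_class, directions,
                         epsilon=16 / 255, alpha=1 / 255, **nes_kwargs):
    """One PGD-style sign step toward the target class, then projection
    back into the epsilon-ball around the clean video and the valid range."""
    g = nes_subspace_gradient(query_probs, adv, target_class, directions,
                              **nes_kwargs)
    adv = adv + alpha * np.sign(g)                        # ascend target log-prob
    adv = np.clip(adv, clean - epsilon, clean + epsilon)
    return np.clip(adv, 0.0, 1.0)

In the actual framework, the directions would be refreshed at each iteration from perturbations transferred from image models and re-partitioned, which is what keeps the number of estimated coefficients, and hence queries, far below the video's raw dimensionality.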


Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN: 9781450368896
DOI: 10.1145/3343031

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adversarial examples
  2. black-box attack
  3. model security
  4. video recognition

Qualifiers

  • Research-article

Conference

MM '19

Acceptance Rates

MM '19 Paper Acceptance Rate: 252 of 936 submissions (27%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)


Cited By

  • (2025) RLVS: A Reinforcement Learning-Based Sparse Adversarial Attack Method for Black-Box Video Recognition. Electronics 14(2): 245. DOI 10.3390/electronics14020245. Online publication date: 8 Jan 2025.
  • (2024) Visual Content Privacy Protection: A Survey. ACM Computing Surveys. DOI 10.1145/3708501. Online publication date: 16 Dec 2024.
  • (2024) Survey on Adversarial Attack and Defense for Medical Image Analysis: Methods and Challenges. ACM Computing Surveys 57(3): 1-38. DOI 10.1145/3702638. Online publication date: 22 Nov 2024.
  • (2024) AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning. Proceedings of the 32nd ACM International Conference on Multimedia, 6212-6221. DOI 10.1145/3664647.3681032. Online publication date: 28 Oct 2024.
  • (2024) Neural Style Protection: Counteracting Unauthorized Neural Style Transfer. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 3954-3963. DOI 10.1109/WACV57701.2024.00392. Online publication date: 3 Jan 2024.
  • (2024) Cube-Evo: A Query-Efficient Black-Box Attack on Video Classification System. IEEE Transactions on Reliability 73(2): 1160-1171. DOI 10.1109/TR.2023.3261986. Online publication date: Jun 2024.
  • (2024) Sparse Adversarial Video Attack Based on Dual-Branch Neural Network on Industrial Artificial Intelligence of Things. IEEE Transactions on Industrial Informatics 20(7): 9385-9392. DOI 10.1109/TII.2024.3383517. Online publication date: Jul 2024.
  • (2024) Coreset Learning-Based Sparse Black-Box Adversarial Attack for Video Recognition. IEEE Transactions on Information Forensics and Security 19: 1547-1560. DOI 10.1109/TIFS.2023.3333556. Online publication date: 2024.
  • (2024) Diffusion Patch Attack With Spatial–Temporal Cross-Evolution for Video Recognition. IEEE Transactions on Circuits and Systems for Video Technology 34(12): 13190-13200. DOI 10.1109/TCSVT.2024.3452475. Online publication date: Dec 2024.
  • (2024) Incentive Mechanism Design Toward a Win–Win Situation for Generative Art Trainers and Artists. IEEE Transactions on Computational Social Systems 11(6): 7528-7540. DOI 10.1109/TCSS.2024.3415631. Online publication date: Dec 2024.
