
Black-box Adversarial Attacks on Video Recognition Models

Published: 15 October 2019

Abstract

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples: inputs modified with small, carefully crafted perturbations that can easily fool a DNN into misclassifying at test time. Thus far, adversarial research has focused mainly on image models, under either a white-box setting, where an adversary has full access to model parameters, or a black-box setting, where an adversary can only query the target model for probabilities or labels. While several white-box attacks have been proposed for video models, black-box video attacks remain unexplored. To close this gap, we propose the first black-box video attack framework, called V-BAD. V-BAD utilizes tentative perturbations transferred from image models and partition-based rectifications estimated by natural evolution strategies (NES) to obtain good adversarial gradient estimates with fewer queries to the target model. V-BAD is equivalent to estimating the projection of the adversarial gradient onto a selected subspace. Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks that fool two state-of-the-art deep video recognition models. For targeted attacks, it achieves a success rate above 93% using an average of only 3.4×10^4 to 8.4×10^4 queries, a similar number of queries to state-of-the-art black-box image attacks, even though videos often have two orders of magnitude higher dimensionality than static images. We believe V-BAD is a promising new tool to evaluate and improve the robustness of video recognition models against black-box adversarial attacks.
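The query-efficiency idea described above, estimating only the projection of the adversarial gradient onto a subspace spanned by tentative, image-model-derived perturbation directions and rectifying it partition by partition with NES, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: query_probs, the partition directions, and all hyperparameters are hypothetical stand-ins for the black-box query interface and the transferred tentative perturbations described in the paper.

import numpy as np

# Illustrative sketch (not the authors' code): NES-based estimation of the
# adversarial gradient's projection onto a subspace spanned by tentative,
# image-model-derived perturbation directions, one per partition.
# `query_probs` is a hypothetical black-box interface returning class
# probabilities for a video; all hyperparameters are placeholders.

def nes_subspace_gradient(query_probs, video, target_class, directions,
                          sigma=1e-3, num_samples=24, rng=None):
    """Estimate d log P(target | video) projected onto span(directions)
    with antithetic NES sampling; each sample costs two model queries."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(directions)
    coeffs = np.zeros(k)
    for _ in range(num_samples // 2):
        u = rng.standard_normal(k)                        # subspace coefficients
        delta = sum(c * d for c, d in zip(u, directions))
        p_plus = query_probs(video + sigma * delta)[target_class]
        p_minus = query_probs(video - sigma * delta)[target_class]
        coeffs += (np.log(p_plus + 1e-12) - np.log(p_minus + 1e-12)) * u
    coeffs /= num_samples * sigma
    # Map the rectified coefficients back to input space.
    return sum(c * d for c, d in zip(coeffs, directions))

def targeted_attack_step(query_probs, clean, adv, target_class, directions,
                         epsilon=16 / 255, alpha=1 / 255, **nes_kwargs):
    """One PGD-style sign step toward the target class, then projection
    back into the epsilon-ball around the clean video and the valid range."""
    g = nes_subspace_gradient(query_probs, adv, target_class, directions,
                              **nes_kwargs)
    adv = adv + alpha * np.sign(g)                        # ascend target log-prob
    adv = np.clip(adv, clean - epsilon, clean + epsilon)
    return np.clip(adv, 0.0, 1.0)

In the actual framework, the directions would be refreshed at each iteration from perturbations transferred from image models and re-partitioned, which is what keeps the number of estimated coefficients, and hence queries, far below the video's raw dimensionality.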


Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN: 9781450368896
DOI: 10.1145/3343031

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adversarial examples
  2. black-box attack
  3. model security
  4. video recognition

Qualifiers

  • Research-article

Conference

MM '19

Acceptance Rates

MM '19 Paper Acceptance Rate: 252 of 936 submissions (27%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)


Cited By

  • (2025) RLVS: A Reinforcement Learning-Based Sparse Adversarial Attack Method for Black-Box Video Recognition. Electronics 14(2): 245. DOI 10.3390/electronics14020245. Online publication date: 8 Jan 2025.
  • (2024) Visual Content Privacy Protection: A Survey. ACM Computing Surveys. DOI 10.1145/3708501. Online publication date: 16 Dec 2024.
  • (2024) Survey on Adversarial Attack and Defense for Medical Image Analysis: Methods and Challenges. ACM Computing Surveys 57(3): 1-38. DOI 10.1145/3702638. Online publication date: 22 Nov 2024.
  • (2024) AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning. Proceedings of the 32nd ACM International Conference on Multimedia, 6212-6221. DOI 10.1145/3664647.3681032. Online publication date: 28 Oct 2024.
  • (2024) Neural Style Protection: Counteracting Unauthorized Neural Style Transfer. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 3954-3963. DOI 10.1109/WACV57701.2024.00392. Online publication date: 3 Jan 2024.
  • (2024) Cube-Evo: A Query-Efficient Black-Box Attack on Video Classification System. IEEE Transactions on Reliability 73(2): 1160-1171. DOI 10.1109/TR.2023.3261986. Online publication date: Jun 2024.
  • (2024) Sparse Adversarial Video Attack Based on Dual-Branch Neural Network on Industrial Artificial Intelligence of Things. IEEE Transactions on Industrial Informatics 20(7): 9385-9392. DOI 10.1109/TII.2024.3383517. Online publication date: Jul 2024.
  • (2024) Coreset Learning-Based Sparse Black-Box Adversarial Attack for Video Recognition. IEEE Transactions on Information Forensics and Security 19: 1547-1560. DOI 10.1109/TIFS.2023.3333556. Online publication date: 2024.
  • (2024) Diffusion Patch Attack With Spatial–Temporal Cross-Evolution for Video Recognition. IEEE Transactions on Circuits and Systems for Video Technology 34(12): 13190-13200. DOI 10.1109/TCSVT.2024.3452475. Online publication date: Dec 2024.
  • (2024) Incentive Mechanism Design Toward a Win–Win Situation for Generative Art Trainers and Artists. IEEE Transactions on Computational Social Systems 11(6): 7528-7540. DOI 10.1109/TCSS.2024.3415631. Online publication date: Dec 2024.
