Advancing Certified Robustness of Explanation via Gradient Quantization
Pages 2596–2606
Abstract
Explaining black-box models is fundamental to gaining trust in these models and deploying them in real applications. Because existing explanation methods have been shown to lack robustness against adversarial perturbations, there has been growing interest in generating robust explanations. However, existing works resort to empirical defense strategies, and these heuristic methods fail against powerful adversaries. In this paper, motivated by the success of randomized smoothing, we certify the robustness of explanations. Specifically, we compute a tight radius within which the robustness of the explanation is certified. A key challenge is how to formulate the robustness of an explanation mathematically; we address it by quantizing the explanation into discrete spaces, so that producing an explanation mimics classification in randomized smoothing. To address the high computational cost of randomized smoothing, we introduce randomized gradient smoothing. We also explore the robustness of semantic explanations by certifying the robustness of capsules. In our experiments, we demonstrate the effectiveness of our method on benchmark datasets from the perspectives of post-hoc explanation and semantic explanation, respectively. Our work is a promising step toward filling the gap between theoretical robustness bounds and empirical explanations. Our code has been released at https://github.com/NKUShaw/CertifiedExplanation.
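The abstract's core idea — quantize the explanation into a discrete space so that randomized smoothing's classification-style certificate applies — can be sketched in a few lines. The sketch below is a minimal illustration under our own assumptions (sign quantization of a 1-D gradient explanation, a toy gradient function, a Cohen-et-al.-style per-coordinate radius), not the authors' released implementation; `smoothed_quantized_gradient` and its signature are hypothetical names.

```python
import numpy as np
from statistics import NormalDist

def smoothed_quantized_gradient(grad_fn, x, sigma=0.25, n=1000, seed=0):
    """Sketch of randomized gradient smoothing with sign quantization.

    Quantizing each gradient coordinate to its sign turns certification
    into a per-coordinate binary classification, so a standard
    randomized-smoothing radius can be attached to each coordinate.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros_like(x, dtype=float)
    for _ in range(n):
        noisy = x + sigma * rng.standard_normal(x.shape)
        votes += np.sign(grad_fn(noisy))   # quantize gradient into {-1, 0, +1}
    smoothed_sign = np.sign(votes)         # majority-vote explanation
    # Empirical probability of the majority sign (assuming +/-1 votes),
    # clipped away from 1 so the inverse normal CDF stays finite.
    p_hat = np.clip((n + np.abs(votes)) / (2.0 * n), 0.5, 1.0 - 1e-6)
    # Cohen-et-al.-style certified L2 radius: R = sigma * Phi^{-1}(p)
    radius = sigma * np.array([NormalDist().inv_cdf(p) for p in p_hat])
    return smoothed_sign, radius

# Toy usage: the gradient of f(z) = ||z||^2 is 2z, so the smoothed,
# quantized explanation should match the sign of x itself.
sign, radius = smoothed_quantized_gradient(lambda z: 2.0 * z,
                                           np.array([1.0, -2.0]), n=200)
```

The key design point the paper exploits is visible even in this toy: once the explanation is discrete, "the explanation is robust" means "the majority vote does not flip", which is exactly the event randomized smoothing certifies.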
Published In
October 2024
5705 pages
Copyright © 2024 ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 21 October 2024
Qualifiers
- Research-article
Funding Sources
- U.S. National Science Foundation grants
Conference
CIKM '24: The 33rd ACM International Conference on Information and Knowledge Management
October 21–25, 2024, Boise, ID, USA
Acceptance Rates
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%