
On Reducing Undesirable Behavior in Deep-Reinforcement-Learning-Based Software

Published: 12 July 2024

Abstract

Deep reinforcement learning (DRL) has proven extremely useful in a large variety of application domains. However, even successful DRL-based software can exhibit highly undesirable behavior. This is due to DRL training being based on maximizing a reward function, which typically captures general trends but cannot precisely capture, or rule out, certain behaviors of the model. In this paper, we propose a novel framework aimed at drastically reducing the undesirable behavior of DRL-based software, while maintaining its excellent performance. In addition, our framework can assist in providing engineers with a comprehensible characterization of such undesirable behavior. Under the hood, our approach is based on extracting decision tree classifiers from erroneous state-action pairs, and then integrating these trees into the DRL training loop, penalizing the model whenever it performs an error. We provide a proof-of-concept implementation of our approach, and use it to evaluate the technique on three significant case studies. We find that our approach can extend existing frameworks in a straightforward manner, incurring only a slight overhead in training time. Further, it causes only a very slight hit to performance, or in some cases even improves it, while significantly reducing the frequency of undesirable behavior.
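To make the mechanism concrete, here is a minimal Python sketch of the two ingredients described in the abstract, assuming a scikit-learn decision tree and a simple subtractive penalty; the function names, feature encoding, and penalty value are illustrative assumptions, not the authors' actual implementation.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fit_error_classifier(states, actions, labels, max_depth=6):
        # Each training row is a state vector concatenated with the action
        # taken in it; labels[i] is True iff that pair was flagged as an error.
        features = np.concatenate([states, actions], axis=1)
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(features, labels)
        return tree

    def shaped_reward(tree, state, action, env_reward, penalty=1.0):
        # Subtract a fixed penalty whenever the tree classifies the current
        # state-action pair as undesirable; otherwise pass the reward through.
        pair = np.concatenate([state, action]).reshape(1, -1)
        return env_reward - penalty if tree.predict(pair)[0] else env_reward

In a training loop, shaped_reward would stand in for the raw environment reward at each step, so the underlying DRL algorithm (e.g., PPO) directly observes the penalty for flagged behavior.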


Published In

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE
July 2024, 2770 pages
EISSN: 2994-970X
DOI: 10.1145/3554322
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 12 July 2024
Published in PACMSE Volume 1, Issue FSE

Author Tags

  1. Decision Trees
  2. Deep Reinforcement Learning
  3. Explainability
  4. Safety

Qualifiers

  • Research-article
