research-article

Wuji: automatic online combat game testing using evolutionary deep reinforcement learning

Authors:

Changjie Fan

ASE '19: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering

Pages 772 - 784

https://doi.org/10.1109/ASE.2019.00077

Published: 07 February 2020

Abstract

Game testing has long been recognized as a notoriously challenging task, and in the game industry it relies mainly on manual play and script-based testing. Even now, automated game testing remains a largely untouched niche. A key challenge is that game testing often requires playing the game as a sequential decision process. A bug may be triggered only after certain difficult intermediate tasks are completed, which requires a certain level of intelligence. The recent success of deep reinforcement learning (DRL) sheds light on advancing automated game testing, without human competitive intelligent support. However, existing DRL approaches mostly focus on winning the game rather than on testing it. To bridge the gap, in this paper, we first perform an in-depth analysis of 1349 real bugs from four real-world commercial game products. Based on this analysis, we propose four oracles to support automated game testing, and further propose Wuji, an on-the-fly game testing framework that leverages evolutionary algorithms, DRL, and multi-objective optimization to perform automatic game testing. Wuji balances winning the game against exploring the space of the game. Winning the game allows the agent to make progress in the game, while space exploration increases the possibility of discovering bugs. We conduct a large-scale evaluation on a simple game and two popular commercial games. The results demonstrate the effectiveness of Wuji in exploring space and detecting bugs. Moreover, Wuji found 3 previously unknown bugs in the commercial games, which have been confirmed by the developers.
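As an illustration of the multi-objective idea in the abstract, the sketch below selects the Pareto-optimal candidates from a population scored on two objectives. It is a minimal sketch of generic Pareto dominance, not the Wuji implementation: the score pair `(win_score, exploration_score)`, the function names, and the sample values are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical score pair: (win_score, exploration_score); higher is better for both.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    # Illustrative (made-up) scores for five candidate policies.
    population = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.9, 0.05)]
    print(pareto_front(population))  # [0, 1, 2]
```

Candidates on the front represent different trade-offs between the two goals: one mostly wins, one mostly explores, and one sits in between. Keeping all of them, rather than collapsing the objectives into a single weighted sum, is the usual motivation for Pareto-based selection.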

Published In

ASE '19: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering

November 2019

1333 pages

ISBN:9781728125084

General Chair: Thomas Zimmermann (Microsoft Research)

Program Chairs: Julia Lawall (Inria/LIP6, France), Darko Marinov (University of Illinois at Urbana-Champaign)

Sponsors

In-Cooperation

IEEE CS

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ASE '19: 34th IEEE/ACM International Conference on Automated Software Engineering

November 10 - 15, 2019

San Diego, California

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Bibliometrics

Total Citations: 31
Total Downloads: 189
Downloads (Last 12 months): 17
Downloads (Last 6 weeks): 5

Reflects downloads up to 17 Oct 2024

Citations

Cited By

Yu S, Fang C, Li X, Ling Y, Chen Z, Su Z (2024). Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning. ACM Transactions on Software Engineering and Methodology 33:7 (1-27). Online publication date: 21-Jun-2024.
https://dl.acm.org/doi/10.1145/3674728
Zhao Y, Harrison B, Yu T (2024). DinoDroid: Testing Android Apps Using Deep Q-Networks. ACM Transactions on Software Engineering and Methodology 33:5 (1-24). Online publication date: 4-Jun-2024.
https://dl.acm.org/doi/10.1145/3652150
Ma X, Wang Y, Wang J, Xie X, Wu B, Li S, Xu F, Wang Q, Christakis M, Pradel M (2024). Enhancing Multi-agent System Testing with Diversity-Guided Exploration and Adaptive Critical State Exploitation. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1491-1503). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3680376
Hu Q, Guo Y, Xie X, Cordy M, Ma L, Papadakis M, Le Traon Y (2024). Test Optimization in DNN Testing: A Survey. ACM Transactions on Software Engineering and Methodology 33:4 (1-42). Online publication date: 27-Jan-2024.
https://dl.acm.org/doi/10.1145/3643678
Yu S, Fang C, Du M, Ling Y, Chen Z, Su Z, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). Practical Non-Intrusive GUI Exploration Testing with Visual-based Robotic Arms. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). Online publication date: 20-May-2024.
https://dl.acm.org/doi/10.1145/3597503.3639161
Bucchiarone A, Cooper K, Lin D, Smith A, Wanick V (2023). Fostering Collaboration and Advancing Research in Software Engineering and Game Development for Serious Contexts. ACM SIGSOFT Software Engineering Notes 48:4 (46-50). Online publication date: 17-Oct-2023.
https://dl.acm.org/doi/10.1145/3617946.3617955
Hu Q, Guo Y, Xie X, Cordy M, Papadakis M, Le Traon Y (2023). LaF: Labeling-free Model Selection for Automated Deep Neural Network Reusing. ACM Transactions on Software Engineering and Methodology 33:1 (1-28). Online publication date: 31-Jul-2023.
https://dl.acm.org/doi/10.1145/3611666
Wang C, Lu H, Gao C, Li Z, Xiong T, Deng Y, Chandra S, Blincoe K, Tonella P (2023). A Unified Framework for Mini-game Testing: Experience on WeChat. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (1623-1634). Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3613868
Bucchiarone A, Cooper K, Lin D, Melcer E, Sung K (2023). Games and Software Engineering. ACM SIGSOFT Software Engineering Notes 48:1 (85-89). Online publication date: 17-Jan-2023.
https://dl.acm.org/doi/10.1145/3573074.3573096
Casamayor R, Cetina C, Pastor Ó, Pérez F (2023). Studying the Influence and Distribution of the Human Effort in a Hybrid Fitness Function for Search-Based Model-Driven Engineering. IEEE Transactions on Software Engineering 49:12 (5189-5202). Online publication date: 1-Dec-2023.
https://dl.acm.org/doi/10.1109/TSE.2023.3329730
