research-article

Open access

Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model

Authors:

Qing WangAuthors Info & Claims

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering

Article No.: 137, Pages 1 - 12

https://doi.org/10.1145/3597503.3639118

Published: 12 April 2024 Publication History

Abstract

Mobile applications have become a ubiquitous part of our daily life, providing users with access to various services and utilities. Text input, as an important interaction channel between users and applications, plays an important role in core functionality such as search queries, authentication, messaging, etc. However, certain special text (e.g., -18 for Font Size) can cause the app to crash, and generating diversified unusual inputs for fully testing the app is highly demanded. Nevertheless, this is also challenging due to the combination of explosion dilemma, high context sensitivity, and complex constraint relations. This paper proposes InputBlaster which leverages the LLM to automatically generate unusual text inputs for mobile app crash detection. It formulates the unusual inputs generation problem as a task of producing a set of test generators, each of which can yield a batch of unusual text inputs under the same mutation rule. In detail, InputBlaster leverages LLM to produce the test generators together with the mutation rules serving as the reasoning chain, and utilizes the in-context learning schema to demonstrate the LLM with examples for boosting the performance. InputBlaster is evaluated on 36 text input widgets with cash bugs involving 31 popular Android apps, and results show that it achieves 78% bug detection rate, with 136% higher than the best baseline. Besides, we integrate it with the automated GUI testing tool and detect 37 unseen crashes in real-world apps.

References

[1]

2022. Crash bug text. https://www.theguardian.com/technology/iphone-crash-bug-text-imessage-ios.

[2]

2022. Crash bug text in ios. https://tech.hindustantimes.com/tech/news/be-careful-a-new-text-bomb-is-making-whatsapp-crash-and-will-hang-your-phone-71599532897852.html.

[3]

2022. F-droid. https://f-droid.org/.

[4]

Toufique Ahmed and Premkumar Devanbu. 2022. Few-shot training LLMs for project-specific code-summarization. ASE (2022).

[5]

Nadia Alshahwan and Mark Harman. 2011. Automated web application testing using search based software engineering. In ASE. IEEE, 3--12.

[6]

Saswat Anand, Mayur Naik, Mary Jean Harrold, and Hongseok Yang. 2012. Automated concolic testing of smartphone apps. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. 1--11.

Digital Library

[7]

Yauhen Leanidavich Arnatovich, Minh Ngoc Ngo, Tan Hee Beng Kuan, and Charlie Soh. 2016. Achieving high code coverage in android ui testing via automated widget exercising. In 2016 23rd Asia-Pacific Software Engineering Conference (APSEC). IEEE, 193--200.

[8]

Yauhen Leanidavich Arnatovich, Lipo Wang, Ngoc Minh Ngo, and Charlie Soh. 2018. Mobolic: An automated approach to exercising mobile application GUIs using symbiosis of online testing technique and customated input generation. Software: Practice and Experience 48, 5 (2018), 1107--1142.

[9]

Tanzirul Azim and Iulian Neamtiu. 2013. Targeted and depth-first exploration for systematic testing of android apps. In Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications. 641--660.

Digital Library

[10]

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877--1901.

[11]

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877--1901.

[12]

Tianqin Cai, Zhao Zhang, and Ping Yang. 2020. Fastbot: A Multi-Agent ModelBased Test Generation System Beijing Bytedance Network Technology Co., Ltd. In Proceedings of the IEEE/ACM 1st International Conference on Automation of Software Test. 93--96.

Digital Library

[13]

Chen Chen, Baojiang Cui, Jinxin Ma, Runpu Wu, Jianchao Guo, and Wenqian Liu. 2018. A systematic review of fuzzing techniques. Computers & Security 75 (2018), 118--137.

[14]

Taolue Chen, Alejandro Flores-Lamas, Matthew Hague, Zhilei Han, Denghang Hu, Shuanglong Kan, Anthony W Lin, Philipp Rümmer, and Zhilin Wu. 2022. Solving string constraints with Regex-dependent functions through transducers with priorities and variables. Proceedings of the ACM on Programming Languages 6, POPL (2022), 1--31.

Digital Library

[15]

Taolue Chen, Matthew Hague, Jinlong He, Denghang Hu, Anthony Widjaja Lin, Philipp Rümmer, and Zhilin Wu. 2020. A decision procedure for path feasibility of string manipulating programs with integer data type. In Automated Technology for Verification and Analysis: 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19--23, 2020, Proceedings. Springer, 325--342.

Digital Library

[16]

Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E Hinton. 2020. Big self-supervised models are strong semi-supervised learners. Advances in neural information processing systems 33 (2020), 22243--22255.

[17]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2022. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 (2022).

[18]

Joel D Day, Thorsten Ehlers, Mitja Kulczynski, Florin Manea, Dirk Nowotka, and Danny Bøgsted Poulsen. 2019. On solving word equations using SAT. In Reachability Problems: 13th International Conference, RP 2019, Brussels, Belgium, September 11--13, 2019, Proceedings 13. Springer, 93--106.

Digital Library

[19]

Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A Mobile App Dataset for Building Data-Driven Design Applications. In UIST.

Digital Library

[20]

Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Dylan Zhang, Shujing Yang, and Lingming Zhang. 2023. Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT. ISSTA (2023).

[21]

Yinlin Deng, Chenyuan Yang, Anjiang Wei, and Lingming Zhang. 2022. Fuzzing deep-learning libraries via automated relational API inference. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 44--56.

Digital Library

[22]

Android Developers. 2012. Ui/application exerciser monkey.

[23]

Zhen Dong, Marcel Böhme, Lucia Cojocaru, and Abhik Roychoudhury. 2020. Time-travel testing of android apps. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). IEEE, 481--492.

Digital Library

[24]

Robert Feldt and Simon Poulding. 2013. Finding test data with specific properties via metaheuristic search. In 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 350--359.

[25]

Shivam Garg, Dimitris Tsipras, Percy S Liang, and Gregory Valiant. 2022. What can transformers learn in-context? a case study of simple function classes. Advances in Neural Information Processing Systems 35 (2022), 30583--30598.

[26]

Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, and Zhendong Su. 2019. Practical GUI testing of Android applications via model abstraction and refinement. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 269--280.

Digital Library

[27]

Yuyu He, Lei Zhang, Zhemin Yang, Yinzhi Cao, Keke Lian, Shuai Li, Wei Yang, Zhibo Zhang, Min Yang, Yuan Zhang, et al. 2020. TextExerciser: feedback-driven text input exercising for android applications. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 1071--1087.

[28]

Lukav Holik, Petr Jank, Anthony W Lin, Philipp Rmmer, and Tom Vojnar. 2017. String constraints with concatenation and transducers solved efficiently. Proceedings of the ACM on Programming Languages 2, POPL (2017), 1--32.

[29]

Yang Hu, Umair Z Ahmed, Sergey Mechtaev, Ben Leong, and Abhik Roychoudhury. 2019. Re-factoring based program repair applied to programming assignments. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 388--398.

Digital Library

[30]

Text input. 2022. Introduction about text input on Android Developer website. https://developer.android.google.cn/reference/android/widget/EditText?hl=en.

[31]

Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan. 2023. Impact of Code Language Models on Automated Program Repair. ICSE (2023).

[32]

Sungmin Kang, Juyeon Yoon, and Shin Yoo. 2023. Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction. ICSE (2023).

[33]

Adam Kiezun, Vijay Ganesh, Shay Artzi, Philip J Guo, Pieter Hooimeijer, and Michael D Ernst. 2013. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Transactions on Software Engineering and Methodology (TOSEM) 21, 4 (2013), 1--28.

Digital Library

[34]

Sebastian Krings, Joshua Schmidt, Patrick Skowronek, Jannik Dunkelau, and Dierk Ehmke. 2020. Towards constraint logic programming over strings for test data generation. In Declarative Programming and Knowledge Management: Conference on Declarative Programming, DECLARE 2019, Unifying INAP, WLP, and WFLP, Cottbus, Germany, September 9--12, 2019, Revised Selected Papers 22. Springer, 139--159.

[35]

Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. 2020. Reinforcement learning with augmented data. Advances in neural information processing systems 33 (2020), 19884--19895.

[36]

Caroline Lemieux, Jeevana Priya Inala, Shuvendu K Lahiri, and Siddhartha Sen. 2023. CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pretrained Large Language Models. In ICSE.

[37]

Guodong Li and Indradeep Ghosh. 2013. PASS: String solving with parameterized array and interval automaton. In Hardware and Software: Verification and Testing: 9th International Haifa Verification Conference, HVC 2013, Haifa, Israel, November 5--7, 2013, Proceedings 9. Springer, 15--31.

[38]

Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. 2017. DroidBot: A Lightweight UI-Guided Test Input Generator for Android (ICSE-C '17).

[39]

Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. 2017. Droidbot: a lightweight ui-guided test input generator for android. In 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C). IEEE, 23--26.

[40]

Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. 2019. Humanoid: a deep learning-based approach to automated black-box Android app testing. In ASE. IEEE, 1070--1073.

[41]

Hongliang Liang, Xiaoxiao Pei, Xiaodong Jia, Wuwei Shen, and Jian Zhang. 2018. Fuzzing: State of the art. IEEE Transactions on Reliability 67, 3 (2018), 1199--1218.

[42]

Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL (T) theory solver for a theory of strings and regular expressions. In Computer Aided Verification: 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18--22, 2014. Proceedings 26. Springer, 646--662.

Digital Library

[43]

Peng Liu, Xiangyu Zhang, Marco Pistoia, Yunhui Zheng, Manoel Marques, and Lingfei Zeng. 2017. Automatic text input generation for mobile testing. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). IEEE, 643--653.

Digital Library

[44]

Zhe Liu, Chunyang Chen, Junjie Wang, Xing Che, Yuekai Huang, Jun Hu, and Qing Wang. 2022. Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI Testing. arXiv preprint arXiv:2212.04732 (2022).

[45]

Zhe Liu, Chunyang Chen, Junjie Wang, Yuekai Huang, Jun Hu, and Qing Wang. 2020. Owl Eyes: Spotting UI Display Issues via Visual Understanding. In ASE. IEEE.

Digital Library

[46]

Zhe Liu, Chunyang Chen, Junjie Wang, Yuekai Huang, Jun Hu, and Qing Wang. 2022. Nighthawk: Fully Automated Localizing UI Display Issues via Visual Understanding. IEEE Transactions on Software Engineering (2022), 1--16.

[47]

Zhe Liu, Chunyang Chen, Junjie Wang, Yuhui Su, Yuekai Huang, Jun Hu, and Qing Wang. 2023. Ex pede Herculem: Augmenting Activity Transition Graph for Apps via Graph Convolution Network. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1983--1995.

Digital Library

[48]

Zhe Liu, Chunyang Chen, Junjie Wang, Yuhui Su, and Qing Wang. 2022. NaviDroid: a tool for guiding manual Android testing via hint moves. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings. 154--158.

Digital Library

[49]

Zhe Liu, Chunyang Chen, Junjie Wang, and Qing Wang. 2022. Guided Bug Crush: Assist Manual GUI Testing of Android Apps via Hint Moves. In CHI 2022.

Digital Library

[50]

Li Lucy and David Bamman. 2021. Gender and representation bias in GPT-3 generated stories. In Proceedings of the Third Workshop on Narrative Understanding. 48--55.

[51]

Aravind Machiry, Rohan Tahiliani, and Mayur Naik. 2013. Dynodroid: An input generation system for android apps. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. 224--234.

Digital Library

[52]

Ke Mao, Mark Harman, and Yue Jia. 2016. Sapienz: Multi-objective automated testing for Android applications. In Proceedings of the 25th International Symposium on Software Testing and Analysis. 94--105.

Digital Library

[53]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. Computer Science (2013).

[54]

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? arXiv preprint arXiv:2202.12837 (2022).

[55]

Noor Nashid, Mifta Sintaha, and Ali Mesbah. 2023. Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning. In Proceedings of the 45th International Conference on Software Engineering (ICSE'23).

Digital Library

[56]

Minxue Pan, An Huang, Guoxin Wang, Tian Zhang, and Xuandong Li. 2020. Reinforcement learning based curiosity-driven testing of Android applications. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 153--164.

Digital Library

[57]

Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. 2022. Synchromesh: Reliable code generation from pre-trained language models. ICLR (2022).

[58]

Simon Poulding and Robert Feldt. 2017. Generating controllably invalid and atypical inputs for robustness testing. In 2017 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 81--84.

[59]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21 (2020), 140:1--140:67.

[60]

Vaibhav Rastogi, Yan Chen, and William Enck. 2013. Appsplayground: automatic security analysis of smartphone applications. In Proceedings of the third ACM conference on Data and application security and privacy. 209--220.

Digital Library

[61]

J Schulman, B Zoph, C Kim, J Hilton, J Menick, J Weng, JFC Uribe, L Fedus, L Metz, M Pokorny, et al. 2022. ChatGPT: Optimizing language models for dialogue.

[62]

Carolyn B. Seaman. 1999. Qualitative methods in empirical studies of software engineering. IEEE Transactions on software engineering 25, 4 (1999), 557--572.

Digital Library

[63]

Mike Sharples. 2022. Automated Essay Writing: An AIED Opinion. International Journal of Artificial Intelligence in Education (2022), 1--8.

[64]

Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su. 2017. Guided, stochastic model-based GUI testing of Android apps. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. 245--256.

Digital Library

[65]

Nezih Sunman, Yiğit Soydan, and Hasan Sözer. 2022. Automated web application testing driven by pre-recorded test cases. Journal of Systems and Software (2022), 111441.

[66]

Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A symbolic string solver for vulnerability detection in web applications. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. 1232--1243.

Digital Library

[67]

Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2017. Model counting for recursively-defined strings. In Computer Aided Verification: 29th International Conference, CAV 2017, Heidelberg, Germany, July 24--28, 2017, Proceedings, Part II 30. Springer, 399--418.

[68]

UIAutomator. 2021. Python wrapper of Android uiautomator test tool. https://github.com/xiaocong/uiautomator.

[69]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems (2017).

[70]

Jue Wang, Yanyan Jiang, Chang Xu, Chun Cao, Xiaoxing Ma, and Jian Lu. 2020. Combodroid: generating high-quality test inputs for android apps via use case combinations. In ICSE. 469--480.

[71]

Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, and Lijuan Wang. 2022. An empirical study of gpt-3 for few-shot knowledge-based vqa. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 3081--3089.

[72]

Zhengran Zeng, Hanzhuo Tan, Haotian Zhang, Jing Li, Yuqun Zhang, and Lingming Zhang. 2022. An extensive study on pre-trained models for program understanding and generation. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 39--51.

Digital Library

[73]

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022).

Cited By

Hu HWang HDong RChen XChen C(2024)Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated MonkeyACM Transactions on Software Engineering and Methodology10.1145/366481033:6(1-31)Online publication date: 27-Jun-2024
https://dl.acm.org/doi/10.1145/3664810

Recommendations

Testing cross-platform mobile app development frameworks
ASE '15: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering

Mobile app developers often wish to make their apps available on a wide variety of platforms, e.g., Android, iOS, and Windows devices. Each of these platforms uses a different programming environment, each with its own language and APIs for app ...
An Explorative Study of the Mobile App Ecosystem from App Developers' Perspective
WWW '17: Proceedings of the 26th International Conference on World Wide Web

With the prevalence of smartphones, app markets such as Apple App Store and Google Play has become the center stage in the mobile app ecosystem, with millions of apps developed by tens of thousands of app developers in each major market. This paper ...
Learning Mobile App Development: A Hands-on Guide to Building Apps with iOS and Android

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering

May 2024

2942 pages

ISBN:9798400702174

DOI:10.1145/3597503

Co-chairs:
Ana Paiva,
Rui Abreu,
Program Co-chairs:
Abhik Roychoudhury,
Margaret Storey

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

In-Cooperation

Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 April 2024

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

ICSE '24

Sponsor:

SIGSOFT

ICSE '24: IEEE/ACM 46th International Conference on Software Engineering

April 14 - 20, 2024

Lisbon, Portugal

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Upcoming Conference

ICSE 2025

2025 IEEE/ACM 46th International Conference on Software Engineering

April 26 - May 3, 2025

Ottawa , ON , Canada

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
304
Total Downloads

Downloads (Last 12 months)304
Downloads (Last 6 weeks)108

Reflects downloads up to 26 Jul 2024

Other Metrics

View Author Metrics

Citations

Cited By

Hu HWang HDong RChen XChen C(2024)Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated MonkeyACM Transactions on Software Engineering and Methodology10.1145/366481033:6(1-31)Online publication date: 27-Jun-2024
https://dl.acm.org/doi/10.1145/3664810

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents