Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3597503.3639118acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article
Open access

Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model

Published: 12 April 2024 Publication History
  • Get Citation Alerts
  • Abstract

    Mobile applications have become a ubiquitous part of our daily life, providing users with access to various services and utilities. Text input, as an important interaction channel between users and applications, plays an important role in core functionality such as search queries, authentication, messaging, etc. However, certain special text (e.g., -18 for Font Size) can cause the app to crash, and generating diversified unusual inputs for fully testing the app is highly demanded. Nevertheless, this is also challenging due to the combination of explosion dilemma, high context sensitivity, and complex constraint relations. This paper proposes InputBlaster which leverages the LLM to automatically generate unusual text inputs for mobile app crash detection. It formulates the unusual inputs generation problem as a task of producing a set of test generators, each of which can yield a batch of unusual text inputs under the same mutation rule. In detail, InputBlaster leverages LLM to produce the test generators together with the mutation rules serving as the reasoning chain, and utilizes the in-context learning schema to demonstrate the LLM with examples for boosting the performance. InputBlaster is evaluated on 36 text input widgets with cash bugs involving 31 popular Android apps, and results show that it achieves 78% bug detection rate, with 136% higher than the best baseline. Besides, we integrate it with the automated GUI testing tool and detect 37 unseen crashes in real-world apps.

    References

    [1]
    2022. Crash bug text. https://www.theguardian.com/technology/iphone-crash-bug-text-imessage-ios.
    [2]
    2022. Crash bug text in ios. https://tech.hindustantimes.com/tech/news/be-careful-a-new-text-bomb-is-making-whatsapp-crash-and-will-hang-your-phone-71599532897852.html.
    [3]
    2022. F-droid. https://f-droid.org/.
    [4]
    Toufique Ahmed and Premkumar Devanbu. 2022. Few-shot training LLMs for project-specific code-summarization. ASE (2022).
    [5]
    Nadia Alshahwan and Mark Harman. 2011. Automated web application testing using search based software engineering. In ASE. IEEE, 3--12.
    [6]
    Saswat Anand, Mayur Naik, Mary Jean Harrold, and Hongseok Yang. 2012. Automated concolic testing of smartphone apps. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. 1--11.
    [7]
    Yauhen Leanidavich Arnatovich, Minh Ngoc Ngo, Tan Hee Beng Kuan, and Charlie Soh. 2016. Achieving high code coverage in android ui testing via automated widget exercising. In 2016 23rd Asia-Pacific Software Engineering Conference (APSEC). IEEE, 193--200.
    [8]
    Yauhen Leanidavich Arnatovich, Lipo Wang, Ngoc Minh Ngo, and Charlie Soh. 2018. Mobolic: An automated approach to exercising mobile application GUIs using symbiosis of online testing technique and customated input generation. Software: Practice and Experience 48, 5 (2018), 1107--1142.
    [9]
    Tanzirul Azim and Iulian Neamtiu. 2013. Targeted and depth-first exploration for systematic testing of android apps. In Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications. 641--660.
    [10]
    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877--1901.
    [11]
    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877--1901.
    [12]
    Tianqin Cai, Zhao Zhang, and Ping Yang. 2020. Fastbot: A Multi-Agent ModelBased Test Generation System Beijing Bytedance Network Technology Co., Ltd. In Proceedings of the IEEE/ACM 1st International Conference on Automation of Software Test. 93--96.
    [13]
    Chen Chen, Baojiang Cui, Jinxin Ma, Runpu Wu, Jianchao Guo, and Wenqian Liu. 2018. A systematic review of fuzzing techniques. Computers & Security 75 (2018), 118--137.
    [14]
    Taolue Chen, Alejandro Flores-Lamas, Matthew Hague, Zhilei Han, Denghang Hu, Shuanglong Kan, Anthony W Lin, Philipp Rümmer, and Zhilin Wu. 2022. Solving string constraints with Regex-dependent functions through transducers with priorities and variables. Proceedings of the ACM on Programming Languages 6, POPL (2022), 1--31.
    [15]
    Taolue Chen, Matthew Hague, Jinlong He, Denghang Hu, Anthony Widjaja Lin, Philipp Rümmer, and Zhilin Wu. 2020. A decision procedure for path feasibility of string manipulating programs with integer data type. In Automated Technology for Verification and Analysis: 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19--23, 2020, Proceedings. Springer, 325--342.
    [16]
    Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E Hinton. 2020. Big self-supervised models are strong semi-supervised learners. Advances in neural information processing systems 33 (2020), 22243--22255.
    [17]
    Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2022. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 (2022).
    [18]
    Joel D Day, Thorsten Ehlers, Mitja Kulczynski, Florin Manea, Dirk Nowotka, and Danny Bøgsted Poulsen. 2019. On solving word equations using SAT. In Reachability Problems: 13th International Conference, RP 2019, Brussels, Belgium, September 11--13, 2019, Proceedings 13. Springer, 93--106.
    [19]
    Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A Mobile App Dataset for Building Data-Driven Design Applications. In UIST.
    [20]
    Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Dylan Zhang, Shujing Yang, and Lingming Zhang. 2023. Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT. ISSTA (2023).
    [21]
    Yinlin Deng, Chenyuan Yang, Anjiang Wei, and Lingming Zhang. 2022. Fuzzing deep-learning libraries via automated relational API inference. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 44--56.
    [22]
    Android Developers. 2012. Ui/application exerciser monkey.
    [23]
    Zhen Dong, Marcel Böhme, Lucia Cojocaru, and Abhik Roychoudhury. 2020. Time-travel testing of android apps. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). IEEE, 481--492.
    [24]
    Robert Feldt and Simon Poulding. 2013. Finding test data with specific properties via metaheuristic search. In 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 350--359.
    [25]
    Shivam Garg, Dimitris Tsipras, Percy S Liang, and Gregory Valiant. 2022. What can transformers learn in-context? a case study of simple function classes. Advances in Neural Information Processing Systems 35 (2022), 30583--30598.
    [26]
    Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, and Zhendong Su. 2019. Practical GUI testing of Android applications via model abstraction and refinement. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 269--280.
    [27]
    Yuyu He, Lei Zhang, Zhemin Yang, Yinzhi Cao, Keke Lian, Shuai Li, Wei Yang, Zhibo Zhang, Min Yang, Yuan Zhang, et al. 2020. TextExerciser: feedback-driven text input exercising for android applications. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 1071--1087.
    [28]
    Lukav Holik, Petr Jank, Anthony W Lin, Philipp Rmmer, and Tom Vojnar. 2017. String constraints with concatenation and transducers solved efficiently. Proceedings of the ACM on Programming Languages 2, POPL (2017), 1--32.
    [29]
    Yang Hu, Umair Z Ahmed, Sergey Mechtaev, Ben Leong, and Abhik Roychoudhury. 2019. Re-factoring based program repair applied to programming assignments. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 388--398.
    [30]
    Text input. 2022. Introduction about text input on Android Developer website. https://developer.android.google.cn/reference/android/widget/EditText?hl=en.
    [31]
    Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan. 2023. Impact of Code Language Models on Automated Program Repair. ICSE (2023).
    [32]
    Sungmin Kang, Juyeon Yoon, and Shin Yoo. 2023. Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction. ICSE (2023).
    [33]
    Adam Kiezun, Vijay Ganesh, Shay Artzi, Philip J Guo, Pieter Hooimeijer, and Michael D Ernst. 2013. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Transactions on Software Engineering and Methodology (TOSEM) 21, 4 (2013), 1--28.
    [34]
    Sebastian Krings, Joshua Schmidt, Patrick Skowronek, Jannik Dunkelau, and Dierk Ehmke. 2020. Towards constraint logic programming over strings for test data generation. In Declarative Programming and Knowledge Management: Conference on Declarative Programming, DECLARE 2019, Unifying INAP, WLP, and WFLP, Cottbus, Germany, September 9--12, 2019, Revised Selected Papers 22. Springer, 139--159.
    [35]
    Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. 2020. Reinforcement learning with augmented data. Advances in neural information processing systems 33 (2020), 19884--19895.
    [36]
    Caroline Lemieux, Jeevana Priya Inala, Shuvendu K Lahiri, and Siddhartha Sen. 2023. CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pretrained Large Language Models. In ICSE.
    [37]
    Guodong Li and Indradeep Ghosh. 2013. PASS: String solving with parameterized array and interval automaton. In Hardware and Software: Verification and Testing: 9th International Haifa Verification Conference, HVC 2013, Haifa, Israel, November 5--7, 2013, Proceedings 9. Springer, 15--31.
    [38]
    Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. 2017. DroidBot: A Lightweight UI-Guided Test Input Generator for Android (ICSE-C '17).
    [39]
    Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. 2017. Droidbot: a lightweight ui-guided test input generator for android. In 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C). IEEE, 23--26.
    [40]
    Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. 2019. Humanoid: a deep learning-based approach to automated black-box Android app testing. In ASE. IEEE, 1070--1073.
    [41]
    Hongliang Liang, Xiaoxiao Pei, Xiaodong Jia, Wuwei Shen, and Jian Zhang. 2018. Fuzzing: State of the art. IEEE Transactions on Reliability 67, 3 (2018), 1199--1218.
    [42]
    Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL (T) theory solver for a theory of strings and regular expressions. In Computer Aided Verification: 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18--22, 2014. Proceedings 26. Springer, 646--662.
    [43]
    Peng Liu, Xiangyu Zhang, Marco Pistoia, Yunhui Zheng, Manoel Marques, and Lingfei Zeng. 2017. Automatic text input generation for mobile testing. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). IEEE, 643--653.
    [44]
    Zhe Liu, Chunyang Chen, Junjie Wang, Xing Che, Yuekai Huang, Jun Hu, and Qing Wang. 2022. Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI Testing. arXiv preprint arXiv:2212.04732 (2022).
    [45]
    Zhe Liu, Chunyang Chen, Junjie Wang, Yuekai Huang, Jun Hu, and Qing Wang. 2020. Owl Eyes: Spotting UI Display Issues via Visual Understanding. In ASE. IEEE.
    [46]
    Zhe Liu, Chunyang Chen, Junjie Wang, Yuekai Huang, Jun Hu, and Qing Wang. 2022. Nighthawk: Fully Automated Localizing UI Display Issues via Visual Understanding. IEEE Transactions on Software Engineering (2022), 1--16.
    [47]
    Zhe Liu, Chunyang Chen, Junjie Wang, Yuhui Su, Yuekai Huang, Jun Hu, and Qing Wang. 2023. Ex pede Herculem: Augmenting Activity Transition Graph for Apps via Graph Convolution Network. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1983--1995.
    [48]
    Zhe Liu, Chunyang Chen, Junjie Wang, Yuhui Su, and Qing Wang. 2022. NaviDroid: a tool for guiding manual Android testing via hint moves. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings. 154--158.
    [49]
    Zhe Liu, Chunyang Chen, Junjie Wang, and Qing Wang. 2022. Guided Bug Crush: Assist Manual GUI Testing of Android Apps via Hint Moves. In CHI 2022.
    [50]
    Li Lucy and David Bamman. 2021. Gender and representation bias in GPT-3 generated stories. In Proceedings of the Third Workshop on Narrative Understanding. 48--55.
    [51]
    Aravind Machiry, Rohan Tahiliani, and Mayur Naik. 2013. Dynodroid: An input generation system for android apps. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. 224--234.
    [52]
    Ke Mao, Mark Harman, and Yue Jia. 2016. Sapienz: Multi-objective automated testing for Android applications. In Proceedings of the 25th International Symposium on Software Testing and Analysis. 94--105.
    [53]
    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. Computer Science (2013).
    [54]
    Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? arXiv preprint arXiv:2202.12837 (2022).
    [55]
    Noor Nashid, Mifta Sintaha, and Ali Mesbah. 2023. Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning. In Proceedings of the 45th International Conference on Software Engineering (ICSE'23).
    [56]
    Minxue Pan, An Huang, Guoxin Wang, Tian Zhang, and Xuandong Li. 2020. Reinforcement learning based curiosity-driven testing of Android applications. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 153--164.
    [57]
    Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. 2022. Synchromesh: Reliable code generation from pre-trained language models. ICLR (2022).
    [58]
    Simon Poulding and Robert Feldt. 2017. Generating controllably invalid and atypical inputs for robustness testing. In 2017 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 81--84.
    [59]
    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21 (2020), 140:1--140:67.
    [60]
    Vaibhav Rastogi, Yan Chen, and William Enck. 2013. Appsplayground: automatic security analysis of smartphone applications. In Proceedings of the third ACM conference on Data and application security and privacy. 209--220.
    [61]
    J Schulman, B Zoph, C Kim, J Hilton, J Menick, J Weng, JFC Uribe, L Fedus, L Metz, M Pokorny, et al. 2022. ChatGPT: Optimizing language models for dialogue.
    [62]
    Carolyn B. Seaman. 1999. Qualitative methods in empirical studies of software engineering. IEEE Transactions on software engineering 25, 4 (1999), 557--572.
    [63]
    Mike Sharples. 2022. Automated Essay Writing: An AIED Opinion. International Journal of Artificial Intelligence in Education (2022), 1--8.
    [64]
    Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su. 2017. Guided, stochastic model-based GUI testing of Android apps. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. 245--256.
    [65]
    Nezih Sunman, Yiğit Soydan, and Hasan Sözer. 2022. Automated web application testing driven by pre-recorded test cases. Journal of Systems and Software (2022), 111441.
    [66]
    Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A symbolic string solver for vulnerability detection in web applications. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. 1232--1243.
    [67]
    Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2017. Model counting for recursively-defined strings. In Computer Aided Verification: 29th International Conference, CAV 2017, Heidelberg, Germany, July 24--28, 2017, Proceedings, Part II 30. Springer, 399--418.
    [68]
    UIAutomator. 2021. Python wrapper of Android uiautomator test tool. https://github.com/xiaocong/uiautomator.
    [69]
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems (2017).
    [70]
    Jue Wang, Yanyan Jiang, Chang Xu, Chun Cao, Xiaoxing Ma, and Jian Lu. 2020. Combodroid: generating high-quality test inputs for android apps via use case combinations. In ICSE. 469--480.
    [71]
    Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, and Lijuan Wang. 2022. An empirical study of gpt-3 for few-shot knowledge-based vqa. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 3081--3089.
    [72]
    Zhengran Zeng, Hanzhuo Tan, Haotian Zhang, Jing Li, Yuqun Zhang, and Lingming Zhang. 2022. An extensive study on pre-trained models for program understanding and generation. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 39--51.
    [73]
    Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022).

    Cited By

    View all
    • (2024)Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated MonkeyACM Transactions on Software Engineering and Methodology10.1145/366481033:6(1-31)Online publication date: 27-Jun-2024

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
    May 2024
    2942 pages
    ISBN:9798400702174
    DOI:10.1145/3597503
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Sponsors

    In-Cooperation

    • Faculty of Engineering of University of Porto

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 April 2024

    Check for updates

    Author Tags

    1. Android GUI testing
    2. large language model
    3. in-context learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    ICSE '24
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Upcoming Conference

    ICSE 2025

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)304
    • Downloads (Last 6 weeks)108
    Reflects downloads up to 26 Jul 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated MonkeyACM Transactions on Software Engineering and Methodology10.1145/366481033:6(1-31)Online publication date: 27-Jun-2024

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media