DOI: 10.1145/3540250.3549089
Research article · Open access

Putting them under microscope: a fine-grained approach for detecting redundant test cases in natural language

Published: 09 November 2022

Abstract

Natural language (NL) documentation is the bridge between software managers and testers, and NL test cases are prevalent in system-level testing and other quality assurance activities. Owing to requirements redundancy, parallel testing, and tester turnover over a long evolution history, redundant test cases inevitably accumulate and significantly increase testing cost. Previous redundancy detection approaches typically treat the textual description as a whole when comparing similarity, and therefore suffer from low precision. We observe that a test case contains explicit test-oriented entities, such as the tested function Components, Constraints, etc., and that specific relations hold between these entities; this suggests an opportunity for more accurate redundancy detection. In this paper, we first define five test-oriented entity categories and four associated relation categories, and reformulate NL test case redundancy detection as a comparison of detailed testing content guided by these entities and relations. We then propose Tscope, a fine-grained approach that detects redundant NL test cases by dissecting each case into atomic test tuple(s) of entities restricted by their associated relations. To support this dissection, Tscope employs a context-aware model for automatic entity and relation extraction. Evaluation on 3,467 test cases from ten projects shows that Tscope achieves 91.8% precision, 74.8% recall, and 82.4% F1, significantly outperforming state-of-the-art approaches and commonly used classifiers. This new formulation of the NL test case redundancy detection problem can motivate follow-up studies on this task and on related tasks involving NL descriptions.
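To make the abstract's tuple-level formulation concrete, here is a minimal, purely illustrative Python sketch. The `TestTuple` fields, the trivial line-based `dissect` parser, and the example cases are all hypothetical stand-ins for Tscope's learned context-aware extractor; only the idea that redundancy reduces to comparing sets of atomic entity tuples comes from the paper. The last lines also check that the reported precision and recall combine to the reported F1 via F1 = 2PR/(P+R).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TestTuple:
    component: str   # hypothetical "Component" entity (the tested function)
    constraint: str  # hypothetical "Constraint" entity restricting it

def dissect(case: str) -> set[TestTuple]:
    # Stand-in for Tscope's entity/relation extractor: here each line of a
    # test case is assumed to follow a toy "component: constraint" format.
    tuples = set()
    for line in case.strip().splitlines():
        comp, _, cons = line.partition(":")
        tuples.add(TestTuple(comp.strip(), cons.strip()))
    return tuples

def redundant(a: str, b: str) -> bool:
    # Two cases are treated as redundant when their atomic tuple sets match,
    # rather than when their whole-text similarity is high.
    return dissect(a) == dissect(b)

case1 = "login button: disabled when fields empty"
case2 = "login button: disabled when fields empty"
case3 = "login button: enabled after valid input"
print(redundant(case1, case2))  # True
print(redundant(case1, case3))  # False

# Sanity check on the reported evaluation numbers.
precision, recall = 0.918, 0.748
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.3f}")  # 0.824
```

The point of the tuple comparison is that two cases differing in a single Constraint are kept apart, whereas whole-text similarity would score them as near-duplicates.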


Cited By

  • (2024) LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models. IEEE Transactions on Software Engineering, 50(11), 3053–3070. DOI: 10.1109/TSE.2024.3469582. Online publication date: 1-Nov-2024.


Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages
ISBN:9781450394130
DOI:10.1145/3540250
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Entity and Relation Extraction
  2. Natural Language Processing
  3. Test Case Redundancy

Qualifiers

  • Research-article

Funding Sources

  • the National Key Research and Development Program of China

Conference

ESEC/FSE '22
Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

  • Downloads (last 12 months): 142
  • Downloads (last 6 weeks): 23
Reflects downloads up to 27 Jan 2025
