research-article

Automatically Capturing Quality-Related Concerns in Bug Report Descriptions for Efficient Bug Triaging

Authors:

Rrezarta Krasniqi, Hyunsook Do

EASE '22: Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering

Pages 10 - 19

https://doi.org/10.1145/3530019.3530021

Published: 13 June 2022

Abstract

In the early phases of a project, software architects and developers design solutions to satisfy quality concerns. However, as a byproduct of long-term maintenance, these qualities tend to erode, causing quality-related bugs to surface across the codebase. Quality-related concerns can be expensive and difficult to detect, and they can prevent the system from operating as intended. Moreover, they can directly degrade the experience of users at large. To address this problem, we build a quality-based bug classifier that combines several feature selection techniques, namely TF-IDF, Chi-square (χ2), Mutual Information, and Extremely Randomized Trees (Extra-Trees), with various machine learning algorithms. Our results indicate that Random Forest with the TF-IDF+χ2 configuration achieved the best results for detecting six quality-related types, with a precision of 76%, recall of 70%, and F1 of 70%. However, the same approach yielded a low precision of 48%, recall of 15%, and F1 of 23% for detecting functional-related bugs. We argue that this low performance stems from overlapping content between functional and quality-related information, which opens another challenging topic that we aim to explore in future work.
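The TF-IDF+χ2 configuration mentioned above can be illustrated with a short, stdlib-only Python sketch. This is not the authors' implementation; it assumes naive whitespace tokenization, tf·log(N/df) weighting, and the classic 2×2 χ² statistic between term presence and class membership, keeping the k terms with the highest χ² score. The toy bug reports, labels, and helper names (`tfidf_vectors`, `chi2_score`, `select_terms`, `reduce_vectors`, `f1`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization only (no stemming or
    # stop-word removal, unlike a full NLP preprocessing chain).
    return text.lower().split()


def tfidf_vectors(token_lists):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def chi2_score(doc_sets, labels, term, cls):
    """2x2 chi-square statistic between presence of `term` and class `cls`."""
    a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)      # present, in class
    b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)      # present, other class
    c = sum(1 for s, y in zip(doc_sets, labels) if term not in s and y == cls)  # absent, in class
    d = len(labels) - a - b - c                                                 # absent, other class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return len(labels) * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_terms(token_lists, labels, k):
    """Keep the k terms whose best per-class chi-square score is highest."""
    doc_sets = [set(tokens) for tokens in token_lists]
    vocab = sorted(set().union(*doc_sets))
    classes = sorted(set(labels))
    scores = {t: max(chi2_score(doc_sets, labels, t, c) for c in classes) for t in vocab}
    # Descending score; ties broken alphabetically so the result is deterministic.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def reduce_vectors(vectors, selected):
    """Project TF-IDF vectors onto the selected vocabulary."""
    return [{t: w for t, w in v.items() if t in selected} for v in vectors]


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical toy reports: two performance-related, two functional.
reports = [
    "page load is very slow after update",
    "app slow and laggy when scrolling",
    "login button does nothing when clicked",
    "clicking save button does nothing",
]
labels = ["performance", "performance", "functional", "functional"]

tokens = [tokenize(r) for r in reports]
selected = select_terms(tokens, labels, k=4)
features = reduce_vectors(tfidf_vectors(tokens), selected)

print(sorted(selected))         # ['button', 'does', 'nothing', 'slow']
print(round(f1(0.48, 0.15), 2))  # 0.23
```

The reduced vectors would then be handed to a classifier; with scikit-learn, for instance, a `DictVectorizer` followed by a `RandomForestClassifier` accepts a list of such dicts. Replacing `chi2_score` with a mutual-information score, or ranking terms by Extra-Trees feature importances, would give the other configurations listed in the abstract. The `f1` helper also shows that the functional-bug figures are internally consistent (the harmonic mean of 48% and 15% is about 23%); the 70% F1 for the six quality types is presumably an average of per-class F1 scores, which in general differs from the harmonic mean of averaged precision and recall (about 73% here).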

Information

Published In

EASE '22: Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering

June 2022

466 pages

ISBN: 9781450396134

DOI: 10.1145/3530019

Editors:
Miroslaw Staron (Chalmers | University of Gothenburg, Sweden)
Christian Berger (Chalmers | University of Gothenburg, Sweden)
Jocelyn Simmonds (University of Chile, Chile)
Rafael Prikladnicki (School of Technology at PUCRS University, Brazil)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

EASE 2022

EASE 2022: The International Conference on Evaluation and Assessment in Software Engineering 2022

June 13 - 15, 2022

Gothenburg, Sweden

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%

Bibliometrics & Citations

Article Metrics

Total Citations: 4
Total Downloads: 99
Downloads (last 12 months): 29
Downloads (last 6 weeks): 4

Reflects downloads up to 09 Nov 2024

Cited By

Qian C, Zhang M, Nie Y, Lu S, Cao H. (2023). A Survey on Bug Deduplication and Triage Methods from Multiple Points of View. Applied Sciences 13, 15, Article 8788. Online publication date: 29-Jul-2023.
https://doi.org/10.3390/app13158788
Carneiro G, Ferreira J, Ramalho F, Massoni T. (2023). Similar Bug Reports Recommendation System using BERT. In Proceedings of the XXXVII Brazilian Symposium on Software Engineering, 378–387. Online publication date: 25-Sep-2023.
https://doi.org/10.1145/3613372.3613396
Krasniqi R, Do H. (2023). Capturing Contextual Relationships of Buggy Classes for Detecting Quality-Related Bugs. In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME), 375–379. Online publication date: 1-Oct-2023.
https://doi.org/10.1109/ICSME58846.2023.00048
Krasniqi R, Do H. (2023). A multi-model framework for semantically enhancing detection of quality-related bug report descriptions. Empirical Software Engineering 28, 2. Online publication date: 11-Feb-2023.
https://dl.acm.org/doi/10.1007/s10664-022-10280-w
