research-article

Issue Report Validation in an Industrial Context

Authors:

Ethem Utku Aktas,

Mete Cihad Inan,

Cemal Yilmaz

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 2026 - 2031

https://doi.org/10.1145/3611643.3613887

Published: 30 November 2023

Abstract

Effective issue triaging is crucial for software development teams to improve software quality and, in turn, customer satisfaction. Validating issue reports manually can be time-consuming, which hinders the overall efficiency of the triaging process. This paper presents an approach to automating the validation of issue reports in order to accelerate issue triaging in an industrial setting. We work on 1,200 randomly selected issue reports from the banking domain, written in Turkish. Turkish is an agglutinative language, meaning that new words can be formed by linearly concatenating suffixes, so that a single word can express an entire sentence. We manually label these reports for validity and extract the patterns that indicate a report is invalid. Because the reports are written in an agglutinative language, we use morphological analysis to extract the features. Using the proposed feature extractors, we apply a machine-learning-based approach to predict the validity of issue reports, achieving an F1-score of 0.77.
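For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score can be computed for a binary valid/invalid labeling. It is not the authors' evaluation code. The function name `f1_score`, the label strings, the choice of "invalid" as the positive class, and the five example reports are all assumptions made for illustration; the abstract does not state which class is treated as positive or how the score is averaged.

```python
def f1_score(true_labels, predicted_labels, positive="invalid"):
    """Return the F1-score of one class: the harmonic mean of precision and recall.

    The positive class defaults to "invalid" purely as an illustrative assumption.
    """
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: reports labeled positive and predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive, but labeled otherwise.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: labeled positive, but predicted otherwise.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical manual labels and classifier predictions for five reports.
manual = ["invalid", "invalid", "valid", "valid", "invalid"]
predicted = ["invalid", "valid", "valid", "invalid", "invalid"]
score = f1_score(manual, predicted)
print(round(score, 2))
```

In this toy example, precision and recall are both 2/3, so the F1-score is also 2/3. Because F1 combines both quantities, it penalizes a classifier that flags nearly every report as invalid just as it penalizes one that flags almost none.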

Cited By

Toslali M, Snible E, Chen J, Cha A, Singh S, Kalantar M, Parthasarathy S, d'Amorim M. (2024). AgraBOT: Accelerating Third-Party Security Risk Management in Enterprise Setting through Generative AI. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 74-79. DOI: 10.1145/3663529.3663829. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3663529.3663829
Aktas E, Cakmak E, Inan M, Yilmaz C. (2024). Improving the quality of software issue report descriptions in Turkish: An industrial case study at Softtech. Empirical Software Engineering, 29:2. DOI: 10.1007/s10664-023-10434-4. Online publication date: 12-Feb-2024.
https://dl.acm.org/doi/10.1007/s10664-023-10434-4

Index Terms

Issue Report Validation in an Industrial Context
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation

Information & Contributors

Information

Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 2023

2215 pages

ISBN:9798400703270

DOI:10.1145/3611643

General Chair: Satish Chandra (Google, USA)
Program Chairs: Kelly Blincoe (University of Auckland, New Zealand), Paolo Tonella (USI Lugano, Switzerland)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ESEC/FSE '23

Sponsor:

SIGSOFT

ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

December 3 - 9, 2023

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
