DOI: 10.1145/3528588.3528659
short-paper

Issue report classification using pre-trained language models

Published: 01 February 2023

Abstract

This paper describes our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering. We propose a supervised approach relying on fine-tuned BERT-based language models for the automatic classification of GitHub issues. We experimented with different pre-trained models, achieving the best performance with fine-tuned RoBERTa (F1 = .8591).
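
To make the reported score concrete, the following minimal Python sketch shows how an F1 score can be computed for single-label issue classification. The label names (bug, enhancement, question), the toy data, and the helper `f1_scores` are illustrative assumptions, not taken from the paper. The abstract does not state which F1 averaging was used, so the sketch reports both per-label and micro-averaged values.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (per-label F1 dict, micro-averaged F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true label but was missed

    per_label = {}
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_label[label] = 2 * tp[label] / denom if denom else 0.0

    # Micro-averaging pools the counts of all labels before computing F1.
    total_tp = sum(tp.values())
    total_denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    micro = 2 * total_tp / total_denom if total_denom else 0.0
    return per_label, micro


# Hypothetical gold labels and model predictions for six issues.
gold = ["bug", "bug", "enhancement", "question", "enhancement", "bug"]
pred = ["bug", "enhancement", "enhancement", "question", "bug", "bug"]

per_label, micro = f1_scores(gold, pred)
print(per_label)
print(round(micro, 4))
```

When every issue receives exactly one label, the micro-averaged F1 equals plain accuracy, which is one way to sanity-check the output of such a helper.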


Published In

NLBSE '22: Proceedings of the 1st International Workshop on Natural Language-based Software Engineering
May 2022
87 pages
ISBN:9781450393430
DOI:10.1145/3528588

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. BERT
  2. deep learning
  3. issue classification
  4. labeling unstructured data
  5. software maintenance and evolution

Qualifiers

  • Short-paper

Funding Sources

  • MIUR

Conference

ICSE '22

Cited By

  • (2024) Text-To-Text Generation for Issue Report Classification. Proceedings of the Third ACM/IEEE International Workshop on NL-based Software Engineering, pp. 53-56. DOI: 10.1145/3643787.3648042. Online publication date: 20-Apr-2024.
  • (2024) Identifying Security Bugs in Issue Reports: Comparison of BERT, N-gram IDF and ChatGPT. 2024 IEEE/ACIS 22nd International Conference on Software Engineering Research, Management and Applications (SERA), pp. 328-333. DOI: 10.1109/SERA61261.2024.10685583. Online publication date: 30-May-2024.
  • (2024) Impact of data quality for automatic issue classification using pre-trained language models. Journal of Systems and Software 210:C. DOI: 10.1016/j.jss.2023.111838. Online publication date: 25-Jun-2024.
  • (2023) Large Language Models: The Next Frontier for Variable Discovery within Metamorphic Testing? 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 678-682. DOI: 10.1109/SANER56733.2023.00070. Online publication date: Mar-2023.
  • (2023) Few-Shot Learning for Issue Report Classification. 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 16-19. DOI: 10.1109/NLBSE59153.2023.00011. Online publication date: May-2023.
  • (2023) An Intelligent Tool for Classifying Issue Reports. 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 13-15. DOI: 10.1109/NLBSE59153.2023.00010. Online publication date: May-2023.
  • (2023) The NLBSE'23 Tool Competition. 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 1-8. DOI: 10.1109/NLBSE59153.2023.00007. Online publication date: May-2023.
  • (2023) Issue-Labeler: an ALBERT-based Jira Plugin for Issue Classification. 2023 IEEE/ACM 10th International Conference on Mobile Software Engineering and Systems (MOBILESoft), pp. 40-43. DOI: 10.1109/MOBILSoft59058.2023.00012. Online publication date: May-2023.
