short-paper

Issue report classification using pre-trained language models

Authors: Giuseppe Colavito, Filippo Lanubile, Nicole Novielli

NLBSE '22: Proceedings of the 1st International Workshop on Natural Language-based Software Engineering

Pages 29 - 32

https://doi.org/10.1145/3528588.3528659

Published: 01 February 2023

Abstract

This paper describes our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering. We propose a supervised approach relying on fine-tuned BERT-based language models for the automatic classification of GitHub issues. We experimented with different pre-trained models, achieving the best performance with fine-tuned RoBERTa (F1 = .8591).
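
The F1 figure reported above can be illustrated with a small computation. The following is a minimal sketch, not the authors' evaluation script: it assumes single-label classification, micro-averaging, and three illustrative labels (bug, enhancement, question), none of which are stated in the abstract. The function name f1_scores and the toy data are hypothetical.

```python
from collections import Counter


def f1_scores(gold, predicted):
    """Return (per_label_f1, micro_f1) for single-label classification.

    Assumption: each issue has exactly one gold label and one predicted label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed

    per_label = {}
    for label in sorted(set(gold) | set(predicted)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_label[label] = 2 * tp[label] / denom if denom else 0.0

    # Micro-averaging pools the counts of all labels before computing F1.
    total_tp = sum(tp.values())
    total_denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    micro = 2 * total_tp / total_denom if total_denom else 0.0
    return per_label, micro


# Toy data with hypothetical labels; not taken from the paper.
gold = ["bug", "bug", "enhancement", "question", "enhancement"]
pred = ["bug", "enhancement", "enhancement", "question", "bug"]
per_label, micro = f1_scores(gold, pred)
print(per_label, micro)
```

With one predicted label per issue, micro-averaged F1 equals accuracy, which is why the toy data (3 of 5 correct) yields 0.6.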

Index Terms

Issue report classification using pre-trained language models
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues

Published In

NLBSE '22: Proceedings of the 1st International Workshop on Natural Language-based Software Engineering

May 2022

87 pages

ISBN:9781450393430

DOI:10.1145/3528588

Conference Chairs:
Andrea Di Sorbo, University of Sannio, Benevento, Italy
Sebastiano Panichella, Zurich University of Applied Sciences, Zurich, Switzerland

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

MIUR

Conference

ICSE '22

Sponsor:

SIGSOFT

ICSE '22: 44th International Conference on Software Engineering

May 21, 2022

Pittsburgh, Pennsylvania

Cited By

Rejithkumar G, Anish P, Ghaisas S, Izadi M, Di Sorbo A, Panichella S (2024). Text-To-Text Generation for Issue Report Classification. Proceedings of the Third ACM/IEEE International Workshop on NL-based Software Engineering, pp. 53-56. Online publication date: 20-Apr-2024.
https://dl.acm.org/doi/10.1145/3643787.3648042

Yokoyama D, Nishiura K, Monden A (2024). Identifying Security Bugs in Issue Reports: Comparison of BERT, N-gram IDF and ChatGPT. 2024 IEEE/ACIS 22nd International Conference on Software Engineering Research, Management and Applications (SERA), pp. 328-333. Online publication date: 30-May-2024.
https://doi.org/10.1109/SERA61261.2024.10685583

Colavito G, Lanubile F, Novielli N, Quaranta L (2024). Impact of data quality for automatic issue classification using pre-trained language models. Journal of Systems and Software 210:C. Online publication date: 25-Jun-2024.
https://dl.acm.org/doi/10.1016/j.jss.2023.111838

Tsigkanos C, Rani P, Müller S, Kehrer T (2023). Large Language Models: The Next Frontier for Variable Discovery within Metamorphic Testing? 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 678-682. Online publication date: Mar-2023.
https://doi.org/10.1109/SANER56733.2023.00070

Colavito G, Lanubile F, Novielli N (2023). Few-Shot Learning for Issue Report Classification. 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 16-19. Online publication date: May-2023.
https://doi.org/10.1109/NLBSE59153.2023.00011

Laiq M (2023). An Intelligent Tool for Classifying Issue Reports. 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 13-15. Online publication date: May-2023.
https://doi.org/10.1109/NLBSE59153.2023.00010

Kallis R, Izadi M, Pascarella L, Chaparro O, Rani P (2023). The NLBSE'23 Tool Competition. 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 1-8. Online publication date: May-2023.
https://doi.org/10.1109/NLBSE59153.2023.00007

Alhindi W, Aleid A, Jenhani I, Mkaouer M (2023). Issue-Labeler: an ALBERT-based Jira Plugin for Issue Classification. 2023 IEEE/ACM 10th International Conference on Mobile Software Engineering and Systems (MOBILESoft), pp. 40-43. Online publication date: May-2023.
https://doi.org/10.1109/MOBILSoft59058.2023.00012
