
KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation

Published: 04 June 2024

Abstract

Commit messages are natural language descriptions of code changes and are important for software evolution tasks such as code understanding and maintenance. However, previous methods are trained on the entire dataset without considering that only a portion of commit messages adhere to good practice (i.e., good-practice commits) while the rest do not. Our empirical study shows that training on good-practice commits contributes significantly to commit message generation. Motivated by this finding, we propose KADEL, a novel knowledge-aware denoising learning method. Because good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with them. To achieve this, we propose a model that learns commit knowledge by training on good-practice commits; this knowledge model supplies additional information for the training samples that do not conform to good practice. Since this supplementary information may contain noise or prediction errors, we further propose a dynamic denoising training method that combines a distribution-aware confidence function with a dynamic distribution list, enhancing the effectiveness of the training process. Experimental results on the whole MCMD dataset demonstrate that our method achieves overall state-of-the-art performance compared with previous methods.
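The denoising idea sketched in the abstract, weighting each sample's knowledge-model supplement by a confidence derived from where its loss falls in the batch's loss distribution, can be illustrated roughly as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: the function names, the sigmoid confidence form, and the blending rule are all hypothetical.

```python
import math

def distribution_aware_confidence(losses):
    """Map each sample's loss to a confidence weight in [0, 1], based on
    its standardized position in the batch's loss distribution
    (hypothetical form: lower loss relative to the batch -> higher
    confidence in the knowledge model's supplementary target)."""
    mean = sum(losses) / len(losses)
    std = math.sqrt(sum((l - mean) ** 2 for l in losses) / len(losses)) or 1.0
    # Squash the standardized loss through a sigmoid so high-loss
    # (likely noisy) supplements contribute less to the training signal.
    return [1.0 / (1.0 + math.exp((l - mean) / std)) for l in losses]

def denoised_batch_loss(gold_losses, knowledge_losses):
    """Blend the loss on the original commit message with the loss on the
    knowledge model's supplementary target, weighted per sample by the
    confidence assigned to that supplement."""
    conf = distribution_aware_confidence(knowledge_losses)
    per_sample = [
        (1.0 - c) * g + c * k
        for g, k, c in zip(gold_losses, knowledge_losses, conf)
    ]
    return sum(per_sample) / len(per_sample)
```

Under this sketch, a supplement whose loss sits well below the batch mean receives a weight near 1 and dominates that sample's training signal, while an outlier supplement is down-weighted toward the original (possibly low-quality) message.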


Cited By

  • (2025) Automated description generation for software patches. Information and Software Technology 177 (2025), 107543. DOI: 10.1016/j.infsof.2024.107543. Online publication date: Jan 2025.


    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 5
    June 2024, 952 pages
    EISSN: 1557-7392
    DOI: 10.1145/3618079
    Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 June 2024
    Online AM: 29 January 2024
    Accepted: 15 January 2024
    Revised: 29 November 2023
    Received: 20 July 2023
    Published in TOSEM Volume 33, Issue 5


    Author Tags

    1. Commit message generation
    2. knowledge introducing
    3. denoising training

    Qualifiers

    • Research-article

