
Deep Is Better? An Empirical Comparison of Information Retrieval and Deep Learning Approaches to Code Summarization

Published: 15 March 2024

Abstract

Code summarization aims to generate short functional descriptions for source code to facilitate code comprehension. While Information Retrieval (IR) approaches, which leverage similar code snippets and their corresponding summaries, led early research in this area, Deep Learning (DL) approaches, which use neural models to capture statistical relationships between code and summaries, are now mainstream. Although some preliminary studies suggest that IR approaches are more effective in certain cases, it remains unclear how effective the existing approaches are in general, where and why IR or DL approaches perform better, and whether integrating IR and DL can achieve better performance. Consequently, there is an urgent need for a comprehensive study of IR and DL code summarization approaches to guide future development in this area. This article presents the first large-scale empirical study of 18 IR, DL, and hybrid code summarization approaches on five benchmark datasets. We extensively compare the different types of approaches using automatic metrics, conduct quantitative and qualitative analyses of where and why IR and DL approaches perform better, respectively, and study hybrid approaches to assess the effectiveness of integrating IR and DL. The study shows that the performance of IR approaches should not be underestimated; that while DL models are better at predicting tokens from method signatures and capturing structural similarities in code, simple IR approaches tend to perform better on code with highly similar counterparts or long reference summaries; and that existing hybrid approaches do not perform as well as the individual approaches in their respective areas of strength. Based on our findings, we discuss future research directions for better code summarization.
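To make the IR side of this comparison concrete, the retrieve-and-reuse idea behind simple IR baselines (e.g., ones built on TF-IDF [76] or BM25 [73]) can be sketched as nearest-neighbor lookup: index the training code, find the snippet most lexically similar to the query, and return its summary verbatim. The Python sketch below is illustrative only; the toy corpus, the `ir_summarize` helper, and the use of scikit-learn's TF-IDF vectorizer are our assumptions, not the exact setup of any approach evaluated in the article.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy retrieval corpus: code snippets paired with reference summaries
# (hypothetical data for illustration).
corpus_code = [
    "public int add(int a, int b) { return a + b; }",
    "public int max(int a, int b) { return a > b ? a : b; }",
    "public boolean isEmpty(List items) { return items.size() == 0; }",
]
corpus_summaries = [
    "returns the sum of two integers",
    "returns the larger of two integers",
    "checks whether the list is empty",
]

# Index the corpus with TF-IDF over identifier-like tokens.
vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_][A-Za-z_0-9]*")
corpus_vectors = vectorizer.fit_transform(corpus_code)

def ir_summarize(query_code: str) -> str:
    """Return the summary of the most similar indexed snippet."""
    query_vector = vectorizer.transform([query_code])
    scores = cosine_similarity(query_vector, corpus_vectors)[0]
    return corpus_summaries[scores.argmax()]

print(ir_summarize("public int plus(int x, int y) { return x + y; }"))
# -> "returns the sum of two integers" (nearest neighbor by lexical overlap)
```

A baseline of this kind involves no training beyond indexing, which is one reason IR approaches remain strong when a test method has near-duplicates in the corpus, consistent with the study's finding that simple IR approaches do well on code with high similarity to the training data.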

References

[1]
Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2020. A transformer-based approach for source code summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 4998–5007.
[2]
Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Unified pre-training for program understanding and generation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’21). 2655–2668.
[3]
Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Muñoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy-Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, and Leandro von Werra. 2023. SantaCoder: Don’t reach for the stars! arXiv preprint arXiv:2301.03988 (2023).
[4]
Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2014. Learning natural coding conventions. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 281–293.
[5]
Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Comput. Surv. 51, 4 (2018), 1–37.
[6]
Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In Proceedings of the International Conference on Machine Learning. PMLR, 2091–2100.
[7]
Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019. code2seq: Generating sequences from structured representations of code. In Proceedings of the 7th International Conference on Learning Representations (ICLR’19). OpenReview.net.
[8]
Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 65–72.
[9]
Antonio Valerio Miceli Barone and Rico Sennrich. 2017. A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP’17). Asian Federation of Natural Language Processing, 314–319.
[10]
Mohammad Bavarian, Heewoo Jun, Nikolas Tezak, John Schulman, Christine McLeavey, Jerry Tworek, and Mark Chen. 2022. Efficient training of language models to fill in the middle. arXiv preprint arXiv:2207.14255 (2022).
[11]
Egor Bogomolov, Sergey Zhuravlev, Egor Spirin, and Timofey Bryksin. 2022. Assessing project-level fine-tuning of ML4SE models. arXiv preprint arXiv:2206.03333 (2022).
[12]
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).
[13]
Qiuyuan Chen, Xin Xia, Han Hu, David Lo, and Shanping Li. 2021. Why my code summarization model does not work: Code comment improvement with category prediction. ACM Trans. Softw. Eng. Methodol. 30, 2 (2021), 1–29.
[14]
Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of the 8th Workshop on Syntax, Semantics and Structure in Statistical Translation. Association for Computational Linguistics, 103–111.
[15]
YunSeok Choi, JinYeong Bak, CheolWon Na, and Jee-Hyong Lee. 2021. Learning sequential and structural information for source code summarization. In Findings of the Association for Computational Linguistics (ACL-IJCNLP’21). 2842–2851.
[16]
Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. In Proceedings of the 8th International Conference on Learning Representations (ICLR’20). OpenReview.net.
[17]
Colin B. Clement, Dawn Drain, Jonathan Timcheck, Alexey Svyatkovskiy, and Neel Sundaresan. 2020. PyMT5: Multi-mode translation of natural language and Python code with transformers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). Association for Computational Linguistics, 9052–9065.
[18]
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2978–2988.
[19]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). Association for Computational Linguistics, 4171–4186.
[20]
Brian P. Eddy, Jeffrey A. Robinson, Nicholas A. Kraft, and Jeffrey C. Carver. 2013. Evaluating source code summarization techniques: Replication and expansion. In Proceedings of the 21st International Conference on Program Comprehension (ICPC’13). IEEE, 13–22.
[21]
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics (EMNLP’20). 1536–1547.
[22]
Patrick Fernandes, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Structured neural summarization. In Proceedings of the 7th International Conference on Learning Representations (ICLR’19). OpenReview.net.
[23]
Andrew Forward and Timothy C. Lethbridge. 2002. The relevance of software documentation, tools and technologies: A survey. In Proceedings of the ACM Symposium on Document Engineering. 26–33.
[24]
Wei Fu and Tim Menzies. 2017. Easy over hard: A case study on deep learning. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering. 49–60.
[25]
Shuzheng Gao, Cuiyun Gao, Yulan He, Jichuan Zeng, Lunyiu Nie, Xin Xia, and Michael Lyu. 2023. Code structure–guided transformer for source code summarization. ACM Trans. Softw. Eng. Methodol. 32, 1 (2023), 1–32.
[26]
Yvette Graham, Nitika Mathur, and Timothy Baldwin. 2014. Randomized significance tests in machine translation. In Proceedings of the 9th Workshop on Statistical Machine Translation. 266–274.
[27]
David Gros, Hariharan Sezhiyan, Prem Devanbu, and Zhou Yu. 2020. Code to comment “Translation”: Data, metrics, baselining & evaluation. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE’20). IEEE, 746–757.
[28]
Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. UniXcoder: Unified cross-modal pre-training for code representation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 7212–7225.
[29]
Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin B. Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2021. GraphCodeBERT: Pre-training code representations with data flow. In Proceedings of the 9th International Conference on Learning Representations (ICLR’21). OpenReview.net.
[30]
Sonia Haiduc, Jairo Aponte, and Andrian Marcus. 2010. Supporting program comprehension with source code summarization. In Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering. IEEE, 223–226.
[31]
Sonia Haiduc, Jairo Aponte, Laura Moreno, and Andrian Marcus. 2010. On the use of automated text summarization techniques for summarizing source code. In Proceedings of the 17th Working Conference on Reverse Engineering. IEEE, 35–44.
[32]
Sakib Haque, Alexander LeClair, Lingfei Wu, and Collin McMillan. 2020. Improved automatic summarization of subroutines via attention to file context. In Proceedings of the 17th International Conference on Mining Software Repositories. 300–310.
[33]
Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2018. Deep code comment generation. In Proceedings of the IEEE/ACM 26th International Conference on Program Comprehension (ICPC’18). IEEE, 200–210.
[34]
Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2020. Deep code comment generation with hybrid lexical and syntactical information. Empir. Softw. Eng. 25, 3 (2020), 2179–2217.
[35]
Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing source code with transferred API knowledge. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18). ijcai.org, 2269–2275.
[36]
Xing Hu, Xin Xia, David Lo, Zhiyuan Wan, Qiuyuan Chen, and Thomas Zimmermann. 2022. Practitioners’ expectations on automated code comment generation. In Proceedings of the 44th IEEE/ACM International Conference on Software Engineering (ICSE’22). 1693–1705.
[37]
Yuan Huang, Shaohao Huang, Huanchao Chen, Xiangping Chen, Zibin Zheng, Xiapu Luo, Nan Jia, Xinyu Hu, and Xiaocong Zhou. 2020. Towards automatically generating block comments for code snippets. Inf. Softw. Technol. 127 (2020), 106373.
[38]
Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019).
[39]
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2073–2083.
[40]
Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue. 2002. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE Trans. Softw. Eng. 28, 7 (2002), 654–670.
[41]
Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, and Ryuichi Yamamoto. 2019. A comparative study on transformer vs RNN in speech applications. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU’19). IEEE, 449–456.
[42]
Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 388–395.
[43]
Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec, and Guillaume Lample. 2021. DOBF: A deobfuscation pre-training objective for programming languages. Adv. Neural Inf. Process. Syst. 34 (2021), 14967–14979.
[44]
Alexander LeClair, Aakash Bansal, and Collin McMillan. 2021. Ensemble models for neural source code summarization of subroutines. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME’21). IEEE, 286–297.
[45]
Alexander LeClair, Sakib Haque, Lingfei Wu, and Collin McMillan. 2020. Improved code summarization via a graph neural network. In Proceedings of the 28th International Conference on Program Comprehension. 184–195.
[46]
Alexander LeClair, Siyuan Jiang, and Collin McMillan. 2019. A neural model for generating natural language summaries of program subroutines. In Proceedings of the IEEE/ACM 41st International Conference on Software Engineering (ICSE’19). IEEE, 795–806.
[47]
Alexander LeClair and Collin McMillan. 2019. Recommendations for datasets for source code summarization. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). Association for Computational Linguistics, 3931–3937.
[48]
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 7871–7880.
[49]
Boao Li, Meng Yan, Xin Xia, Xing Hu, Ge Li, and David Lo. 2020. DeepCommenter: A deep code comment generation tool with hybrid lexical and syntactical information. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1571–1575.
[50]
Jia Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, and Zhi Jin. 2021. EditSum: A retrieve-and-edit framework for source code summarization. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE’21). IEEE, 155–166.
[51]
Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries. 2023. StarCoder: May the source be with you! arXiv preprint arXiv:2305.06161 (2023).
[52]
Yuding Liang and Kenny Zhu. 2018. Automatic generation of text descriptive comments for code blocks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
[53]
Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, and Rongxin Wu. 2021. Improving code summarization with block-wise abstract syntax tree splitting. In Proceedings of the IEEE/ACM 29th International Conference on Program Comprehension (ICPC’21). IEEE, 184–195.
[54]
Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out. Association for Computational Linguistics, 74–81.
[55]
Shangqing Liu, Yu Chen, Xiaofei Xie, Jing Kai Siow, and Yang Liu. 2021. Retrieval-augmented generation for code summarization via hybrid GNN. In Proceedings of the 9th International Conference on Learning Representations (ICLR’21). OpenReview.net.
[56]
Zhongxin Liu, Xin Xia, Ahmed E. Hassan, David Lo, Zhenchang Xing, and Xinyu Wang. 2018. Neural-machine-translation-based commit message generation: How far are we? In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 373–384.
[57]
C. Lopes. 2010. UCI source code data sets. Retrieved from http://www.ics.uci.edu/~lopes/datasets/
[58]
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664 (2021).
[59]
Junayed Mahmud, Fahim Faisal, Raihan Islam Arnob, Antonios Anastasopoulos, and Kevin Moran. 2021. Code to comment translation: A comparative study on model effectiveness & errors. arXiv preprint arXiv:2106.08415 (2021).
[60]
Paul W. McBurney, Cheng Liu, Collin McMillan, and Tim Weninger. 2014. Improving topic model source code summarization. In Proceedings of the 22nd International Conference on Program Comprehension. 291–294.
[61]
Paul W. McBurney and Collin McMillan. 2014. Automatic documentation generation via source code summarization of method context. In Proceedings of the 22nd International Conference on Program Comprehension. 279–290.
[62]
Mary L. McHugh. 2012. Interrater reliability: The kappa statistic. Biochem. Med. 22, 3 (2012), 276–282.
[63]
Jessica Moore, Ben Gelman, and David Slater. 2019. A convolutional neural network for language-agnostic source code summarization. In Proceedings of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE’19). SciTePress, 15–26.
[64]
Laura Moreno, Jairo Aponte, Giriprasad Sridhara, Andrian Marcus, Lori Pollock, and K. Vijay-Shanker. 2013. Automatic generation of natural language summaries for Java classes. In Proceedings of the 21st International Conference on Program Comprehension (ICPC’13). IEEE, 23–32.
[65]
Fangwen Mu, Xiao Chen, Lin Shi, Song Wang, and Qing Wang. 2022. Automatic comment generation via multi-pass deliberation. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 1–12.
[66]
Fangwen Mu, Xiao Chen, Lin Shi, Song Wang, and Qing Wang. 2023. Developer-intent driven code comment generation. arXiv preprint arXiv:2302.07055 (2023).
[67]
Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2022. CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474 (2022).
[68]
Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta, Andrian Marcus, and Gerardo Canfora. 2012. Mining source code descriptions from developer communications. In Proceedings of the 20th IEEE International Conference on Program Comprehension (ICPC’12). IEEE, 63–72.
[69]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311–318.
[70]
Md. Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Retrieval augmented code generation and summarization. In Findings of the Association for Computational Linguistics (EMNLP’21). Association for Computational Linguistics, 2719–2734.
[71]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1 (2020), 5485–5551.
[72]
Carol Righi, Janice James, Michael Beasley, Donald L. Day, Jean E. Fox, Jennifer Gieber, Chris Howe, and Laconya Ruby. 2013. Card sort analysis best practices. J. Usabil. Stud. 8, 3 (2013), 69–89.
[73]
Stephen E. Robertson, Steve Walker, and Micheline Beaulieu. 2000. Experimentation as a way of life: Okapi at TREC. Inf. Process. Manag. 36, 1 (2000), 95–108.
[74]
Paige Rodeghero, Collin McMillan, Paul W. McBurney, Nigel Bosch, and Sidney D’Mello. 2014. Improving automated source code summarization via an eye-tracking study of programmers. In Proceedings of the 36th International Conference on Software Engineering. 390–401.
[75]
Devjeet Roy, Sarah Fakhoury, and Venera Arnaoudova. 2021. Reassessing automatic evaluation metrics for code summarization tasks. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1105–1116.
[76]
Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24, 5 (1988), 513–523.
[77]
Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 1073–1083.
[78]
Ramin Shahbazi, Rishab Sharma, and Fatemeh H. Fard. 2021. API2Com: On the improvement of automatically generated code comments using API documentations. In Proceedings of the IEEE/ACM 29th International Conference on Program Comprehension (ICPC’21). IEEE, 411–421.
[79]
Noam Shazeer. 2019. Fast transformer decoding: One write-head is all you need. arXiv preprint arXiv:1911.02150 (2019).
[80]
Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, and Hongbin Sun. 2022. On the evaluation of neural code summarization. In Proceedings of the 44th International Conference on Software Engineering. 1597–1608.
[81]
Lin Shi, Fangwen Mu, Xiao Chen, Song Wang, Junjie Wang, Ye Yang, Ge Li, Xin Xia, and Qing Wang. 2022. Are we building on the rock? On the importance of data preprocessing for code summarization. arXiv preprint arXiv:2207.05579 (2022).
[82]
Yusuke Shido, Yasuaki Kobayashi, Akihiro Yamamoto, Atsushi Miyamoto, and Tadayuki Matsumura. 2019. Automatic source code summarization with extended tree-LSTM. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’19). IEEE, 1–8.
[83]
Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K. Vijay-Shanker. 2010. Towards automatically generating summary comments for Java methods. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering. 43–52.
[84]
Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018. Why self-attention? A targeted evaluation of neural machine translation architectures. arXiv preprint arXiv:1808.08946 (2018).
[85]
Ze Tang, Xiaoyu Shen, Chuanyi Li, Jidong Ge, Liguo Huang, Zhelin Zhu, and Bin Luo. 2022. AST-trans: Code summarization with efficient tree-structured attention. In Proceedings of the International Conference on Software Engineering(ICSE’22).
[86]
Zhaopeng Tu, Zhendong Su, and Premkumar Devanbu. 2014. On the localness of software. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 269–280.
[87]
Carmine Vassallo, Sebastiano Panichella, Massimiliano Di Penta, and Gerardo Canfora. 2014. CODES: Mining source code descriptions from developers discussions. In Proceedings of the 22nd International Conference on Program Comprehension. 106–109.
[88]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).
[89]
Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, and Philip S. Yu. 2018. Improving automatic source code summarization via deep reinforcement learning. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 397–407.
[90]
Wenhua Wang, Yuqun Zhang, Yulei Sui, Yao Wan, Zhou Zhao, Jian Wu, Philip Yu, and Guandong Xu. 2020. Reinforcement-learning-guided source code summarization via hierarchical attention. IEEE Trans. Softw. Eng. 48, 1 (2020), 102–119.
[91]
Yue Wang, Weishi Wang, Shafiq R. Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 8696–8708.
[92]
Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, and Zhi Jin. 2019. Code generation as a dual task of code summarization. Adv. Neural Inf. Process. Syst. 32 (2019).
[93]
Bolin Wei, Yongmin Li, Ge Li, Xin Xia, and Zhi Jin. 2020. Retrieve and refine: Exemplar-based neural comment generation. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE’20). IEEE, 349–360.
[94]
Huihui Wei and Ming Li. 2017. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI’17). 3034–3040.
[95]
Edmund Wong, Taiyue Liu, and Lin Tan. 2015. CloCom: Mining existing source code for automatic comment generation. In Proceedings of the IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER’15). IEEE, 380–389.
[96]
Edmund Wong, Jinqiu Yang, and Lin Tan. 2013. AutoComment: Mining question and answer sites for automatic comment generation. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE’13). IEEE, 562–567.
[97]
Hongqiu Wu, Hai Zhao, and Min Zhang. 2021. Code summarization with structure-induced transformer. In Findings of the Association for Computational Linguistics (ACL/IJCNLP’21). Association for Computational Linguistics, 1078–1090.
[98]
Rui Xie, Wei Ye, Jinan Sun, and Shikun Zhang. 2021. Exploiting method names to improve code summarization: A deliberation multi-task learning approach. In Proceedings of the IEEE/ACM 29th International Conference on Program Comprehension (ICPC’21). IEEE, 138–148.
[99]
Kun Xu, Lingfei Wu, Zhiguo Wang, Yansong Feng, Michael Witbrock, and Vadim Sheinin. 2018. Graph2seq: Graph to sequence learning with attention-based neural networks. arXiv preprint arXiv:1804.00823 (2018).
[100]
Wei Ye, Rui Xie, Jinglei Zhang, Tianxiang Hu, Xiaoyin Wang, and Shikun Zhang. 2020. Leveraging code generation to improve code retrieval and summarization via dual learning. In Proceedings of the Web Conference. 2309–2319.
[101]
Lingbin Zeng, Xunhui Zhang, Tao Wang, Xiao Li, Jie Yu, and Huaimin Wang. 2018. Improving code summarization by combining deep learning and empirical knowledge. In Proceedings of the 30th International Conference on Software Engineering and Knowledge Engineering. KSI Research Inc. and Knowledge Systems Institute Graduate School, 566–565.
[102]
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, and Xudong Liu. 2020. Retrieval-based neural source code summarization. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering (ICSE’20). IEEE, 1385–1397.
[103]
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. 2019. A novel neural source code representation based on abstract syntax tree. In Proceedings of the IEEE/ACM 41st International Conference on Software Engineering (ICSE’19). IEEE, 783–794.
[104]
Gang Zhao and Jeff Huang. 2018. DeepSim: Deep learning code functional similarity. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 141–151.
[105]
Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, et al. 2023. CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. arXiv preprint arXiv:2303.17568 (2023).
[106]
Daniel Zügner, Tobias Kirschstein, Michele Catasta, Jure Leskovec, and Stephan Günnemann. 2021. Language-agnostic representation learning of source code from structure and context. arXiv preprint arXiv:2103.11318 (2021).

Cited By

  • (2024) A Comparative Analysis of Large Language Models for Code Documentation Generation. In Proceedings of the 1st ACM International Conference on AI-Powered Software. 65–73. https://doi.org/10.1145/3664646.3664765
  • (2024) Automatic smart contract comment generation via large language models and in-context learning. Information and Software Technology 168 (2024). https://doi.org/10.1016/j.infsof.2024.107405


    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 3
    March 2024, 943 pages
    EISSN: 1557-7392
    DOI: 10.1145/3613618
    Editor: Mauro Pezzé

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 March 2024
    Online AM: 06 November 2023
    Accepted: 19 October 2023
    Revised: 16 August 2023
    Received: 11 February 2023
    Published in TOSEM Volume 33, Issue 3


    Author Tags

    1. Code summarization
    2. deep learning
    3. information retrieval

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

