
Deep Is Better? An Empirical Comparison of Information Retrieval and Deep Learning Approaches to Code Summarization

Published: 15 March 2024

Abstract

Code summarization aims to generate short functional descriptions for source code to facilitate code comprehension. While Information Retrieval (IR) approaches, which leverage similar code snippets and their corresponding summaries, led early research in this area, Deep Learning (DL) approaches, which use neural models to capture statistical relationships between code and summaries, are now mainstream. Although some preliminary studies suggest that IR approaches are more effective in certain cases, it remains unclear how effective the existing approaches are in general, where and why IR or DL approaches perform better, and whether integrating IR and DL can achieve better performance. Consequently, there is an urgent need for a comprehensive study of IR and DL code summarization approaches to guide future development in this area. This article presents the first large-scale empirical study of 18 IR, DL, and hybrid code summarization approaches on five benchmark datasets. We extensively compare the different types of approaches using automatic metrics, conduct quantitative and qualitative analyses of where and why IR and DL approaches perform better, respectively, and study hybrid approaches to assess the effectiveness of integrating IR and DL. The study shows that the performance of IR approaches should not be underestimated; that while DL models are better at predicting tokens from method signatures and capturing structural similarities in code, simple IR approaches tend to perform better on code with highly similar counterparts or long reference summaries; and that existing hybrid approaches do not perform as well as the individual approaches in their respective areas of strength. Based on our findings, we discuss future research directions for better code summarization.
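To make the IR side of this comparison concrete, the retrieve-and-reuse idea behind simple IR baselines (e.g., ones built on TF-IDF [76] or BM25 [73]) can be sketched as nearest-neighbor lookup: index the training code, find the snippet most lexically similar to the query, and return its summary verbatim. The Python sketch below is illustrative only; the toy corpus, the `ir_summarize` helper, and the use of scikit-learn's TF-IDF vectorizer are our assumptions, not the exact setup of any approach evaluated in the article.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy retrieval corpus: code snippets paired with reference summaries
# (hypothetical data for illustration).
corpus_code = [
    "public int add(int a, int b) { return a + b; }",
    "public int max(int a, int b) { return a > b ? a : b; }",
    "public boolean isEmpty(List items) { return items.size() == 0; }",
]
corpus_summaries = [
    "returns the sum of two integers",
    "returns the larger of two integers",
    "checks whether the list is empty",
]

# Index the corpus with TF-IDF over identifier-like tokens.
vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_][A-Za-z_0-9]*")
corpus_vectors = vectorizer.fit_transform(corpus_code)

def ir_summarize(query_code: str) -> str:
    """Return the summary of the most similar indexed snippet."""
    query_vector = vectorizer.transform([query_code])
    scores = cosine_similarity(query_vector, corpus_vectors)[0]
    return corpus_summaries[scores.argmax()]

print(ir_summarize("public int plus(int x, int y) { return x + y; }"))
# -> "returns the sum of two integers" (nearest neighbor by lexical overlap)
```

A baseline of this kind involves no training beyond indexing, which is one reason IR approaches remain strong when a test method has near-duplicates in the corpus, consistent with the study's finding that simple IR approaches do well on code with high similarity to the training data.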

References

[1]
Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2020. A transformer-based approach for source code summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 4998–5007.
[2]
Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Unified pre-training for program understanding and generation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’21). 2655–2668.
[3]
Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Muñoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy-Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, and Leandro von Werra. 2023. SantaCoder: Don’t reach for the stars! arXiv preprint arXiv:2301.03988 (2023).
[4]
Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2014. Learning natural coding conventions. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 281–293.
[5]
Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Comput. Surv. 51, 4 (2018), 1–37.
[6]
Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In Proceedings of the International Conference on Machine Learning. PMLR, 2091–2100.
[7]
Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019. code2seq: Generating sequences from structured representations of code. In Proceedings of the 7th International Conference on Learning Representations (ICLR’19). OpenReview.net.
[8]
Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 65–72.
[9]
Antonio Valerio Miceli Barone and Rico Sennrich. 2017. A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP’17). Asian Federation of Natural Language Processing, 314–319.
[10]
Mohammad Bavarian, Heewoo Jun, Nikolas Tezak, John Schulman, Christine McLeavey, Jerry Tworek, and Mark Chen. 2022. Efficient training of language models to fill in the middle. arXiv preprint arXiv:2207.14255 (2022).
[11]
Egor Bogomolov, Sergey Zhuravlev, Egor Spirin, and Timofey Bryksin. 2022. Assessing project-level fine-tuning of ML4SE models. arXiv preprint arXiv:2206.03333 (2022).
[12]
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).
[13]
Qiuyuan Chen, Xin Xia, Han Hu, David Lo, and Shanping Li. 2021. Why my code summarization model does not work: Code comment improvement with category prediction. ACM Trans. Softw. Eng. Methodol. 30, 2 (2021), 1–29.
[14]
Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of the 8th Workshop on Syntax, Semantics and Structure in Statistical Translation. Association for Computational Linguistics, 103–111.
[15]
YunSeok Choi, JinYeong Bak, CheolWon Na, and Jee-Hyong Lee. 2021. Learning sequential and structural information for source code summarization. In Findings of the Association for Computational Linguistics (ACL-IJCNLP’21). 2842–2851.
[16]
Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. In Proceedings of the 8th International Conference on Learning Representations (ICLR’20). OpenReview.net.
[17]
Colin B. Clement, Dawn Drain, Jonathan Timcheck, Alexey Svyatkovskiy, and Neel Sundaresan. 2020. PyMT5: Multi-mode translation of natural language and Python code with transformers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). Association for Computational Linguistics, 9052–9065.
[18]
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2978–2988.
[19]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). Association for Computational Linguistics, 4171–4186.
[20]
Brian P. Eddy, Jeffrey A. Robinson, Nicholas A. Kraft, and Jeffrey C. Carver. 2013. Evaluating source code summarization techniques: Replication and expansion. In Proceedings of the 21st International Conference on Program Comprehension (ICPC’13). IEEE, 13–22.
[21]
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics (EMNLP’20). 1536–1547.
[22]
Patrick Fernandes, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Structured neural summarization. In Proceedings of the 7th International Conference on Learning Representations (ICLR’19). OpenReview.net.
[23]
Andrew Forward and Timothy C. Lethbridge. 2002. The relevance of software documentation, tools and technologies: A survey. In Proceedings of the ACM Symposium on Document Engineering. 26–33.
[24]
Wei Fu and Tim Menzies. 2017. Easy over hard: A case study on deep learning. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering. 49–60.
[25]
Shuzheng Gao, Cuiyun Gao, Yulan He, Jichuan Zeng, Lunyiu Nie, Xin Xia, and Michael Lyu. 2023. Code structure–guided transformer for source code summarization. ACM Trans. Softw. Eng. Methodol. 32, 1 (2023), 1–32.
[26]
Yvette Graham, Nitika Mathur, and Timothy Baldwin. 2014. Randomized significance tests in machine translation. In Proceedings of the 9th Workshop on Statistical Machine Translation. 266–274.
[27]
David Gros, Hariharan Sezhiyan, Prem Devanbu, and Zhou Yu. 2020. Code to comment “Translation”: Data, metrics, baselining & evaluation. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE’20). IEEE, 746–757.
[28]
Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. UniXcoder: Unified cross-modal pre-training for code representation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 7212–7225.
[29]
Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin B. Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2021. GraphCodeBERT: Pre-training code representations with data flow. In Proceedings of the 9th International Conference on Learning Representations (ICLR’21). OpenReview.net.
[30]
Sonia Haiduc, Jairo Aponte, and Andrian Marcus. 2010. Supporting program comprehension with source code summarization. In Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering. IEEE, 223–226.
[31]
Sonia Haiduc, Jairo Aponte, Laura Moreno, and Andrian Marcus. 2010. On the use of automated text summarization techniques for summarizing source code. In Proceedings of the 17th Working Conference on Reverse Engineering. IEEE, 35–44.
[32]
Sakib Haque, Alexander LeClair, Lingfei Wu, and Collin McMillan. 2020. Improved automatic summarization of subroutines via attention to file context. In Proceedings of the 17th International Conference on Mining Software Repositories. 300–310.
[33]
Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2018. Deep code comment generation. In Proceedings of the IEEE/ACM 26th International Conference on Program Comprehension (ICPC’18). IEEE, 200–210.
[34]
Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2020. Deep code comment generation with hybrid lexical and syntactical information. Empir. Softw. Eng. 25, 3 (2020), 2179–2217.
[35]
Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing source code with transferred API knowledge. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18). ijcai.org, 2269–2275.
[36]
Xing Hu, Xin Xia, David Lo, Zhiyuan Wan, Qiuyuan Chen, and Thomas Zimmermann. 2022. Practitioners’ expectations on automated code comment generation. In Proceedings of the 44th IEEE/ACM International Conference on Software Engineering (ICSE’22). 1693–1705.
[37]
Yuan Huang, Shaohao Huang, Huanchao Chen, Xiangping Chen, Zibin Zheng, Xiapu Luo, Nan Jia, Xinyu Hu, and Xiaocong Zhou. 2020. Towards automatically generating block comments for code snippets. Inf. Softw. Technol. 127 (2020), 106373.
[38]
Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019).
[39]
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2073–2083.
[40]
Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue. 2002. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE Trans. Softw. Eng. 28, 7 (2002), 654–670.
[41]
Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, and Ryuichi Yamamoto. 2019. A comparative study on transformer vs RNN in speech applications. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU’19). IEEE, 449–456.
[42]
Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 388–395.
[43]
Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec, and Guillaume Lample. 2021. DOBF: A deobfuscation pre-training objective for programming languages. Adv. Neural Inf. Process. Syst. 34 (2021), 14967–14979.
[44]
Alexander LeClair, Aakash Bansal, and Collin McMillan. 2021. Ensemble models for neural source code summarization of subroutines. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME’21). IEEE, 286–297.
[45]
Alexander LeClair, Sakib Haque, Lingfei Wu, and Collin McMillan. 2020. Improved code summarization via a graph neural network. In Proceedings of the 28th International Conference on Program Comprehension. 184–195.
[46]
Alexander LeClair, Siyuan Jiang, and Collin McMillan. 2019. A neural model for generating natural language summaries of program subroutines. In Proceedings of the IEEE/ACM 41st International Conference on Software Engineering (ICSE’19). IEEE, 795–806.
[47]
Alexander LeClair and Collin McMillan. 2019. Recommendations for datasets for source code summarization. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). Association for Computational Linguistics, 3931–3937.
[48]
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 7871–7880.
[49]
Boao Li, Meng Yan, Xin Xia, Xing Hu, Ge Li, and David Lo. 2020. DeepCommenter: A deep code comment generation tool with hybrid lexical and syntactical information. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1571–1575.
[50]
Jia Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, and Zhi Jin. 2021. EditSum: A retrieve-and-edit framework for source code summarization. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE’21). IEEE, 155–166.
[51]
Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries. 2023. StarCoder: May the source be with you! arXiv preprint arXiv:2305.06161 (2023).
[52]
Yuding Liang and Kenny Zhu. 2018. Automatic generation of text descriptive comments for code blocks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
[53]
Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, and Rongxin Wu. 2021. Improving code summarization with block-wise abstract syntax tree splitting. In Proceedings of the IEEE/ACM 29th International Conference on Program Comprehension (ICPC’21). IEEE, 184–195.
[54]
Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out. Association for Computational Linguistics, 74–81.
[55]
Shangqing Liu, Yu Chen, Xiaofei Xie, Jing Kai Siow, and Yang Liu. 2021. Retrieval-augmented generation for code summarization via hybrid GNN. In Proceedings of the 9th International Conference on Learning Representations (ICLR’21). OpenReview.net.
[56]
Zhongxin Liu, Xin Xia, Ahmed E. Hassan, David Lo, Zhenchang Xing, and Xinyu Wang. 2018. Neural-machine-translation-based commit message generation: How far are we? In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 373–384.
[57]
C. Lopes. 2010. UCI source code data sets. Retrieved from http://www.ics.uci.edu/~lopes/datasets/
[58]
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664 (2021).
[59]
Junayed Mahmud, Fahim Faisal, Raihan Islam Arnob, Antonios Anastasopoulos, and Kevin Moran. 2021. Code to comment translation: A comparative study on model effectiveness & errors. arXiv preprint arXiv:2106.08415 (2021).
[60]
Paul W. McBurney, Cheng Liu, Collin McMillan, and Tim Weninger. 2014. Improving topic model source code summarization. In Proceedings of the 22nd International Conference on Program Comprehension. 291–294.
[61]
Paul W. McBurney and Collin McMillan. 2014. Automatic documentation generation via source code summarization of method context. In Proceedings of the 22nd International Conference on Program Comprehension. 279–290.
[62]
Mary L. McHugh. 2012. Interrater reliability: The kappa statistic. Biochem. Med. 22, 3 (2012), 276–282.
[63]
Jessica Moore, Ben Gelman, and David Slater. 2019. A convolutional neural network for language-agnostic source code summarization. In Proceedings of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE’19). SciTePress, 15–26.
[64]
Laura Moreno, Jairo Aponte, Giriprasad Sridhara, Andrian Marcus, Lori Pollock, and K. Vijay-Shanker. 2013. Automatic generation of natural language summaries for Java classes. In Proceedings of the 21st International Conference on Program Comprehension (ICPC’13). IEEE, 23–32.
[65]
Fangwen Mu, Xiao Chen, Lin Shi, Song Wang, and Qing Wang. 2022. Automatic comment generation via multi-pass deliberation. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 1–12.
[66]
Fangwen Mu, Xiao Chen, Lin Shi, Song Wang, and Qing Wang. 2023. Developer-intent driven code comment generation. arXiv preprint arXiv:2302.07055 (2023).
[67]
Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2022. CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474 (2022).
[68]
Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta, Andrian Marcus, and Gerardo Canfora. 2012. Mining source code descriptions from developer communications. In Proceedings of the 20th IEEE International Conference on Program Comprehension (ICPC’12). IEEE, 63–72.
[69]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311–318.
[70]
Md. Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Retrieval augmented code generation and summarization. In Findings of the Association for Computational Linguistics (EMNLP’21). Association for Computational Linguistics, 2719–2734.
[71]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1 (2020), 5485–5551.
[72]
Carol Righi, Janice James, Michael Beasley, Donald L. Day, Jean E. Fox, Jennifer Gieber, Chris Howe, and Laconya Ruby. 2013. Card sort analysis best practices. J. Usabil. Stud. 8, 3 (2013), 69–89.
[73]
Stephen E. Robertson, Steve Walker, and Micheline Beaulieu. 2000. Experimentation as a way of life: Okapi at TREC. Inf. Process. Manag. 36, 1 (2000), 95–108.
[74]
Paige Rodeghero, Collin McMillan, Paul W. McBurney, Nigel Bosch, and Sidney D’Mello. 2014. Improving automated source code summarization via an eye-tracking study of programmers. In Proceedings of the 36th International Conference on Software Engineering. 390–401.
[75]
Devjeet Roy, Sarah Fakhoury, and Venera Arnaoudova. 2021. Reassessing automatic evaluation metrics for code summarization tasks. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1105–1116.
[76]
Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24, 5 (1988), 513–523.
[77]
Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 1073–1083.
[78]
Ramin Shahbazi, Rishab Sharma, and Fatemeh H. Fard. 2021. API2Com: On the improvement of automatically generated code comments using API documentations. In Proceedings of the IEEE/ACM 29th International Conference on Program Comprehension (ICPC’21). IEEE, 411–421.
[79]
Noam Shazeer. 2019. Fast transformer decoding: One write-head is all you need. arXiv preprint arXiv:1911.02150 (2019).
[80]
Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, and Hongbin Sun. 2022. On the evaluation of neural code summarization. In Proceedings of the 44th International Conference on Software Engineering. 1597–1608.
[81]
Lin Shi, Fangwen Mu, Xiao Chen, Song Wang, Junjie Wang, Ye Yang, Ge Li, Xin Xia, and Qing Wang. 2022. Are we building on the rock? On the importance of data preprocessing for code summarization. arXiv preprint arXiv:2207.05579 (2022).
[82]
Yusuke Shido, Yasuaki Kobayashi, Akihiro Yamamoto, Atsushi Miyamoto, and Tadayuki Matsumura. 2019. Automatic source code summarization with extended tree-LSTM. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’19). IEEE, 1–8.
[83]
Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K. Vijay-Shanker. 2010. Towards automatically generating summary comments for Java methods. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering. 43–52.
[84]
Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018. Why self-attention? A targeted evaluation of neural machine translation architectures. arXiv preprint arXiv:1808.08946 (2018).
[85]
Ze Tang, Xiaoyu Shen, Chuanyi Li, Jidong Ge, Liguo Huang, Zhelin Zhu, and Bin Luo. 2022. AST-trans: Code summarization with efficient tree-structured attention. In Proceedings of the International Conference on Software Engineering(ICSE’22).
[86]
Zhaopeng Tu, Zhendong Su, and Premkumar Devanbu. 2014. On the localness of software. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 269–280.
[87]
Carmine Vassallo, Sebastiano Panichella, Massimiliano Di Penta, and Gerardo Canfora. 2014. CODES: Mining source code descriptions from developers discussions. In Proceedings of the 22nd International Conference on Program Comprehension. 106–109.
[88]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).
[89]
Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, and Philip S. Yu. 2018. Improving automatic source code summarization via deep reinforcement learning. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 397–407.
[90]
Wenhua Wang, Yuqun Zhang, Yulei Sui, Yao Wan, Zhou Zhao, Jian Wu, Philip Yu, and Guandong Xu. 2020. Reinforcement-learning-guided source code summarization via hierarchical attention. IEEE Trans. Softw. Eng. 48, 1 (2020), 102–119.
[91]
Yue Wang, Weishi Wang, Shafiq R. Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 8696–8708.
[92]
Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, and Zhi Jin. 2019. Code generation as a dual task of code summarization. Adv. Neural Inf. Process. Syst. 32 (2019).
[93]
Bolin Wei, Yongmin Li, Ge Li, Xin Xia, and Zhi Jin. 2020. Retrieve and refine: Exemplar-based neural comment generation. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE’20). IEEE, 349–360.
[94]
Huihui Wei and Ming Li. 2017. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI’17). 3034–3040.
[95]
Edmund Wong, Taiyue Liu, and Lin Tan. 2015. CloCom: Mining existing source code for automatic comment generation. In Proceedings of the IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER’15). IEEE, 380–389.
[96]
Edmund Wong, Jinqiu Yang, and Lin Tan. 2013. AutoComment: Mining question and answer sites for automatic comment generation. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE’13). IEEE, 562–567.
[97]
Hongqiu Wu, Hai Zhao, and Min Zhang. 2021. Code summarization with structure-induced transformer. In Findings of the Association for Computational Linguistics (ACL/IJCNLP’21). Association for Computational Linguistics, 1078–1090.
[98]
Rui Xie, Wei Ye, Jinan Sun, and Shikun Zhang. 2021. Exploiting method names to improve code summarization: A deliberation multi-task learning approach. In Proceedings of the IEEE/ACM 29th International Conference on Program Comprehension (ICPC’21). IEEE, 138–148.
[99]
Kun Xu, Lingfei Wu, Zhiguo Wang, Yansong Feng, Michael Witbrock, and Vadim Sheinin. 2018. Graph2seq: Graph to sequence learning with attention-based neural networks. arXiv preprint arXiv:1804.00823 (2018).
[100]
Wei Ye, Rui Xie, Jinglei Zhang, Tianxiang Hu, Xiaoyin Wang, and Shikun Zhang. 2020. Leveraging code generation to improve code retrieval and summarization via dual learning. In Proceedings of the Web Conference. 2309–2319.
[101]
Lingbin Zeng, Xunhui Zhang, Tao Wang, Xiao Li, Jie Yu, and Huaimin Wang. 2018. Improving code summarization by combining deep learning and empirical knowledge. In Proceedings of the 30th International Conference on Software Engineering and Knowledge Engineering. KSI Research Inc. and Knowledge Systems Institute Graduate School, 566–565.
[102]
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, and Xudong Liu. 2020. Retrieval-based neural source code summarization. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering (ICSE’20). IEEE, 1385–1397.
[103]
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. 2019. A novel neural source code representation based on abstract syntax tree. In Proceedings of the IEEE/ACM 41st International Conference on Software Engineering (ICSE’19). IEEE, 783–794.
[104]
Gang Zhao and Jeff Huang. 2018. DeepSim: Deep learning code functional similarity. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 141–151.
[105]
Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, et al. 2023. CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. arXiv preprint arXiv:2303.17568 (2023).
[106]
Daniel Zügner, Tobias Kirschstein, Michele Catasta, Jure Leskovec, and Stephan Günnemann. 2021. Language-agnostic representation learning of source code from structure and context. arXiv preprint arXiv:2103.11318 (2021).

Cited By

  • (2024) A Comparative Analysis of Large Language Models for Code Documentation Generation. In Proceedings of the 1st ACM International Conference on AI-Powered Software. 65–73. https://doi.org/10.1145/3664646.3664765
  • (2024) Automatic smart contract comment generation via large language models and in-context learning. Information and Software Technology 168 (2024). https://doi.org/10.1016/j.infsof.2024.107405


    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 3
    March 2024, 943 pages
    EISSN: 1557-7392
    DOI: 10.1145/3613618
    Editor: Mauro Pezzé

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 March 2024
    Online AM: 06 November 2023
    Accepted: 19 October 2023
    Revised: 16 August 2023
    Received: 11 February 2023
    Published in TOSEM Volume 33, Issue 3


    Author Tags

    1. Code summarization
    2. deep learning
    3. information retrieval

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

