DOI: 10.1145/3643916.3644400
Research article | Open access

Towards Summarizing Code Snippets Using Pre-Trained Transformers

Published: 13 June 2024

Abstract

When comprehending code, a helping hand may come from the natural language comments documenting it that, unfortunately, are not always there. To support developers in such a scenario, several techniques have been presented to automatically generate natural language summaries for a given code. Most recent approaches exploit deep learning (DL) to automatically document classes or functions, while little effort has been devoted to more fine-grained documentation (e.g., documenting code snippets or even a single statement). Such a design choice is dictated by the availability of training data: For example, in the case of Java, it is easy to create datasets composed of pairs <method, javadoc> that can be fed to DL models to teach them how to summarize a method. Such a comment-to-code linking is instead non-trivial when it comes to inner comments documenting a few statements. In this work, we take all the steps needed to train a DL model to automatically document code snippets. First, we manually built a dataset featuring 6.6k comments that have been (i) classified based on their type (e.g., code summary, TODO), and (ii) linked to the code statements they document. Second, we used such a dataset to train a multi-task DL model taking as input a comment and being able to (i) classify whether it represents a "code summary" or not, and (ii) link it to the code statements it documents. Our model identifies code summaries with 84% accuracy and is able to link them to the documented lines of code with recall and precision higher than 80%. Third, we ran this model on 10k projects, identifying and linking code summaries to the documented code. This unlocked the possibility of building a large-scale dataset of documented code snippets, which we then used to train a new DL model able to automatically document code snippets.
A comparison with state-of-the-art baselines shows the superiority of the proposed approach, which, however, is still far from representing an accurate solution for snippet summarization.
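As a rough illustration of how the comment-to-code linking task described in the abstract might be framed for a text-to-text transformer, the sketch below serializes a comment and the numbered statements of a method into a single model input, and parses the linked line indices back out of a model's output. The tagging scheme and function names here are hypothetical, chosen for clarity; they are not the authors' actual input format.

```python
# Hypothetical sketch: framing comment-to-code linking as a text-to-text task.
# The <comment> / <line_i> tagging scheme is illustrative, not the paper's format.

def build_linking_input(comment: str, code_lines: list[str]) -> str:
    """Serialize a comment and numbered code lines into one model input string."""
    tagged = " ".join(f"<line_{i}> {line.strip()}" for i, line in enumerate(code_lines))
    return f"<comment> {comment} </comment> {tagged}"

def parse_linked_lines(model_output: str, n_lines: int) -> list[int]:
    """Recover the indices of the code lines the model claims the comment documents."""
    return [i for i in range(n_lines) if f"<line_{i}>" in model_output]

code = [
    "int total = 0;",
    "for (Item i : items) {",
    "    total += i.price();",
    "}",
    "log.info(total);",
]
inp = build_linking_input("compute the total price of all items", code)
# A trained model would emit the tags of the documented lines, e.g.:
out = "<line_0> <line_1> <line_2> <line_3>"
print(parse_linked_lines(out, len(code)))  # [0, 1, 2, 3]
```

A seq2seq model such as T5 (which the paper's references suggest as a plausible backbone) could then be fine-tuned on such pairs, with the classification task ("code summary" vs. not) handled as a second output format in the same multi-task setup.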


Cited By

  • (2024) Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap. ACM Transactions on Autonomous and Adaptive Systems 19, 3 (2024), 1--60. DOI: 10.1145/3686803. Online publication date: 30-Sep-2024.

    Published In

    ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension
    April 2024
    487 pages
    ISBN:9798400705861
    DOI:10.1145/3643916
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. software documentation
    2. pre-trained transformer models


    Funding Sources

    • European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme

    Conference

    ICPC '24

    Article Metrics

    • Downloads (last 12 months): 328
    • Downloads (last 6 weeks): 40
    Reflects downloads up to 03 Jan 2025
