Context-aware Code Summary Generation

Su, Chia-Yi; Bansal, Aakash; Huang, Yu; Li, Toby Jia-Jun; McMillan, Collin

arXiv:2408.09006 (cs)

[Submitted on 16 Aug 2024]

Abstract: Code summary generation is the task of writing natural language descriptions of a section of source code. Recent advances in Large Language Models (LLMs) and other AI-based technologies have helped make automatic code summarization a reality. However, the summaries these approaches write tend to focus on a narrow area of code. The result is summaries that explain what a function does internally, but lack a description of why the function exists or its purpose in the broader context of the program. In this paper, we present an approach for including this context in recent LLM-based code summarization. The input to our approach is a Java method and the project in which that method exists. The output is a succinct English description of why the method exists in the project. The core of our approach is a 350M-parameter language model we train, which can be run locally to ensure privacy. We train the model in two steps. First, we distill knowledge about code summarization from a large model; then we fine-tune the model using data from a study of human programmers who were asked to write code summaries. We find that our approach outperforms GPT-4 on this task.
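
The two-step training order described in the abstract can be pictured with a small data-preparation sketch. This is only an illustration under stated assumptions, not the authors' implementation: the `Example` record, the `build_input` prompt layout, the `origin` labels, and the function `two_stage_schedule` are all hypothetical names introduced here, and the abstract does not say how project context is selected or formatted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Example:
    """One training record (hypothetical layout, not taken from the paper)."""
    method_source: str    # the Java method to summarize
    project_context: str  # assumed: some text drawn from the rest of the project
    summary: str          # target description of why the method exists
    origin: str           # "teacher" (large-model output) or "human" (study data)


def build_input(method_source: str, project_context: str) -> str:
    """Join context and method into one model input; the format is an assumption."""
    return f"CONTEXT:\n{project_context}\n\nMETHOD:\n{method_source}\n\nSUMMARY:"


def two_stage_schedule(
    examples: List[Example],
) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Group (input, target) pairs into two ordered stages.

    Stage 1 ("distill") uses summaries produced by a large model.
    Stage 2 ("finetune") uses summaries written by human programmers.
    """
    stages = []
    for stage_name, origin in (("distill", "teacher"), ("finetune", "human")):
        pairs = [
            (build_input(e.method_source, e.project_context), e.summary)
            for e in examples
            if e.origin == origin
        ]
        stages.append((stage_name, pairs))
    return stages
```

A training loop for the small model would then consume the "distill" pairs before the "finetune" pairs; the loop itself is omitted here because the abstract gives no details about it.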

Comments:	21 pages, 5 figures, preprint under review
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2408.09006 [cs.SE]
	(or arXiv:2408.09006v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2408.09006

Submission history

From: Chia-Yi Su [view email]
[v1] Fri, 16 Aug 2024 20:15:34 UTC (1,229 KB)
