An empirical study of the textual similarity between source code and source code summaries
PW McBurney, C McMillan - Empirical Software Engineering, 2016 - Springer
Abstract
Source code documentation often contains summaries of source code written by authors. Recently, automatic source code summarization tools have emerged that generate summaries without requiring author intervention. These summaries are designed to help readers understand the high-level concepts of the source code. Unfortunately, there is no agreed-upon understanding of what makes up a “good summary.” This paper presents an empirical study examining summaries of source code written by authors, readers, and automatic source code summarization tools. The study examines the textual similarity between source code and summaries of source code using Short Text Semantic Similarity metrics. We found that readers use source code in their summaries more than authors do. Additionally, this study finds that the accuracy of a human-written summary can be estimated by the textual similarity of that summary to the source code.
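To make the idea of textual similarity between a summary and source code concrete, the following is a minimal sketch in Python. It is not the Short Text Semantic Similarity metrics used in the study, which the abstract does not specify. It assumes a much simpler stand-in, Jaccard overlap of word tokens, and the names `tokenize` and `jaccard_similarity` are hypothetical helpers introduced only for illustration.

```python
import re


def tokenize(text):
    """Return the set of lowercase word tokens in text.

    Identifiers are split on camelCase boundaries and on any
    non-letter character (so snake_case is split as well).
    """
    # Insert a space at lower-to-upper boundaries: "getUserName" -> "get User Name"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return set(re.findall(r"[a-z]+", spaced.lower()))


def jaccard_similarity(summary, source_code):
    """Jaccard overlap of token sets: |A & B| / |A | B|.

    Assumption: this lexical measure is only a rough proxy for the
    semantic similarity metrics the study refers to.
    """
    summary_tokens = tokenize(summary)
    code_tokens = tokenize(source_code)
    if not summary_tokens and not code_tokens:
        return 0.0
    return len(summary_tokens & code_tokens) / len(summary_tokens | code_tokens)


if __name__ == "__main__":
    code = "String getUserName() { return userName; }"
    print(jaccard_similarity("Returns the user name", code))
    print(jaccard_similarity("Closes the network socket", code))
```

Under this sketch, a summary that reuses words from the method's identifiers scores higher than one that shares no vocabulary with the code, which is the kind of signal the abstract describes in general terms.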