On Automatically Assessing Code Understandability
- Citation Author(s):
- Simone Scalabrino, Gabriele Bavota, Christopher Vendome, Mario Linares-Vásquez, Denys Poshyvanyk, Rocco Oliveto
- Submitted by:
- Simone Scalabrino
- Last updated:
- Tue, 05/17/2022 - 22:17
- DOI:
- 10.21227/H24M3X
Abstract
Understanding software is an inherent requirement for many maintenance and evolution tasks. Without a thorough understanding of the code, developers cannot fix bugs or add new features in a timely manner. Measuring code understandability could guide developers toward writing better code and help estimate the effort required to modify code components. Unfortunately, no metrics have been designed to assess the understandability of code snippets.
This dataset provides 396 human evaluations, performed by 57 developers on 50 Java snippets. For each instance, it contains: (i) information about the snippet, (ii) all 121 code-related, documentation-related, and developer-related metrics, and (iii) all six proxies for code understandability (TNPU, AU, TAU, PBU, ABU, BD).
The data is distributed as a plain CSV file (complete_dataset.csv) containing all 396 code-understandability evaluations gathered in our study, including the metrics measured on developers, documentation, and code.
Separator: comma (",")
Loading from R: read.csv("complete_dataset.csv")
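As a minimal sketch, the file can be loaded and inspected in R as follows; the proxy column name used here (PBU) is an assumption based on the abbreviations listed in the abstract and may differ in the actual header:

    # Load the 396 evaluations into a data frame
    data <- read.csv("complete_dataset.csv")
    # Overview of snippet information, the metric columns, and the proxies
    str(data)
    # Distribution of one understandability proxy, assuming a column named "PBU"
    summary(data$PBU)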