
A longitudinal study on the temporal validity of software samples

Published: 17 April 2024

Abstract

Context

In Empirical Software Engineering, it is crucial to work with representative samples that reflect the current state of the software industry. An important consideration, especially in a rapidly changing field like software development, is that a sample collected years ago must still represent the same population today for the results to generalize. However, it is seldom the case that a software sample built several years ago accurately depicts the current state of the development industry. Nevertheless, many recent studies rely on rather old datasets (seven or more years of age) to conduct their investigations.

Objective

To analyze the evolution of a population of open-source projects, determine the likelihood of detecting significant differences over time, and study the activity history of the projects.

Method

We performed a longitudinal study with 72 monthly snapshots of quality projects from GitHub, covering the period between July 1st, 2017 and June 1st, 2023. We recorded monthly values of seven repository metrics (contributors, commits, closed pull-requests, merged pull-requests, closed issues, stars, and forks), encompassing data from a total of 1,991 repositories.
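For illustration, snapshot counts for several metrics of this kind can be reconstructed retrospectively through the GitHub Search API, whose responses carry a total_count field, so a count at a given date needs no pagination. The following Python sketch is only an assumption-laden illustration, not the authors' pipeline: it presumes a personal access token in the GITHUB_TOKEN environment variable, and apache/kafka is merely a hypothetical example repository.

import os
import requests

API = "https://api.github.com/search/issues"
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
}

def count(query: str) -> int:
    # The Search API reports the number of matches in total_count,
    # so per_page=1 is enough to read a snapshot count.
    resp = requests.get(API, headers=HEADERS, params={"q": query, "per_page": 1})
    resp.raise_for_status()
    return resp.json()["total_count"]

snapshot = "2023-06-01"
repo = "apache/kafka"  # hypothetical example repository
closed_issues = count(f"repo:{repo} is:issue is:closed closed:<={snapshot}")
merged_prs = count(f"repo:{repo} is:pr is:merged merged:<={snapshot}")
print(f"{repo} @ {snapshot}: {closed_issues} closed issues, {merged_prs} merged PRs")

Metrics such as stars and forks are not date-qualifiable through this API, which is one reason a longitudinal design like this one must record values month by month rather than reconstruct them afterwards.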

Results

We observed significant changes in all the evaluated metrics, most showing negligible to small effect sizes; notably, merged pull-requests registered medium effect sizes. The evolution was not uniform across metrics, but after five years it was unlikely (probability below 25%) that a sample of projects remained representative for any of the analyzed metrics.
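The abstract does not name the effect-size statistic, but a common choice in empirical software engineering, consistent with the negligible/small/medium labels above, is Cliff's delta. Below is a minimal sketch under that assumption, using the conventional magnitude thresholds and invented metric values.

from itertools import product

def cliffs_delta(xs, ys):
    # delta = P(x > y) - P(x < y), estimated over all pairs.
    gt = sum(x > y for x, y in product(xs, ys))
    lt = sum(x < y for x, y in product(xs, ys))
    return (gt - lt) / (len(xs) * len(ys))

def magnitude(delta):
    # Conventional thresholds for |delta| (assumed here, not quoted
    # from the paper).
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

# Hypothetical monthly commit counts for the same projects at two
# snapshots taken several years apart.
commits_2018 = [12, 40, 7, 55, 23, 9, 31]
commits_2023 = [3, 18, 0, 42, 11, 2, 15]
d = cliffs_delta(commits_2023, commits_2018)
print(f"delta = {d:.3f} ({magnitude(d)})")

Under these thresholds, a |delta| below 0.147 reads as negligible and one above 0.474 as large, so a medium result like the one reported for merged pull-requests signals a substantial distributional shift between snapshots.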

Conclusion

Although the temporal validity of a sample depends on the specific data being studied, employing datasets created several years ago does not appear to be a sound strategy if the aim is to produce results that can be extrapolated to the current state of the population.



Published In

Information and Software Technology, Volume 168, Issue C (April 2024), 131 pages

Publisher

Butterworth-Heinemann, United States


Author Tags

  1. Software samples
  2. Temporal validity
  3. Longitudinal study
  4. Sample evolution

Qualifiers

  • Research-article
