DOI: 10.1145/3540250.3549132
Research article, Open access

Code, quality, and process metrics in graduated and retired ASFI projects

Published: 09 November 2022

Abstract

Recent work on open source sustainability shows that successful trajectories of projects in the Apache Software Foundation Incubator (ASFI) can be predicted early on, using a set of socio-technical measures. Because OSS projects are socio-technical systems centered around code artifacts, we hypothesize that sustainable projects may exhibit different code and process patterns than unsustainable ones, and that those patterns can grow more apparent as projects evolve over time. Here we studied the code and coding processes of over 200 ASFI projects and found that graduated ASFI projects have different patterns of code quality and complexity than retired ones. The same holds for the coding processes: for example, the prevalence of feature commits and bug-fixing commits is correlated with project graduation success. We also find that minor and major contributors (those contributing less than 5% and at least 95% of commits, respectively) are associated with graduation outcomes, implying that developers who contribute fewer commits are also important for a project's success. This study provides evidence that OSS projects, especially nascent ones, can benefit from introspection and instrumentation using multidimensional modeling of the whole system, including code, processes, and code quality measures, and of how these are interconnected over time.
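
To make the commit-based measures mentioned above concrete, the following is a minimal Python sketch, not the paper's actual pipeline, of how a project's contributor split and bug-fixing commit ratio could be computed from its commit log. The commit records, the keyword heuristic for flagging bug-fixing commits, and all function names are illustrative assumptions; only the <5% and >=95% contribution thresholds are taken from the abstract.

```python
from collections import Counter

# Hypothetical commit log: (author, message) pairs; a real study would
# extract these from the project's version-control history.
commits = [("alice", f"Add feature #{i}") for i in range(23)] + [
    ("bob", "Fix off-by-one bug in scheduler"),
]

# Keyword heuristic (an assumption, not the paper's classifier) for
# flagging bug-fixing commits from their messages.
BUG_KEYWORDS = ("fix", "bug", "defect", "error", "fault")


def commit_shares(commits):
    """Each author's fraction of all commits in the project."""
    counts = Counter(author for author, _ in commits)
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()}


def contributor_split(commits, minor_lt=0.05, major_ge=0.95):
    """Split authors into minor (<5% of commits) and major (>=95%),
    using the thresholds quoted in the abstract."""
    shares = commit_shares(commits)
    minor = [a for a, s in shares.items() if s < minor_lt]
    major = [a for a, s in shares.items() if s >= major_ge]
    return minor, major


def bug_fix_ratio(commits):
    """Fraction of commits whose message mentions a bug-fix keyword."""
    flagged = sum(
        any(kw in message.lower() for kw in BUG_KEYWORDS)
        for _, message in commits
    )
    return flagged / len(commits)


if __name__ == "__main__":
    minor, major = contributor_split(commits)
    print("minor contributors:", minor)   # ['bob']   (1/24, about 4.2% of commits)
    print("major contributors:", major)   # ['alice'] (23/24, about 95.8%)
    print("bug-fix commit ratio:", bug_fix_ratio(commits))  # 1/24, about 0.042
```

In the study, measures like these are tracked per project over time and related to graduation versus retirement outcomes with statistical models; the sketch only shows the raw counting step.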



    Published In

    ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022, 1822 pages
ISBN: 9781450394130
DOI: 10.1145/3540250
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. Code Quality
    2. Open Source Sustainability

    Conference

    ESEC/FSE '22

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%

