research-article

BFSig: Leveraging File Significance in Bus Factor Estimation

Authors:

Vahid Haratian,

Mikhail Evtikhiev,

Pouria Derakhshanfar,

Vladimir Kovalenko

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 1926 - 1936

https://doi.org/10.1145/3611643.3613877

Published: 30 November 2023

Abstract

Software projects lose developers for various reasons. As developers are one of the main sources of knowledge in a software project, their absence inevitably results in some degree of knowledge depletion. Bus Factor (BF) is a metric for evaluating how this knowledge loss can affect the project’s continuity. Conventionally, BF is calculated as the size of the smallest set of developers whose departure would remove over half of the project’s knowledge. Current state-of-the-art approaches use version control system (VCS) information and measure a developer’s knowledge by the number of files they have authored. However, numerous studies have shown that files in software projects differ in significance. In this study, we explore how weighting files according to their significance affects the performance of two prevailing BF estimators. We derive significance scores by computing five well-known graph metrics on the project’s dependency graph: PageRank, In-/Out-/All-Degree, and Betweenness Centralities. Furthermore, we introduce BFSig, a prototype of our approach. Finally, we present a new dataset of reported BF scores, collected by surveying software practitioners from five prominent GitHub repositories. Our results indicate that BFSig outperforms the baselines, reducing Normalized Mean Absolute Error (NMAE) by up to 18%. Moreover, BFSig yields 18% fewer False Negatives when identifying potential risks associated with low BF. In addition, our respondent confirmed BFSig’s versatility by showing its ability to assess the BF of the project’s subfolders. In conclusion, we believe that, when estimating BF from authorship, software components of higher importance should be assigned heavier weights. Currently, BFSig explores only the topological characteristics of these components. Nevertheless, considering attributes such as code complexity and bug proneness could potentially enhance the performance of BFSig.
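To make the idea in the abstract concrete, the following is a minimal sketch, not BFSig itself. It assumes a toy dependency graph, uses only one of the five named metrics (in-degree), adds 1 to every weight so that files nobody depends on still count (an assumption of this sketch), and replaces the exact "smallest set" search with a simple greedy heuristic. All file names, developer names, and function names are hypothetical.

```python
from collections import defaultdict


def in_degree_significance(deps):
    """Weight each file by 1 + the number of distinct files that depend on it."""
    files = set(deps) | {t for targets in deps.values() for t in targets}
    weight = {f: 1.0 for f in files}
    for source, targets in deps.items():
        for target in set(targets):
            if target != source:  # ignore self-dependencies
                weight[target] += 1.0
    return weight


def greedy_bus_factor(authors, weight=None):
    """Greedily remove developers until more than half of the (weighted)
    file knowledge is orphaned; return how many were removed.

    A file counts as orphaned once all of its authors have left.
    Assumes every file has at least one author.
    """
    if weight is None:
        weight = {f: 1.0 for f in authors}  # unweighted baseline
    total = sum(weight[f] for f in authors)
    remaining = {f: set(devs) for f, devs in authors.items()}
    lost, removed = 0.0, []
    while lost <= total / 2:
        # Weighted amount of knowledge each remaining developer holds.
        coverage = defaultdict(float)
        for f, devs in remaining.items():
            for d in devs:
                coverage[d] += weight[f]
        if not coverage:
            break
        # Ties are broken alphabetically to keep the result deterministic.
        leaver = max(sorted(coverage), key=coverage.get)
        removed.append(leaver)
        for f, devs in remaining.items():
            if leaver in devs:
                devs.discard(leaver)
                if not devs:
                    lost += weight[f]
    return len(removed)


# Hypothetical project: each file maps to the files it depends on.
deps = {
    "app.py": ["core.py", "util.py"],
    "cli.py": ["core.py"],
    "report.py": ["core.py", "util.py"],
    "core.py": [],
    "util.py": [],
}
authors = {
    "core.py": {"alice"},
    "util.py": {"alice"},
    "app.py": {"bob"},
    "cli.py": {"bob"},
    "report.py": {"carol"},
}

weights = in_degree_significance(deps)
print(greedy_bus_factor(authors))           # file-count baseline
print(greedy_bus_factor(authors, weights))  # significance-weighted
```

In this toy project the file-count baseline needs two departures to orphan more than half of the files, while the weighted variant needs only one, because a single developer owns the two files that most other files depend on. This is the kind of low-BF risk that counting files alone can miss.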

Cited By

Cury O., Avelino G., Neto P., Valente M., and Britto R. (2024). Source code expert identification. Information and Software Technology, 170:C. DOI: 10.1016/j.infsof.2024.107445. Online publication date: 1-Jun-2024
https://dl.acm.org/doi/10.1016/j.infsof.2024.107445

Index Terms

BFSig: Leveraging File Significance in Bus Factor Estimation
1. Software and its engineering
  1. Software creation and management
    1. Collaboration in software development
      1. Open source model
      2. Programming teams
    2. Software development process management
      1. Risk management

Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 2023

2215 pages

ISBN:9798400703270

DOI:10.1145/3611643

General Chair: Satish Chandra (Google, USA)

Program Chairs: Kelly Blincoe (University of Auckland, New Zealand), Paolo Tonella (USI Lugano, Switzerland)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

The Scientific and Technological Research Council of Turkey (TUBITAK)

Conference

ESEC/FSE '23

ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

December 3 - 9, 2023

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
