DOI: 10.1145/3611643.3613877
research-article

BFSig: Leveraging File Significance in Bus Factor Estimation

Published: 30 November 2023

Abstract

Software projects lose developers for various reasons. Because developers are one of the main sources of knowledge in a software project, their absence inevitably results in some degree of knowledge depletion. Bus Factor (BF) is a metric for evaluating how this knowledge loss can affect the project's continuity. Conventionally, BF is calculated as the size of the smallest set of developers whose departure would remove over half of the project knowledge. Current state-of-the-art approaches measure a developer's knowledge by the number of files they authored, using version control system (VCS) information. However, numerous studies have shown that files in software projects differ in significance. In this study, we explore how weighting files according to their significance affects the performance of two prevailing BF estimators. We derive significance scores by computing five well-known graph metrics from the project's dependency graph: PageRank, In-/Out-/All-Degree, and Betweenness Centralities. Furthermore, we introduce BFSig, a prototype of our approach. Finally, we present a new dataset of reported BF scores collected by surveying software practitioners from five prominent GitHub repositories. Our results indicate that BFSig outperforms the baselines, reducing Normalized Mean Absolute Error (NMAE) by up to 18%. Moreover, BFSig yields 18% fewer False Negatives when identifying potential risks associated with low BF. In addition, our respondent confirmed BFSig's versatility by showing that it can assess the BF of a project's subfolders. In conclusion, we believe that, when estimating BF from authorship, software components of higher importance should be assigned heavier weights. Currently, BFSig explores only the topological characteristics of these components. Considering attributes such as code complexity and bug proneness could potentially further enhance the performance of BFSig.
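The core idea of the abstract, weighting authored files by a significance score from the dependency graph before estimating BF, can be illustrated with a small sketch. This is not BFSig and not either of the two baseline estimators. The greedy removal heuristic, the "+1" smoothing of in-degree, the rule that a file is lost once all of its authors have left, the function names, and the toy project are all assumptions made for illustration. Only the "over half of the project knowledge" threshold and the use of in-degree as one possible significance metric come from the abstract.

```python
def in_degree_significance(deps):
    """Map each file to a normalized weight based on its in-degree.

    deps: dict mapping a file to the list of files it depends on.
    The +1 smoothing is an assumption so that no file gets zero weight.
    """
    files = set(deps)
    for targets in deps.values():
        files.update(targets)
    in_degree = {f: 0 for f in files}
    for targets in deps.values():
        for target in set(targets):
            in_degree[target] += 1
    raw = {f: 1 + d for f, d in in_degree.items()}
    total = sum(raw.values())
    return {f: w / total for f, w in raw.items()}


def weighted_bus_factor(authors, weights, threshold=0.5):
    """Greedy BF estimate (illustrative heuristic, not the paper's algorithm).

    authors: dict mapping a file to the set of developers who know it.
    weights: dict mapping a file to its significance weight.
    Developers are removed one at a time, largest weighted authorship
    first, until the files left without any author carry more than
    `threshold` of the total weight. Returns the removed developers.
    """
    remaining = {f: set(devs) for f, devs in authors.items() if devs}
    total = sum(weights[f] for f in remaining)
    developers = set().union(*remaining.values())
    removed, lost = [], 0.0
    while developers and lost <= threshold * total:
        def authored_weight(dev):
            return sum(weights[f] for f, devs in remaining.items() if dev in devs)
        # sorted() makes tie-breaking deterministic (alphabetical order)
        pick = max(sorted(developers), key=authored_weight)
        developers.remove(pick)
        removed.append(pick)
        for devs in remaining.values():
            devs.discard(pick)
        lost = sum(weights[f] for f, devs in remaining.items() if not devs)
    return removed


# Hypothetical toy project: core.py and util.py are depended on the most.
deps = {
    "app.py": ["core.py", "util.py"],
    "cli.py": ["core.py"],
    "core.py": ["util.py"],
    "util.py": [],
}
authors = {
    "app.py": {"alice"},
    "cli.py": {"bob"},
    "core.py": {"carol"},
    "util.py": {"carol"},
}

uniform = {f: 0.25 for f in authors}          # every file counts the same
significance = in_degree_significance(deps)   # heavily used files count more

print(len(weighted_bus_factor(authors, uniform)))       # 2
print(len(weighted_bus_factor(authors, significance)))  # 1
```

In this toy example, counting files uniformly gives a BF of 2, because losing carol removes exactly half of the files, which is not "over half". Weighting by in-degree gives a BF of 1, because carol alone holds the two most depended-upon files. This is the kind of low-BF risk that an unweighted estimator can miss.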


Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN:9798400703270
DOI:10.1145/3611643
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bus factor
  2. dataset
  3. file significance
  4. intelligent collaboration tools
  5. knowledge management
  6. truck factor

Qualifiers

  • Research-article

Funding Sources

  • The Scientific and Technological Research Council of Turkey (TUBITAK)

Conference

ESEC/FSE '23
Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
