DOI: 10.1145/3643991.3644913

Options Matter: Documenting and Fixing Non-Reproducible Builds in Highly-Configurable Systems

Published: 02 July 2024

Abstract

Build reproducibility is a critical aspect of software development: it underpins the dependability, security, and maintainability of software systems. Although several factors, including the build environment, have been studied in the context of non-reproducible builds, to the best of our knowledge the precise influence of configuration options in configurable systems has not been thoroughly investigated. This paper aims to fill this gap.
We propose an approach to automatically identify configuration options that cause non-reproducible builds. It begins by producing a set of builds and detecting the non-reproducible ones through binary comparison. We then develop automated techniques that combine statistical learning with symbolic reasoning to analyze over 20,000 configuration options. Our methods are designed both to detect options causing non-reproducibility and to remedy non-reproducible configurations, two tasks that are challenging and costly to perform manually.
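The binary-comparison step can be pictured with a minimal sketch. It assumes that each configuration has been built twice and that each build leaves one artifact file on disk; the function names `sha256_of` and `is_reproducible` are hypothetical and are not taken from the paper.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(artifact_a: Path, artifact_b: Path) -> bool:
    """Assumption: two builds of the same configuration count as
    reproducible only if their artifacts are bit-for-bit identical."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)
```

Comparing digests rather than file contents keeps the check cheap when many configurations are built, since only the digest of each artifact needs to be stored.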
We evaluate our approach on three case studies, namely Toybox, Busybox, and Linux, analyzing more than 2,000 configurations for each. Toybox and Busybox show no non-reproducible builds. In contrast, 47% of Linux configurations lead to non-reproducible builds. Our approach identifies 10 configuration options that cause this non-reproducibility. When checked against the Linux documentation, none of these options is documented as non-reproducible. The identified options are therefore novel knowledge and a direct, actionable improvement for the Linux community. Finally, we demonstrate that our methodology effectively identifies a set of undesirable option values, which makes it possible to enhance and expand the Linux kernel documentation while automatically fixing 96% of the non-reproducible builds encountered.
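To illustrate how options might be linked to non-reproducibility, the sketch below ranks Boolean options by the gap between the non-reproducible rate when an option is enabled and when it is disabled. This is a deliberately simple stand-in, not the statistical learning or symbolic reasoning used in the paper; the function `rank_options` and the option names in the usage note are hypothetical.

```python
from collections import defaultdict


def rank_options(builds):
    """Rank options by association with non-reproducible builds.

    builds: list of (config, non_reproducible) pairs, where config maps
    an option name to True (enabled) or False (disabled).
    Returns (option, score) pairs, highest score first.
    """
    # Per option: [bad when on, total on, bad when off, total off]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for config, non_reproducible in builds:
        for option, enabled in config.items():
            entry = counts[option]
            if enabled:
                entry[0] += int(non_reproducible)
                entry[1] += 1
            else:
                entry[2] += int(non_reproducible)
                entry[3] += 1

    scores = {}
    for option, (on_bad, on_total, off_bad, off_total) in counts.items():
        if on_total == 0 or off_total == 0:
            continue  # the option never varies, so nothing can be inferred
        scores[option] = on_bad / on_total - off_bad / off_total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with four sampled builds in which a hypothetical `OPTION_A` is enabled exactly in the non-reproducible ones, `rank_options` places `OPTION_A` first with a score of 1.0. A per-option score like this ignores interactions between options, which is one reason a real analysis would need something stronger.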

Cited By

  • Debugging Unreproducible Builds using eBPF. 2024 3rd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), 2024, pp. 1781-1787. DOI: 10.1109/ICAAIC60222.2024.10575127. Online publication date: 5-Jun-2024

Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024
788 pages
ISBN: 9798400705878
DOI: 10.1145/3643991

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. reproducible build
  2. build system
  3. highly-configurable system

Qualifiers

  • Research-article
