DOI: 10.1145/3180155.3180224
research-article

Automated localization for unreproducible builds

Published: 27 May 2018

    Abstract

    Reproducibility is the ability to recreate identical binaries under pre-defined build environments. Driven by the need for quality assurance and the benefit of better detecting attacks against build environments, the practice of reproducible builds has gained popularity in many open-source software repositories such as Debian and Bitcoin. However, identifying unreproducible issues remains a labour-intensive and time-consuming challenge, because little information is available to guide the search and the causes that may lead to unreproducible binaries are diverse.
    In this paper we propose an automated framework called RepLoc to localize the problematic files for unreproducible builds. RepLoc features a query augmentation component that utilizes information extracted from the build logs, and a heuristic rule-based filtering component that narrows the search scope. By integrating the two components with a weighted file ranking module, RepLoc automatically produces a ranked list of files that helps locate the problematic files for unreproducible builds. We have implemented a prototype and conducted extensive experiments over 671 real-world unreproducible Debian packages in four different categories. Considering only the topmost ranked file, RepLoc achieves an accuracy rate of 47.09%. If we expand the examination to the top ten ranked files in the list produced by RepLoc, the accuracy rate becomes 79.28%. Considering that there are hundreds of source files, scripts, Makefiles, etc., in a package, RepLoc significantly reduces the scope of localizing problematic files. Moreover, with the help of RepLoc, we successfully identified and fixed six new unreproducible packages from Debian and Guix.
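
The abstract names three cooperating parts (query augmentation from build logs, heuristic rule-based filtering, and weighted file ranking) without giving their details. The Python sketch below is only an illustration of how such parts could fit together, not RepLoc's actual implementation. The function names, the cosine-similarity scoring, the weight `alpha`, the example rules, and the reading of top-N accuracy as "at least one problematic file appears in the top N" are all assumptions made for this example.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]+")

def tokenize(text):
    """Split text into lowercase identifier-like terms."""
    return [t.lower() for t in TOKEN.findall(text)]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def augment_query(base_query, build_log):
    """Assumed augmentation: add the terms of every build-log line
    that shares at least one term with the base query."""
    base = set(tokenize(base_query))
    terms = Counter(tokenize(base_query))
    for line in build_log.splitlines():
        line_terms = tokenize(line)
        if base.intersection(line_terms):
            terms.update(line_terms)
    return terms

def heuristic_score(content, rules):
    """Fraction of (hypothetical) regex rules found in the file content."""
    if not rules:
        return 0.0
    return sum(1 for r in rules if re.search(r, content)) / len(rules)

def rank_files(files, base_query, build_log, rules, alpha=0.5):
    """Rank file paths by a weighted sum of query similarity and rule hits.
    files maps path -> file content; alpha is an assumed weight."""
    query = augment_query(base_query, build_log)
    scores = {
        path: (1 - alpha) * cosine(query, Counter(tokenize(text)))
        + alpha * heuristic_score(text, rules)
        for path, text in files.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def top_n_accuracy(results, n):
    """results: list of (ranked_paths, set_of_problematic_paths) per package.
    Counts a package as a hit if any problematic file is in the top n."""
    if not results:
        return 0.0
    hits = sum(1 for ranked, truth in results if truth.intersection(ranked[:n]))
    return hits / len(results)

# Toy package; the file contents and rules are invented for illustration.
files = {
    "debian/rules": "gzip -9 doc/manual.txt\ndate > build-stamp",
    "src/main.c": "int main(void) { return 0; }",
    "Makefile": "all:\n\tcc -o app src/main.c",
}
build_log = "cc -o app src/main.c\ngzip -9 doc/manual.txt\n"
rules = [r"\bdate\b", r"\bgzip\b"]
ranked = rank_files(files, "manual.txt.gz differs", build_log, rules)
print(ranked)
print(top_n_accuracy([(ranked, {"debian/rules"})], 1))
```

In this toy setup the rule patterns flag commands that commonly embed timestamps, while the augmented query pulls in the build-log line that mentions the differing artifact. A real system would need carefully chosen rules and weights.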



    Published In

    ICSE '18: Proceedings of the 40th International Conference on Software Engineering
    May 2018
    1307 pages
    ISBN: 9781450356381
    DOI: 10.1145/3180155
    • Conference Chair: Michel Chaudron
    • General Chair: Ivica Crnkovic
    • Program Chairs: Marsha Chechik, Mark Harman
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Badges

    • Distinguished Paper

    Author Tags

    1. localization
    2. software maintenance
    3. unreproducible build

    Qualifiers

    • Research-article

    Funding Sources

    • Fundamental Research Funds for the Central Universities
    • National Natural Science Foundation of China

    Conference

    ICSE '18

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 51
    • Downloads (Last 6 weeks): 4
    Reflects downloads up to 27 Jul 2024

    Cited By

    • (2024) Options Matter: Documenting and Fixing Non-Reproducible Builds in Highly-Configurable Systems. Proceedings of the 21st International Conference on Mining Software Repositories, 654-664. DOI: 10.1145/3643991.3644913. Online publication date: 15-Apr-2024
    • (2024) AROMA: Automatic Reproduction of Maven Artifacts. Proceedings of the ACM on Software Engineering 1:FSE, 836-858. DOI: 10.1145/3643764. Online publication date: 12-Jul-2024
    • (2024) When debugging encounters artificial intelligence: state of the art and open challenges. Science China Information Sciences 67:4. DOI: 10.1007/s11432-022-3803-9. Online publication date: 21-Feb-2024
    • (2023) UniLoc: Unified Fault Localization of Continuous Integration Failures. ACM Transactions on Software Engineering and Methodology 32:6, 1-31. DOI: 10.1145/3593799. Online publication date: 28-Sep-2023
    • (2023) It’s like flossing your teeth: On the Importance and Challenges of Reproducible Builds for Software Supply Chain Security. 2023 IEEE Symposium on Security and Privacy (SP), 1527-1544. DOI: 10.1109/SP46215.2023.10179320. Online publication date: May-2023
    • (2023) Evaluating the Impact of Experimental Assumptions in Automated Fault Localization. Proceedings of the 45th International Conference on Software Engineering, 159-171. DOI: 10.1109/ICSE48619.2023.00025. Online publication date: 14-May-2023
    • (2022) Accelerating Build Dependency Error Detection via Virtual Build. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1-12. DOI: 10.1145/3551349.3556930. Online publication date: 10-Oct-2022
    • (2022) Towards build verifiability for Java-based systems. Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice, 297-306. DOI: 10.1145/3510457.3513050. Online publication date: 21-May-2022
    • (2022) Automated patching for unreproducible builds. Proceedings of the 44th International Conference on Software Engineering, 200-211. DOI: 10.1145/3510003.3510102. Online publication date: 21-May-2022
    • (2022) Automating the Quantitative Analysis of Reproducibility for Build Artifacts derived from the Android Open Source Project. Proceedings of the 15th ACM Conference on Security and Privacy in Wireless and Mobile Networks, 6-19. DOI: 10.1145/3507657.3528537. Online publication date: 16-May-2022
