DOI: 10.1145/1508244.1508254
research-article

Capo: a software-hardware interface for practical deterministic multiprocessor replay

Published: 07 March 2009

Abstract

    While deterministic replay of parallel programs is a powerful technique, current proposals have shortcomings. Specifically, software-based replay systems have high overheads on multiprocessors, while hardware-based proposals focus only on basic hardware-level mechanisms, ignoring the overall replay system. To be practical, hardware-based replay systems need to support an environment with multiple parallel jobs running concurrently: some being recorded, others being replayed, and still others running without recording or replay. Moreover, they need to manage limited-size log buffers.
    This paper addresses these shortcomings by introducing, for the first time, a set of abstractions and a software-hardware interface for practical hardware-assisted replay of multiprocessor systems. The approach, called Capo, introduces the novel abstraction of the Replay Sphere to separate the responsibilities of the hardware and software components of the replay system. In this paper, we also design and build CapoOne, a prototype of a deterministic multiprocessor replay system that implements Capo using Linux and simulated DeLorean hardware. Our evaluation of 4-processor executions shows that CapoOne largely records with the efficiency of hardware-based schemes and the flexibility of software-based schemes.

        Published In

        ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
        March 2009
        358 pages
        ISBN: 9781605584065
        DOI: 10.1145/1508244
        • ACM SIGARCH Computer Architecture News, Volume 37, Issue 1
          ASPLOS 2009
          March 2009
          346 pages
          ISSN: 0163-5964
          DOI: 10.1145/2528521
        • ACM SIGPLAN Notices, Volume 44, Issue 3
          ASPLOS 2009
          March 2009
          346 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/1508284
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. capo
        2. capoone
        3. deterministic replay
        4. replay sphere

        Qualifiers

        • Research-article

        Conference

        ASPLOS09

        Acceptance Rates

        Overall Acceptance Rate 535 of 2,713 submissions, 20%

        Article Metrics

        • Downloads (Last 12 months): 6
        • Downloads (Last 6 weeks): 1
        Reflects downloads up to 09 Aug 2024

        Cited By

        • (2022) Fuzzing@Home: Distributed Fuzzing on Untrusted Heterogeneous Clients. Proceedings of the 25th International Symposium on Research in Attacks, Intrusions and Defenses, pp. 1-16. DOI: 10.1145/3545948.3545971. Online publication date: 26-Oct-2022
        • (2021) RAProducer: efficiently diagnose and reproduce data race bugs for binaries via trace analysis. Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 593-606. DOI: 10.1145/3460319.3464831. Online publication date: 11-Jul-2021
        • (2020) ACR: Amnesic Checkpointing and Recovery. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 30-43. DOI: 10.1109/HPCA47549.2020.00013. Online publication date: Feb-2020
        • (2018) Plover. Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation, pp. 483-499. DOI: 10.5555/3307441.3307483. Online publication date: 9-Apr-2018
        • (2018) REPT. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, pp. 17-32. DOI: 10.5555/3291168.3291171. Online publication date: 8-Oct-2018
        • (2017) Lazy Diagnosis of In-Production Concurrency Bugs. Proceedings of the 26th Symposium on Operating Systems Principles, pp. 582-598. DOI: 10.1145/3132747.3132767. Online publication date: 14-Oct-2017
        • (2017) CoopREP: Cooperative record and replay of concurrency bugs. Software Testing, Verification and Reliability, 28:1. DOI: 10.1002/stvr.1645. Online publication date: 5-Sep-2017
        • (2016) Replayconfusion. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-14. DOI: 10.5555/3195638.3195685. Online publication date: 15-Oct-2016
        • (2016) Intermittent computation without hardware support or programmer intervention. Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation, pp. 17-32. DOI: 10.5555/3026877.3026880. Online publication date: 2-Nov-2016
        • (2016) STREAMSCOPE. Proceedings of the 13th Usenix Conference on Networked Systems Design and Implementation, pp. 439-453. DOI: 10.5555/2930611.2930640. Online publication date: 16-Mar-2016
