DOI: 10.1145/1811039.1811057
research-article

Transparent, lightweight application execution replay on commodity multiprocessor operating systems

Published: 14 June 2010

Abstract

    We present Scribe, the first system to provide transparent, low-overhead application record-replay and the ability to go live from replayed execution. Scribe introduces new lightweight operating system mechanisms, rendezvous and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Rendezvous points make a partial ordering of execution based on system call dependencies sufficient for replay, avoiding the recording overhead of maintaining an exact execution ordering. Sync points convert asynchronous interactions that can occur at arbitrary times into synchronous events that are much easier to record and replay.
    We have implemented Scribe without changing, relinking, or recompiling applications, libraries, or operating system kernels, and without any specialized hardware support such as hardware performance counters. It works on commodity Linux operating systems, and commodity multi-core and multiprocessor hardware. Our results show for the first time that an operating system mechanism can correctly and transparently record and replay multi-process and multi-threaded applications on commodity multiprocessors. Scribe recording overhead is less than 2.5% for server applications including Apache and MySQL, and less than 15% for desktop applications including Firefox, Acrobat, OpenOffice, parallel kernel compilation, and movie playback.
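The partial-ordering idea behind rendezvous points can be illustrated with a small user-space sketch. This is not Scribe's implementation, which lives in the operating system and is not shown in the abstract; it is a simplified model that assumes each shared resource carries its own sequence counter. The names `ResourceClock`, `Recorder`, `Replayer`, and `run` are hypothetical. Recording logs, per thread, the sequence number each access observed on its resource; replay makes a thread wait until the resource reaches that number, so only accesses to the same resource are ordered against each other.

```python
import threading
from collections import defaultdict


class ResourceClock:
    """Hypothetical per-resource sequence counter with a condition variable."""

    def __init__(self):
        self.seq = 0
        self.cond = threading.Condition()


class Recorder:
    """Record mode: log the sequence number each access observed."""

    def __init__(self):
        self.clocks = defaultdict(ResourceClock)
        self.log = defaultdict(list)  # thread name -> [(resource, seq), ...]
        self._guard = threading.Lock()

    def access(self, thread, resource, action):
        with self._guard:  # protect the two dictionaries
            clock = self.clocks[resource]
            entries = self.log[thread]
        with clock.cond:  # accesses to one resource are serialized
            entries.append((resource, clock.seq))
            result = action()
            clock.seq += 1
        return result


class Replayer:
    """Replay mode: delay each access until its recorded turn arrives."""

    def __init__(self, log):
        self.clocks = defaultdict(ResourceClock)
        self.pending = {t: list(entries) for t, entries in log.items()}
        self._guard = threading.Lock()

    def access(self, thread, resource, action):
        expected_resource, expected_seq = self.pending[thread].pop(0)
        if expected_resource != resource:
            raise RuntimeError("replay diverged from the recording")
        with self._guard:
            clock = self.clocks[resource]
        with clock.cond:
            # Wait only on this resource; other resources proceed freely.
            clock.cond.wait_for(lambda: clock.seq == expected_seq)
            result = action()
            clock.seq += 1
            clock.cond.notify_all()
        return result


def run(engine, workers=3, writes=4):
    """Run racing workers that append to a list standing in for a shared file."""
    shared = []

    def worker(name):
        for i in range(writes):
            engine.access(name, "file", lambda: shared.append((name, i)))

    threads = [threading.Thread(target=worker, args=(f"t{n}",))
               for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    recorder = Recorder()
    original = run(recorder)
    replayed = run(Replayer(recorder.log))
    print(replayed == original)
```

In this model the log holds one small entry per access rather than a global timestamp, which mirrors the abstract's point that a partial order based on dependencies is sufficient for replay. A thread touching a different resource never blocks on the "file" counter.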



    Published In

    SIGMETRICS '10: Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
    June 2010
    398 pages
    ISBN: 9781450300384
    DOI: 10.1145/1811039

    ACM SIGMETRICS Performance Evaluation Review, Volume 38, Issue 1
    June 2010
    382 pages
    ISSN: 0163-5999
    DOI: 10.1145/1811099
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. debugging
    2. fault-tolerance
    3. record-replay
    4. virtualization

    Qualifiers

    • Research-article

    Conference

    SIGMETRICS '10

    Acceptance Rates

    Overall Acceptance Rate 459 of 2,691 submissions, 17%

    Cited By

    • (2024) Efficient Auditing of Event-driven Web Applications. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 1208-1224. DOI: 10.1145/3627703.3650089. Online publication date: 22-Apr-2024
    • (2023) Always-On Recording Framework for Serverless Computations: Opportunities and Challenges. Proceedings of the 1st Workshop on SErverless Systems, Applications and MEthodologies, pp. 41-49. DOI: 10.1145/3592533.3592810. Online publication date: 8-May-2023
    • (2023) Vidi: Record Replay for Reconfigurable Hardware. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 806-820. DOI: 10.1145/3582016.3582040. Online publication date: 25-Mar-2023
    • (2023) Diagnosing Kernel Concurrency Failures with AITIA. Proceedings of the Eighteenth European Conference on Computer Systems, pp. 94-110. DOI: 10.1145/3552326.3567486. Online publication date: 8-May-2023
    • (2022) Making Information Hiding Effective Again. IEEE Transactions on Dependable and Secure Computing 19:4, pp. 2576-2594. DOI: 10.1109/TDSC.2021.3064086. Online publication date: 1-Jul-2022
    • (2021) Understanding and detecting server-side request races in web applications. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 842-854. DOI: 10.1145/3468264.3468594. Online publication date: 20-Aug-2021
    • (2020) On another level. Proceedings of the workshop on Testing Database Systems, pp. 1-6. DOI: 10.1145/3395032.3395321. Online publication date: 19-Jun-2020
    • (2020) Ad hoc Test Generation Through Binary Rewriting. 2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 115-126. DOI: 10.1109/SCAM51674.2020.00018. Online publication date: Sep-2020
    • (2019) Safehidden. Proceedings of the 28th USENIX Conference on Security Symposium, pp. 1239-1256. DOI: 10.5555/3361338.3361424. Online publication date: 14-Aug-2019
    • (2019) Processor-Oblivious Record and Replay. ACM Transactions on Parallel Computing 6:4, pp. 1-28. DOI: 10.1145/3365659. Online publication date: 17-Dec-2019
