DOI: 10.1145/2485922.2485977
research-article

QuickRec: prototyping an intel architecture extension for record and replay of multithreaded programs

Published: 23 June 2013

Abstract

There has been significant interest in hardware-assisted deterministic Record and Replay (RnR) systems for multithreaded programs on multiprocessors. However, no proposal has implemented this technique in a hardware prototype with full operating system support. Such an implementation is needed to assess RnR practicality.
This paper presents QuickRec, the first multicore Intel Architecture (IA) prototype of RnR for multithreaded programs. QuickRec is based on QuickIA, an Intel emulation platform for rapid prototyping of new IA extensions. QuickRec is composed of a Xeon server platform with FPGA-emulated second-generation Pentium cores, and Capo3, a full software stack for managing the recording hardware from within a modified Linux kernel.
This paper's focus is understanding and evaluating the implementation issues of RnR on a real platform. Our effort leads to some lessons learned, as well as to some pointers for future research. We demonstrate that RnR can be implemented efficiently on a real multicore IA system. In particular, we show that the rate of memory log generation is insignificant, and that the recording hardware has negligible performance overhead. However, the software stack incurs an average recording overhead of nearly 13%, which must be reduced to enable always-on use of RnR.

References

[1] H. Agrawal, R. A. DeMillo, and E. H. Spafford. An Execution-Backtracking Approach to Debugging. IEEE Software, May 1991.
[2] U. Banerjee, B. Bliss, Z. Ma, and P. Petersen. Unraveling Data Race Detection in the Intel Thread Checker. In STMCS, March 2006.
[3] A. Basu, J. Bobba, and M. D. Hill. Karma: Scalable Deterministic Record-Replay. In ICS, June 2011.
[4] B. Boothe. Efficient Algorithms for Bidirectional Debugging. In PLDI, June 2000.
[5] T. Bressoud and F. Schneider. Hypervisor-Based Fault-Tolerance. ACM Transactions on Computer Systems, 14(1), February 1996.
[6] S.-K. Chen, W. K. Fuchs, and J.-Y. Chung. Reversible Debugging Using Program Instrumentation. IEEE Transactions on Software Engineering, 27(8):715-727, August 2001.
[7] Y. Chen, W. Hu, T. Chen, and R. Wu. LReplay: A Pending Period Based Deterministic Replay Scheme. In ISCA, June 2010.
[8] J.-D. Choi and H. Srinivasan. Deterministic Replay of Java Multithreaded Applications. In SPDT, August 1998.
[9] G. Dunlap, S. King, S. Cinar, M. Basrai, and P. Chen. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In OSDI, December 2002.
[10] G. Dunlap, D. Lucchetti, M. Fetterman, and P. Chen. Execution Replay of Multiprocessor Virtual Machines. In VEE, March 2008.
[11] A. Forin. Debugging of Heterogeneous Parallel Systems. In PDD, May 1988.
[12] N. Honarmand, N. Dautenhahn, J. Torrellas, S. T. King, G. Pokam, and C. Pereira. Cyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism. In ASPLOS, March 2013.
[13] D. R. Hower and M. D. Hill. Rerun: Exploiting Episodes for Lightweight Memory Race Recording. In ISCA, June 2008.
[14] Intel Corp. Intel 64 and IA-32 Architectures Software Developer's Manual. 2002. http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html.
[15] A. Joshi, S. T. King, G. W. Dunlap, and P. M. Chen. Detecting Past and Present Intrusions Through Vulnerability-Specific Predicates. In SOSP, October 2005.
[16] S. T. King and P. M. Chen. Backtracking Intrusions. In SOSP, October 2003.
[17] S. T. King, G. W. Dunlap, and P. M. Chen. Debugging Operating Systems with Time-Traveling Virtual Machines. In USENIX Annual Technical Conference, April 2005.
[18] T. J. LeBlanc and J. M. Mellor-Crummey. Debugging Parallel Programs with Instant Replay. IEEE Trans. Comp., April 1987.
[19] G. Lueck, H. Patil, and C. Pereira. PinADX: An Interface for Customizable Debugging with Dynamic Instrumentation. In CGO, 2012.
[20] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In PLDI, 2005.
[21] P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg, J. Högberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A Full System Simulation Platform. IEEE Computer, February 2002.
[22] J. D. McCalpin. Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE TCCA Newsletter, pages 19-25, December 1995.
[23] P. Montesinos, L. Ceze, and J. Torrellas. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. In ISCA, June 2008.
[24] P. Montesinos, M. Hicks, S. King, and J. Torrellas. Capo: A Software-Hardware Interface for Practical Deterministic Multiprocessor Replay. In ASPLOS, March 2009.
[25] S. Narayanasamy, C. Pereira, and B. Calder. Recording Shared Memory Dependencies Using Strata. In ASPLOS, October 2006.
[26] S. Narayanasamy, G. Pokam, and B. Calder. BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging. In ISCA, June 2005.
[27] D. Z. Pan and M. A. Linton. Supporting Reverse Execution for Parallel Programs. In PDD, May 1988.
[28] H. Patil, C. Pereira, M. Stallcup, G. Lueck, and J. Cownie. PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs. In CGO, April 2010.
[29] C. Pereira, G. Pokam, K. Danne, R. Devarajan, and A.-R. Adl-Tabatabai. Virtues and Obstacles of Hardware-Assisted Multi-Processor Execution Replay. In HotPAR, June 2010.
[30] G. Pokam, C. Pereira, K. Danne, R. Kassa, and A.-R. Adl-Tabatabai. Architecting a Chunk-Based Memory Race Recorder in Modern CMPs. In MICRO, December 2009.
[31] G. Pokam, C. Pereira, S. Hu, A.-R. Adl-Tabatabai, J. Gottschlich, H. Jungwoo, and Y. Wu. CoreRacer: A Practical Memory Race Recorder for Multicore x86 TSO Processors. In MICRO, 2011.
[32] M. Russinovich and B. Cogswell. Replay for Concurrent Non-Deterministic Shared-Memory Applications. In PLDI, May 1996.
[33] K. Serebryany and T. Iskhodzhanov. ThreadSanitizer: Data Race Detection in Practice. In WBIA, December 2009.
[34] S. Srinivasan, S. Kandula, C. Andrews, and Y. Zhou. Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging. In USENIX Ann. Tech. Conf., June 2004.
[35] K. Veeraraghavan, D. Lee, B. Wester, J. Ouyang, P. M. Chen, J. Flinn, and S. Narayanasamy. DoublePlay: Parallelizing Sequential Logging and Replay. In ASPLOS, March 2011.
[36] G. Voskuilen, F. Ahmad, and T. N. Vijaykumar. Timetraveler: Exploiting Acyclic Races for Optimizing Memory Race Recording. In ISCA, June 2010.
[37] Q. Wang, R. Kassa, W. Shen, N. Ijih, B. Chitlur, M. Konow, D. Liu, A. Sheiman, and P. Gupta. An FPGA Based Hybrid Processor Emulation Platform. In FPL, August 2010.
[38] XtreamData. http://www.xtreamdata.com.
[39] M. Xu, R. Bodik, and M. Hill. A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay. In ISCA, June 2003.
[40] M. Xu, R. Bodik, and M. D. Hill. A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording. In ASPLOS, 2006.
[41] M. V. Zelkowitz. Reversible Execution. Communications of the ACM, 16(9):566, September 1973.

Reviews

Amitabha Roy

This paper describes QuickRec, a field-programmable gate array (FPGA) implementation of hardware-assisted record replay (RnR) for an x86 processor. Record replay allows the recording of the execution of a multithreaded application, capturing all sources of nondeterminism, including input, nondeterministic instructions such as those that read the processor timestamp counter, and the interleaving of racing accesses to memory. This then allows replay of the execution, providing complete information about the execution to tools such as debuggers and race detectors, thereby enabling reasoning about it. There has been significant prior work in this area, and the key contribution with QuickRec is a fully working prototype on FPGAs of previous work called Capo, which was originally evaluated on a simulator. The resulting full system (Capo3) consists of a modified Linux kernel supporting record replay, an FPGA prototype of four Intel Pentium cores connected to memory, and modifications to the cores to support record replay. The primary components of the record replay system are bloom filters that record addresses of reads and writes to the level 1 cache and in-memory logs of input events such as data supplied by the operating system. On certain events that demand the enforcing of total order, such as an interleaving access from a different core, the bloom filters and input logs are written out to a totally ordered log as "chunks." Because this work resulted in the building of a real system, the paper provides a number of interesting insights of a practical nature. First, the overheads of record and replay are as low as 13 percent on average, suggesting that this feature is mature enough and useful enough to demand inclusion in future processors. This is backed up by the fact that memory bandwidth requirements for record and replay are as low as 0.3 percent in the emulated system. 
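The chunk mechanism the review describes (per-core Bloom filters over read and write addresses, with a chunk closed and appended to a totally ordered log when another core makes a conflicting access) can be illustrated with a small software model. This is only a sketch under stated assumptions, not QuickRec's hardware design: the class names, the filter size and hash choice, the 64-byte line size, and the log entry layout `(order, core_id, chunk_size)` are all hypothetical, and input logging is left out.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Small Bloom filter over cache-line numbers (size and hashing are illustrative)."""

    def __init__(self, bits=256, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.word = 0  # bit vector held in a Python int

    def _positions(self, line):
        for i in range(self.hashes):
            digest = hashlib.blake2b(f"{i}:{line}".encode(), digest_size=4).digest()
            yield int.from_bytes(digest, "little") % self.bits

    def add(self, line):
        for pos in self._positions(line):
            self.word |= 1 << pos

    def may_contain(self, line):
        # No false negatives; false positives only end a chunk early.
        return all((self.word >> pos) & 1 for pos in self._positions(line))

    def clear(self):
        self.word = 0


@dataclass
class Core:
    core_id: int
    reads: BloomFilter = field(default_factory=BloomFilter)
    writes: BloomFilter = field(default_factory=BloomFilter)
    accesses: int = 0  # memory operations in the current chunk


class Recorder:
    """Hypothetical model: one read and one write signature per core, one shared log."""

    def __init__(self, num_cores):
        self.cores = [Core(i) for i in range(num_cores)]
        self.log = []  # entries are (order, core_id, chunk_size)

    def _end_chunk(self, core):
        if core.accesses:
            self.log.append((len(self.log), core.core_id, core.accesses))
        core.reads.clear()
        core.writes.clear()
        core.accesses = 0

    def access(self, core_id, addr, is_write):
        line = addr >> 6  # assumption: 64-byte cache lines
        for other in self.cores:
            if other.core_id == core_id:
                continue
            # A remote access conflicts with local writes; a remote write
            # also conflicts with local reads.
            if other.writes.may_contain(line) or (is_write and other.reads.may_contain(line)):
                self._end_chunk(other)
        me = self.cores[core_id]
        (me.writes if is_write else me.reads).add(line)
        me.accesses += 1

    def finish(self):
        for core in self.cores:
            self._end_chunk(core)
        return self.log


rec = Recorder(2)
rec.access(0, 0x1000, True)   # core 0 writes a line
rec.access(0, 0x2000, False)  # core 0 reads another line
rec.access(1, 0x1000, False)  # core 1 reads core 0's written line: core 0's chunk ends
rec.access(1, 0x3000, True)
print(rec.finish())
```

In this model a replayer would only need the log order and each chunk's size to reproduce the interleaving of conflicting accesses, which is one way to read the review's remark that the memory bandwidth needed for logging is small.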
The authors also provide a number of practical suggestions for operating system support for RnR, including a careful exposition on how to instrument routines that copy data back to user space and how to handle page faults in those routines by means of extra hardware support, thereby connecting the dots between RnR hardware and operating system support for RnR. This paper is a good read for researchers interested in the practical aspects of record and replay. However, it does assume knowledge of prior art in the area. In particular, a careful reading of the original Capo system paper [1] would greatly enhance the potential for learning from this paper.

Online Computing Reviews Service

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN:9781450320795
DOI:10.1145/2485922
  • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
    ISCA '13
    June 2013
    666 pages
    ISSN:0163-5964
    DOI:10.1145/2508148
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA prototype
  2. deterministic record and replay
  3. hardware-software interface
  4. shared memory multiprocessors

Qualifiers

  • Research-article

Conference

ISCA'13

Acceptance Rates

ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 18
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 03 Oct 2024

Cited By

  • (2021) RAProducer: efficiently diagnose and reproduce data race bugs for binaries via trace analysis. Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 593-606. DOI: 10.1145/3460319.3464831. Online publication date: 11-Jul-2021
  • (2019) Different is Good. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1883-1897. DOI: 10.1145/3319535.3345654. Online publication date: 6-Nov-2019
  • (2019) Sparse record and replay with controlled scheduling. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 576-593. DOI: 10.1145/3314221.3314635. Online publication date: 8-Jun-2019
  • (2019) Applying Transactional Memory for Concurrency-Bug Failure Recovery in Production Runs. IEEE Transactions on Parallel and Distributed Systems, 30:5, pp. 990-1006. DOI: 10.1109/TPDS.2018.2877656. Online publication date: 1-May-2019
  • (2018) Record-and-Replay Techniques for HPC Systems: A Survey. Supercomputing Frontiers and Innovations: an International Journal, 5:1, pp. 11-30. DOI: 10.14529/jsfi180102. Online publication date: 15-Mar-2018
  • (2018) A Quick Deterministic Replay Method Based on Dependence Pair. 2018 Eighth International Conference on Information Science and Technology (ICIST), pp. 110-113. DOI: 10.1109/ICIST.2018.8426126. Online publication date: Jun-2018
  • (2018) Record-Replay Architecture as a General Security Framework. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 180-193. DOI: 10.1109/HPCA.2018.00025. Online publication date: Feb-2018
  • (2018) RollSec. International Journal of Parallel Programming, 46:4, pp. 788-805. DOI: 10.1007/s10766-017-0523-0. Online publication date: 1-Aug-2018
  • (2017) Lazy Diagnosis of In-Production Concurrency Bugs. Proceedings of the 26th Symposium on Operating Systems Principles, pp. 582-598. DOI: 10.1145/3132747.3132767. Online publication date: 14-Oct-2017
  • (2016) Replayconfusion. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-14. DOI: 10.5555/3195638.3195685. Online publication date: 15-Oct-2016
