research-article

QuickRec: prototyping an intel architecture extension for record and replay of multithreaded programs

Authors:

Josep Torrellas

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

Pages 643 - 654

https://doi.org/10.1145/2485922.2485977

Published: 23 June 2013

Abstract

There has been significant interest in hardware-assisted deterministic Record and Replay (RnR) systems for multithreaded programs on multiprocessors. However, no proposal has implemented this technique in a hardware prototype with full operating system support. Such an implementation is needed to assess RnR practicality.

This paper presents QuickRec, the first multicore Intel Architecture (IA) prototype of RnR for multithreaded programs. QuickRec is based on QuickIA, an Intel emulation platform for rapid prototyping of new IA extensions. QuickRec is composed of a Xeon server platform with FPGA-emulated second-generation Pentium cores, and Capo3, a full software stack for managing the recording hardware from within a modified Linux kernel.

This paper's focus is understanding and evaluating the implementation issues of RnR on a real platform. Our effort leads to some lessons learned, as well as to some pointers for future research. We demonstrate that RnR can be implemented efficiently on a real multicore IA system. In particular, we show that the rate of memory log generation is insignificant, and that the recording hardware has negligible performance overhead. However, the software stack incurs an average recording overhead of nearly 13%, which must be reduced to enable always-on use of RnR.

Reviews

Reviewer: Amitabha Roy

This paper describes QuickRec, a field-programmable gate array (FPGA) implementation of hardware-assisted record and replay (RnR) for an x86 processor. RnR records the execution of a multithreaded application, capturing all sources of nondeterminism: input, nondeterministic instructions such as those that read the processor timestamp counter, and the interleaving of racing accesses to memory. The execution can then be replayed, giving tools such as debuggers and race detectors complete information about it and thereby enabling reasoning about it.

There has been significant prior work in this area, and the key contribution of QuickRec is a fully working FPGA prototype of previous work called Capo, which was originally evaluated on a simulator. The resulting full system consists of Capo3, a software stack built around a modified Linux kernel that supports RnR; an FPGA prototype of four Intel Pentium cores connected to memory; and modifications to the cores to support RnR. The primary components of the RnR system are Bloom filters that record the addresses of reads and writes to the level 1 cache, and in-memory logs of input events such as data supplied by the operating system. On certain events that demand the enforcement of a total order, such as an interleaving access from a different core, the Bloom filters and input logs are written out to a totally ordered log as "chunks."

Because this work resulted in the building of a real system, the paper provides a number of interesting insights of a practical nature. First, the recording overhead is nearly 13 percent on average, suggesting that this feature is mature enough and useful enough to demand inclusion in future processors. This is backed up by the fact that the memory bandwidth requirements for RnR are as low as 0.3 percent in the emulated system. The authors also provide a number of practical suggestions for operating system support for RnR, including a careful exposition of how to instrument routines that copy data back to user space and how to handle page faults in those routines by means of extra hardware support, thereby connecting the dots between RnR hardware and operating system support for RnR.

This paper is a good read for researchers interested in the practical aspects of record and replay. However, it does assume knowledge of prior art in the area. In particular, a careful reading of the original Capo system paper [1] would greatly enhance the potential for learning from this paper.

Online Computing Reviews Service
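
The chunk mechanism described in the review can be illustrated with a small software model. The following is a minimal sketch, not the QuickRec hardware design: the signature size, the hash functions, the 64-byte cache line, and the names `Signature`, `Core`, and `Recorder` are all assumptions made for illustration. Each core keeps Bloom-filter signatures of the addresses it has read and written; when another core makes a conflicting access, the current chunk is closed and appended to a totally ordered log.

```python
from dataclasses import dataclass, field

SIG_BITS = 256  # assumed signature size, not a figure from the paper


def _hashes(addr):
    """Two illustrative hash functions over a cache-line address."""
    line = addr >> 6  # assumes 64-byte cache lines
    return (line % SIG_BITS, (line * 2654435761 >> 7) % SIG_BITS)


class Signature:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self):
        self.bits = 0

    def insert(self, addr):
        for h in _hashes(addr):
            self.bits |= 1 << h

    def may_contain(self, addr):
        return all((self.bits >> h) & 1 for h in _hashes(addr))

    def clear(self):
        self.bits = 0


@dataclass
class Core:
    cid: int
    reads: Signature = field(default_factory=Signature)
    writes: Signature = field(default_factory=Signature)
    instrs: int = 0  # memory operations counted in the current chunk


class Recorder:
    """Hypothetical model of a chunk-based memory race recorder."""

    def __init__(self, n_cores):
        self.cores = [Core(i) for i in range(n_cores)]
        self.log = []  # totally ordered entries: (sequence, core id, chunk size)

    def _end_chunk(self, core):
        if core.instrs:
            self.log.append((len(self.log), core.cid, core.instrs))
        core.instrs = 0
        core.reads.clear()
        core.writes.clear()

    def access(self, cid, addr, is_write):
        # A remote access conflicts if it touches a line the other core wrote,
        # or if it is a write to a line the other core read.
        for other in self.cores:
            if other.cid == cid:
                continue
            if other.writes.may_contain(addr) or (
                is_write and other.reads.may_contain(addr)
            ):
                self._end_chunk(other)
        me = self.cores[cid]
        (me.writes if is_write else me.reads).insert(addr)
        me.instrs += 1


rec = Recorder(2)
rec.access(0, 0x1000, is_write=True)   # core 0 writes a line
rec.access(1, 0x1000, is_write=False)  # core 1 reads it: core 0's chunk ends
rec.access(1, 0x2000, is_write=False)
rec.access(0, 0x2000, is_write=False)  # read-read sharing: no chunk boundary
rec.access(0, 0x1000, is_write=True)   # write after remote read: core 1's chunk ends
print(rec.log)
```

Because a Bloom filter can report false positives, a model like this may end a chunk on an access that does not truly conflict. That costs extra log entries but does not affect correctness, since the logged order is still one valid ordering of the chunks.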

Information & Contributors

Information

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

June 2013

686 pages

ISBN:9781450320795

DOI:10.1145/2485922

General Chair:
Avi Mendelson
Technion

ACM SIGARCH Computer Architecture News Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN:0163-5964
DOI:10.1145/2508148

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

University of Illinois at Urbana-Champaign

Conference

ISCA'13

ISCA'13: The 40th Annual International Symposium on Computer Architecture

June 23 - 27, 2013

Tel-Aviv, Israel

Acceptance Rates

ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Article Metrics

Total Citations: 29
Total Downloads: 543
Downloads (Last 12 months): 18
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Oct 2024
Cited By

Yuan M, Lee Y, Zhang C, Li Y, Cai Y, Zhao B, Cadar C, Zhang X (2021). RAProducer: efficiently diagnose and reproduce data race bugs for binaries via trace analysis. Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 593-606. Online publication date: 11-Jul-2021. https://dl.acm.org/doi/10.1145/3460319.3464831

Cao M, Hou X, Wang T, Qu H, Zhou Y, Bai X, Wang F, Cavallaro L, Kinder J, Wang X, Katz J (2019). Different is Good. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 1883-1897. Online publication date: 6-Nov-2019. https://dl.acm.org/doi/10.1145/3319535.3345654

Lidbury C, Donaldson A, McKinley K, Fisher K (2019). Sparse record and replay with controlled scheduling. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 576-593. Online publication date: 8-Jun-2019. https://dl.acm.org/doi/10.1145/3314221.3314635

Chen Y, Wang S, Lu S, Sankaralingam K (2019). Applying Transactional Memory for Concurrency-Bug Failure Recovery in Production Runs. IEEE Transactions on Parallel and Distributed Systems, 30:5, pages 990-1006. Online publication date: 1-May-2019. https://doi.org/10.1109/TPDS.2018.2877656

Chapp D, Sato K, Ahn D, Taufer M (2018). Record-and-Replay Techniques for HPC Systems: A Survey. Supercomputing Frontiers and Innovations: an International Journal, 5:1, pages 11-30. Online publication date: 15-Mar-2018. https://dl.acm.org/doi/10.14529/jsfi180102

Huan Y, Si-Wei M, Li-Fang H, Wen-Hao Y, Liang Z (2018). A Quick Deterministic Replay Method Based on Dependence Pair. 2018 Eighth International Conference on Information Science and Technology (ICIST), pages 110-113. Online publication date: Jun-2018. https://doi.org/10.1109/ICIST.2018.8426126

Shalabi Y, Yan M, Honarmand N, Lee R, Torrellas J (2018). Record-Replay Architecture as a General Security Framework. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 180-193. Online publication date: Feb-2018. https://doi.org/10.1109/HPCA.2018.00025

Dai W, Du Y, Jin H, Qiang W, Zou D, Xu S, Liu Z (2018). RollSec. International Journal of Parallel Programming, 46:4, pages 788-805. Online publication date: 1-Aug-2018. https://dl.acm.org/doi/10.1007/s10766-017-0523-0

Kasikci B, Cui W, Ge X, Niu B (2017). Lazy Diagnosis of In-Production Concurrency Bugs. Proceedings of the 26th Symposium on Operating Systems Principles, pages 582-598. Online publication date: 14-Oct-2017. https://dl.acm.org/doi/10.1145/3132747.3132767

Yan M, Shalabi Y, Torrellas J, Hsu W, Yang C, Lipasti M, Lee H (2016). Replayconfusion. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1-14. Online publication date: 15-Oct-2016. https://dl.acm.org/doi/10.5555/3195638.3195685