research-article

ECMon: exposing cache events for monitoring

Authors:

Vijay Nagarajan,

Rajiv Gupta

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture

Pages 349 - 360

https://doi.org/10.1145/1555754.1555798

Published: 20 June 2009

Abstract

The advent of multicores has introduced new challenges for programmers, who must deliver both increased performance and software reliability. There has been significant interest in techniques that use software speculation to better utilize the computational power of multicores. At the same time, several recent proposals for ensuring software reliability are not applicable in a multicore setting because they cannot handle interprocessor shared memory dependences (ISMDs). The demands of performing speculation and ensuring software reliability in a multicore setting, although seemingly different, share a common requirement: the need to monitor program execution and collect interprocessor dependence information at low overhead. For example, an important component of speculation is the efficient detection of misspeculation, which in turn requires dependence information. Likewise, tasks that help ensure software reliability on multicores, including recording for replay, require ISMD information.

In this paper, we propose ECMon: support for exposing cache events to the software. This enables the programmer to catch these events and react to them, in effect efficiently exposing the ISMDs to the programmer. In the context of speculation, we show how ECMon optimizes the detection of misspeculation; we use this simple support to speculate past active barriers and achieve a speedup of 12% for the set of parallel programs considered. As an application of ensuring software reliability, we show how ECMon can be used to record shared memory dependences on multicores using no specialized hardware support at only 2.8-fold execution time overhead.
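
To make the idea of catching cache events in software concrete, the following is a minimal sketch in Python, not the ECMon interface itself. It assumes a toy invalidation-based coherence model with one private cache per core; the class `ToyCoherence`, the event names `remote_read` and `invalidate`, and the `record` handler are all hypothetical names introduced only for illustration. A software handler registered with the model is called whenever one core's access touches a line another core holds or last wrote, and the handler simply logs the resulting dependence.

```python
from collections import defaultdict


class ToyCoherence:
    """Toy invalidation-based coherence model (illustrative assumption only)."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.sharers = defaultdict(set)  # line -> cores holding a valid copy
        self.last_writer = {}            # line -> core that last wrote the line
        self.handlers = []               # software callbacks for cache events

    def on_event(self, handler):
        """Register a software handler; called as handler(kind, line, src, dst)."""
        self.handlers.append(handler)

    def _raise(self, kind, line, src, dst):
        for handler in self.handlers:
            handler(kind, line, src, dst)

    def read(self, core, line):
        if core not in self.sharers[line]:
            writer = self.last_writer.get(line)
            if writer is not None and writer != core:
                # Miss served with data produced by another core:
                # a read-after-write dependence writer -> core.
                self._raise("remote_read", line, writer, core)
            self.sharers[line].add(core)

    def write(self, core, line):
        for other in sorted(self.sharers[line] - {core}):
            # The write invalidates another core's copy:
            # a write-after-read or write-after-write dependence other -> core.
            self._raise("invalidate", line, other, core)
        self.sharers[line] = {core}
        self.last_writer[line] = core


def demo():
    """Record the dependences of a short two-core access sequence."""
    log = []

    def record(kind, line, src, dst):
        log.append((kind, line, src, dst))

    machine = ToyCoherence(num_cores=2)
    machine.on_event(record)
    machine.write(0, 0x40)  # core 0 produces the line, no event
    machine.read(1, 0x40)   # core 1 misses: dependence 0 -> 1
    machine.read(1, 0x40)   # hit in core 1's cache: no event
    machine.write(0, 0x40)  # invalidates core 1's copy: dependence 1 -> 0
    return log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

In this sketch, accesses that hit in the local cache raise no event, so the handler runs only when a dependence crosses cores; this mirrors, under the stated assumptions, why reacting to cache events can be cheaper than instrumenting every memory access.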

Published In

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture

June 2009

510 pages

ISBN:9781605585260

DOI:10.1145/1555754

General Chair: Steve Keckler, University of Texas at Austin
Program Chair: Luiz André Barroso, Google Inc.

ACM SIGARCH Computer Architecture News Volume 37, Issue 3
June 2009
495 pages
ISSN:0163-5964
DOI:10.1145/1555815

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA '09

ISCA '09: The 36th Annual International Symposium on Computer Architecture

June 20 - 24, 2009

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 37
Total Downloads: 610
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 30 Jan 2025
