ECMon: exposing cache events for monitoring

Research article
Published: 20 June 2009
DOI: 10.1145/1555754.1555798

Abstract

The advent of multicores has introduced new challenges for programmers, who must deliver both increased performance and software reliability. There has been significant interest in techniques that use software speculation to better utilize the computational power of multicores. At the same time, several recent proposals for ensuring software reliability are not applicable in a multicore setting because they cannot handle interprocessor shared memory dependences (ISMDs). The demands of performing speculation and ensuring software reliability in a multicore setting, although seemingly different, share a common requirement: monitoring program execution and collecting interprocessor dependence information at low overhead. For example, an important component of speculation is the efficient detection of misspeculation, which in turn requires dependence information. Likewise, tasks that help ensure software reliability on multicores, including recording for replay, require ISMD information.
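To make the notion of an ISMD concrete, the following is a minimal sketch that derives cross-processor dependences from an interleaved access trace. It is an illustration only and not the mechanism proposed in the paper: the `Access` record, the function name `interprocessor_dependences`, and the sample trace are all assumptions introduced here.

```python
from collections import namedtuple

# Hypothetical trace record: processor id, "R" or "W", and address.
Access = namedtuple("Access", ["proc", "kind", "addr"])

def interprocessor_dependences(trace):
    """Return (type, src_index, dst_index) for each cross-processor dependence."""
    last_write = {}   # addr -> index of the most recent write
    reads_since = {}  # addr -> indices of reads since that write
    deps = []
    for i, acc in enumerate(trace):
        w = last_write.get(acc.addr)
        if acc.kind == "R":
            # Read-after-write on a value produced by another processor.
            if w is not None and trace[w].proc != acc.proc:
                deps.append(("RAW", w, i))
            reads_since.setdefault(acc.addr, []).append(i)
        else:
            # Write-after-write against another processor's last write.
            if w is not None and trace[w].proc != acc.proc:
                deps.append(("WAW", w, i))
            # Write-after-read against other processors' intervening reads.
            for r in reads_since.get(acc.addr, []):
                if trace[r].proc != acc.proc:
                    deps.append(("WAR", r, i))
            last_write[acc.addr] = i
            reads_since[acc.addr] = []
    return deps

trace = [
    Access(0, "W", 0x10),
    Access(1, "R", 0x10),
    Access(0, "W", 0x10),
    Access(1, "W", 0x10),
    Access(1, "R", 0x10),
]
deps = interprocessor_dependences(trace)
print(deps)
```

Accesses by the same processor never appear in the result, since only orderings between different processors need to be captured for tasks such as recording for replay.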
In this paper, we propose ECMon: support for exposing cache events to software. This enables the programmer to catch these events and react to them, in effect efficiently exposing the ISMDs to the programmer. In the context of speculation, we show how ECMon optimizes the detection of misspeculation; we use this simple support to speculate past active barriers and achieve a speedup of 12% for the set of parallel programs considered. As an application to software reliability, we show how ECMon can be used to record shared memory dependences on multicores, using no specialized hardware support, at only 2.8-fold execution time overhead.
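The idea of catching cache events in software can be sketched with a toy model. The code below is not ECMon's interface, which the abstract does not specify; `ToyCoherentMemory`, `SpeculativeRegion`, `on_invalidate`, and the 64-byte line size are hypothetical names and parameters chosen for illustration. The model fires a software handler when a remote write invalidates a line a core holds, and a speculative region uses that handler to flag possible misspeculation.

```python
class ToyCoherentMemory:
    """Toy model (an assumption, not real hardware): tracks which cores hold
    each cache line and calls a software handler when a copy is invalidated."""

    def __init__(self, line_size=64):
        self.line_size = line_size
        self.sharers = {}   # line number -> set of core ids holding a copy
        self.handlers = {}  # core id -> callback(line, writer_core)

    def on_invalidate(self, core, handler):
        self.handlers[core] = handler

    def read(self, core, addr):
        self.sharers.setdefault(addr // self.line_size, set()).add(core)

    def write(self, core, addr):
        line = addr // self.line_size
        # Every other core holding the line loses its copy and is notified.
        for other in sorted(self.sharers.get(line, set()) - {core}):
            handler = self.handlers.get(other)
            if handler:
                handler(line, core)
        self.sharers[line] = {core}


class SpeculativeRegion:
    """Records lines read speculatively; an invalidation of any of them
    is treated as a (conservative) sign of misspeculation."""

    def __init__(self, mem, core):
        self.mem, self.core = mem, core
        self.read_lines = set()
        self.misspeculated = False
        self.events = []  # (writer_core, line) pairs seen by the handler
        mem.on_invalidate(core, self._invalidated)

    def load(self, addr):
        self.read_lines.add(addr // self.mem.line_size)
        self.mem.read(self.core, addr)

    def _invalidated(self, line, writer):
        self.events.append((writer, line))
        if line in self.read_lines:
            self.misspeculated = True


mem = ToyCoherentMemory()
spec = SpeculativeRegion(mem, core=1)
spec.load(0x100)                      # core 1 speculatively reads line 4
mem.write(0, 0x200)                   # core 0 writes an unrelated line
clean_after_unrelated_write = not spec.misspeculated
mem.write(0, 0x104)                   # core 0 writes the line core 1 read
print(clean_after_unrelated_write, spec.misspeculated, spec.events)
```

Because this toy tracks whole lines, a write to a different word of the same line also triggers the handler, so the check is conservative rather than exact.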

Published In

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
June 2009
510 pages
ISBN: 9781605585260
DOI: 10.1145/1555754

ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
June 2009
495 pages
ISSN: 0163-5964
DOI: 10.1145/1555815
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache events
  2. recording for replay
  3. speculation past barriers

Qualifiers

  • Research-article

Conference

ISCA '09

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0
Reflects downloads up to 30 Jan 2025

Cited By

  • (2024) Blenda: Dynamically-Reconfigurable Stacked DRAM. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1323-1337. DOI: 10.1109/MICRO61859.2024.00098. Online publication date: 2-Nov-2024
  • (2020) HyperPlane: A Scalable Low-Latency Notification Accelerator for Software Data Planes. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 852-867. DOI: 10.1109/MICRO50266.2020.00074. Online publication date: Oct-2020
  • (2017) Towards "Full Containerization" in Containerized Network Function Virtualization. ACM SIGARCH Computer Architecture News, 45:1, pages 467-481. DOI: 10.1145/3093337.3037713. Online publication date: 4-Apr-2017
  • (2017) Towards "Full Containerization" in Containerized Network Function Virtualization. ACM SIGPLAN Notices, 52:4, pages 467-481. DOI: 10.1145/3093336.3037713. Online publication date: 4-Apr-2017
  • (2017) Towards "Full Containerization" in Containerized Network Function Virtualization. ACM SIGOPS Operating Systems Review, 51:2, pages 467-481. DOI: 10.1145/3093315.3037713. Online publication date: 4-Apr-2017
  • (2017) Towards "Full Containerization" in Containerized Network Function Virtualization. Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, pages 467-481. DOI: 10.1145/3037697.3037713. Online publication date: 4-Apr-2017
  • (2016) Speculatively Exploiting Cross-Invocation Parallelism. Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pages 207-221. DOI: 10.1145/2967938.2967959. Online publication date: 11-Sep-2016
  • (2016) Improving Resource Efficiency at Scale with Heracles. ACM Transactions on Computer Systems, 34:2, pages 1-33. DOI: 10.1145/2882783. Online publication date: 5-May-2016
  • (2015) Heracles. ACM SIGARCH Computer Architecture News, 43:3S, pages 450-462. DOI: 10.1145/2872887.2749475. Online publication date: 13-Jun-2015
  • (2015) Heracles. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pages 450-462. DOI: 10.1145/2749469.2749475. Online publication date: 13-Jun-2015
