DOI: 10.1145/2039370.2039416
research-article

HC-Sim: a fast and exact l1 cache simulator with scratchpad memory co-simulation support

Published: 09 October 2011

Abstract

The configuration of L1 caches has a significant impact on the performance and energy consumption of an embedded system. Normally, an embedded system is designed for a specific application or a domain of applications. Simulating the application(s) is the most popular way to find the optimal L1 cache configuration. However, the simulation-based approach suffers from long simulation times because it must exhaustively simulate all configurations, which are characterized by three parameters: the number of cache sets, the associativity, and the cache line size. In previous work, the most time-consuming part was determining the hit or miss status of a cache access under each configuration by performing a linear search on a long linked list based on the inclusion property. In this work, we propose a novel simulator, HC-Sim, which adopts elaborate data structures, namely a centralized hash table and a novel miss counter structure, to effectively reduce the search time. On average, HC-Sim achieves a 2.56X speedup over the fastest existing approach (SuSeSim). In addition, we implement HC-Sim using the dynamic binary instrumentation tool Pin. This provides scalability for simulating larger applications by eliminating the overhead of generating and storing a huge trace file. Furthermore, HC-Sim provides the capability to simulate an L1 cache and a scratchpad memory (SPM) simultaneously, which helps designers explore the design space considering both L1 cache configurations and SPM sizes.
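To make the inclusion property concrete, the following is a minimal sketch of classic single-pass LRU stack simulation, not HC-Sim's own data structures (the abstract does not specify the hash table or miss counter layout). For a fixed number of sets and line size, the contents of an A-way LRU cache are always a subset of an (A+1)-way cache, so one pass over the address stream can yield miss counts for every associativity. The function name `miss_counts_by_associativity` and its parameters are hypothetical, chosen for illustration. The `list.index` call is the kind of linear search the abstract identifies as the bottleneck.

```python
from collections import defaultdict


def miss_counts_by_associativity(addresses, num_sets, line_size, max_assoc):
    """Return {ways: miss count} for LRU caches with 1..max_assoc ways.

    Illustrative sketch only: num_sets and line_size are fixed, and one
    LRU stack per set is searched linearly on every access.
    """
    stacks = defaultdict(list)       # set index -> blocks, most recent first
    hits_at_depth = [0] * max_assoc  # hits_at_depth[d]: hits found at depth d
    total = 0

    for addr in addresses:
        total += 1
        block = addr // line_size
        stack = stacks[block % num_sets]
        if block in stack:
            depth = stack.index(block)   # linear search for the stack depth
            hits_at_depth[depth] += 1    # a hit for every cache with > depth ways
            stack.pop(depth)
        stack.insert(0, block)           # block becomes most recently used
        del stack[max_assoc:]            # deeper entries miss in all configurations

    misses = {}
    hits = 0
    for ways in range(1, max_assoc + 1):
        hits += hits_at_depth[ways - 1]  # inclusion: hits accumulate with ways
        misses[ways] = total - hits
    return misses


if __name__ == "__main__":
    trace = [0, 16, 0, 32, 16, 0]        # hypothetical byte addresses
    print(miss_counts_by_associativity(trace, num_sets=1, line_size=16, max_assoc=4))
```

Repeating this for each combination of set count and line size covers the full design space; the cost of the per-access search is what faster simulators aim to reduce.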

      Published In

      CODES+ISSS '11: Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
      October 2011
      402 pages
      ISBN:9781450307154
      DOI:10.1145/2039370
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cache simulation
      2. dynamic binary instrumentation
      3. l1 cache
      4. lru
      5. miss rate
      6. scratchpad memory
      7. simulation

      Qualifiers

      • Research-article

      Conference

      ESWeek '11
      ESWeek '11: Seventh Embedded Systems Week
      October 9 - 14, 2011
      Taipei, Taiwan

      Acceptance Rates

      Overall Acceptance Rate 280 of 864 submissions, 32%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 7
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 12 Nov 2024.

      Cited By

      • (2015) Exploring Multilevel Cache Hierarchies in Application Specific MPSoCs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34:12 (1991-2003). DOI: 10.1109/TCAD.2015.2445736. Online publication date: Dec-2015
      • (2015) Speeding up single pass simulation of PLRUt caches. The 20th Asia and South Pacific Design Automation Conference (695-700). DOI: 10.1109/ASPDAC.2015.7059091. Online publication date: Jan-2015
      • (2015) Superoptimizing Memory Subsystems for Multiple Objectives. Euro-Par 2015: Parallel Processing Workshops (352-363). DOI: 10.1007/978-3-319-27308-2_29. Online publication date: 18-Dec-2015
      • (2014) Hardware-based fast exploration of cache hierarchies in application specific MPSoCs. Proceedings of the conference on Design, Automation & Test in Europe (1-6). DOI: 10.5555/2616606.2617020. Online publication date: 24-Mar-2014
      • (2014) Student satisfaction of e-Learning tools for Computer Architecture and Organization course. 2014 IEEE Global Engineering Education Conference (EDUCON) (630-637). DOI: 10.1109/EDUCON.2014.6826159. Online publication date: Apr-2014
      • (2014) A scorchingly fast FPGA-based Precise L1 LRU cache simulator. 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC) (412-417). DOI: 10.1109/ASPDAC.2014.6742926. Online publication date: Jan-2014
      • (2013) EDUCacheIC: Interactive and collaborative successor of the EDUCache simulator. 2013 International Conference on Interactive Collaborative Learning (ICL) (360-366). DOI: 10.1109/ICL.2013.6644599. Online publication date: Sep-2013
      • (2013) EDUCache simulator for teaching computer architecture and organization. 2013 IEEE Global Engineering Education Conference (EDUCON) (1015-1022). DOI: 10.1109/EduCon.2013.6530232. Online publication date: Mar-2013
