research-article

An analytical approach for fast and accurate design space exploration of instruction caches

Published: 24 December 2013

Abstract

Application-specific system-on-chip platforms create the opportunity to customize the cache configuration for optimal performance with minimal chip area. Simulation, in particular trace-driven simulation, is widely used to estimate cache hit rates. However, simulation is too slow to be deployed in design space exploration, especially when there are hundreds of design points and the traces are huge. In this article, we propose a novel analytical approach for design space exploration of instruction caches. Given the program control flow graph (CFG) annotated only with basic block and control flow edge execution counts, we first model the cache states at each point of the CFG in a probabilistic manner. Then, we exploit the structural similarities among related cache configurations to estimate the cache hit rates for multiple cache configurations in one pass. Experimental results indicate that our analysis is 28--2,500 times faster than the fastest known cache simulator while maintaining high accuracy (0.2% average error) in estimating cache hit rates for a large set of popular benchmarks. Moreover, compared to a state-of-the-art cache design space exploration technique, our approach achieves a 304--8,086 times speedup and saves up to 62% (7% on average) of energy for the evaluated benchmarks.
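The first step described in the abstract, modeling cache states at each CFG point probabilistically from basic block and edge execution counts alone, can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It assumes a direct-mapped instruction cache, memory blocks identified by integer IDs that map to set `id % num_sets`, and it treats the cache state entering a basic block as the edge-frequency-weighted mix of its predecessors' exit states. That mix is an approximation, because it ignores any correlation between a branch outcome and the path that led to it. The names `mix`, `transfer`, and `estimate_hit_rate` are hypothetical, and the sketch does not cover the second step (reusing results across multiple cache configurations in one pass).

```python
from collections import defaultdict

# Simplified sketch, NOT the paper's method. Assumption: direct-mapped cache,
# memory block m maps to set m % num_sets.
# A cache state maps a set index to {memory block id: probability that the block
# is resident in that set}. Probabilities in a set may sum to < 1; the remainder
# means "empty or unknown".
CacheState = dict[int, dict[int, float]]


def mix(weighted_states: list[tuple[CacheState, float]]) -> CacheState:
    """Join point: weighted sum of incoming cache states."""
    mixed: defaultdict = defaultdict(lambda: defaultdict(float))
    for state, weight in weighted_states:
        for s, dist in state.items():
            for m, p in dist.items():
                mixed[s][m] += weight * p
    return {s: dict(dist) for s, dist in mixed.items()}


def transfer(state: CacheState, accesses: list[int], num_sets: int) -> tuple[CacheState, float]:
    """Run one basic block's memory-block accesses through the cache.

    Returns the exit state and the expected number of hits per execution.
    """
    out = {s: dict(dist) for s, dist in state.items()}
    expected_hits = 0.0
    for m in accesses:
        s = m % num_sets
        expected_hits += out.get(s, {}).get(m, 0.0)  # P(m already resident)
        out[s] = {m: 1.0}  # after the access, m is certainly resident in set s
    return out, expected_hits


def estimate_hit_rate(blocks, exec_count, edge_count, num_sets, iterations=100):
    """Estimate the hit rate from basic block and edge execution counts only."""
    preds = defaultdict(list)
    for (src, dst), count in edge_count.items():
        preds[dst].append((src, count))

    in_state = {b: {} for b in blocks}
    out_state = {b: {} for b in blocks}
    # Sweep the equations a fixed number of times so loops approach a fixpoint.
    for _ in range(iterations):
        for b in blocks:
            if exec_count[b] == 0:
                continue
            # Each incoming edge is weighted by its share of b's executions.
            # Executions not explained by any edge (program entry) start from an
            # empty cache, which adds nothing to the weighted sum.
            weighted = [(out_state[p], c / exec_count[b]) for p, c in preds[b]]
            in_state[b] = mix(weighted)
            out_state[b], _ = transfer(in_state[b], blocks[b], num_sets)

    hits = sum(exec_count[b] * transfer(in_state[b], blocks[b], num_sets)[1] for b in blocks)
    accesses = sum(exec_count[b] * len(blocks[b]) for b in blocks)
    return hits / accesses


if __name__ == "__main__":
    # Hypothetical loop: A runs once, B iterates 10 times, C runs once.
    blocks = {"A": [0], "B": [1, 2], "C": [3]}
    exec_count = {"A": 1, "B": 10, "C": 1}
    edge_count = {("A", "B"): 1, ("B", "B"): 9, ("B", "C"): 1}
    print(estimate_hit_rate(blocks, exec_count, edge_count, num_sets=2))
```

On this hypothetical loop the sketch estimates 18 expected hits out of 22 accesses (a hit rate of about 0.818), which matches a trace-driven simulation of the same execution. The per-set distributions grow with the number of distinct memory blocks mapping to a set, so a practical analysis would need to bound or summarize them.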


Cited By

  • (2020) A Machine Learning Methodology for Cache Memory Design Based on Dynamic Instructions. ACM Transactions on Embedded Computing Systems 19:2 (1-20). DOI: 10.1145/3376920. Online publication date: 11-Mar-2020.
  • (2019) Memory-Aware Design Space Exploration for Reliability Evaluation in Computing Systems. Journal of Electronic Testing: Theory and Applications 35:2 (145-162). DOI: 10.1007/s10836-019-05785-0. Online publication date: 25-May-2019.
  • (2018) An Analytical Cache Performance Evaluation Framework for Embedded Out-of-Order Processors Using Software Characteristics. ACM Transactions on Embedded Computing Systems 17:4 (1-25). DOI: 10.1145/3233182. Online publication date: 9-Aug-2018.


        Information

        Published In

        ACM Transactions on Embedded Computing Systems, Volume 13, Issue 3
        December 2013
        385 pages
        ISSN:1539-9087
        EISSN:1558-3465
        DOI:10.1145/2539036
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Publication History

        Published: 24 December 2013
        Accepted: 01 March 2012
        Revised: 01 January 2012
        Received: 01 July 2011
        Published in TECS Volume 13, Issue 3


        Author Tags

        1. Cache
        2. analytical approach
        3. design space exploration

        Qualifiers

        • Research-article
        • Research
        • Refereed



        Article Metrics

        • Downloads (last 12 months): 3
        • Downloads (last 6 weeks): 0
        Reflects downloads up to 25 Dec 2024


        Citations

        • (2018) Analytical Two-Level Near Threshold Cache Exploration for Low Power Biomedical Applications. Advanced Computer Architecture (95-108). DOI: 10.1007/978-981-13-2423-9_8. Online publication date: 13-Sep-2018.
        • (2014) Rapid design space exploration of two-level unified caches. 2014 IEEE International Symposium on Circuits and Systems (ISCAS) (1937-1940). DOI: 10.1109/ISCAS.2014.6865540. Online publication date: Jun-2014.
