
Program optimization for instruction caches

Published: 01 April 1989

Abstract

    This paper presents an optimization algorithm for reducing instruction cache misses. The algorithm uses profile information to reposition programs in memory so that a direct-mapped cache behaves much like an optimal cache with full associativity and full knowledge of the future. For best results, the cache should have a mechanism for excluding certain instructions designated by the compiler. This paper first presents a reduced form of the algorithm. This form is shown to produce an optimal miss rate for programs without conditionals and with a tree call graph, assuming basic blocks can be reordered at will. If conditionals are allowed, but there are no loops within conditionals, the algorithm does as well as an optimal cache for the worst-case execution of the program consistent with the profile information. Next, the algorithm is extended with heuristics for general programs. The effectiveness of these heuristics is demonstrated with empirical results for a set of 10 programs at various cache sizes. The improvement depends on cache size. For a 512-word cache, miss rates for a direct-mapped instruction cache are halved. For an 8K-word cache, miss rates fall by over 75%. Over a wide range of cache sizes, the algorithm is as effective as increasing the cache size by a factor of 3. For 512 words, the algorithm generates only 32% more misses than an optimal cache. Optimized programs on a direct-mapped cache have lower miss rates than unoptimized programs on set-associative caches of the same size.
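
The effect of repositioning code in a direct-mapped cache can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it only counts misses for two hypothetical code blocks, A and B, that execute alternately, first under a layout where they map to the same cache slots and then under a layout where they do not. The block names, sizes, one-word cache lines, 8-word cache, and call sequence are all assumptions chosen for illustration.

```python
def direct_mapped_misses(trace, cache_words):
    """Count misses for a word-address trace on a direct-mapped cache.

    Assumes one-word cache lines; the slot is the address modulo cache size.
    """
    slots = [None] * cache_words
    misses = 0
    for addr in trace:
        slot = addr % cache_words
        if slots[slot] != addr:
            slots[slot] = addr  # evict whatever was there
            misses += 1
    return misses


def make_trace(layout, sizes, calls):
    """Expand a sequence of block names into the word addresses they fetch."""
    trace = []
    for name in calls:
        base = layout[name]
        trace.extend(range(base, base + sizes[name]))
    return trace


CACHE_WORDS = 8                      # hypothetical cache size
sizes = {"A": 4, "B": 4}             # hypothetical block sizes in words
calls = ["A", "B"] * 10              # hypothetical profile: A and B alternate

conflicting = {"A": 0, "B": 8}       # both blocks map to slots 0..3
repositioned = {"A": 0, "B": 4}      # B moved so it maps to slots 4..7

bad = direct_mapped_misses(make_trace(conflicting, sizes, calls), CACHE_WORDS)
good = direct_mapped_misses(make_trace(repositioned, sizes, calls), CACHE_WORDS)
print(bad, good)  # 80 8
```

In the conflicting layout every fetch evicts a word of the other block, so all 80 fetches miss. After repositioning, only the 8 cold misses remain, which is what a fully associative cache of the same size would also incur on this trace.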

    Published In

    ACM SIGARCH Computer Architecture News  Volume 17, Issue 2
    Special issue: Proceedings of ASPLOS-III: the third international conference on architecture support for programming languages and operating systems
    April 1989
    291 pages
    ISSN:0163-5964
    DOI:10.1145/68182

    ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems
    April 1989
    303 pages
    ISBN:0897913000
    DOI:10.1145/70082
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article
