DOI: 10.1145/605397.605420

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Published: 01 October 2002
  • Abstract

    Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This non-uniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provides fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchy--while using less silicon area--by 13%, and comes within 13% of an ideal minimal hit latency solution.
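
    The two ideas in the abstract, hit latency that depends on where a line sits and migration of frequently used lines toward the processor, can be illustrated with a toy model. The sketch below is not the paper's design: the class name `ToyBankSet`, the cycle counts, the one-bank-per-hit promotion rule, and the insert-at-farthest-bank policy are all assumptions chosen only to make the behavior visible.

```python
from typing import List, Optional


class ToyBankSet:
    """Toy model of one cache set spread across banks, nearest bank first.

    All latencies and policies here are illustrative assumptions,
    not values or mechanisms taken from the paper.
    """

    def __init__(self, bank_latencies: List[int]) -> None:
        # bank_latencies[i] is the assumed hit time (cycles) of bank i;
        # index 0 is the bank closest to the processor.
        self.latencies = list(bank_latencies)
        # One line (identified by its tag) per bank; None means empty.
        self.lines: List[Optional[int]] = [None] * len(bank_latencies)

    def access(self, tag: int) -> Optional[int]:
        """Return the hit latency in cycles, or None on a miss."""
        if tag in self.lines:
            i = self.lines.index(tag)
            latency = self.latencies[i]
            if i > 0:
                # Assumed policy: on a hit, swap the line with its
                # neighbor in the next-closer bank (one step per hit).
                self.lines[i - 1], self.lines[i] = self.lines[i], self.lines[i - 1]
            return latency
        # Assumed policy: a missing line is placed in the farthest bank,
        # evicting whatever was stored there.
        self.lines[-1] = tag
        return None


if __name__ == "__main__":
    bank_set = ToyBankSet([4, 8, 12, 16])
    bank_set.access(7)  # first touch misses and fills the farthest bank
    for _ in range(5):
        print(bank_set.access(7))  # latency shrinks as the line migrates
```

    In this toy model a repeatedly accessed line sees its hit time fall from the farthest bank's latency to the nearest bank's, which is the effect the abstract describes qualitatively.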



      Published In

      ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
      October 2002
      318 pages
      ISBN: 1581135742
      DOI: 10.1145/605397
      • ACM SIGOPS Operating Systems Review, Volume 36, Issue 5
        December 2002
        296 pages
        ISSN: 0163-5980
        DOI: 10.1145/635508
      • ACM SIGPLAN Notices, Volume 37, Issue 10
        October 2002
        296 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/605432
      • ACM SIGARCH Computer Architecture News, Volume 30, Issue 5
        Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
        December 2002
        296 pages
        ISSN: 0163-5964
        DOI: 10.1145/635506

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Conference

      ASPLOS02

      Acceptance Rates

      ASPLOS X Paper Acceptance Rate 24 of 175 submissions, 14%;
      Overall Acceptance Rate 535 of 2,713 submissions, 20%
