
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures

Published: 03 December 2003

Abstract

Wire delays continue to grow as the dominant component of latency for large caches. A recent work proposed an adaptive, non-uniform cache architecture (NUCA) to manage large, on-chip caches. By exploiting the variation in access time across widely-spaced subarrays, NUCA allows fast access to close subarrays while retaining slow access to far subarrays. While the idea of NUCA is attractive, NUCA does not employ design choices commonly used in large caches, such as sequential tag-data access for low power. Moreover, NUCA couples data placement with tag placement, foregoing the flexibility of data placement and replacement that is possible in a non-uniform access cache. Consequently, NUCA can place only a few blocks within a given cache set in the fastest subarrays, and must employ a high-bandwidth switched network to swap blocks within the cache for high performance. In this paper, we propose the "Non-uniform access with Replacement And Placement usIng Distance associativity" cache, or NuRAPID, which leverages sequential tag-data access to decouple data placement from tag placement. Distance associativity, the placement of data at a certain distance (and latency), is separated from set associativity, the placement of tags within a set. This decoupling enables NuRAPID to place flexibly the vast majority of frequently-accessed data in the fastest subarrays, with fewer swaps than NUCA. Distance associativity fundamentally changes the trade-offs made by NUCA's best-performing design, resulting in higher performance and substantially lower cache energy. A one-ported, non-banked NuRAPID cache improves performance by 3% on average and up to 15% compared to a multi-banked NUCA with an infinite-bandwidth switched network, while reducing L2 cache energy by 77%.
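The decoupling described in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the class name, the pointer fields, the round-robin demotion victim, the promote-on-hit policy, and the latency values are all assumptions made for illustration. It only shows the core idea that a tag sits in a set chosen by address (set associativity), while its data may sit in any frame of any distance group (distance associativity).

```python
from dataclasses import dataclass


@dataclass(eq=False)
class TagEntry:
    tag: int
    group: int  # hypothetical forward pointer: distance group holding the data
    frame: int  # frame index inside that group


class ToyDistanceAssocCache:
    """Toy model: tags are set-associative, data frames are placed by distance."""

    def __init__(self, num_sets, ways, group_latencies, frames_per_group):
        # Enough frames for every tag, so a free frame always exists on insert.
        assert num_sets * ways <= len(group_latencies) * frames_per_group
        self.num_sets, self.ways = num_sets, ways
        self.latency = list(group_latencies)      # index 0 = fastest group
        self.sets = [[] for _ in range(num_sets)]  # per-set tags, LRU first
        # owner[g][f] is the TagEntry using frame f of group g, or None.
        self.owner = [[None] * frames_per_group for _ in self.latency]
        self.next_victim = [0] * len(self.latency)  # round-robin (assumed)

    def _free_frame(self, g):
        """Return a free frame in group g, demoting one block to g+1 if full."""
        frames = self.owner[g]
        if None in frames:
            return frames.index(None)
        assert g + 1 < len(self.owner), "no free frame anywhere"
        victim = self.next_victim[g]
        self.next_victim[g] = (victim + 1) % len(frames)
        dst = self._free_frame(g + 1)
        moved = frames[victim]
        moved.group, moved.frame = g + 1, dst      # tag stays put, data moves
        self.owner[g + 1][dst] = moved
        frames[victim] = None
        return victim

    def _place_in_fastest(self, entry):
        frame = self._free_frame(0)
        entry.group, entry.frame = 0, frame
        self.owner[0][frame] = entry

    def access(self, addr):
        """Return (hit, latency); latency is None on a miss."""
        set_idx, tag = addr % self.num_sets, addr // self.num_sets
        tags = self.sets[set_idx]
        entry = next((e for e in tags if e.tag == tag), None)
        if entry is not None:
            latency = self.latency[entry.group]
            tags.remove(entry)
            tags.append(entry)                     # mark most recently used
            if entry.group != 0:                   # promote data on a slow hit
                self.owner[entry.group][entry.frame] = None
                self._place_in_fastest(entry)
            return True, latency
        if len(tags) == self.ways:                 # set full: evict LRU tag
            lru = tags.pop(0)
            self.owner[lru.group][lru.frame] = None
        entry = TagEntry(tag, 0, 0)
        self._place_in_fastest(entry)
        tags.append(entry)
        return False, None


# Addresses 0 and 2 map to the same set, yet both land in the fastest group.
cache = ToyDistanceAssocCache(num_sets=2, ways=2,
                              group_latencies=[2, 8], frames_per_group=2)
cache.access(0)
cache.access(2)
print(cache.access(0), cache.access(2))  # (True, 2) (True, 2)
```

In this toy, demoting a block only rewrites the pointer stored with its tag, so the tag array is never reshuffled when data changes distance. A scheme that ties each way of a set to a fixed distance would allow only some of a set's blocks to be fast at once, which is the limitation the abstract attributes to NUCA.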


Published In

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
December 2003
412 pages
ISBN: 076952043X

Publisher

IEEE Computer Society, United States


Acceptance Rates

MICRO 36 Paper Acceptance Rate: 35 of 134 submissions, 26%
Overall Acceptance Rate: 484 of 2,242 submissions, 22%


Cited By

• (2019) HALO. Proceedings of the 46th International Symposium on Computer Architecture, pp. 601-614. DOI: 10.1145/3307650.3322272. Online publication date: 22-Jun-2019
• (2019) Make the Most out of Last Level Cache in Intel Processors. Proceedings of the Fourteenth EuroSys Conference 2019, pp. 1-17. DOI: 10.1145/3302424.3303977. Online publication date: 25-Mar-2019
• (2018) Enhancing computation-to-core assignment with physical location information. ACM SIGPLAN Notices 53:4, pp. 312-327. DOI: 10.1145/3296979.3192386. Online publication date: 11-Jun-2018
• (2018) Enhancing computation-to-core assignment with physical location information. Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 312-327. DOI: 10.1145/3192366.3192386. Online publication date: 11-Jun-2018
• (2018) Benzene. ACM Transactions on Architecture and Code Optimization 15:1, pp. 1-23. DOI: 10.1145/3177963. Online publication date: 22-Mar-2018
• (2016) Exploiting Private Local Memories to Reduce the Opportunity Cost of Accelerator Integration. Proceedings of the 2016 International Conference on Supercomputing, pp. 1-12. DOI: 10.1145/2925426.2926258. Online publication date: 1-Jun-2016
• (2015) SLIP. ACM SIGARCH Computer Architecture News 43:3S, pp. 349-361. DOI: 10.1145/2872887.2750398. Online publication date: 13-Jun-2015
• (2015) SLIP. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 349-361. DOI: 10.1145/2749469.2750398. Online publication date: 13-Jun-2015
• (2014) The Direct-to-Data (D2D) cache. Proceeding of the 41st annual international symposium on Computer architecture, pp. 133-144. DOI: 10.5555/2665671.2665694. Online publication date: 14-Jun-2014
• (2014) The Direct-to-Data (D2D) cache. ACM SIGARCH Computer Architecture News 42:3, pp. 133-144. DOI: 10.1145/2678373.2665694. Online publication date: 14-Jun-2014
