research-article

The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup

Published: 14 June 2014

Abstract

Modern processors optimize cache energy and performance by employing multiple levels of caching that together provide high bandwidth, low latency, and high capacity. A request typically traverses the cache hierarchy level by level until the data is found, wasting time and energy at each level. In this paper, we present the Direct-to-Data (D2D) cache, which locates data across the entire cache hierarchy with a single lookup. To navigate the cache hierarchy, D2D extends the TLB with per-cache-line location information that indicates in which cache and way each cache line is located. This allows the D2D cache to: 1) skip levels in the hierarchy (by accessing the right cache level directly), 2) eliminate extra data array reads (by reading the right way directly), 3) avoid tag comparisons (by eliminating the tag arrays), and 4) go directly to DRAM on cache misses (by checking the TLB). This reduces the L2 latency by 40% and saves 5-17% of the total cache hierarchy energy.
D2D's lower L2 latency directly improves the performance of L2-sensitive applications by 5-14%. More significantly, the L2 latency reduction can be used to optimize other parts of the micro-architecture. For example, we can reduce the ROB size for L2-bound applications by 25%, or we can reduce the L1 cache size, delivering an overall 21% energy savings across all benchmarks without hurting performance.
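
The single-lookup idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's implementation: the names `ToyD2D`, `TlbEntry`, and `Location`, the page and line sizes, the set counts, and the modulo set-indexing function are all hypothetical, and TLB misses, evictions, and coherence are left out. It shows one TLB access returning a per-line (level, way) record, after which the data array is read directly without any tag comparison, or the request goes straight to memory.

```python
from dataclasses import dataclass, field
from enum import Enum

PAGE_SIZE = 4096   # assumed page size in bytes
LINE_SIZE = 64     # assumed cache-line size in bytes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


class Level(Enum):
    L1 = "L1"
    L2 = "L2"
    MEM = "MEM"    # line is not cached: go straight to DRAM


@dataclass(frozen=True)
class Location:
    level: Level
    way: int = 0   # ignored when level is MEM


@dataclass
class TlbEntry:
    physical_page: int
    # One location record per cache line in the page; all start uncached.
    lines: list = field(
        default_factory=lambda: [Location(Level.MEM)] * LINES_PER_PAGE
    )


class ToyD2D:
    """Hypothetical model of a TLB extended with per-line location info."""

    def __init__(self, l1_sets=64, l2_sets=512):
        self.tlb = {}                                   # virtual page -> TlbEntry
        self.sets = {Level.L1: l1_sets, Level.L2: l2_sets}
        self.data = {Level.L1: {}, Level.L2: {}}        # tag-less: (set, way) -> payload
        self.dram = {}                                  # physical line address -> payload

    def map_page(self, vpn, ppn):
        self.tlb[vpn] = TlbEntry(ppn)

    def _split(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return vpn, offset // LINE_SIZE

    def _line_addr(self, entry, line):
        return entry.physical_page * LINES_PER_PAGE + line

    def _set_index(self, level, entry, line):
        # Assumed indexing: physical line address modulo the number of sets.
        return self._line_addr(entry, line) % self.sets[level]

    def install(self, vaddr, level, way, payload):
        """Place a line in a cache level and record where it went."""
        vpn, line = self._split(vaddr)
        entry = self.tlb[vpn]
        self.data[level][(self._set_index(level, entry, line), way)] = payload
        entry.lines[line] = Location(level, way)

    def load(self, vaddr):
        """Return (level name, payload) using a single TLB lookup."""
        vpn, line = self._split(vaddr)
        entry = self.tlb[vpn]            # the only lookup
        loc = entry.lines[line]
        if loc.level is Level.MEM:       # known miss: skip every cache level
            return "MEM", self.dram.get(self._line_addr(entry, line))
        set_idx = self._set_index(loc.level, entry, line)
        return loc.level.value, self.data[loc.level][(set_idx, loc.way)]


if __name__ == "__main__":
    cache = ToyD2D()
    cache.map_page(vpn=1, ppn=7)
    cache.install(0x1040, Level.L2, way=3, payload="line-A")
    print(cache.load(0x1040))   # served directly from L2, way 3
    print(cache.load(0x1080))   # never installed, so routed to memory
```

In this sketch the L1 is never probed for a line that the location record places in L2, which mirrors the level-skipping benefit the abstract lists; a real design would also have to keep the location records up to date when lines move or are evicted.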

Published In

ACM SIGARCH Computer Architecture News, Volume 42, Issue 3
ISCA '14
June 2014
552 pages
ISSN: 0163-5964
DOI: 10.1145/2678373

ISCA '14: Proceeding of the 41st annual international symposium on Computer architecture
June 2014
566 pages
ISBN: 9781479943944

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2014
Published in SIGARCH Volume 42, Issue 3

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 34
  • Downloads (last 6 weeks): 3
Reflects downloads up to 08 Feb 2025

Citations

Cited By

  • (2024) A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 528-542. DOI: 10.1109/HPCA57654.2024.00046. Online publication date: 2-Mar-2024
  • (2022) Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-18. DOI: 10.1109/MICRO56248.2022.00015. Online publication date: 1-Oct-2022
  • (2021) Stream Floating: Enabling Proactive and Decentralized Cache Optimizations. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 640-653. DOI: 10.1109/HPCA51647.2021.00060. Online publication date: Feb-2021
  • (2021) DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access, vol. 9, pp. 134457-134502. DOI: 10.1109/ACCESS.2021.3110993. Online publication date: 2021
  • (2020) DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 578-589. DOI: 10.1109/IPDPS47924.2020.00066. Online publication date: May-2020
  • (2018) Rethinking the memory hierarchy for modern languages. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 203-216. DOI: 10.1109/MICRO.2018.00025. Online publication date: 20-Oct-2018
  • (2017) A Split Cache Hierarchy for Enabling Data-Oriented Optimizations. 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 133-144. DOI: 10.1109/HPCA.2017.25. Online publication date: Feb-2017
  • (2016) Efficient footprint caching for Tagless DRAM Caches. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 237-248. DOI: 10.1109/HPCA.2016.7446068. Online publication date: Mar-2016
  • (2016) Revisiting virtual L1 caches: A practical design using dynamic synonym remapping. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 212-224. DOI: 10.1109/HPCA.2016.7446066. Online publication date: Mar-2016
  • (2014) Efficient Memory Virtualization. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 178-189. DOI: 10.1109/MICRO.2014.37. Online publication date: 13-Dec-2014
