The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup

Authors: Andreas Sembrant, Erik Hagersten, David Black-Schaffer

ACM SIGARCH Computer Architecture News, Volume 42, Issue 3

Pages 133 - 144

https://doi.org/10.1145/2678373.2665694

Published: 14 June 2014

Abstract

Modern processors optimize for cache energy and performance by employing multiple levels of caching that address bandwidth, low-latency, and high-capacity needs. A request typically traverses the cache hierarchy level by level until the data is found, wasting time and energy at each level. In this paper, we present the Direct-to-Data (D2D) cache, which locates data across the entire cache hierarchy with a single lookup. To navigate the cache hierarchy, D2D extends the TLB with per-cache-line location information that indicates in which cache and way each cache line is located. This allows the D2D cache to: 1) skip levels in the hierarchy (by accessing the right cache level directly), 2) eliminate extra data array reads (by reading the right way directly), 3) avoid tag comparisons (by eliminating the tag arrays), and 4) go directly to DRAM on cache misses (by checking the TLB). This reduces the L2 latency by 40% and saves 5-17% of the total cache hierarchy energy.

D2D's lower L2 latency directly improves the performance of L2-sensitive applications by 5-14%. More significantly, we can take advantage of the L2 latency reduction to optimize other parts of the micro-architecture. For example, we can reduce the ROB size for the L2-bound applications by 25%, or we can reduce the L1 cache size, delivering an overall 21% energy savings across all benchmarks without hurting performance.
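The single-lookup idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class names (`ToyD2D`, `ExtendedTlbEntry`, `Location`), the 4 KiB page size, and the 64-byte line size are all hypothetical choices made for illustration. Each TLB entry carries one location record per cache line in its page, so one translation step also reveals the cache level and way, or that the line is only in memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # bytes per page (assumed for illustration)
LINE_SIZE = 64    # bytes per cache line (assumed for illustration)
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


class Level(Enum):
    L1 = "L1"
    L2 = "L2"
    MEM = "MEM"  # line is not cached; go straight to DRAM


@dataclass
class Location:
    """Where one cache line currently lives (hypothetical encoding)."""
    level: Level = Level.MEM
    way: Optional[int] = None


@dataclass
class ExtendedTlbEntry:
    """A translation plus per-cache-line location information."""
    physical_page: int
    lines: List[Location] = field(
        default_factory=lambda: [Location() for _ in range(LINES_PER_PAGE)]
    )


class ToyD2D:
    """Toy model: one TLB lookup yields both translation and data location."""

    def __init__(self) -> None:
        self.tlb = {}  # virtual page number -> ExtendedTlbEntry

    @staticmethod
    def _split(vaddr: int) -> Tuple[int, int]:
        # Virtual page number and index of the cache line within the page.
        return vaddr // PAGE_SIZE, (vaddr % PAGE_SIZE) // LINE_SIZE

    def map_page(self, vpn: int, ppn: int) -> None:
        self.tlb[vpn] = ExtendedTlbEntry(physical_page=ppn)

    def install(self, vaddr: int, level: Level, way: int) -> None:
        # Record that the line holding vaddr now sits in `level` at `way`.
        vpn, line = self._split(vaddr)
        self.tlb[vpn].lines[line] = Location(level, way)

    def lookup(self, vaddr: int) -> Tuple[Level, Optional[int]]:
        # A single lookup: no tag comparison and no level-by-level search.
        vpn, line = self._split(vaddr)
        loc = self.tlb[vpn].lines[line]
        return loc.level, loc.way
```

In this model, a line that has never been installed reports `Level.MEM`, which corresponds to going directly to DRAM on a miss; a line installed in L2 is reported with its way, so the L1 would not need to be probed first.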

Published In

ACM SIGARCH Computer Architecture News Volume 42, Issue 3

ISCA '14

June 2014

552 pages

ISSN:0163-5964

DOI:10.1145/2678373

Editor:
Doug DeGroot
acm dot org

ISCA '14: Proceeding of the 41st annual international symposium on Computer architecture
June 2014
566 pages
ISBN:9781479943944
General Chairs: Pen-Chung Yew (University of Minnesota), Antonia Zhai (University of Minnesota)
Program Chair: Steve Keckler (NVIDIA/University of Texas at Austin)

Copyright © 2014 IEEE.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Citations

Cited By

Jamet A, Vavouliotis G, Jiménez D, Alvarez L, Casas M (2024). A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 528-542. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00046
Bera R, Kanellopoulos K, Balachandran S, Novo D, Olgun A, Sadrosadati M, Mutlu O, Hardavellas N, Campanoni S, Grot B, Karpuzcu U (2022). Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-18. Online publication date: 1-Oct-2022.
https://dl.acm.org/doi/10.1109/MICRO56248.2022.00015
Wang Z, Weng J, Lowe-Power J, Gaur J, Nowatzki T (2021). Stream Floating: Enabling Proactive and Decentralized Cache Optimizations. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 640-653. Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00060
Oliveira G, Gomez-Luna J, Orosa L, Ghose S, Vijaykumar N, Fernandez I, Sadrosadati M, Mutlu O (2021). DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access, vol. 9, pp. 134457-134502. Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3110993
Holtryd N, Manivannan M, Stenstrom P, Pericas M (2020). DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 578-589. Online publication date: May-2020.
https://doi.org/10.1109/IPDPS47924.2020.00066
Tsai P, Gan Y, Sanchez D, Oskin M, Inoue K (2018). Rethinking the memory hierarchy for modern languages. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 203-216. Online publication date: 20-Oct-2018.
https://dl.acm.org/doi/10.1109/MICRO.2018.00025
Sembrant A, Hagersten E, Black-Schaffer D (2017). A Split Cache Hierarchy for Enabling Data-Oriented Optimizations. 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 133-144. Online publication date: Feb-2017.
https://doi.org/10.1109/HPCA.2017.25
Jang H, Lee Y, Kim J, Kim Y, Kim J, Jeong J, Lee J (2016). Efficient footprint caching for Tagless DRAM Caches. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 237-248. Online publication date: Mar-2016.
https://doi.org/10.1109/HPCA.2016.7446068
Yoon H, Sohi G (2016). Revisiting virtual L1 caches: A practical design using dynamic synonym remapping. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 212-224. Online publication date: Mar-2016.
https://doi.org/10.1109/HPCA.2016.7446066
Gandhi J, Basu A, Hill M, Swift M, Flautner K, Wenisch T, Ozer E, Ferdman M (2014). Efficient Memory Virtualization. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 178-189. Online publication date: 13-Dec-2014.
https://dl.acm.org/doi/10.1109/MICRO.2014.37
