DOI: 10.1145/2830772.2830790
Research article

Doppelgänger: a cache for approximate computing

Published: 05 December 2015

Abstract

Modern processors contain large last-level caches (LLCs) that consume substantial energy and area yet are imperative for high performance. Cache designs have improved dramatically by exploiting reference locality, but data values are also a source of optimization. Compression and deduplication exploit data values to use cache storage more efficiently, enabling smaller caches without sacrificing performance. In multi-megabyte LLCs, many identical or similar values may be cached across multiple blocks simultaneously, and this redundancy effectively wastes cache capacity. We observe that a large fraction of cache values exhibit approximate similarity: values across cache blocks are not identical but are similar. Approximate computing observes that some applications can tolerate error or inexactness. Combining the two, we leverage approximate similarity to design a novel LLC architecture: the Doppelgänger cache. The Doppelgänger cache associates the tags of multiple similar blocks with a single data array entry to reduce the amount of data stored. Our design achieves 1.55×, 2.55×, and 1.41× reductions in LLC area, dynamic energy, and leakage energy, respectively, without harming performance or incurring high application error.
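
The core idea in the abstract, several tags pointing at one shared data entry, can be illustrated with a minimal Python sketch. This is a toy model and not the paper's hardware design: the class `ApproxSharedCache`, the function `similarity_key`, and the choice of a quantized mean as the similarity signature are all assumptions made here for illustration, and eviction is not modeled.

```python
from dataclasses import dataclass, field


def similarity_key(block, step=8):
    """Hypothetical similarity signature: the block's integer mean, quantized by `step`.

    Blocks whose means fall in the same bucket are treated as "similar".
    This is a deliberately crude stand-in chosen for the example.
    """
    return (sum(block) // len(block)) // step


@dataclass
class ApproxSharedCache:
    # Tag array: one entry per cached block address, holding a similarity key.
    tags: dict = field(default_factory=dict)
    # Data array: one stored block per similarity key, shared by all matching tags.
    data: dict = field(default_factory=dict)

    def insert(self, addr, block):
        key = similarity_key(block)
        # Keep the existing data entry if a similar block is already stored.
        self.data.setdefault(key, list(block))
        self.tags[addr] = key

    def read(self, addr):
        key = self.tags.get(addr)
        # A hit may return a similar block rather than the exact one inserted.
        return None if key is None else self.data[key]


cache = ApproxSharedCache()
cache.insert(0x100, [100, 102, 101, 103])
cache.insert(0x200, [101, 103, 100, 102])  # similar values, different address
cache.insert(0x300, [10, 12, 11, 13])      # dissimilar values
```

In this sketch, three tags end up sharing two data entries, and a read of address `0x200` returns the block stored for `0x100`. That substitution is the approximation error an error-tolerant application would have to accept in exchange for storing less data.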

    Published In

    MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
    December 2015
    787 pages
    ISBN:9781450340342
    DOI:10.1145/2830772
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    MICRO-48

    Acceptance Rates

    MICRO-48 Paper Acceptance Rate 61 of 283 submissions, 22%;
    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Cited By

    • (2024) Exploiting Human Color Discrimination for Memory- and Energy-Efficient Image Encoding in Virtual Reality. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (166-180). DOI: 10.1145/3617232.3624860. Online publication date: 27-Apr-2024
    • (2024) AxOCS: Scaling FPGA-Based Approximate Operators Using Configuration Supersampling. IEEE Transactions on Circuits and Systems I: Regular Papers 71:6 (2646-2659). DOI: 10.1109/TCSI.2024.3385333. Online publication date: Jun-2024
    • (2024) A novel approximate cache block compressor for error-resilient image data. Computers and Electrical Engineering 115 (109106). DOI: 10.1016/j.compeleceng.2024.109106. Online publication date: Apr-2024
    • (2024) A Study on Approximate Computing for Non-volatile Memory-Based Memory Systems. Journal of Electrical Engineering & Technology. DOI: 10.1007/s42835-024-01795-x. Online publication date: 29-Jan-2024
    • (2024) Approximate Similarity-Aware Compression for Non-Volatile Main Memory. Journal of Computer Science and Technology 39:1 (63-81). DOI: 10.1007/s11390-023-2565-7. Online publication date: 30-Jan-2024
    • (2023) AxOTreeS: A Tree Search Approach to Synthesizing FPGA-based Approximate Operators. ACM Transactions on Embedded Computing Systems 22:5s (1-26). DOI: 10.1145/3609096. Online publication date: 31-Oct-2023
    • (2023) Approximation Opportunities in Edge Computing Hardware: A Systematic Literature Review. ACM Computing Surveys 55:12 (1-49). DOI: 10.1145/3572772. Online publication date: 3-Mar-2023
    • (2023) Single Exact Single Approximate Adders and Single Exact Dual Approximate Adders. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 31:7 (907-916). DOI: 10.1109/TVLSI.2023.3268275. Online publication date: Jul-2023
    • (2023) APPcache+: An STT-MRAM-Based Approximate Cache System With Low Power and Long Lifetime. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:11 (3840-3853). DOI: 10.1109/TCAD.2023.3267713. Online publication date: Nov-2023
    • (2023) MSI-A: An Energy Efficient Approximated Cache Coherence Protocol. IEEE Access 11 (48123-48135). DOI: 10.1109/ACCESS.2023.3273219. Online publication date: 2023
