Leak Stopper: An Actively Revitalized Snoop Filter Architecture with Effective Generation Control

Published: 10 March 2017

Abstract

To alleviate the high energy dissipation of unnecessary snooping accesses, snoop filters have been designed to reduce snoop lookups. These filters suffer from decreasing filtering efficiency and thus usually rely on a partial or whole filter reset triggered by detecting block evictions. Unfortunately, these reset conditions occur infrequently or unevenly (called passive filter deletion). This work proposes the concept of a revitalized snoop filter (RSF) design, which can actively renew the destination filter by employing a generation wrapping-around scheme for various reference behaviors. We further utilize a sampling mechanism for RSF to trigger precise filter revitalizations in a timely manner, so that unnecessary RSF flushing is minimized. The proposed RSF can be integrated into various existing inclusive snoop filters with only a minor change to their designs. We evaluate our proposed design and demonstrate that RSF eliminates 58.6% of snoop energy compared to JETTY on average while inducing only 6.5% of revitalization energy overhead. In addition, RSF eliminates 45.5% of snoop energy compared to stream registers on average and induces only 2.5% of revitalization energy overhead. Overall, these RSFs reduce the total L2 cache energy consumption by 52.1% (58.6% - 6.5%) as compared to JETTY and by 43% (45.5% - 2.5%) as compared to stream registers. Furthermore, RSF improves the overall performance by 1% to 1.4% on average compared to JETTY and stream registers for various benchmark suites.
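
The net figures quoted in the abstract follow from subtracting the revitalization overhead from the snoop energy eliminated. The short sketch below only reproduces that arithmetic; the function name `net_energy_reduction` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def net_energy_reduction(snoop_saving_pct, revitalization_overhead_pct):
    """Return the net energy reduction in percentage points.

    Assumption (taken from the abstract's parenthesised arithmetic):
    net reduction = snoop energy eliminated - revitalization overhead.
    """
    return round(snoop_saving_pct - revitalization_overhead_pct, 1)


# Figures as reported in the abstract (averages over the benchmark suites).
vs_jetty = net_energy_reduction(58.6, 6.5)
vs_stream_registers = net_energy_reduction(45.5, 2.5)

print(f"Net reduction vs. JETTY: {vs_jetty}%")
print(f"Net reduction vs. stream registers: {vs_stream_registers}%")
```

Both baselines use the same subtraction, so the smaller net gain over stream registers reflects the lower snoop energy eliminated (45.5% versus 58.6%), which is only partly offset by the lower overhead (2.5% versus 6.5%).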

Published In

ACM Transactions on Design Automation of Electronic Systems  Volume 22, Issue 3
July 2017
440 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/3062395
  • Editor: Naehyuck Chang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 March 2017
Accepted: 01 September 2016
Revised: 01 September 2016
Received: 01 June 2016
Published in TODAES Volume 22, Issue 3

Author Tags

  1. Cache storage
  2. computer architecture
  3. filters
  4. memory architecture

Qualifiers

  • Research-article
  • Research
  • Refereed
