Article
DOI: 10.1145/1057661.1057728

Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors

Published: 17 April 2005

Abstract

The performance of the various cache coherence protocols proposed in the literature has been extensively analyzed in the context of high-performance multi-processor systems. A similar analysis has not yet been carried out for Multi-Processor Systems-on-Chip (MPSoCs), where energy is at least as important as performance and where hardware and software resources are strictly constrained. This work takes a step in that direction by showing the energy/performance tradeoffs of different snoop-based protocols on a realistic MPSoC architecture. The analysis leverages a multi-processor simulation platform, augmented with accurate power models, that supports cycle-accurate simulation. Our analysis shows that (i) the cache write policy matters more than the specific cache coherence protocol, and (ii) matching the programming model and style to the architecture may have dramatic effects on the energy and performance of the system.
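The first finding, that the write policy can outweigh the choice of coherence protocol, can be made concrete by counting how often a snooping bus is used. The sketch below is not the authors' simulation platform or power model and produces no energy figures; it is a hypothetical toy model (the names `simulate` and `BusStats` are illustrative) that assumes infinite private caches on a single shared bus, compares a write-through, no-write-allocate valid/invalid protocol against a write-back MESI protocol, counts a dirty-line flush as part of the bus read that triggers it, and assumes every bus transaction costs one tag lookup in each non-requesting cache.

```python
from dataclasses import dataclass


@dataclass
class BusStats:
    transactions: int = 0  # transactions placed on the shared bus
    snoops: int = 0        # tag lookups performed by the non-requesting caches


def simulate(trace, n_cores, write_back):
    """Count bus traffic for a trace of (core, op, addr) tuples, op in {"R", "W"}.

    write_back=False: write-through, no-write-allocate, valid/invalid protocol.
    write_back=True:  write-back MESI protocol.
    Hypothetical toy model: caches are infinite, so evictions and dirty
    writebacks never occur.
    """
    # Per-core cache: addr -> state ("V" for write-through; "M"/"E"/"S" for MESI).
    # A missing key means Invalid.
    caches = [{} for _ in range(n_cores)]
    stats = BusStats()

    def broadcast(core, addr, invalidate):
        # Every bus transaction is snooped by all other caches.
        stats.transactions += 1
        stats.snoops += n_cores - 1
        shared = False
        for other in range(n_cores):
            if other == core or addr not in caches[other]:
                continue
            shared = True
            if invalidate:
                del caches[other][addr]      # invalidating store / BusRdX / BusUpgr
            elif caches[other][addr] in ("M", "E"):
                # Exclusive copy drops to S; an M owner flushes in this same transaction.
                caches[other][addr] = "S"
        return shared

    for core, op, addr in trace:
        cache = caches[core]
        state = cache.get(addr)
        if op == "R":
            if state is None:                # read miss -> BusRd
                shared = broadcast(core, addr, invalidate=False)
                if write_back:
                    cache[addr] = "S" if shared else "E"
                else:
                    cache[addr] = "V"
        elif not write_back:
            # Every store is written through and invalidates other copies;
            # the writer's own copy (if present) is updated in place.
            broadcast(core, addr, invalidate=True)
        elif state in ("M", "E"):
            cache[addr] = "M"                # silent write hit: no bus traffic
        else:
            broadcast(core, addr, invalidate=True)  # S -> BusUpgr, I -> BusRdX
            cache[addr] = "M"
    return stats


if __name__ == "__main__":
    # One core repeatedly updating a private variable (read-modify-write).
    private = [(0, "R", 0), (0, "W", 0)] * 100
    # Producer (core 0) and consumer (core 1) ping-ponging one address.
    pingpong = [(0, "W", 0), (1, "R", 0)] * 10
    for name, trace, n in (("private", private, 4), ("ping-pong", pingpong, 2)):
        for wb in (False, True):
            policy = "write-back MESI" if wb else "write-through"
            print(name, policy, simulate(trace, n_cores=n, write_back=wb))
```

With four cores and one core updating a private variable 100 times, the write-through configuration issues 101 bus transactions (303 snoop lookups) against a single transaction for MESI, because every store is broadcast. In the two-core producer/consumer ping-pong, both configurations issue 20 transactions, since each store must invalidate the consumer's copy anyway. Ignoring evictions and dirty writebacks understates write-back traffic, so the gap in a real system would be smaller than this model suggests.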




      Published In

      ACM Conferences
      GLSVLSI '05: Proceedings of the 15th ACM Great Lakes Symposium on VLSI
      April 2005
      518 pages
      ISBN: 1595930574
      DOI: 10.1145/1057661
      General Chair: John Lach
      Program Chairs: Gang Qu, Yehea Ismail


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. cache coherence
      2. low power
      3. multiprocessor
      4. system-on-chip


      Conference

      GLSVLSI05: Great Lakes Symposium on VLSI 2005
      April 17-19, 2005
      Chicago, Illinois, USA

      Acceptance Rates

      Overall Acceptance Rate 312 of 1,156 submissions, 27%


      Cited By

      • (2021) Empirical Evidence for MPSoCs in Critical Systems: The Case of NXP's T2080 Cache Coherence. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1162-1165. DOI: 10.23919/DATE51398.2021.9474078. Online publication date: 1-Feb-2021.
      • (2021) Performance improvement and analysis of snoopy cache coherence based multicore architectures. International Journal of System Assurance Engineering and Management, 14:S3, pp. 848-864. DOI: 10.1007/s13198-021-01177-w. Online publication date: 24-Jun-2021.
      • (2018) Designs of Low Power Snoop for Multiprocessor System on Chip. Journal of Signal Processing Systems, 88:1, pp. 83-89. DOI: 10.1007/s11265-016-1135-4. Online publication date: 27-Dec-2018.
      • (2014) Supporting faulty banks in NUCA by NoC assisted remapping mechanisms. The Journal of Supercomputing, 67:2, pp. 305-323. DOI: 10.1007/s11227-013-1001-0. Online publication date: 1-Feb-2014.
      • (2012) Improving performance of multi-core NUCA coherent systems using NoC-assisted mechanisms. The Journal of Supercomputing, 62:3, pp. 1318-1337. DOI: 10.1007/s11227-012-0793-7. Online publication date: 9-Jun-2012.
      • (2011) Directory-based cache coherence protocol for power-aware chip-multiprocessors. 2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 001036-001039. DOI: 10.1109/CCECE.2011.6030618. Online publication date: May-2011.
      • (2010) Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency. ACM Transactions on Design Automation of Electronic Systems, 16:1, pp. 1-39. DOI: 10.1145/1870109.1870117. Online publication date: 1-Nov-2010.
      • (2010) Design and implementation of a NoC supporting priority-based communications for many-core SoCs. 2010 International Computer Symposium (ICS2010), pp. 483-488. DOI: 10.1109/COMPSYM.2010.5685465. Online publication date: Dec-2010.
      • (2009) Low-power snoop architecture for synchronized producer-consumer embedded multiprocessing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 17:9, pp. 1362-1366. DOI: 10.1109/TVLSI.2009.2019414. Online publication date: 1-Sep-2009.
      • (2009) Cooperative caching in power-aware chip-multiprocessors. 2009 Canadian Conference on Electrical and Computer Engineering, pp. 195-198. DOI: 10.1109/CCECE.2009.5090119. Online publication date: May-2009.
