DOI: 10.1145/581630.581647

Increasing power efficiency of multi-core network processors through data filtering

Published: 08 October 2002

Abstract

We propose and evaluate a data filtering method to reduce the power consumption of high-end processors with multiple execution cores. Although the proposed method can be applied to a wide variety of multiprocessor systems, including MPPs, SMPs, and any type of single-chip multiprocessor, we concentrate on network processors. The method uses an execution unit called the Data Filtering Engine, which processes data with low temporal locality before it is placed on the system bus. The execution cores use locality to decide which load instructions have low temporal locality and which portion of the surrounding code should be off-loaded to the Data Filtering Engine.

Our technique reduces power consumption because (a) data with low temporal locality is processed on the Data Filtering Engine before it is placed onto the high-capacitance system bus, and (b) the conflict misses caused by such data are reduced, resulting in fewer accesses to the L2 cache. Specifically, we show that our technique reduces bus accesses in representative applications by as much as 46.8% (26.5% on average) and reduces overall power by as much as 15.6% (8.6% on average) on a single-core processor. It also improves performance by as much as 76.7% (29.7% on average) for a processor with 16 execution cores.
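
The idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's mechanism or simulator; it assumes a hypothetical trace of (load PC, address) pairs, measures temporal reuse with a small LRU set of recently used addresses, flags load PCs whose reuse rate falls below an arbitrary threshold, and then counts bus transfers under the simplifying assumption that an off-loaded load stream sends only one result across the bus. All names, the window size, the threshold, and the trace are illustrative assumptions, and the resulting counts say nothing about the measured results reported above.

```python
from collections import OrderedDict, defaultdict

def simulate(trace, window=64, bypass=frozenset()):
    """Run the trace through a small LRU set of recently used addresses.

    Returns (hits, total) per load PC. Loads whose PC is in `bypass` are
    skipped; they stand in for loads handled by a filtering engine.
    """
    recent = OrderedDict()
    hits, total = defaultdict(int), defaultdict(int)
    for pc, addr in trace:
        if pc in bypass:
            continue
        total[pc] += 1
        if addr in recent:
            hits[pc] += 1
            recent.move_to_end(addr)        # refresh LRU position
        else:
            recent[addr] = True
            if len(recent) > window:
                recent.popitem(last=False)  # evict least recently used
    return hits, total

def low_locality_loads(trace, threshold=0.1, window=64):
    """Load PCs whose temporal reuse rate is below `threshold` (assumed cutoff)."""
    hits, total = simulate(trace, window)
    return {pc for pc in total if hits[pc] / total[pc] < threshold}

def bus_transfers(trace, offloaded=frozenset(), window=64):
    """Misses of the remaining loads, plus one result per off-loaded PC.

    The single-result assumption models a reduction (for example a checksum)
    computed entirely on the filtering side.
    """
    hits, total = simulate(trace, window, bypass=offloaded)
    misses = sum(total[pc] - hits[pc] for pc in total)
    results = len({pc for pc, _ in trace if pc in offloaded})
    return misses + results

# Hypothetical trace: one load streams over a payload, another reuses a 4-entry table.
STREAM_PC, TABLE_PC = 0x10, 0x20
trace = []
for i in range(1000):
    trace.append((STREAM_PC, 0x1000 + 4 * i))
    trace.append((TABLE_PC, 0x8000 + 4 * (i % 4)))

offload = low_locality_loads(trace)
baseline = bus_transfers(trace)
filtered = bus_transfers(trace, offloaded=offload)
print(offload, baseline, filtered)
```

In this toy trace only the streaming load is flagged, so the table lookups keep their cached entries while the streamed payload never crosses the modeled bus.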

      Published In

      CASES '02: Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
      October 2002
      324 pages
      ISBN:1581135750
      DOI:10.1145/581630
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. chip multiprocessors
      2. data locality
      3. network processors
      4. power reduction
      5. remote procedure call

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate 52 of 230 submissions, 23%

      Cited By

      • (2014) Author retrospective for the dual data cache. ACM International Conference on Supercomputing 25th Anniversary Volume, pp. 32-34. DOI: 10.1145/2591635.2591652. Online publication date: 10-Jun-2014
      • (2012) Efficient traffic aware power management in multicore communications processors. Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems, pp. 123-134. DOI: 10.1145/2396556.2396581. Online publication date: 29-Oct-2012
      • (2011) E-AHRW. Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems, pp. 45-56. DOI: 10.1109/ANCS.2011.15. Online publication date: 3-Oct-2011
      • (2007) Conserving network processor power consumption by exploiting traffic variability. ACM Transactions on Architecture and Code Optimization 4:1 (4-es). DOI: 10.1145/1216544.1216547. Online publication date: 1-Mar-2007
      • (2007) L1 Cache Filtering Through Random Selection of Memory References. 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp. 235-244. DOI: 10.1109/PACT.2007.4336215. Online publication date: Sep-2007
      • (2005) Compiler-directed proactive power management for networks. Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, pp. 137-146. DOI: 10.1145/1086297.1086316. Online publication date: 24-Sep-2005
      • (2005) Exploiting a computation reuse cache to reduce energy in network processors. Proceedings of the First international conference on High Performance Embedded Architectures and Compilers, pp. 251-265. DOI: 10.1007/11587514_17. Online publication date: 17-Nov-2005
      • (2003) NpBench: a benchmark suite for control plane and data plane applications for network processors. Proceedings 21st International Conference on Computer Design, pp. 226-233. DOI: 10.1109/ICCD.2003.1240899. Online publication date: 2003
