Article

Increasing power efficiency of multi-core network processors through data filtering

Authors:

William H. Mangione-Smith

CASES '02: Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems

Pages 108 - 116

https://doi.org/10.1145/581630.581647

Published: 08 October 2002

Abstract

We propose and evaluate a data filtering method to reduce the power consumption of high-end processors with multiple execution cores. Although the method can be applied to a wide variety of multiprocessor systems, including MPPs, SMPs, and any type of single-chip multiprocessor, we concentrate on network processors. The method uses an execution unit called the Data Filtering Engine, which processes data with low temporal locality before it is placed on the system bus. The execution cores use locality to decide which load instructions have low temporal locality and which portion of the surrounding code should be off-loaded to the Data Filtering Engine.

Our technique reduces power consumption because (a) data with low temporal locality is processed on the Data Filtering Engine before it is placed onto the high-capacitance system bus, and (b) the conflict misses caused by such data are reduced, resulting in fewer accesses to the L2 cache. Specifically, we show that our technique reduces bus accesses in representative applications by as much as 46.8% (26.5% on average) and reduces overall power by as much as 15.6% (8.6% on average) on a single-core processor. It also improves performance by as much as 76.7% (29.7% on average) on a processor with 16 execution cores.
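The abstract does not say how the cores measure temporal locality, so the following is only a minimal sketch of the general idea of separating loads by reuse behaviour. It is not the paper's mechanism. The function name `classify_loads`, the trace format, and the `line_size`, `window`, and `reuse_threshold` parameters are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_loads(trace, line_size=32, window=64, reuse_threshold=0.25):
    """Label each load PC as 'filter' (low temporal locality) or 'cache'.

    trace: iterable of (pc, address) pairs in program order.
    A load counts as a reuse when its cache line was touched within the
    previous `window` accesses. Every threshold here is a hypothetical
    value chosen for the example, not a figure from the paper.
    """
    last_seen = {}            # cache line -> index of its most recent access
    reuses = defaultdict(int)  # pc -> loads that hit a recently used line
    total = defaultdict(int)   # pc -> all loads issued by that pc

    for i, (pc, addr) in enumerate(trace):
        line = addr // line_size
        prev = last_seen.get(line)
        if prev is not None and i - prev <= window:
            reuses[pc] += 1
        total[pc] += 1
        last_seen[line] = i

    return {
        pc: "cache" if reuses[pc] / total[pc] >= reuse_threshold else "filter"
        for pc in total
    }


if __name__ == "__main__":
    # Hypothetical trace: PC 0x10 streams through packet payload,
    # PC 0x20 keeps re-reading one table entry.
    demo = []
    for i in range(100):
        demo.append((0x10, 0x1000 + i * 32))
        demo.append((0x20, 0x8000))
    print(classify_loads(demo))
```

In this toy trace the streaming load never revisits a cache line and would be a candidate for off-loading, while the table lookup is reused on almost every access and would stay on the normal cache path.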

Index Terms

Increasing power efficiency of multi-core network processors through data filtering
1. Computer systems organization
  1. Architectures
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

CASES '02: Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems

October 2002

324 pages

ISBN: 1581135750

DOI: 10.1145/581630

General Chairs: Shuvra S. Bhattacharyya (University of Maryland), Trevor Mudge (University of Michigan)

Program Chairs: Wayne Wolf (Princeton University), Ahmed Jerraya (TIMA, Grenoble, France)

Copyright © 2002 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Article Metrics

Total Citations: 8

Total Downloads: 657

Downloads (last 12 months): 1

Downloads (last 6 weeks): 0

Reflects downloads up to 03 Oct 2024

Citations

Cited By

González A, Aliagas C (2014). Author retrospective for the dual data cache. ACM International Conference on Supercomputing 25th Anniversary Volume, pp. 32-34. Online publication date: 10-Jun-2014.
https://dl.acm.org/doi/10.1145/2591635.2591652
Iqbal M, John L, Wolf T, Moore A, Prasanna V (2012). Efficient traffic aware power management in multicore communications processors. Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems, pp. 123-134. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2396556.2396581
Kuang J, Bhuyan L, Xie H, Guo D (2011). E-AHRW. Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems, pp. 45-56. Online publication date: 3-Oct-2011.
https://dl.acm.org/doi/10.1109/ANCS.2011.15
Luo Y, Yu J, Yang J, Bhuyan L (2007). Conserving network processor power consumption by exploiting traffic variability. ACM Transactions on Architecture and Code Optimization 4:1 (4-es). Online publication date: 1-Mar-2007.
https://dl.acm.org/doi/10.1145/1216544.1216547
Etsion Y, Feitelson D (2007). L1 Cache Filtering Through Random Selection of Memory References. 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp. 235-244. Online publication date: Sep-2007.
https://doi.org/10.1109/PACT.2007.4336215
Li F, Chen G, Kandemir M, Irwin M, Conte T, Faraboschi P, Mangione-Smith B, Najjar W (2005). Compiler-directed proactive power management for networks. Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, pp. 137-146. Online publication date: 24-Sep-2005.
https://dl.acm.org/doi/10.1145/1086297.1086316
Li B, Venkatesh G, Calder B, Gupta R (2005). Exploiting a computation reuse cache to reduce energy in network processors. Proceedings of the First international conference on High Performance Embedded Architectures and Compilers, pp. 251-265. Online publication date: 17-Nov-2005.
https://dl.acm.org/doi/10.1007/11587514_17
Lee B, John L (2003). NpBench: a benchmark suite for control plane and data plane applications for network processors. Proceedings 21st International Conference on Computer Design, pp. 226-233. Online publication date: 2003.
https://doi.org/10.1109/ICCD.2003.1240899
