Article

Efficient use of memory bandwidth to improve network processor throughput

Authors:

Jahangir Hasan,

Satish Chandra,

T. N. Vijaykumar

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture

Pages 300 - 313

https://doi.org/10.1145/859618.859653

Published: 01 May 2003

Abstract

We consider the efficiency of packet buffers used in packet switches built using network processors (NPs). Packet buffers are typically implemented using DRAM, which provides plentiful buffering at a reasonable cost. The problem we address is that a typical NP workload may be unable to utilize the peak DRAM bandwidth. Since the bandwidth of the packet buffer is often the bottleneck in the performance of a shared-memory packet switch, inefficient use of available DRAM bandwidth further reduces the packet throughput. Specialized hardware-based schemes that alleviate the DRAM bandwidth problem in high-end routers may be less applicable to NP-based systems, in which cost is an important consideration.

In this paper, we propose cost-effective ways to enhance average-case DRAM bandwidth. In modern DRAMs, successive accesses falling within the same DRAM row are significantly faster than those falling across rows. If accesses to DRAM can be generated differently or reordered to take advantage of fast same-row accesses, peak DRAM bandwidth can be approached. The challenge is in exploiting this "row locality" despite the unpredictable nature of memory accesses in NPs. We propose a set of simple techniques to meet this challenge. These include locality-sensitive buffer allocation on packet input, reordering DRAM accesses to increase locality, and prefetching to reduce row miss penalty. We evaluate our techniques on cycle-accurate simulations of Intel's IXP 1200 network processor and find that they boost packet throughput on average by 42.7%, utilizing nearly the peak DRAM bandwidth, for a set of common NP applications processing a real trace.
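The row-locality idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's mechanism: the row size, the relative hit and miss costs, the single-bank assumption, and the names `row_of`, `service_cost`, and `reorder_for_row_locality` are all hypothetical, chosen only to show why serving same-row accesses back to back reduces the number of costly row switches.

```python
# Toy model of DRAM row locality. All constants are illustrative assumptions,
# not values taken from the paper or from any DRAM datasheet.
ROW_SIZE = 2048      # bytes per DRAM row (assumed)
ROW_HIT_COST = 1     # relative cost of an access to the already-open row (assumed)
ROW_MISS_COST = 4    # relative cost of an access that must open a new row (assumed)


def row_of(addr):
    """Map a byte address to a DRAM row index (a single bank is assumed)."""
    return addr // ROW_SIZE


def service_cost(addresses):
    """Relative cost of servicing the addresses strictly in the given order."""
    cost = 0
    open_row = None
    for addr in addresses:
        row = row_of(addr)
        cost += ROW_HIT_COST if row == open_row else ROW_MISS_COST
        open_row = row
    return cost


def reorder_for_row_locality(addresses, window=4):
    """Reorder requests inside a bounded window, preferring the open row.

    When no pending request targets the open row, the oldest pending
    request is issued instead.
    """
    pending = []
    scheduled = []
    open_row = None
    next_index = 0
    while next_index < len(addresses) or pending:
        # Refill the window with the next requests in arrival order.
        while len(pending) < window and next_index < len(addresses):
            pending.append(addresses[next_index])
            next_index += 1
        # Oldest pending request that hits the open row, if any.
        hit = next((a for a in pending if row_of(a) == open_row), None)
        choice = hit if hit is not None else pending[0]
        pending.remove(choice)
        scheduled.append(choice)
        open_row = row_of(choice)
    return scheduled


# Two streams that alternate between row 0 and row 2.
interleaved = [0, 4096, 64, 4160, 128, 4224, 192, 4288]
reordered = reorder_for_row_locality(interleaved, window=4)
print(service_cost(interleaved))  # 32: every access opens a new row
print(service_cost(reordered))    # 14: two row misses, six row hits
```

In this toy trace the window is large enough to group each stream's accesses. A policy like this can delay requests to other rows for as long as new same-row requests keep arriving, so a practical scheduler would also need a bound on how long a request may wait.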

Published In

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture

June 2003

432 pages

ISBN:0769519458

DOI:10.1145/859618

Conference Chair: Allan Gottlieb (New York University & NEC Laboratories America)

Program Chair: Kai Li (Princeton University)

ACM SIGARCH Computer Architecture News Volume 31, Issue 2
ISCA 2003
May 2003
422 pages
ISSN:0163-5964
DOI:10.1145/871656

Copyright © 2003 Authors.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA03: International Symposium on Computer Architecture

June 9 - 11, 2003

San Diego, California

Acceptance Rates

ISCA '03 Paper Acceptance Rate 36 of 184 submissions, 20%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 39
Total Downloads: 1,035
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 1

Reflects downloads up to 26 Sep 2024

Cited By

Kandemir M, Zhao H, Tang X, Karakoy M (2015). Memory Row Reuse Distance and its Role in Optimizing Application Performance. ACM SIGMETRICS Performance Evaluation Review 43:1 (137-149). DOI: 10.1145/2796314.2745867. Online publication date: 15-Jun-2015.
https://dl.acm.org/doi/10.1145/2796314.2745867

Kandemir M, Zhao H, Tang X, Karakoy M, Lin B, Xu J, Sengupta S, Shah D (2015). Memory Row Reuse Distance and its Role in Optimizing Application Performance. Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (137-149). DOI: 10.1145/2745844.2745867. Online publication date: 15-Jun-2015.
https://dl.acm.org/doi/10.1145/2745844.2745867

Wu L, Zhang W (2013). Time-predictable DRAM access scheduling algorithms for real-time multicore processors. 2013 Proceedings of IEEE Southeastcon (1-6). DOI: 10.1109/SECON.2013.6567367. Online publication date: Apr-2013.
https://doi.org/10.1109/SECON.2013.6567367

Zhao Y, Yuan R, Wang W, Meng D, Zhang S, Li J (2012). A Hardware-Based TCP Stream State Tracking and Reassembly Solution for 10G Backbone Traffic. Proceedings of the 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage (154-163). DOI: 10.1109/NAS.2012.24. Online publication date: 28-Jun-2012.
https://dl.acm.org/doi/10.1109/NAS.2012.24

Llorente D, Karras K, Wild T, Herkersdorf A (2011). Advanced packet segmentation and buffering algorithms in network processors. Transactions on High-Performance Embedded Architectures and Compilers IV (334-353). DOI: 10.5555/2172445.2172466. Online publication date: 1-Jan-2011.
https://dl.acm.org/doi/10.5555/2172445.2172466

Shi J, Chen K, Li K, Di Z (2011). A Storage Scheme for Fast Packet Buffers in Network Processor. Advanced Materials Research 403-408 (2628-2631). DOI: 10.4028/www.scientific.net/AMR.403-408.2628. Online publication date: Nov-2011.
https://doi.org/10.4028/www.scientific.net/AMR.403-408.2628

Li K, Guang Q, Lei L, Peng Y, Shi J (2011). A high-performance DRAM controller based on multi-core system through instruction prefetching. 2011 International Conference on Electronics, Communications and Control (ICECC) (1220-1223). DOI: 10.1109/ICECC.2011.6066295. Online publication date: Sep-2011.
https://doi.org/10.1109/ICECC.2011.6066295

Agrawal B, Sherwood T (2009). High-bandwidth network memory system through virtual pipelines. IEEE/ACM Transactions on Networking 17:4 (1029-1041). DOI: 10.1109/TNET.2008.2008646. Online publication date: 1-Aug-2009.
https://dl.acm.org/doi/10.1109/TNET.2008.2008646

Rajan K, Govindarajan R (2009). A Novel Cache Architecture and Placement Framework for Packet Forwarding Engines. IEEE Transactions on Computers 58:8 (1009-1025). DOI: 10.1109/TC.2009.18. Online publication date: 1-Aug-2009.
https://dl.acm.org/doi/10.1109/TC.2009.18

Llorente D, Karras K, Wild T, Herkersdorf A (2008). Buffer allocation for advanced packet segmentation in Network Processors. Proceedings of the 2008 International Conference on Application-Specific Systems, Architectures and Processors (221-226). DOI: 10.1109/ASAP.2008.4580182. Online publication date: 2-Jul-2008.
https://dl.acm.org/doi/10.1109/ASAP.2008.4580182
