DOI: 10.1145/859618.859653
Article

Efficient use of memory bandwidth to improve network processor throughput

Published: 01 May 2003

Abstract

We consider the efficiency of packet buffers used in packet switches built using network processors (NPs). Packet buffers are typically implemented using DRAM, which provides plentiful buffering at a reasonable cost. The problem we address is that a typical NP workload may be unable to utilize the peak DRAM bandwidth. Since the bandwidth of the packet buffer is often the bottleneck in the performance of a shared-memory packet switch, inefficient use of available DRAM bandwidth further reduces packet throughput. Specialized hardware-based schemes that alleviate the DRAM bandwidth problem in high-end routers may be less applicable to NP-based systems, in which cost is an important consideration.

In this paper, we propose cost-effective ways to enhance average-case DRAM bandwidth. In modern DRAMs, successive accesses falling within the same DRAM row are significantly faster than those falling across rows. If accesses to DRAM can be generated differently or reordered to take advantage of fast same-row accesses, peak DRAM bandwidth can be approached. The challenge is in exploiting this "row locality" despite the unpredictable nature of memory accesses in NPs. We propose a set of simple techniques to meet this challenge: locality-sensitive buffer allocation on packet input, reordering of DRAM accesses to increase locality, and prefetching to reduce the row-miss penalty. We evaluate our techniques on cycle-accurate simulations of Intel's IXP1200 network processor and find that, for a set of common NP applications processing a real trace, they boost packet throughput by 42.7% on average while using nearly the peak DRAM bandwidth.
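The row-locality argument can be illustrated with a toy cost model. The sketch below assumes a single DRAM bank with an open-row policy, a hypothetical 2048-byte row, and illustrative costs of 1 for a row hit and 4 for a row miss; none of these are IXP1200 parameters. It applies a simple "row hits first" reordering over a bounded window of pending requests, which is one generic form of access reordering and not necessarily the scheduler the authors implemented. It also assumes the pending requests are independent, so reordering them violates no ordering constraint.

```python
from typing import List, Optional

# Hypothetical timing parameters (illustrative only, not IXP1200 values).
ROW_BYTES = 2048  # bytes per DRAM row
HIT_COST = 1      # cost of an access to the currently open row
MISS_COST = 4     # cost of closing the open row and activating another


def row_of(addr: int) -> int:
    """Map a byte address to its DRAM row (single bank, no interleaving)."""
    return addr // ROW_BYTES


def service_cost(addrs: List[int]) -> int:
    """Total cost of serving addrs in the given order under an open-row policy."""
    open_row: Optional[int] = None
    cost = 0
    for a in addrs:
        r = row_of(a)
        cost += HIT_COST if r == open_row else MISS_COST
        open_row = r  # the accessed row stays open for the next request
    return cost


def reorder_row_hits_first(addrs: List[int], window: int) -> List[int]:
    """Greedy reordering: among the oldest `window` pending requests, serve one
    that hits the open row if any exists; otherwise serve the oldest request."""
    pending = list(addrs)
    order: List[int] = []
    open_row: Optional[int] = None
    while pending:
        candidates = pending[:window]
        # Index of the first candidate in the open row, else 0 (the oldest).
        pick = next(
            (i for i, a in enumerate(candidates) if row_of(a) == open_row), 0
        )
        a = pending.pop(pick)
        order.append(a)
        open_row = row_of(a)
    return order


if __name__ == "__main__":
    # Two packet buffers in different rows, accessed in interleaved order.
    reqs = [0, 4096, 64, 4160, 128, 4224]
    reordered = reorder_row_hits_first(reqs, window=4)
    print(reordered)                                   # [0, 64, 128, 4096, 4160, 4224]
    print(service_cost(reqs), service_cost(reordered))  # 24 12
```

In this example the interleaved order misses the open row on every access, while the reordered stream pays only two misses. A larger window exposes more row hits but can delay older requests longer; with `window=1` the scheduler degenerates to first-come, first-served.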



      Published In

      ISCA '03: Proceedings of the 30th Annual International Symposium on Computer Architecture
      June 2003
      432 pages
      ISBN: 0769519458
      DOI: 10.1145/859618
      • Conference Chair: Allan Gottlieb
      • Program Chair: Kai Li

      ACM SIGARCH Computer Architecture News, Volume 31, Issue 2 (ISCA 2003)
      May 2003
      422 pages
      ISSN: 0163-5964
      DOI: 10.1145/871656


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • Article

      Conference

      ISCA '03: International Symposium on Computer Architecture
      June 9-11, 2003
      San Diego, California

      Acceptance Rates

      ISCA '03 Paper Acceptance Rate 36 of 184 submissions, 20%;
      Overall Acceptance Rate 543 of 3,203 submissions, 17%

      Cited By

      • (2015) Memory Row Reuse Distance and its Role in Optimizing Application Performance. ACM SIGMETRICS Performance Evaluation Review 43(1), pages 137-149. DOI: 10.1145/2796314.2745867. Online publication date: 15-Jun-2015.
      • (2015) Memory Row Reuse Distance and its Role in Optimizing Application Performance. Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 137-149. DOI: 10.1145/2745844.2745867. Online publication date: 15-Jun-2015.
      • (2013) Time-predictable DRAM access scheduling algorithms for real-time multicore processors. 2013 Proceedings of IEEE Southeastcon, pages 1-6. DOI: 10.1109/SECON.2013.6567367. Online publication date: Apr-2013.
      • (2012) A Hardware-Based TCP Stream State Tracking and Reassembly Solution for 10G Backbone Traffic. Proceedings of the 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage, pages 154-163. DOI: 10.1109/NAS.2012.24. Online publication date: 28-Jun-2012.
      • (2011) Advanced packet segmentation and buffering algorithms in network processors. Transactions on High-Performance Embedded Architectures and Compilers IV, pages 334-353. DOI: 10.5555/2172445.2172466. Online publication date: 1-Jan-2011.
      • (2011) A Storage Scheme for Fast Packet Buffers in Network Processor. Advanced Materials Research 403-408, pages 2628-2631. DOI: 10.4028/www.scientific.net/AMR.403-408.2628. Online publication date: Nov-2011.
      • (2011) A high-performance DRAM controller based on multi-core system through instruction prefetching. 2011 International Conference on Electronics, Communications and Control (ICECC), pages 1220-1223. DOI: 10.1109/ICECC.2011.6066295. Online publication date: Sep-2011.
      • (2009) High-bandwidth network memory system through virtual pipelines. IEEE/ACM Transactions on Networking 17(4), pages 1029-1041. DOI: 10.1109/TNET.2008.2008646. Online publication date: 1-Aug-2009.
      • (2009) A Novel Cache Architecture and Placement Framework for Packet Forwarding Engines. IEEE Transactions on Computers 58(8), pages 1009-1025. DOI: 10.1109/TC.2009.18. Online publication date: 1-Aug-2009.
      • (2008) Buffer allocation for advanced packet segmentation in Network Processors. Proceedings of the 2008 International Conference on Application-Specific Systems, Architectures and Processors, pages 221-226. DOI: 10.1109/ASAP.2008.4580182. Online publication date: 2-Jul-2008.
