High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor

Authors:

Bei Hua

PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

Pages 168 - 177

https://doi.org/10.1145/1122971.1122998

Published: 29 March 2006

Abstract

IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at speeds of 10 Gbps or higher. IPv6 forwarding further exacerbates the situation because its search space is quadrupled. We propose a high-performance IPv6 forwarding algorithm, TrieC, and implement it efficiently on the Intel IXP2800 network processor (NPU). Programming the multi-core and multithreaded NPU is a daunting task. We study the interaction between the parallel algorithm design and the architecture mapping to facilitate efficient algorithm implementation. We experiment with an architecture-aware design principle to guarantee the high performance of the resulting algorithm. This paper investigates the main software design issues that have dramatic performance impacts on any NPU-based implementation: memory space reduction, instruction selection, data allocation, task partitioning, latency hiding, and thread synchronization. In the paper, we provide insight on how to design an NPU-aware algorithm for high-performance networking applications. Based on the detailed performance analysis of the TrieC algorithm, we provide guidance on developing high-performance networking applications for the multi-core and multithreaded architecture.
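For readers unfamiliar with the longest-prefix match mentioned in the abstract, the following is a minimal sketch of the operation using a plain one-bit-per-level binary trie. It is not the TrieC algorithm, whose details are not given on this page; the class name `PrefixTrie`, the next-hop labels, and the sample prefixes are all hypothetical and chosen only for illustration.

```python
import ipaddress

IPV6_BITS = 128  # length of an IPv6 address in bits


class PrefixTrie:
    """Plain binary trie for IPv6 longest-prefix match (illustrative only, not TrieC)."""

    def __init__(self):
        # Each node is a list: [child for bit 0, child for bit 1, next hop or None].
        self.root = [None, None, None]

    def insert(self, prefix, next_hop):
        net = ipaddress.IPv6Network(prefix)
        bits = int(net.network_address)
        node = self.root
        # Walk one level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (IPV6_BITS - 1 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = next_hop

    def lookup(self, address):
        bits = int(ipaddress.IPv6Address(address))
        node = self.root
        best = node[2]  # a ::/0 default route, if one was inserted
        for i in range(IPV6_BITS):
            node = node[(bits >> (IPV6_BITS - 1 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # remember the longest matching prefix seen so far
        return best


table = PrefixTrie()
table.insert("::/0", "default")
table.insert("2001:db8::/32", "port-A")
table.insert("2001:db8:1::/48", "port-B")

print(table.lookup("2001:db8:1::1"))  # matches the /48 entry
print(table.lookup("2001:db8:2::1"))  # falls back to the /32 entry
print(table.lookup("2001:db9::1"))    # only the default route matches
```

In this sketch a lookup may visit up to 128 dependent nodes, one per address bit. That per-packet chain of memory accesses is the kind of cost that makes line-rate lookup hard and that memory space reduction and latency hiding are meant to address.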

Published In

PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

March 2006, 258 pages

ISBN: 1595931899

DOI: 10.1145/1122971

General Chair: Josep Torrellas (University of Illinois)

Program Chair: Siddhartha Chatterjee (IBM Research)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PPoPP06: ACM SIGPLAN 2006 Symposium on Principles and Practice of Parallel Programming 2006

March 29 - 31, 2006

New York, New York, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
