Article

A comparative study of arbitration algorithms for the Alpha 21364 pipelined router

Authors: Shubhendu S. Mukherjee, Federico Silla, David Webb

ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems

Pages 223 - 234

https://doi.org/10.1145/605397.605421

Published: 01 October 2002

Abstract

Interconnection networks usually consist of a fabric of interconnected routers, which receive packets arriving at their input ports and forward them to appropriate output ports. Unfortunately, network packets moving through these routers are often delayed due to conflicting demand for resources, such as output ports or buffer space. Hence, routers typically employ arbiters that resolve conflicting resource demands to maximize the number of matches between packets waiting at input ports and free output ports. Efficient design and implementation of the algorithm running on these arbiters is critical to maximize network performance.

This paper proposes a new arbitration algorithm called SPAA (Simple Pipelined Arbitration Algorithm), which is implemented in the Alpha 21364 processor's on-chip router pipeline. Simulation results show that SPAA significantly outperforms two earlier well-known arbitration algorithms: PIM (Parallel Iterative Matching) and WFA (Wave-Front Arbiter) implemented in the SGI Spider switch. SPAA outperforms PIM and WFA because SPAA exhibits matching capabilities similar to PIM and WFA under realistic conditions when many output ports are busy, incurs fewer clock cycles to perform the arbitration, and can be pipelined effectively. Additionally, we propose a new prioritization policy called the Rotary Rule, which prevents the network's adverse performance degradation from saturation at high network loads by prioritizing packets already in the network over new packets generated by caches or memory.
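To make the matching problem in the abstract concrete, here is a minimal sketch of one arbitration round between waiting packets and free output ports. It is not the paper's SPAA definition: the two-step nominate-then-grant structure, the `Packet` fields, the `age` tie-breaker, and the `prefer_in_network` flag (a rough stand-in for the idea behind the Rotary Rule) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Packet:
    ident: str
    output_port: int   # output port this packet needs
    in_network: bool   # True: arrived from another router; False: newly injected
    age: int = 0       # cycles spent waiting (hypothetical tie-breaker)


def arbitrate(inputs: Dict[int, List[Packet]],
              free_outputs: Set[int],
              prefer_in_network: bool = True) -> Dict[int, Packet]:
    """Run one single-pass arbitration round.

    Returns a mapping {output_port: winning packet}. All names here are
    illustrative assumptions, not taken from the paper.
    """
    def rank(p: Packet):
        # Larger tuple wins: optionally favor in-network packets, then older ones.
        return (p.in_network if prefer_in_network else False, p.age)

    # Step 1: each input port nominates at most one packet, and only
    # for an output port that is currently free.
    nominations: Dict[int, List[Packet]] = {}
    for _port, queue in inputs.items():
        candidates = [p for p in queue if p.output_port in free_outputs]
        if candidates:
            choice = max(candidates, key=rank)
            nominations.setdefault(choice.output_port, []).append(choice)

    # Step 2: each output port grants exactly one of its nominations.
    return {out: max(noms, key=rank) for out, noms in nominations.items()}


demo_inputs = {
    0: [Packet("a", output_port=2, in_network=False, age=5)],
    1: [Packet("b", output_port=2, in_network=True, age=1)],
    2: [Packet("c", output_port=3, in_network=False, age=0)],
}
winners = arbitrate(demo_inputs, free_outputs={2, 3})
print({out: p.ident for out, p in winners.items()})
```

In this demo, packets `a` and `b` conflict on output port 2; with `prefer_in_network=True` the in-network packet `b` wins even though `a` has waited longer, while `c` takes output port 3 uncontested. Because the sketch makes a single pass, a losing input is not retried in the same round, so a free output can stay unmatched; that is the kind of loss an iterative scheme spends extra cycles to recover.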



Published In

ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems

October 2002

318 pages

ISBN:1581135742

DOI:10.1145/605397

Conference Chair: Kourosh Gharachorloo (Compaq Western Research Lab)
Program Chair: David A. Wood

ACM SIGOPS Operating Systems Review, Volume 36, Issue 5
December 2002
296 pages
ISSN:0163-5980
DOI:10.1145/635508

ACM SIGARCH Computer Architecture News, Volume 30, Issue 5
Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
December 2002
296 pages
ISSN:0163-5964
DOI:10.1145/635506

ACM SIGPLAN Notices, Volume 37, Issue 10
October 2002
296 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/605432

Copyright © 2002 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ASPLOS02: Tenth International Conference on Architectural Support for Programming Languages and Operating Systems

October 5 - 9, 2002

San Jose, California

Acceptance Rates

ASPLOS X Paper Acceptance Rate 24 of 175 submissions, 14%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 29
Total Downloads: 1,255
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 1

Reflects downloads up to 09 Nov 2024

Cited By

Katta M, Ramesh T, Plosila J (2021). SB-Router: A Swapped Buffer Activated Low Latency Network-on-Chip Router. IEEE Access 9, pp. 126564-126578. Online publication date: 2021. https://doi.org/10.1109/ACCESS.2021.3111294

Li C, Dong D, Liao X, Wu J, Lei F (2016). RoB-Router: Low Latency Network-on-Chip Router Microarchitecture Using Reorder Buffer. 2016 IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI), pp. 68-75. Online publication date: Aug-2016. https://doi.org/10.1109/HOTI.2016.023

Dimitrakopoulos G, Kalligeros E, Galanopoulos K (2013). Merged Switch Allocation and Traversal in Network-on-Chip Switches. IEEE Transactions on Computers 62:10, pp. 2001-2012. Online publication date: 1-Oct-2013. https://dl.acm.org/doi/10.1109/TC.2012.116

Dimitrakopoulos G, Kalligeros E, Rosenstiel W, Macii E (2012). Dynamic-priority arbiter and multiplexer soft macros for on-chip networks switches. Proceedings of the Conference on Design, Automation and Test in Europe, pp. 542-545. Online publication date: 12-Mar-2012. https://dl.acm.org/doi/10.5555/2492708.2492843

Dimitrakopoulos G, Kalligeros E (2012). Dynamic-priority arbiter and multiplexer soft macros for on-chip networks switches. 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 542-545. Online publication date: Mar-2012. https://doi.org/10.1109/DATE.2012.6176527

Dimitrakopoulos G, Kachris C, Kalligeros E (2012). Switch Design for Soft Interconnection Networks. Embedded Systems Design with FPGAs, pp. 125-147. Online publication date: 1-Nov-2012. https://doi.org/10.1007/978-1-4614-1362-2_6

Roca A, Flich J, Silla F, Duato J (2011). A low-latency modular switch for CMP systems. Microprocessors & Microsystems 35:8, pp. 742-754. Online publication date: 1-Nov-2011. https://dl.acm.org/doi/10.1016/j.micpro.2011.08.011

Becker D, Dally W, Pinfold W (2009). Allocator implementations for network-on-chip routers. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1-12. Online publication date: 14-Nov-2009. https://dl.acm.org/doi/10.1145/1654059.1654112

Abad P, Puente V, Gregorio J (2009). MRR: Enabling fully adaptive multicast routing for CMP interconnection networks. 2009 IEEE 15th International Symposium on High Performance Computer Architecture, pp. 355-366. Online publication date: Feb-2009. https://doi.org/10.1109/HPCA.2009.4798273

Nicopoulos C, Narayanan V, Das C (2009). Exploring Fault-Tolerant Network-on-Chip Architectures [37]. Network-on-Chip Architectures, pp. 65-92. Online publication date: 17-Aug-2009. https://doi.org/10.1007/978-90-481-3031-3_5
