DOI: 10.1145/605397.605421

A comparative study of arbitration algorithms for the Alpha 21364 pipelined router

Published: 01 October 2002

    Abstract

    Interconnection networks usually consist of a fabric of interconnected routers, which receive packets arriving at their input ports and forward them to the appropriate output ports. Unfortunately, packets moving through these routers are often delayed by conflicting demands for resources, such as output ports or buffer space. Hence, routers typically employ arbiters that resolve these conflicts so as to maximize the number of matches between packets waiting at input ports and free output ports. Efficient design and implementation of the algorithm running on these arbiters is critical to maximizing network performance.

    This paper proposes a new arbitration algorithm called SPAA (Simple Pipelined Arbitration Algorithm), which is implemented in the Alpha 21364 processor's on-chip router pipeline. Simulation results show that SPAA significantly outperforms two earlier well-known arbitration algorithms: PIM (Parallel Iterative Matching) and WFA (Wave-Front Arbiter), the latter implemented in the SGI Spider switch. SPAA outperforms PIM and WFA because it exhibits matching capabilities similar to theirs under realistic conditions in which many output ports are busy, takes fewer clock cycles to perform the arbitration, and can be pipelined effectively. Additionally, we propose a new prioritization policy called the Rotary Rule, which prevents the severe performance degradation caused by network saturation at high loads by prioritizing packets already in the network over new packets generated by caches or memory.
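    To make the matching problem concrete, the following is a minimal sketch of a generic single-pass "nominate, then grant" arbiter. It is not the SPAA, PIM, or WFA algorithm as specified in the paper; the `Packet` fields, the `arbitrate` function, the age-based tie-breaker, and the `prefer_in_network` flag are all hypothetical names chosen for illustration. The flag only echoes the idea behind the Rotary Rule (favoring packets already in the network over locally injected ones) and makes no claim about how the Alpha 21364 implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    input_port: int   # port where the packet is waiting
    output_port: int  # port the packet wants to leave through
    in_network: bool  # True if it arrived from another router, False if injected locally
    age: int          # cycles spent waiting (hypothetical tie-breaker)


def arbitrate(packets, free_outputs, prefer_in_network=True):
    """Run one nominate-then-grant pass and return {output_port: Packet}."""

    def priority(p):
        # Larger tuple wins: optionally in-network traffic first, then the oldest packet.
        return (p.in_network if prefer_in_network else False, p.age)

    # Step 1: each input port nominates at most one packet whose output port is free.
    nominations = {}
    for p in packets:
        if p.output_port not in free_outputs:
            continue
        best = nominations.get(p.input_port)
        if best is None or priority(p) > priority(best):
            nominations[p.input_port] = p

    # Step 2: each free output port grants at most one of the nominations it received.
    grants = {}
    for p in nominations.values():
        best = grants.get(p.output_port)
        if best is None or priority(p) > priority(best):
            grants[p.output_port] = p
    return grants


# Illustrative traffic: output 4 is busy, outputs 2 and 3 are free.
packets = [
    Packet(input_port=0, output_port=2, in_network=False, age=5),  # local injection
    Packet(input_port=1, output_port=2, in_network=True, age=1),
    Packet(input_port=1, output_port=3, in_network=True, age=0),
    Packet(input_port=2, output_port=4, in_network=True, age=9),   # wants a busy port
]
grants = arbitrate(packets, free_outputs={2, 3})
for port in sorted(grants):
    print(port, grants[port])
```

    In this example both input 0 and input 1 nominate a packet for output 2, so only one match is made and output 3 stays idle even though a two-packet matching exists. A single pass is cheap but does not always find a maximal matching, which is the kind of trade-off between matching quality and arbitration latency that the abstract describes.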

    Published In

    ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
    October 2002
    318 pages
    ISBN: 1581135742
    DOI: 10.1145/605397
    • ACM SIGOPS Operating Systems Review, Volume 36, Issue 5
      December 2002
      296 pages
      ISSN: 0163-5980
      DOI: 10.1145/635508
    • ACM SIGARCH Computer Architecture News, Volume 30, Issue 5
      Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
      December 2002
      296 pages
      ISSN: 0163-5964
      DOI: 10.1145/635506
    • ACM SIGPLAN Notices, Volume 37, Issue 10
      October 2002
      296 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/605432

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    ASPLOS02

    Acceptance Rates

    ASPLOS X Paper Acceptance Rate 24 of 175 submissions, 14%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Cited By

    • (2021) SB-Router: A Swapped Buffer Activated Low Latency Network-on-Chip Router. IEEE Access 9, 126564-126578. DOI: 10.1109/ACCESS.2021.3111294. Online publication date: 2021
    • (2016) RoB-Router: Low Latency Network-on-Chip Router Microarchitecture Using Reorder Buffer. 2016 IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI), 68-75. DOI: 10.1109/HOTI.2016.023. Online publication date: Aug-2016
    • (2013) Merged Switch Allocation and Traversal in Network-on-Chip Switches. IEEE Transactions on Computers 62:10, 2001-2012. DOI: 10.1109/TC.2012.116. Online publication date: 1-Oct-2013
    • (2012) Dynamic-priority arbiter and multiplexer soft macros for on-chip networks switches. Proceedings of the Conference on Design, Automation and Test in Europe, 542-545. DOI: 10.5555/2492708.2492843. Online publication date: 12-Mar-2012
    • (2012) Dynamic-priority arbiter and multiplexer soft macros for on-chip networks switches. 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), 542-545. DOI: 10.1109/DATE.2012.6176527. Online publication date: Mar-2012
    • (2012) Switch Design for Soft Interconnection Networks. Embedded Systems Design with FPGAs, 125-147. DOI: 10.1007/978-1-4614-1362-2_6. Online publication date: 1-Nov-2012
    • (2011) A low-latency modular switch for CMP systems. Microprocessors & Microsystems 35:8, 742-754. DOI: 10.1016/j.micpro.2011.08.011. Online publication date: 1-Nov-2011
    • (2009) Allocator implementations for network-on-chip routers. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 1-12. DOI: 10.1145/1654059.1654112. Online publication date: 14-Nov-2009
    • (2009) MRR: Enabling fully adaptive multicast routing for CMP interconnection networks. 2009 IEEE 15th International Symposium on High Performance Computer Architecture, 355-366. DOI: 10.1109/HPCA.2009.4798273. Online publication date: Mar-2009
    • (2009) Exploring Fault-Tolerant Network-on-Chip Architectures. Network-on-Chip Architectures, 65-92. DOI: 10.1007/978-90-481-3031-3_5. Online publication date: 17-Aug-2009
