DOI: 10.1145/1362622.1362646
research-article

The Cray BlackWidow: a highly scalable vector multiprocessor

Published: 10 November 2007

Abstract

    This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequency of 1.3 GHz. Global memory is directly accessible with processor loads and stores and is globally coherent. The system supports thousands of outstanding references to hide remote memory latencies, and provides a rich suite of built-in synchronization primitives. Each BlackWidow node is implemented as a 4-way SMP with up to 128 Gbytes of DDR2 main memory capacity. The system supports common programming models such as MPI and OpenMP, as well as global address space languages such as UPC and CAF. We describe the system architecture and microarchitecture of the processor, memory controller, and router chips. We give preliminary performance results and discuss design tradeoffs.
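The peak figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that "32K" means 32 × 1024, that the quoted peak rates come entirely from the vector unit (ignoring any scalar-unit contribution), and that 1 Tflops = 1000 Gflops. The per-pipe breakdown is an inference from the quoted numbers, not something the abstract states, and all variable names are hypothetical.

```python
# Figures quoted in the abstract.
PEAK_GFLOPS_64 = 20.8        # per processor, 64-bit operations
PEAK_GFLOPS_32 = 41.6        # per processor, 32-bit operations
CLOCK_GHZ = 1.3              # prototype operating frequency
VECTOR_PIPES = 8             # vector pipes per processor
PROCESSORS_PER_NODE = 4      # each node is a 4-way SMP
NODE_MEMORY_GBYTES = 128     # maximum DDR2 capacity per node

# Assumption (not stated in the abstract): "32K" means 32 * 1024.
MAX_PROCESSORS = 32 * 1024


def flops_per_cycle(peak_gflops: float, clock_ghz: float) -> float:
    """Floating-point operations per clock cycle implied by a peak rate."""
    return peak_gflops / clock_ghz


# Per-processor throughput implied by the quoted peaks.
per_cycle_64 = flops_per_cycle(PEAK_GFLOPS_64, CLOCK_GHZ)
per_cycle_32 = flops_per_cycle(PEAK_GFLOPS_32, CLOCK_GHZ)

# Assumption: the whole peak is delivered by the vector pipes.
per_pipe_64 = per_cycle_64 / VECTOR_PIPES
per_pipe_32 = per_cycle_32 / VECTOR_PIPES

# Upper bounds for a maximally configured system.
max_nodes = MAX_PROCESSORS // PROCESSORS_PER_NODE
system_peak_tflops_64 = MAX_PROCESSORS * PEAK_GFLOPS_64 / 1000
system_memory_gbytes = max_nodes * NODE_MEMORY_GBYTES

print(f"64-bit flops per cycle per processor: {per_cycle_64:.1f}")
print(f"64-bit flops per cycle per pipe:      {per_pipe_64:.1f}")
print(f"32-bit flops per cycle per pipe:      {per_pipe_32:.1f}")
print(f"maximum node count:                   {max_nodes}")
print(f"system peak, 64-bit (Tflops):         {system_peak_tflops_64:.1f}")
print(f"system memory (Gbytes):               {system_memory_gbytes}")
```

Under these assumptions, the 32-bit peak being exactly twice the 64-bit peak suggests each pipe processes two 32-bit operands in the slot of one 64-bit operand, though the abstract does not say so explicitly.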

    Published In

    SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
    November 2007
    723 pages
    ISBN:9781595937643
    DOI:10.1145/1362622

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. MPP
    2. architecture
    3. distributed shared memory
    4. fat-tree
    5. high-radix
    6. multiprocessor
    7. shared memory
    8. vector

    Qualifiers

    • Research-article

    Conference

    SC '07

    Acceptance Rates

    SC '07 Paper Acceptance Rate 54 of 268 submissions, 20%;
    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
