Benchmark Evaluation of the IBM SP2 for Parallel Signal Processing

Authors:

Masahiro Arakawa

IEEE Transactions on Parallel and Distributed Systems, Volume 7, Issue 5

Pages 522 - 536

https://doi.org/10.1109/71.503777

Published: 01 May 1996

Abstract

This paper evaluates the IBM SP2 architecture, the AIX parallel programming environment, and the IBM message-passing library (MPL) through STAP (Space-Time Adaptive Processing) benchmark experiments. Only coarse-grain parallelism was exploited on the SP2 because of its high communication overhead. A new parallelization scheme is developed for programming message-passing multicomputers. Parallel STAP benchmark structures are illustrated with domain decomposition, efficient mapping of partitioned programs, and optimization of collective communication operations. We measure SP2 performance in terms of execution time, Gflop/s rate, speedup over a single SP2 node, and overall system utilization. With 256 nodes, the Maui SP2 achieved its best performance of 23 Gflop/s in executing the High-Order Post-Doppler program, corresponding to a 34% system utilization. We have conducted a scalability analysis to reveal the performance growth rate as a function of machine size and STAP problem size. Important lessons learned from these parallel processing benchmark experiments are discussed in the context of real-time, adaptive radar signal processing on massively parallel processors (MPPs).
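
The 34% utilization figure can be sanity-checked with simple arithmetic. The sketch below assumes the usual definition of system utilization (sustained rate divided by aggregate peak rate) and a per-node peak of 266 Mflop/s, a commonly quoted figure for 66.7 MHz POWER2 nodes. Neither the definition nor the per-node peak is stated in this abstract, and the function names are illustrative only.

```python
def system_utilization(sustained_gflops, nodes, peak_per_node_gflops):
    """Return sustained performance as a fraction of aggregate peak.

    Assumed definition: utilization = sustained / (nodes * per-node peak).
    """
    if nodes <= 0 or peak_per_node_gflops <= 0:
        raise ValueError("nodes and peak_per_node_gflops must be positive")
    return sustained_gflops / (nodes * peak_per_node_gflops)


def implied_peak_per_node(sustained_gflops, nodes, utilization):
    """Invert the same definition to recover the per-node peak (Gflop/s)."""
    if nodes <= 0 or utilization <= 0:
        raise ValueError("nodes and utilization must be positive")
    return sustained_gflops / (nodes * utilization)


# Assumption, not from the abstract: 266 Mflop/s peak per POWER2 node.
ASSUMED_PEAK_PER_NODE_GFLOPS = 0.266

if __name__ == "__main__":
    # Figures from the abstract: 23 Gflop/s sustained on 256 nodes.
    u = system_utilization(23.0, 256, ASSUMED_PEAK_PER_NODE_GFLOPS)
    print(f"utilization under the assumed peak: {u:.1%}")

    # Working backwards from the reported 34% utilization.
    p = implied_peak_per_node(23.0, 256, 0.34)
    print(f"implied per-node peak: {p * 1000:.0f} Mflop/s")
```

Under these assumptions the two reported numbers are mutually consistent: the utilization computed from the assumed peak rounds to 34%, and the peak implied by the reported 34% lands close to the assumed 266 Mflop/s.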

Published In

IEEE Transactions on Parallel and Distributed Systems Volume 7, Issue 5

May 1996

96 pages

ISSN:1045-9219

Editor:
Duncan H. Lawrie
Univ. of Illinois at Urbana-Champaign, Urbana

Copyright © 1996 IEEE. All Rights Reserved.

Publisher

IEEE Press
