Benchmark Evaluation of the IBM SP2 for Parallel Signal Processing

Published: 01 May 1996

Abstract

This paper evaluates the IBM SP2 architecture, the AIX parallel programming environment, and the IBM message-passing library (MPL) through STAP (Space-Time Adaptive Processing) benchmark experiments. Only coarse-grain parallelism was exploited on the SP2 because of its high communication overhead. A new parallelization scheme is developed for programming message-passing multicomputers. Parallel STAP benchmark structures are illustrated with domain decomposition, efficient mapping of partitioned programs, and optimization of collective communication operations. We measure SP2 performance in terms of execution time, Gflop/s rate, speedup over a single SP2 node, and overall system utilization. With 256 nodes, the Maui SP2 achieved its best performance of 23 Gflop/s when executing the High-Order Post-Doppler program, corresponding to a 34% system utilization. We have conducted a scalability analysis to reveal the performance growth rate as a function of machine size and STAP problem size. Important lessons learned from these parallel processing benchmark experiments are discussed in the context of real-time, adaptive radar signal processing on massively parallel processors (MPPs).
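
The metrics named in the abstract (Gflop/s rate, speedup over a single node, and system utilization) can be related with a short sketch. The function names below are hypothetical, and the per-node peak of 0.266 Gflop/s is an assumption that the abstract does not state; it is used only to show that 23 Gflop/s on 256 nodes is consistent with roughly 34% utilization.

```python
# Illustrative sketch of the performance metrics named in the abstract.
# ASSUMPTION: a peak rate of 0.266 Gflop/s per SP2 node. The abstract does not
# give this figure; it is chosen here only to make the arithmetic concrete.
ASSUMED_PEAK_GFLOPS_PER_NODE = 0.266


def sustained_gflops(flop_count: float, seconds: float) -> float:
    """Sustained rate in Gflop/s: floating-point operations per second / 1e9."""
    return flop_count / seconds / 1e9


def speedup(time_one_node: float, time_n_nodes: float) -> float:
    """Speedup over a single node: T(1) / T(n)."""
    return time_one_node / time_n_nodes


def utilization(sustained: float, nodes: int, peak_per_node: float) -> float:
    """Fraction of aggregate peak that is actually sustained."""
    return sustained / (nodes * peak_per_node)


if __name__ == "__main__":
    # Figures from the abstract: 23 Gflop/s sustained on 256 nodes.
    u = utilization(23.0, 256, ASSUMED_PEAK_GFLOPS_PER_NODE)
    print(f"utilization = {u:.1%}")
```

Under the assumed peak, the aggregate peak of 256 nodes is about 68 Gflop/s, so 23 Gflop/s sustained corresponds to a utilization close to the 34% reported.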

Published In

IEEE Transactions on Parallel and Distributed Systems  Volume 7, Issue 5
May 1996
96 pages
ISSN:1045-9219

Publisher

IEEE Press

Author Tags

  1. Message passing
  2. STAP benchmarks
  3. adaptive sensor array processing
  4. data parallelism
  5. massively parallel processors
  6. performance evaluation
  7. programmability
  8. real-time applications
  9. scalability

Cited By

  • (2015) Enabling portable energy efficiency with memory accelerated library. Proceedings of the 48th International Symposium on Microarchitecture, pp. 750-761. DOI: 10.1145/2830772.2830788. Online publication date: 5-Dec-2015
  • (2013) Extending Amdahl's law and Gustafson's law by evaluating interconnections on multi-core processors. The Journal of Supercomputing 66:1, pp. 305-319. DOI: 10.1007/s11227-013-0908-9. Online publication date: 1-Oct-2013
  • (2006) A semi-static approach to mapping dynamic iterative tasks onto heterogeneous computing systems. Journal of Parallel and Distributed Computing 66:1, pp. 77-98. DOI: 10.1016/j.jpdc.2005.06.015. Online publication date: 1-Jan-2006
  • (2004) A pipeline technique for dynamic data transfer on a multiprocessor grid. International Journal of Parallel Programming 32:5, pp. 361-388. DOI: 10.1023/B:IJPP.0000038068.80639.52. Online publication date: 1-Oct-2004
  • (2003) Performance Analysis of a Myrinet-Based Cluster. Cluster Computing 6:4, pp. 299-313. DOI: 10.1023/A:1025709825121. Online publication date: 1-Oct-2003
  • (2000) Scatter and gather operations on an asynchronous communication model. Proceedings of the 2000 ACM symposium on Applied computing - Volume 2, pp. 685-691. DOI: 10.1145/338407.338544. Online publication date: 1-Mar-2000
  • (2000) Design and Performance Evaluation of a Portable Parallel Library for Space-Time Adaptive Processing. IEEE Transactions on Parallel and Distributed Systems 11:3, pp. 287-298. DOI: 10.1109/71.841744. Online publication date: 1-Mar-2000
  • (2000) CASCH. IEEE Concurrency 8:4, pp. 21-33. DOI: 10.1109/4434.895101. Online publication date: 1-Oct-2000
  • (2000) A Transformation Approach to Derive Efficient Parallel Implementations. IEEE Transactions on Software Engineering 26:4, pp. 315-339. DOI: 10.1109/32.844492. Online publication date: 1-Apr-2000
  • (2000) Link contention-constrained scheduling and mapping of tasks. Cluster Computing 3:2, pp. 113-124. DOI: 10.1023/A:1019076003163. Online publication date: 1-Apr-2000
