
A quantitative comparison of parallel computation models

Editor: Kenneth P. Birman

Authors: Ben H. H. Juurlink, Harry A. G. Wijshoff

ACM Transactions on Computer Systems (TOCS), Volume 16, Issue 3

Pages 271 - 318

https://doi.org/10.1145/290409.290412

Published: 01 August 1998

Abstract

In recent years, a large number of parallel computation models have been proposed to replace the PRAM as the parallel computation model presented to the algorithm designer. Although the theoretical justifications for these models are mostly sound, and many algorithmic results were obtained through these models, little experimentation has been conducted to validate the effectiveness of these models for developing cost-effective algorithms and applications on existing hardware platforms. In this article a first attempt is made to give a detailed experimental account of the precision of these models. To achieve this, three models (BSP, E-BSP, and BPRAM) were selected and validated on five parallel platforms (Cray T3E, Thinking Machines CM-5, Intel Paragon, MasPar MP-1, and Parsytec GCel). The work described in this article consists of three parts. First, the predictive capabilities of the models are investigated. Unlike previous experimental work, which mostly demonstrated a close match between the measured and predicted execution times, this article shows that there are several situations in which the models do not precisely predict the actual runtime behavior of an algorithm implementation. Second, the models are compared in order to determine which one induces the most efficient algorithms. Lastly, the performance achieved by the model-derived algorithms is compared with the performance attained by machine-specific algorithms in order to examine the effectiveness of deriving fast algorithms through the formalisms of the models.
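The abstract refers to execution times "predicted" by the models. As a rough illustration of what such a prediction looks like, the sketch below applies the commonly used additive BSP cost per superstep, w + g·h + L, from Valiant's bridging model: w is the maximum local computation on any processor, h is the maximum number of words any processor sends or receives, g is the per-word communication cost, and L is the barrier synchronization cost. This is a minimal sketch under stated assumptions, not the authors' methodology: the E-BSP and BPRAM cost functions are not modeled, and the names `Superstep`, `bsp_predicted_time`, and `relative_error`, as well as all parameter values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Superstep:
    w: float  # max local computation on any processor in this superstep (time units)
    h: int    # max words sent or received by any processor (the h-relation)


def bsp_predicted_time(supersteps, g, L):
    """Hypothetical helper: sum the additive BSP cost w + g*h + L over all supersteps."""
    return sum(s.w + g * s.h + L for s in supersteps)


def relative_error(predicted, measured):
    """Hypothetical helper: signed error of a prediction relative to a measured time."""
    return (predicted - measured) / measured


if __name__ == "__main__":
    # Illustrative machine parameters and workload, not measurements from the paper.
    steps = [Superstep(w=1000, h=50), Superstep(w=500, h=200)]
    predicted = bsp_predicted_time(steps, g=4, L=100)
    print(predicted)                        # 2700
    print(relative_error(predicted, 3000))  # -0.1 (prediction 10% below the measured time)
```

In this framing, the model is "precise" for an implementation when the relative error stays close to zero across inputs and machines; the abstract reports cases where it does not.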

Cited By

Bilardi G, Scquizzato M, Silvestri F (2018). A Lower Bound Technique for Communication in BSP. ACM Transactions on Parallel Computing 4:3, 1-27. DOI: 10.1145/3181776. Online publication date: 20-Feb-2018.
https://dl.acm.org/doi/10.1145/3181776

Amaris M, de Camargo R, Dyab M, Goldman A, Trystram D (2016). A comparison of GPU execution time prediction using machine learning and analytical modeling. 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA), 326-333. DOI: 10.1109/NCA.2016.7778637. Online publication date: Oct-2016.
https://doi.org/10.1109/NCA.2016.7778637

Bader D (2016). High Performance Algorithm Engineering for Large-Scale Problems. Encyclopedia of Algorithms, 914-918. DOI: 10.1007/978-1-4939-2864-4_178. Online publication date: 22-Apr-2016.
https://doi.org/10.1007/978-1-4939-2864-4_178

Information

Published In

ACM Transactions on Computer Systems, Volume 16, Issue 3, Aug. 1998, 112 pages

ISSN: 0734-2071

EISSN: 1557-7333

DOI: 10.1145/290409

Editor: Kenneth P. Birman, Cornell Univ., Ithaca, NY

Copyright © 1998 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 1998

Published in TOCS Volume 16, Issue 3

Qualifiers

Article

Bibliometrics

Total Citations: 19

Total Downloads: 1,262

Downloads (last 12 months): 54

Downloads (last 6 weeks): 18

Reflects downloads up to 11 Aug 2024.

Citations

Amaris M, Cordeiro D, Goldman A, Camargo R (2015). A Simple BSP-based Model to Predict Execution Time in GPU Applications. Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), 285-294. DOI: 10.1109/HiPC.2015.34. Online publication date: 16-Dec-2015.
https://dl.acm.org/doi/10.1109/HiPC.2015.34

Tatas K, Siozios K, Soudris D, Jantsch A (2013). NoC Modeling and Topology Exploration. Designing 2D and 3D Network-on-Chip Architectures, 19-49. DOI: 10.1007/978-1-4614-4274-5_2. Online publication date: 9-Oct-2013.
https://doi.org/10.1007/978-1-4614-4274-5_2

Bilardi G, Scquizzato M, Silvestri F (2012). A lower bound technique for communication on BSP with application to the FFT. Proceedings of the 18th international conference on Parallel Processing, 676-687. DOI: 10.1007/978-3-642-32820-6_67. Online publication date: 27-Aug-2012.
https://dl.acm.org/doi/10.1007/978-3-642-32820-6_67

Sasaki S, Tanaka A (2011). Modelling coherence overhead of multi-versioned caches for random accesses. International Journal of Parallel, Emergent and Distributed Systems 26:4, 291-311. DOI: 10.1080/17445760.2010.481787. Online publication date: 1-Aug-2011.
https://dl.acm.org/doi/10.1080/17445760.2010.481787

(2010). Algorithm engineering. Online publication date: 1-Jan-2010.

Bader D (2008). High Performance Algorithm Engineering for Large-scale Problems. Encyclopedia of Algorithms, 387-390. DOI: 10.1007/978-0-387-30162-4_178. Online publication date: 2008.
https://doi.org/10.1007/978-0-387-30162-4_178

Bjerregaard T, Mahadevan S (2006). A survey of research and practices of Network-on-chip. ACM Computing Surveys 38:1, 1-es. DOI: 10.1145/1132952.1132953. Online publication date: 29-Jun-2006.
https://dl.acm.org/doi/10.1145/1132952.1132953
