A quantitative comparison of parallel computation models

Published: 01 August 1998

    Abstract

    In recent years, a large number of parallel computation models have been proposed to replace the PRAM as the parallel computation model presented to the algorithm designer. Although the theoretical justifications for these models are mostly sound, and many algorithmic results were obtained through these models, little experimentation has been conducted to validate the effectiveness of these models for developing cost-effective algorithms and applications on existing hardware platforms. In this article a first attempt is made to give a detailed experimental account of the preciseness of these models. To achieve this, three models (BSP, E-BSP, and BPRAM) were selected and validated on five parallel platforms (Cray T3E, Thinking Machines CM-5, Intel Paragon, MasPar MP-1, and Parsytec GCel). The work described in this article consists of three parts. First, the predictive capabilities of the models are investigated. Unlike previous experimental work, which mostly demonstrated a close match between the measured and predicted execution times, this article shows that there are several situations in which the models do not precisely predict the actual runtime behavior of an algorithm implementation. Second, a comparison between the models is provided in order to determine the model that induces the most efficient algorithms. Lastly, the performance achieved by the model-derived algorithms is compared with the performance attained by machine-specific algorithms in order to examine the effectiveness of deriving fast algorithms through the formalisms of the models.
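
    The abstract does not spell out the models' cost formulas. As a rough illustration of what "predicting execution time" with such a model can mean, the sketch below uses the commonly stated BSP superstep cost w + g*h + l (local work, plus per-word gap times the largest message volume, plus barrier cost). The class names, the machine parameters, and the "measured" time are hypothetical placeholders, not values from the article.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BSPMachine:
    """Hypothetical machine parameters, in arbitrary time units."""
    g: float  # cost per word communicated (gap)
    l: float  # cost of one barrier synchronization


@dataclass(frozen=True)
class Superstep:
    w: float  # maximum local computation time over all processors
    h: int    # maximum number of words sent or received by any processor


def predicted_time(machine: BSPMachine, supersteps: Iterable[Superstep]) -> float:
    """Sum the standard BSP cost w + g*h + l over all supersteps."""
    return sum(s.w + machine.g * s.h + machine.l for s in supersteps)


def relative_error(predicted: float, measured: float) -> float:
    """Relative deviation of the model prediction from a measured runtime."""
    return abs(predicted - measured) / measured


# Illustrative values only (assumed, not measured on any real platform).
machine = BSPMachine(g=2.0, l=100.0)
steps = [Superstep(w=1000.0, h=50), Superstep(w=500.0, h=10)]
prediction = predicted_time(machine, steps)
error = relative_error(prediction, measured=2000.0)
```

    A validation of the kind the abstract describes would compare such predictions against timings of real implementations on each platform. A large relative error would indicate a situation in which the model does not capture the machine's actual behavior.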

    Published In

    ACM Transactions on Computer Systems, Volume 16, Issue 3
    Aug. 1998
    112 pages
    ISSN: 0734-2071
    EISSN: 1557-7333
    DOI: 10.1145/290409
    Editor: Kenneth P. Birman

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 August 1998
    Published in TOCS Volume 16, Issue 3

    Author Tags

    1. parallel computation models
    2. performance evaluation

    Qualifiers

    • Article

    Cited By

    • (2018) A Lower Bound Technique for Communication in BSP. ACM Transactions on Parallel Computing 4:3 (1-27). DOI: 10.1145/3181776. Online publication date: 20-Feb-2018
    • (2016) A comparison of GPU execution time prediction using machine learning and analytical modeling. 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA) (326-333). DOI: 10.1109/NCA.2016.7778637. Online publication date: Oct-2016
    • (2016) High Performance Algorithm Engineering for Large-Scale Problems. Encyclopedia of Algorithms (914-918). DOI: 10.1007/978-1-4939-2864-4_178. Online publication date: 22-Apr-2016
    • (2015) A Simple BSP-based Model to Predict Execution Time in GPU Applications. Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC) (285-294). DOI: 10.1109/HiPC.2015.34. Online publication date: 16-Dec-2015
    • (2013) NoC Modeling and Topology Exploration. Designing 2D and 3D Network-on-Chip Architectures (19-49). DOI: 10.1007/978-1-4614-4274-5_2. Online publication date: 9-Oct-2013
    • (2012) A lower bound technique for communication on BSP with application to the FFT. Proceedings of the 18th international conference on Parallel Processing (676-687). DOI: 10.1007/978-3-642-32820-6_67. Online publication date: 27-Aug-2012
    • (2011) Modelling coherence overhead of multi-versioned caches for random accesses. International Journal of Parallel, Emergent and Distributed Systems 26:4 (291-311). DOI: 10.1080/17445760.2010.481787. Online publication date: 1-Aug-2011
    • (2010) Algorithm engineering. Online publication date: 1-Jan-2010
    • (2008) High Performance Algorithm Engineering for Large-scale Problems. Encyclopedia of Algorithms (387-390). DOI: 10.1007/978-0-387-30162-4_178. Online publication date: 2008
    • (2006) A survey of research and practices of Network-on-chip. ACM Computing Surveys 38:1 (1-es). DOI: 10.1145/1132952.1132953. Online publication date: 29-Jun-2006
