research-article

Better Process Mapping and Sparse Quadratic Assignment

Published: 30 September 2020
    Abstract

    Communication and topology-aware process mapping is a powerful approach to reduce communication time in parallel applications with known communication patterns on large, distributed memory systems. We address the problem as a quadratic assignment problem (QAP) and present algorithms to construct initial mappings of processes to processors and fast local search algorithms to further improve the mappings. By exploiting assumptions that typically hold for applications and modern supercomputer systems such as sparse communication patterns and hierarchically organized communication systems, we obtain significantly more powerful algorithms for these special QAPs. Our multilevel construction algorithms employ perfectly balanced graph partitioning techniques and exploit the given communication system hierarchy in significant ways. We present improvements to a local search algorithm of Brandfass et al. (2013) and further decrease the running time by reducing the time needed to perform swaps in the assignment as well as by carefully constraining local search neighborhoods. We also investigate different algorithms to create the communication graph that is mapped onto the processor network. Experiments indicate that our algorithms not only dramatically speed up local search but also, due to the multilevel approach, find much better solutions in practice.
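    The abstract combines three technical ingredients: the standard QAP objective J(Π) = Σ_{i,j} C[i][j] · D[Π(i)][Π(j)] for a communication matrix C, a processor distance matrix D and a mapping Π of processes to processors; a distance D induced by a hierarchical communication system; and pairwise-swap local search. The following Python sketch illustrates these under stated assumptions. It is not the authors' algorithm, nor the improved Brandfass et al. (2013) search. Assumptions: C is symmetric and stored sparsely as adjacency dicts; the hierarchy is described by hypothetical lists `sizes` (fan-out per level) and `dists` (cost per level); candidate swaps are restricted to pairs adjacent in the communication graph, one simple way of constraining the neighborhood; all function names are hypothetical. Because only the edges incident to the two swapped processes change, evaluating a swap costs O(deg(u) + deg(v)) rather than O(n). The multilevel construction of initial mappings is not sketched.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Graph = Dict[int, Dict[int, int]]  # process -> {neighbor process: communication volume}
Dist = Callable[[int, int], int]    # (PE, PE) -> communication cost


def make_distance(sizes: Sequence[int], dists: Sequence[int]) -> Dist:
    """Hypothetical hierarchy: sizes[i] is the fan-out at level i
    (e.g. [4, 2] = 4 cores per node, 2 nodes) and dists[i] is the cost
    between two PEs whose smallest common group is at level i."""
    def dist(p: int, q: int) -> int:
        if p == q:
            return 0
        group = 1
        for a, d in zip(sizes, dists):
            group *= a
            if p // group == q // group:  # p and q share a group at this level
                return d
        raise ValueError("PE index outside the hierarchy")
    return dist


def build_graph(n: int, edges: Sequence[Tuple[int, int, int]]) -> Graph:
    """Symmetric sparse communication matrix C stored as adjacency dicts."""
    comm: Graph = {u: {} for u in range(n)}
    for a, b, w in edges:
        comm[a][b] = w
        comm[b][a] = w
    return comm


def objective(comm: Graph, perm: List[int], dist: Dist) -> int:
    """J = sum_{i,j} C[i][j] * D[perm[i]][perm[j]], visiting only nonzeros of C."""
    return sum(w * dist(perm[u], perm[v]) for u in comm for v, w in comm[u].items())


def swap_delta(u: int, v: int, comm: Graph, perm: List[int], dist: Dist) -> int:
    """Change in J if processes u and v exchange PEs; O(deg(u) + deg(v)) work."""
    pu, pv = perm[u], perm[v]
    delta = 0
    for x, w in comm[u].items():
        if x != v:  # the u-v edge keeps its (symmetric) distance
            delta += w * (dist(pv, perm[x]) - dist(pu, perm[x]))
    for x, w in comm[v].items():
        if x != u:
            delta += w * (dist(pu, perm[x]) - dist(pv, perm[x]))
    return 2 * delta  # each undirected edge is counted twice in J


def local_search(comm: Graph, perm: List[int], dist: Dist, max_rounds: int = 100) -> List[int]:
    """First-improvement pairwise exchange, trying only pairs adjacent in C."""
    perm = list(perm)  # do not modify the caller's mapping
    for _ in range(max_rounds):
        improved = False
        for u in comm:
            for v in comm[u]:
                if u < v and swap_delta(u, v, comm, perm, dist) < 0:
                    perm[u], perm[v] = perm[v], perm[u]
                    improved = True
        if not improved:
            break
    return perm


if __name__ == "__main__":
    dist = make_distance([4, 2], [1, 10])  # 2 nodes x 4 cores
    edges = [(0, 4, 1)]  # one inter-group edge
    edges += [(a, b, 1) for a in range(4) for b in range(a + 1, 4)]     # clique on 0..3
    edges += [(a, b, 1) for a in range(4, 8) for b in range(a + 1, 8)]  # clique on 4..7
    comm = build_graph(8, edges)
    start = [4, 1, 2, 3, 0, 5, 6, 7]  # processes 0 and 4 placed on the wrong nodes
    print(objective(comm, start, dist))  # 152
    best = local_search(comm, start, dist)
    print(best, objective(comm, best, dist))  # [0, 1, 2, 3, 4, 5, 6, 7] 44
```

    In the example, the two misplaced processes are adjacent in the communication graph, so one restricted swap restores the placement in which each clique shares a node. Pairs that are not adjacent are never tried, which keeps each round cheap but means the result depends on the starting mapping and, because the first improving swap is taken, on the order in which neighbors are scanned.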

    References

    [1]
    A. H. Abdel-Gawad, M. Thottethodi, and A. Bhatele. 2014. RAHTM: Routing algorithm aware hierarchical task mapping. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). 325--335.
    [2]
    D. A. Bader, H. Meyerhenke, P. Sanders, C. Schulz, A. Kappes, and D. Wagner. 2014. Benchmarking for graph clustering and partitioning. In Encyclopedia of Social Network Analysis and Mining. Springer, 73--82.
    [3]
    M. A. Bender and M. Farach-Colton. 2000. The LCA problem revisited. In Proceedings of the Latin American Symposium on Theoretical Informatics, Lecture Notes in Computer Science, Vol. 1776. Springer, 88--94.
    [4]
    C. Bichot and P. Siarry (Eds.). 2011. Graph Partitioning. Wiley.
    [5]
    B. Brandfass, T. Alrutz, and T. Gerhold. 2013. Rank reordering for MPI communication optimization. Comput. Fluids 80 (2013), 372--380.
    [6]
    A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. 2016. Recent advances in graph partitioning. In Algorithm Engineering—Selected Results and Surveys, Lecture Notes in Computer Science, Vol. 9220. 117--158.
    [7]
    R. E. Burkard, E. Çela, P. M. Pardalos, and L. S. Pitsoulis. 1998. The quadratic assignment problem. In Handbook of Combinatorial Optimization. Springer, 1713--1809.
    [8]
    Ü. V. Çatalyürek and C. Aykanat. 1996. Decomposing irregularly sparse matrices for parallel matrix-vector multiplication. In Proceedings of the 3rd International Workshop on Parallel Algorithms for Irregularly Structured Problems, Lecture Notes in Computer Science, Vol. 1117. Springer, 75--86.
    [9]
    T. A. Davis and Y. Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38, 1 (2011), 1:1--1:25.
    [10]
    D. Delling, P. Sanders, D. Schultes, and D. Wagner. 2009. Engineering route planning algorithms. In Algorithmics of Large and Complex Networks. LNCS State-of-the-Art Survey, Vol. 5515. Springer, 117--139.
    [11]
    C. M. Fiduccia and R. M. Mattheyses. 1982. A linear-time heuristic for improving network partitions. In Proceedings of the 19th Conference on Design Automation. 175--181.
    [12]
    J. Fietz, M. Krause, C. Schulz, P. Sanders, and V. Heuveline. 2012. Optimized hybrid parallel lattice Boltzmann fluid flow simulations on complex geometries. In Proceedings of the European Conference on Parallel Processing (Euro-Par’12), Lecture Notes in Computer Science, Vol. 7484. Springer, 818--829.
    [13]
    R. Glantz, H. Meyerhenke, and A. Noe. 2015. Algorithms for mapping parallel processes onto grid and torus architectures. In Proceedings of the 23rd Euromicro Intl. Conference on Parallel, Distributed, and Network-Based Processing. 236--243.
    [14]
    T. Hatazaki. 1998. Rank reordering strategy for MPI topology creation functions. In Proceedings of the 5th European PVM/MPI User’s Group Meeting, Lecture Notes in Computer Science, Vol. 1497. 188--195.
    [15]
    C. H. Heider. 1972. A Computationally Simplified Pair-exchange Algorithm for the Quadratic Assignment Problem. Technical Report. DTIC Document.
    [16]
    T. Hoefler and M. Snir. 2011. Generic topology mapping strategies for large-scale parallel architectures. In Proceedings of the 25th International Conference on Supercomputing (ICS’11). 75--84.
    [17]
    G. Karypis and V. Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1 (1998), 359--392.
    [18]
    G. Karypis and V. Kumar. 1998. Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48, 1 (1998), 96--129.
    [19]
    G. Mercier and J. Clet-Ortega. 2009. Towards an efficient process placement policy for MPI applications in multicore environments. In Proceedings of the European Parallel Virtual Machine/Message Passing Interface Users’ Group Meeting, Lecture Notes in Computer Science, Vol. 5759. Springer, 104--115.
    [20]
    G. Mercier and E. Jeannot. 2011. Improving MPI applications performance on multicore clusters with rank reordering. In Proceedings of the 18th European MPI Users’ Group Meeting, Lecture Notes in Computer Science, Vol. 6960. 39--49.
    [21]
    H. Müller-Merbach. 1970. Optimale Reihenfolgen. Ökonometrie und Unternehmensforschung, Vol. 15. Springer-Verlag.
    [22]
    F. Pellegrini. [n.d.]. Scotch Home Page. Retrieved from http://www.labri.fr/pelegrin/scotch.
    [23]
    S. Sahni and T. F. Gonzalez. 1976. P-complete approximation problems. J. ACM 23, 3 (1976), 555--565.
    [24]
    P. Sanders and C. Schulz. 2011. Engineering multilevel graph partitioning algorithms. In Proceedings of the 19th European Symposium on Algorithms, Lecture Notes in Computer Science, Vol. 6942. Springer, 469--480.
    [25]
    P. Sanders and C. Schulz. 2013. Think locally, act globally: Highly balanced graph partitioning. In Proceedings of the 12th International Symposium on Experimental Algorithms (SEA’13).
    [26]
    K. Schloegel, G. Karypis, and V. Kumar. 2003. Graph partitioning for high performance scientific simulations. In The Sourcebook of Parallel Computing. 491--541.
    [27]
    C. Schulz and D. Strash. 2019. Graph partitioning: Formulations and applications to big data. In Encyclopedia of Big Data Technologies, S. Sakr and A. Y. Zomaya (Eds.). Springer.
    [28]
    A. J. Soper, C. Walshaw, and M. Cross. 2004. A combined evolutionary search and multilevel optimisation approach to graph-partitioning. J. Global Optim. 29, 2 (2004), 225--241.
    [29]
    R. V. Southwell. 1935. Stress-calculation in frameworks by the method of “systematic relaxation of constraints.” Proc. Roy. Soc. Lond. 151, 872 (1935), 56--95.
    [30]
    J. L. Träff. 2002. Implementing the MPI process topology mechanism. In ACM/IEEE Supercomputing.
    [31]
    J. T. Vogelstein, J. M. Conroy, V. Lyzinski, L. J. Podrazik, S. G. Kratzer, E. T. Harley, D. E. Fishkind, R. J. Vogelstein, and C. E. Priebe. 2015. Fast Approximate Quadratic Programming for Graph Matching. PLoS ONE 10, 4 (2015).
    [32]
    C. Walshaw and M. Cross. 2000. Mesh partitioning: A multilevel balancing and refinement algorithm. SIAM J. Sci. Comput. 22, 1 (2000), 63--80.
    [33]
    H. Yu, I.-H. Chung, and J. E. Moreira. 2006. Topology mapping for Blue Gene/L supercomputer. In Proceedings of the ACM/IEEE Supercomputing. ACM Press, 116.

    Cited By

    • (2022) Process mapping on any topology with TopoMatch. Journal of Parallel and Distributed Computing 170, 39-52. DOI: 10.1016/j.jpdc.2022.08.002. Online publication date: Dec-2022
    • (2021) An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures. Euro-Par 2021: Parallel Processing, 167-182. DOI: 10.1007/978-3-030-85665-6_11. Online publication date: 1-Sep-2021


    Information & Contributors

    Information

    Published In

    ACM Journal of Experimental Algorithmics  Volume 25, Issue
    Special Issue ALENEX 2018 and Regular Papers
    2020
    313 pages
    ISSN:1084-6654
    EISSN:1084-6654
    DOI:10.1145/3388470
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 September 2020
    Accepted: 01 July 2020
    Revised: 01 February 2020
    Received: 01 July 2019
    Published in JEA Volume 25

    Author Tags

    1. Process mapping
    2. local search
    3. quadratic assignment problem

    Qualifiers

    • Research-article
    • Research
    • Refereed


