Research article. DOI: 10.1145/3432261.3432262

A Deep Reinforcement Learning Method for Solving Task Mapping Problems with Dynamic Traffic on Parallel Systems

Published: 20 January 2021

Abstract

Efficiently mapping an application's communication pattern onto the network topology is critical for optimizing the performance of communication-bound applications on parallel computing systems. The problem has been studied extensively, but prior work mostly formulates it as finding an isomorphic mapping between two static graphs whose edges are annotated with traffic volume and network bandwidth. In practice, however, network performance is difficult to estimate accurately, and communication patterns often change over time and are not easily obtained. This work therefore proposes a deep reinforcement learning (DRL) approach that learns an efficient task mapping algorithm by exploring candidate mappings, guided by the performance predictions and runtime communication behavior provided by a simulator. We extensively evaluated our approach using both synthetic and real applications with varied communication patterns on Torus and Dragonfly networks. Compared with several existing approaches from the literature and a software library, our approach found task mappings that consistently achieved comparable or better application performance. In particular, for a real application, the average improvements of our approach on Torus and Dragonfly networks are 11% and 16%, respectively, whereas the average improvements of the other approaches are all below 6%.
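
For readers unfamiliar with the static formulation that the abstract contrasts against, the following is a minimal sketch and not the paper's method. It assumes a hop-bytes cost (traffic volume times hop distance) on a small torus and a brute-force search; the function names, the cost metric, and the example traffic values are all illustrative assumptions that do not come from the paper.

```python
from itertools import permutations

def torus_hops(a, b, dims):
    """Shortest hop count between nodes a and b on a torus (wraparound links)."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def hop_bytes(traffic, mapping, dims):
    """Assumed static cost: traffic volume times hop distance, summed over task pairs."""
    return sum(vol * torus_hops(mapping[s], mapping[t], dims)
               for (s, t), vol in traffic.items())

def best_static_mapping(traffic, tasks, nodes, dims):
    """Exhaustive search over placements; only feasible for toy problem sizes."""
    best = min(permutations(nodes, len(tasks)),
               key=lambda p: hop_bytes(traffic, dict(zip(tasks, p)), dims))
    return dict(zip(tasks, best))

# Hypothetical example: four tasks communicating in a ring on a 2x2 torus.
dims = (2, 2)
nodes = [(0, 0), (0, 1), (1, 0), (1, 1)]
tasks = [0, 1, 2, 3]
traffic = {(0, 1): 10, (1, 2): 10, (2, 3): 10, (3, 0): 10}

naive = dict(zip(tasks, nodes))  # place task i on the i-th node
best = best_static_mapping(traffic, tasks, nodes, dims)
print(hop_bytes(traffic, naive, dims), hop_bytes(traffic, best, dims))
```

A cost like this is fixed once the two graphs are given, which is the limitation the abstract points to: it cannot reflect traffic that changes at run time or network performance that is hard to estimate in advance.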

Cited By

  • (2023) A Deep Reinforcement Learning Approach for Competitive Task Assignment in Enterprise Blockchain. IEEE Access 11 (48236-48247). DOI: 10.1109/ACCESS.2023.3276859. Online publication date: 2023
  • (2023) Dynamic Resource Management for Machine Learning Pipeline Workloads. SN Computer Science 4:5. DOI: 10.1007/s42979-023-02101-8. Online publication date: 30-Aug-2023
  • (2022) Efficient Task-Mapping of Parallel Applications Using a Space-Filling Curve. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (384-397). DOI: 10.1145/3559009.3569657. Online publication date: 8-Oct-2022

Published In

HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region
January 2021
143 pages
ISBN: 9781450388429
DOI: 10.1145/3432261

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Algorithm
2. Deep Learning
3. Parallel applications
4. Task mapping

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

HPC Asia 2021

Acceptance Rates

Overall Acceptance Rate: 69 of 143 submissions, 48%

Article Metrics

• Downloads (last 12 months): 17
• Downloads (last 6 weeks): 0
Reflects downloads up to 13 Jan 2025
