A Deep Reinforcement Learning Method for Solving Task Mapping Problems with Dynamic Traffic on Parallel Systems

Authors:

I-Hsin Chung

HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region

Pages 1 - 10

https://doi.org/10.1145/3432261.3432262

Published: 20 January 2021

Abstract

Efficiently mapping an application's communication pattern onto the network topology is critical to the performance of communication-bound applications on parallel computing systems. The problem has been studied extensively, but most prior work formulates it as finding an isomorphic mapping between two static graphs whose edges are annotated with traffic volume and network bandwidth. In practice, network performance is difficult to estimate accurately, and communication patterns often change over time and are not easily obtained. This work therefore proposes a deep reinforcement learning (DRL) approach that explores better task mappings by using the performance predictions and runtime communication behavior provided by a simulator to learn an efficient task mapping algorithm. We extensively evaluated our approach using both synthetic and real applications with varied communication patterns on Torus and Dragonfly networks. Compared with several existing approaches from the literature and from software libraries, our approach found task mappings that consistently achieved comparable or better application performance. For a real application in particular, the average improvements of our approach on Torus and Dragonfly networks are 11% and 16%, respectively, whereas the average improvements of the other approaches are all below 6%.
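
To make the static formulation mentioned in the abstract concrete, the following minimal Python sketch scores a task mapping by hop-bytes (traffic volume multiplied by hop distance) on a small torus. It is an illustration only, not the DRL method proposed in the paper: hop-bytes is a commonly used static cost metric, and the function names, the 2x2 torus, and the four-task ring traffic are all hypothetical choices made for this example.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along a
    dimension of size d is min(|x - y|, d - |x - y|).
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(traffic, mapping, dims):
    """Sum of (bytes sent) * (hops travelled) over all communicating pairs.

    traffic: {(src_task, dst_task): bytes}   (static traffic volumes)
    mapping: {task: node_coordinate}         (the task mapping to score)
    """
    return sum(
        volume * torus_hops(mapping[src], mapping[dst], dims)
        for (src, dst), volume in traffic.items()
    )


# Hypothetical workload: four tasks in a ring, 100 bytes per edge.
dims = (2, 2)
traffic = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 100}

# Candidate 1: place task i on the i-th node in row-major order.
nodes = list(product(range(dims[0]), range(dims[1])))
row_major = dict(enumerate(nodes))

# Candidate 2: place the ring of tasks along a ring of neighbouring nodes.
ring_order = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

print("row-major hop-bytes:", hop_bytes(traffic, row_major, dims))
print("ring-order hop-bytes:", hop_bytes(traffic, ring_order, dims))
```

In this toy case the row-major placement scores 600 hop-bytes and the ring-ordered placement scores 400. A static metric of this kind ignores congestion and time-varying traffic, which is the limitation the abstract raises as motivation for learning from simulator feedback instead.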

Cited By

Volpe G, Mangini A, Fanti M. (2023). A Deep Reinforcement Learning Approach for Competitive Task Assignment in Enterprise Blockchain. IEEE Access 11, 48236-48247. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3276859

Chiang M, Zhang L, Chou Y, Chou J. (2023). Dynamic Resource Management for Machine Learning Pipeline Workloads. SN Computer Science 4:5. Online publication date: 30-Aug-2023.
https://dl.acm.org/doi/10.1007/s42979-023-02101-8

Kwon O, Kang J, Lee S, Kim W, Song J, Kloeckner A, Moreira J. (2022). Efficient Task-Mapping of Parallel Applications Using a Space-Filling Curve. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 384-397. Online publication date: 8-Oct-2022.
https://dl.acm.org/doi/10.1145/3559009.3569657

Published In

HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region

January 2021

143 pages

ISBN:9781450388429

DOI:10.1145/3432261

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

HPC Asia 2021

HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region

January 20 - 22, 2021

Virtual Event, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 69 of 143 submissions, 48%

Bibliometrics

Total Citations: 3
Total Downloads: 305
Downloads (Last 12 months): 17
Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Jan 2025
