short-paper

Follow the Leader: Alternating CPU/GPU Computations in PDES

Authors:

Romolo Marotta,

Alessandro Pellegrini,

Philipp Andelfinger

SIGSIM-PADS '24: Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation

Pages 47 - 51

https://doi.org/10.1145/3615979.3656056

Published: 24 June 2024

Abstract

Despite the successes of graphics processing units (GPUs) in accelerating simulations in several research fields, their use is largely restricted to domain-specific workloads that consistently offer the large degree of inherent parallelism and computational intensity at which GPUs excel. When targeting generic discrete-event simulations, whose dynamics can vary wildly over time, a static choice between a GPU-based and a traditional CPU-based execution is likely to be suboptimal. Here, we explore a parallel discrete-event simulation (PDES) execution scheme for CPU-GPU platforms that aims to approximate an optimal dynamic device choice. A current “leader” device running the simulation is periodically challenged by a brief concurrent run on another device starting from an intermediate model state. Based on the gathered performance measurements, a forecasting scheme determines the leader for the next period. The execution time and power consumption of this scheme hinge on 1) an efficient mechanism for providing the “follower” device with a consistent model state, and 2) robust performance forecasting to justify the device choices. We present these building blocks, their implementation combining the existing CPU and GPU simulators ROOT-Sim and GPUTW, and measurement results demonstrating substantially reduced execution time without increasing energy consumption over a static device choice.
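
The leader selection described in the abstract can be illustrated with a small sketch. The abstract does not state which forecasting scheme is used, so the code below assumes Holt's linear exponential smoothing over per-device event rates as one plausible choice. All names (`HoltForecaster`, `choose_leader`, the `margin` parameter) and all numeric values are hypothetical and are not part of ROOT-Sim or GPUTW.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HoltForecaster:
    """Holt's linear exponential smoothing for one device's event rate.

    Assumption: the smoothing factors below are illustrative defaults,
    not values taken from the paper.
    """
    alpha: float = 0.5              # level smoothing factor
    beta: float = 0.3               # trend smoothing factor
    level: Optional[float] = None   # smoothed event rate
    trend: float = 0.0              # smoothed change per period

    def update(self, observed_rate: float) -> None:
        """Fold one measured event rate into the level and trend."""
        if self.level is None:
            self.level = observed_rate  # first sample initializes the level
            return
        prev_level = self.level
        self.level = (self.alpha * observed_rate
                      + (1 - self.alpha) * (prev_level + self.trend))
        self.trend = (self.beta * (self.level - prev_level)
                      + (1 - self.beta) * self.trend)

    def forecast(self, steps: int = 1) -> float:
        """Predict the event rate `steps` periods ahead."""
        if self.level is None:
            raise ValueError("no observations yet")
        return self.level + steps * self.trend


def choose_leader(forecasters: Dict[str, HoltForecaster],
                  current: str, margin: float = 0.05) -> str:
    """Return the device that should lead the next period.

    The leader changes only if another device's forecast exceeds the
    current leader's forecast by more than `margin` (hypothetical
    hysteresis to avoid switching on noise).
    """
    best = max(forecasters, key=lambda dev: forecasters[dev].forecast())
    if best != current and (forecasters[best].forecast()
                            > (1 + margin) * forecasters[current].forecast()):
        return best
    return current


if __name__ == "__main__":
    forecasters = {"cpu": HoltForecaster(), "gpu": HoltForecaster()}
    leader = "cpu"
    # Made-up event rates (events/s) from three challenge periods.
    for cpu_rate, gpu_rate in [(1.0e6, 0.6e6), (1.0e6, 0.9e6), (0.9e6, 1.4e6)]:
        forecasters["cpu"].update(cpu_rate)
        forecasters["gpu"].update(gpu_rate)
        leader = choose_leader(forecasters, leader)
        print("leader for next period:", leader)
```

The hysteresis margin reflects the point made in the abstract that a device change must be justified: switching requires handing a consistent model state to the other device, so a marginal forecast advantage may not be worth the cost.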

Cited By

Marotta R, Pellegrini A, Rossetti M, Hunter S (2024). Model-Driven Engineering for High-Performance Parallel Discrete Event Simulations on Heterogeneous Architectures. Proceedings of the Winter Simulation Conference, pp. 2202-2213. DOI: 10.5555/3712729.3712913. Online publication date: 15-Dec-2024.
https://dl.acm.org/doi/10.5555/3712729.3712913

Inverso O (2024). Reproducibility Report for the Paper: Follow the Leader: Alternating CPU/GPU Computations in PDES. Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp. 135-140. DOI: 10.1145/3615979.3665109. Online publication date: 24-Jun-2024.
https://dl.acm.org/doi/10.1145/3615979.3665109

Marotta R, Pellegrini A (2024). Model-Driven Engineering for High-Performance Parallel Discrete Event Simulations on Heterogeneous Architectures. 2024 Winter Simulation Conference (WSC), pp. 2202-2213. DOI: 10.1109/WSC63780.2024.10838978. Online publication date: 15-Dec-2024.
https://doi.org/10.1109/WSC63780.2024.10838978

Index Terms

Follow the Leader: Alternating CPU/GPU Computations in PDES
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Computing methodologies
  1. Modeling and simulation
    1. Simulation types and techniques
      1. Discrete-event simulation
      2. Massively parallel and high-performance simulations

Published In

SIGSIM-PADS '24: Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation

June 2024

155 pages

ISBN:9798400703638

DOI:10.1145/3615979

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSIM: ACM Special Interest Group on Simulation and Modeling

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

Ministero dell'Università e della Ricerca

Conference

SIGSIM-PADS '24

Sponsor:

SIGSIM

SIGSIM-PADS '24: 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation

June 24 - 26, 2024

Atlanta, GA, USA

Acceptance Rates

Overall Acceptance Rate 398 of 779 submissions, 51%

Bibliometrics

Total Citations: 3
Total Downloads: 34
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 3

Reflects downloads up to 10 Feb 2025
