Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Spatial/Temporal Locality-Based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines

Published: 25 November 2024 Publication History

Abstract

Shared-memory multi-processor/multi-core machines have become a reference for many application contexts. In particular, the recent literature on speculative parallel discrete event simulation has reshuffled the architectural organization of simulation systems in order to deeply exploit the main features of this type of machine. A core aspect dealt with has been the full sharing of the workload at the level of individual simulation events, which enables keeping the rollback incidence minimal. However, making each worker thread continuously switch its execution between events destined to different simulation objects does not favor locality. This problem appears even more evident in the case of Non-Uniform-Memory-Access (NUMA) machines, in which memory accesses generating a cache miss to be served by a far NUMA node give rise to both higher latency and higher traffic at the level of the NUMA interconnection. In this article, we propose a workload-sharing algorithm in which the worker threads can have short-term binding with specific simulation objects to favor spatial locality. The new bindings—carried out when a thread decides to switch its execution to other simulation objects—are based on both (a) the timeline according to which the object states have passed through the caching hierarchy and (b) the (dynamic) placement of objects within the NUMA architecture. At the same time, our solution still enables the worker threads to focus their activities on the events to be processed whose timestamps are closer to the simulation commit horizon—hence, we exploit temporal locality along virtual time and keep the rollback incidence minimal. In our design, we exploit lock-free constructs to support scalable thread synchronization while accessing the shared event pool. Furthermore, we exploit a multi-view approach of the event pool content, which additionally favors local accesses to the parts of the event pool that are currently relevant for the thread activity. Our solution has been released as an integration within the USE (Ultimate-Share-Everything) open-source speculative simulation platform available to the community. Furthermore, in this article we report the results of an experimental study that shows the effectiveness of our proposal.

References

[1]
Francesco Antonacci, Alessandro Pellegrini, and Francesco Quaglia. 2013. Consistent and efficient output-streams management in optimistic simulation platforms. In SIGSIM Principles of Advanced Discrete Simulation, SIGSIM-PADS’13, Montreal, QC, Canada, May 19–22, 2013, Margaret L. Loper and Gabriel A. Wainer (Eds.). ACM, New York, NY, USA, 315–326. DOI:
[2]
Rassul Ayani and Hassan Rajaei. 1994. Parallel simulation based on conservative time windows: A performance study. Concurr. Pract. Exp. 6, 2 (1994), 119–142. DOI:
[3]
Christopher D. Carothers and Richard Fujimoto. 2000. Efficient execution of time warp programs on heterogeneous, NOW platforms. IEEE Trans. Parallel Distributed Syst. 11, 3 (2000), 299–317. DOI:
[4]
Christopher D. Carothers, Kalyan S. Perumalla, and Richard Fujimoto. 1999. Efficient optimistic parallel simulations using reverse computation. ACM Trans. Model. Comput. Simul. 9, 3 (1999), 224–253. DOI:
[5]
Christopher D. Carothers, Kalyan S. Perumalla, and Richard M. Fujimoto. 1999. The effect of state-saving in optimistic simulation on a cache-coherent non-uniform memory access architecture. In Proceedings of the 31st Conference on Winter Simulation: Simulation—a Bridge to the Future - Volume 2 (Phoenix, Arizona, USA) (WSC’99). ACM, New York, NY, USA, 1624–1633. DOI:
[6]
Li-li Chen, Ya-shuai Lu, Yi-ping Yao, Shao-liang Peng, and Ling-da Wu. 2011. A well-balanced time warp system on multi-core environments. In Proceedings of the 2011 IEEE Workshop on Principles of Advanced and Distributed Simulation (PADS’11). IEEE, USA, 1–9. DOI:
[7]
Davide Cingolani, Alessandro Pellegrini, and Francesco Quaglia. 2017. Transparently mixing undo logs and software reversibility for state recovery in optimistic PDES. ACM Trans. Model. Comput. Simul. 27, 2 (2017), 11:1–11:26. DOI:
[8]
George Dantzig. 1957. Discrete-variable extremum problems. Operational Research 5 (1957).
[9]
Phillip M. Dickens, David M. Nicol, Paul F. Reynolds Jr., and John Mark Duva. 1993. The impact of adding aggressiveness to a non-aggressive windowing protocol. In Proceedings of the 25th Winter Simulation Conference, Los Angeles, California, USA, December 12–15, 1993, Gerald W. Evans, Mansooreh Mollaghasemi, Edward C. Russell, and William E. Biles (Eds.). ACM Press, New York, NY, USA, 731–739. DOI:
[10]
Phillip M. Dickens, David M. Nicol, Paul F. Reynolds Jr., and John Mark Duva. 1996. Analysis of bounded time warp and comparison with YAWNS. ACM Trans. Model. Comput. Simul. 6, 4 (1996), 297–320. DOI:
[11]
Tom Dickman, Sounak Gupta, and Philip A. Wilsey. 2013. Event pool structures for PDES on many-core Beowulf clusters. In Proceedings of the 2013 ACM/SIGSIM Conference on Principles of Advanced Discrete Simulation. ACM, New York, NY, USA, 103–114. DOI:
[12]
Shlomi Dolev, Danny Hendler, and Adi Suissa. 2008. CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory. In Proceedings of the 27th Annual ACM Symposium on Principles of Distributed Computing, PODC 2008, Toronto, Canada, August 18–21, 2008, Rida A. Bazzi and Boaz Patt-Shamir (Eds.). ACM, New York, NY, USA, 125–134. DOI:
[13]
Steve Franks, Fabian Gomes, Brian W. Unger, and John G. Cleary. 1997. State saving for interactive optimistic simulation. In Proceedings of the 11th Workshop on Parallel and Distributed Simulation, PADS’97, Lockenhaus, Austria, June 10–13, 1997, Alois Ferscha, Rassul Ayani, and Carl Tropper (Eds.)., USA, 72–79. DOI:
[14]
Yaosheng Fu. 2017. Architectural Support for Large-scale Shared Memory Systems. Ph.D. Dissertation. Princeton University. https://dataspace.princeton.edu/handle/88435/dsp01pv63g290r
[15]
Richard Fujimoto. 1989. Time warp on a shared memory multiprocessor. In Proceedings of the International Conference on Parallel Processing, ICPP’89, The Pennsylvania State University, University Park, PA, USA, August 1989. Volume 3: Algorithms and Applications. Pennsylvania State University Press, USA, 242–249.
[16]
Richard M. Fujimoto. 1990. Performance of time warp under synthetic workloads. In Proceedings of the Multiconference on Distributed Simulation. Society for Computer Simulation, USA, 23–28.
[17]
Richard M. Fujimoto and Maria Hybinette. 1997. Computing global virtual time in shared-memory multiprocessors. ACM Transactions on Modeling and Computer Simulation 7, 4 (1997), 425–446. DOI:
[18]
Richard M. Fujimoto and Kiran S. Panesar. 1995. Buffer management in shared-memory time warp systems. In Proceedings of the 9th Workshop on Parallel and Distributed Simulation (Lake Placid, New York, USA) (PADS’95). IEEE, USA, 149–156. DOI:
[19]
David W. Glazer and Carl Tropper. 1993. On process migration and load balancing in time warp. IEEE Trans. Parallel Distributed Syst. 4, 3 (1993), 318–327. DOI:
[20]
Sounak Gupta and Philip A. Wilsey. 2014. Lock-free pending event set management in time warp. In Proceedings of the 2nd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (Denver, Colorado, USA) (SIGSIM PADS’14). ACM, New York, NY, USA, 15–26. DOI:
[21]
Joshua Hay and Philip A. Wilsey. 2015. Experiments with hardware-based transactional memory in parallel simulation. In Proceedings of the 2015 ACM/SIGSIM Conference on Principles of Advanced Discrete Simulation (PADS). ACM, New York, NY, USA, 75–86. DOI:
[22]
Yuxiong He, Charles E. Leiserson, and William M. Leiserson. 2010. The Cilkview scalability analyzer. In Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (Thira, Santorini, Greece) (SPAA’10). ACM, New York, NY, USA, 145–156. DOI:
[23]
Mauro Ianni, Romolo Marotta, Davide Cingolani, Alessandro Pellegrini, and Francesco Quaglia. 2018. The ultimate share-everything PDES system. In Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (Rome, Italy) (SIGSIM-PADS’18). ACM, New York, NY, USA, 73–84. DOI:
[24]
David R. Jefferson. 1985. Virtual time. ACM Trans. Program. Lang. Syst. 7, 3 (1985), 404–425. DOI:
[25]
Sunil Kandukuri and Stephen Boyd. 2002. Optimal power control in interference-limited fading wireless channels with outage-probability specifications. IEEE Transactions on Wireless Communications 1, 1 (2002), 46–55.
[26]
Jonatan Lindén and Bengt Jonsson. 2013. A skiplist-based concurrent priority queue with minimal memory contention. In Proceedings of the 17th International Conference on Principles of Distributed Systems - Volume 8304 (Nice, France) (OPODIS 2013). Springer, Berlin, 206–220. DOI:
[27]
Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, and Francesco Quaglia. 2017. A conflict-resilient lock-free calendar queue for scalable share-everything PDES platforms. In Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (Singapore, Republic of Singapore) (SIGSIM-PADS’17). ACM, New York, NY, USA, 15–26. DOI:
[28]
John M. Mellor-Crummey and Michael L. Scott. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (1991), 21–65. DOI:
[29]
Cristina Montañola-Sales, Joan Francesc Gilabert-Navarro, Josep Casanovas-Garcia, Clara Prats, Daniel López, Joaquim Valls, Pere-Joan Cardona, and Cristina Vilaplana. 2015. Modeling tuberculosis in Barcelona: A solution to speed-up agent-based simulations. In Proceedings of the 2015 Winter Simulation Conference, Huntington Beach, CA, USA, December 6–9, 2015. IEEE/ACM, USA, 1295–1306. DOI:
[30]
Federica Montesano, Romolo Marotta, and Francesco Quaglia. 2022. Spatial/temporal locality-based load-sharing in speculative discrete event simulation on multi-core machines. In SIGSIM-PADS’22: SIGSIM Conference on Principles of Advanced Discrete Simulation, Atlanta, GA, USA, June 8–10, 2022, Kalyan Perumalla, Margaret Loper, Dong (Kevin) Jin, and Christopher D. Carothers (Eds.). ACM, New York, NY, USA, 81–92. DOI:
[31]
Federica Montesano, Romolo Marotta, and Francesco Quaglia. 2023. Artifact of “Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines” (TOMACS). DOI:
[32]
Avinash C. Palaniswamy and Philip A. Wilsey. 1993. Adaptive bounded time windows in an optimistically synchronized simulator. In 3rd Great Lakes Symposium on Design Automation of High Performance VLSI Systems, Kalamazoo, MI, USA, March 5–6, 1993. IEEE, USA, 114–118. DOI:
[33]
Alessandro Pellegrini and Francesco Quaglia. 2014. Wait-free global virtual time computation in shared memory time-warp systems. In Proceedings of the 26th International Conference on Computer Architecture and High Performance Computing (SBAC-PAD)., USA, 9–16. DOI:
[34]
Alessandro Pellegrini and Francesco Quaglia. 2015. NUMA time warp. In Proceedings of the 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (London, UK) (SIGSIM PADS’15). ACM, New York, NY, USA, 59–70. DOI:
[35]
Alessandro Pellegrini and Francesco Quaglia. 2019. Cross-state events: A new approach to parallel discrete event simulation and its speculative runtime support. J. Parallel and Distrib. Comput. 132 (2019), 48–68. DOI:
[36]
Bruno R. Preiss, Wayne M. Loucks, and Ian D. MacIntyre. 1994. Effects of the checkpoint interval on time and space in time warp. ACM Trans. Model. Comput. Simul. 4, 3 (1994), 223–253. DOI:
[37]
Francesco Quaglia. 1998. Event history based sparse state saving in time warp. In Proceedings of the 12th Workshop on Parallel and Distributed Simulation, PADS’98, Banff, Alberta, Canada, May 26–29, 1998, Brian W. Unger and Alois Ferscha (Eds.)., USA, 72–79. DOI:
[38]
Francesco Quaglia. 2015. A low-overhead constant-time lowest-timestamp-first CPU scheduler for high-performance optimistic simulation platforms. Simulation Modelling Practice and Theory 53 (2015), 103–122.
[39]
Francesco Quaglia and Vittorio Cortellessa. 2002. On the processor scheduling problem in time warp synchronization. ACM Trans. Model. Comput. Simul. 12, 3 (2002), 143–175. DOI:
[40]
Maryan Rab, Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, and Francesco Quaglia. 2021. NUMA-aware non-blocking calendar queue. In Proceedings of the IEEE/ACM 24th International Symposium on Distributed Simulation and Real Time Applications (Prague, Czech Republic) (DS-RT’20). IEEE Press, USA, 59–67.
[41]
Dhananjai Madhava Rao and Julius Higiro. 2019. Managing pending events in sequential and parallel simulations using three-tier heap and two-tier ladder queue. ACM Trans. Model. Comput. Simul. 29, 2 (2019), 9:1–9:28. DOI:
[42]
Daniel A. Reed and Jack J. Dongarra. 2015. Exascale computing and big data. Commun. ACM 58, 7 (2015), 56–68. DOI:
[43]
Peter L Reiher and David Jefferson. 1990. Virtual time based dynamic load management in the time warp operating system. Transactions of the Society for Computer Simulation 7, 2 (1990), 91–120.
[44]
Robert Rönngren, Michael Liljenstam, Rassul Ayani, and Johan Montagnat. 1996. Transparent incremental state saving in time warp parallel discrete event simulation. In Proceedings of the 10th Workshop on Parallel and Distributed Simulation, PADS’96, Philadelphia, PA, USA, May 22–24, 1996, Wayne M. Loucks and Bruno R. Preiss (Eds.)., USA, 70–77. DOI:
[45]
Caitlin J. Ross, Christopher D. Carothers, Misbah Mubarak, Robert B. Ross, Jianping Kelvin Li, and Kwan-Liu Ma. 2018. Leveraging shared memory in the ROSS time warp simulator for complex network simulations. In 2018 Winter Simulation Conference, WSC 2018, Gothenburg, Sweden, December 9–12, 2018, Björn Johansson and Sanjay Jain (Eds.). IEEE, USA, 3837–3848. DOI:
[46]
Markus Schordan, Tomas Oppelstrup, David R. Jefferson, and Peter D. Barnes Jr.2018. Generation of reversible C++ code for optimistic parallel discrete event simulation. New Gener. Comput. 36, 3 (2018), 257–280. DOI:
[47]
Emiliano Silvestri, Alessandro Pellegrini, Pierangelo di Sanzo, and Francesco Quaglia. 2022. Effective runtime management of tasks and priorities in GNU OpenMP applications. IEEE Trans. Computers 71, 10 (2022), 2632–2645. DOI:
[48]
Sudhir Srinivasan and Paul F. Reynolds Jr.1998. Elastic time. ACM Trans. Model. Comput. Simul. 8, 2 (1998), 103–139. DOI:
[49]
Brian Paul Swenson and George F. Riley. 2012. A new approach to zero-copy message passing with reversible memory allocation in multi-core architectures. In Proceedings of the 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation (PADS’12). IEEE, USA, 44–52. DOI:
[50]
Wenjie Tang, Yiping Yao, Tianlin Li, Xiao Song, and Feng Zhu. 2020. An adaptive persistence and work-stealing combined algorithm for load balancing on parallel discrete event simulation. ACM Trans. Model. Comput. Simul. 30, 2 (2020), 12:1–12:26. DOI:
[51]
W. T. Tang, R. S. M. Goh, and I. L. Thng. 2005. Ladder queue: An O(1) priority queue structure for large-scale discrete event simulation. ACM Transactions on Modeling and Computer Simulation 15, 3 (2005), 175–204.
[52]
R. Kent Treiber. 1986. Systems Programming: Coping with Parallelism. Technical Report. IBM, Inc., Thomas J. Watson Research.
[53]
Roberto Vitali, Alessandro Pellegrini, and Gionata Cerasuolo. 2012. Cache-aware memory manager for optimistic simulations. In International ICST Conference on Simulation Tools and Techniques, SIMUTOOLS’12, Sirmione-Desenzano, Italy, March 19–23, 2012, George F. Riley, Francesco Quaglia, and Jan Himmelspach (Eds.). ICST/ACM, Brussels, BEL, 129–138. DOI:
[54]
Roberto Vitali, Alessandro Pellegrini, and Francesco Quaglia. 2012. Load sharing for optimistic parallel simulations on multi core machines. ACM SIGMETRICS Performance Evaluation Review 40, 3 (Jan2012), 2–11. DOI:
[55]
Roberto Vitali, Alessandro Pellegrini, and Francesco Quaglia. 2012. Towards symmetric multi-threaded optimistic simulation kernels. In 26th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation, PADS 2012, Zhangjiajie, China, July 15–19, 2012., USA, 211–220. DOI:
[56]
Jingjing Wang, Deepak Jagtap, Nael B. Abu-Ghazaleh, and Dmitry Ponomarev. 2014. Parallel discrete event simulation for multi-core systems: Analysis and optimization. IEEE Transactions on Parallel and Distributed Systems 25, 6 (2014), 1574–1584. DOI:
[57]
Zhonge Xiao, Fabian Gomes, Brian W. Unger, and John G. Cleary. 1995. A fast asynchronous GVT algorithm for shared memory multiprocessor architectures. In Proceedings of the 9th Workshop on Parallel and Distributed Simulation, PADS’95, Lake Placid, New York, USA, June 14–16, 1995., USA, 203–208. DOI:

Cited By

View all
  • (2024)Introduction to the Special Issue on PADS 2022ACM Transactions on Modeling and Computer Simulation10.1145/369827335:1(1-3)Online publication date: 25-Nov-2024
  • (2024)Full-Stack Revision of Memory and Data Management in PDES on Multi-Core MachinesProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658831(417-420)Online publication date: 3-Jun-2024
  • (2024)Towards the Optimization of Memory and Data Management in Speculative PDES on Multi-Core MachinesProceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation10.1145/3615979.3662150(63-64)Online publication date: 24-Jun-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Modeling and Computer Simulation
ACM Transactions on Modeling and Computer Simulation  Volume 35, Issue 1
January 2025
170 pages
EISSN:1558-1195
DOI:10.1145/3696814
  • Editor:
  • Wentong Cai
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 November 2024
Online AM: 08 January 2024
Accepted: 07 December 2023
Revised: 25 November 2023
Received: 21 November 2022
Published in TOMACS Volume 35, Issue 1

Check for updates

Badges

Author Tags

  1. PDES
  2. load sharing
  3. shared memory
  4. multi-core

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)208
  • Downloads (Last 6 weeks)44
Reflects downloads up to 08 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Introduction to the Special Issue on PADS 2022ACM Transactions on Modeling and Computer Simulation10.1145/369827335:1(1-3)Online publication date: 25-Nov-2024
  • (2024)Full-Stack Revision of Memory and Data Management in PDES on Multi-Core MachinesProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658831(417-420)Online publication date: 3-Jun-2024
  • (2024)Towards the Optimization of Memory and Data Management in Speculative PDES on Multi-Core MachinesProceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation10.1145/3615979.3662150(63-64)Online publication date: 24-Jun-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Full Text

View this article in Full Text.

Full Text

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media