AIR: Application-Level Interference Resilience for PDES on Multicore Systems

Published: 16 April 2015

Abstract

Parallel discrete event simulation (PDES) harnesses parallel processing to improve the performance and capacity of simulation, supporting bigger and more detailed models simulated for more scenarios. The presence of interference from other users can lead to dramatic slowdown in the performance of the simulation. Interference is typically managed using operating system scheduling support (e.g., gang scheduling), a heavyweight approach with some drawbacks. We propose an application-level approach to interference resilience through alternative simulation scheduling and mapping algorithms. More precisely, the most resilient simulators allow dynamic mapping of simulation event execution to processing resources (a work pool model). However, this model has significant scheduling overhead and poor cache locality. Thus, we investigate using application-level interference mitigation where the application detects the presence of interference and reacts by changing the thread task allocation. Specifically, we propose a locality-aware adaptive dynamic mapping (LADM) algorithm that adjusts the number of active threads on the fly by detecting the presence of interference. LADM avoids having the application stall when threads are inactive due to context switching. We investigate different mechanisms for monitoring the level of interference and different approaches for remapping tasks. We show that LADM can substantially reduce the impact of interference while maintaining memory locality.
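
The detect-and-remap idea in the abstract can be illustrated with a minimal sketch. It is not the authors' LADM implementation. The interference test (ratio of CPU time to wall-clock time over a monitoring window), the 0.8 threshold, and the names `ThreadSample`, `active_thread_ids`, and `remap` are all assumptions made for illustration. Tasks are moved in whole groups so that tasks sharing state stay on one thread, which is one simple way to keep locality.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    """CPU and wall-clock time seen by one worker in a monitoring window (hypothetical)."""
    cpu_seconds: float
    wall_seconds: float


def active_thread_ids(samples, threshold=0.8):
    """Return indices of workers whose CPU share is at or above the threshold.

    Assumed heuristic: a worker that got much less CPU time than wall time
    was probably descheduled in favour of an interfering process.
    """
    return [i for i, s in enumerate(samples)
            if s.wall_seconds > 0 and s.cpu_seconds / s.wall_seconds >= threshold]


def remap(groups, active):
    """Move whole task groups from inactive workers to active ones.

    groups maps worker id -> list of task ids that share state. Groups on
    active workers stay where they are; each orphaned group goes, intact,
    to the least-loaded active worker (ties broken by lowest worker id).
    """
    if not active:
        raise ValueError("at least one worker must remain active")
    mapping = {w: list(groups.get(w, [])) for w in active}
    for w in sorted(groups):
        if w in mapping:
            continue
        target = min(active, key=lambda a: (len(mapping[a]), a))
        mapping[target].extend(groups[w])
    return mapping


samples = [ThreadSample(1.0, 1.0), ThreadSample(0.4, 1.0),
           ThreadSample(0.95, 1.0), ThreadSample(0.9, 1.0)]
groups = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
active = active_thread_ids(samples)
new_mapping = remap(groups, active)
```

In this example worker 1 received only 40% of the window, so it is treated as inactive and its task group is handed to worker 0 as a unit. A real simulator would also have to migrate pending events and respect synchronization, which the sketch leaves out.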

Published In

ACM Transactions on Modeling and Computer Simulation, Volume 25, Issue 3
May 2015
146 pages
ISSN:1049-3301
EISSN:1558-1195
DOI:10.1145/2764453
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 April 2015
Accepted: 01 December 2014
Revised: 01 June 2014
Received: 01 February 2014
Published in TOMACS Volume 25, Issue 3

Author Tags

  1. Interference
  2. PDES
  3. application adaptation
  4. proportional slowdown

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Science Foundation
  • Air Force Research Laboratory

Cited By

  • (2024) Sampling-based adaptive design strategy for failure probability estimation. Reliability Engineering & System Safety 241 (109664). DOI: 10.1016/j.ress.2023.109664. Online publication date: Jan-2024.
  • (2019) Cross-state events. Journal of Parallel and Distributed Computing 132:C (48-68). DOI: 10.1016/j.jpdc.2019.05.003. Online publication date: 1-Oct-2019.
  • (2018) Optimizing simulation on shared-memory platforms. Proceedings of the 2018 Winter Simulation Conference (1969-1980). DOI: 10.5555/3320516.3320753. Online publication date: 9-Dec-2018.
  • (2018) The Ultimate Share-Everything PDES System. Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (73-84). DOI: 10.1145/3200921.3200931. Online publication date: 14-May-2018.
