research-article

Performance Analysis of Work Stealing Strategies in Large-Scale Multithreaded Computing

Published: 26 October 2023

Abstract

Distributed systems use randomized work stealing to improve performance and resource utilization. In most prior analytical studies of randomized work stealing, jobs are considered to be sequential and are executed as a whole on a single server. In this article, we consider a homogeneous system of servers where parent jobs spawn child jobs that can feasibly be executed in parallel. When an idle server probes a busy server in an attempt to steal work, it may either steal a parent job or multiple child jobs.
To approximate the performance of this system, we introduce a Quasi-Birth-Death Markov chain and express the performance measures of interest via its unique steady state. We perform simulation experiments that suggest that the approximation error tends to zero as the number of servers in the system becomes large. To further support this observation, we introduce a mean field model and show that its unique fixed point corresponds to the steady state of the Quasi-Birth-Death Markov chain. Using numerical experiments, we compare the performance of various simple stealing strategies as well as optimized strategies.
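
The stealing mechanism described in the abstract can be illustrated with a small event-driven simulation. This is only a toy sketch under stated assumptions, not the article's Quasi-Birth-Death chain or its mean field model: parent jobs arrive at each server as a Poisson process, a parent spawns a fixed number `k` of child jobs when it starts, every task takes an exponential amount of time, and an idle server that finds pending child jobs steals half of them (otherwise one waiting parent job). The function name `simulate`, its parameters, and the steal-half rule are all hypothetical choices made for illustration.

```python
import random


def simulate(n=20, lam=0.2, mu=1.0, k=3, probe_rate=1.0, horizon=2000.0, seed=1):
    """Toy simulation of randomized work stealing (illustrative assumptions only).

    n: servers, lam: parent arrival rate per server, mu: task service rate,
    k: children spawned per parent, probe_rate: probe rate of an idle server.
    """
    rng = random.Random(seed)
    parents = [0] * n     # waiting parent jobs per server
    children = [0] * n    # pending (not yet started) child jobs per server
    busy = [False] * n    # True while a server executes one task
    t = 0.0
    busy_area = 0.0       # time integral of the number of busy servers
    task_area = 0.0       # time integral of the number of outstanding tasks

    while True:
        n_busy = sum(busy)
        n_idle = n - n_busy
        total_rate = n * lam + n_busy * mu + n_idle * probe_rate
        dt = min(rng.expovariate(total_rate), horizon - t)
        # A waiting parent counts as 1 + k tasks (itself plus its children).
        outstanding = n_busy + sum(children) + (1 + k) * sum(parents)
        busy_area += n_busy * dt
        task_area += outstanding * dt
        t += dt
        if t >= horizon:
            break

        u = rng.random() * total_rate
        if u < n * lam:
            # Arrival of a parent job at a uniformly chosen server.
            i = rng.randrange(n)
            if busy[i]:
                parents[i] += 1
            else:
                busy[i] = True
                children[i] += k
        elif u < n * lam + n_busy * mu:
            # A busy server finishes its current task and picks the next one.
            i = rng.choice([s for s in range(n) if busy[s]])
            if children[i] > 0:
                children[i] -= 1
            elif parents[i] > 0:
                parents[i] -= 1
                children[i] += k
            else:
                busy[i] = False
        else:
            # An idle server probes one other server chosen uniformly at random.
            i = rng.choice([s for s in range(n) if not busy[s]])
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            if children[j] > 0:
                # Assumed strategy: steal half of the pending children (at least one).
                take = max(1, children[j] // 2)
                children[j] -= take
                children[i] += take - 1   # one stolen child starts immediately
                busy[i] = True
            elif parents[j] > 0:
                # Otherwise steal one waiting parent job, which spawns on the thief.
                parents[j] -= 1
                children[i] += k
                busy[i] = True

    return {
        "utilization": busy_area / (n * horizon),
        "mean_tasks": task_area / (n * horizon),
    }


if __name__ == "__main__":
    for r in (0.0, 1.0, 5.0):
        print(r, simulate(probe_rate=r))
```

With the default parameters the offered load is `lam * (1 + k) / mu = 0.8`, so the measured utilization should stay near that value for any probe rate, while the mean number of outstanding tasks per server is the quantity a stealing strategy is meant to reduce. Transfer delays and probe costs are ignored here.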

Cited By

  • (2023) Introduction to the Special Issue on QEST 2021. ACM Transactions on Modeling and Computer Simulation 33:4 (1-2). DOI: 10.1145/3631707. Online publication date: 30-Nov-2023

Information & Contributors

Information

Published In

ACM Transactions on Modeling and Computer Simulation  Volume 33, Issue 4
October 2023
175 pages
ISSN:1049-3301
EISSN:1558-1195
DOI:10.1145/3630105
  • Editor: Wentong Cai

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 October 2023
Online AM: 16 February 2023
Accepted: 07 February 2023
Revised: 17 August 2022
Received: 19 January 2022
Published in TOMACS Volume 33, Issue 4

Author Tags

  1. Matrix analytic methods
  2. distributed computing

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 99
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 10 Oct 2024
