Compiler support for lightweight context switching

Published: 20 January 2013

Abstract

We propose a new language-neutral primitive for the LLVM compiler that provides efficient context switching and message passing between lightweight threads of control. The primitive, called Swapstack, can be used by any LLVM-based language implementation to build higher-level language structures such as continuations, coroutines, and lightweight threads. As part of adding the primitives to LLVM, we have also added compiler support for passing parameters across context switches. Our modified LLVM compiler produces highly efficient code by exposing the context-switching code to existing compiler optimizations and by adding novel compiler optimizations that further reduce the cost of context switches. To demonstrate the generality and efficiency of our primitives, we add one-shot continuations to C++ and provide a simple fiber library that allows millions of fibers to run on multiple cores, with a work-stealing scheduler and fast inter-fiber synchronization. We argue that compiler-supported lightweight context switching can be significantly faster than using a library to switch between contexts, and we provide experimental evidence to support this position.
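The pattern the abstract describes, switching between lightweight threads of control while passing a value across each switch, can be illustrated with a small sketch. This is not Swapstack and says nothing about its performance or its LLVM-level interface. It uses Python generators only to show the control flow. The names `worker` and `run_round_robin`, and the reply convention of upper-casing the message, are hypothetical and chosen for illustration.

```python
from collections import deque


def worker(name, rounds):
    """A lightweight thread of control (hypothetical example).

    Each `yield` acts as a context switch: it hands a message to the
    scheduler and receives a reply when the worker is resumed.
    """
    received = []
    for i in range(rounds):
        reply = yield f"{name}:{i}"  # switch out, passing a value across the switch
        received.append(reply)
    return received


def run_round_robin(fibers):
    """Toy scheduler (an assumption, not the paper's work-stealing scheduler).

    Resumes each fiber in turn until all have finished and returns the
    messages in the order they were produced.
    """
    ready = deque((fiber, None) for fiber in fibers)  # first resume must send None
    log = []
    while ready:
        fiber, message = ready.popleft()
        try:
            out = fiber.send(message)  # switch into the fiber, passing `message`
        except StopIteration:
            continue  # this fiber has run to completion
        log.append(out)
        ready.append((fiber, out.upper()))  # reply delivered on the next resume
    return log
```

Calling `run_round_robin([worker("a", 2), worker("b", 2)])` interleaves the two workers, with each resume delivering the scheduler's reply to the point where the worker last yielded.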


Published In

ACM Transactions on Architecture and Code Optimization, Volume 9, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2013
876 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2400682

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 January 2013
Accepted: 01 October 2012
Revised: 01 October 2012
Received: 01 June 2012
Published in TACO Volume 9, Issue 4

Author Tags

  1. Compiler
  2. continuation
  3. fiber
  4. synchronization

Qualifiers

  • Research-article
  • Research
  • Refereed
