
Whole-program optimization for time and space efficient threads

Published: 01 September 1996

Abstract

Modern languages and operating systems often encourage programmers to use threads, or independent control streams, to mask the overhead of some operations and to simplify program structure. Multitasking operating systems use threads to mask communication latency, either with hardware devices or with users. Client-server applications typically use threads to simplify the complex control flow that arises when multiple clients are served. Recently, the scientific computing community has started using threads to mask network communication latency in massively parallel architectures, allowing computation and communication to be overlapped. Lastly, some architectures implement threads in hardware and use those threads to tolerate memory latency.

In general, it would be desirable if threaded programs could be written to expose the largest degree of parallelism possible, or simply to simplify the program design. However, threads incur time and space overheads, and programmers often compromise simple designs for performance. In this paper, we show how to reduce the time and space overhead of threads using control-flow and register-liveness information inferred after compilation. Our techniques work on binaries, are not specific to a particular compiler or thread library, and reduce the overall execution time of fine-grain threaded programs by ≈ 15-30%. We use execution-driven analysis and an instrumented operating system to show why the execution time is reduced and to indicate areas for future work.
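The reduction in thread overhead rests on knowing, at each point where a thread may be suspended, which registers are actually live. As a rough illustration only (not the paper's implementation, which operates directly on compiled binaries), the following Python sketch computes standard backward register liveness over a small, hypothetical control-flow graph; a context switch placed in the "switch" block would then only need to preserve the registers reported live across that point. The block names, register names, and use/def sets are illustrative assumptions.

    # Minimal sketch: iterative backward liveness dataflow over a CFG.
    # live_in[b]  = use[b] | (live_out[b] - defs[b])
    # live_out[b] = union of live_in[s] over successors s of b
    from collections import defaultdict

    def liveness(blocks, succs, use, defs):
        live_in = defaultdict(set)
        live_out = defaultdict(set)
        changed = True
        while changed:
            changed = False
            for b in blocks:
                out = set()
                for s in succs.get(b, []):
                    out |= live_in[s]
                new_in = use[b] | (out - defs[b])
                if out != live_out[b] or new_in != live_in[b]:
                    live_out[b], live_in[b] = out, new_in
                    changed = True
        return live_in, live_out

    # Hypothetical three-block thread body: values defined before the
    # suspension point ("switch") and used after it are live across it.
    blocks = ["entry", "switch", "exit"]
    succs  = {"entry": ["switch"], "switch": ["exit"], "exit": []}
    use    = {"entry": set(),          "switch": set(), "exit": {"r9", "r10"}}
    defs   = {"entry": {"r9", "r10"},  "switch": set(), "exit": set()}

    live_in, live_out = liveness(blocks, succs, use, defs)
    print(sorted(live_out["switch"]))   # ['r10', 'r9'] -> only these need saving

The fixed point converges in a few passes on an acyclic graph like this one; real binaries require iterating over loops until the sets stabilize, which is what makes post-compilation analysis of the whole program worthwhile.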




Published In

ACM SIGOPS Operating Systems Review, Volume 30, Issue 5
Dec. 1996, 273 pages
ISSN: 0163-5980
DOI: 10.1145/248208
  • ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
    October 1996, 290 pages
    ISBN: 0897917677
    DOI: 10.1145/237090
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 1996
Published in SIGOPS Volume 30, Issue 5


Qualifiers

  • Article


Article Metrics

  • Downloads (last 12 months): 88
  • Downloads (last 6 weeks): 27
Reflects downloads up to 13 Sep 2024


Cited By

  • (2013) Compiler support for lightweight context switching. ACM Transactions on Architecture and Code Optimization, 9(4), 1-25. DOI: 10.1145/2400682.2400695. Online publication date: 20-Jan-2013.
  • (2018) Semi-Extended Tasks: Efficient Stack Sharing Among Blocking Threads. 2018 IEEE Real-Time Systems Symposium (RTSS), 338-349. DOI: 10.1109/RTSS.2018.00049. Online publication date: Dec-2018.
  • (2009) Eliminating the call stack to save RAM. ACM SIGPLAN Notices, 44(7), 60-69. DOI: 10.1145/1543136.1542461. Online publication date: 19-Jun-2009.
  • (2009) Eliminating the call stack to save RAM. Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, 60-69. DOI: 10.1145/1542452.1542461. Online publication date: 19-Jun-2009.
  • (2008) MTSS. ACM Transactions on Embedded Computing Systems, 7(4), 1-37. DOI: 10.1145/1376804.1376814. Online publication date: 1-Aug-2008.
  • (2007) Offline compression for on-chip RAM. ACM SIGPLAN Notices, 42(6), 363-372. DOI: 10.1145/1273442.1250776. Online publication date: 10-Jun-2007.
  • (2007) Offline compression for on-chip RAM. Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, 363-372. DOI: 10.1145/1250734.1250776. Online publication date: 15-Jun-2007.
  • (2005) MTSS. Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, 191-201. DOI: 10.1145/1086297.1086323. Online publication date: 24-Sep-2005.
  • (2004) Balancing register allocation across threads for a multithreaded network processor. ACM SIGPLAN Notices, 39(6), 289-300. DOI: 10.1145/996893.996876. Online publication date: 9-Jun-2004.
  • (2004) Balancing register allocation across threads for a multithreaded network processor. Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, 289-300. DOI: 10.1145/996841.996876. Online publication date: 9-Jun-2004.
