Dynamo: a transparent dynamic optimization system

Published: 01 May 2000

Abstract

We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. The input native instruction stream to Dynamo can be dynamically generated (by a JIT, for example), or it can come from the execution of a statically compiled native binary. This paper evaluates the Dynamo system in the latter, more challenging situation, in order to emphasize the limits, rather than the potential, of the system. Our experiments demonstrate that even statically optimized native binaries can be accelerated by Dynamo, and often by a significant degree. For example, the average performance of -O optimized SpecInt95 benchmark binaries created by the HP product C compiler is improved to a level comparable to their -O4 optimized version running without Dynamo. Dynamo achieves this by focusing its efforts on optimization opportunities that tend to manifest only at runtime, and hence opportunities that might be difficult for a static compiler to exploit. Dynamo's operation is transparent in the sense that it does not depend on any user annotations or binary instrumentation, and does not require multiple runs, or any special compiler, operating system or hardware support. The Dynamo prototype presented here is a realistic implementation running on an HP PA-8000 workstation under the HPUX 10.20 operating system.

Published In

ACM SIGPLAN Notices, Volume 35, Issue 5
May 2000
357 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/358438

PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
August 2000
358 pages
ISBN: 1581131992
DOI: 10.1145/349299
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article
