Article

Optimizations and oracle parallelism with dynamic translation

Authors: Kemal Ebcioğlu, Erik R. Altman, Michael Gschwind, and Sumedh Sathaye

MICRO 32: Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture

November 1999

Pages 284-295

Published: 16 November 1999

Abstract

We describe several optimizations that can be employed in a dynamic binary translation (DBT) system, where low compilation/translation overhead is essential. These optimizations achieve a high degree of instruction-level parallelism (ILP), sometimes even surpassing that of a static compiler employing more sophisticated and more time-consuming algorithms [9]. We present results in which we employ these optimizations in a dynamic binary translation system capable of computing oracle parallelism.
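The abstract does not define oracle parallelism. A common definition in limit studies such as [29] is the ILP reached by an idealized machine that issues each instruction as soon as its true data dependences are satisfied, with perfect branch prediction, unlimited functional units, and unlimited register renaming. The Python sketch below assumes that definition and measures it as instruction count divided by critical-path length over an executed trace. The `Op` record, the unit latencies, and the function name `oracle_parallelism` are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One executed instruction in a hypothetical dynamic trace."""
    dests: tuple[str, ...]  # registers or memory locations written
    srcs: tuple[str, ...]   # registers or memory locations read
    latency: int = 1        # assumed result latency in cycles


def oracle_parallelism(trace: list[Op]) -> float:
    """Dataflow-limit ILP of a trace (assumed definition, see lead-in).

    Each op starts as soon as all of its true (read-after-write) inputs are
    available. Branches never stall (perfect prediction), resources are
    unlimited, and every write gets a fresh name, so write-after-read and
    write-after-write conflicts never delay an op.
    """
    ready: dict[str, int] = {}  # cycle when the latest value of each name is available
    finish = 0                  # length of the critical path seen so far
    for op in trace:
        # Live-in names (never written in the trace) are ready at cycle 0.
        start = max((ready.get(s, 0) for s in op.srcs), default=0)
        done = start + op.latency
        for d in op.dests:
            ready[d] = done     # overwrite: later readers see the newest value
        finish = max(finish, done)
    return len(trace) / finish if finish else 0.0


# Usage: six ops whose longest dependence chain (r1 -> r3 -> r5) is 3 cycles.
trace = [
    Op(("r1",), ()),
    Op(("r2",), ()),
    Op(("r3",), ("r1", "r2")),
    Op(("r4",), ("r1",)),
    Op(("r5",), ("r3", "r4")),
    Op(("r1",), ()),            # redefines r1; renaming lets it issue in cycle 0
]
print(oracle_parallelism(trace))  # 6 ops / 3 cycles -> 2.0
```

Treating memory addresses as names in `dests`/`srcs` models perfect memory disambiguation; a real DBT system would also have to bound resources and account for mispredicted branches, which this sketch deliberately omits.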

References

[1] A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishers, Reading, MA, 1986.

[2] J.L. Baer and D.P. Bovet, Compilation of Arithmetic Expressions for Parallel Computations, Proceedings of IFIP Congress, North-Holland, Amsterdam, pp. 340-346, 1968.

[3] R. Brent, The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM, Vol. 21, No. 2, pp. 201-206, April 1974.

[4] R. Brent and R. Towle, On the Time Required to Parse an Arithmetic Expression for Parallel Processing, International Conference on Parallel Processing, edited by P.H. Enslow, p. 254, IEEE, August 1976.

[5] A. Chernoff, M. Herdeg, R. Hookway, C. Reeve, N. Rubin, T. Tye, S.B. Yadavalli, and J. Yates, FX!32: A Profile-Directed Binary Translator, IEEE Micro, Vol. 18, No. 2, pp. 56-64, March 1998.

[6] J. Cocke and J.T. Schwartz, Programming Languages and Their Compilers: Preliminary Notes, Technical Report, Courant Institute of Mathematical Sciences, New York University, 1970.

[7] K. Ebcioğlu, Some Design Ideas for a VLIW Architecture for Sequential-Natured Software, in Parallel Processing (Proceedings of IFIP WG 10.3 Working Conference on Parallel Processing), edited by M. Cosnard et al., pp. 3-21, North-Holland, 1988.

[8] K. Ebcioğlu and T. Nakatani, A New Compilation Technique for Parallelizing Loops with Unpredictable Branches on a VLIW Architecture, in Languages and Compilers for Parallel Computing, D. Gelernter, A. Nicolau, and D. Padua (eds.), Research Monographs in Parallel and Distributed Computing, pp. 213-224, MIT Press, 1990.

[9] K. Ebcioğlu and E. Altman, DAISY: Dynamic Compilation for 100% Architectural Compatibility, Report No. RC 20538, IBM T.J. Watson Research Center, Yorktown Heights, NY, 1996, http://www.research.ibm.com/vliw/pubs.html

[10] K. Ebcioğlu and E. Altman, DAISY: Dynamic Compilation for 100% Architectural Compatibility, Proceedings of ISCA-24, pp. 26-37, Denver, CO, June 1997.

[11] K. Ebcioğlu, E. Altman, S. Sathaye, and M. Gschwind, Execution-Based Scheduling for VLIW Architectures, to appear in Proceedings of Euro-Par '99, Toulouse, France, August/September 1999.

[12] M. Emami, R. Ghiya, and L.J. Hendren, Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers, Proceedings of SIGPLAN PLDI, pp. 242-256, Orlando, FL, June 1994.

[13] L.J. Hendren, J. Hummel, and A. Nicolau, Abstractions for Recursive Pointer Data Structures: Improving the Analysis and Transformation of Imperative Programs, Proceedings of SIGPLAN PLDI, pp. 249-260, San Francisco, CA, June 1992.

[14] IBM and Motorola, The PowerPC Microprocessor Family: The Programming Environments Manual for 32-Bit Microprocessors, www.mot.com/SPS/PowerPC/teksupport/teklibrary/manuals/pem32b.pdf

[15] M.S. Lam and R.P. Wilson, Limits of Control Flow on Parallelism, Proceedings of ISCA-19, pp. 46-57, Gold Coast, Australia, May 1992.

[16] L. Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions on Computers, Vol. 28, No. 9, pp. 690-691, September 1979.

[17] M.H. Lipasti and J.P. Shen, Exceeding the Dataflow Limit via Value Prediction, Proceedings of Micro-29, Paris, France, December 1996.

[18] C. May, MIMIC: A Fast System/370 Simulator, Proceedings of SIGPLAN '87 Symposium on Interpreters and Interpretive Techniques, pp. 1-13, St. Paul, MN, June 1987.

[19] A. Moshovos and G. Sohi, Streamlining Inter-operation Memory Communication via Data Dependence Prediction, Proceedings of Micro-30, Research Triangle Park, NC, December 1997.

[20] M. Moudgill and J. Moreno, Run-time Detection and Recovery from Incorrectly Ordered Memory Operations, Report No. RC 20857, IBM T.J. Watson Research Center, Yorktown Heights, NY, 1997, http://www.research.ibm.com/vliw/pubs.html

[21] T. Nakatani and K. Ebcioğlu, Combining as a Compilation Technique for VLIW Architectures, Proceedings of Micro-22, pp. 43-57, Dublin, Ireland, August 1989.

[22] A. Nicolau, Percolation Scheduling: A Parallel Compilation Technique, TR 85-678, Department of Computer Science, Cornell University, 1985.

[23] A. Nicolau and R. Potasman, Incremental Tree Height Reduction for High Level Synthesis, Proceedings of the 28th ACM/IEEE Design Automation Conference, pp. 770-774, San Francisco, CA, June 1991.

[24] B.R. Rau and C.D. Glaeser, Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing, Proceedings of Micro-14, pp. 183-198, 1981.

[25] G.M. Silberman and K. Ebcioğlu, An Architectural Framework for Supporting Heterogeneous Instruction-Set Architectures, IEEE Computer, Vol. 26, No. 6, pp. 39-56, June 1993.

[26] J.E. Smith, S. Sastry, T. Heil, T. Bezenek, M. Zhong, and V. Iyengar, Achieving High Performance via Co-Designed Virtual Machines, http://www.ece.wisc.edu/~es/pitches/vms.ps, November 5, 1998.

[27] Sun Microsystems, The Java HotSpot Performance Engine Architecture, http://java.sun.com/products/hotspot/whitepaper.html, April 27, 1999.

[28] K.B. Theobald, G.R. Gao, and L.J. Hendren, On the Limits of Program Parallelism and Its Smoothability, Proceedings of Micro-25, pp. 10-19, Portland, OR, December 1992.

[29] D.W. Wall, Limits of Instruction-Level Parallelism, Proceedings of ASPLOS-IV, pp. 176-188, Santa Clara, CA, April 1991.

[30] E. Witchel and M. Rosenblum, Embra: Fast and Flexible Machine Simulation, Proceedings of ACM SIGMETRICS '96, pp. 68-79, Philadelphia, PA, May 1996.

Cited By

Ebcioglu K, San I (2022). Highly Parallel Multi-FPGA System Compilation from Sequential C/C++ Code in the AWS Cloud. ACM Transactions on Reconfigurable Technology and Systems 15:4, pp. 1-42. DOI: 10.1145/3507698. Online publication date: 8-Aug-2022.
https://dl.acm.org/doi/10.1145/3507698
Rong H, Park H, Wu Y, Wang C (2014). Just-In-Time Software Pipelining. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 11-22. DOI: 10.1145/2581122.2544148. Online publication date: 15-Feb-2014.
https://dl.acm.org/doi/10.1145/2581122.2544148
Rong H, Park H, Wu Y, Wang C (2014). Just-In-Time Software Pipelining. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 11-22. DOI: 10.1145/2544137.2544148. Online publication date: 15-Feb-2014.
https://dl.acm.org/doi/10.1145/2544137.2544148

Index Terms

Optimizations and oracle parallelism with dynamic translation
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
2. Hardware
  1. Electronic design automation
    1. Methodologies for EDA
  2. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits
      2. Design modules and hierarchy


Information

Published In

ACM Conferences

MICRO 32: Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture

November 1999

299 pages

ISBN: 076950437X

Chairmen: Ronny Ronen (Intel Israel), Matthew Farrens (Univ. of California, Davis), Ilan Spillinger (Intel Israel)

Copyright © 1998 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

IEEE TC - MICRO
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Computer Society

United States


Qualifiers

Article

Conference

MICRO99

Sponsor:

IEEE TC - MICRO
SIGMICRO

MICRO99: 32nd Annual ACM/IEEE International Symposium on Microarchitecture

November 16 - 18, 1999

Haifa, Israel

Acceptance Rates

MICRO 32 Paper Acceptance Rate 27 of 131 submissions, 21%;

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics

Total Citations: 8

Total Downloads: 314

Downloads (last 12 months): 10

Downloads (last 6 weeks): 2

Citations

Porto J, Araujo G, Borin E, Wu Y (2010). Trace execution automata in dynamic binary translation. Proceedings of the 2010 international conference on Computer Architecture, pp. 99-116. DOI: 10.1007/978-3-642-24322-6_10. Online publication date: 19-Jun-2010.
https://dl.acm.org/doi/10.1007/978-3-642-24322-6_10
Gangwar A, Balakrishnan M, Panda P, Kumar A (2007). Evaluation of bus based interconnect mechanisms in clustered VLIW architectures. International Journal of Parallel Programming 35:6, pp. 507-527. DOI: 10.1007/s10766-007-0045-2. Online publication date: 1-Dec-2007.
https://dl.acm.org/doi/10.1007/s10766-007-0045-2
Zilles C, Sohi G (2000). Understanding the backward slices of performance degrading instructions. ACM SIGARCH Computer Architecture News 28:2, pp. 172-181. DOI: 10.1145/342001.339676. Online publication date: 1-May-2000.
https://dl.acm.org/doi/10.1145/342001.339676
Zilles C, Sohi G, Berenbaum A, Emer J (2000). Understanding the backward slices of performance degrading instructions. Proceedings of the 27th annual international symposium on Computer architecture, pp. 172-181. DOI: 10.1145/339647.339676. Online publication date: 10-Jun-2000.
https://dl.acm.org/doi/10.1145/339647.339676
Gschwind M, Ebcioğlu K, Altman E, Sathaye S, Reynders J, Veidenbaum A (2000). Binary translation and architecture convergence issues for IBM system/390. Proceedings of the 14th international conference on Supercomputing, pp. 336-347. DOI: 10.1145/335231.335264. Online publication date: 8-May-2000.
https://dl.acm.org/doi/10.1145/335231.335264
