DOI: 10.5555/320080.320124

Optimizations and oracle parallelism with dynamic translation

Published: 16 November 1999

    Abstract

    We describe several optimizations that can be employed in a dynamic binary translation (DBT) system, where low compilation/translation overhead is essential. These optimizations achieve a high degree of instruction-level parallelism (ILP), sometimes even surpassing a static compiler that employs more sophisticated and more time-consuming algorithms [9]. We present results obtained by applying these optimizations in a dynamic binary translation system capable of computing oracle parallelism.

    References

    [1]
    A.V. Aho, R. Sethi and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishers, Reading, MA, 1986.
    [2]
    J.L. Baer and D.P. Bovet, Compilation of Arithmetic Expressions for Parallel Computations, Proceedings of IFIP Congress, North-Holland, Amsterdam, pp. 340-346, 1968.
    [3]
    R. Brent, The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM, Vol. 21, No. 2, pp. 201-206, April 1974.
    [4]
    R. Brent and R. Towle, On the Time Required to Parse an Arithmetic Expression for Parallel Processing, International Conference on Parallel Processing, edited by P.H. Enslow, p. 254, IEEE, August 1976.
    [5]
    A. Chernoff, M. Herdeg, R. Hookway, C. Reeve, N. Rubin, T. Tye, S. B. Yadavalli, J. Yates, FX!32: A Profile-Directed Binary Translator, IEEE Micro, Vol. 18, No. 2, pp. 56-64, March 1998.
    [6]
    J. Cocke and J.T. Schwartz, Programming Languages and Their Compilers: Preliminary Notes, Technical Report, Courant Institute of Mathematical Sciences, New York University, 1970.
    [7]
    K. Ebcioğlu, Some Design Ideas for a VLIW Architecture for Sequential-Natured Software, In Parallel Processing (Proceedings of IFIP WG 10.3 Working Conference on Parallel Processing), edited by M. Cosnard et al., pp. 3-21, North Holland, 1988.
    [8]
    K. Ebcioğlu and T. Nakatani, A New Compilation Technique for Parallelizing Loops with Unpredictable Branches on a VLIW Architecture, In Languages and Compilers for Parallel Computing, D. Gelernter, A. Nicolau, and D. Padua (eds.), Research Monographs in Parallel and Distributed Computing, pp. 213-224, MIT Press, 1990.
    [9]
    K. Ebcioğlu and E. Altman, DAISY: Dynamic Compilation for 100% Architectural Compatibility, Report No. RC 20538, IBM T.J. Watson Research Center, Yorktown Heights, NY, 1996, http://www.research.ibm.com/vliw/pubs.html
    [10]
    K. Ebcioğlu and E. Altman, DAISY: Dynamic Compilation for 100% Architectural Compatibility, Proceedings of ISCA-24, pp. 26-37, Denver, CO, June 1997.
    [11]
    K. Ebcioğlu, E. Altman, S. Sathaye, and M. Gschwind, Execution-based Scheduling for VLIW Architectures, to appear in Proceedings of Europar-99, Toulouse, France, August/September 1999.
    [12]
    M. Emami, R. Ghiya, and L.J. Hendren, Context-sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers, Proceedings of SIGPLAN PLDI, pp. 242-256, Orlando, FL, June 1994.
    [13]
    L.J. Hendren, J. Hummel, and A. Nicolau, Abstractions for Recursive Pointer Data Structures: Improving the Analysis and Transformation of Imperative Programs, Proceedings of SIGPLAN PLDI, pp. 249-260, San Francisco, CA, June 1992.
    [14]
    IBM and Motorola, The PowerPC Microprocessor Family: The Programming Environments Manual for 32-Bit Microprocessors, www.mot.com/SPS/PowerPC/teksupport/teklibrary/manuals/pem32b, pc
    [15]
    M. S. Lam and R. P. Wilson, Limits of Control Flow on Parallelism, Proceedings of ISCA-19, pp. 46-57, Gold Coast, Australia, May 1992.
    [16]
    Leslie Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions on Computers, Vol. 28, No. 9, pp. 690-691, September 1979.
    [17]
    M.H. Lipasti and J.P. Shen, Exceeding the Dataflow Limit via Value Prediction, Proceedings of Micro-29, Paris, France, December 1996.
    [18]
    C. May, MIMIC: A Fast System/370 Simulator, Proceedings of SIGPLAN'87 Symposium on Interpreters and Interpretive Techniques, pp. 1-13, St. Paul, MN, June 1987.
    [19]
    A. Moshovos and G. Sohi, Streamlining Inter-operation Memory Communication via Data Dependence Prediction, Proceedings of Micro-30, Research Triangle Park, NC, December 1997.
    [20]
    M. Moudgill and J. Moreno, Run-time Detection and Recovery from Incorrectly Ordered Memory Operations, Report No. RC 20857, IBM T.J. Watson Research Center, Yorktown Heights, NY, 1997, http://www.research.ibm.com/vliw/pubs.html
    [21]
    T. Nakatani and K. Ebcioğlu, Combining as a Compilation Technique for VLIW Architectures, Proceedings of Micro-22, pp. 43-57, Dublin, Ireland, August 1989.
    [22]
    A. Nicolau, Percolation Scheduling: A Parallel Compilation Technique, TR 85-678, Department of Computer Science, Cornell University, 1985.
    [23]
    A. Nicolau and R. Potasman, Incremental Tree Height Reduction for High Level Synthesis, Proceedings of the 28th ACM/IEEE Design Automation Conference, pp. 770-774, San Francisco, CA, June 1991.
    [24]
    B.R. Rau and C.D. Glaeser, Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing, Proceedings of Micro-14, pp. 183-198, 1981.
    [25]
    G.M. Silberman and K. Ebcioğlu, An Architectural Framework for Supporting Heterogeneous Instruction-Set Architectures, IEEE Computer, Vol. 26, No. 6, pp. 39-56, June 1993.
    [26]
    J.E. Smith, S. Sastry, T. Heil, T. Bezenek, M. Zhong, and V. Iyengar, Achieving High Performance via Co-Designed Virtual Machines, http://www.ece.wisc.edu/~es/pitches/vms.ps, November 5, 1998.
    [27]
    Sun Microsystems, The Java Hotspot Performance Engine Architecture, http://java.sun.com/products/hotspot/whitepaper.html, April 27, 1999.
    [28]
    K. B. Theobald, G. R. Gao and L. J. Hendren, On the Limits of Program Parallelism and its Smoothability, Proceedings of Micro-25, pp. 10-19, Portland, OR, December 1992.
    [29]
    D.W. Wall, Limits of Instruction-Level Parallelism, Proceedings of ASPLOS-IV, pp. 176-188, Santa Clara, CA, April 1991.
    [30]
    E. Witchel and M. Rosenblum, Embra: Fast and Flexible Machine Simulation, Proceedings of ACM SIGMETRICS'96, pp. 68-79, Philadelphia, PA, May 1996.

    Published In

    MICRO 32: Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
    November 1999
    299 pages
    ISBN: 076950437X

    Publisher

    IEEE Computer Society

    United States

    Qualifiers

    • Article

    Conference

    MICRO99

    Acceptance Rates

    MICRO 32 Paper Acceptance Rate 27 of 131 submissions, 21%;
    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Cited By

    • (2022) Highly Parallel Multi-FPGA System Compilation from Sequential C/C++ Code in the AWS Cloud. ACM Transactions on Reconfigurable Technology and Systems, 15:4, pp. 1-42. DOI: 10.1145/3507698. Online publication date: 8-Aug-2022
    • (2014) Just-In-Time Software Pipelining. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 11-22. DOI: 10.1145/2581122.2544148. Online publication date: 15-Feb-2014
    • (2014) Just-In-Time Software Pipelining. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 11-22. DOI: 10.1145/2544137.2544148. Online publication date: 15-Feb-2014
    • (2010) Trace execution automata in dynamic binary translation. Proceedings of the 2010 international conference on Computer Architecture, pp. 99-116. DOI: 10.1007/978-3-642-24322-6_10. Online publication date: 19-Jun-2010
    • (2007) Evaluation of bus based interconnect mechanisms in clustered VLIW architectures. International Journal of Parallel Programming, 35:6, pp. 507-527. DOI: 10.1007/s10766-007-0045-2. Online publication date: 1-Dec-2007
    • (2000) Understanding the backward slices of performance degrading instructions. ACM SIGARCH Computer Architecture News, 28:2, pp. 172-181. DOI: 10.1145/342001.339676. Online publication date: 1-May-2000
    • (2000) Understanding the backward slices of performance degrading instructions. Proceedings of the 27th annual international symposium on Computer architecture, pp. 172-181. DOI: 10.1145/339647.339676. Online publication date: 10-Jun-2000
    • (2000) Binary translation and architecture convergence issues for IBM system/390. Proceedings of the 14th international conference on Supercomputing, pp. 336-347. DOI: 10.1145/335231.335264. Online publication date: 8-May-2000
