
Ispike: A Post-link Optimizer for the Intel® Itanium® Architecture

Published: 20 March 2004
    Abstract

    Ispike is a post-link optimizer developed for the Intel® Itanium Processor Family (IPF) processors. The IPF architecture poses both opportunities and challenges to post-link optimizations. IPF offers a rich set of performance counters to collect detailed profile information at a low cost, which is essential to post-link optimization being practical. At the same time, the predication and bundling features on IPF make post-link code transformation more challenging than on other architectures. In Ispike, we have implemented optimizations like code layout, instruction prefetching, data layout, and data prefetching that exploit the IPF advantages, and strategies that cope with the IPF-specific challenges. Using SPEC CINT2000 as benchmarks, we show that Ispike improves performance by as much as 40% on the Itanium® 2 processor, with average improvement of 8.5% and 9.9% over executables generated by the Intel® Electron compiler and by the Gcc compiler, respectively. We also demonstrate that statistical profiles collected via IPF performance counters and complete profiles collected via instrumentation produce equal performance benefit, but the profiling overhead is significantly lower for performance counters.
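
The abstract names code layout as one of the profile-driven optimizations but does not describe the algorithm. As a rough illustration only, the sketch below shows a generic greedy chaining heuristic of the kind often used for profile-guided block placement: hot edges are turned into fall-throughs first. It is not Ispike's implementation. The function name `chain_blocks`, the edge-count dictionary, and the block names are all hypothetical.

```python
def chain_blocks(edges, entry):
    """Order basic blocks so that hot control-flow edges become fall-throughs.

    edges: dict mapping (src, dst) -> execution count (hypothetical profile data)
    entry: entry block, always placed first in the layout
    """
    blocks = {entry}
    for src, dst in edges:
        blocks.update((src, dst))
    chain_of = {b: [b] for b in blocks}  # every block starts in its own chain

    # Visit edges from hottest to coldest; ties are broken by name for determinism.
    for (src, dst), _count in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
        a, b = chain_of[src], chain_of[dst]
        # Merge only when src ends one chain and dst begins a different one,
        # so dst lands directly after src. The entry block is never moved back.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a

    # Collect the distinct chains that remain after merging.
    seen, chains = set(), []
    for blk in sorted(blocks):
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)

    # Rank non-entry chains by how often their blocks are entered.
    heat = {b: 0 for b in blocks}
    for (_src, dst), count in edges.items():
        heat[dst] += count
    entry_chain = chain_of[entry]
    rest = [c for c in chains if c is not entry_chain]
    rest.sort(key=lambda c: (-sum(heat[b] for b in c), c[0]))

    return [blk for chain in [entry_chain] + rest for blk in chain]


# Made-up profile: the A -> B -> D path is hot, the path through C is cold.
profile = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
layout = chain_blocks(profile, "A")
print(layout)
```

With this made-up profile the hot path is laid out contiguously and the cold block is pushed to the end. The edge counts could come from either sampled or instrumented profiles; the heuristic itself does not depend on which.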

    Published In

    CGO '04: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
    March 2004
    301 pages
    ISBN: 0769521029

    Publisher

    IEEE Computer Society

    United States

    Qualifiers

    • Article

    Conference

    CGO04

    Acceptance Rates

    CGO '04 Paper Acceptance Rate: 25 of 79 submissions, 32%
    Overall Acceptance Rate: 312 of 1,061 submissions, 29%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 0

    Cited By

    • (2022) Shooting Down the Server Front-End Bottleneck. ACM Transactions on Computer Systems 38:3-4, pp. 1-30. DOI: 10.1145/3484492. Online publication date: 4-Jan-2022
    • (2021) Ripple. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 734-747. DOI: 10.1109/ISCA52012.2021.00063. Online publication date: 14-Jun-2021
    • (2019) BOLT: a practical binary optimizer for data centers and beyond. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 2-14. DOI: 10.5555/3314872.3314876. Online publication date: 16-Feb-2019
    • (2019) The janus triad: exploiting parallelism through dynamic binary modification. Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 88-100. DOI: 10.1145/3313808.3313812. Online publication date: 14-Apr-2019
    • (2017) libPRISM. Proceedings of the International Conference on Supercomputing, pp. 1-10. DOI: 10.1145/3079079.3079101. Online publication date: 14-Jun-2017
    • (2015) PEBIL. Cluster Computing 18:1, pp. 1-14. DOI: 10.1007/s10586-013-0307-2. Online publication date: 1-Mar-2015
    • (2013) A compiler-level intermediate representation based binary analysis and rewriting system. Proceedings of the 8th ACM European Conference on Computer Systems, pp. 295-308. DOI: 10.1145/2465351.2465380. Online publication date: 15-Apr-2013
    • (2011) MAO -- An extensible micro-architectural optimizer. Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 1-10. DOI: 10.5555/2190025.2190077. Online publication date: 2-Apr-2011
    • (2009) Qilin. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 45-55. DOI: 10.1145/1669112.1669121. Online publication date: 12-Dec-2009
    • (2008) PFetch. Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture, pp. 1-8. DOI: 10.1145/1509084.1509085. Online publication date: 26-Oct-2008
