Software profiling for hot path prediction: less is more

Published: 12 November 2000

Abstract

Recently, there has been growing interest in exploiting profile information in adaptive systems such as just-in-time compilers, dynamic optimizers, and binary translators. In this paper, we show that sophisticated software profiling schemes that provide highly accurate information in an offline setting are ill-suited for these dynamic code generation systems. We experimentally demonstrate that hot path predictions must be made early in order to control the rising cost of missed opportunities that results from the prediction delay. We also show that existing sophisticated path profiling schemes, if used in an online setting, offer no prediction advantages over simpler schemes that exhibit much lower runtime overheads.

Based on these observations, we developed a new low-overhead software profiling scheme for hot path prediction. Using an abstract metric, we compare our scheme to path-profile-based prediction and show that our scheme achieves comparable prediction quality. In our second set of experiments, we include runtime overhead and evaluate the performance of our scheme in a realistic application: Dynamo, a dynamic optimization system. The results show that our prediction scheme clearly outperforms path-profile-based prediction and thus confirm that less profiling, as exhibited in our scheme, actually leads to more effective hot path prediction.
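The trade-off between prediction delay and missed opportunity can be illustrated with a toy sketch. It is not the scheme evaluated in the paper, whose details the abstract does not give. It assumes a hypothetical predictor that keeps one counter per path head and, once that counter reaches a threshold, predicts the path executing at that moment as hot. The function name `predict_hot_paths`, the trace format, and the one-path-per-head restriction are all assumptions made for illustration.

```python
from collections import Counter

def predict_hot_paths(trace, threshold):
    """Toy head-counter predictor (illustrative assumption, not the paper's scheme).

    trace:     sequence of executed paths, each a tuple of block ids;
               path[0] is treated as the path head.
    threshold: number of head executions to observe before predicting.
    Returns (hot, hits): the predicted path per head, and how many later
    executions matched a prediction (i.e. could have run optimized code).
    """
    head_counts = Counter()
    hot = {}   # head -> path predicted hot for that head
    hits = 0
    for path in trace:
        head = path[0]
        if head in hot:
            # A prediction already exists; only matching executions benefit.
            if hot[head] == path:
                hits += 1
            continue
        # Still profiling this head: pay the counting cost, gain nothing yet.
        head_counts[head] += 1
        if head_counts[head] >= threshold:
            hot[head] = path  # predict the currently executing path
    return hot, hits

# Ten executions starting at head "A": eight follow A-B-D, two follow A-C-D.
trace = [("A", "B", "D")] * 5 + [("A", "C", "D")] * 2 + [("A", "B", "D")] * 3

early_hot, early_hits = predict_hot_paths(trace, threshold=3)
late_hot, late_hits = predict_hot_paths(trace, threshold=5)
print(early_hits, late_hits)
```

In this toy trace both thresholds select the same path, but the later prediction leaves fewer executions to benefit from it, which is the missed-opportunity cost the abstract refers to.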

Publisher: Association for Computing Machinery, New York, NY, United States

Published in SIGOPS Volume 34, Issue 5
