research-article
Open access

Impact of Intrinsic Profiling Limitations on Effectiveness of Adaptive Optimizations

Published: 12 December 2016

Abstract

Many performance optimizations rely on or are enhanced by runtime profile information. However, both offline and online profiling techniques suffer from intrinsic and practical limitations that affect the quality of delivered profile data. The quality of profile data is its ability to accurately predict (relevant aspects of) future program behavior. While these limitations are known, their impact on the effectiveness of profile-guided optimizations, compared to the ideal performance, is not as well understood. We define ideal performance for adaptive optimizations as that achieved with a precise profile of future program behavior.
In this work, we quantify the performance impact of fundamental profiling limitations. We compare the effectiveness of typical adaptive optimizations when they are driven by the best profiles that offline and online schemes can generate against a baseline in which the optimization has access to profile information about the future execution of the program. We model and compare the behavior of three adaptive JVM optimizations (heap memory management using object usage profiles, code cache management using method usage profiles, and selective just-in-time compilation using method hotness profiles) for the Java DaCapo benchmarks. Our results provide insight into the advantages and drawbacks of current profiling strategies and shed light on directions for future profiling research.
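As a rough illustration of the comparison the abstract describes, the sketch below contrasts three ways of choosing "hot" methods for a run: an offline profile taken from a separate training run, an online profile taken from the first part of the current run, and an ideal profile built from the future execution itself. It is a toy model and not the methodology of the article: the traces, the function names (`hot_methods`, `coverage`, `compare_profiles`), the `warmup` and `k` parameters, and the coverage metric are all assumptions made for illustration.

```python
from collections import Counter


def hot_methods(trace, k):
    """Return the set of the k most frequently invoked methods in a trace."""
    return {method for method, _ in Counter(trace).most_common(k)}


def coverage(selected, trace):
    """Fraction of invocations in `trace` that target a selected method."""
    if not trace:
        return 0.0
    return sum(1 for method in trace if method in selected) / len(trace)


def compare_profiles(training_trace, actual_trace, warmup, k):
    """Score offline, online, and ideal profiles against the future of a run.

    The first `warmup` invocations of `actual_trace` are what an online
    profiler gets to observe; everything after that is the "future" that
    the chosen hot set is supposed to predict.
    """
    observed = actual_trace[:warmup]
    future = actual_trace[warmup:]
    selections = {
        "offline": hot_methods(training_trace, k),  # separate training input
        "online": hot_methods(observed, k),         # prefix of the current run
        "ideal": hot_methods(future, k),            # precise future knowledge
    }
    return {name: coverage(sel, future) for name, sel in selections.items()}


if __name__ == "__main__":
    # Hypothetical invocation traces; each letter stands for one method.
    training = ["a"] * 6 + ["c"] * 3 + ["b"] * 1
    actual = ["c", "c", "b", "c"] + ["a"] * 5 + ["b"] * 4 + ["c"] * 1
    for name, score in compare_profiles(training, actual, warmup=4, k=2).items():
        print(f"{name}: {score:.2f}")
```

In this toy setting the ideal profile can never score lower than the other two, because picking the k most frequent methods of the future trace maximizes coverage of that trace. The gap between the ideal score and the offline or online score is a crude stand-in for the kind of loss the article sets out to quantify.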

Cited By

  • (2018) MemBrain: Automated Application Guidance for Hybrid Memory Systems. 2018 IEEE International Conference on Networking, Architecture and Storage (NAS), 1-10. DOI: 10.1109/NAS.2018.8515694. Online publication date: Oct-2018

Published In

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4
December 2016
648 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3012405
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 December 2016
Accepted: 01 October 2016
Revised: 01 October 2016
Received: 01 April 2016
Published in TACO Volume 13, Issue 4

Author Tags

  1. Profiling
  2. profile-guided optimizations

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Science Foundation, under CAREER
  • 2016 Intel SSG award

