Assessing and optimizing the performance impact of the just-in-time configuration parameters - a case study on PyPy

Published: 01 August 2019
Abstract

    Many modern programming languages (e.g., Python, Java, and JavaScript) support just-in-time (JIT) compilation to speed up the execution of a software system. During runtime, the JIT compiler translates the frequently executed parts of the system into efficient machine code, which can be executed much faster than the default interpreted mode. There are many JIT configuration parameters, which vary across programming languages and JIT strategies (method-based vs. trace-based). Although many existing works aim to improve various aspects of the JIT process, very few study the performance impact of the JIT configuration settings. In this paper, we performed an empirical study on the performance impact of the JIT configuration settings of PyPy, a popular implementation of the Python programming language. Due to PyPy's efficient JIT compiler, running Python programs under PyPy is usually much faster than under alternative implementations of Python (e.g., CPython, Jython, and IronPython). To motivate the need for tuning PyPy's JIT configuration settings, we first performed an exploratory study on two microbenchmark suites. Our findings show that systems executed under PyPy's default JIT configuration setting may not yield the best performance. Optimal JIT configuration settings vary from system to system, and jitting larger portions of the code does not necessarily lead to better performance. To cope with these findings, we developed an automated approach, ESM-MOGA, for tuning the JIT configuration settings. ESM-MOGA, which stands for effect-size measure-based multi-objective genetic algorithm, automatically explores PyPy's JIT configuration settings for optimal solutions. Case studies on three open source systems show that systems running under the resulting configuration settings significantly outperform those under the default configuration settings, with 5%-60% improvement in average peak performance.
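    PyPy exposes its JIT parameters on the command line (e.g., `pypy --jit threshold=1039,trace_limit=6000 app.py`; `pypy --jit help` lists the available options). The sketch below illustrates, in plain Python, the multi-objective search idea behind an approach like ESM-MOGA. The parameter names mirror real PyPy `--jit` options, but the bounds, the surrogate objective function, and the simple mutation-only loop are hypothetical stand-ins: a real evaluation would run the benchmark under each candidate configuration and use effect-size measures to prune statistically indistinguishable candidates.

    ```python
    import random

    # Hypothetical search space. The keys mirror real PyPy --jit option
    # names, but these bounds are illustrative, not PyPy's actual limits.
    PARAM_BOUNDS = {
        "threshold": (100, 10000),           # loop hotness before jitting
        "function_threshold": (100, 10000),  # function hotness before jitting
        "trace_limit": (1000, 20000),        # max ops recorded in one trace
    }

    def random_config(rng):
        """Sample one JIT configuration uniformly from the bounds."""
        return {k: rng.randint(lo, hi) for k, (lo, hi) in PARAM_BOUNDS.items()}

    def evaluate(cfg):
        """Toy surrogate for two objectives a real run would measure
        (peak response time, warm-up time); lower is better for both.
        A real evaluation would execute the benchmark under PyPy."""
        peak = cfg["trace_limit"] / 1000 + cfg["threshold"] / 5000
        warmup = 10000 / cfg["threshold"] + cfg["trace_limit"] / 4000
        return (peak, warmup)

    def dominates(a, b):
        """Pareto dominance: a is no worse in every objective, better in one."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def pareto_front(population):
        """Keep only configurations not dominated by any other."""
        fits = [(cfg, evaluate(cfg)) for cfg in population]
        return [cfg for cfg, f in fits
                if not any(dominates(g, f) for _, g in fits if g != f)]

    def mutate(cfg, rng, rate=0.3):
        """Re-sample each parameter with some probability."""
        child = dict(cfg)
        for k, (lo, hi) in PARAM_BOUNDS.items():
            if rng.random() < rate:
                child[k] = rng.randint(lo, hi)
        return child

    rng = random.Random(0)
    pop = [random_config(rng) for _ in range(20)]
    for _ in range(10):                 # a few generations
        pop = pareto_front(pop)         # elitist survival of the front
        while len(pop) < 20:            # refill by mutating survivors
            pop.append(mutate(rng.choice(pop), rng))
    front = pareto_front(pop)           # final trade-off candidates
    ```

    Each configuration on the resulting front is a candidate trade-off between peak and warm-up performance; the paper's actual approach additionally uses effect-size measures to decide which performance differences are practically meaningful.
    
    
    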



    Published In

    Empirical Software Engineering, Volume 24, Issue 4 (August 2019), 1171 pages

    Publisher

    Kluwer Academic Publishers, United States


    Author Tags

    1. Just-in-time compilation
    2. Performance analysis
    3. Performance optimization
    4. Performance testing
    5. Software configuration
