research-article
Open access

Eliminating abstraction overhead of Java stream pipelines using ahead-of-time program optimization

Published: 13 November 2020

Abstract

Java 8 introduced streams that allow developers to work with collections of data using functional-style operations. Streams are often used in pipelines of operations for processing the data elements, which leads to concise and elegant program code. However, the declarative data processing style comes at a cost. Compared to processing the data with traditional imperative language mechanisms, constructing stream pipelines requires extra heap objects and virtual method calls, which often results in significant run-time overheads.
In this work we investigate how to mitigate these overheads to enable processing data in the declarative style without sacrificing performance. We argue that ahead-of-time bytecode-to-bytecode transformation is a suitable approach to optimization of stream pipelines, and we present a static analysis that is designed to guide such transformations. Experimental results show a significant performance gain, and that the technique works for realistic stream pipelines. For 10 of 11 micro-benchmarks, the optimizer is able to produce bytecode that is as effective as hand-written imperative-style code. Additionally, 77% of 6879 stream pipelines found in real-world Java programs are optimized successfully.
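To make the contrast in the abstract concrete, the following is a minimal hand-written sketch of the same computation in both styles. It is only an illustration: the class name `StreamVsLoop`, the method names, and the sample data are hypothetical, and the loop version is written by hand rather than produced by the optimizer described in the paper.

```java
import java.util.Arrays;
import java.util.List;

public class StreamVsLoop {
    // Declarative style: each call (stream, filter, mapToInt) creates a
    // pipeline stage object, and elements pass through lambda invocations.
    static int sumOfEvenSquaresStream(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)
                .mapToInt(v -> v * v)
                .sum();
    }

    // Imperative style: a single loop with no pipeline stage objects.
    static int sumOfEvenSquaresLoop(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            if (v % 2 == 0) {
                sum += v * v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
        // Both calls compute 2*2 + 4*4 + 6*6.
        System.out.println(sumOfEvenSquaresStream(values)); // prints 56
        System.out.println(sumOfEvenSquaresLoop(values));   // prints 56
    }
}
```

Both methods return the same result; the difference the abstract refers to lies in the objects and virtual calls needed to build and run the pipeline in the first version.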

Supplementary Material

Auxiliary Presentation Video (oopsla20main-p155-p-video.mp4)
Java 8 introduced streams that allow developers to work with collections of data using functional-style operations. Compared to processing the data with traditional imperative language mechanisms, stream pipelines require extra heap objects and virtual method calls, which results in significant run-time overheads. We investigate how to mitigate these overheads to enable processing data in the declarative style without sacrificing performance. We argue that ahead-of-time transformation is a suitable approach to optimization of stream pipelines, and we present a static analysis that is designed to guide such transformations. Experimental results show a significant performance gain, and that the technique works for realistic stream pipelines. For 10 of 11 micro-benchmarks, the optimizer is able to produce bytecode that is as effective as hand-written imperative-style code. Additionally, 77% of 6879 stream pipelines found in real-world Java programs are optimized successfully.

    Published In

    Proceedings of the ACM on Programming Languages  Volume 4, Issue OOPSLA
    November 2020
    3108 pages
    EISSN:2475-1421
    DOI:10.1145/3436718
    This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Java 8
    2. program optimization
    3. static program analysis

    Qualifiers

    • Research-article

    Cited By

    • (2023) Exploiting Partially Context-sensitive Profiles to Improve Performance of Hot Code. ACM Transactions on Programming Languages and Systems 45:4 (1-64). https://doi.org/10.1145/3612937. Online publication date: 13-Sep-2023.
    • (2023) Large-scale characterization of Java streams. Software: Practice and Experience 53:9 (1763-1792). https://doi.org/10.1002/spe.3213. Online publication date: 5-Jun-2023.
    • (2022) SQL to Stream with S2S: An Automatic Benchmark Generator for the Java Stream API. Proceedings of the 21st ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (179-186). https://doi.org/10.1145/3564719.3568699. Online publication date: 29-Nov-2022.
    • (2022) Floo. Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (168-182). https://doi.org/10.1145/3498361.3538929. Online publication date: 27-Jun-2022.
    • (2022) Characterizing Java Streams in the Wild. 2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS) (143-152). https://doi.org/10.1109/ICECCS54210.2022.00025. Online publication date: Mar-2022.
    • (2022) Optimizing Parallel Java Streams. 2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS) (23-32). https://doi.org/10.1109/ICECCS54210.2022.00012. Online publication date: Mar-2022.