DOI: 10.1145/3168828
research-article

Analyzing and optimizing task granularity on the JVM

Published: 24 February 2018

Abstract

Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications. On the one hand, fine-grained tasks (i.e., small tasks carrying out few computations) may introduce considerable parallelization overheads. On the other hand, coarse-grained tasks (i.e., large tasks performing substantial computations) may not fully utilize the available CPU cores, resulting in missed parallelization opportunities. In this paper, we provide a better understanding of task granularity for applications running on a Java Virtual Machine. We present a novel profiler which measures the granularity of every executed task. Our profiler collects carefully selected metrics from the whole system stack with only little overhead, and helps the developer locate performance problems. We analyze task granularity in the DaCapo and ScalaBench benchmark suites, revealing several inefficiencies related to fine-grained and coarse-grained tasks. We demonstrate that the collected task-granularity profiles are actionable by optimizing task granularity in two benchmarks, achieving speedups up to 1.53x.
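The trade-off described in the abstract can be illustrated with a minimal fork/join sketch. This is not the paper's profiler or either of its optimized benchmarks; the class `GranularitySketch`, the `threshold` parameter, and the helper `leafTasks` are hypothetical names introduced only for illustration. A small `threshold` yields many fine-grained tasks (more scheduling overhead per unit of work), while a `threshold` as large as the input yields a single coarse-grained task that cannot use more than one core.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class GranularitySketch {

    /** Sums data[lo, hi), splitting in half until a slice holds at most `threshold` elements. */
    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int lo;
        private final int hi;
        private final int threshold; // hypothetical granularity knob: max elements per leaf task

        SumTask(long[] data, int lo, int hi, int threshold) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
            this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= threshold) {
                // Leaf task: do the work sequentially.
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(data, lo, mid, threshold);
            SumTask right = new SumTask(data, mid, hi, threshold);
            left.fork();                      // schedule the left half asynchronously
            long rightSum = right.compute();  // compute the right half in this thread
            return left.join() + rightSum;
        }
    }

    /** Parallel sum whose task granularity is controlled by `threshold`. */
    public static long parallelSum(long[] data, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length, threshold));
    }

    /** Number of leaf tasks SumTask creates for n elements (mirrors the split in compute()). */
    public static long leafTasks(int n, int threshold) {
        if (n <= threshold) {
            return 1;
        }
        int half = n / 2;
        return leafTasks(half, threshold) + leafTasks(n - half, threshold);
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // From very fine-grained (one element per task) to fully coarse-grained (one task).
        for (int threshold : new int[] {1, 1_000, 1_000_000}) {
            long start = System.nanoTime();
            long sum = parallelSum(data, threshold);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("threshold=%d leafTasks=%d sum=%d time=%dus%n",
                    threshold, leafTasks(data.length, threshold), sum, micros);
        }
    }
}
```

Running `main` prints the leaf-task count and a rough wall-clock time for each threshold. The timings are only indicative, since a single unwarmed run on the JVM is not a reliable measurement.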


Published In

CGO '18: Proceedings of the 2018 International Symposium on Code Generation and Optimization
February 2018
377 pages
ISBN: 9781450356176
DOI: 10.1145/3179541
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Java Virtual Machine
  2. actionable profiler
  3. parallel applications
  4. performance analysis
  5. task granularity

Qualifiers

  • Research-article

Funding Sources

  • Oracle
  • Swiss National Science Foundation

Conference

CGO '18
Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 2
Reflects downloads up to 01 Nov 2024.

Cited By

  • (2024) FlowProf: Profiling Multi-threaded Programs using Information-Flow. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 137-149. DOI: 10.1145/3640537.3641577. Online publication date: 17-Feb-2024
  • (2023) Optimization-Aware Compiler-Level Event Profiling. ACM Transactions on Programming Languages and Systems 45:2, 1-50. DOI: 10.1145/3591473. Online publication date: 26-Jun-2023
  • (2020) Kaizen: a scalable concolic fuzzing tool for Scala. Proceedings of the 11th ACM SIGPLAN International Symposium on Scala, 25-32. DOI: 10.1145/3426426.3428487. Online publication date: 13-Nov-2020
  • (2020) FJProf. Proceedings of the 13th EAI International Conference on Performance Evaluation Methodologies and Tools, 128-135. DOI: 10.1145/3388831.3388851. Online publication date: 18-May-2020
  • (2019) NAB: automated large-scale multi-language dynamic program analysis in public code repositories. Proceedings Companion of the 2019 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, 9-10. DOI: 10.1145/3359061.3362777. Online publication date: 20-Oct-2019
  • (2019) Analysis and Optimization of Task Granularity on the Java Virtual Machine. ACM Transactions on Programming Languages and Systems 41:3, 1-47. DOI: 10.1145/3338497. Online publication date: 16-Jul-2019
  • (2019) Optimization coaching for fork/join applications on the Java virtual machine. Companion Proceedings of the 3rd International Conference on the Art, Science, and Engineering of Programming, 1-3. DOI: 10.1145/3328433.3328441. Online publication date: 1-Apr-2019
  • (2019) Parallelism-centric what-if and differential analyses. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 485-501. DOI: 10.1145/3314221.3314621. Online publication date: 8-Jun-2019
  • (2019) JUniVerse. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, 1768-1775. DOI: 10.1145/3297280.3297453. Online publication date: 8-Apr-2019
