Analyzing and optimizing task granularity on the JVM

Authors: Eduardo Rosales, Walter Binder

CGO '18: Proceedings of the 2018 International Symposium on Code Generation and Optimization

Pages 27 - 37

https://doi.org/10.1145/3168828

Published: 24 February 2018

Abstract

Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications. On the one hand, fine-grained tasks (i.e., small tasks carrying out few computations) may introduce considerable parallelization overheads. On the other hand, coarse-grained tasks (i.e., large tasks performing substantial computations) may not fully utilize the available CPU cores, resulting in missed parallelization opportunities. In this paper, we provide a better understanding of task granularity for applications running on a Java Virtual Machine. We present a novel profiler which measures the granularity of every executed task. Our profiler collects carefully selected metrics from the whole system stack with only little overhead, and helps the developer locate performance problems. We analyze task granularity in the DaCapo and ScalaBench benchmark suites, revealing several inefficiencies related to fine-grained and coarse-grained tasks. We demonstrate that the collected task-granularity profiles are actionable by optimizing task granularity in two benchmarks, achieving speedups up to 1.53x.
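The trade-off described in the abstract can be illustrated with a small fork/join sketch. This is not the authors' profiler and not code from the evaluated benchmarks; the names `GranularityDemo`, `SumTask`, `TASKS`, and the `threshold` parameter are hypothetical and chosen only for illustration. The sketch assumes that the number of created tasks is a crude proxy for parallelization overhead: a small threshold yields many fine-grained tasks, while a threshold as large as the input yields a single coarse-grained task that cannot use more than one core.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicLong;

public class GranularityDemo {
    /** Number of tasks created by the last call to sum() (hypothetical overhead proxy). */
    public static final AtomicLong TASKS = new AtomicLong();

    /** Sums data[lo, hi); ranges of at most `threshold` elements are summed sequentially. */
    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int lo, hi, threshold;

        SumTask(long[] data, int lo, int hi, int threshold) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
            this.threshold = threshold;
            TASKS.incrementAndGet(); // count every task that is created
        }

        @Override
        protected Long compute() {
            if (hi - lo <= threshold) {
                // Leaf task: the threshold determines how much work it performs.
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(data, lo, mid, threshold);
            SumTask right = new SumTask(data, mid, hi, threshold);
            left.fork();                       // run the left half asynchronously
            long rightSum = right.compute();   // reuse the current thread for the right half
            return left.join() + rightSum;
        }
    }

    /** Sums the array in parallel; a threshold below 1 is clamped to 1 so recursion terminates. */
    public static long sum(long[] data, int threshold) {
        TASKS.set(0);
        int t = Math.max(1, threshold);
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length, t));
    }

    public static void main(String[] args) {
        long[] data = new long[1 << 20];
        Arrays.fill(data, 1L);
        // Fine-grained, intermediate, and coarse-grained settings.
        for (int threshold : new int[] {1, 1 << 10, 1 << 20}) {
            long start = System.nanoTime();
            long result = sum(data, threshold);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("threshold=%d tasks=%d sum=%d time=%d us%n",
                    threshold, TASKS.get(), result, micros);
        }
    }
}
```

Timings from such a run depend on the machine and on JIT warm-up, so they are only indicative; the task counts, however, are deterministic for a given input size and threshold.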

Index Terms

Analyzing and optimizing task granularity on the JVM
1. General and reference
  1. Cross-computing tools and techniques
    1. Metrics
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software performance

Published In

CGO '18: Proceedings of the 2018 International Symposium on Code Generation and Optimization

February 2018

377 pages

ISBN:9781450356176

DOI:10.1145/3179541

General Chairs:
Jens Knoop (Vienna University of Technology, Austria),
Markus Schordan (Lawrence Livermore National Laboratory, USA)

Program Chairs:
Teresa Johnson (Google, USA),
Michael O'Boyle (University of Edinburgh, UK)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Oracle
Swiss National Science Foundation

Conference

CGO '18

CGO '18: 16th Annual IEEE/ACM International Symposium on Code Generation and Optimization

February 24 - 28, 2018

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
