Accurate Fork-Join Profiling on the Java Virtual Machine

Published: 22 August 2022

Abstract

The fork-join model for parallel computing has become very popular and has been part of the Java class library since Java 7. While understanding and optimizing the performance of fork-join computations is of paramount importance, accurately profiling them on the Java Virtual Machine (JVM) is challenging due to the complexity of the API. In this paper, we present a novel model for analyzing fork-join computations on the JVM that addresses the peculiarities of the Java fork-join framework, including features such as task unforking and task reuse. We implement our model in a profiler that detects every spawned fork-join task, captures all task dependencies, and aims to collect cycle-accurate task-granularity data. We evaluate our profiler against a dedicated fork-join profiler for the JVM, showing that our tool achieves higher profile accuracy and introduces less overhead.
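
For readers unfamiliar with the two API features named in the abstract, the following is a minimal sketch of how task unforking and task reuse can appear in application code. It is not taken from the paper and does not show the profiler itself. The class `ForkJoinSketch`, the task `SumTask`, the helper `runTwice`, and the threshold value are hypothetical names chosen for illustration; only the `java.util.concurrent` calls (`fork`, `join`, `tryUnfork`, `reinitialize`, `ForkJoinPool.invoke`) are standard library methods.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSketch {

    // Hypothetical task: sums the integers in the half-open range [lo, hi).
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // illustrative cut-off
        private final int lo;
        private final int hi;

        SumTask(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += i;
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(lo, mid);
            SumTask right = new SumTask(mid, hi);

            left.fork();                  // schedule left asynchronously
            long r = right.compute();     // run right in the current thread

            // Task unforking: if left is still in this worker's queue,
            // unschedule it and run it directly; otherwise wait for the
            // thread that is executing it (or has stolen it).
            long l = left.tryUnfork() ? left.compute() : left.join();
            return l + r;
        }
    }

    // Runs the same task object twice to illustrate task reuse.
    static long[] runTwice(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            SumTask task = new SumTask(0, n);
            long first = pool.invoke(task);

            // Task reuse: reset the completed task so it can be run again.
            task.reinitialize();
            long second = pool.invoke(task);
            return new long[] {first, second};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] results = runTwice(100_000);
        System.out.println(results[0] + " " + results[1]);
    }
}
```

In this sketch a single logical task may be forked but never executed by the pool (when `tryUnfork` succeeds), and a single task object may complete more than once (after `reinitialize`). These are the kinds of behaviors that a profiler counting tasks and their dependencies would plausibly need to account for.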

Published In

Euro-Par 2022: Parallel Processing: 28th International Conference on Parallel and Distributed Computing, Glasgow, UK, August 22–26, 2022, Proceedings
Aug 2022
442 pages
ISBN: 978-3-031-12596-6
DOI: 10.1007/978-3-031-12597-3

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Fork-join Parallelism
  2. Work Stealing
  3. Accurate Profiling
  4. Task Granularity
  5. Task Dependencies
  6. Java
