DOI: 10.1145/3491204.3527473
research-article

Analysis of Garbage Collection Patterns to Extend Microbenchmarks for Big Data Workloads

Published: 19 July 2022

Abstract

Java uses automatic memory management: the programmer does not have to free memory explicitly, because the garbage collector reclaims it. Garbage collection (GC) can take a significant amount of time, especially in Big Data applications running large workloads, where it can account for up to 50 percent of the application's run time. Although benchmarks have been designed to trace garbage collection events, they are not well suited to Big Data workloads because of those workloads' unique memory usage patterns. We have developed a free and open-source pipeline to extract and analyze object-level details from any Java program, including benchmarks and Big Data applications such as Hadoop. The data contains information such as the lifetime, class, and allocation site of every object allocated by the program. Based on an analysis of this data, we propose a small set of benchmarks designed to emulate some of the patterns observed in Big Data applications. These benchmarks also allow us to experiment with and compare some Java programming patterns.
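To make the kind of object-level data described above concrete, the following is a minimal sketch of how per-object records could be summarized by allocation site. It is not the authors' pipeline: the `ObjectRecord` class, its field names, the time units, and the sample site labels are all hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public final class LifetimeSummary {

    /** Hypothetical trace record: one entry per allocated object. */
    public static final class ObjectRecord {
        final String className;       // class of the allocated object
        final String allocationSite;  // illustrative "Class.method:line" label
        final long allocatedAt;       // allocation time, arbitrary units (assumption)
        final long diedAt;            // time the object became unreachable

        public ObjectRecord(String className, String allocationSite,
                            long allocatedAt, long diedAt) {
            this.className = className;
            this.allocationSite = allocationSite;
            this.allocatedAt = allocatedAt;
            this.diedAt = diedAt;
        }

        public String allocationSite() {
            return allocationSite;
        }

        public long lifetime() {
            return diedAt - allocatedAt;
        }
    }

    /** Groups records by allocation site and averages the object lifetimes. */
    public static Map<String, Double> meanLifetimeBySite(List<ObjectRecord> records) {
        return records.stream().collect(Collectors.groupingBy(
                ObjectRecord::allocationSite,
                TreeMap::new, // sorted keys give a stable report order
                Collectors.averagingLong(ObjectRecord::lifetime)));
    }

    public static void main(String[] args) {
        // Made-up sample data, not measurements from the paper.
        List<ObjectRecord> trace = List.of(
                new ObjectRecord("java.lang.String", "Mapper.map:42", 0, 10),
                new ObjectRecord("java.lang.String", "Mapper.map:42", 5, 25),
                new ObjectRecord("byte[]", "Buffer.fill:7", 0, 1000));

        meanLifetimeBySite(trace)
                .forEach((site, mean) -> System.out.println(site + " -> " + mean));
    }
}
```

A summary of this shape separates sites that produce short-lived objects from sites that produce long-lived ones, which is one plausible way to look for the memory usage patterns the abstract mentions.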

Published In

ICPE '22: Companion of the 2022 ACM/SPEC International Conference on Performance Engineering
July 2022
166 pages
ISBN: 9781450391597
DOI: 10.1145/3491204
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data
  2. garbage collection
  3. hadoop
  4. java
  5. java virtual machine

Qualifiers

  • Research-article

Conference

ICPE '22

Acceptance Rates

ICPE '22 Paper Acceptance Rate 14 of 58 submissions, 24%;
Overall Acceptance Rate 252 of 851 submissions, 30%
