Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article

NG2C: pretenuring garbage collection with dynamic generations for HotSpot big data applications

Published: 18 June 2017 Publication History

Abstract

Big Data applications suffer from unpredictable and unacceptably high pause times due to Garbage Collection (GC). This is the case in latency-sensitive applications such as on-line credit-card fraud detection, graph-based computing for analysis on social networks, etc. Such pauses compromise latency requirements of the whole application stack and result from applications' aggressive buffering/caching of data, exposing an ill-suited GC design, which assumes that most objects will die young and does not consider that applications hold large amounts of middle-lived data in memory.
To avoid such pauses, we propose NG2C, a new GC algorithm that combines pretenuring with user-defined dynamic generations. By being able to allocate objects into different generations, NG2C is able to group objects with similar lifetime profiles in the same generation. By allocating objects with similar lifetime profiles close to each other, i.e. in the same generation, we avoid object promotion (copying between generations) and heap fragmentation (which leads to heap compactions) both responsible for most of the duration of HotSpot GC pause times.
NG2C is implemented for the OpenJDK 8 HotSpot Java Virtual Machine, as an extension of the Garbage First GC. We evaluate NG2C using Cassandra, Lucene, and GraphChi with three different GCs: Garbage First (G1), Concurrent Mark Sweep (CMS), and NG2C. Results show that NG2C decreases the worst observable GC pause time by up to 94.8% for Cassandra, 85.0% for Lucene and 96.45% for GraphChi, when compared to current collectors (G1 and CMS). In addition, NG2c has no negative impact on application throughput or memory usage.

References

[1]
A. W. Appel. Simple Generational Garbage Collection and Fast Allocation. Softw. Pract. Exper., 19(2):171–183, Feb. 1989. ISSN 0038-0644.
[2]
D. F. Bacon, P. Cheng, and V. T. Rajan. Controlling Fragmentation and Space Consumption in the Metronome, a Real-time Garbage Collector for Java. In Proceedings of the 2003 ACM SIGPLAN Conference on Language, Compiler, and Tool for Embedded Systems, LCTES ’03, pages 81–92, New York, NY, USA, 2003. ACM. ISBN 1-58113-647- 1.
[3]
W. S. Beebee Jr and M. Rinard. An implementation of scoped memory for Real-Time Java. In International Workshop on Embedded Software, pages 289–305. Springer, 2001.
[4]
S. M. Blackburn and K. S. McKinley. Immix: A Mark-region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’08, pages 22–32, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-860-2.
[5]
S. M. Blackburn, S. Singhai, M. Hertz, K. S. McKinely, and J. E. B. Moss. Pretenuring for Java. In Proceedings of the 16th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’01, pages 342–352, New York, NY, USA, 2001. ACM. ISBN 1-58113-335-9.
[6]
S. M. Blackburn, R. Jones, K. S. McKinley, and J. E. B. Moss. Beltway: Getting Around Garbage Collection Gridlock. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 153–164, New York, NY, USA, 2002. ACM.
[7]
ISBN 1-58113-463-0.
[8]
C. Boyapati, A. Salcianu, W. Beebee, Jr., and M. Rinard. Ownership Types for Safe Region-based Memory Management in Real-time Java. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, PLDI ’03, pages 324– 337, New York, NY, USA, 2003. ACM. ISBN 1-58113-662-5.
[9]
Y. Bu, V. Borkar, G. Xu, and M. J. Carey. A Bloat-aware Design for Big Data Applications. In Proceedings of the 2013 International Symposium on Memory Management, ISMM ’13, pages 119–130, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2100-6.
[10]
J. Caballero, G. Grieco, M. Marron, and A. Nappa. Undangle: Early Detection of Dangling Pointers in Use-after-free and Double-free Vulnerabilities. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, pages 133–143, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1454-1.
[11]
P. Cheng, R. Harper, and P. Lee. Generational Stack Collection and Profile-driven Pretenuring. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, PLDI ’98, pages 162–173, New York, NY, USA, 1998. ACM. ISBN 0-89791-987-4.
[12]
D. Clifford, H. Payer, M. Stanton, and B. L. Titzer. Memento Mori: Dynamic Allocation-site-based Optimizations. In Proceedings of the 2015 International Symposium on Memory Management, ISMM ’15, pages 105–117, New York, NY, USA, 2015. ACM. ISBN 978-1-4503- 3589-8.
[13]
C. Cowan, P. Wagle, C. Pu, S. Beattie, and J. Walpole. Buffer overflows: Attacks and defenses for the vulnerability of the decade. In DARPA Information Survivability Conference and Exposition, 2000. DISCEX’00. Proceedings, volume 2, pages 119–129. IEEE, 2000.
[14]
J. Dean and L. A. Barroso. The Tail at Scale. Commun. ACM, 56(2): 74–80, Feb. 2013. ISSN 0001-0782.
[15]
D. Detlefs, C. Flood, S. Heller, and T. Printezis. Garbage-first Garbage Collection. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 37–48, New York, NY, USA, 2004. ACM. ISBN 1-58113-945-4.
[16]
D. Doligez and X. Leroy. A Concurrent, Generational Garbage Collector for a Multithreaded Implementation of ML. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’93, pages 113–123, New York, NY, USA, 1993. ACM. ISBN 0-89791-560-7.
[17]
D. Gay and A. Aiken. Language Support for Regions. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, PLDI ’01, pages 70–80, New York, NY, USA, 2001. ACM.
[18]
ISBN 1-58113-414-2.
[19]
D. Gay and B. Steensgaard. Fast Escape Analysis and Stack Allocation for Object-Based Programs. In Proceedings of the 9th International Conference on Compiler Construction, CC ’00, pages 82–93, London, UK, UK, 2000. Springer-Verlag. ISBN 3-540-67263-X.
[20]
L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A Study of the Scalability of Stop-the-world Garbage Collectors on Multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’13, pages 229–240, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1870-9.
[21]
I. Gog, J. Giceva, M. Schwarzkopf, K. Vaswani, D. Vytiniotis, G. Ramalingan, D. Murray, S. Hand, and M. Isard. Broom: Sweeping out Garbage Collection from Big Data Systems. In Proceedings of the 15th USENIX Conference on Hot Topics in Operating Systems, HOTOS’15, pages 2–2, Berkeley, CA, USA, 2015. USENIX Association.
[22]
D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-based Memory Management in Cyclone. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 282–293, New York, NY, USA, 2002. ACM. ISBN 1-58113-463-0.
[23]
S. Z. Guyer and K. S. McKinley. Finding Your Cronies: Static Analysis for Dynamic Object Colocation. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’04, pages 237– 250, New York, NY, USA, 2004. ACM. ISBN 1-58113-831-8.
[24]
N. Hallenberg, M. Elsman, and M. Tofte. Combining Region Inference and Garbage Collection. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 141–152, New York, NY, USA, 2002. ACM. ISBN 1-58113-463-0.
[25]
T. L. Harris. Dynamic Adaptive Pre-tenuring. In Proceedings of the 2nd International Symposium on Memory Management, ISMM ’00, pages 127–136, New York, NY, USA, 2000.
[26]
ACM. ISBN 1-58113-263-8.
[27]
M. Hicks, G. Morrisett, D. Grossman, and T. Jim. Experience with Safe Manual Memory-management in Cyclone. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 73–84, New York, NY, USA, 2004. ACM. ISBN 1-58113-945- 4.
[28]
R. L. Hudson and J. E. B. Moss. Incremental collection of mature objects. In Memory Management, pages 388–403. Springer, 1992.
[29]
R. L. Hudson, R. Morrison, J. E. B. Moss, and D. S. Munro. Garbage Collecting the World: One Car at a Time. In Proceedings of the 12th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’97, pages 162– 175, New York, NY, USA, 1997. ACM. ISBN 0-89791-908-4.
[30]
R. Jones and C. Ryder. Garbage collection should be lifetime aware. Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, pages 182–196, 2006.
[31]
R. Jones, A. Hosking, and E. Moss. The garbage collection handbook: the art of automatic memory management. CRC Press, 2016.
[32]
R. E. Jones and C. Ryder. A Study of Java Object Demographics. In Proceedings of the 7th International Symposium on Memory Management, ISMM ’08, pages 121–130, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-134-7.
[33]
M. Jump, S. M. Blackburn, and K. S. McKinley. Dynamic Object Sampling for Pretenuring. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 152– 162, New York, NY, USA, 2004. ACM. ISBN 1-58113-945-4.
[34]
S. Kowshik, D. Dhurjati, and V. Adve. Ensuring Code Safety Without Runtime Checks for Real-time Control Systems. In Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES ’02, pages 288– 297, New York, NY, USA, 2002. ACM. ISBN 1-58113-575-0.
[35]
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a Social Network or a News Media? In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 591–600, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-799-8.
[36]
A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale Graph Computation on Just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pages 31–46, Berkeley, CA, USA, 2012. USENIX Association. ISBN 978-1-931971-96-6.
[37]
A. Lakshman and P. Malik. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev., 44(2):35–40, Apr. 2010. ISSN 0163-5980.
[38]
P. Lengauer and H. Mössenböck. The Taming of the Shrew: Increasing Performance by Automatic Parameter Tuning for Java Garbage Collectors. In Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering, ICPE ’14, pages 111–122, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2733-6.
[39]
H. Lieberman and C. Hewitt. A Real-time Garbage Collector Based on the Lifetimes of Objects. Communications of the ACM, 26(6):419– 429, June 1983. ISSN 0001-0782.
[40]
L. Lu, X. Shi, Y. Zhou, X. Zhang, H. Jin, C. Pei, L. He, and Y. Geng. Lifetime-based Memory Management for Distributed Data Processing Systems. Proc. VLDB Endow., 9(12):936–947, Aug. 2016.
[41]
ISSN 2150-8097.
[42]
S. Marion, R. Jones, and C. Ryder. Decrypting the Java Gene Pool. In Proceedings of the 6th International Symposium on Memory Management, ISMM ’07, pages 67–78, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-893-0.
[43]
S. Marlow, T. Harris, R. P. James, and S. Peyton Jones. Parallel Generational-copying Garbage Collection with a Block-structured Heap. In Proceedings of the 7th International Symposium on Memory Management, ISMM ’08, pages 11–20, New York, NY, USA, 2008.
[44]
ACM. ISBN 978-1-60558-134-7.
[45]
L. Mastrangelo, L. Ponzanelli, A. Mocci, M. Lanza, M. Hauswirth, and N. Nystrom. Use at Your Own Risk: The Java Unsafe API in the Wild. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 695–710, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3689-5.
[46]
M. McCandless, E. Hatcher, and O. Gospodnetic. Lucene in Action, Second Edition: Covers Apache Lucene 3.0. Manning Publications Co., Greenwich, CT, USA, 2010. ISBN 1933988177, 9781933988177.
[47]
K. Nguyen, K. Wang, Y. Bu, L. Fang, J. Hu, and G. Xu. FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, pages 675–690, New York, NY, USA, 2015.
[48]
ACM. ISBN 978-1-4503-2835-7.
[49]
K. Nguyen, L. Fang, G. Xu, B. Demsky, S. Lu, S. Alamian, and O. Mutlu. Yak: A High-performance Big-data-friendly Garbage Collector. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’16, pages 349– 365, Berkeley, CA, USA, 2016. USENIX Association. ISBN 978- 1-931971-33-1.
[50]
D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast Crash Recovery in RAMCloud. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pages 29–41, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0977-6.
[51]
K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B.-G. Chun. Making Sense of Performance in Data Analytics Frameworks. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, NSDI’15, pages 293–307, Berkeley, CA, USA, 2015. USENIX Association. ISBN 978-1-931971-218.
[52]
F. Pizlo, L. Ziarek, and J. Vitek. Real Time Java on Resourceconstrained Platforms with Fiji VM. In Proceedings of the 7th International Workshop on Java Technologies for Real-Time and Embedded Systems, JTRES ’09, pages 110–119, New York, NY, USA, 2009.
[53]
ACM. ISBN 978-1-60558-732-5.
[54]
R. Power and J. Li. Piccolo: Building Fast, Distributed Programs with Partitioned Tables. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pages 293– 306, Berkeley, CA, USA, 2010. USENIX Association.
[55]
S. M. Rumble, D. Ongaro, R. Stutsman, M. Rosenblum, and J. K. Ousterhout. It’s Time for Low Latency. In Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems, HotOS’13, pages 11–11, Berkeley, CA, USA, 2011. USENIX Association.
[56]
J. Seligmann and S. Grarup. Incremental mature garbage collection using the train algorithm. In European Conference on Object-Oriented Programming, pages 235–252. Springer, 1995.
[57]
A. Shinnar, D. Cunningham, V. Saraswat, and B. Herta. M3R: Increased Performance for In-memory Hadoop Jobs. Proc. VLDB Endow., 5(12):1736–1747, Aug. 2012. ISSN 2150-8097.
[58]
C. Stancu, C. Wimmer, S. Brunthaler, P. Larsen, and M. Franz. Safe and Efficient Hybrid Memory Management for Java. In Proceedings of the 2015 International Symposium on Memory Management, ISMM ’15, pages 81–92, New York, NY, USA, 2015. ACM. ISBN 978-1- 4503-3589-8.
[59]
G. Tene, B. Iyengar, and M. Wolf. C4: The Continuously Concurrent Compacting Collector. In Proceedings of the International Symposium on Memory Management, ISMM ’11, pages 79–88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0.
[60]
D. Ungar. Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm. In Proceedings of the First ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, SDE 1, pages 157–167, New York, NY, USA, 1984. ACM.
[61]
ISBN 0-89791-131-8.
[62]
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. In Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pages 10–10, Berkeley, CA, USA, 2010. USENIX Association.
[63]
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI’12, pages 2–2, Berkeley, CA, USA, 2012. USENIX Association.
[64]
H. Zhang, G. Chen, B. C. Ooi, K.-L. Tan, and M. Zhang. In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge & Data Engineering, 27(7):1920–1948, 2015. ISSN 1041-4347.

Cited By

View all
  • (2024)Characterization and Reclamation of Frozen Garbage in Managed FaaS WorkloadsProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629579(281-297)Online publication date: 22-Apr-2024
  • (2023)Optimization-Aware Compiler-Level Event ProfilingACM Transactions on Programming Languages and Systems10.1145/3591473Online publication date: 10-Apr-2023
  • (2020)PlatinumProceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference10.5555/3489146.3489157(159-172)Online publication date: 15-Jul-2020
  • Show More Cited By

Index Terms

  1. NG2C: pretenuring garbage collection with dynamic generations for HotSpot big data applications

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 52, Issue 9
ISMM '17
September 2017
127 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3156685
Issue’s Table of Contents
  • cover image ACM Conferences
    ISMM 2017: Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management
    June 2017
    127 pages
    ISBN:9781450350440
    DOI:10.1145/3092255
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2017
Published in SIGPLAN Volume 52, Issue 9

Check for updates

Author Tags

  1. Big Data
  2. Garbage Collection
  3. Latency

Qualifiers

  • Article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)29
  • Downloads (Last 6 weeks)2
Reflects downloads up to 27 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Characterization and Reclamation of Frozen Garbage in Managed FaaS WorkloadsProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629579(281-297)Online publication date: 22-Apr-2024
  • (2023)Optimization-Aware Compiler-Level Event ProfilingACM Transactions on Programming Languages and Systems10.1145/3591473Online publication date: 10-Apr-2023
  • (2020)PlatinumProceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference10.5555/3489146.3489157(159-172)Online publication date: 15-Jul-2020
  • (2024)TeraHeap: Exploiting Flash Storage for Mitigating DRAM Pressure in Managed Big Data FrameworksACM Transactions on Programming Languages and Systems10.1145/370059346:4(1-37)Online publication date: 15-Oct-2024
  • (2024)Multiverse Notebook: Shifting Data Scientists to Time TravelersProceedings of the ACM on Programming Languages10.1145/36498388:OOPSLA1(754-783)Online publication date: 29-Apr-2024
  • (2024)Jade: A High-throughput Concurrent Copying Garbage CollectorProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650087(1160-1174)Online publication date: 22-Apr-2024
  • (2023)TeraHeap: Reducing Memory Pressure in Managed Big Data FrameworksProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3582016.3582045(694-709)Online publication date: 25-Mar-2023
  • (2023) RemOrphan : Object Storage Sustainability Through Rem oving Offline-Processed Orphan Garbage Data IEEE Access10.1109/ACCESS.2023.331921711(107049-107067)Online publication date: 2023
  • (2023)BestGC: An Automatic GC SelectorIEEE Access10.1109/ACCESS.2023.329439811(72357-72373)Online publication date: 2023
  • (2021)Bridging the performance gap for copy-based garbage collectors atop non-volatile memoryProceedings of the Sixteenth European Conference on Computer Systems10.1145/3447786.3456246(343-358)Online publication date: 21-Apr-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media