Let It Go: Relieving Garbage Collection Pain for Latency Critical Applications in Golang

Authors:

Sang-Yoon Chang,

Chengzhong Xu

HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing

Pages 169 - 180

https://doi.org/10.1145/3588195.3592998

Published: 07 August 2023

Abstract

Garbage Collection (GC) is a representative automatic memory manager widely deployed in popular programming languages such as Java, C#, and Golang (Go). Through GC, these languages provide programmers with flexibility and safety. However, GC incurs non-trivial compute and memory overhead at application runtime. GC threads compete with the application's non-GC threads (mutators), which particularly hurts latency-critical (LC) applications and causes long tail latency. Existing GC approaches do not address this interference efficiently: either GC is triggered passively, without global insight into the application, or incremental GC is employed to reduce the interference but its progress is not dynamically tailored to runtime characteristics during the GC process, which leads to significant performance degradation under bursty requests.

We present LEGO, an efficient and non-intrusive GC framework that deploys a novel elastic incremental GC mechanism integrated with an adaptive GC scheduler to reduce CPU contention between GC and mutators and to improve resource utilization and QoS of LC applications. We choose to develop LEGO in Go, as Go is specifically designed for cloud applications and the current GC mechanisms for the JVM are ineffective for Go due to its unique GPM thread scheduling and memory allocation model. LEGO adapts incremental GC into Go to tackle the full GC issue and addresses resource contention with a proactive scheduler for adaptive GC triggering. Importantly, LEGO leverages the elastic incremental GC mechanism to mitigate the interference from unavoidable GC in the face of bursts of requests. We implement and evaluate LEGO with popular LC applications developed in Go. Results show that, compared to the default Go GC and a tailored G1 GC, LEGO significantly improves both the tail latency and throughput for LC applications.
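The abstract does not spell out how LEGO's scheduler works, so the sketch below is not LEGO's implementation. It is only an application-level illustration of the general idea of proactive GC triggering: collect opportunistically while the service is idle, and collect unconditionally once the heap passes a hard limit. The soft and hard thresholds, the `inFlight` load signal, and the function names are all hypothetical; the only runtime calls used are the standard `debug.SetGCPercent`, `runtime.ReadMemStats`, and `runtime.GC`.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"sync/atomic"
	"time"
)

// inFlight counts requests currently being served (hypothetical load signal,
// updated by request handlers with atomic.AddInt64).
var inFlight int64

// shouldCollect decides whether to start a collection now.
func shouldCollect(heap, soft, hard uint64, load int64) bool {
	if heap >= hard {
		return true // unavoidable: memory pressure outweighs latency
	}
	return load == 0 && heap >= soft // opportunistic: collect while idle
}

// runScheduler polls heap size and load every interval until stop is closed.
// It returns the number of collections it triggered.
func runScheduler(soft, hard uint64, interval time.Duration, stop <-chan struct{}) int {
	prev := debug.SetGCPercent(-1) // disable the heap-growth trigger
	defer debug.SetGCPercent(prev) // restore the previous setting on exit
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	triggered := 0
	var ms runtime.MemStats
	for {
		select {
		case <-stop:
			return triggered
		case <-ticker.C:
			runtime.ReadMemStats(&ms) // note: this call briefly stops the world
			if shouldCollect(ms.HeapAlloc, soft, hard, atomic.LoadInt64(&inFlight)) {
				runtime.GC() // blocks until the collection completes
				triggered++
			}
		}
	}
}

func main() {
	live := make([]byte, 8<<20) // keep 8 MiB live so the soft threshold is exceeded
	stop := make(chan struct{})
	done := make(chan int)
	go func() { done <- runScheduler(1<<20, 1<<30, 10*time.Millisecond, stop) }()
	time.Sleep(100 * time.Millisecond)
	close(stop)
	fmt.Println("collections triggered:", <-done)
	runtime.KeepAlive(live)
}
```

Keeping the decision in a pure function such as `shouldCollect` makes the policy easy to test separately from the polling loop. A production design would also need to account for the cost of polling and for bursts that arrive mid-collection, which is the case the abstract says LEGO's elastic incremental mechanism targets.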

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. Memory management
          1. Garbage collection

Published In

HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing

August 2023

350 pages

ISBN: 9798400701559

DOI: 10.1145/3588195

General Chair: Ali R. Butt (Virginia Tech, USA)

Program Chairs: Ningfang Mi (Northeastern University, USA), Kyle Chard (University of Chicago & Argonne National Laboratory, USA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

HPDC '23

HPDC '23: The 32nd International Symposium on High-Performance Parallel and Distributed Computing

June 16 - 23, 2023

Orlando, FL, USA

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
