DOI: 10.1145/3588195.3592998 (HPDC Conference Proceedings)
research-article

Let It Go: Relieving Garbage Collection Pain for Latency Critical Applications in Golang

Published: 07 August 2023

Abstract

    Garbage Collection (GC) is a representative automatic memory manager widely deployed in popular programming languages such as Java, C#, and Golang (Go). GC gives programmers flexibility and safety, but it incurs non-trivial compute and memory overhead at application runtime. GC threads compete with the application's non-GC threads (mutators), which particularly hurts latency-critical (LC) applications and causes long tail latency. Existing GC approaches do not address this interference efficiently. Some trigger GC passively, without global insight into the application; others use incremental GC to reduce the interference, but they do not tailor the incremental progress dynamically to runtime characteristics during the GC process, which leads to significant performance degradation under bursty requests.
    We present LEGO, an efficient and non-intrusive GC framework that combines a novel elastic incremental GC mechanism with an adaptive GC scheduler to reduce CPU contention between GC and mutators and to improve the resource utilization and QoS of LC applications. We develop LEGO in Go because Go is specifically designed for cloud applications, and the current GC mechanisms for the JVM are ineffective for Go due to its unique GPM (goroutine, processor, machine) thread scheduling and memory allocation model. LEGO adapts incremental GC to Go to tackle the full GC issue, and it addresses resource contention with a proactive scheduler that triggers GC adaptively. Importantly, LEGO uses the elastic incremental GC mechanism to mitigate the interference from unavoidable GC in the face of request bursts. We implement LEGO and evaluate it with popular LC applications developed in Go. Results show that, compared to the default Go GC and a tailored G1 GC, LEGO significantly improves both tail latency and throughput for LC applications.
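The general idea of an incremental collector whose per-step work shrinks under load can be illustrated with a toy model. This is an assumption-laden sketch, not LEGO's mechanism: it runs tri-color marking over a hypothetical object graph in bounded steps, and a hypothetical `elasticBudget` policy scans fewer objects per step when more requests are pending. It is single-threaded and omits the write barrier that a real incremental collector needs, because mutators run between steps and can create new references.

```go
package main

import "fmt"

// Object is a node in a toy heap graph (hypothetical; not a Go runtime type).
type Object struct {
	ID   int
	Refs []int // indices of objects this one points to
}

// IncrementalMarker does tri-color marking in bounded steps.
// White = not in seen, grey = in the grey work list, black = fully scanned.
// NOTE: no write barrier; mutations between steps are not tracked.
type IncrementalMarker struct {
	heap  []Object
	black map[int]bool
	seen  map[int]bool // objects that are grey or black
	grey  []int
}

// NewIncrementalMarker shades the roots grey.
func NewIncrementalMarker(heap []Object, roots []int) *IncrementalMarker {
	m := &IncrementalMarker{heap: heap, black: map[int]bool{}, seen: map[int]bool{}}
	for _, r := range roots {
		if !m.seen[r] {
			m.seen[r] = true
			m.grey = append(m.grey, r)
		}
	}
	return m
}

// Step scans at most budget grey objects and reports whether marking is done.
func (m *IncrementalMarker) Step(budget int) bool {
	for i := 0; i < budget && len(m.grey) > 0; i++ {
		id := m.grey[len(m.grey)-1]
		m.grey = m.grey[:len(m.grey)-1]
		for _, ref := range m.heap[id].Refs {
			if !m.seen[ref] { // white -> grey
				m.seen[ref] = true
				m.grey = append(m.grey, ref)
			}
		}
		m.black[id] = true // grey -> black
	}
	return len(m.grey) == 0
}

// elasticBudget (hypothetical policy) shrinks the per-step budget as the
// request queue grows, but never below one object so marking terminates.
func elasticBudget(maxBudget, pendingRequests int) int {
	b := maxBudget / (1 + pendingRequests)
	if b < 1 {
		b = 1
	}
	return b
}

func main() {
	heap := []Object{
		{ID: 0, Refs: []int{1, 2}}, {ID: 1, Refs: []int{3}}, {ID: 2, Refs: []int{3}},
		{ID: 3}, {ID: 4, Refs: []int{5}}, {ID: 5}, // 4 and 5 are unreachable
	}
	m := NewIncrementalMarker(heap, []int{0})
	pending := []int{3, 0, 0} // simulated request-queue length at each step
	for step, p := range pending {
		budget := elasticBudget(4, p)
		done := m.Step(budget)
		fmt.Printf("step %d: pending=%d budget=%d done=%v\n", step, p, budget, done)
		if done {
			break
		}
	}
	fmt.Println("marked:", len(m.black), "of", len(heap), "objects")
}
```

Running it scans a single object while three requests are pending, then finishes marking in the next step once the queue drains, leaving the unreachable objects 4 and 5 unmarked. Dividing the budget by queue length is only one possible policy; the floor of one object per step guarantees that marking still terminates during a sustained burst.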


    Information

    Published In

    ACM Conferences
    HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
    August 2023
    350 pages
    ISBN:9798400701559
    DOI:10.1145/3588195
    General Chair: Ali R. Butt
    Program Chairs: Ningfang Mi, Kyle Chard
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 August 2023

    Author Tags

    1. GC scheduler
    2. Golang (Go)
    3. garbage collection (GC)
    4. latency critical (LC)
    5. tail latency

    Qualifiers

    • Research-article

    Conference

    HPDC '23

    Acceptance Rates

    Overall Acceptance Rate 166 of 966 submissions, 17%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 217
    • Downloads (last 12 months): 217
    • Downloads (last 6 weeks): 18
    Reflects downloads up to 26 Jul 2024
