research-article

A study of the scalability of stop-the-world garbage collectors on multicores

Published: 16 March 2013

Abstract

Large-scale multicore architectures create new challenges for garbage collectors (GCs). In particular, throughput-oriented stop-the-world algorithms perform well on a small number of cores, but have been shown to degrade badly beyond approximately 8 cores on a 48-core machine running OpenJDK 7. This negative result raises the question of whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core counts but scales well up to 48 cores.

Published In

ACM SIGARCH Computer Architecture News  Volume 41, Issue 1
ASPLOS '13
March 2013
540 pages
ISSN:0163-5964
DOI:10.1145/2490301
  • ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
    March 2013
    574 pages
    ISBN:9781450318709
    DOI:10.1145/2451116
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. garbage collection
  2. multicore
  3. NUMA

Cited By

  • (2020) InvaliDB. Proceedings of the VLDB Endowment 13:12 (3032-3045). DOI: 10.14778/3415478.3415532. Online publication date: 14-Sep-2020
  • (2023) Scaling Up Performance of Managed Applications on NUMA Systems. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management (1-14). DOI: 10.1145/3591195.3595270. Online publication date: 6-Jun-2023
  • (2023) Concurrent GCs and Modern Java Workloads: A Cache Perspective. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management (71-84). DOI: 10.1145/3591195.3595269. Online publication date: 6-Jun-2023
  • (2023) DJXPerf: Identifying Memory Inefficiencies via Object-Centric Profiling for Java. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (81-94). DOI: 10.1145/3579990.3580010. Online publication date: 17-Feb-2023
  • (2022) Layered Contention Mitigation for Cloud Storage. 2022 IEEE 15th International Conference on Cloud Computing (CLOUD) (167-178). DOI: 10.1109/CLOUD55607.2022.00036. Online publication date: Jul-2022
  • (2021) Performance Evaluation of Intel Optane Memory for Managed Workloads. ACM Transactions on Architecture and Code Optimization 18:3 (1-26). DOI: 10.1145/3451342. Online publication date: 22-Apr-2021
  • (2021) Bridging the performance gap for copy-based garbage collectors atop non-volatile memory. Proceedings of the Sixteenth European Conference on Computer Systems (343-358). DOI: 10.1145/3447786.3456246. Online publication date: 21-Apr-2021
  • (2020) Platinum. Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (159-172). DOI: 10.5555/3489146.3489157. Online publication date: 15-Jul-2020
  • (2020) You can’t hide you can’t run: a performance assessment of managed applications on a NUMA machine. Proceedings of the 17th International Conference on Managed Programming Languages and Runtimes (80-88). DOI: 10.1145/3426182.3426189. Online publication date: 4-Nov-2020
  • (2020) Efficient nursery sizing for managed languages on multi-core processors with shared caches. Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (1-15). DOI: 10.1145/3368826.3377908. Online publication date: 22-Feb-2020
