Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article
Free access

Hoard: a scalable memory allocator for multithreaded applications

Published: 01 November 2000 Publication History
  • Get Citation Alerts
  • Abstract

    Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.

    References

    [1]
    U. Acar, E. Berger, R. Blumofe, and D. Papadopoulos. Hood: A threads library for multiprogrammed multiprocessors. http://www.cs.utexas.edu/users/hood, Sept. 1999.
    [2]
    J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324:446--449, 1986.
    [3]
    bCandid.com, Inc. http://www.bcandid.com.
    [4]
    E. D. Berger and R. D. Blumofe. Hoard: A fast, scalable, and memory-efficient allocator for shared-memory multiprocessors. Technical Report UTCS-TR99-22, The University of Texas at Austin, 1999.
    [5]
    B. Bigler, S. Allan, and R. Oldehoeft. Parallel dynamic storage allocation. International Conference on Parallel Processing, pages 272-275, 1985.
    [6]
    R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS), pages 356--368, Santa Fe, New Mexico, Nov. 1994.
    [7]
    Coyote Systems, Inc. http://www.coyotesystems.com.
    [8]
    C. S. Ellis and T. J. Olson. Algorithms for parallel memory allocation. International Journal of Parallel Programming, 17(4):303-345, 1988.
    [9]
    W. Gloger. Dynamic memory allocator implementations in linux system libraries. http:llwww.dent.med.uni-muenehen.del" wmglo/malloc-slides.html.
    [10]
    A. Gottlieb and J. Wilson. Using the buddy system for concurrent memory allocation. Technical Report System Software Note 6, Courant Institute, 1981.
    [11]
    A. Gottlieb and J. Wilson. Parallelizing the usual buddy algorithm. Technical Report System Software Note 37, Courant Institute, 1982.
    [12]
    D. Grunwald, B. Zorn, and R. Henderson. Improving the cache locality of memory allocation. In R. Cartwright, editor, Proceedings of the Conference on Programming Language Design and Implementation, pages 177-186, New York, NY, USA, June 1993. ACM Press.
    [13]
    A. K. Iyengar. Dynamic Storage Allocation on a Multiprocessor. PhD thesis, MIT, 1992. MIT Laboratory for Computer Science Technical Report MIT/LCS/TR-560.
    [14]
    A. K. Iyengar. Parallel dynamic storage allocation algorithms. In Fifth IEEE Symposium on Parallel and Distributed Processing. IEEE Press, 1993.
    [15]
    T. Jeremiassen and S. Eggers. Reducing false sharing on shared memory multiprocessors through compile time data transformations. In ACM Symposium on Principles and Practice of Parallel Programming, pages 179-188, July 1995.
    [16]
    T. Johnson. A concurrent fast-fits memory manager. Technical Report TR91-009, University of Florida, Department of CIS, 1991.
    [17]
    T. Johnson and T. Davis. Space efficient parallel buddy memory management. Technical Report TR92-008, University of Florida, Department of CIS, 1992.
    [18]
    M. S. Johnstone. Non-Compacting Memory Allocation and Real-Time Garbage Collection. PhD thesis, University of Texas at Austin, Dec. 1997.
    [19]
    M. S. Johnstone and P. R. Wilson. The memory fragmentation problem: Solved? In ISMM, Vancouver, B.C., Canada, 1998.
    [20]
    K. Kennedy and K, S. McKinley. Optimizing for parallelism and data locality. In Proceedings of the Sixth International Conference on Supercomputing, pages 323-334, Distributed Computing, July 1992.
    [21]
    M. R. Krishnan. Heap: Pleasures and pains. Microsoft Developer Newsletter, Feb. 1999.
    [22]
    P. Larson and M. Krishnan. Memory allocation for long-running server applications. In ISMM, Vancouver, B.C., Canada, 1998.
    [23]
    D. Lea. A memory allocator. http://g.oswego.edu/dl/html/malloe.html.
    [24]
    B. Lewis. comp. programming, t h r e a d s FAQ. http://www.lambdacs.conffnewsgroup/FAQ.html.
    [25]
    P. E. McKenney and J. Slingwine. Efficient kernel memory allocation on shared-memory multiprocessor. In USENIX Association, editor, Proceedings of the Winter 1993 USENIX Conference: January 25-29, 1993, San Diego, California, USA, pages 295-305, Berkeley, CA, USA, Winter 1993. USENIX.
    [26]
    MicroQuill, Inc. http://www.mieroquill.eom.
    [27]
    MySQL, Inc. The mysql database manager. http://www.mysql.org.
    [28]
    G. J. Narlikar and G. E. Blelloch. Space-efficient scheduling of nested parallelism. ACM Transactions on Programming Languages and Systems, 21(1): 138-173, January 1999.
    [29]
    J. M. Robson. Worst case fragmentation of first fit and best fit storage allocation strategies. ACM Computer Journal, 20(3):242-244, Aug. 1977.
    [30]
    SGI. The standard template library for c++: Allocators. http :llwww.sg).comlTechno l ogy lS TLI All ocators.html.
    [31]
    Standard Performance Evaluation Corporation. SPECweb99. http:l/www.spee.orglosglweb99L
    [32]
    D. Stefanovi6. Properties of Age-Based Automatic Memory Reclamation Algorithms. PhD thesis, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts, Dec. 1998.
    [33]
    D. Stein and D. Shah. Implementing lightweight threads. In Proceedings of the 1992 USENIX Summer Conference, pages 1-9, 1992.
    [34]
    H. Stone. Parallel memory allocation using the FETCH-AND-ADD instruction. Technical Report RC 9674, IBM T. J. Watson Research Center, Nov. 1982.
    [35]
    Time-Warner/AOL, Inc. AOLserver 3.0. http://www.aolserver.com.
    [36]
    J. Torrellas, M. S. Lam, and J. L. Hennessy. False sharing and spatial locality in multiprocessor caches. IEEE Transactions on Computers, 43(6):651-663, 1994.
    [37]
    V.-Y. Vee and W.-J. Hsu. A scalable and efficient storage allocator on shared-memory multiprocessors. In International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN'99), pages 230-235, Fremantle, Western Australia, June 1999.

    Cited By

    View all
    • (2024)SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation across All ScalesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673069(179-188)Online publication date: 12-Aug-2024
    • (2024)Characterizing a Memory Allocator at Warehouse ScaleProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651350(192-206)Online publication date: 27-Apr-2024
    • (2024)RT-Mimalloc: A New Look at Dynamic Memory Allocation for Real-Time Systems2024 IEEE 30th Real-Time and Embedded Technology and Applications Symposium (RTAS)10.1109/RTAS61025.2024.00022(173-185)Online publication date: 13-May-2024
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 35, Issue 11
    Nov. 2000
    269 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/356989
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 November 2000
    Published in SIGPLAN Volume 35, Issue 11

    Check for updates

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)555
    • Downloads (Last 6 weeks)38
    Reflects downloads up to 10 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation across All ScalesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673069(179-188)Online publication date: 12-Aug-2024
    • (2024)Characterizing a Memory Allocator at Warehouse ScaleProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651350(192-206)Online publication date: 27-Apr-2024
    • (2024)RT-Mimalloc: A New Look at Dynamic Memory Allocation for Real-Time Systems2024 IEEE 30th Real-Time and Embedded Technology and Applications Symposium (RTAS)10.1109/RTAS61025.2024.00022(173-185)Online publication date: 13-May-2024
    • (2024) Improving the Relationship Between B + -Tree and Memory Allocator for Persistent Memory 2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00299(3906-3919)Online publication date: 13-May-2024
    • (2023)Beyond RSS: Towards Intelligent Dynamic Memory Management (Work in Progress)Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes10.1145/3617651.3622989(158-164)Online publication date: 19-Oct-2023
    • (2023)OOM-Guard: Towards Improving the Ergonomics of Rust OOM Handling via a Reservation-Based ApproachProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616303(733-744)Online publication date: 30-Nov-2023
    • (2023)DFCPP Runtime Library for Dataflow ProgrammingProceedings of the 52nd International Conference on Parallel Processing Workshops10.1145/3605731.3605887(145-152)Online publication date: 7-Aug-2023
    • (2023)Partial Failure Resilient Memory Management System for (CXL-based) Distributed Shared MemoryProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613135(658-674)Online publication date: 23-Oct-2023
    • (2023)Releasing Memory with Optimistic Access: A Hybrid Approach to Memory Reclamation and Allocation in Lock-Free ProgramsProceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3558481.3591089(177-186)Online publication date: 17-Jun-2023
    • (2023)PMA: A Persistent Memory Allocator with High Efficiency and Crash Consistency Guarantee2023 IEEE 41st International Conference on Computer Design (ICCD)10.1109/ICCD58817.2023.00036(182-189)Online publication date: 6-Nov-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media