research-article

wfspan: Wait-free Dynamic Memory Management

Authors: Xiangzhen Ouyang, Yian Zhu

ACM Transactions on Embedded Computing Systems (TECS), Volume 21, Issue 4

Article No.: 43, Pages 1 - 26

https://doi.org/10.1145/3533724

Published: 23 August 2022

Abstract

Dynamic memory allocation plays a vital role in modern application programs. Modern lock-free memory allocators based on hardware atomic primitives usually provide good performance. However, threads may starve in these lock-free implementations, leading to an unbounded worst-case execution time, which is unacceptable in real-time embedded systems. This article presents wfspan, a decentralized dynamic memory manager based on non-linearizable wait-free lists. It employs a helping mechanism to ensure that no thread starves. As a design tradeoff, wfspan guarantees a bounded number of execution steps in both the allocation and deallocation procedures, at the cost of an increased, but still bounded, worst-case memory footprint. Benchmark results on an x86/64 machine and an aarch64 machine show that wfspan achieves performance and memory footprint competitive with practical lock-based and lock-free memory allocators, while outperforming the other allocators in worst-case execution time.
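
The difference between lock-free and wait-free progress mentioned in the abstract can be sketched with a toy bump-pointer pool. This is not wfspan's algorithm: `BumpPool`, `alloc_lock_free`, and `alloc_wait_free` are hypothetical names introduced only for illustration, and the sketch assumes a platform where `fetch_add` compiles to a single atomic instruction (on targets that implement it as a retry loop, the second method is not strictly wait-free).

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical toy pool for illustration only; it does not reflect wfspan's design.
struct BumpPool {
    std::atomic<std::size_t> next{0};  // offset of the first unused byte
    std::size_t capacity;

    explicit BumpPool(std::size_t cap) : capacity(cap) {}

    // Lock-free: some thread always makes progress, but this caller retries
    // every time another thread's CAS wins, so its step count is unbounded.
    bool alloc_lock_free(std::size_t n, std::size_t& offset) {
        std::size_t cur = next.load(std::memory_order_relaxed);
        for (;;) {
            if (cur > capacity || n > capacity - cur) return false;  // exhausted
            if (next.compare_exchange_weak(cur, cur + n,
                                           std::memory_order_relaxed)) {
                offset = cur;
                return true;
            }
            // A failed CAS reloads cur with the current value; try again.
        }
    }

    // Wait-free (under the single-instruction assumption above): one
    // fetch_add and a bounds check, independent of contention.
    // A failed request still advances next, so that space is wasted.
    bool alloc_wait_free(std::size_t n, std::size_t& offset) {
        std::size_t cur = next.fetch_add(n, std::memory_order_relaxed);
        if (cur > capacity || n > capacity - cur) return false;
        offset = cur;
        return true;
    }
};
```

In this toy, the bounded-step variant pays for its bound with wasted space on failed requests, which loosely mirrors the kind of time-versus-footprint tradeoff the abstract describes. The helping mechanism and wait-free lists of wfspan itself are not shown here.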

Cited By

Hadjadj Y, Zouaoui C, Taleb N, Mazari S, El Bahri M, El Mezouar M (2023). VCMalloc: A Virtually Contiguous Memory Allocator. IEEE Transactions on Computers 72:12 (3431-3442). Online publication date: 7-Aug-2023.
https://dl.acm.org/doi/10.1109/TC.2023.3302731

Sundari K, Narmadha R, Ramani S (2022). A Classy Memory Management System (CyM2S) using an Isolated Dynamic Two-Level Memory Allocation (ID2LMA) Algorithm for the Real Time Embedded Systems. International Journal of Electrical and Electronics Research 10:2 (387-393). Online publication date: 30-Jun-2022.
https://doi.org/10.37391/ijeer.100254

Index Terms

wfspan: Wait-free Dynamic Memory Management
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Allocation / deallocation strategies
    2. Software system structures
      1. Embedded software
      2. Real-time systems software
2. Theory of computation
  1. Design and analysis of algorithms
    1. Concurrent algorithms

Published In

ACM Transactions on Embedded Computing Systems Volume 21, Issue 4

July 2022

330 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3551651

Editor:
Tulika Mitra
National University of Singapore, Singapore

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 23 August 2022

Online AM: 04 May 2022

Accepted: 01 April 2022

Revised: 01 April 2022

Received: 01 July 2021

Qualifiers

Research-article
Refereed
