wfspan: Wait-free Dynamic Memory Management

Published: 23 August 2022

Abstract

Dynamic memory allocation plays a vital role in modern application programs. Modern lock-free memory allocators based on hardware atomic primitives usually provide good performance. However, individual threads may starve in these lock-free implementations, which leads to an unbounded worst-case execution time that real-time embedded systems cannot tolerate. This article presents wfspan, a decentralized dynamic memory manager based on non-linearizable wait-free lists. It adds a helping mechanism to the lock-free implementation so that no thread starves. As a design tradeoff, wfspan guarantees a bounded number of execution steps in both the allocation and deallocation procedures, at the cost of a larger, but still bounded, worst-case memory footprint. Benchmark results on an x86-64 machine and an aarch64 machine show that wfspan achieves performance and memory footprint competitive with practical lock-based and lock-free memory allocators, while outperforming them in worst-case execution time.
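
To make the starvation problem concrete, the following is a minimal sketch of the kind of lock-free free list the abstract contrasts wfspan with. It is not wfspan's algorithm: the class `LockFreePool`, the constant `kNil`, the index-based layout, and the version tag in the list head are all assumptions chosen for illustration. The point is the unbounded compare-and-swap retry loop in `allocate`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a fixed pool of N blocks whose free list is a
// lock-free stack of block indices. None of these names come from wfspan.
constexpr std::uint32_t kNil = 0xFFFFFFFFu;  // marks "no free block"

template <std::size_t N>
class LockFreePool {
    static_assert(N > 0 && N < kNil, "pool size must fit in a 32-bit index");

    std::atomic<std::uint32_t> next_[N];  // next_[i]: free block after block i
    std::atomic<std::uint64_t> head_;     // (version tag << 32) | first free index

    static std::uint64_t pack(std::uint32_t tag, std::uint32_t idx) {
        return (static_cast<std::uint64_t>(tag) << 32) | idx;
    }

public:
    LockFreePool() {
        // Initially every block is free: 0 -> 1 -> ... -> N-1 -> kNil.
        for (std::uint32_t i = 0; i < N; ++i) {
            std::uint32_t following = (i + 1 < N) ? i + 1 : kNil;
            next_[i].store(following, std::memory_order_relaxed);
        }
        head_.store(pack(0, 0), std::memory_order_release);
    }

    // Returns a block index, or -1 if the pool is empty.
    // failed_cas (optional) counts how many compare-exchange attempts failed.
    long allocate(unsigned* failed_cas = nullptr) {
        std::uint64_t old_head = head_.load(std::memory_order_acquire);
        for (;;) {  // no upper bound on iterations: this is the starvation risk
            std::uint32_t idx = static_cast<std::uint32_t>(old_head);
            if (idx == kNil) return -1;
            std::uint32_t tag = static_cast<std::uint32_t>(old_head >> 32);
            std::uint64_t new_head =
                pack(tag + 1, next_[idx].load(std::memory_order_relaxed));
            // On failure, old_head is refreshed with the current head value.
            if (head_.compare_exchange_weak(old_head, new_head,
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire)) {
                return static_cast<long>(idx);
            }
            if (failed_cas) ++*failed_cas;
        }
    }

    // Pushes block idx back onto the free list (same retry pattern).
    void deallocate(std::uint32_t idx) {
        assert(idx < N);
        std::uint64_t old_head = head_.load(std::memory_order_acquire);
        for (;;) {
            next_[idx].store(static_cast<std::uint32_t>(old_head),
                             std::memory_order_relaxed);
            std::uint32_t tag = static_cast<std::uint32_t>(old_head >> 32);
            if (head_.compare_exchange_weak(old_head, pack(tag + 1, idx),
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire)) {
                return;
            }
        }
    }
};
```

A failed compare-exchange usually means some other thread changed the head in the meantime (a weak compare-exchange may also fail spuriously), so the system as a whole makes progress, but a single caller has no bound on how often it retries. A wait-free allocator has to replace loops of this kind with a bounded number of steps per call, which is the role the abstract assigns to the helping mechanism.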

Cited By

  • (2023) VCMalloc: A Virtually Contiguous Memory Allocator. IEEE Transactions on Computers 72:12 (3431-3442). DOI: 10.1109/TC.2023.3302731. Online publication date: 7-Aug-2023
  • (2022) A Classy Memory Management System (CyM2S) using an Isolated Dynamic Two-Level Memory Allocation (ID2LMA) Algorithm for the Real Time Embedded Systems. International Journal of Electrical and Electronics Research 10:2 (387-393). DOI: 10.37391/ijeer.100254. Online publication date: 30-Jun-2022

Published In

ACM Transactions on Embedded Computing Systems  Volume 21, Issue 4
July 2022
330 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3551651
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 August 2022
Online AM: 04 May 2022
Accepted: 01 April 2022
Revised: 01 April 2022
Received: 01 July 2021
Published in TECS Volume 21, Issue 4

Author Tags

  1. Memory allocator
  2. real-time systems
  3. concurrent algorithms
  4. lock-free
  5. wait-free

Qualifiers

  • Research-article
  • Refereed
