DOI: 10.1145/1122971.1122999

"MAMA!": a memory allocator for multithreaded architectures

Published: 29 March 2006

Abstract

While the high-performance computing world is dominated by distributed memory computer systems, applications that require random access into large shared data structures continue to motivate development of ever larger shared-memory parallel computers such as Cray's MTA and SGI's Altix systems.

To support scalable application performance on such architectures, the memory allocator must be able to satisfy requests at a rate proportional to system size. For example, a 40-processor Cray MTA-2 can experience over 5000 concurrent requests, one from each of its 128 streams per processor. Cray's Eldorado, to be built upon the same network as Sandia's 10,000-processor Red Storm system, will sport thousands of multithreaded processors, leading to hundreds of thousands of concurrent requests.

In this paper, we present MAMA, a scalable shared-memory allocator designed to service any rate of concurrent requests. MAMA is distinguished from prior work on shared-memory allocators in that it employs software combining to aggregate requests serviced by a single heap structure: Hoard and MTA malloc necessitate repetition of the underlying heap data structures in proportion to processor count. Unlike Hoard, MAMA does not exploit processor-local data structures, limiting its applicability today to systems that sustain high utilization in the presence of global references, such as Cray's MTA systems. We believe MAMA's relevance to other shared-memory systems will grow as they become increasingly multithreaded and, consequently, more tolerant of references to non-local memory.

We show not only that MAMA scales on Cray MTA systems, but also that it delivers absolute performance competitive with allocators employing heap repetition. In addition, we demonstrate that performance of repetition-based allocators does not scale under heavy loads.

We also argue more generally that methods using repetition alone to support concurrency are subject to an impractical tradeoff of scalability against space consumption: when scaled up to meet increasing concurrency demands, repetition-based allocators necessarily house unused space quadratic in the number of processors p (on the order of p^2). Hierarchical structure may reduce this to p log p, but in building large-scale shared-memory parallel computers, unused memory more than linear in p is unacceptable. MAMA, in contrast, scales to arbitrarily large systems while consuming memory that increases only linearly with system and request size.

MAMA is of both theoretical interest for its use of novel algorithmic techniques and practical importance as the concurrency upon which shared-memory performance depends continues to grow and multithreaded architectures emerge that are increasingly latency tolerant. While our work is a very recent contribution to memory allocation technology, MAMA already has been incorporated into production as the cornerstone for global memory allocation in Cray's multithreaded systems.
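
The abstract describes software combining only at a high level, so the following is a minimal sketch of the general idea and not MAMA's actual algorithm. It assumes a toy bump-pointer heap with no free operation, and it simulates the combining tree sequentially rather than with real threads. The names `SharedHeap`, `fetch_and_add`, and `combined_allocate` are hypothetical and chosen for illustration. Requests are summed pairwise up a tree, the root performs one update on the single shared heap, and the reserved range is then split back down the tree so that each request receives its own offset.

```python
import threading


class SharedHeap:
    """Hypothetical single shared heap: a bump pointer with no free()."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.top = 0       # next free offset in the heap
        self.updates = 0   # number of times the shared pointer was modified
        self._lock = threading.Lock()

    def fetch_and_add(self, nbytes):
        """Atomically reserve nbytes and return the start offset."""
        with self._lock:
            if self.top + nbytes > self.capacity:
                raise MemoryError("heap exhausted")
            base = self.top
            self.top += nbytes
            self.updates += 1
            return base


def combined_allocate(heap, sizes):
    """Serve all requests in `sizes` with a single heap update.

    Sequential simulation of a binary combining tree (an assumption of
    this sketch; a real combiner would run the tree nodes concurrently).
    """
    if not sizes:
        return []

    # Up-sweep: each tree level holds pairwise sums of the level below.
    levels = [list(sizes)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sum(prev[i:i + 2]) for i in range(0, len(prev), 2)])

    # Root: one combined request touches the shared heap.
    bases = [heap.fetch_and_add(levels[-1][0])]

    # Down-sweep: a left child inherits its parent's base offset; a right
    # child starts where its left sibling's reservation ends.
    for level in reversed(levels[:-1]):
        next_bases = []
        for i in range(len(level)):
            parent_base = bases[i // 2]
            if i % 2 == 0:
                next_bases.append(parent_base)
            else:
                next_bases.append(parent_base + level[i - 1])
        bases = next_bases
    return bases


if __name__ == "__main__":
    # 40 processors x 128 streams, one 16-byte request per stream.
    heap = SharedHeap(capacity=1 << 20)
    offsets = combined_allocate(heap, [16] * (40 * 128))
    print(len(offsets), heap.updates)  # prints "5120 1"
```

In this sketch the 5120 requests of the MTA-2 example cause a single modification of the shared heap pointer, which is the property that lets one heap structure serve many concurrent requesters without being replicated per processor.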

References

[1] G. Alverson, P. Briggs, S. Coatney, S. Kahan, and R. Korry. Tera hardware-software cooperation. In Proceedings of the 1997 ACM/IEEE conference on Supercomputing, 1997.
[2] G. Alverson, S. Kahan, R. Korry, C. McCann, and B. Smith. Scheduling on the Tera MTA. In Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, 1995.
[3] E. Berger, K. McKinley, R. Blumofe, and P. Wilson. Hoard: a scalable memory allocator for multithreaded applications. In Proceedings of ASPLOS'00, 2000.
[4] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990.
[5] J. Feo, D. Harper, S. Kahan, and P. Konecny. ELDORADO. In Proceedings of the Second Conference on Computing Frontiers, Ischia, Italy, May 4-6, 2005.
[6] M. Michael. Scalable Lock-Free Dynamic Memory Allocation. In Programming Language Design and Implementation, Washington, DC, June 9-11, 2004.
[7] C. Okasaki. Purely functional data structures. Cambridge University Press, 1998.
[8] J-R. Sack and T. Strothotte. A characterization of heaps and its applications. Information and Computation, 86(1):69-86, May 1990.
[9] N. Shavit and A. Zemach. Combining funnels: a dynamic approach to software combining. Journal of Parallel and Distributed Computing, 11:1355-1387, 2000.

Published In

PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
March 2006
258 pages
ISBN: 1595931899
DOI: 10.1145/1122971

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Eldorado
  2. MTA
  3. combining
  4. malloc
  5. multithreading
  6. parallel memory allocation

Qualifiers

  • Article

Conference

PPoPP06

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Feb 2025.

Cited By

  • (2014) Towards software performance engineering for multicore and manycore systems. ACM SIGMETRICS Performance Evaluation Review 41:3 (2-11). DOI: 10.1145/2567529.2567531. Online publication date: 10-Jan-2014
  • (2014) Hybrid Parallel Programming Models for AMR Neutron Monte-Carlo Transport. SNA + MC 2013 - Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo (04202). DOI: 10.1051/snamc/201404202. Online publication date: 6-Jun-2014
  • (2013) Introducing kernel-level page reuse for high performance computing. Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (1-9). DOI: 10.1145/2492408.2492414. Online publication date: 16-Jun-2013
  • (2012) The myrmics memory allocator. ACM SIGPLAN Notices 47:11 (15-24). DOI: 10.1145/2426642.2259001. Online publication date: 15-Jun-2012
  • (2012) The myrmics memory allocator. Proceedings of the 2012 international symposium on Memory Management (15-24). DOI: 10.1145/2258996.2259001. Online publication date: 15-Jun-2012
  • (2011) SFMalloc. Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (253-263). DOI: 10.1109/PACT.2011.57. Online publication date: 10-Oct-2011
  • (2010) Hashing strategies for the Cray XMT. 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) (1-8). DOI: 10.1109/IPDPSW.2010.5470688. Online publication date: Apr-2010
  • (2009) Memory management thread for heap allocation intensive sequential applications. Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture (35-42). DOI: 10.1145/1621960.1621967. Online publication date: 13-Sep-2009
  • (2008) Branch-and-Bound interval global optimization on shared memory multiprocessors. Optimization Methods & Software 23:5 (689-701). DOI: 10.1080/10556780802086300. Online publication date: 1-Oct-2008
