research-article

A modern memory management system for OpenMP

Authors: S. J. Pennycook, R. Narayanaswamy

WACCPD '16: Proceedings of the Third International Workshop on Accelerator Programming Using Directives

Pages 25-35

Published: 13 November 2016

Abstract

Modern computers with multi-/many-core processors and accelerators feature a sophisticated and deep memory hierarchy, potentially including distinct main memory, high-bandwidth memory, texture memory and scratchpad memory. The performance characteristics of these memories are varied, and studies have demonstrated the importance of using them effectively.

In this paper, we propose an extension of the OpenMP API that lets programmers optimize their applications for new memory technologies efficiently, in a platform-agnostic and portable fashion. Our proposal separately exposes the characteristics of memory resources (such as kind) and the characteristics of allocations (such as alignment), and is fully compatible with existing OpenMP constructs.
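The central design point above, separating what a memory resource is (its kind) from how an individual allocation should behave (its alignment), can be illustrated with a small C sketch. All names below (`mem_kind`, `alloc_traits`, `allocator`, `sketch_alloc`, `sketch_free`) are hypothetical and do not reproduce the API proposed in the paper. The high-bandwidth kind is only a placeholder that falls back to ordinary memory through C11 `aligned_alloc`. For comparison, OpenMP 5.0 later standardized a similar split between memory spaces (e.g. `omp_high_bw_mem_space`) and allocator traits (e.g. `omp_atk_alignment`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: characteristics of a memory resource (where memory comes from). */
typedef enum {
    MEM_KIND_DEFAULT,
    MEM_KIND_HIGH_BANDWIDTH /* placeholder only; see sketch_alloc */
} mem_kind;

/* Hypothetical: characteristics of an allocation (how memory is handed out). */
typedef struct {
    size_t alignment; /* must be a power of two and >= sizeof(void *) */
} alloc_traits;

/* Hypothetical handle pairing one resource with one set of allocation traits. */
typedef struct {
    mem_kind kind;
    alloc_traits traits;
} allocator;

/* Allocate `size` bytes from the allocator's kind, honoring its alignment trait. */
static void *sketch_alloc(const allocator *a, size_t size) {
    size_t align = a->traits.alignment;
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (size + align - 1) / align * align;
    if (rounded == 0)
        rounded = align;

    if (a->kind == MEM_KIND_HIGH_BANDWIDTH) {
        /* A real implementation would request high-bandwidth memory here
           (for example through a library such as memkind). This sketch
           assumes none is available and falls back to default memory. */
    }
    return aligned_alloc(align, rounded);
}

/* Release memory obtained from sketch_alloc. The allocator argument mirrors
   allocator-aware free routines but is unused by this fallback. */
static void sketch_free(const allocator *a, void *p) {
    (void)a;
    free(p);
}
```

For example, `allocator hbm64 = { MEM_KIND_HIGH_BANDWIDTH, { 64 } };` requests cache-line-aligned buffers from the (placeholder) high-bandwidth kind; changing only `.traits.alignment` to 4096 yields page-aligned buffers from the same kind, without redefining the resource itself.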




Information

Published In

WACCPD '16: Proceedings of the Third International Workshop on Accelerator Programming Using Directives

November 2016

94 pages

ISBN: 9781509061525

Program Chairs:
Sunita Chandrasekaran, University of Delaware
Guido Juckeland, HZDR, Germany

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing
IEEE-CS\DATC: IEEE Computer Society

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Press


Conference

SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 13-18, 2016

Salt Lake City, Utah

Acceptance Rates

Overall Acceptance Rate 7 of 14 submissions, 50%
