DOI: 10.5555/3019120.3019123
research-article

A modern memory management system for OpenMP

Published: 13 November 2016

Abstract

Modern computers with multi-/many-core processors and accelerators feature a sophisticated, deep memory hierarchy that may include distinct main memory, high-bandwidth memory, texture memory, and scratchpad memory. These memories differ in their performance characteristics, and studies have demonstrated the importance of using them effectively.
In this paper, we propose an extension of the OpenMP API that lets programmers optimize their applications for new memory technologies in a platform-agnostic and portable fashion. Our proposal exposes the characteristics of memory resources (such as their kind) separately from the characteristics of individual allocations (such as alignment), and it is fully compatible with existing OpenMP constructs.
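The abstract does not spell out the proposed interface, so the following is only a minimal C sketch of its central design idea: describing *where* memory comes from (a memory resource and its kind) separately from *how* each allocation is laid out (allocation traits such as alignment). Every name here (`mem_space_t`, `mem_kind_t`, `alloc_traits_t`, `allocator_t`, `make_allocator`, `sketch_alloc`, `sketch_free`) is hypothetical and is not the paper's API or OpenMP's. Because routing to real high-bandwidth memory is platform specific, the sketch assumes a fallback that serves every kind from the ordinary heap via POSIX `posix_memalign` and honors only the alignment trait.

```c
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not the paper's API. */

/* Characteristics of a memory resource: what kind of memory to draw from. */
typedef enum {
    MEM_KIND_DEFAULT,        /* ordinary main memory (DRAM) */
    MEM_KIND_HIGH_BANDWIDTH  /* e.g. on-package high-bandwidth memory */
} mem_kind_t;

typedef struct {
    mem_kind_t kind;
} mem_space_t;

/* Characteristics of an allocation: how each block is laid out. */
typedef struct {
    size_t alignment; /* power of two and a multiple of sizeof(void *) */
} alloc_traits_t;

/* An allocator pairs one memory space with one set of allocation traits. */
typedef struct {
    mem_space_t space;
    alloc_traits_t traits;
} allocator_t;

allocator_t make_allocator(mem_space_t space, alloc_traits_t traits) {
    /* posix_memalign rejects alignments that break these rules. */
    assert(traits.alignment >= sizeof(void *));
    assert((traits.alignment & (traits.alignment - 1)) == 0);
    allocator_t a = { space, traits };
    return a;
}

void *sketch_alloc(const allocator_t *a, size_t size) {
    void *p = NULL;
    /* Placeholder policy: a real runtime would route MEM_KIND_HIGH_BANDWIDTH
       to the matching physical memory; this sketch serves every kind from
       the ordinary heap and only honors the alignment trait. */
    if (posix_memalign(&p, a->traits.alignment, size) != 0) {
        return NULL;
    }
    return p;
}

void sketch_free(void *p) {
    free(p); /* memory from posix_memalign is released with free() */
}
```

Because the two descriptors are independent, one `alloc_traits_t` value with a 64-byte alignment can be paired with either `(mem_space_t){ MEM_KIND_HIGH_BANDWIDTH }` or `(mem_space_t){ MEM_KIND_DEFAULT }` through `make_allocator`, without restating the alignment for each kind of memory. For comparison, the OpenMP 5.0 specification defines a standard allocator interface built from memory spaces (such as `omp_high_bw_mem_space`) and allocator traits (such as `omp_atk_alignment`); consult the specification for its exact semantics rather than assuming it matches the proposal described here.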

Published In
WACCPD '16: Proceedings of the Third International Workshop on Accelerator Programming Using Directives
November 2016
94 pages
ISBN:9781509061525

Publisher

IEEE Press

Conference

SC16

Acceptance Rates

Overall Acceptance Rate 7 of 14 submissions, 50%