DOI: 10.1109/IPDPS.2005.219
Article

Fast Address Translation Techniques for Distributed Shared Memory Compilers

Published: 04 April 2005

Abstract

The Distributed Shared Memory (DSM) model is designed to combine the ease of programming of the shared memory paradigm with the high performance that comes from expressing locality, as in the message-passing model. Experience, however, has shown that DSM programming languages such as UPC may be unable to deliver the expected level of performance. Initial investigations have shown that one of the major reasons is the cost of translating addresses from the UPC memory model into the virtual address space of the target architecture. Experimental measurements have shown this overhead increasing execution time by up to three orders of magnitude. Previous work has also shown that some of this overhead can be avoided by hand-tuning, which, on the other hand, can significantly reduce UPC's ease of use. In addition, such tuning improves the performance of local shared accesses only, not of remote shared accesses. Therefore, a new technique resembling Translation Lookaside Buffers (TLBs) is proposed here. This technique, called the Memory Model Translation Buffer (MMTB), has been implemented in the GCC-UPC compiler using two alternative strategies, full-table (FT) and reduced-table (RT). It will be shown that the MMTB strategies can boost performance by up to 700%, preserving ease of programming while performing similarly to hand-tuned UPC and MPI codes.




Published In

Guide Proceedings
IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
April 2005
ISBN:0769523129

Publisher

IEEE Computer Society

United States

Cited By

  • (2015) Enabling PGAS Productivity with Hardware Support for Shared Address Mapping. ACM Transactions on Architecture and Code Optimization 12:4, pp. 1-26. DOI: 10.1145/2842686. Online publication date: 22-Dec-2015.
  • (2015) Assessing Memory Access Performance of Chapel through Synthetic Benchmarks. Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 1147-1150. DOI: 10.1109/CCGrid.2015.157. Online publication date: 4-May-2015.
  • (2008) A PGAS-Based Algorithm for the Longest Common Subsequence Problem. Proceedings of the 14th International Euro-Par Conference on Parallel Processing, pp. 654-664. DOI: 10.1007/978-3-540-85451-7_70. Online publication date: 26-Aug-2008.
  • (2007) Towards a Complexity Model for Design and Analysis of PGAS-Based Algorithms. Proceedings of the Third International Conference on High Performance Computing and Communications, pp. 672-682. DOI: 10.5555/2401945.2402020. Online publication date: 26-Sep-2007.
  • (2006) Shared Memory Programming for Large Scale Machines. Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 108-117. DOI: 10.1145/1133981.1133995. Online publication date: 11-Jun-2006.
  • (2006) Shared Memory Programming for Large Scale Machines. ACM SIGPLAN Notices 41:6, pp. 108-117. DOI: 10.1145/1133255.1133995. Online publication date: 11-Jun-2006.
  • (2005) Toward an Application Support Layer. Proceedings of the 6th International Conference on Parallel Processing and Applied Mathematics, pp. 912-919. DOI: 10.1007/11752578_110. Online publication date: 11-Sep-2005.