research-article
Open access

Optimizing Remote Communication in X10

Published: 11 October 2019

Abstract

X10 is a partitioned global address space (PGAS) programming language that supports the notion of places; a place consists of some data and some lightweight tasks called activities. Each activity runs at a place and may invoke a place-change operation (using the at-construct) to synchronously perform some computation at another place. These place-change operations can be very expensive, as they need to copy all the required data from the current place to the remote place. However, identifying the necessary number of place-change operations and the data required by each of them is non-trivial, especially for irregular applications (such as graph applications) that contain complex code with large numbers of cross-referencing objects, not all of which may actually be required at the remote place. In this article, we present AT-Com, a scheme to optimize X10 code with place-change operations.
AT-Com consists of two inter-related new optimizations: (i) AT-Opt, which minimizes the amount of data serialized and communicated during place-change operations, and (ii) AT-Pruning, which identifies and elides redundant place-change operations and executes place-change operations in parallel. AT-Opt uses a novel abstraction, called the abstract-place-tree, to capture the place-change operations in the program. For each place-change operation, AT-Opt uses a novel inter-procedural analysis to precisely identify the data required at the remote place in terms of the variables in the current scope. AT-Opt then emits the appropriate code to copy the identified data items to the remote place. AT-Pruning introduces a set of program transformation techniques that emit optimized code avoiding the redundant place-change operations. We have implemented AT-Com in the x10v2.6.0 compiler and tested it over the IMSuite benchmark kernels. Compared to the current X10 compiler, the AT-Com optimized code achieved a geometric mean speedup of 18.72× and 17.83× on a four-node (32 cores per node) Intel and two-node (16 cores per node) AMD system, respectively.
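The cost that AT-Opt targets can be illustrated with a small analogy outside X10. The sketch below is not the AT-Com implementation and does not use X10 or its runtime; it is a Python stand-in, assuming that Python's pickle module plays the role of place-change serialization. All names (build_graph, ship_whole_object, ship_needed_fields, remote_body_whole, remote_body_needed) are hypothetical. It compares shipping a whole cross-referencing object against shipping only the one field the remote computation reads.

```python
import pickle


def build_graph(num_nodes):
    """Build a toy graph in which every node points back to the shared graph."""
    graph = {"nodes": []}
    for ident in range(num_nodes):
        graph["nodes"].append({"id": ident, "weight": float(ident), "graph": graph})
    return graph


def ship_whole_object(node):
    """Naive transfer: serialize everything reachable from the node."""
    return pickle.dumps(node)


def ship_needed_fields(node):
    """Reduced transfer: serialize only the field the remote code reads."""
    return pickle.dumps(node["weight"])


def remote_body_whole(payload):
    """Stand-in for the remote computation when the whole node is received."""
    node = pickle.loads(payload)
    return node["weight"] * 2.0


def remote_body_needed(payload):
    """Same computation when only the needed field is received."""
    weight = pickle.loads(payload)
    return weight * 2.0


graph = build_graph(1000)
node = graph["nodes"][7]

whole = ship_whole_object(node)    # drags in the entire graph via the back-reference
needed = ship_needed_fields(node)  # a single float

result_whole = remote_body_whole(whole)
result_needed = remote_body_needed(needed)

print("bytes (whole object): ", len(whole))
print("bytes (needed field): ", len(needed))
print("same result:", result_whole == result_needed)
```

Both variants compute the same value, but the first payload carries every node reachable through the back-reference. In the article, the decision about which data to copy is made statically by the compiler's analysis; here the choice is made by hand purely for illustration.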


Published In

ACM Transactions on Architecture and Code Optimization, Volume 16, Issue 4
December 2019
572 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3366460
© 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 October 2019
Accepted: 01 July 2019
Revised: 01 May 2019
Received: 01 March 2019
Published in TACO Volume 16, Issue 4

Author Tags

  1. PGAS languages
  2. Remote communication
  3. Data serialization
  4. Program transformation

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • SERB core research
