DOI: 10.1145/2751205.2751219
research-article

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations

Published: 08 June 2015

Abstract

Remote memory access (RMA) is an emerging high-performance programming model that uses RDMA hardware directly. Yet, accessing remote memories cannot invoke activities at the target, which complicates the implementation and limits the performance of data-centric algorithms. We propose Active Access (AA), a mechanism that integrates well-known active messaging (AM) semantics with RMA to enable high-performance distributed data-centric computations. AA supports a new programming model in which the user specifies handlers that are triggered when incoming puts and gets reference designated addresses. AA is based on a set of extensions to the Input/Output Memory Management Unit (IOMMU), a unit that provides high-performance hardware support for remapping I/O accesses to memory. We show that AA outperforms existing AM and RMA designs, accelerates various codes such as distributed hashtables and logging schemes, and enables new protocols such as incremental checkpointing for RMA. We also discuss how extended IOMMUs can support a virtualized global address space in a distributed system that offers features known from on-node memory virtualization. We expect that AA and other IOMMU features can enhance the design of HPC operating and runtime systems in large computing centers.
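To make the programming model concrete, here is a minimal software simulation that assumes page-granular handler registration. It is not the paper's IOMMU-based hardware design. The names `ActiveTarget`, `register`, `put`, `get`, `IncrementalCheckpointer`, and the 4 KiB `PAGE_SIZE` are all hypothetical. The sketch shows a remote put or get that touches a designated page invoking a user handler at the target. It then applies that hook to incremental checkpointing for RMA: the handler records pages dirtied by incoming puts, so a checkpoint copies only those pages.

```python
from typing import Callable, Dict, List, Set

PAGE_SIZE = 4096  # hypothetical handler granularity (one 4 KiB page)

# A handler receives the operation kind ("put" or "get") and a page index.
Handler = Callable[[str, int], None]


class ActiveTarget:
    """Hypothetical software model of a target node's memory. The page ->
    handler table stands in for the extended IOMMU entries; AA itself does
    this lookup in hardware, not in software."""

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)
        self.handlers: Dict[int, Handler] = {}

    def _pages(self, addr: int, length: int) -> range:
        # Validate bounds, then return every page index the access touches.
        if length <= 0 or addr < 0 or addr + length > len(self.memory):
            raise ValueError("access out of bounds")
        return range(addr // PAGE_SIZE, (addr + length - 1) // PAGE_SIZE + 1)

    def register(self, start: int, length: int, handler: Handler) -> None:
        # Designate every page overlapping [start, start + length) as active.
        for page in self._pages(start, length):
            self.handlers[page] = handler

    def _notify(self, op: str, pages: range) -> None:
        # Invoke the registered handler (if any) once per touched page.
        for page in pages:
            handler = self.handlers.get(page)
            if handler is not None:
                handler(op, page)

    def put(self, addr: int, data: bytes) -> None:
        pages = self._pages(addr, len(data))  # bounds check before writing
        self.memory[addr:addr + len(data)] = data
        self._notify("put", pages)  # assumption: handler runs after the write

    def get(self, addr: int, length: int) -> bytes:
        pages = self._pages(addr, length)
        self._notify("get", pages)
        return bytes(self.memory[addr:addr + length])


class IncrementalCheckpointer:
    """Hypothetical handler that tracks pages dirtied by remote puts, so a
    checkpoint copies only those pages instead of the whole region."""

    def __init__(self, target: ActiveTarget) -> None:
        self.target = target
        self.dirty: Set[int] = set()
        self.saved: Dict[int, bytes] = {}  # page index -> last saved contents

    def __call__(self, op: str, page: int) -> None:
        if op == "put":  # gets do not modify memory, so they are ignored
            self.dirty.add(page)

    def checkpoint(self) -> List[int]:
        pages = sorted(self.dirty)
        for page in pages:
            lo = page * PAGE_SIZE
            self.saved[page] = bytes(self.target.memory[lo:lo + PAGE_SIZE])
        self.dirty.clear()
        return pages


target = ActiveTarget(size=4 * PAGE_SIZE)
ckpt = IncrementalCheckpointer(target)
target.register(0, 4 * PAGE_SIZE, ckpt)

target.put(10, b"abc")                # dirties page 0
target.put(2 * PAGE_SIZE + 5, b"xy")  # dirties page 2
target.get(PAGE_SIZE, 16)             # reads page 1, which stays clean
print(ckpt.checkpoint())  # [0, 2]
print(ckpt.checkpoint())  # [] (nothing written since the last checkpoint)
```

The sketch runs the handler after a put's data has landed. The abstract does not say when AA handlers run relative to the memory update, so that ordering is an assumption here. Tracking dirty pages at the target, rather than having every origin report its writes, is the property that makes incremental checkpointing cheap in this model.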

Published In

ACM Conferences
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
June 2015
446 pages
ISBN:9781450335591
DOI:10.1145/2751205
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active access
  2. active messages
  3. iommu
  5. one-sided communication
  5. rdma
  6. rma

Funding Sources

  • 2013 Google European Doctoral Fellowship in Parallel Computing

Conference

ICS '15: 2015 International Conference on Supercomputing
June 8-11, 2015
Newport Beach, California, USA

Acceptance Rates

ICS '15 paper acceptance rate: 40 of 160 submissions (25%)
Overall acceptance rate: 629 of 2,180 submissions (29%)

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 0
Reflects downloads up to 03 Oct 2024

Cited By

  • (2021) Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems. IEEE Transactions on Parallel and Distributed Systems, pp. 1-1. DOI: 10.1109/TPDS.2021.3131677. Online publication date: 2021.
  • (2021) High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks. IEEE Transactions on Parallel and Distributed Systems 32(4), pp. 943-959. DOI: 10.1109/TPDS.2020.3035761. Online publication date: 1-Apr-2021.
  • (2020) High-performance parallel graph coloring with strong guarantees on work, depth, and quality. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17. DOI: 10.5555/3433701.3433833. Online publication date: 9-Nov-2020.
  • (2020) FatPaths. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-18. DOI: 10.5555/3433701.3433736. Online publication date: 9-Nov-2020.
  • (2020) Substream-Centric Maximum Matchings on FPGA. ACM Transactions on Reconfigurable Technology and Systems 13(2), pp. 1-33. DOI: 10.1145/3377871. Online publication date: 24-Apr-2020.
  • (2020) High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17. DOI: 10.1109/SC41405.2020.00103. Online publication date: Nov-2020.
  • (2020) FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-18. DOI: 10.1109/SC41405.2020.00031. Online publication date: Nov-2020.
  • (2020) Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1122-1132. DOI: 10.1109/IPDPS47924.2020.00118. Online publication date: May-2020.
  • (2016) Extended task queuing. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.5555/3014904.3015012. Online publication date: 13-Nov-2016.
  • (2016) Extended Task Queuing: Active Messages for Heterogeneous Systems. SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 933-944. DOI: 10.1109/SC.2016.79. Online publication date: Nov-2016.