research-article

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations

Authors:

Torsten Hoefler

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing

Pages 155-164

https://doi.org/10.1145/2751205.2751219

Published: 08 June 2015

Abstract

Remote memory access (RMA) is an emerging high-performance programming model that uses RDMA hardware directly. Yet, accessing remote memories cannot invoke activities at the target, which complicates the implementation and limits the performance of data-centric algorithms. We propose Active Access (AA), a mechanism that integrates well-known active messaging (AM) semantics with RMA to enable high-performance distributed data-centric computations. AA supports a new programming model in which the user specifies handlers that are triggered when incoming puts and gets reference designated addresses. AA is based on a set of extensions to the Input/Output Memory Management Unit (IOMMU), a unit that provides high-performance hardware support for remapping I/O accesses to memory. We illustrate that AA outperforms existing AM and RMA designs, accelerates codes such as distributed hashtables and logging schemes, and enables new protocols such as incremental checkpointing for RMA. We also discuss how extended IOMMUs can support a virtualized global address space in a distributed system that offers features known from on-node memory virtualization. We expect that AA and other IOMMU features can enhance the design of HPC operating and runtime systems in large computing centers.
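The abstract describes the AA programming model only at a high level: handlers attached to designated addresses fire when incoming puts and gets touch them, and AA builds this on extensions to the IOMMU. The Python sketch below is a hypothetical software toy, not the paper's hardware design. A dictionary stands in for the IOMMU page table, an optional per-page handler field stands in for an extension that marks a page as active, and all names (`ToyIOMMU`, `register_handler`, `drain`, `run_demo`) are invented for illustration. Accesses to passive pages behave like plain RMA; accesses to active pages queue a notification that a separate `drain()` step turns into handler calls. The demo uses this to track dirty pages for incremental checkpointing and to log accesses, two of the uses the abstract mentions.

```python
"""Toy model of Active Access-style dispatch (hypothetical sketch, not the paper's design)."""
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional, Set, Tuple

PAGE_SIZE = 4096

# A handler receives the operation ("put" or "get"), the target address,
# and the bytes that were written or read.
Handler = Callable[[str, int, bytes], None]


@dataclass
class PageEntry:
    frame: int                          # index of the backing local frame
    handler: Optional[Handler] = None   # non-None marks the page as "active"


class ToyIOMMU:
    """A dict plays the IOMMU page table; active pages queue notifications."""

    def __init__(self, num_frames: int) -> None:
        self.frames = [bytearray(PAGE_SIZE) for _ in range(num_frames)]
        self.table: Dict[int, PageEntry] = {}
        self.pending: Deque[Tuple[Handler, str, int, bytes]] = deque()

    def map_page(self, page: int, frame: int) -> None:
        self.table[page] = PageEntry(frame)

    def register_handler(self, page: int, handler: Handler) -> None:
        self.table[page].handler = handler

    def _translate(self, addr: int, length: int) -> Tuple[PageEntry, int]:
        page, offset = divmod(addr, PAGE_SIZE)
        if offset + length > PAGE_SIZE:
            raise ValueError("toy model: accesses may not cross a page boundary")
        if page not in self.table:
            raise LookupError(f"I/O page fault: page {page} is not mapped")
        return self.table[page], offset

    def _notify(self, entry: PageEntry, op: str, addr: int, data: bytes) -> None:
        # Passive pages behave like plain RMA: nothing is queued.
        if entry.handler is not None:
            self.pending.append((entry.handler, op, addr, data))

    def put(self, addr: int, data: bytes) -> None:
        """Incoming remote write: copy the data, then queue a notification."""
        entry, off = self._translate(addr, len(data))
        self.frames[entry.frame][off:off + len(data)] = data
        self._notify(entry, "put", addr, bytes(data))

    def get(self, addr: int, length: int) -> bytes:
        """Incoming remote read: read the data, then queue a notification."""
        entry, off = self._translate(addr, length)
        data = bytes(self.frames[entry.frame][off:off + length])
        self._notify(entry, "get", addr, data)
        return data

    def drain(self) -> int:
        """Run queued handlers in arrival order; return how many ran."""
        ran = 0
        while self.pending:
            handler, op, addr, data = self.pending.popleft()
            handler(op, addr, data)
            ran += 1
        return ran


def run_demo() -> Tuple[Set[int], List[Tuple[str, int]], bytes, int, int]:
    mmu = ToyIOMMU(num_frames=4)
    for page in range(4):
        mmu.map_page(page, frame=page)

    dirty: Set[int] = set()             # pages written since the last checkpoint
    access_log: List[Tuple[str, int]] = []

    def track_dirty(op: str, addr: int, data: bytes) -> None:
        if op == "put":
            dirty.add(addr // PAGE_SIZE)

    def log_access(op: str, addr: int, data: bytes) -> None:
        access_log.append((op, addr))

    mmu.register_handler(0, track_dirty)  # pages 0-1: incremental checkpointing
    mmu.register_handler(1, track_dirty)
    mmu.register_handler(2, log_access)   # page 2: access logging; page 3 stays passive

    mmu.put(8, b"abc")
    mmu.put(PAGE_SIZE + 16, b"xyz")
    mmu.put(3 * PAGE_SIZE, b"zzz")        # passive page: no notification
    mmu.put(2 * PAGE_SIZE, b"hi")
    value = mmu.get(2 * PAGE_SIZE, 2)

    dirty_before = len(dirty)             # no handler has run yet
    ran = mmu.drain()
    return dirty, access_log, value, ran, dirty_before
```

Calling `run_demo()` returns the dirty-page set `{0, 1}`, an access log holding one put and one get on page 2, the value `b"hi"`, a count of 4 handler invocations, and a dirty-page count of 0 observed before `drain()`. Queuing notifications instead of running handlers inside `put`/`get` is a choice made for this sketch so the data transfer never waits on user code; it is not a claim about how the extended IOMMU in the paper delivers notifications to the CPU.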



Index Terms

1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        1. Distributed programming languages



Information

Published In

ACM Conferences

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing

June 2015

446 pages

ISBN: 9781450335591

DOI: 10.1145/2751205

General Chair: Laxmi N. Bhuyan, University of California, Riverside

Program Chairs: Fred Chong, University of California, Santa Barbara; Vivek Sarkar, Rice University

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States




Funding Sources

2013 Google European Doctoral Fellowship in Parallel Computing

Conference

ICS '15: 2015 International Conference on Supercomputing

June 8-11, 2015

Newport Beach, California, USA

Acceptance Rates

ICS '15 paper acceptance rate: 40 of 160 submissions (25%)

Overall acceptance rate: 629 of 2,180 submissions (29%)


Bibliometrics & Citations

Article Metrics

Total citations: 12

Total downloads: 243

Downloads (last 12 months): 14

Downloads (last 6 weeks): 0

Reflects downloads up to 03 Oct 2024.


Cited By

M. Besta, M. Fischer, V. Kalavri, M. Kapralov, and T. Hoefler. Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems. IEEE Transactions on Parallel and Distributed Systems, pages 1-1, 2021.
https://doi.org/10.1109/TPDS.2021.3131677

M. Besta, J. Domke, M. Schneider, M. Konieczny, S. Girolamo, T. Schneider, A. Singla, and T. Hoefler. High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks. IEEE Transactions on Parallel and Distributed Systems, 32(4):943-959, 2021 (online 1 April 2021).
https://doi.org/10.1109/TPDS.2020.3035761

M. Besta, A. Carigiet, K. Janda, Z. Vonarburg-Shmaria, L. Gianinazzi, T. Hoefler, C. Cuicchi, I. Qualters, and W. Kramer. High-performance parallel graph coloring with strong guarantees on work, depth, and quality. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-17, 2020 (online 9 November 2020).
https://dl.acm.org/doi/10.5555/3433701.3433833

M. Besta, M. Schneider, M. Konieczny, K. Cynk, E. Henriksson, S. Di Girolamo, A. Singla, T. Hoefler, C. Cuicchi, I. Qualters, and W. Kramer. FatPaths. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-18, 2020 (online 9 November 2020).
https://dl.acm.org/doi/10.5555/3433701.3433736

M. Besta, M. Fischer, T. Ben-Nun, D. Stanojevic, J. Licht, and T. Hoefler. Substream-Centric Maximum Matchings on FPGA. ACM Transactions on Reconfigurable Technology and Systems, 13(2):1-33, 2020 (online 24 April 2020).
https://dl.acm.org/doi/10.1145/3377871

M. Besta, A. Carigiet, K. Janda, Z. Vonarburg-Shmaria, L. Gianinazzi, and T. Hoefler. High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-17, November 2020.
https://doi.org/10.1109/SC41405.2020.00103

M. Besta, M. Schneider, M. Konieczny, K. Cynk, E. Henriksson, S. Girolamo, A. Singla, and T. Hoefler. FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-18, November 2020.
https://doi.org/10.1109/SC41405.2020.00031

M. Besta, R. Kanakagiri, H. Mustafa, M. Karasikov, G. Ratsch, T. Hoefler, and E. Solomonik. Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons. In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1122-1132, May 2020.
https://doi.org/10.1109/IPDPS47924.2020.00118

M. LeBeane, B. Potter, A. Pan, A. Dutu, V. Agarwala, W. Lee, D. Majeti, B. Ghimire, E. Van Tassell, S. Wasmundt, B. Benton, M. Breternitz, M. Chu, M. Thottethodi, L. John, S. Reinhardt, and J. West. Extended task queuing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12, 2016 (online 13 November 2016).
https://dl.acm.org/doi/10.5555/3014904.3015012

M. LeBeane, B. Potter, A. Pan, A. Dutu, V. Agarwala, W. Lee, D. Majeti, B. Ghimire, E. Tassell, S. Wasmundt, B. Benton, M. Breternitz, M. Chu, M. Thottethodi, L. John, and S. Reinhardt. Extended Task Queuing: Active Messages for Heterogeneous Systems. In SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 933-944, November 2016.
https://doi.org/10.1109/SC.2016.79
