Open access

Remote Memory Calls

Authors:

Emmanuel Amaro,

Amy Ousterhout,

Arvind Krishnamurthy,

Sylvia Ratnasamy,

Scott Shenker

HotNets '20: Proceedings of the 19th ACM Workshop on Hot Topics in Networks

Pages 38 - 44

https://doi.org/10.1145/3422604.3425923

Published: 04 November 2020

Abstract

In this paper, we propose an extension to RDMA, called Remote Memory Calls (RMCs), that allows applications to install a customized set of 1-sided RDMA operations. We then explain how RMCs can be implemented on the forthcoming generation of SmartNICs and discuss the resulting tradeoffs among RMCs, 1-sided RDMA operations, and 2-sided RDMA operations.

Index Terms

Remote Memory Calls
1. Networks
  1. Network architectures
    1. Programming interfaces
  2. Network types
    1. Data center networks
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Distributed memory
    2. Software system structures
      1. Distributed systems organizing principles
        Client-server architectures

Published In

HotNets '20: Proceedings of the 19th ACM Workshop on Hot Topics in Networks

November 2020

228 pages

ISBN:9781450381451

DOI:10.1145/3422604

General Chairs: Ben Zhao (University of Chicago), Heather Zheng (University of Chicago)

Program Chairs: Harsha V. Madhyastha (University of Michigan), Venkat Padmanabhan (Microsoft Research India)

Copyright © 2020 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

HotNets '20

Sponsor:

SIGCOMM

HotNets '20: The 19th ACM Workshop on Hot Topics in Networks

November 4 - 6, 2020

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 110 of 460 submissions, 24%
