DOI: 10.1145/3422604.3425923
Open access

Remote Memory Calls

Published: 04 November 2020


In this paper we propose an extension to RDMA, called Remote Memory Calls (RMCs), that allows applications to install a customized set of 1-sided RDMA operations. We then explain how RMCs can be implemented on the forthcoming generation of SmartNICs and discuss the resulting tradeoffs between RMCs, 1-sided and 2-sided RDMA operations.
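The difference between composing plain 1-sided reads and installing a custom operation can be sketched with a toy model. This is only an illustration under stated assumptions, not the paper's design or any real RDMA API: `RemoteMemory`, `install`, `call`, and `nth_value` are hypothetical names, the memory node is simulated in-process, and the linked-list traversal is an example workload chosen here. The sketch counts network round trips, which is the cost an installed operation would aim to reduce.

```python
from typing import Callable, Dict, List


class RemoteMemory:
    """Toy stand-in for a memory node fronted by a SmartNIC (hypothetical)."""

    def __init__(self, words: List[int]) -> None:
        self.words = words
        self.round_trips = 0  # network round trips issued by the client
        self._rmcs: Dict[str, Callable[..., int]] = {}

    def read(self, addr: int) -> int:
        """Plain 1-sided read: one round trip per word fetched."""
        self.round_trips += 1
        return self.words[addr]

    def install(self, name: str, fn: Callable[..., int]) -> None:
        """Register a custom operation that runs next to the memory."""
        self._rmcs[name] = fn

    def call(self, name: str, *args: int) -> int:
        """Invoke an installed operation: one round trip for the whole call."""
        self.round_trips += 1
        return self._rmcs[name](self.words, *args)


def nth_value(words: List[int], head: int, n: int) -> int:
    """Follow n links of a list stored as [value, next] pairs; return the value."""
    addr = head
    for _ in range(n):
        addr = words[addr + 1]
    return words[addr]


# Three nodes: addr 0 -> (10, next=4), addr 4 -> (20, next=2), addr 2 -> (30, end)
layout = [10, 4, 30, -1, 20, 2]

# Client-side traversal with plain 1-sided reads.
plain = RemoteMemory(list(layout))
addr = 0
for _ in range(2):
    addr = plain.read(addr + 1)
value_plain = plain.read(addr)

# Same traversal as a single installed operation.
custom = RemoteMemory(list(layout))
custom.install("nth_value", nth_value)
value_custom = custom.call("nth_value", 0, 2)

print(value_plain, plain.round_trips)    # 30 3
print(value_custom, custom.round_trips)  # 30 1
```

In this model the client-driven traversal pays one round trip per pointer dereference, while the installed operation pays one in total. The model ignores the execution cost on the NIC, which is part of the tradeoff the abstract mentions.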



Published In

HotNets '20: Proceedings of the 19th ACM Workshop on Hot Topics in Networks
November 2020
228 pages
This work is licensed under a Creative Commons Attribution International 4.0 License.



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. memory disaggregation
  2. rdma


  • Research-article


HotNets '20

Acceptance Rates

Overall Acceptance Rate 110 of 460 submissions, 24%


Article Metrics

  • Downloads (Last 12 months): 211
  • Downloads (Last 6 weeks): 19
Reflects downloads up to 05 Mar 2025

Cited By

  • (2025) pulse: Accelerating Distributed Pointer-Traversals on Disaggregated Memory. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1. 10.1145/3669940.3707253 (858-875). Online publication date: 30-Mar-2025
  • (2024) DRust. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation. 10.5555/3691938.3691944 (97-115). Online publication date: 10-Jul-2024
  • (2024) A tale of two paths. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation. 10.5555/3691938.3691943 (77-95). Online publication date: 10-Jul-2024
  • (2024) RR-Compound: RDMA-Fused gRPC for Low Latency, High Throughput, and Easy Interface. IEEE Transactions on Parallel and Distributed Systems 35:8. 10.1109/TPDS.2024.3404394 (1488-1505). Online publication date: Aug-2024
  • (2024) RB²: Narrow the Gap between RDMA Abstraction and Performance via a Middle Layer. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications. 10.1109/INFOCOM52122.2024.10621169 (1071-1080). Online publication date: 20-May-2024
  • (2024) Data Flow Architectures for Data Processing on Modern Hardware. 2024 IEEE 40th International Conference on Data Engineering (ICDE). 10.1109/ICDE60146.2024.00439 (5511-5522). Online publication date: 13-May-2024
  • (2024) MINOS: Distributed Consistency and Persistency Protocol Implementation & Offloading to SmartNICs. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA57654.2024.00076 (1-17). Online publication date: 2-Mar-2024
  • (2023) Remote direct memory introspection. Proceedings of the 32nd USENIX Conference on Security Symposium. 10.5555/3620237.3620575 (6043-6060). Online publication date: 9-Aug-2023
  • (2023) Patronus. Proceedings of the 21st USENIX Conference on File and Storage Technologies. 10.5555/3585938.3585958 (315-330). Online publication date: 21-Feb-2023
  • (2023) Direct Telemetry Access. Proceedings of the ACM SIGCOMM 2023 Conference. 10.1145/3603269.3604827 (832-849). Online publication date: 10-Sep-2023
