PRISM: Rethinking the RDMA Interface for Distributed Systems

Published: 26 October 2021

Abstract

Remote Direct Memory Access (RDMA) has been used to accelerate a variety of distributed systems, by providing low-latency, CPU-bypassing access to a remote host's memory. However, most of the distributed protocols used in these systems cannot easily be expressed in terms of the simple memory READs and WRITEs provided by RDMA. As a result, designers face a choice between introducing additional protocol complexity (e.g., additional round trips) or forgoing the benefits of RDMA entirely.
This paper argues that an extension to the RDMA interface can resolve this dilemma. We introduce the PRISM interface, which adds four new primitives: indirection, allocation, enhanced compare-and-swap, and operation chaining. These increase the expressivity of the RDMA interface, while still being implementable using the same underlying hardware features. We show their utility by designing three new applications that use PRISM primitives and require little to no server-side CPU involvement: (1) PRISM-KV, a key-value store; (2) PRISM-RS, a replicated block store; and (3) PRISM-TX, a distributed transaction protocol. Using a software-based implementation of the PRISM primitives, we show that these systems outperform prior RDMA-based equivalents.
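To make the round-trip argument concrete, the following is a minimal sketch of how an indirection primitive could save a round trip on a pointer-based lookup. It is an in-process toy, not the PRISM interface or any real RDMA API: the class `ToyRemoteMemory`, the methods `read` and `indirect_read`, the 8-byte little-endian pointer layout, and the assumed semantics (the remote side dereferences a stored pointer and returns the target bytes) are all illustrative assumptions that the abstract does not specify.

```python
import struct


class ToyRemoteMemory:
    """In-process stand-in for a remote host's memory (hypothetical, not a real RDMA API)."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.round_trips = 0  # number of simulated network round trips

    def write_local(self, addr: int, data: bytes) -> None:
        # Server-side setup only; not counted as a network operation.
        self.buf[addr:addr + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        # Models a plain one-sided READ: one round trip per call.
        self.round_trips += 1
        return bytes(self.buf[addr:addr + length])

    def indirect_read(self, ptr_addr: int, length: int) -> bytes:
        # Assumed semantics: the remote side dereferences the 8-byte pointer
        # stored at ptr_addr and returns the target bytes in one round trip.
        self.round_trips += 1
        (target,) = struct.unpack_from("<Q", self.buf, ptr_addr)
        return bytes(self.buf[target:target + length])


def lookup_with_plain_reads(mem: ToyRemoteMemory, slot_addr: int, length: int) -> bytes:
    # First round trip fetches the pointer, second fetches the value.
    (target,) = struct.unpack("<Q", mem.read(slot_addr, 8))
    return mem.read(target, length)


def lookup_with_indirection(mem: ToyRemoteMemory, slot_addr: int, length: int) -> bytes:
    # A single request: the pointer chase happens on the remote side.
    return mem.indirect_read(slot_addr, length)


if __name__ == "__main__":
    mem = ToyRemoteMemory(256)
    mem.write_local(128, b"hello")              # value stored at address 128
    mem.write_local(0, struct.pack("<Q", 128))  # index slot 0 points at the value

    print(lookup_with_plain_reads(mem, 0, 5), mem.round_trips)
    mem.round_trips = 0
    print(lookup_with_indirection(mem, 0, 5), mem.round_trips)
```

In this toy model the client issues two requests when only plain READs are available and one when the pointer is followed remotely, which is the kind of extra protocol cost the abstract refers to.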

Published In

SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
October 2021
899 pages
ISBN: 9781450387095
DOI: 10.1145/3477132

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDMA
  2. distributed systems
  3. remote memory access

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SOSP '21

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%
