Research article
DOI: 10.1145/3448016.3452817

CoRM: Compactable Remote Memory over RDMA

Published: 18 June 2021 Publication History

Abstract

Distributed memory systems are becoming increasingly important because they provide a system-scale abstraction in which physically separated memories can be addressed as a single logical memory. This abstraction enables memory disaggregation, allowing systems such as in-memory databases, caching services, and ephemeral storage to be naturally deployed at large scale. While this abstraction effectively increases the memory capacity of these systems, it incurs additional overhead for remote memory accesses. To narrow the gap between local and remote accesses, low-latency RDMA networks are a key element of efficient memory disaggregation. However, RDMA acceleration poses new obstacles to efficient memory management, and particularly to memory compaction: network controllers and CPUs can access memory concurrently, potentially leading to inconsistencies if memory management operations are not synchronized. To ensure consistency, most distributed memory systems do not provide memory compaction and are therefore exposed to memory fragmentation. We introduce CoRM, an RDMA-accelerated shared memory system that supports memory compaction and ensures strict consistency while providing one-sided RDMA accesses. We show that CoRM sustains high read throughput during normal operation, comparable to similar systems that do not provide memory compaction, while experiencing minimal overhead during compaction. CoRM never disrupts RDMA connections and can reduce applications' active memory by up to 6x by performing memory compaction.

Supplementary Material

MP4 File (3448016.3452817.mp4)

Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN: 9781450383431
DOI: 10.1145/3448016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. memory allocation
  2. memory compaction
  3. rdma

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '21
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
