
Fast and Consistent Remote Direct Access to Non-volatile Memory

Published: 05 October 2021 Publication History

Abstract

For performance benefits, modern data centers increasingly use NVM (Non-Volatile Memory) as storage and rely on one-sided RDMA (Remote Direct Memory Access) primitives to access NVM directly for I/O requests. However, one-sided RDMA currently lacks durability semantics: when a crash occurs, data may only partially reach the NVM, causing unexpected data inconsistency.
Existing solutions sacrifice either read or write performance in exchange for consistency guarantees. In this paper, we introduce eFactory, a multi-version log-structuring design that delivers high performance for both reads and writes while still providing data consistency. eFactory builds a multi-version list for each object to preserve crash consistency, especially when multiple clients update the same object concurrently. To transfer data directly with RDMA writes and to keep CRC overhead off the read critical path, a single background thread performs integrity verification and data persistence. Furthermore, eFactory proposes a hybrid read scheme that achieves high read performance without weakening the consistency guarantee: with a durability flag embedded in each object, a client can detect the durability status of the data and re-read it if necessary. Evaluations show that eFactory outperforms IMM and SAW (solutions that sacrifice write performance) by 0.42x-2.79x and 0.66x-2.85x for writes, respectively, while maintaining comparable read performance. In addition, the read throughput of eFactory is 1.3x-1.96x and 1.24x-1.67x that of Erda and Forca (solutions that guarantee consistency at the cost of read performance), respectively.
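The hybrid read scheme described above can be sketched as follows. The object layout, field names, and retry policy here are illustrative assumptions for exposition, not eFactory's actual on-wire format: the client trusts the embedded durability flag on the fast path, and falls back to its own CRC verification (or a re-read) only when the object is not yet marked durable.

```python
import zlib

# Hypothetical object layout: version, durability flag, CRC, payload.
# These names are assumptions for illustration, not eFactory's real format.
def make_object(version, payload, durable):
    return {"version": version, "durable": durable,
            "crc": zlib.crc32(payload), "payload": payload}

def read_object(store, key, max_retries=3):
    """Hybrid read: if the durability flag is set, return immediately
    (no CRC on the read critical path); otherwise verify the CRC
    locally, re-reading up to max_retries times."""
    for _ in range(max_retries):
        obj = store[key]
        if obj["durable"]:
            return obj["payload"]      # fast path: background thread already verified
        if zlib.crc32(obj["payload"]) == obj["crc"]:
            return obj["payload"]      # slow path: client-side integrity check
    raise RuntimeError("object not durable after retries")
```

In this sketch the background thread's job would be to verify each object's CRC once, persist it, and then set the durability flag, so that steady-state reads never pay the CRC cost.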

References

[1]
[n.d.]. Cyclic Redundancy Check. https://en.wikipedia.org/wiki/Cyclic_redundancy_check.
[2]
[n.d.]. Forca. https://github.com/huanghaixin008/Forca.
[3]
[n.d.]. Intel Data Direct I/O Technology (Intel DDIO). https://www.intel.com/content/dam/www/public/us/en/documents/technologybriefs/data-direct-i-o-technology-brief.pdf.
[4]
2016. How to emulate Persistent Memory. http://pmem.io/2016/02/22/pm-emulation.html.
[5]
2019. What Is Intel Optane DC Persistent Memory?https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html
[6]
Marcos K Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra J Marathe, Athanasios Xygkis, and Igor Zablotchi. 2020. Microsecond consensus for microsecond applications. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). 599–616.
[7]
Thomas E Anderson, Marco Canini, Jongyul Kim, Dejan Kostić, Youngjin Kwon, Simon Peter, Waleed Reda, Henry N Schuh, and Emmett Witchel. 2019. Assise: Performance and Availability via NVM Colocation in a Distributed File System. arXiv preprint arXiv:1910.05106 (2019).
[8]
Dmytro Apalkov, Alexey Khvalkovskiy, Steven Watts, Vladimir Nikitin, Xueti Tang, Daniel Lottis, Kiseok Moon, Xiao Luo, Eugene Chen, Adrian Ong, et al. 2013. Spin-transfer torque magnetic random access memory (STT-MRAM). ACM Journal on Emerging Technologies in Computing Systems (JETC) 9, 2 (2013), 1–35.
[9]
Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM symposium on Cloud computing. 143–154.
[10]
Chet Douglas. 2015. RDMA with PMEM: Software mechanisms for enabling access to remote persistent memory. In Storage Developer Conference.
[11]
Aleksandar Dragojević, Dushyanth Narayanan, Edmund B Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro. 2015. No compromises: distributed transactions with consistency, availability, and performance. In Proceedings of the 25th symposium on operating systems principles. 54–70.
[12]
Xing Hu, Matheus Ogleari, Jishen Zhao, Shuangchen Li, Abanti Basak, and Yuan Xie. 2018. Persistence parallelism optimization: A holistic approach from memory bus to rdma network. In MICRO.
[13]
Haixin Huang, Kaixin Huang, Litong You, and Linpeng Huang. 2018. Forca: Fast and atomic remote direct access to persistent memory. In 2018 IEEE 36th International Conference on Computer Design (ICCD). IEEE, 246–249.
[14]
Ram Huggahalli, Ravi Iyer, and Scott Tetrick. 2005. Direct cache access for high bandwidth network I/O. In 32nd International Symposium on Computer Architecture (ISCA’05). IEEE, 50–59.
[15]
Nusrat Sharmin Islam, Md Wasi-ur Rahman, Xiaoyi Lu, and Dhabaleswar K Panda. 2016. High performance design for HDFS with byte-addressability of NVM and RDMA. In Proceedings of the 2016 International Conference on Supercomputing. 1–14.
[16]
Anuj Kalia, Michael Kaminsky, and David G Andersen. 2014. Using RDMA efficiently for key-value services. In Proceedings of the 2014 ACM Conference on SIGCOMM. 295–306.
[17]
Anuj Kalia, Michael Kaminsky, and David G Andersen. 2016. Fasst: Fast, scalable and simple distributed transactions with two-sided (RDMA) datagram rpcs. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 185–201.
[18]
Sanidhya Kashyap, Dai Qin, Steve Byan, Virendra J Marathe, and Sanketh Nalli. 2019. Correct, Fast Remote Persistence. arXiv preprint arXiv:1909.02092 (2019).
[19]
Antonios Katsarakis, Vasilis Gavrielatos, MR Siavash Katebzadeh, Arpit Joshi, Aleksandar Dragojevic, Boris Grot, and Vijay Nagarajan. 2020. Hermes: a fast, fault-tolerant and linearizable replication protocol. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 201–217.
[20]
Benjamin C Lee, Engin Ipek, Onur Mutlu, and Doug Burger. 2009. Architecting phase change memory as a scalable dram alternative. In Proceedings of the 36th annual international symposium on Computer architecture. 2–13.
[21]
Xinxin Liu, Yu Hua, Xuan Li, and Qifan Liu. 2019. Write-optimized and consistent RDMA-based NVM systems. arXiv preprint arXiv:1906.08173 (2019).
[22]
Youyou Lu, Jiwu Shu, Youmin Chen, and Tao Li. 2017. Octopus: an rdma-enabled distributed persistent memory file system. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). 773–785.
[23]
Teng Ma, Tao Ma, Zhuo Song, Jingxuan Li, Huaixin Chang, Kang Chen, Hai Jiang, and Yongwei Wu. 2019. X-RDMA: Effective RDMA Middleware in Large-scale Production Environments. In 2019 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 1–12.
[24]
Stanko Novakovic, Yizhou Shan, Aasheesh Kolli, Michael Cui, Yiying Zhang, Haggai Eran, Boris Pismenny, Liran Liss, Michael Wei, Dan Tsafrir, et al. 2019. Storm: a fast transactional dataplane for remote data structures. In Proceedings of the 12th ACM International Conference on Systems and Storage. 97–108.
[25]
Diego Ongaro, Stephen M Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. 2011. Fast crash recovery in RAMCloud. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. 29–41.
[26]
Marius Poke and Torsten Hoefler. 2015. Dare: High-performance state machine replication on rdma networks. In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing. 107–118.
[27]
Yizhou Shan, Shin-Yeh Tsai, and Yiying Zhang. 2017. Distributed shared persistent memory. In Proceedings of the 2017 Symposium on Cloud Computing. 323–337.
[28]
Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, and Toni Cortes. 2018. Tailwind: fast and atomic rdma-based replication. In 2018 USENIX Annual Technical Conference (USENIX ATC 18). 851–863.
[29]
Tom Talpey and Jim Pinkerton. [n.d.]. RDMA Durable Write Commit. https://tools.ietf.org/html/draft-talpey-rdma-commit-00
[30]
Arash Tavakkol, Aasheesh Kolli, Stanko Novakovic, Kaveh Razavi, Juan Gómez-Luna, Hasan Hassan, Claude Barthels, Yaohua Wang, Mohammad Sadrosadati, Saugata Ghose, et al. 2018. Enabling efficient RDMA-based synchronous mirroring of persistent memory transactions. arXiv preprint arXiv:1810.09360 (2018).
[31]
Shin-Yeh Tsai, Yizhou Shan, and Yiying Zhang. 2020. Disaggregating Persistent Memory and Controlling Them Remotely: An Exploration of Passive Disaggregated Key-Value Stores. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 33–48.
[32]
Cheng Wang, Jianyu Jiang, Xusheng Chen, Ning Yi, and Heming Cui. 2017. Apus: Fast and scalable paxos on rdma. In Proceedings of the 2017 Symposium on Cloud Computing. 94–107.
[33]
Xingda Wei, Zhiyuan Dong, Rong Chen, and Haibo Chen. 2018. Deconstructing RDMA-enabled distributed transactions: Hybrid is better!. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 233–251.
[34]
Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen. 2015. Fast in-memory transaction processing using RDMA and HTM. In Proceedings of the 25th Symposium on Operating Systems Principles. 87–104.
[35]
Jilong Xue, Youshan Miao, Cheng Chen, Ming Wu, Lintao Zhang, and Lidong Zhou. 2019. Fast distributed deep learning over rdma. In Proceedings of the Fourteenth EuroSys Conference 2019. 1–14.
[36]
Jian Yang, Joseph Izraelevitz, and Steven Swanson. 2019. Orion: A distributed file system for non-volatile main memory and RDMA-capable networks. In 17th USENIX Conference on File and Storage Technologies (FAST 19). 221–234.
[37]
Jian Yang, Joseph Izraelevitz, and Steven Swanson. 2020. FileMR: Rethinking RDMA Networking for Scalable Persistent Memory. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). 111–125.

Cited By

  • (2022) Recovery of Distributed Iterative Solvers for Linear Systems Using Non-Volatile RAM. In 2022 IEEE/ACM 12th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), 11–23. DOI:10.1109/FTXS56515.2022.00007. Online publication date: Nov 2022.


Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN:9781450390682
DOI:10.1145/3472456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDMA
  2. crash consistency
  3. distributed storage systems
  4. high performance
  5. non-volatile memory
  6. remote persistence

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

