
Fast and Consistent Remote Direct Access to Non-volatile Memory

Published: 05 October 2021 Publication History

Abstract

For performance benefits, modern data centers increasingly use NVM (Non-Volatile Memory) as storage and rely on one-sided RDMA (Remote Direct Memory Access) primitives to access NVM directly for I/O requests. However, one-sided RDMA currently lacks durability semantics: when a crash occurs, data may only partially reach the NVM, causing unexpected data inconsistency.
Existing solutions sacrifice either read or write performance in exchange for consistency guarantees. In this paper, we introduce eFactory, a multi-version log-structuring design that delivers high performance for both reads and writes while still providing data consistency. eFactory builds a multi-version list for each object to preserve crash consistency, especially when multiple clients update the same object concurrently. To transfer data directly with RDMA writes and to keep CRC overhead off the read critical path, a single background thread performs integrity verification and data persistence. Furthermore, eFactory proposes a hybrid read scheme that achieves high read performance without weakening the consistency guarantee: with a durability flag embedded in each object, a client can detect the durability status of the data and re-read it if necessary. Evaluations show that eFactory outperforms IMM and SAW (solutions that sacrifice write performance) by 0.42x-2.79x and 0.66x-2.85x for writes, respectively, while maintaining comparable read performance. In addition, the read throughput of eFactory is 1.3x-1.96x and 1.24x-1.67x that of Erda and Forca (solutions that guarantee consistency at the cost of read performance), respectively.
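The hybrid read scheme described above can be sketched as follows. The object layout, field names, and retry policy here are illustrative assumptions for exposition, not eFactory's actual on-wire format: the client trusts the embedded durability flag on the fast path, and falls back to its own CRC verification (or a re-read) only when the object is not yet marked durable.

```python
import zlib

# Hypothetical object layout: version, durability flag, CRC, payload.
# These names are assumptions for illustration, not eFactory's real format.
def make_object(version, payload, durable):
    return {"version": version, "durable": durable,
            "crc": zlib.crc32(payload), "payload": payload}

def read_object(store, key, max_retries=3):
    """Hybrid read: if the durability flag is set, return immediately
    (no CRC on the read critical path); otherwise verify the CRC
    locally, re-reading up to max_retries times."""
    for _ in range(max_retries):
        obj = store[key]
        if obj["durable"]:
            return obj["payload"]      # fast path: background thread already verified
        if zlib.crc32(obj["payload"]) == obj["crc"]:
            return obj["payload"]      # slow path: client-side integrity check
    raise RuntimeError("object not durable after retries")
```

In this sketch the background thread's job would be to verify each object's CRC once, persist it, and then set the durability flag, so that steady-state reads never pay the CRC cost.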

References

[1]
[n.d.]. Cyclic Redundancy Check. https://en.wikipedia.org/wiki/Cyclic_redundancy_check.
[2]
[n.d.]. Forca. https://github.com/huanghaixin008/Forca.
[3]
[n.d.]. Intel Data Direct I/O Technology (Intel DDIO). https://www.intel.com/content/dam/www/public/us/en/documents/technologybriefs/data-direct-i-o-technology-brief.pdf.
[4]
2016. How to emulate Persistent Memory. http://pmem.io/2016/02/22/pm-emulation.html.
[5]
2019. What Is Intel Optane DC Persistent Memory?https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html
[6]
Marcos K Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra J Marathe, Athanasios Xygkis, and Igor Zablotchi. 2020. Microsecond consensus for microsecond applications. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). 599–616.
[7]
Thomas E Anderson, Marco Canini, Jongyul Kim, Dejan Kostić, Youngjin Kwon, Simon Peter, Waleed Reda, Henry N Schuh, and Emmett Witchel. 2019. Assise: Performance and Availability via NVM Colocation in a Distributed File System. arXiv preprint arXiv:1910.05106 (2019).
[8]
Dmytro Apalkov, Alexey Khvalkovskiy, Steven Watts, Vladimir Nikitin, Xueti Tang, Daniel Lottis, Kiseok Moon, Xiao Luo, Eugene Chen, Adrian Ong, et al. 2013. Spin-transfer torque magnetic random access memory (STT-MRAM). ACM Journal on Emerging Technologies in Computing Systems (JETC) 9, 2 (2013), 1–35.
[9]
Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM symposium on Cloud computing. 143–154.
[10]
Chet Douglas. 2015. RDMA with PMEM: Software mechanisms for enabling access to remote persistent memory. In Storage Developer Conference.
[11]
Aleksandar Dragojević, Dushyanth Narayanan, Edmund B Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro. 2015. No compromises: distributed transactions with consistency, availability, and performance. In Proceedings of the 25th symposium on operating systems principles. 54–70.
[12]
Xing Hu, Matheus Ogleari, Jishen Zhao, Shuangchen Li, Abanti Basak, and Yuan Xie. 2018. Persistence parallelism optimization: A holistic approach from memory bus to rdma network. In MICRO.
[13]
Haixin Huang, Kaixin Huang, Litong You, and Linpeng Huang. 2018. Forca: Fast and atomic remote direct access to persistent memory. In 2018 IEEE 36th International Conference on Computer Design (ICCD). IEEE, 246–249.
[14]
Ram Huggahalli, Ravi Iyer, and Scott Tetrick. 2005. Direct cache access for high bandwidth network I/O. In 32nd International Symposium on Computer Architecture (ISCA’05). IEEE, 50–59.
[15]
Nusrat Sharmin Islam, Md Wasi-ur Rahman, Xiaoyi Lu, and Dhabaleswar K Panda. 2016. High performance design for HDFS with byte-addressability of NVM and RDMA. In Proceedings of the 2016 International Conference on Supercomputing. 1–14.
[16]
Anuj Kalia, Michael Kaminsky, and David G Andersen. 2014. Using RDMA efficiently for key-value services. In Proceedings of the 2014 ACM Conference on SIGCOMM. 295–306.
[17]
Anuj Kalia, Michael Kaminsky, and David G Andersen. 2016. Fasst: Fast, scalable and simple distributed transactions with two-sided (RDMA) datagram rpcs. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 185–201.
[18]
Sanidhya Kashyap, Dai Qin, Steve Byan, Virendra J Marathe, and Sanketh Nalli. 2019. Correct, Fast Remote Persistence. arXiv preprint arXiv:1909.02092 (2019).
[19]
Antonios Katsarakis, Vasilis Gavrielatos, MR Siavash Katebzadeh, Arpit Joshi, Aleksandar Dragojevic, Boris Grot, and Vijay Nagarajan. 2020. Hermes: a fast, fault-tolerant and linearizable replication protocol. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 201–217.
[20]
Benjamin C Lee, Engin Ipek, Onur Mutlu, and Doug Burger. 2009. Architecting phase change memory as a scalable dram alternative. In Proceedings of the 36th annual international symposium on Computer architecture. 2–13.
[21]
Xinxin Liu, Yu Hua, Xuan Li, and Qifan Liu. 2019. Write-optimized and consistent RDMA-based NVM systems. arXiv preprint arXiv:1906.08173 (2019).
[22]
Youyou Lu, Jiwu Shu, Youmin Chen, and Tao Li. 2017. Octopus: an rdma-enabled distributed persistent memory file system. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). 773–785.
[23]
Teng Ma, Tao Ma, Zhuo Song, Jingxuan Li, Huaixin Chang, Kang Chen, Hai Jiang, and Yongwei Wu. 2019. X-RDMA: Effective RDMA Middleware in Large-scale Production Environments. In 2019 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 1–12.
[24]
Stanko Novakovic, Yizhou Shan, Aasheesh Kolli, Michael Cui, Yiying Zhang, Haggai Eran, Boris Pismenny, Liran Liss, Michael Wei, Dan Tsafrir, et al. 2019. Storm: a fast transactional dataplane for remote data structures. In Proceedings of the 12th ACM International Conference on Systems and Storage. 97–108.
[25]
Diego Ongaro, Stephen M Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. 2011. Fast crash recovery in RAMCloud. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. 29–41.
[26]
Marius Poke and Torsten Hoefler. 2015. Dare: High-performance state machine replication on rdma networks. In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing. 107–118.
[27]
Yizhou Shan, Shin-Yeh Tsai, and Yiying Zhang. 2017. Distributed shared persistent memory. In Proceedings of the 2017 Symposium on Cloud Computing. 323–337.
[28]
Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, and Toni Cortes. 2018. Tailwind: fast and atomic rdma-based replication. In 2018 USENIX Annual Technical Conference (USENIX ATC 18). 851–863.
[29]
Tom Talpey and Jim Pinkerton. [n.d.]. RDMA Durable Write Commit. https://tools.ietf.org/html/draft-talpey-rdma-commit-00
[30]
Arash Tavakkol, Aasheesh Kolli, Stanko Novakovic, Kaveh Razavi, Juan Gómez-Luna, Hasan Hassan, Claude Barthels, Yaohua Wang, Mohammad Sadrosadati, Saugata Ghose, et al. 2018. Enabling efficient RDMA-based synchronous mirroring of persistent memory transactions. arXiv preprint arXiv:1810.09360 (2018).
[31]
Shin-Yeh Tsai, Yizhou Shan, and Yiying Zhang. 2020. Disaggregating Persistent Memory and Controlling Them Remotely: An Exploration of Passive Disaggregated Key-Value Stores. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 33–48.
[32]
Cheng Wang, Jianyu Jiang, Xusheng Chen, Ning Yi, and Heming Cui. 2017. Apus: Fast and scalable paxos on rdma. In Proceedings of the 2017 Symposium on Cloud Computing. 94–107.
[33]
Xingda Wei, Zhiyuan Dong, Rong Chen, and Haibo Chen. 2018. Deconstructing RDMA-enabled distributed transactions: Hybrid is better!. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 233–251.
[34]
Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen. 2015. Fast in-memory transaction processing using RDMA and HTM. In Proceedings of the 25th Symposium on Operating Systems Principles. 87–104.
[35]
Jilong Xue, Youshan Miao, Cheng Chen, Ming Wu, Lintao Zhang, and Lidong Zhou. 2019. Fast distributed deep learning over rdma. In Proceedings of the Fourteenth EuroSys Conference 2019. 1–14.
[36]
Jian Yang, Joseph Izraelevitz, and Steven Swanson. 2019. Orion: A distributed file system for non-volatile main memory and RDMA-capable networks. In 17th USENIX Conference on File and Storage Technologies (FAST 19). 221–234.
[37]
Jian Yang, Joseph Izraelevitz, and Steven Swanson. 2020. FileMR: Rethinking RDMA Networking for Scalable Persistent Memory. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). 111–125.

Cited By

  • (2022) Recovery of Distributed Iterative Solvers for Linear Systems Using Non-Volatile RAM. In 2022 IEEE/ACM 12th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), 11–23. DOI:10.1109/FTXS56515.2022.00007. Online publication date: Nov 2022.


Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN:9781450390682
DOI:10.1145/3472456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDMA
  2. crash consistency
  3. distributed storage systems
  4. high performance
  5. non-volatile memory
  6. remote persistence

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

