ESH: Design and Implementation of an Optimal Hashing Scheme for Persistent Memory
Abstract
1. Introduction
- (i) It efficiently utilizes the available space in the hash table by redistributing overflow records from a bucket to neighboring buckets within the same segment.
- (ii) It reduces or delays the need for full-table rehashing, resulting in improved insertion performance.
- (iii) It increases the load factor and improves PM/memory utilization without significant performance loss, even for varying data sizes and thread counts.
- (iv) It presents a scalable hashing scheme that minimizes unnecessary PM read and write operations, conserving PM bandwidth and scaling well in multi-threaded environments.
2. Background and Motivation
2.1. Optane Persistent Memory
2.2. Optane Architecture and Instruction Support
2.3. Dynamic Hashing
2.4. Effect of NUMA Access
2.5. Inter-Thread Interference
2.6. Locking
2.7. Impact of Segment Resizing
3. Related Work
4. Design and Implementation
4.1. High-Level Design
- (a) Avoid unnecessary reads and writes: In a hashing scheme, insertions and rehashing generate frequent accesses to the underlying storage media, and these accesses accumulate across all operations, reads and writes alike. On media slower than DRAM, such as PM, this overhead is even more severe. To achieve high end-to-end performance, ESH therefore reduces unnecessary reads from and writes to persistent memory.
- (b) Bucket-level locking to allow multi-threading: A well-designed locking strategy minimizes the frequency of lock and unlock requests for sequential data access and manipulation, reducing CPU cost. To achieve better concurrency, ESH adopts a fine-grained locking approach that reduces lock/unlock operations and thus lock contention. During a write, only the bucket being operated on is locked, so other threads can access different buckets concurrently, and read operations are lock-free. Structural operations such as segment splitting and directory doubling are not lock-free: to prevent inconsistencies, the active writer thread is responsible for creating and locking segments or directories during a split or doubling. In short, ESH applies locks at the smallest data block it manages, the bucket (a minimal sketch of this bucket-level lock follows this list).
- (c) Optimistic scaling on multicore machines: To fully exploit the parallel resources of modern CPUs, ESH incorporates optimistic scaling in its design. Prior work has concentrated on minimizing cache-line flushes and PM writes to achieve scalable performance, but such approaches still face scalability challenges on real PM devices because of PM's limited bandwidth. ESH therefore also reduces unnecessary PM reads and uses lightweight concurrency control to further minimize PM writes, ensuring persistence with low overhead. To this end, ESH treats a bucket as a single block of PM, reducing PM access overhead.
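To make principles (b) and (c) concrete, the sketch below shows one way a PM-resident bucket can carry its own lock word and be persisted as a unit with cache-line write-backs. The slot count, 256 B alignment, field names, and the CLWB/SFENCE persist helper are assumptions made for this sketch; they are not taken from the ESH source code.

```cpp
// Illustrative sketch only: a PM-resident bucket with a seqlock-style lock
// word, persisted with cache-line write-backs. Layout and sizes are assumed.
// Compile for a CLWB-capable target, e.g. with -mclwb.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>   // _mm_clwb, _mm_sfence

constexpr int kSlotsPerBucket = 14;            // assumed bucket capacity

struct alignas(256) Bucket {                   // assume one bucket == one 256 B PM block
    std::atomic<uint32_t> version_lock{0};     // odd = locked by a writer; even = free
    uint16_t count{0};                         // number of occupied slots
    uint64_t keys[kSlotsPerBucket];
    uint64_t values[kSlotsPerBucket];

    void lock() {                              // writers lock only this bucket
        uint32_t v;
        do {
            v = version_lock.load(std::memory_order_acquire) & ~1u;  // expect "unlocked"
        } while (!version_lock.compare_exchange_weak(
                     v, v | 1u, std::memory_order_acquire));
    }
    void unlock() {                            // releases the lock and bumps the version
        version_lock.fetch_add(1u, std::memory_order_release);
    }
};

// Persist [addr, addr + len) with cache-line write-backs and one store fence.
inline void persist(const void* addr, std::size_t len) {
    auto* p = static_cast<const char*>(addr);
    for (std::size_t off = 0; off < len; off += 64)
        _mm_clwb(const_cast<char*>(p + off));
    _mm_sfence();
}
```

Keeping the lock word inside the bucket means a writer touches exactly one PM block per update, which is the property principle (c) aims for.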
Algorithm 1. ESH key–value insert algorithm
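The caption above refers to the paper's Algorithm 1, which is not reproduced here. As an illustration of the insert path it describes, the following hedged sketch reuses the Bucket and persist() helpers sketched in Section 4.1 and shows contribution (i): an overflow record is redistributed to a neighboring bucket in the same segment before the caller falls back to a segment split. The segment size, probing distance, and persist ordering are assumptions, not ESH's exact algorithm.

```cpp
// Assumed sketch of a segment-level insert path (not the paper's exact Algorithm 1).
constexpr int kBucketsPerSegment = 64;   // assumed segment size (64 x 256 B = 16 KB)
constexpr int kNeighborProbes    = 2;    // assumed displacement distance

struct Segment { Bucket buckets[kBucketsPerSegment]; };

bool segment_insert(Segment& seg, uint64_t hash, uint64_t key, uint64_t value) {
    std::size_t home = hash % kBucketsPerSegment;
    for (int d = 0; d <= kNeighborProbes; ++d) {               // home bucket first
        Bucket& b = seg.buckets[(home + d) % kBucketsPerSegment];
        b.lock();                                              // bucket-level lock only
        if (b.count < kSlotsPerBucket) {
            b.keys[b.count]   = key;                           // write the new slot...
            b.values[b.count] = value;
            persist(&b.keys[b.count],   sizeof(b.keys[0]));    // ...persist it...
            persist(&b.values[b.count], sizeof(b.values[0]));
            b.count++;                                         // ...then publish it
            persist(&b.count, sizeof(b.count));
            b.unlock();
            return true;
        }
        b.unlock();                                            // full: try a neighbor
    }
    return false;   // probe path full: caller triggers segment split / directory doubling
}
```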
Algorithm 2. ESH key–value search algorithm
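Similarly, the sketch below illustrates a lock-free search in the spirit of Algorithm 2: the reader scans a bucket optimistically and validates the scan against the bucket's version word, re-reading only if a concurrent writer intervened. The probing order mirrors the insert sketch above; the validation scheme is an assumption for illustration, not the paper's exact pseudocode.

```cpp
// Assumed sketch of a lock-free, version-validated bucket search.
bool segment_search(const Segment& seg, uint64_t hash, uint64_t key, uint64_t* out) {
    std::size_t home = hash % kBucketsPerSegment;
    for (int d = 0; d <= kNeighborProbes; ++d) {
        const Bucket& b = seg.buckets[(home + d) % kBucketsPerSegment];
        while (true) {
            uint32_t v1 = b.version_lock.load(std::memory_order_acquire);
            if (v1 & 1u) continue;                  // a writer holds the bucket: retry
            bool found = false;
            uint64_t val = 0;
            for (uint16_t i = 0; i < b.count; ++i)  // optimistic scan, validated below
                if (b.keys[i] == key) { val = b.values[i]; found = true; break; }
            uint32_t v2 = b.version_lock.load(std::memory_order_acquire);
            if (v1 == v2) {                         // no concurrent write: snapshot is valid
                if (found) { *out = val; return true; }
                break;                              // not here: probe the next neighbor
            }                                       // version changed: re-read this bucket
        }
    }
    return false;
}
```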
4.2. Implementation
4.3. Concurrency
4.4. Recovery
5. Experimental Evaluation and Results
- Efficiently utilizes the space at the segment level, ensuring that no empty space is left within a segment before triggering a segment split operation or directory doubling.
- In a multicore environment, our scheme exhibits strong scalability in terms of performance compared to state-of-the-art hashing schemes that employ similar techniques.
- Our scheme achieves a high load factor at minimal cost, without compromising performance or recovery, and remains competitive in this respect.
5.1. Experimental Setup
5.2. Comparative Performance
5.3. Benefits of Metadata
5.4. Concurrency
5.5. Scalability
5.6. Load Factor
5.7. Recovery
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Setup Used | Specifications
---|---
Server | Intel Xeon® Gold 5218 processor, 2.30 GHz, 32 cores (64 hyper-threads); 32 GB DDR4 RAM; 256 GB Optane DCPMM (2 × 128 GB DIMMs)
OS | Ubuntu Server 18.04 LTS with kernel 5.4.9-47-generic
Platform | PMDK 1.8 and GCC 9.0
Hashing Schemes | Record Size: 1 GB | 5 GB | 10 GB | 20 GB | 30 GB
---|---|---|---|---|---
CCEH | 50 | 121 | 256 | 503 | 1082
Dash | 62 | 63 | 65 | 65 | 65
ESH | 62 | 65 | 66 | 66 | 67