Abstract
Persistent memory (PM) promises byte-addressability, large capacity, and durability. Main memory systems, such as key-value stores and in-memory databases, benefit from these features of PM. Because hash indexes are widely used in main memory systems, a number of research efforts have aimed to provide persistent hashing with high average performance. However, existing persistent hashing schemes still exhibit suboptimal tail performance, in terms of both tail throughput and tail latency. In this paper, we analyze the major sources of suboptimal tail performance arising from key design issues of persistent hashing. We identify the global hash structure and concurrency control as the remaining explorable design spaces for improving tail performance. We propose Directory-sharing Multi-level Extendible Hashing (Dalea) for PM. Dalea combines ancestor link-based extendible hashing with fine-grained transient locks to address the two main sources of poor tail performance, rehashing and locking. The evaluation results show that, compared with the state-of-the-art persistent hashing scheme Dash, Dalea increases tail throughput by 4.1x and reduces tail latency by 5.4x. Moreover, to provide design guidelines for improving tail performance, we use Dalea as a testbed to identify the different impacts of four factors on tail performance: fine-grained rehashing, transient locking, memory pre-allocation, and fingerprinting.
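To make the rehashing cost mentioned above concrete, the following is a minimal sketch of classic, volatile extendible hashing, not of Dalea's ancestor link-based design. All names (`Bucket`, `ExtendibleHash`, `bucket_capacity`, the `splits` and `doublings` counters) are hypothetical and chosen for illustration, and the sketch ignores persistence and concurrency entirely. It only shows that an insert which hits a full bucket must split it and may have to double the directory first, which is the kind of occasional expensive operation that shows up in the tail.

```python
class Bucket:
    """A fixed-capacity bucket that remembers how many hash bits it covers."""

    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.items = {}


class ExtendibleHash:
    """Classic extendible hashing indexed by the low bits of hash(key).

    Illustrative only: volatile, single-threaded, and unrelated to the
    ancestor-link structure that the paper proposes.
    """

    def __init__(self, bucket_capacity=4):
        self.global_depth = 1
        self.bucket_capacity = bucket_capacity
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]
        self.splits = 0      # number of bucket splits (rehash events)
        self.doublings = 0   # number of directory doublings

    def _index(self, key):
        # Use the lowest global_depth bits of the hash as the directory slot.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._index(key)].items.get(key, default)

    def put(self, key, value):
        # Retry after each split until the target bucket has room.
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Directory doubling: slot i and slot i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
            self.doublings += 1
        bit = 1 << bucket.local_depth
        low = Bucket(bucket.local_depth + 1, self.bucket_capacity)
        high = Bucket(bucket.local_depth + 1, self.bucket_capacity)
        # Rehash every item of the full bucket on one additional hash bit.
        for k, v in bucket.items.items():
            (high if hash(k) & bit else low).items[k] = v
        # Redirect every directory slot that pointed at the old bucket.
        for i, b in enumerate(self.directory):
            if b is bucket:
                self.directory[i] = high if i & bit else low
        self.splits += 1
```

Most `put` calls in this sketch touch a single bucket, while the few that trigger `_split` rehash a whole bucket and scan the directory. That gap between the common case and the rare case is one simple way to picture why rehashing affects tail latency more than average latency.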
Cite this article
Xiong, ZW., Jiang, DJ., Xiong, J. et al. Dalea: A Persistent Multi-Level Extendible Hashing with Improved Tail Performance. J. Comput. Sci. Technol. 38, 1051–1073 (2023). https://doi.org/10.1007/s11390-023-2957-8
DOI: https://doi.org/10.1007/s11390-023-2957-8